parquet2csv
Synopsis
starlake parquet2csv [options]
Description
Convert Parquet files to CSV. The input folder hierarchy should be in the form /input_folder/domain/schema/part*.parquet. Once converted, the CSV output is written to /output_folder/domain/schema.csv. When the specified number of output partitions is 1, /output_folder/domain/schema.csv is a single file containing the data; otherwise, it is a folder containing the part*.csv files. When output_folder is not specified, input_folder is used as the base output folder.
starlake parquet2csv \
         --input_dir /tmp/datasets/accepted/ \
         --output_dir /tmp/datasets/csv/ \
         --domain sales \
         --schema orders \
         --option header=true \
         --option separator=, \
         --partitions 1 \
         --write_mode overwrite
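The path mapping in the description can be illustrated with a small Python sketch. It is not part of Starlake: the helper names `csv_output_path` and `is_single_file` are hypothetical, and the code only mirrors the rules stated above (output falls back to the input folder, and a partition count of 1 yields a single file).

```python
from pathlib import PurePosixPath
from typing import Optional


def csv_output_path(input_dir: str, domain: str, schema: str,
                    output_dir: Optional[str] = None) -> PurePosixPath:
    """Hypothetical helper: where the CSV output for one schema would land."""
    # When no output folder is given, the input folder is the base output folder.
    base = PurePosixPath(output_dir if output_dir else input_dir)
    # Output location follows /output_folder/domain/schema.csv
    return base / domain / f"{schema}.csv"


def is_single_file(partitions: int) -> bool:
    """Hypothetical helper: True when schema.csv is a file rather than a folder."""
    # With exactly one partition, schema.csv is the data file itself;
    # otherwise it is a folder holding part*.csv files.
    return partitions == 1


target = csv_output_path("/tmp/datasets/accepted/", "sales", "orders",
                         "/tmp/datasets/csv/")
print(target, "file" if is_single_file(1) else "folder")
```

With the values from the example command, the target is /tmp/datasets/csv/sales/orders.csv, and because the partition count is 1 it would be a single file.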
Parameters
| Parameter | Cardinality | Description | 
|---|---|---|
| --input_dir: <value> | Required | Full path to the input directory | 
| --output_dir: <value> | Optional | Full path to the output directory. If not specified, input_dir is used as the output directory | 
| --domain: <value> | Optional | Domain name to convert. All schemas in this domain are converted. If not specified, all schemas of all domains are converted | 
| --schema: <value> | Optional | Schema name to convert. If not specified, all schemas are converted. | 
| --delete_source: <value> | Optional | Whether to delete the source Parquet files after conversion | 
| --write_mode: <value> | Optional | Write mode, one of OVERWRITE or APPEND | 
| --options: k1=v1,k2=v2... | Optional | Any Spark option to use (sep, delimiter, quote, quoteAll, escape, header ...) | 
| --partitions: <value> | Optional | How many output partitions |
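
When the command is launched from a script, the parameters above can be assembled programmatically. The following is a minimal Python sketch, not an official Starlake API: `build_parquet2csv_args` is a hypothetical helper, and it assumes the `--options k1=v1,k2=v2` form shown in the table, with values that contain no commas.

```python
from typing import Dict, List, Optional


def build_parquet2csv_args(input_dir: str,
                           output_dir: Optional[str] = None,
                           domain: Optional[str] = None,
                           schema: Optional[str] = None,
                           options: Optional[Dict[str, str]] = None,
                           partitions: Optional[int] = None,
                           write_mode: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build an argument list from the documented flags."""
    # --input_dir is the only required parameter.
    args = ["starlake", "parquet2csv", "--input_dir", input_dir]
    optional_flags = [
        ("--output_dir", output_dir),
        ("--domain", domain),
        ("--schema", schema),
        ("--write_mode", write_mode),
    ]
    # Optional flags are emitted only when a value was provided.
    for flag, value in optional_flags:
        if value is not None:
            args += [flag, value]
    if options:
        # Assumption: options are passed as a single k1=v1,k2=v2 string.
        args += ["--options", ",".join(f"{k}={v}" for k, v in options.items())]
    if partitions is not None:
        args += ["--partitions", str(partitions)]
    return args


cmd = build_parquet2csv_args("/tmp/datasets/accepted/", domain="sales",
                             schema="orders", options={"header": "true"},
                             partitions=1, write_mode="overwrite")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` as is, which avoids shell quoting issues with option values.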