# Export Transform Results
Starlake transforms can write results to files in CSV, JSON, Parquet, or Avro format, or directly to another database. You can target local storage, cloud buckets (GCS, S3, ADLS), or a different database engine by configuring the `sink` section of the YAML file. Each transform writes to a single destination. To produce multiple outputs from the same query, create separate transforms reading the same source.
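For example, to send the same query results to both a CSV file and a database, you could define two transforms that share the same SQL source. The file names and connection name below are hypothetical:

```yaml
# transform/sales/report_file.sl.yml (hypothetical): file sink
task:
  sink:
    format: csv
    extension: csv
---
# transform/sales/report_db.sl.yml (hypothetical): database sink
task:
  sink:
    connectionRef: my_postgres_db
```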
## Sink Configuration Properties
| Property | Description | Example |
|---|---|---|
| `sink.format` | Output file format | `csv`, `json`, `parquet`, `avro` |
| `sink.extension` | File name extension | `csv`, `parquet` |
| `sink.path` | Custom output path (relative to `root`) | `mnt/data/output.csv` |
| `sink.connectionRef` | Target database connection name | `my_postgres_db` |
## Export to Files
Set `sink.format` and `sink.extension` in the transform YAML file. Starlake supports CSV, JSON, Parquet, Avro, and any file format supported by Apache Spark.
```yaml
task:
  sink:
    format: csv
    extension: csv
```
By default, the file is saved in the `datasets/<domain>/` folder of the project. The file name is derived from the transform name and the configured extension.
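For instance, assuming a transform named `report` in a `sales` domain (both names are illustrative), the sink below would produce `datasets/sales/report.csv`:

```yaml
task:
  sink:
    format: csv
    extension: csv
# default output location: datasets/sales/report.csv
```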
### Custom Output Path
Specify a custom path relative to the `root` defined in `application.sl.yml`:
```yaml
task:
  sink:
    format: csv
    path: mnt/data/output.csv
```
### Cloud Storage (GCS, S3, ADLS)
On cloud storage, Starlake prepends the bucket name from the `root` configuration to the path:
```yaml
...
root: gs://my-bucket/folder1/folder2
...
```
With this root, a `sink.path` of `mnt/data/output.csv` resolves to `gs://my-bucket/folder1/folder2/mnt/data/output.csv`.
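Putting both pieces together, the resolution can be sketched as follows (bucket and folder names are illustrative):

```yaml
# application.sl.yml
root: gs://my-bucket/folder1/folder2

# transform YAML
task:
  sink:
    format: csv
    path: mnt/data/output.csv

# resolved target: gs://my-bucket/folder1/folder2/mnt/data/output.csv
```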
With Spark, CSV and JSON exports may produce multiple partitioned files. With DuckDB or another single-node engine, a single file is produced.
## Export to Another Database
Write transform results directly to a database by setting `sink.connectionRef`. The connection must be defined in `application.sl.yml`.
```yaml
task:
  ...
  sink:
    connectionRef: my_database
  ...
```
This is useful for cross-database ETL patterns -- for example, reading from BigQuery and writing to PostgreSQL.
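The target connection itself is declared in `application.sl.yml`. A JDBC connection to PostgreSQL might look like the sketch below; the connection name, URL, and credentials are placeholders, not values from this document:

```yaml
# application.sl.yml (illustrative values)
application:
  connections:
    my_database:
      type: jdbc
      options:
        url: "jdbc:postgresql://localhost:5432/analytics"
        user: "etl_user"
        password: "${POSTGRES_PASSWORD}"
```

With a connection like this in place, setting `sink.connectionRef: my_database` in a transform routes its results to that PostgreSQL database.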
## Next Steps
- Transform YAML Configuration -- full sink configuration reference
- SQL Transform Tutorial -- end-to-end walkthrough with DuckDB
- Orchestrate Transform Jobs -- schedule exports with Airflow, Dagster, or Snowflake Tasks
## Frequently Asked Questions

### What file formats are supported for export in Starlake?

Starlake supports export to CSV, JSON, Parquet, Avro, and any file format supported by Apache Spark via the `sink.format` property.

### Where are exported files stored by default?

Files are saved in the `datasets/<domain>/` folder of the project, named after the transform and the extension configured in the YAML file.

### How do you export to a custom path or a cloud bucket?

Use the `sink.path` property to specify a path relative to the `root` defined in `application.sl.yml`. On cloud storage, the bucket name is automatically prepended.

### How do you export to another database?

Specify the connection name in `sink.connectionRef` of the transform YAML file. The connection must be defined in `application.sl.yml`.

### Can you export to both a file and a database at the same time?

No. A single transform writes to one destination. To write to two targets, create two transforms that read the same source.

### Does CSV export produce a single file or multiple files?

It depends on the engine. With Spark, multiple partitioned files may be generated. With DuckDB or another single-node engine, a single file is produced.