Export Transform Results

Starlake transforms can write results to files in CSV, JSON, Parquet, or Avro format, or directly to another database. You can target local storage, cloud buckets (GCS, S3, ADLS), or a different database engine by configuring the sink section of the YAML file. Each transform writes to a single destination. To produce multiple outputs from the same query, create separate transforms reading the same source.
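To illustrate the multiple-output pattern, here is a sketch of two transforms sharing the same query, one sinking to CSV and one to a database. The domain and transform names are hypothetical, and each transform would also carry its own SQL query:

```yaml
# metadata/transform/sales/report_to_file.sl.yml  (hypothetical name)
task:
  sink:
    format: csv
    extension: csv

# metadata/transform/sales/report_to_db.sl.yml  (hypothetical name)
task:
  sink:
    connectionRef: my_postgres_db  # must be defined in application.sl.yml
```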

Sink Configuration Properties

| Property | Description | Example |
| --- | --- | --- |
| `sink.format` | Output file format | `csv`, `json`, `parquet`, `avro` |
| `sink.extension` | File name extension | `csv`, `parquet` |
| `sink.path` | Custom output path (relative to root) | `mnt/data/output.csv` |
| `sink.connectionRef` | Target database connection name | `my_postgres_db` |

Export to Files

Set sink.format and sink.extension in the transform YAML file. Starlake supports CSV, JSON, Parquet, Avro, and any file format supported by Apache Spark.

metadata/transform/<domain>/<transform>.sl.yml
```yaml
task:
  sink:
    format: csv
    extension: csv
```

By default, the file is saved in the datasets/<domain>/ folder of the project. The file name is derived from the transform name and the configured extension.

Custom Output Path

Specify a custom path relative to the root defined in application.sl.yml:

metadata/transform/<domain>/<transform>.sl.yml
```yaml
task:
  sink:
    format: csv
    path: mnt/data/output.csv
```

Cloud Storage (GCS, S3, ADLS)

On cloud storage, Starlake prepends the bucket name from the root configuration to the path:

metadata/application.sl.yml
```yaml
...
root: gs://my-bucket/folder1/folder2
...
```

With this root, a sink.path of mnt/data/output.csv resolves to gs://my-bucket/folder1/folder2/mnt/data/output.csv.
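The resolution amounts to plain path concatenation. A minimal sketch, using a hypothetical helper rather than Starlake's actual code:

```python
def resolve_sink_path(root: str, sink_path: str) -> str:
    """Join the configured root with a relative sink.path (illustrative only)."""
    return root.rstrip("/") + "/" + sink_path.lstrip("/")

print(resolve_sink_path("gs://my-bucket/folder1/folder2", "mnt/data/output.csv"))
# gs://my-bucket/folder1/folder2/mnt/data/output.csv
```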

note

With Spark, CSV and JSON exports may produce multiple partitioned files. With DuckDB or a single-node engine, a single file is produced.

Export to Another Database

Write transform results directly to a database by setting sink.connectionRef. The connection must be defined in application.sl.yml.

metadata/transform/<domain>/<transform>.sl.yml
```yaml
task:
  ...
  sink:
    connectionRef: my_database
  ...
```

This is useful for cross-database ETL patterns, such as reading from BigQuery and writing to PostgreSQL.

Frequently Asked Questions

What file formats are supported for export in Starlake?

Starlake supports export to CSV, JSON, Parquet, Avro, and any file format supported by Apache Spark via the sink.format property.

Where are exported files stored by default?

Files are saved in the datasets/<domain>/ folder of the project, named after the transform and the extension configured in the YAML file.

How do you export to a custom path or a cloud bucket?

Use the sink.path property to specify a path relative to the root defined in application.sl.yml. On cloud storage, the bucket name is automatically prepended.

How do you export to another database?

Specify the connection name in sink.connectionRef of the transform YAML file. The connection must be defined in application.sl.yml.

Can you export to both a file and a database at the same time?

A single transform writes to one destination. To write to two targets, create two transforms that read the same source.

Does CSV export produce a single file or multiple files?

The behavior depends on the engine. With Spark, multiple partitioned files may be generated. With DuckDB or a single-node engine, a single file is produced.