Export Transform Results

Starlake transforms can write results to files in CSV, JSON, Parquet, or Avro format, or directly to another database. You can target local storage, cloud buckets (GCS, S3, ADLS), or a different database engine by configuring the sink section of the YAML file. Each transform writes to a single destination. To produce multiple outputs from the same query, create separate transforms reading the same source.
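To illustrate the multiple-output pattern, here is a sketch of two transforms sharing the same query, one sinking to CSV and one to a database. The domain and transform names are hypothetical, and each transform would also carry its own SQL query:

```yaml
# metadata/transform/sales/report_to_file.sl.yml  (hypothetical name)
task:
  sink:
    format: csv
    extension: csv

# metadata/transform/sales/report_to_db.sl.yml  (hypothetical name)
task:
  sink:
    connectionRef: my_postgres_db  # must be defined in application.sl.yml
```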

Sink Configuration Properties

| Property | Description | Example |
| --- | --- | --- |
| `sink.format` | Output file format | `csv`, `json`, `parquet`, `avro` |
| `sink.extension` | File name extension | `csv`, `parquet` |
| `sink.path` | Custom output path (relative to root) | `mnt/data/output.csv` |
| `sink.connectionRef` | Target database connection name | `my_postgres_db` |

Export to Files

Set sink.format and sink.extension in the transform YAML file. Starlake supports CSV, JSON, Parquet, Avro, and any file format supported by Apache Spark.

metadata/transform/<domain>/<transform>.sl.yml
```yaml
task:
  sink:
    format: csv
    extension: csv
```

By default, the file is saved in the datasets/<domain>/ folder of the project. The file name is derived from the transform name and the configured extension.

Custom Output Path

Specify a custom path relative to the root defined in application.sl.yml:

metadata/transform/<domain>/<transform>.sl.yml
```yaml
task:
  sink:
    format: csv
    path: mnt/data/output.csv
```

Cloud Storage (GCS, S3, ADLS)

On cloud storage, Starlake prepends the bucket name from the root configuration to the path:

metadata/application.sl.yml
```yaml
...
root: gs://my-bucket/folder1/folder2
...
```

With this root, a sink.path of mnt/data/output.csv resolves to gs://my-bucket/folder1/folder2/mnt/data/output.csv.
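The resolution amounts to plain path concatenation. A minimal sketch, using a hypothetical helper rather than Starlake's actual code:

```python
def resolve_sink_path(root: str, sink_path: str) -> str:
    """Join the configured root with a relative sink.path (illustrative only)."""
    return root.rstrip("/") + "/" + sink_path.lstrip("/")

print(resolve_sink_path("gs://my-bucket/folder1/folder2", "mnt/data/output.csv"))
# gs://my-bucket/folder1/folder2/mnt/data/output.csv
```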

note

With Spark, CSV and JSON exports may produce multiple partitioned files. With DuckDB or a single-node engine, a single file is produced.

Export to Another Database

Write transform results directly to a database by setting sink.connectionRef. The connection must be defined in application.sl.yml.

metadata/transform/<domain>/<transform>.sl.yml
```yaml
task:
  ...
  sink:
    connectionRef: my_database
  ...
```

This is useful for cross-database ETL patterns, such as reading from BigQuery and writing to PostgreSQL.

Frequently Asked Questions

What file formats are supported for export in Starlake?

Starlake supports export to CSV, JSON, Parquet, Avro, and any file format supported by Apache Spark via the sink.format property.

Where are exported files stored by default?

Files are saved in the datasets/<domain>/ folder of the project, named after the transform and the extension configured in the YAML file.

How do you export to a custom path or a cloud bucket?

Use the sink.path property to specify a path relative to the root defined in application.sl.yml. On cloud storage, the bucket name is automatically prepended.

How do you export to another database?

Specify the connection name in sink.connectionRef of the transform YAML file. The connection must be defined in application.sl.yml.

Can you export to both a file and a database at the same time?

A single transform writes to one destination. To write to two targets, create two transforms that read the same source.

Does CSV export produce a single file or multiple files?

The behavior depends on the engine. With Spark, multiple partitioned files may be generated. With DuckDB or a single-node engine, a single file is produced.