Customize the Transform Configuration
Every Starlake SQL or Python transform can be fine-tuned through a companion YAML configuration file placed next to the source file. This configuration controls write strategies (append, overwrite, upsert), table partitioning and clustering, access control (ACL, row-level security), data quality expectations, and cross-database routing. When no YAML file is provided, Starlake applies sensible defaults: append mode, schema inferred from the directory name, and table inferred from the file name.
Configuration Properties Overview
| Property | Description | Default |
|---|---|---|
| `writeStrategy.type` | How results are written to the target table | `append` |
| `sink.partition` | Columns used for table partitioning | none |
| `sink.clustering` | Columns used for table clustering | none |
| `sink.connectionRef` | Target database connection for cross-database writes | project default |
| `connectionRef` | Source database connection | project default |
| `acl` | Table-level access grants | none |
| `rls` | Row-level security predicates | none |
| `expectations` | Data quality assertions | none |
| `domain` | Override the target schema name | directory name |
| `table` | Override the target table name | file name |
Domain and Schema Terminology
Data warehouses organize tables into schemas. Depending on the platform, this container is called a schema (PostgreSQL, Snowflake), a dataset (BigQuery), or a catalog (Databricks). Starlake uses the term domain as a unified abstraction across all of them.
Directory Structure and Naming Conventions
The transform directory follows this structure:
```
metadata/transform
└── <domain>
    ├── _config.sl.yml   # domain-level configuration (shared settings for all tasks in this domain)
    ├── <table>.sl.yml   # task configuration for <table>.sql
    └── <table>.sql      # the transform SELECT statement
```
By default, the transform task name matches the file name. The SQL result writes to a table named <table> in the schema named <domain>. The companion YAML file configures how that materialization happens.
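For example, a transform file placed at metadata/transform/sales/revenue.sql (domain, table, and column names here are illustrative) would, with no companion YAML, materialize into the revenue table of the sales schema:

```sql
-- metadata/transform/sales/revenue.sql
-- Without a companion YAML file, the result of this SELECT is appended
-- to the "revenue" table in the "sales" schema: the domain comes from
-- the folder name and the table from the file name.
SELECT product_id, SUM(amount) AS total_amount
FROM sales.orders
GROUP BY product_id
```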
Override the Target Schema and Table
Use the domain and table properties to write to a different schema or table than the defaults:
```yaml
task:
  domain: mydomain  # writes to the mydomain schema regardless of the folder name
  table: mytable    # writes to the mytable table regardless of the file name
  name: custom      # task name used in CLI commands
```
Write Strategies
By default, the SELECT result appends to the target table. Override this with writeStrategy.type:
```yaml
task:
  writeStrategy:
    type: OVERWRITE  # or any strategy defined in the write strategies reference
```
Transform write strategies work the same way as load write strategies. Available types include append, overwrite, upsert, and others documented in that reference.
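As a sketch of a merge-style strategy, an upsert keyed on an id column might look like the following (the column name is hypothetical; check the write strategies reference for the exact strategy names available on your target warehouse):

```yaml
task:
  writeStrategy:
    type: UPSERT_BY_KEY  # insert new rows, update existing rows matching the key
    key: [id]            # column(s) identifying an existing row
```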
Clustering and Partitioning
Partition and cluster the target table to improve query performance:
```yaml
task:
  ...
  sink:
    partition:
      - column1
      - column2
    clustering:
      - column3
      - column4
```
See the sink configuration reference for detailed options.
Access Control
Control who can access the target table. Starlake supports table-level grants (ACL), row-level security (RLS), and column-level access policies.
```yaml
task:
  ...
  acl:
    - role: SELECT
      grants:
        - [email protected]
        - group
  rls:
    - name: "USA only"
      predicate: "country = 'USA'"
      grants:
        - "group:mygroup"
  attributesDesc:
    - name: "code0"
      accessPolicy: <column_access_policy>
  ...
```
See the security configuration reference for more details on access control options.
Read from a Different Database
By default, transforms run against the database configured in metadata/application.sl.yml. Override the source database with connectionRef:
```yaml
task:
  ...
  connectionRef: myConnection
  ...
```
Write to a Different Database
Route the transform output to a different target database by setting sink.connectionRef:
```yaml
task:
  ...
  sink:
    connectionRef: myOtherConnection
    ...
  ...
```
This lets you read from one database and write to another in a single transform.
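Combining both properties in one task gives a minimal cross-database pipeline (connection names are illustrative):

```yaml
task:
  connectionRef: sourceConnection    # the SELECT statement runs against this database
  sink:
    connectionRef: targetConnection  # the result is written to this database
```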
Data Quality Expectations
Add assertions that Starlake evaluates after the transform runs. If failOnError is true, the job fails when an assertion is not satisfied.
```yaml
task:
  ...
  expectations:
    - expect: "is_col_value_not_unique('id') => result(0) == 1"
      failOnError: true
```
See the expectations reference for the full assertion syntax.
Next Steps
- SQL Transform Tutorial -- end-to-end walkthrough with DuckDB
- SQL Transform Syntax -- SELECT statements, incremental models, and custom SQL
- Export Transform Results -- write results to files or another database
Frequently Asked Questions
What is the default write strategy for a Starlake transform?
By default, the result of the SELECT statement is appended to the target table. You can change this with the writeStrategy.type property in the YAML file (overwrite, upsert, etc.).
How do you partition a table created by a Starlake transform?
Add a sink.partition section in the transform YAML file listing the partition columns. Clustering is configured similarly with sink.clustering.
Can Starlake read from one database and write to another?
Yes. Use connectionRef at the task level to change the source database, and sink.connectionRef to change the target database.
How do you apply row-level security on a transformed table?
Use the rls section of the YAML file to define filter predicates associated with groups or users.
What are expectations in a Starlake transform?
Expectations are data quality assertions evaluated after the transform runs. If failOnError is set to true, the job fails when the assertion is not satisfied.
How do you rename the target table of a transform?
Use the domain and table properties in the YAML file to override the schema and table names derived from the folder and file names.
What is the difference between domain, schema, catalog, and dataset in Starlake?
Starlake uses the term domain as a unified abstraction. Depending on the data warehouse, it maps to a schema (PostgreSQL, Snowflake), a dataset (BigQuery), or a catalog (Databricks).