Customize the Transform Configuration

Every Starlake SQL or Python transform can be fine-tuned through a companion YAML configuration file placed next to the source file. This configuration controls write strategies (append, overwrite, upsert), table partitioning and clustering, access control (ACL, row-level security), data quality expectations, and cross-database routing. When no YAML file is provided, Starlake applies sensible defaults: append mode, schema inferred from the directory name, and table inferred from the file name.

Configuration Properties Overview

| Property | Description | Default |
|---|---|---|
| writeStrategy.type | How results are written to the target table | append |
| sink.partition | Columns used for table partitioning | none |
| sink.clustering | Columns used for table clustering | none |
| sink.connectionRef | Target database connection for cross-database writes | project default |
| connectionRef | Source database connection | project default |
| acl | Table-level access grants | none |
| rls | Row-level security predicates | none |
| expectations | Data quality assertions | none |
| domain | Override the target schema name | directory name |
| table | Override the target table name | file name |

Domain and Schema Terminology

Note: Data warehouses organize tables into schemas. Depending on the platform, a schema is called a schema (PostgreSQL, Snowflake), a dataset (BigQuery), or a catalog (Databricks). Starlake uses the term domain as a unified abstraction over all of these.

Directory Structure and Naming Conventions

The transform directory follows this structure:

transform directory structure
metadata/transform
└── <domain>
    ├── _config.sl.yml   # domain-level configuration (shared settings for all tasks in this domain)
    ├── <table>.sl.yml
    └── <table>.sql

By default, the transform task name matches the file name. The SQL result writes to a table named <table> in the schema named <domain>. The companion YAML file configures how that materialization happens.
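As an illustration of this convention (the sales domain and orders file names below are placeholders, not part of any real project):

```text
metadata/transform
└── sales
    ├── orders.sl.yml   # optional companion configuration
    └── orders.sql      # task "orders"; its SELECT result writes to table sales.orders
```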

Override the Target Schema and Table

Use the domain and table properties to write to a different schema or table than the defaults:

metadata/transform/<domain>/custom.sl.yml
task:
  domain: mydomain # writes to the mydomain schema regardless of the folder name
  table: mytable   # writes to the mytable table regardless of the file name
  name: custom     # task name used in CLI commands

Write Strategies

By default, the SELECT result appends to the target table. Override this with writeStrategy.type:

metadata/transform/<domain>/<table>.sl.yml
task:
  writeStrategy:
    type: OVERWRITE ## or any strategy defined in the write strategies reference

Transform write strategies work the same way as load write strategies. Available types include append, overwrite, upsert, and others documented in that reference.
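For example, an upsert needs the key columns used to match existing rows. A sketch, assuming the UPSERT_BY_KEY type and key property from the write strategies reference, with id as a placeholder column:

```yaml
task:
  writeStrategy:
    type: UPSERT_BY_KEY # update rows matching the key, insert the rest
    key:
      - id              # placeholder; use your table's business key column(s)
```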

Clustering and Partitioning

Partition and cluster the target table to improve query performance:

metadata/transform/<domain>/<table>.sl.yml
task:
  ...
  sink:
    partition:
      - column1
      - column2
    clustering:
      - column3
      - column4

See the sink configuration reference for detailed options.

Access Control

Control who can access the target table. Starlake supports table-level grants (ACL), row-level security (RLS), and column-level access policies.

metadata/transform/<domain>/<table>.sl.yml
task:
  ...
  acl:
    - role: SELECT
      grants:
        - [email protected]
        - group
  rls:
    - name: "USA only"
      predicate: "country = 'USA'"
      grants:
        - "group:mygroup"
  attributesDesc:
    - name: "code0"
      accessPolicy: <column_access_policy>
  ...

See the security configuration reference for more details on access control options.

Read from a Different Database

By default, transforms run against the database configured in metadata/application.sl.yml. Override the source database with connectionRef:

metadata/transform/<domain>/<table>.sl.yml
task:
  ...
  connectionRef: myConnection
  ...

Write to a Different Database

Route the transform output to a different target database by setting sink.connectionRef:

metadata/transform/<domain>/<table>.sl.yml
task:
  ...
  sink:
    connectionRef: myOtherConnection
    ...
  ...

This lets you read from one database and write to another in a single transform.
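Combining the two properties, a single task can declare both its source and its target; a sketch with placeholder connection names:

```yaml
task:
  connectionRef: myConnection          # source database: where the SELECT runs
  sink:
    connectionRef: myOtherConnection   # target database: where the result is written
```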

Data Quality Expectations

Add assertions that Starlake evaluates after the transform runs. If failOnError is true, the job fails when an assertion is not satisfied.

metadata/transform/<domain>/<table>.sl.yml
task:
  ...
  expectations:
    - expect: "is_col_value_not_unique('id') => result(0) == 1"
      failOnError: true

See the expectations reference for the full assertion syntax.

Frequently Asked Questions

What is the default write strategy for a Starlake transform?

By default, the result of the SELECT statement is appended to the target table. You can change this with the writeStrategy.type property in the YAML file (overwrite, upsert, etc.).

How do you partition a table created by a Starlake transform?

Add a sink.partition section in the transform YAML file listing the partition columns. Clustering is configured similarly with sink.clustering.

Can Starlake read from one database and write to another?

Yes. Use connectionRef at the task level to change the source database, and sink.connectionRef to change the target database.

How do you apply row-level security on a transformed table?

Use the rls section of the YAML file to define filter predicates associated with groups or users.

What are expectations in a Starlake transform?

Expectations are data quality assertions evaluated after the transform runs. If failOnError is set to true, the job fails when the assertion is not satisfied.

How do you rename the target table of a transform?

Use the domain and table properties in the YAML file to override the schema and table names derived from the folder and file names.

What is the difference between domain, schema, catalog, and dataset in Starlake?

Starlake uses the term domain as a unified abstraction. Depending on the data warehouse, it maps to a schema (PostgreSQL, Snowflake), a dataset (BigQuery), or a catalog (Databricks).