Write Strategies
Starlake provides eight write strategies that control how incoming data merges into target tables on BigQuery, Snowflake and Databricks. You declare the strategy in the metadata.writeStrategy property of your table YAML file. Available strategies range from simple APPEND and OVERWRITE to key-based UPSERT, SCD2 history tracking and runtime-adaptive mode.
Strategy overview
| Strategy | Description |
|---|---|
| APPEND | Insert all incoming rows. Existing data stays untouched. |
| OVERWRITE | Replace all existing rows with the incoming rows. |
| UPSERT_BY_KEY | Merge by key: update existing rows, insert new ones. |
| UPSERT_BY_KEY_AND_TIMESTAMP | Merge by key. Overwrite only when the incoming timestamp is newer. |
| OVERWRITE_BY_PARTITION | Overwrite only the partitions present in the incoming data. Other partitions remain intact. Requires a partition in the sink configuration. |
| DELETE_THEN_INSERT | Delete target rows matching the incoming keys, then insert all incoming rows. |
| SCD2 | Slowly Changing Dimension Type 2. Preserves the full history of each record with start/end timestamps. |
| ADAPTATIVE | Selects the strategy at runtime based on file size, date, day of week or file name pattern. |
The strategy can be set at the table level or at the domain level (in _config.sl.yml). A domain-level strategy propagates to all tables unless a table overrides it.
APPEND
Creates the table if it does not exist. Inserts all incoming rows without modifying existing data.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "APPEND"
attributes:
- ...
OVERWRITE
Creates the table if it does not exist. Replaces all existing rows with the incoming data.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "OVERWRITE"
attributes:
- ...
UPSERT_BY_KEY
Creates the table if it does not exist. Performs a merge: rows with a matching key are updated; new rows are inserted.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "UPSERT_BY_KEY"
key: ["id"]
on: TARGET
attributes:
- ...
UPSERT_BY_KEY_AND_TIMESTAMP
Creates the table if it does not exist. Performs a merge by key. Overwrites the existing row only when the incoming timestamp is more recent. Use this when the source can send multiple versions of the same record.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "UPSERT_BY_KEY_AND_TIMESTAMP"
key: ["id"]
timestamp: "valid_from"
on: TARGET
attributes:
- ...
OVERWRITE_BY_PARTITION
Creates the table if it does not exist. Overwrites only the partitions present in the incoming data. All other partitions remain untouched. New partitions are appended.
This strategy requires a partition configuration in the sink section.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "OVERWRITE_BY_PARTITION"
on: TARGET
attributes:
- ...
DELETE_THEN_INSERT
Creates the table if it does not exist. Deletes all target rows whose keys match the incoming data, then inserts every incoming row. Unlike UPSERT_BY_KEY (which performs a single merge operation), DELETE_THEN_INSERT executes two steps: delete then insert.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "DELETE_THEN_INSERT"
key: ["id", "name" ...]
attributes:
- ...
SCD2
Creates the table if it does not exist. Implements Slowly Changing Dimension Type 2 to preserve the full history of each record. Each version carries start and end timestamps.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "SCD2"
key: ["id"]
timestamp: "date"
startTs: "valid_from"
endTs: "valid_to"
on: BOTH
attributes:
- ...
The startTs and endTs properties define the column names that store the start and end timestamps of each version.
They are optional. When omitted, Starlake uses sl_start_ts and sl_end_ts as defaults. You can change these defaults in metadata/application.sl.yml:
application:
...
scd2StartTimestamp: "sl_start_ts"
scd2EndTimestamp: "sl_end_ts"
ADAPTATIVE write strategy
The ADAPTATIVE strategy selects the write mode at runtime based on criteria such as day of the week, file size or file name pattern. This removes the need for manual intervention when the loading behavior must change periodically.
Example: APPEND on weekdays, OVERWRITE on Sundays. Set this at the domain level to propagate to all tables:
# _config.sl.yml
load:
name: "DOMAIN"
metadata:
...
writeStrategy:
types:
APPEND: 'dayOfWeek != 7'
OVERWRITE: 'dayOfWeek == 7'
Example: strategy based on file name pattern. Use capture groups in the table pattern and reference them with group():
# _config.sl.yml
load:
name: "DOMAIN"
metadata:
...
writeStrategy:
types:
OVERWRITE: 'group("mode") == "FULL"'
APPEND: 'group("mode") == "APPEND"'
#my_table.sl.yml
table:
...
pattern: ".*-(?<mode>FULL|APPEND).csv$"
Combine criteria with boolean operators !, && and ||. Use parentheses as needed.
When using strings in expressions, wrap them with double quotes ".
Available criteria
| Criteria | Description | Example |
|---|---|---|
| group(index or name) | File pattern must use (named) capture groups | pattern: `my-file-(F |
| fileSize | Current file size in bytes | fileSize > 1000 |
| fileSizeB | Current file size in bytes. Alias of fileSize | |
| fileSizeKo | Current file size in Ko | |
| fileSizeMo | Current file size in Mo | |
| fileSizeGo | Current file size in Go | |
| fileSizeTo | Current file size in To | |
| isFirstDayOfMonth | Current day is first day of month | |
| isLastDayOfMonth | Current day is last day of month | |
| dayOfWeek | Integer representing day of week. Monday = 1, ..., Sunday = 7 | |
| isFileFirstDayOfMonth | File modification date is first day of month | |
| isFileLastDayOfMonth | File modification date is last day of month | |
| fileDayOfWeek | Integer representing file modification day of week. Monday = 1, ..., Sunday = 7 |
For a criteria relying on datetime, you can change its timezone with timezone application settings in application.sl.yml
Frequently Asked Questions
What write strategies are available in Starlake?
Starlake provides eight strategies: APPEND, OVERWRITE, UPSERT_BY_KEY, UPSERT_BY_KEY_AND_TIMESTAMP, OVERWRITE_BY_PARTITION, DELETE_THEN_INSERT, SCD2 and ADAPTATIVE. Configure them via metadata.writeStrategy in the table YAML file.
When should I use UPSERT_BY_KEY vs SCD2?
Use UPSERT_BY_KEY when you only need the latest version of each record identified by a key. Use SCD2 (Slowly Changing Dimension Type 2) when you need to preserve the full history of changes to each record, with start and end timestamps tracking each version.
What is the difference between APPEND and OVERWRITE?
APPEND inserts all new rows into the table without modifying existing data. OVERWRITE replaces all existing rows in the table with the new rows, effectively deleting everything and inserting fresh data.
What is the ADAPTATIVE write strategy?
The ADAPTATIVE write strategy allows you to change the write mode at runtime based on criteria such as day of the week, file size, or file name pattern. For example, you can APPEND data on weekdays and OVERWRITE on Sundays.
How does DELETE_THEN_INSERT differ from UPSERT_BY_KEY?
DELETE_THEN_INSERT first deletes all rows in the target table that have matching keys in the incoming data, then inserts all incoming rows. UPSERT_BY_KEY performs a merge operation, updating existing rows and inserting new ones in a single step.
When should I use UPSERT_BY_KEY_AND_TIMESTAMP instead of UPSERT_BY_KEY?
Use UPSERT_BY_KEY_AND_TIMESTAMP when the source can send multiple versions of the same record. Only the most recent version (based on the timestamp column) overwrites the existing row.
How does OVERWRITE_BY_PARTITION work?
Only the partitions present in the incoming data are overwritten. Partitions absent from the incoming batch remain untouched. This strategy requires a partition configured in the sink section.
Can the write strategy be defined at the domain level?
Yes. The strategy defined in _config.sl.yml at the domain level propagates to all tables in the domain, unless a table overrides it locally.
Related
- Load Tutorial -- end-to-end walkthrough for loading files into your warehouse
- Load Strategies -- control the order in which files are processed
- Transform Configuration -- apply write strategies to SQL transform output
- Unit Test Load Pipelines -- validate write strategy behavior locally