Skip to main content

Write Strategies

Starlake provides eight write strategies that control how incoming data merges into target tables on BigQuery, Snowflake and Databricks. You declare the strategy in the metadata.writeStrategy property of your table YAML file. Available strategies range from simple APPEND and OVERWRITE to key-based UPSERT, SCD2 history tracking and runtime-adaptive mode.

Strategy overview

StrategyDescription
APPENDInsert all incoming rows. Existing data stays untouched.
OVERWRITEReplace all existing rows with the incoming rows.
UPSERT_BY_KEYMerge by key: update existing rows, insert new ones.
UPSERT_BY_KEY_AND_TIMESTAMPMerge by key. Overwrite only when the incoming timestamp is newer.
OVERWRITE_BY_PARTITIONOverwrite only the partitions present in the incoming data. Other partitions remain intact. Requires a partition in the sink configuration.
DELETE_THEN_INSERTDelete target rows matching the incoming keys, then insert all incoming rows.
SCD2Slowly Changing Dimension Type 2. Preserves the full history of each record with start/end timestamps.
ADAPTATIVESelects the strategy at runtime based on file size, date, day of week or file name pattern.

The strategy can be set at the table level or at the domain level (in _config.sl.yml). A domain-level strategy propagates to all tables unless a table overrides it.

APPEND

Creates the table if it does not exist. Inserts all incoming rows without modifying existing data.

Append to the table.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "APPEND"
attributes:
- ...

OVERWRITE

Creates the table if it does not exist. Replaces all existing rows with the incoming data.

Overwrite the table.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "OVERWRITE"
attributes:
- ...

UPSERT_BY_KEY

Creates the table if it does not exist. Performs a merge: rows with a matching key are updated; new rows are inserted.

Upsert by key. The column 'id' is used as key.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "UPSERT_BY_KEY"
key: ["id"]
on: TARGET
attributes:
- ...

UPSERT_BY_KEY_AND_TIMESTAMP

Creates the table if it does not exist. Performs a merge by key. Overwrites the existing row only when the incoming timestamp is more recent. Use this when the source can send multiple versions of the same record.

Upsert by key and timestamp. The column 'id' is used as key.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "UPSERT_BY_KEY_AND_TIMESTAMP"
key: ["id"]
timestamp: "valid_from"
on: TARGET
attributes:
- ...

OVERWRITE_BY_PARTITION

Creates the table if it does not exist. Overwrites only the partitions present in the incoming data. All other partitions remain untouched. New partitions are appended.

This strategy requires a partition configuration in the sink section.

Overwrite by partition. Requires the partition property in metadata.sink.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "OVERWRITE_BY_PARTITION"
on: TARGET
attributes:
- ...

DELETE_THEN_INSERT

Creates the table if it does not exist. Deletes all target rows whose keys match the incoming data, then inserts every incoming row. Unlike UPSERT_BY_KEY (which performs a single merge operation), DELETE_THEN_INSERT executes two steps: delete then insert.

Delete then insert.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "DELETE_THEN_INSERT"
key: ["id", "name" ...]
attributes:
- ...

SCD2

Creates the table if it does not exist. Implements Slowly Changing Dimension Type 2 to preserve the full history of each record. Each version carries start and end timestamps.

SCD2. The column 'id' is the key, 'date' is the timestamp.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "SCD2"
key: ["id"]
timestamp: "date"
startTs: "valid_from"
endTs: "valid_to"
on: BOTH
attributes:
- ...

The startTs and endTs properties define the column names that store the start and end timestamps of each version.

They are optional. When omitted, Starlake uses sl_start_ts and sl_end_ts as defaults. You can change these defaults in metadata/application.sl.yml:

application:
...
scd2StartTimestamp: "sl_start_ts"
scd2EndTimestamp: "sl_end_ts"

ADAPTATIVE write strategy

The ADAPTATIVE strategy selects the write mode at runtime based on criteria such as day of the week, file size or file name pattern. This removes the need for manual intervention when the loading behavior must change periodically.

Example: APPEND on weekdays, OVERWRITE on Sundays. Set this at the domain level to propagate to all tables:

# _config.sl.yml
load:
name: "DOMAIN"
metadata:
...
writeStrategy:
types:
APPEND: 'dayOfWeek != 7'
OVERWRITE: 'dayOfWeek == 7'

Example: strategy based on file name pattern. Use capture groups in the table pattern and reference them with group():

# _config.sl.yml
load:
name: "DOMAIN"
metadata:
...
writeStrategy:
types:
OVERWRITE: 'group("mode") == "FULL"'
APPEND: 'group("mode") == "APPEND"'
#my_table.sl.yml
table:
...
pattern: ".*-(?<mode>FULL|APPEND).csv$"

Combine criteria with boolean operators !, && and ||. Use parentheses as needed.

note

When using strings in expressions, wrap them with double quotes ".

Available criteria

CriteriaDescriptionExample
group(index or name)File pattern must use (named) capture groupspattern: `my-file-(F
fileSizeCurrent file size in bytesfileSize > 1000
fileSizeBCurrent file size in bytes. Alias of fileSize
fileSizeKoCurrent file size in Ko
fileSizeMoCurrent file size in Mo
fileSizeGoCurrent file size in Go
fileSizeToCurrent file size in To
isFirstDayOfMonthCurrent day is first day of month
isLastDayOfMonthCurrent day is last day of month
dayOfWeekInteger representing day of week. Monday = 1, ..., Sunday = 7
isFileFirstDayOfMonthFile modification date is first day of month
isFileLastDayOfMonthFile modification date is last day of month
fileDayOfWeekInteger representing file modification day of week. Monday = 1, ..., Sunday = 7
note

For a criteria relying on datetime, you can change its timezone with timezone application settings in application.sl.yml

Frequently Asked Questions

What write strategies are available in Starlake?

Starlake provides eight strategies: APPEND, OVERWRITE, UPSERT_BY_KEY, UPSERT_BY_KEY_AND_TIMESTAMP, OVERWRITE_BY_PARTITION, DELETE_THEN_INSERT, SCD2 and ADAPTATIVE. Configure them via metadata.writeStrategy in the table YAML file.

When should I use UPSERT_BY_KEY vs SCD2?

Use UPSERT_BY_KEY when you only need the latest version of each record identified by a key. Use SCD2 (Slowly Changing Dimension Type 2) when you need to preserve the full history of changes to each record, with start and end timestamps tracking each version.

What is the difference between APPEND and OVERWRITE?

APPEND inserts all new rows into the table without modifying existing data. OVERWRITE replaces all existing rows in the table with the new rows, effectively deleting everything and inserting fresh data.

What is the ADAPTATIVE write strategy?

The ADAPTATIVE write strategy allows you to change the write mode at runtime based on criteria such as day of the week, file size, or file name pattern. For example, you can APPEND data on weekdays and OVERWRITE on Sundays.

How does DELETE_THEN_INSERT differ from UPSERT_BY_KEY?

DELETE_THEN_INSERT first deletes all rows in the target table that have matching keys in the incoming data, then inserts all incoming rows. UPSERT_BY_KEY performs a merge operation, updating existing rows and inserting new ones in a single step.

When should I use UPSERT_BY_KEY_AND_TIMESTAMP instead of UPSERT_BY_KEY?

Use UPSERT_BY_KEY_AND_TIMESTAMP when the source can send multiple versions of the same record. Only the most recent version (based on the timestamp column) overwrites the existing row.

How does OVERWRITE_BY_PARTITION work?

Only the partitions present in the incoming data are overwritten. Partitions absent from the incoming batch remain untouched. This strategy requires a partition configured in the sink section.

Can the write strategy be defined at the domain level?

Yes. The strategy defined in _config.sl.yml at the domain level propagates to all tables in the domain, unless a table overrides it locally.