Schema Reference

All Starlake YAML configuration files are validated against the official JSON Schema.

JSON Schema

https://www.schemastore.org/starlake.json

Use this schema in your IDE for autocompletion and validation:

// VS Code settings.json
{
  "yaml.schemas": {
    "https://www.schemastore.org/starlake.json": "**/*.sl.yml"
  }
}
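If you would rather script this change than edit the file by hand, the mapping can be merged into existing settings. The sketch below is only an illustration: the helper name `add_starlake_schema` is hypothetical, and it assumes the settings text is plain JSON (a real `settings.json` may contain comments, which Python's `json` module rejects).

```python
import json

SCHEMA_URL = "https://www.schemastore.org/starlake.json"
FILE_GLOB = "**/*.sl.yml"


def add_starlake_schema(settings_text):
    """Return settings JSON text with the Starlake schema mapping added.

    Hypothetical helper: assumes comment-free JSON input.
    """
    settings = json.loads(settings_text) if settings_text.strip() else {}
    # Keep any schema associations that are already configured.
    schemas = settings.setdefault("yaml.schemas", {})
    schemas[SCHEMA_URL] = FILE_GLOB
    return json.dumps(settings, indent=2)


if __name__ == "__main__":
    print(add_starlake_schema('{"editor.tabSize": 2}'))
```

Existing keys are preserved; only the `yaml.schemas` entry for the Starlake schema URL is added or updated.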

Configuration Objects

AppConfigV1

Top-level application configuration in application.sl.yml:

application:
  connections: {}   # Database connections
  loader: native    # Default engine (native, spark)
  privacy: {}       # Global privacy settings
  metadata: {}      # Global metadata defaults
  audit: {}         # Audit logging configuration

LoadConfigV1

Domain and table configuration for data ingestion:

load:
  metadata:
    mode: FILE          # FILE or STREAM
    format: DSV         # DSV, JSON, XML, PARQUET, etc.
    withHeader: true
    separator: ","
    encoding: UTF-8
    writeStrategy:
      type: APPEND      # APPEND, OVERWRITE, UPSERT_BY_KEY, etc.
    sink:
      connectionRef: my-connection
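To make the `withHeader`, `separator`, and `encoding` settings concrete, here is a minimal Python sketch of how a delimiter-separated file could be read with equivalent options. It is not Starlake's loader; the function `read_dsv` and the generated column names (`col0`, `col1`, ...) are assumptions made for illustration.

```python
import csv
import io


def read_dsv(raw, separator=",", with_header=True, encoding="UTF-8"):
    """Parse delimiter-separated bytes into a list of dict rows.

    Illustrative only: mirrors the separator/withHeader/encoding options.
    """
    text = raw.decode(encoding)
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=separator))
    if not rows:
        return []
    if with_header:
        header, data = rows[0], rows[1:]
    else:
        # No header line: invent positional column names (an assumption).
        header = [f"col{i}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(header, row)) for row in data]


if __name__ == "__main__":
    print(read_dsv(b"id,name\n1,Ada\n2,Grace\n"))
```

Every value comes back as a string here; converting values to the declared attribute types is a separate step.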

TableConfigV1

Table schema definition:

table:
  name: my_table
  pattern: "my_table-.*.csv"
  attributes:
    - name: id
      type: long
      required: true
    - name: name
      type: string
    - name: created_at
      type: timestamp
  expectations: []
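The `pattern` value reads as a regular expression applied to incoming file names. The sketch below uses Python's `re` module as a stand-in and assumes the whole file name must match; the actual engine and matching rules may differ. It also shows that the unescaped `.` before `csv` matches any character, not only a literal dot.

```python
import re

PATTERN = "my_table-.*.csv"          # value from the example above
STRICT_PATTERN = r"my_table-.*\.csv"  # hypothetical variant with the dot escaped


def matches_table(filename, pattern=PATTERN):
    """Return True when the whole file name matches the pattern (assumed semantics)."""
    return re.fullmatch(pattern, filename) is not None


if __name__ == "__main__":
    for name in ("my_table-2024-01-15.csv", "other-2024.csv", "my_table-2024_csv"):
        print(name, matches_table(name), matches_table(name, STRICT_PATTERN))
```

If only files with a real `.csv` extension should be picked up, escaping the dot as in `STRICT_PATTERN` is the tighter choice.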

TransformConfigV1

Transformation task configuration:

transform:
  name: my_transform
  database: "{{SL_DATABASE}}"
  domain: analytics
  tasks:
    - name: task_name
      sql: task_name.sql
      writeStrategy:
        type: OVERWRITE
      sink:
        connectionRef: my-connection
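The `{{SL_DATABASE}}` value is a placeholder that is resolved before the configuration is used. As a rough sketch of that kind of substitution, assuming variables arrive as a simple name-to-value mapping (the `render` helper and its error behavior are hypothetical, not Starlake's implementation):

```python
import re

# Matches {{NAME}} with optional surrounding spaces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(value, variables):
    """Replace each {{NAME}} in value with variables[NAME]."""
    def replace(match):
        name = match.group(1)
        if name not in variables:
            # Failing loudly is a design choice of this sketch.
            raise KeyError(f"undefined variable: {name}")
        return variables[name]

    return PLACEHOLDER.sub(replace, value)


if __name__ == "__main__":
    print(render("{{SL_DATABASE}}", {"SL_DATABASE": "prod_db"}))
```

Values without placeholders, such as `analytics`, pass through unchanged.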

ConnectionV1

Database connection definition:

connections:
  connection_name:
    type: BQ        # BQ, JDBC, FS, ES, KAFKA
    options:
      key: value
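Because load, transform, and extract configurations point at connections by name through `connectionRef`, a quick consistency check can catch references to connections that were never defined. The following is a minimal sketch over already-parsed configuration dictionaries; the function names are hypothetical and no Starlake API is involved.

```python
def collect_connection_refs(node):
    """Recursively gather every string value stored under a 'connectionRef' key."""
    refs = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "connectionRef" and isinstance(value, str):
                refs.add(value)
            else:
                refs |= collect_connection_refs(value)
    elif isinstance(node, list):
        for item in node:
            refs |= collect_connection_refs(item)
    return refs


def undefined_refs(connections, *configs):
    """Return the sorted connection names that are referenced but not defined."""
    used = set()
    for config in configs:
        used |= collect_connection_refs(config)
    return sorted(used - set(connections))


if __name__ == "__main__":
    connections = {"connection_name": {"type": "BQ", "options": {}}}
    load = {"load": {"metadata": {"sink": {"connectionRef": "my-connection"}}}}
    print(undefined_refs(connections, load))
```

With the sample names used on this page, `my-connection` would be reported, since the only connection defined is `connection_name`.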

ExtractConfigV1

Schema and data extraction configuration:

extract:
  connectionRef: my-connection
  jdbcSchemas:
    - schema: SCHEMA_NAME
      tables:
        - name: TABLE_NAME
          columns:
            - name: "*"

Data Types

Standard Starlake types defined in metadata/types/default.sl.yml:

| Type | Description | Example |
| --- | --- | --- |
| string | Text data | Names, emails |
| long | 64-bit integer | IDs, counts |
| integer | 32-bit integer | Small numbers |
| double | 64-bit float | Measurements |
| decimal | Arbitrary precision | Currency |
| boolean | True/false | Flags |
| date | Date without time | 2024-01-15 |
| timestamp | Date with time | 2024-01-15T10:30:00 |
| bytes | Binary data | Files, images |
| struct | Nested structure | Complex objects |
| array | List of values | Tags, categories |
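As a rough illustration of what these scalar types mean for raw text values, the sketch below maps each type name to a Python parser. The mapping is an assumption for demonstration only: Starlake's own type definitions live in `metadata/types/default.sl.yml` and are not reproduced here, and `bytes`, `struct`, and `array` are left out.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_boolean(raw):
    """Accept only 'true'/'false' in any letter case (an assumption of this sketch)."""
    return {"true": True, "false": False}[raw.strip().lower()]


# Hypothetical mapping from type name to a Python parser.
PARSERS = {
    "string": str,
    "long": int,
    "integer": int,
    "double": float,
    "decimal": Decimal,
    "boolean": parse_boolean,
    "date": date.fromisoformat,
    "timestamp": datetime.fromisoformat,
}


def parse_value(type_name, raw):
    """Convert a raw string using the parser registered for type_name."""
    return PARSERS[type_name](raw)


if __name__ == "__main__":
    print(parse_value("date", "2024-01-15"))
    print(parse_value("timestamp", "2024-01-15T10:30:00"))
```

Using `Decimal` for the `decimal` type avoids the rounding that a binary float would introduce for currency values.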

Write Strategies

| Strategy | Description |
| --- | --- |
| APPEND | Insert new records, no dedup |
| OVERWRITE | Replace entire table |
| UPSERT_BY_KEY | Update by primary key, insert new |
| UPSERT_BY_KEY_AND_TIMESTAMP | Update by key + timestamp |
| SCD2 | Slowly changing dimension type 2 |
| DELETE_THEN_INSERT | Delete matching, then insert |
| OVERWRITE_BY_PARTITION | Replace specific partitions only |
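The difference between the simpler strategies is easiest to see on a small in-memory table. The sketch below models rows as Python dictionaries. It illustrates the semantics described above and is not how Starlake executes them. The timestamp variant in particular is an assumed reading of "update by key + timestamp": an incoming row replaces the stored one only when its timestamp is strictly newer.

```python
def append(target, incoming):
    """APPEND: add all incoming rows, keeping duplicates."""
    return target + incoming


def overwrite(target, incoming):
    """OVERWRITE: discard the existing rows entirely."""
    return list(incoming)


def upsert_by_key(target, incoming, key):
    """UPSERT_BY_KEY: update rows with a matching key, insert the rest."""
    merged = {row[key]: row for row in target}
    for row in incoming:
        merged[row[key]] = row
    return list(merged.values())


def upsert_by_key_and_timestamp(target, incoming, key, timestamp):
    """Assumed semantics: replace a row only if the incoming one is newer."""
    merged = {row[key]: row for row in target}
    for row in incoming:
        current = merged.get(row[key])
        if current is None or row[timestamp] > current[timestamp]:
            merged[row[key]] = row
    return list(merged.values())


if __name__ == "__main__":
    existing = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    new_rows = [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}]
    print(append(existing, new_rows))
    print(overwrite(existing, new_rows))
    print(upsert_by_key(existing, new_rows, "id"))
```

With these inputs, `append` yields four rows (two of them with id 2), `overwrite` yields only the two new rows, and `upsert_by_key` yields three rows with id 2 updated.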

External Resources