Customization
Starlake comes with built-in DAG templates that work out of the box. This page covers how to customize these templates, inject parameters at runtime, and manage task dependencies.
Customizing DAG generation requires only basic familiarity with Jinja2 templating and Python.
Architecture overview
The orchestration library is organized around a factory pattern:
```
IStarlakeJob (generic interface)
├── StarlakeAirflowJob (Airflow base)
│   ├── StarlakeAirflowBashJob      ← shell execution
│   ├── StarlakeAirflowCloudRunJob  ← GCP Cloud Run
│   ├── StarlakeAirflowDataprocJob  ← GCP Dataproc
│   └── StarlakeAirflowFargateJob   ← AWS Fargate
├── StarlakeDagsterJob (Dagster base)
│   ├── StarlakeDagsterShellJob     ← shell execution
│   ├── StarlakeDagsterCloudRunJob  ← GCP Cloud Run
│   ├── StarlakeDagsterDataprocJob  ← GCP Dataproc
│   └── StarlakeDagsterFargateJob   ← AWS Fargate
└── StarlakeSnowflakeJob            ← Snowflake native SQL
```
The appropriate class is instantiated automatically based on the template you choose. The StarlakeJobFactory discovers and registers all implementations via a plugin-based registry.
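To make the plugin-based registry concrete, here is a minimal sketch of the pattern: implementations register themselves under a template name, and the factory looks the class up at instantiation time. The names `JobRegistry`, `register`, and `create` are illustrative assumptions, not Starlake's actual API.

```python
from abc import ABC, abstractmethod


class IStarlakeJob(ABC):
    """Generic interface that every orchestrator implementation fulfills."""

    @abstractmethod
    def sl_job(self, task_id: str, arguments: list) -> str: ...


class JobRegistry:
    """Maps template names to job classes; implementations self-register."""

    _registry: dict = {}

    @classmethod
    def register(cls, template: str):
        # Decorator used by each implementation to register itself.
        def decorator(job_cls):
            cls._registry[template] = job_cls
            return job_cls
        return decorator

    @classmethod
    def create(cls, template: str) -> IStarlakeJob:
        # Instantiate the class registered for the chosen template.
        return cls._registry[template]()


@JobRegistry.register("airflow_bash")
class StarlakeAirflowBashJob(IStarlakeJob):
    def sl_job(self, task_id: str, arguments: list) -> str:
        return f"bash task {task_id}: starlake {' '.join(arguments)}"


# The factory resolves the right class from the template name alone.
job = JobRegistry.create("airflow_bash")
```

The key benefit of this design is that adding a new orchestrator backend only requires registering a new class; the factory code never changes.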
Core methods
Every factory class provides these methods for generating orchestrator tasks:
| Method | Description | Starlake command |
|---|---|---|
| `sl_import(task_id, domain, tables)` | Import/stage data from the landing area | `starlake stage` |
| `sl_pre_load(domain, tables, pre_load_strategy)` | Check pre-load conditions | `starlake preload` |
| `sl_load(task_id, domain, table, spark_config, dataset)` | Load a table | `starlake load` |
| `sl_transform(task_id, transform_name, transform_options, spark_config, dataset)` | Run a transformation | `starlake transform` |
| `sl_job(task_id, arguments, spark_config, dataset, task_type)` | Generic command (used internally by the above) | any Starlake command |
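Since `sl_job` underpins the specialized methods, the relationship can be sketched as follows: each high-level method builds the argument list for its Starlake command and delegates to `sl_job`. This is an illustrative sketch, not Starlake's actual implementation, and the CLI flags shown (`--domains`, `--tables`, `--name`) are assumptions for the example.

```python
class StarlakeJobSketch:
    """Hypothetical minimal job class showing the delegation pattern."""

    def sl_job(self, task_id: str, arguments: list) -> dict:
        # In a real implementation this would emit an orchestrator task
        # (an Airflow operator, a Dagster op, ...); here we just return
        # the command the task would run.
        return {"task_id": task_id, "command": ["starlake", *arguments]}

    def sl_load(self, task_id: str, domain: str, table: str) -> dict:
        # Load one table: delegate to sl_job with `starlake load` arguments.
        return self.sl_job(task_id, ["load", "--domains", domain, "--tables", table])

    def sl_transform(self, task_id: str, transform_name: str) -> dict:
        # Run a named transformation via `starlake transform`.
        return self.sl_job(task_id, ["transform", "--name", transform_name])
```

Because every method funnels through `sl_job`, overriding that single method in a subclass is enough to retarget all commands at a different execution backend.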