Orchestrate Transform Jobs

Starlake analyzes the FROM and JOIN clauses in your SQL transforms to build a dependency graph and generates ready-to-use DAGs for Airflow, Dagster, or Snowflake Tasks. The generated DAGs include both load and transform jobs in the correct execution order. After any SQL change that modifies table references, re-run starlake dag-generate to keep the DAGs in sync.

How Dependency Resolution Works

Starlake parses every SQL file in metadata/transform/ and extracts table references from FROM and JOIN clauses. It builds a directed acyclic graph (DAG) where each node is a transform task and each edge is a dependency. Upstream tables are always computed before downstream ones.
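To make the idea concrete, here is a minimal Python sketch of FROM/JOIN extraction. Starlake itself uses a full SQL parser; this regex-based version is only an illustration that handles simple, uncommented queries:

```python
import re

# Match the table name that follows a FROM or JOIN keyword.
# (Illustrative only: no support for subqueries, CTEs, comments, or quoting.)
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

def extract_refs(sql: str) -> set[str]:
    """Return the set of tables referenced in FROM/JOIN clauses."""
    return set(TABLE_REF.findall(sql))

sql = """
SELECT o.order_id, SUM(ol.quantity * ol.sale_price) AS total
FROM starbake.orders o
JOIN starbake.order_lines ol ON o.order_id = ol.order_id
GROUP BY o.order_id
"""
print(extract_refs(sql))  # {'starbake.orders', 'starbake.order_lines'}
```

Each extracted reference becomes an edge in the graph: the transform that owns the SQL file depends on every table it reads from.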

Example Dependency Graph

Consider three transforms: revenue_summary, product_summary, and order_summary. The order_summary query joins results from the other two:

order_summary
├── product_summary
│   ├── starbake.products
│   ├── starbake.order_lines
│   └── starbake.orders
└── revenue_summary
    ├── starbake.orders
    └── starbake.order_lines

Starlake ensures product_summary and revenue_summary run before order_summary. Source tables from the starbake schema are loaded first via the load jobs.
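The scheduling that falls out of this graph is a standard topological order. A short sketch using Python's standard-library `graphlib`, with the example graph written out by hand (Starlake derives it from your SQL files; this dictionary is only an illustration):

```python
from graphlib import TopologicalSorter

# The example dependency graph above, as {task: set of upstream dependencies}.
deps = {
    "product_summary": {"starbake.products", "starbake.order_lines", "starbake.orders"},
    "revenue_summary": {"starbake.orders", "starbake.order_lines"},
    "order_summary": {"product_summary", "revenue_summary"},
}

# static_order() yields nodes so that every dependency precedes its dependents:
# source tables first, then the two summaries, then order_summary last.
order = list(TopologicalSorter(deps).static_order())
print(order)
```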

Generate DAGs

Run the dag-generate command to produce DAG files for your configured orchestrator:

starlake dag-generate

The output includes both load and transform jobs in the correct execution order. Starlake generates files compatible with the target orchestrator:

  • Airflow -- Python DAG files ready for the dags/ directory
  • Dagster -- Asset definitions with dependency metadata
  • Snowflake Tasks -- SQL task definitions with scheduling

Deploy the Generated DAGs

Copy the generated files to your orchestrator's expected directory. For Airflow, this is typically the dags/ folder:

cp generated/dags/*.py $AIRFLOW_HOME/dags/

Regenerate After Changes

After any SQL modification that changes table references (adding a table, modifying a JOIN, renaming a transform), re-run starlake dag-generate. The dependency graph is recomputed from the SQL files each time.
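To see why regeneration matters, note that even a single added JOIN introduces a new edge in the graph. Reusing the simple regex sketch from earlier (the `starbake.customers` table here is hypothetical, purely for illustration):

```python
import re

TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

before = "SELECT * FROM starbake.orders"
after = """SELECT * FROM starbake.orders o
JOIN starbake.customers c ON o.customer_id = c.customer_id"""

# The added JOIN creates a dependency that the old DAG knows nothing about.
new_deps = set(TABLE_REF.findall(after)) - set(TABLE_REF.findall(before))
print(new_deps)  # {'starbake.customers'}
```

Until dag-generate is re-run, the deployed DAG would not wait for the new upstream table, which is exactly the kind of stale ordering regeneration prevents.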

Frequently Asked Questions

How does Starlake resolve dependencies between SQL transforms?

Starlake analyzes the FROM and JOIN clauses of each SQL file to build a dependency graph. Upstream tables are always executed before downstream ones.

Which orchestrators are supported by Starlake?

Starlake generates DAGs for Airflow, Dagster, and Snowflake Tasks. Other orchestrators can be integrated via CLI commands.

What does the starlake dag-generate command do?

It analyzes all SQL and Python transforms in the project and generates the corresponding DAG files for the configured orchestrator (Airflow, Dagster, or Snowflake Tasks).

Do generated DAGs include load jobs as well?

Yes. The generated DAGs include both load and transform jobs in the correct execution order.

Do you need to regenerate DAGs after every SQL change?

Yes. After any change that modifies dependencies (adding a table, modifying a JOIN), re-run starlake dag-generate to update the DAGs.

Can you customize the generated DAGs?

Yes. See the DAG customization page to configure scheduling, retries, parallelism, and other orchestrator-specific parameters.