Orchestrate Transform Jobs

Starlake analyzes the FROM and JOIN clauses in your SQL transforms to build a dependency graph and generates ready-to-use DAGs for Airflow, Dagster, or Snowflake Tasks. The generated DAGs include both load and transform jobs in the correct execution order. After any SQL change that modifies table references, re-run starlake dag-generate to keep the DAGs in sync.

How Dependency Resolution Works

Starlake parses every SQL file in metadata/transform/ and extracts table references from FROM and JOIN clauses. It builds a directed acyclic graph (DAG) where each node is a transform task and each edge is a dependency. Upstream tables are always computed before downstream ones.
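To make the idea concrete, here is a minimal Python sketch of FROM/JOIN extraction. Starlake itself uses a full SQL parser; this regex-based version is only an illustration that handles simple, uncommented queries:

```python
import re

# Match the table name that follows a FROM or JOIN keyword.
# (Illustrative only: no support for subqueries, CTEs, comments, or quoting.)
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

def extract_refs(sql: str) -> set[str]:
    """Return the set of tables referenced in FROM/JOIN clauses."""
    return set(TABLE_REF.findall(sql))

sql = """
SELECT o.order_id, SUM(ol.quantity * ol.sale_price) AS total
FROM starbake.orders o
JOIN starbake.order_lines ol ON o.order_id = ol.order_id
GROUP BY o.order_id
"""
print(extract_refs(sql))  # {'starbake.orders', 'starbake.order_lines'}
```

Each extracted reference becomes an edge in the graph: the transform that owns the SQL file depends on every table it reads from.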

Example Dependency Graph

Consider three transforms: revenue_summary, product_summary, and order_summary. The order_summary query joins results from the other two:

order_summary
├── product_summary
│   ├── starbake.products
│   ├── starbake.order_lines
│   └── starbake.orders
└── revenue_summary
    ├── starbake.orders
    └── starbake.order_lines

Starlake ensures product_summary and revenue_summary run before order_summary. Source tables from the starbake schema are loaded first via the load jobs.
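The scheduling that falls out of this graph is a standard topological order. A short sketch using Python's standard-library `graphlib`, with the example graph written out by hand (Starlake derives it from your SQL files; this dictionary is only an illustration):

```python
from graphlib import TopologicalSorter

# The example dependency graph above, as {task: set of upstream dependencies}.
deps = {
    "product_summary": {"starbake.products", "starbake.order_lines", "starbake.orders"},
    "revenue_summary": {"starbake.orders", "starbake.order_lines"},
    "order_summary": {"product_summary", "revenue_summary"},
}

# static_order() yields nodes so that every dependency precedes its dependents:
# source tables first, then the two summaries, then order_summary last.
order = list(TopologicalSorter(deps).static_order())
print(order)
```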

Generate DAGs

Run the dag-generate command to produce DAG files for your configured orchestrator:

starlake dag-generate

The output includes both load and transform jobs in the correct execution order. Starlake generates files compatible with the target orchestrator:

  • Airflow -- Python DAG files ready for the dags/ directory
  • Dagster -- Asset definitions with dependency metadata
  • Snowflake Tasks -- SQL task definitions with scheduling

Deploy the Generated DAGs

Copy the generated files to your orchestrator's expected directory. For Airflow, this is typically the dags/ folder:

cp generated/dags/*.py $AIRFLOW_HOME/dags/

Regenerate After Changes

After any SQL modification that changes table references (adding a table, modifying a JOIN, renaming a transform), re-run starlake dag-generate. The dependency graph is recomputed from the SQL files each time.
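To see why regeneration matters, note that even a single added JOIN introduces a new edge in the graph. Reusing the simple regex sketch from earlier (the `starbake.customers` table here is hypothetical, purely for illustration):

```python
import re

TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

before = "SELECT * FROM starbake.orders"
after = """SELECT * FROM starbake.orders o
JOIN starbake.customers c ON o.customer_id = c.customer_id"""

# The added JOIN creates a dependency that the old DAG knows nothing about.
new_deps = set(TABLE_REF.findall(after)) - set(TABLE_REF.findall(before))
print(new_deps)  # {'starbake.customers'}
```

Until dag-generate is re-run, the deployed DAG would not wait for the new upstream table, which is exactly the kind of stale ordering regeneration prevents.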

Frequently Asked Questions

How does Starlake resolve dependencies between SQL transforms?

Starlake analyzes the FROM and JOIN clauses of each SQL file to build a dependency graph. Upstream tables are always executed before downstream ones.

Which orchestrators are supported by Starlake?

Starlake generates DAGs for Airflow, Dagster, and Snowflake Tasks. Other orchestrators can be integrated via CLI commands.

What does the starlake dag-generate command do?

It analyzes all SQL and Python transforms in the project and generates the corresponding DAG files for the configured orchestrator (Airflow, Dagster, or Snowflake Tasks).

Do generated DAGs include load jobs as well?

Yes. The generated DAGs include both load and transform jobs in the correct execution order.

Do you need to regenerate DAGs after every SQL change?

Yes. After any change that modifies dependencies (adding a table, modifying a JOIN), re-run starlake dag-generate to update the DAGs.

Can you customize the generated DAGs?

Yes. See the DAG customization page to configure scheduling, retries, parallelism, and other orchestrator-specific parameters.