Project Structure

A standard Starlake project follows this directory layout:

my-project/
├── metadata/                        # All pipeline definitions
│   ├── application.sl.yml           # Global config & connections
│   ├── env.sl.yml                   # Base environment variables
│   ├── env.DEV.sl.yml               # Dev overrides
│   ├── env.PROD.sl.yml              # Prod overrides
│   │
│   ├── types/                       # Data type definitions
│   │   └── default.sl.yml           # Default type mappings
│   │
│   ├── load/                        # Ingestion configurations
│   │   └── {domain}/                # One directory per domain
│   │       ├── _config.sl.yml       # Domain-level config
│   │       └── {table}.sl.yml       # Table-level schema
│   │
│   ├── transform/                   # Transformation definitions
│   │   └── {domain}/                # One directory per output domain
│   │       ├── {task}.sl.yml        # Task configuration
│   │       ├── {task}.sql           # SQL logic
│   │       └── {task}.py            # Python logic (alternative)
│   │
│   ├── extract/                     # Extraction configurations
│   │   └── {source}.sl.yml          # Source-specific extraction
│   │
│   ├── expectations/                # Data quality macros
│   │   └── {name}.j2               # Jinja2 expectation templates
│   │
│   ├── dags/                        # Orchestration templates
│   │   └── {dag}.sl.yml             # DAG configurations
│   │
│   └── secure/                      # Security policies
│       ├── rls.sl.yml               # Row-level security
│       └── cls.sl.yml               # Column-level security
│
├── datasets/                        # Data storage areas
│   ├── incoming/                    # Raw file landing zone
│   ├── pending/                     # Staged for processing
│   ├── accepted/                    # Successfully loaded
│   ├── rejected/                    # Failed validation
│   ├── unresolved/                  # Unresolved records
│   └── business/                    # Transformed output
│
└── starlake.sh                      # CLI wrapper script

Key Directories

`metadata/`

Contains all pipeline definitions as YAML files. This is the declarative core of your project — no imperative code needed for standard operations.

`metadata/load/{domain}/`

Each domain represents a logical grouping of tables (e.g., customers, sales, products). The _config.sl.yml file defines domain-level defaults inherited by all tables.

`metadata/transform/{domain}/`

Transformation tasks output to domains. Each task has a YAML configuration file and a corresponding SQL or Python file.

`datasets/`

Data flows through a pipeline of directories:

incoming → pending → accepted/rejected → business

Directory	Purpose
`incoming`	Raw files land here (from external sources)
`pending`	Files staged and ready for processing
`accepted`	Successfully ingested records
`rejected`	Records that failed validation
`business`	Output from transformations

File Naming Conventions

Pattern	Purpose
`*.sl.yml`	Starlake YAML configuration files
`_config.sl.yml`	Domain-level configuration (prefix with underscore)
`env.{ENV}.sl.yml`	Environment-specific overrides
`*.sql`	SQL transformation logic
`*.py`	Python transformation logic
`*.j2`	Jinja2 expectation templates

Key Directories​

metadata/​

metadata/load/{domain}/​

metadata/transform/{domain}/​

datasets/​

File Naming Conventions​