Skip to main content

Project Structure

A standard Starlake project follows this directory layout:

my-project/
├── metadata/ # All pipeline definitions
│ ├── application.sl.yml # Global config & connections
│ ├── env.sl.yml # Base environment variables
│ ├── env.DEV.sl.yml # Dev overrides
│ ├── env.PROD.sl.yml # Prod overrides
│ │
│ ├── types/ # Data type definitions
│ │ └── default.sl.yml # Default type mappings
│ │
│ ├── load/ # Ingestion configurations
│ │ └── {domain}/ # One directory per domain
│ │ ├── _config.sl.yml # Domain-level config
│ │ └── {table}.sl.yml # Table-level schema
│ │
│ ├── transform/ # Transformation definitions
│ │ └── {domain}/ # One directory per output domain
│ │ ├── {task}.sl.yml # Task configuration
│ │ ├── {task}.sql # SQL logic
│ │ └── {task}.py # Python logic (alternative)
│ │
│ ├── extract/ # Extraction configurations
│ │ └── {source}.sl.yml # Source-specific extraction
│ │
│ ├── expectations/ # Data quality macros
│ │ └── {name}.j2 # Jinja2 expectation templates
│ │
│ ├── dags/ # Orchestration templates
│ │ └── {dag}.sl.yml # DAG configurations
│ │
│ └── secure/ # Security policies
│ ├── rls.sl.yml # Row-level security
│ └── cls.sl.yml # Column-level security

├── datasets/ # Data storage areas
│ ├── incoming/ # Raw file landing zone
│ ├── pending/ # Staged for processing
│ ├── accepted/ # Successfully loaded
│ ├── rejected/ # Failed validation
│ ├── unresolved/ # Unresolved records
│ └── business/ # Transformed output

└── starlake.sh # CLI wrapper script

Key Directories

metadata/

Contains all pipeline definitions as YAML files. This is the declarative core of your project — no imperative code needed for standard operations.

metadata/load/{domain}/

Each domain represents a logical grouping of tables (e.g., customers, sales, products). The _config.sl.yml file defines domain-level defaults inherited by all tables.

metadata/transform/{domain}/

Transformation tasks output to domains. Each task has a YAML configuration file and a corresponding SQL or Python file.

datasets/

Data flows through a pipeline of directories:

incoming → pending → accepted/rejected → business
DirectoryPurpose
incomingRaw files land here (from external sources)
pendingFiles staged and ready for processing
acceptedSuccessfully ingested records
rejectedRecords that failed validation
businessOutput from transformations

File Naming Conventions

PatternPurpose
*.sl.ymlStarlake YAML configuration files
_config.sl.ymlDomain-level configuration (prefix with underscore)
env.{ENV}.sl.ymlEnvironment-specific overrides
*.sqlSQL transformation logic
*.pyPython transformation logic
*.j2Jinja2 expectation templates