Create a New Starlake Project with Bootstrap
The starlake bootstrap command creates a new Starlake project with a standard directory structure. It generates the metadata/ folder (containing application.sl.yml for project configuration, environment override files, and type mappings) and sample data in datasets/incoming/. The sample data includes JSON, CSV, and XML files so you can test loading different formats.
Run the Bootstrap Command
Create an empty directory and run starlake bootstrap; the command generates the full project structure there. By default, Starlake uses the current working directory as the project root. To use a different location, set the SL_ROOT environment variable.
- Linux/MacOS
- Windows
- Docker
```bash
mkdir $HOME/userguide
cd $HOME/userguide
starlake bootstrap
```

To bootstrap in a custom directory:

```bash
SL_ROOT=/my/other/location starlake bootstrap
```
```cmd
mkdir c:\userguide
cd c:\userguide
starlake bootstrap
```
```bash
mkdir $HOME/userguide
cd $HOME/userguide
docker run -v `pwd`:/app/userguide -e SL_ROOT=/app/userguide -it starlakeai/starlake:VERSION bootstrap
```
Default Starlake Project Directory Structure
The bootstrap command creates the following hierarchy:
```
.
├── metadata
│   ├── application.sl.yml      # project configuration
│   ├── env.sl.yml              # variables used in the project with their default values
│   ├── env.BQ.sl.yml           # variables overridden for a BigQuery connection
│   ├── env.DUCKDB.sl.yml       # variables overridden for a DuckDB connection
│   ├── expectations
│   │   └── default.sl.yml      # expectations macros
│   ├── extract
│   ├── load
│   ├── transform
│   └── types
│       └── default.sl.yml      # types mapping
└── datasets                    # sample incoming data for this user guide
    └── incoming
        └── starbake
            ├── order_202403011414.json
            ├── order_line_202403011415.csv
            └── product.xml
```
Key directories:
- metadata/: Contains all configuration files for extract, load, and transform pipelines
- metadata/expectations/: Contains data validation rules applied during load and transform
- datasets/incoming/: Contains files to be loaded into your data warehouse. The sample data uses the Starbake project (a bakery management demo)
Configure Your Data Warehouse Connection in application.sl.yml
The metadata/application.sl.yml file is the main project configuration. It defines:
- The list of database connections (DuckDB, BigQuery, Snowflake, etc.)
- The active connection reference (connectionRef)
- Audit sink configuration
- Environment-specific overrides
Here is the default configuration:
```yaml
application:
  connectionRef: "{{activeConnection}}"
  audit:
    sink:
      connectionRef: "{{activeConnection}}"
  connections:
    sparkLocal:
      type: "fs" # Connection to local file system (delta files)
    duckdb:
      type: "jdbc" # Connection to DuckDB
      options:
        url: "jdbc:duckdb:{{SL_ROOT}}/datasets/duckdb.db" # Location of the DuckDB database
        driver: "org.duckdb.DuckDBDriver"
    bigquery:
      type: "bigquery"
      options:
        location: europe-west1
        authType: "APPLICATION_DEFAULT"
        authScopes: "https://www.googleapis.com/auth/cloud-platform"
        writeMethod: "direct"
```
The connectionRef uses a variable ({{activeConnection}}) that is resolved from the environment files. Each environment file sets this variable to point to the appropriate connection.
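As an illustration of this mechanism, an environment override file that points activeConnection at the DuckDB connection could look like the sketch below. The exact file layout (the env: root key in particular) is an assumption; check the env.DUCKDB.sl.yml generated in your own project for the authoritative structure.

```yaml
# env.DUCKDB.sl.yml — hypothetical sketch of an environment override file.
# Sets the activeConnection variable so that the {{activeConnection}}
# reference in application.sl.yml resolves to the "duckdb" connection.
env:
  activeConnection: duckdb
```

With this file selected (SL_ENV=DUCKDB), both the main connectionRef and the audit sink resolve to the duckdb connection defined in application.sl.yml.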
Switch Between DuckDB and BigQuery Environments
The files env.DUCKDB.sl.yml and env.BQ.sl.yml override default variable values for their respective connections. Set the SL_ENV environment variable to select the active environment:
- Linux/MacOS
- Windows
- Docker
```bash
SL_ENV=DUCKDB starlake <command>
```

```cmd
SET SL_ENV=DUCKDB
starlake <command>
```

```bash
docker run -v `pwd`:/app/userguide \
  -e SL_ROOT=/app/userguide \
  -e SL_ENV=DUCKDB \
  -it starlakeai/starlake:VERSION <command>
```
Next Steps: Load, Transform, and Orchestrate Data
With the project created, follow these guides:
- Load data into your warehouse
- Transform data for analysis
- Run transformations from CLI and Airflow
- Generate project documentation
The tutorials use the Starbake sample project. Starbake is a bakery management demo that ships with the bootstrap command and demonstrates Starlake's load, transform, and orchestration features.
Frequently Asked Questions
What does starlake bootstrap do?
The starlake bootstrap command creates a new Starlake project with a default directory structure. It generates the metadata/ folder with configuration files, type mappings, and sample data in datasets/incoming/.
What is the default Starlake project structure?
A bootstrapped project contains: metadata/application.sl.yml (project configuration), metadata/env.sl.yml (default variables), environment-specific overrides (env.BQ.sl.yml, env.DUCKDB.sl.yml), and directories for extract, load, transform configurations. Sample data is placed in datasets/incoming/.
How do I switch between DuckDB and BigQuery in Starlake?
Set the SL_ENV environment variable. Use SL_ENV=DUCKDB starlake <command> for DuckDB or SL_ENV=BQ starlake <command> for BigQuery. Each environment has its own override file (env.DUCKDB.sl.yml, env.BQ.sl.yml).
What is the application.sl.yml file?
It is the main project configuration file in Starlake. It defines the list of database connections, the active connection reference, audit configuration, and environment-specific overrides.
Can I bootstrap a Starlake project in a custom directory?
Yes. Set the SL_ROOT environment variable to the desired path: SL_ROOT=/my/other/location starlake bootstrap.
How do I run Starlake bootstrap with Docker?
Create a directory, then run: docker run -v $(pwd):/app/userguide -e SL_ROOT=/app/userguide -it starlakeai/starlake:VERSION bootstrap.