Manual Load: Configure Custom Separators and Incoming Paths
Starlake manual load gives you full control over file parsing, directory layout, and per-environment configuration. Use it when autoload cannot handle your files -- for example, when you have multi-character delimiters or files stored outside the standard incoming/<domain>/ directory structure. All configuration is defined in YAML at the domain and table level.
When to Use Manual Load Instead of Autoload
Use manual load (starlake load) instead of autoload when:
- Your files use a multi-character separator that autoload cannot detect automatically (e.g., ||, ::, ~|~).
- Your files are stored in a non-standard folder that does not follow the incoming/<domain>/ convention.
- You need full control over schema customization before any data is loaded.
For all other cases, autoload is simpler and requires no configuration. See the load tutorial for a step-by-step walkthrough.
Set a Custom Incoming Directory
By default, Starlake looks for files in $SL_ROOT/datasets/incoming/. Override this location by setting the application.area.incoming property:
application:
  area:
    incoming: /path/to/incoming
Per-Environment Incoming Path Configuration
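Define an incoming_path variable in each environment file to set different incoming paths for dev, test, and prod, then reference it from the application configuration:
# metadata/env.dev.sl.yml
incoming_path: /data/dev/incoming

# metadata/env.prod.sl.yml
incoming_path: /data/prod/incoming

# metadata/application.sl.yml
application:
  area:
    incoming: "{{incoming_path}}"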
The {{incoming_path}} placeholder is resolved at runtime from the active environment file.
Domain Configuration (_config.sl.yml)
Each domain needs a configuration file that tells the load command where to find incoming files. Create a _config.sl.yml file under metadata/load/<domain>/:
load:
  metadata:
    directory: "{{incoming_path}}/starbake"
The {{incoming_path}} variable is resolved from the environment file (metadata/env.sl.yml or metadata/env.<env>.sl.yml). Because the directory is set at the domain level, every table in the starbake domain loads its files from this one directory.
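To make the substitution concrete, the annotated copy of the domain file below shows the directory each environment would resolve to. It is only a sketch: it assumes the dev and prod values of incoming_path given earlier (/data/dev/incoming and /data/prod/incoming) and that the placeholder is replaced by plain string substitution.

```yaml
# metadata/load/starbake/_config.sl.yml
load:
  metadata:
    # Assumed result with the dev value:  /data/dev/incoming/starbake
    # Assumed result with the prod value: /data/prod/incoming/starbake
    directory: "{{incoming_path}}/starbake"
```

Because only the variable changes between environments, the domain file itself can stay identical across dev, test, and prod.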
Table Configuration by File Format
Each table requires a <table>.sl.yml file that defines the file pattern, format, separator, and attribute schema. Configuration differs by file format:
- Load CSV and DSV files -- Delimited files with configurable separator, encoding, and quoting
- Load JSON files -- Flat or nested JSON with struct and array support
- Load XML files -- XML with rowTag, attribute prefix, and XSD validation
- Load fixed-width positional files -- Column positions defined by first/last offsets
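As a rough illustration of such a file, the sketch below shows what a table definition for a delimited file with a two-character separator might look like. The table file name, pattern, and attribute names are hypothetical, and the key names (pattern, metadata.format, metadata.separator, metadata.withHeader, attributes) are assumptions based on typical Starlake examples. Refer to the format-specific pages listed above for the authoritative options.

```yaml
# metadata/load/starbake/orders.sl.yml (hypothetical table file)
table:
  # Assumed to be a regular expression matched against file names
  # found in the domain's configured directory.
  pattern: "orders_.*.csv"
  metadata:
    format: "DSV"        # delimiter-separated values
    separator: "||"      # multi-character separator set explicitly
    withHeader: true     # assumes the first line holds column names
  attributes:
    # Hypothetical columns; names and types are for illustration only.
    - name: "order_id"
      type: "long"
    - name: "customer_id"
      type: "long"
    - name: "status"
      type: "string"
```

Setting the separator explicitly is the point of the example: it covers the multi-character case that autoload cannot detect on its own.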
Running the Load Command
After configuring domain and table YAML files, run the load command:
starlake load
Starlake processes all domains and tables, matching files in each domain's configured directory against the file patterns defined in each table's .sl.yml file. The load strategy controls the order in which files are processed.
Frequently Asked Questions
When should I use manual load instead of autoload?
Use manual load when your files have a multi-character separator that autoload cannot detect, or when your files are not stored in the standard incoming/<domain>/ directory structure.
How do I set a custom incoming directory in Starlake?
Set the application.area.incoming property in metadata/application.sl.yml. You can use an environment variable (e.g., {{incoming_path}}) and define its value per environment in metadata/env.<env>.sl.yml.
Can I use different incoming paths for dev, test, and prod?
Yes. Define the incoming_path variable in environment-specific files such as metadata/env.dev.sl.yml and reference it with {{incoming_path}} in application.sl.yml.
What is a domain configuration file in Starlake?
The _config.sl.yml file under metadata/load/<domain>/ defines domain-level settings, including the directory where the load command looks for incoming files.
What file formats are supported for manual load?
Manual load supports CSV/DSV, JSON, XML, and fixed-width (positional) files. Each format has its own configuration page with parsing options.