
Manual Load: Configure Custom Separators and Incoming Paths

Starlake manual load gives you full control over file parsing, directory layout, and per-environment configuration. Use it when autoload cannot handle your files: for example, when your files use a multi-character delimiter or are stored outside the standard incoming/<domain>/ directory structure. All configuration is defined in YAML at the domain and table level.

When to Use Manual Load Instead of Autoload

Use manual load (starlake load) instead of autoload when:

  • Your files use a multi-character separator that autoload cannot detect automatically (e.g., ||, ::, ~|~).
  • Your files are stored in a non-standard folder that does not follow the incoming/<domain>/ convention.
  • You need full control over schema customization before any data is loaded.

For all other cases, autoload is simpler and requires no configuration. See the load tutorial for a step-by-step walkthrough.

Set a Custom Incoming Directory

By default, Starlake looks for files in $SL_ROOT/datasets/incoming/. Override this location by setting the application.area.incoming property:

metadata/application.sl.yml
application:
  area:
    incoming: /path/to/incoming

Per-Environment Incoming Path Configuration

Define an incoming_path variable in each environment file to use different incoming paths for dev, test, and prod:

metadata/env.dev.sl.yml
incoming_path: /data/dev/incoming
metadata/env.prod.sl.yml
incoming_path: /data/prod/incoming
metadata/application.sl.yml
application:
  area:
    incoming: "{{incoming_path}}"

The {{incoming_path}} placeholder is resolved at runtime from the active environment file.
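The test environment follows the same pattern as dev and prod. Below is a minimal sketch. The /data/test/incoming path is a hypothetical value chosen to mirror the dev and prod files; it is not a Starlake default.

```yaml
# metadata/env.test.sl.yml
# Hypothetical path, chosen to mirror env.dev.sl.yml and env.prod.sl.yml
incoming_path: /data/test/incoming
```

With this file in place, application.sl.yml needs no change. The same {{incoming_path}} reference resolves to the test path whenever the test environment is active.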

Domain Configuration (_config.sl.yml)

Each domain needs a configuration file that tells the load command where to find incoming files. Create a _config.sl.yml file under metadata/load/<domain>/:

metadata/load/starbake/_config.sl.yml
load:
  metadata:
    directory: "{{incoming_path}}/starbake"

The {{incoming_path}} variable is resolved from the environment file (metadata/env.sl.yml or metadata/env.<env>.sl.yml). The load command then reads all files for the starbake domain from {{incoming_path}}/starbake.

Because the directory is set at the domain level, every table in the domain shares this incoming path.
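The directory value is a path, so the same setting can point a domain at files kept outside the incoming/<domain>/ convention. Below is a sketch with a hypothetical domain (partner) and a hypothetical mount point. It assumes the load process can read that location.

```yaml
# metadata/load/partner/_config.sl.yml
# Domain name and directory are hypothetical examples
load:
  metadata:
    # Absolute path outside $SL_ROOT/datasets/incoming/
    directory: "/mnt/shared/partner_exports"
```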

Table Configuration by File Format

Each table requires a <table>.sl.yml file that defines the file pattern, format, separator, and attribute schema. The available options differ by file format (CSV/DSV, JSON, XML, and fixed-width). Each format has its own configuration page.
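For the multi-character separator case that motivates manual load, a DSV table file might look like the sketch below. The table name, file pattern, and attributes are hypothetical. The key names reflect Starlake's table schema as we understand it, so check them against the CSV/DSV configuration page for your Starlake version.

```yaml
# metadata/load/starbake/orders.sl.yml
# Hypothetical table; verify key names against the CSV/DSV configuration page
table:
  name: "orders"
  # Regular expression matched against file names in the domain's directory
  pattern: 'orders_.*\.csv'
  metadata:
    format: "DSV"
    # Multi-character separator that autoload cannot detect
    separator: "||"
    withHeader: true
  attributes:
    - name: "order_id"
      type: "long"
      required: true
    - name: "customer_id"
      type: "long"
    - name: "order_date"
      type: "date"
```

The pattern is single-quoted so that YAML passes the backslash in \. through to the regular expression unchanged.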

Running the Load Command

After configuring domain and table YAML files, run the load command:

starlake load

Starlake processes all domains and tables, matching files in each domain's configured directory against the file patterns defined in each table's .sl.yml file. The load strategy controls the order in which files are processed.

Frequently Asked Questions

When should I use manual load instead of autoload?

Use manual load when your files have a multi-character separator that autoload cannot detect, or when your files are not stored in the standard incoming/<domain>/ directory structure.

How do I set a custom incoming directory in Starlake?

Set the application.area.incoming property in metadata/application.sl.yml. You can reference a variable (e.g., {{incoming_path}}) and define its value per environment in metadata/env.<env>.sl.yml.

Can I use different incoming paths for dev, test, and prod?

Yes. Define the incoming_path variable in environment-specific files such as metadata/env.dev.sl.yml and reference it with {{incoming_path}} in application.sl.yml.

What is a domain configuration file in Starlake?

The _config.sl.yml file under metadata/load/<domain>/ defines domain-level settings, including the directory where the load command looks for incoming files.

What file formats are supported for manual load?

Manual load supports CSV/DSV, JSON, XML, and fixed-width (positional) files. Each format has its own configuration page with parsing options.