Test Load Tasks
Starlake load tests validate data loading pipelines locally on DuckDB without cloud access. Provide sample input files and expected CSV output, and Starlake compares schema and data automatically. Test reports include the DuckDB database, unexpected rows, and missing rows for debugging.
For an overview of the testing approach and DuckDB transpilation, see Unit Testing Concepts. For SQL transformation tests, see Test Transform Tasks.
How to write a load test
- Create the test directory -- Add a directory under metadata/tests/load/{domain}/{table}/test-name.
- Add initial data -- Place a CSV or JSONL file named domain.table.csv (or domain.table.json) with data to preload.
- Add input data files -- Add files matching the pattern expected by the loader for this table.
- Define expected output -- Add a _expected.csv file with the expected table data after loading.
- Run the test -- Execute starlake test --load or starlake test --domain <domain> --table <table> --test <test-name>.
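The steps above can be scripted. The following Python sketch creates a test directory and its three kinds of files. It is an illustration under stated assumptions, not a Starlake tool. The helper `scaffold_load_test`, the input file name `product-20240101.csv`, and the sample columns are all hypothetical. The input file name must match whatever pattern your table's loader expects. The sample expected rows assume that the load appends the input rows to the preloaded data.

```python
from pathlib import Path


def scaffold_load_test(metadata_dir, domain, table, test_name,
                       initial_csv, input_files, expected_csv):
    """Create metadata/tests/load/{domain}/{table}/{test_name} and its files.

    Hypothetical helper for illustration; Starlake does not provide it.
    """
    test_dir = Path(metadata_dir) / "tests" / "load" / domain / table / test_name
    test_dir.mkdir(parents=True, exist_ok=True)
    # Initial data, named after the domain and table (e.g. starbake.product.csv)
    (test_dir / f"{domain}.{table}.csv").write_text(initial_csv)
    # Input files: each name must match the loader's file pattern for this table
    for name, content in input_files.items():
        (test_dir / name).write_text(content)
    # Expected table content after the load completes
    (test_dir / "_expected.csv").write_text(expected_csv)
    return test_dir


if __name__ == "__main__":
    created = scaffold_load_test(
        "metadata", "starbake", "product", "basic-load",
        initial_csv="id,name\n1,croissant\n",
        # Hypothetical input file name; adapt it to your loader's pattern
        input_files={"product-20240101.csv": "id,name\n2,baguette\n"},
        expected_csv="id,name\n1,croissant\n2,baguette\n",
    )
    print(created)
```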
Test directory structure
Load tests reside in metadata/tests/load. Each test is a directory inside the domain/table subdirectory.
A test directory contains:
- Initial data file -- A CSV or JSONL file with data to preload into the table before the test runs. Name it after the domain and table: starbake.product.json or starbake.product.csv.
- Input data files -- One or more files whose names match the file pattern expected by the loader. This data is loaded using the starlake load task.
- Expected output -- A _expected.csv file containing the expected data in the table after the load completes.
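A quick pre-flight check can catch a missing file before you run a test. The sketch below is a hypothetical helper, `check_load_test_dir`, and is not a Starlake feature. It treats every file other than the initial data file and `_expected.csv` as an input file. It also requires all three kinds of files, which may be stricter than Starlake itself.

```python
from pathlib import Path


def check_load_test_dir(test_dir, domain, table):
    """Return a list of problems found in a load test directory.

    An empty list means the directory looks complete. Hypothetical helper,
    not part of Starlake; it requires all three kinds of files.
    """
    names = {p.name for p in Path(test_dir).iterdir() if p.is_file()}
    problems = []
    initial = {f"{domain}.{table}.csv", f"{domain}.{table}.json"}
    if not names & initial:
        problems.append(f"missing initial data file ({' or '.join(sorted(initial))})")
    if "_expected.csv" not in names:
        problems.append("missing _expected.csv")
    # Any remaining file is assumed to be an input file for the load task
    if not names - initial - {"_expected.csv"}:
        problems.append("no input data files")
    return problems
```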
How validation works
After loading data against the schema defined in metadata/load, Starlake:
- Compares the table schema with the expected data schema.
- Compares the actual loaded data with the _expected.csv file.
The test passes if both schema and data match. Any mismatch raises an error.
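To illustrate the two comparisons, here is a minimal Python sketch. It checks the column names of two CSV texts and then diffs their rows. It is not Starlake's implementation, and it rests on three assumptions:

- Column order does not matter.
- Row order does not matter.
- Duplicate rows count, because rows are compared as multisets.

The leftover rows correspond to what the report stores in not_expected.csv and missing.csv.

```python
import csv
import io
from collections import Counter


def compare_table(actual_csv, expected_csv):
    """Compare schema (column names), then rows, of two CSV texts.

    Returns (not_expected, missing) as sorted lists of row tuples.
    Order-insensitive multiset comparison is an assumption of this sketch.
    """
    actual = list(csv.reader(io.StringIO(actual_csv)))
    expected = list(csv.reader(io.StringIO(expected_csv)))
    actual_header, actual_rows = actual[0], actual[1:]
    expected_header, expected_rows = expected[0], expected[1:]
    # 1. Schema check: the same set of column names on both sides
    if set(actual_header) != set(expected_header):
        raise ValueError(f"schema mismatch: {actual_header} != {expected_header}")
    # Reorder expected columns to line up with the actual table
    order = [expected_header.index(col) for col in actual_header]
    expected_rows = [[row[i] for i in order] for row in expected_rows]
    # 2. Data check: multiset difference in both directions
    actual_counts = Counter(map(tuple, actual_rows))
    expected_counts = Counter(map(tuple, expected_rows))
    not_expected = actual_counts - expected_counts  # in actual, not expected
    missing = expected_counts - actual_counts       # expected, not in actual
    return sorted(not_expected.elements()), sorted(missing.elements())
```

Under this sketch, a test passes only when no exception is raised and both returned lists are empty.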
Test reports
Reports for a test named test-name are stored in test-reports/load/test-name/ and contain:
- testname.db -- The DuckDB database after the load task runs, containing:
  - sl_expected -- The expected data.
  - domain.table (e.g., starbake.product) -- The actual loaded data.
  - sl_expectations -- Results of any expectation related to this table.
  - audit.audit -- The audit log of the load task.
  - audit.rejected -- Rejected data.
- not_expected.csv -- Rows in the actual table that are not in the expected data.
- missing.csv -- Rows in the expected data that are missing from the actual table.
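When you scan many reports, a small script can count the unexpected and missing rows for each test. The helper `summarize_report` is hypothetical and not part of Starlake. The sketch assumes both CSV files have a header row (`has_header=True`) and treats an absent file as zero rows. Check both assumptions against the reports your Starlake version produces.

```python
import csv
from pathlib import Path


def summarize_report(report_dir, has_header=True):
    """Count data rows in not_expected.csv and missing.csv of one report.

    Assumes a header row when has_header is True; a missing file counts as 0.
    """
    summary = {}
    for name in ("not_expected.csv", "missing.csv"):
        path = Path(report_dir) / name
        if not path.exists():
            summary[name] = 0
            continue
        with path.open(newline="") as f:
            rows = list(csv.reader(f))
        # Subtract the header row only if there is one to subtract
        summary[name] = len(rows) - 1 if has_header and rows else len(rows)
    return summary
```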
Running load tests
Run all load tests (skip transform tests):
starlake test --load
Run a specific test:
starlake test --domain starbake --table product --test test-name
Running tests also generates a complete HTML report website in the test-reports directory.
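To call these commands from a Python script or CI job, you can build the argument list and pass it to `subprocess`. The helpers `load_test_command` and `run_load_tests` are hypothetical wrappers around the two commands shown above. They require `--domain`, `--table`, and `--test` together only because that is the documented form; the CLI may accept other combinations. Whether a non-zero exit code signals a failed test is an assumption to verify against your Starlake version.

```python
import shutil
import subprocess


def load_test_command(domain=None, table=None, test=None):
    """Build the starlake arguments: all load tests, or one named test."""
    if domain is None and table is None and test is None:
        return ["starlake", "test", "--load"]
    if None in (domain, table, test):
        raise ValueError("domain, table and test must be given together")
    return ["starlake", "test", "--domain", domain,
            "--table", table, "--test", test]


def run_load_tests(**kwargs):
    """Run starlake if it is on PATH and return its exit code."""
    if shutil.which("starlake") is None:
        raise FileNotFoundError("starlake executable not found on PATH")
    return subprocess.run(load_test_command(**kwargs)).returncode
```

Passing a list rather than a shell string avoids quoting problems with domain, table, or test names.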
Example: Load summary report
Frequently Asked Questions
Where to place load tests in Starlake?
Load tests are in metadata/tests/load. Each test is a directory in the domain/table subdirectory.
What files make up a load test?
A CSV or JSONL file with initial data (named domain.table.json or .csv), one or more data files matching the loader's expected pattern, and a _expected.csv file containing the expected data after loading.
How does Starlake validate load test results?
Starlake compares the table schema with the expected data schema and the actual data with the _expected.csv file. The test fails if the schema or data does not match.
Where to find load test reports?
In test-reports/load/test-name/. The report contains the DuckDB database (testname.db), unexpected data (not_expected.csv), and missing data (missing.csv).
What does the test report database contain?
The tables sl_expected (expected data), domain.table (actual data), sl_expectations (expectation results), audit.audit (audit log), and audit.rejected (rejected data).
How to run only load tests?
Use the command starlake test --load. For a specific test: starlake test --domain <domain> --table <table> --test <test-name>.
Do tests generate a visual report?
Yes. Running tests generates a complete website in the test-reports directory.