📄️ acl-dependencies
Generate GraphViz DOT, SVG, PNG, or JSON files visualizing ACL dependencies from domain and schema YAML definitions.
📄️ autoload
Automatically watch and load incoming data files into specified domains and tables with optional scheduling and substitutions.
📄️ bootstrap
Create a new Starlake project from scratch, or from a template such as quickstart or userguide, with a single command.
📄️ bq-freshness
Check BigQuery table freshness by inspecting last modification timestamps, with optional persistence and connection settings.
📄️ bq-info
Retrieve metadata and information about BigQuery tables and datasets, with options to persist results or filter by table.
📄️ cnxload
Load a Parquet source file into a JDBC table with configurable connection options and write strategies like APPEND or OVERWRITE.
📄️ col-lineage
Build and export column-level data lineage for a specific task as JSON, helping trace data flow across transformations.
📄️ compare
Compare two versions of a Starlake project using file paths, git commits, or tags and generate a diff report with templates.
📄️ console
Launch the Starlake interactive console for exploring and managing your data project from a local web-based interface.
📄️ dag-deploy
Deploy previously generated DAG files and library dependencies to a target directory for orchestration tools like Airflow.
📄️ dag-generate
Generate DAG files for tasks and domains with optional tag filtering and role definitions for workflow orchestration tools.
📄️ esload
Load datasets in Parquet, JSON, or JSON-array format into Elasticsearch indices with custom mappings and Spark configuration.
📄️ extract
Top-level extract command that groups sub-commands for extracting data, schemas, scripts, and BigQuery metadata.
📄️ extract-bq-schema
Extract BigQuery table schemas and dataset metadata with options to filter tables, set connections, and persist results.
📄️ extract-data
Extract data from any database in parallel with support for partitioning, incremental exports, and configurable parallelism.
📄️ extract-schema
Extract database table schemas and output them as YAML definition files with optional snake_case naming and parallelism control.
📄️ extract-script
Generate database extraction scripts from domain schemas using Mustache templates for tools like sqlplus or pgsql export.
📄️ freshness
Check data freshness across tables and datasets with configurable connections, write modes, and optional result persistence.
📄️ iam-policies
Generate and apply IAM policies for your Starlake project resources, managing access control with authentication tokens.
📄️ infer-schema
Automatically infer and generate YAML schema definitions from input datasets in CSV, JSON, XML, or Parquet format.
📄️ ingest
Ingest data files into a target domain and schema, with support for multiple paths, scheduling, and custom substitutions.
📄️ kafkaload
Load and offload data from Kafka topics in batch or streaming mode with configurable Spark options and triggers.
📄️ lineage
Generate task dependency graphs as DOT, SVG, PNG or JSON to visualize data lineage across your Starlake project.
📄️ load
Ingest raw files into your data warehouse by watching specified domains and tables, with schema validation and configurable load options.
📄️ metrics
Compute and publish data quality metrics for a given domain and schema, with optional Google Cloud authentication.
📄️ migrate
Migrate your Starlake project to the latest version, with warnings for breaking changes that require manual attention.
📄️ parquet2csv
Convert Parquet files to CSV format with configurable partitions, write modes, and Spark options like delimiter and header.
📄️ preload
Pre-load domains and tables using a configurable strategy before the main ingestion step, with global ack file support.
📄️ secure
Apply security rules and access controls on specified domains and tables, including row-level and column-level security.
📄️ serve
Start a local Starlake HTTP server on a configurable host and port to serve API requests for your data project.
📄️ settings
Display and validate Starlake project settings, including testing database and warehouse connection configurations.
📄️ site
Generate a documentation site from your Starlake project in JSON or Docusaurus MDX format with customizable templates.
📄️ stage
Move and uncompress files from the landing area to the pending area, handling ack files and domain-based directories.
📄️ summarize
Generate a statistical summary for a specific domain and table, providing key data profiling insights at a glance.
📄️ table-dependencies
Generate GraphViz dependency diagrams from domain and schema YAML files, with SVG, PNG, and JSON output options.
📄️ test
Run unit tests for load and transform tasks on specific domains and tables, with optional HTML report generation.
📄️ transform
Execute SQL-based transformation tasks with dry-run mode, recursive dependency resolution, and interactive output format options.
📄️ validate
Validate your Starlake project configuration by reloading all YAML files from disk and checking for errors or warnings.
📄️ xls2yml
Convert Excel files describing domains, schemas and attributes into Starlake YAML configuration files. Ideal for business analysts who prefer spreadsheets over YAML editing.
📄️ xls2ymljob
Convert Excel files describing transform job definitions into Starlake YAML task configuration files. Ideal for business analysts who define transformations in spreadsheets.
📄️ yml2ddl
Generate DDL statements from Starlake YAML schemas for your target data warehouse, with optional JDBC apply support.
📄️ yml2xls
Export Starlake YAML domain definitions and IAM policy tags to Excel spreadsheets for review and collaboration.