📄️ acl-dependencies
Generate GraphViz DOT, SVG, PNG, or JSON files visualizing ACL dependencies from domain and schema YAML definitions.
📄️ autoload
Automatically watch and load incoming data files into specified domains and tables with optional scheduling and substitutions.
📄️ bootstrap
Create a new Starlake project from scratch, or from a template such as quickstart or userguide, with a single command.
📄️ bq-freshness
Check BigQuery table freshness by inspecting last modification timestamps, with optional persistence and connection settings.
📄️ bq-info
Retrieve metadata and information about BigQuery tables and datasets, with options to persist results or filter by table.
📄️ cnxload
Load a Parquet source file into a JDBC table with configurable connection options and write strategies like APPEND or OVERWRITE.
📄️ col-lineage
Build and export column-level data lineage for a specific task as JSON, helping trace data flow across transformations.
📄️ compare
Compare two versions of a Starlake project using file paths, git commits, or tags and generate a diff report with templates.
📄️ console
Launch the Starlake interactive console for exploring and managing your data project from a local web-based interface.
📄️ dag-deploy
Deploy previously generated DAG files and library dependencies to a target directory for orchestration tools like Airflow.
📄️ dag-generate
Generate DAG files for tasks and domains with optional tag filtering and role definitions for workflow orchestration tools.
📄️ esload
Load datasets in Parquet, JSON, or JSON-array format into Elasticsearch indices with custom mappings and Spark configuration.
📄️ extract
Top-level extract command that groups sub-commands for extracting data, schemas, scripts, and BigQuery metadata.
📄️ extract-bq-schema
Extract BigQuery table schemas and dataset metadata with options to filter tables, set connections, and persist results.
📄️ extract-data
Extract data from any database in parallel with support for partitioning, incremental exports, and configurable parallelism.
📄️ extract-schema
Extract database table schemas and output them as YAML definition files with optional snake_case naming and parallelism control.
📄️ extract-script
Generate database extraction scripts from domain schemas using Mustache templates for tools like sqlplus or pgsql export.
📄️ freshness
Check data freshness across tables and datasets with configurable connections, write modes, and optional result persistence.
📄️ iam-policies
Generate and apply IAM policies for your Starlake project resources, managing access control with authentication tokens.
📄️ infer-schema
Automatically infer and generate YAML schema definitions from input datasets in CSV, JSON, XML, or Parquet format.
📄️ ingest
Ingest data files into a target domain and schema, with support for multiple paths, scheduling, and custom substitutions.
📄️ kafkaload
Load and offload data from Kafka topics in batch or streaming mode with configurable Spark options and triggers.
📄️ lineage
Generate task dependency graphs as DOT, SVG, PNG or JSON to visualize data lineage across your Starlake project.
📄️ load
Ingest raw files into your data warehouse by watching specified domains and tables, with schema validation and configurable load options.
📄️ metrics
Compute and publish data quality metrics for a given domain and schema, with optional Google Cloud authentication.
📄️ migrate
Migrate your Starlake project to the latest version, with warnings for breaking changes that require manual attention.
📄️ parquet2csv
Convert Parquet files to CSV format with configurable partitions, write modes, and Spark options like delimiter and header.
📄️ preload
Pre-load domains and tables using a configurable strategy before the main ingestion step, with global ack file support.
📄️ secure
Apply security rules and access controls on specified domains and tables, including row-level and column-level security.
📄️ serve
Start a local Starlake HTTP server on a configurable host and port to serve API requests for your data project.
📄️ settings
Display and validate Starlake project settings, including testing database and warehouse connection configurations.
📄️ site
Generate a documentation site from your Starlake project in JSON or Docusaurus MDX format with customizable templates.
📄️ stage
Move and uncompress files from the landing area to the pending area, handling ack files and domain-based directories.
📄️ summarize
Generate a statistical summary for a specific domain and table, providing key data profiling insights at a glance.
📄️ table-dependencies
Generate GraphViz dependency diagrams from domain and schema YAML files, with SVG, PNG, and JSON output options.
📄️ test
Run unit tests for load and transform tasks on specific domains and tables, with optional HTML report generation.
📄️ transform
Execute SQL-based transformation tasks with dry-run mode, recursive dependency resolution, and interactive output format options.
📄️ validate
Validate your Starlake project configuration by reloading all YAML files from disk and checking for errors or warnings.
📄️ xls2yml
Convert Excel files describing domains, schemas and attributes into Starlake YAML configuration files. Ideal for business analysts who prefer spreadsheets over YAML editing.
📄️ xls2ymljob
Convert Excel files describing transform job definitions into Starlake YAML task configuration files. Ideal for business analysts who define transformations in spreadsheets.
📄️ yml2ddl
Generate DDL statements from Starlake YAML schemas for your target data warehouse, with optional JDBC apply support.
📄️ yml2xls
Export Starlake YAML domain definitions and IAM policy tags to Excel spreadsheets for review and collaboration.