Ingestion & Loading

9 skills covering every aspect of data ingestion — from auto-inferring schemas to loading into BigQuery, Snowflake, Elasticsearch, and Kafka.

Skills

autoload

Auto-infer schemas and load data in a single step. Scans incoming files, infers column types, and creates load configurations automatically.

You: /autoload Infer schema from my CSV files in the incoming/customers/ directory

Key features:

  • Automatic type inference from file samples
  • Support for CSV, JSON, XML, and Parquet
  • Generates both domain and table YAML configurations
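As a sketch of what autoload produces, the inferred table configuration has the same shape as the hand-written ones in this page. For a customers CSV it might look like the following (the column names and types here are illustrative, not real inference output):

```yaml
# Illustrative only: the shape of a table config autoload could infer
# from sampled CSV files. Actual names and types come from your data.
table:
  name: customers
  pattern: "customers-.*.csv"
  metadata:
    format: DSV
    withHeader: true
    separator: ","
  attributes:
    - name: customer_id
      type: long
    - name: email
      type: string
```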

load

Load data from the pending area into your target warehouse. The primary ingestion skill with comprehensive write strategy support.

You: /load Configure loading JSON files into BigQuery with UPSERT_BY_KEY strategy

Write strategies:

Strategy                       Description
APPEND                         Add new records without deduplication
OVERWRITE                      Replace all existing data
UPSERT_BY_KEY                  Update by primary key, insert new
UPSERT_BY_KEY_AND_TIMESTAMP    Update by key + timestamp for SCD
SCD2                           Slowly changing dimension type 2
DELETE_THEN_INSERT             Delete matching records, then insert

Supported sinks: BigQuery, Snowflake, DuckDB, PostgreSQL, Redshift, Databricks, Elasticsearch, Kafka
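The /load example above translates into a table-level write strategy. A minimal sketch, assuming a JSON source; the table, column, and connection names ("my-bigquery") are placeholders, not values the skill requires:

```yaml
# Sketch only: loading JSON files into BigQuery with UPSERT_BY_KEY.
# Names below are illustrative placeholders.
table:
  name: orders
  pattern: "orders-.*.json"        # files in the pending area matching this regex
  metadata:
    format: JSON
    writeStrategy:
      type: UPSERT_BY_KEY          # update rows matching the key, insert the rest
      key: [order_id]
    sink:
      connectionRef: my-bigquery
  attributes:
    - name: order_id
      type: long
      required: true
    - name: amount
      type: double
```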


cnxload

Load files directly into JDBC tables. Bypasses the standard staging pipeline for direct file-to-database loading.

You: /cnxload Load a CSV file directly into my PostgreSQL customers table
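cnxload resolves its target database through a named JDBC connection. A minimal sketch of such a connection, assuming a connections block in your Starlake configuration; the connection name, URL, and credentials are placeholders:

```yaml
# Sketch: a JDBC connection cnxload could load into.
# URL and user are placeholders; the password is read from an
# environment variable rather than stored in the file.
connections:
  my-postgres:
    type: jdbc
    options:
      url: "jdbc:postgresql://localhost:5432/mydb"
      user: "ingest_user"
      password: "${POSTGRES_PASSWORD}"
```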

esload

Load data into Elasticsearch indices. Configure index mappings, document IDs, and bulk loading parameters.

You: /esload Configure loading product data into an Elasticsearch index

index

Alias for esload. The starlake index command is identical to starlake esload — all options and behavior are the same. See esload for full details.


ingest

Generic data ingestion skill. Covers the overall ingestion pipeline from landing to accepted/rejected areas.

You: /ingest Walk me through the full ingestion pipeline for XML files

kafkaload

Load data to/from Apache Kafka topics. Configure producers, consumers, serialization, and topic management.

You: /kafkaload Set up Kafka ingestion for real-time event data

Capabilities:

  • Topic-to-table loading
  • Table-to-topic offloading
  • Avro/JSON serialization
  • Consumer group management

preload

Check the landing area for incoming files before processing. Validates file presence, naming conventions, and readiness.

You: /preload Configure landing area checks for my daily CSV deliveries

stage

Move files from landing to pending area. Handles file organization, renaming, and preparation for the load step.

You: /stage Set up staging rules for files arriving in GCS

Common Configuration Example

A typical ingestion setup, split across a domain configuration file and a table configuration file:

# metadata/load/customers/_config.sl.yml
load:
  metadata:
    mode: FILE
    format: DSV
    withHeader: true
    separator: ","
    encoding: UTF-8
    multiline: false
    writeStrategy:
      type: UPSERT_BY_KEY_AND_TIMESTAMP
      key: [customer_id]
      timestamp: updated_at
    sink:
      connectionRef: my-bigquery
      partition:
        - ingestion_date

# metadata/load/customers/customers.sl.yml
table:
  name: customers
  pattern: "customers-.*.csv"
  attributes:
    - name: customer_id
      type: long
      required: true
      privacy: NONE
    - name: email
      type: string
      required: true
      privacy: SHA256
    - name: name
      type: string
    - name: created_at
      type: timestamp
    - name: updated_at
      type: timestamp