Skip to main content

Monitor Starlake Data Extractions

Starlake records the result of every extraction run in the SL_LAST_EXPORT audit table. This table stores the schema name, table name, row count, success status, extraction mode, and the last exported partition column value. You can query it directly with SQL to detect failures, track volumes, and monitor incremental extraction progress. The audit table lives in a configurable schema (named audit by default) and its connection is defined in metadata/application.sl.yml.

Configure the Audit Schema Connection

Define the audit schema and its JDBC connection in metadata/application.sl.yml:

version: 1
application:
audit:
domain: "audit" # Schema name
sink:
connectionRef: "audit_connection" # Connection to the audit schema

connections:
audit_connection:
type: "jdbc"
options:
url: "jdbc:duckdb:{{SL_ROOT}}/datasets/duckdb.db" # Location of the DuckDB database
driver: "org.duckdb.DuckDBDriver"
...

The domain field sets the audit schema name. The sink.connectionRef points to a JDBC connection defined in the connections section.

SL_LAST_EXPORT Table Schema

The SL_LAST_EXPORT table stores the state of each extraction run. Its columns are:

ColumnTypeDescription
domainVARCHARThe source schema name (e.g., starbake)
schemaVARCHARThe table name that was extracted
last_tsTIMESTAMPTimestamp of the last extraction run
last_dateVARCHARDate string of the last extraction
last_longBIGINTLast exported value of the partition column (for incremental extractions)
last_stringVARCHARLast exported string value of the partition column
modeVARCHARExtraction mode: FULL or INCREMENTAL
countBIGINTNumber of rows extracted in the last run
successBOOLEANWhether the extraction completed successfully
messageVARCHARError or status message
stepVARCHARThe extraction step

SQL Queries to Check Extraction Status

Check the last export for all tables in a schema:

-- Last export status for all tables in the starbake schema
SELECT domain, schema, last_ts, last_long, mode, count, success
FROM audit.SL_LAST_EXPORT
WHERE domain = 'starbake'
ORDER BY last_ts DESC;

Find failed extractions in the last 24 hours:

-- Failed extractions in the last 24 hours
SELECT domain, schema, last_ts, message
FROM audit.SL_LAST_EXPORT
WHERE success = false
AND last_ts >= CURRENT_TIMESTAMP - INTERVAL '24 hours';

View the extraction history for a specific table:

-- Extraction history for the order table
SELECT last_ts, mode, count, success, message
FROM audit.SL_LAST_EXPORT
WHERE domain = 'starbake' AND schema = 'order'
ORDER BY last_ts DESC;

Extraction Monitoring Workflow: Detect Failures and Track Row Counts

A typical monitoring workflow runs after each extraction pipeline:

  1. Run the extraction via starlake extract-data.
  2. Query SL_LAST_EXPORT to confirm all expected tables were extracted successfully.
  3. Alert on failures -- if any row has success = false, trigger a notification to the operations team.
  4. Track row counts over time -- sudden drops in count may indicate source data issues or connectivity problems.
  5. Check incremental state -- for tables using incremental extraction, verify that last_long or last_string progresses as expected.

Frequently Asked Questions

Where does Starlake store the extraction history?

In the SL_LAST_EXPORT table of the audit schema (audit by default). The connection to the audit schema is configured in metadata/application.sl.yml.

What information does the SL_LAST_EXPORT table contain?

It contains: the schema name (domain), the table name (schema), the timestamp of the last extraction (last_ts), the mode (FULL or INCREMENTAL), the number of rows extracted (count), the success status (success), and an optional error message (message).

How do I detect failed extractions?

Query the SL_LAST_EXPORT table with the filter WHERE success = false. You can add a time filter, for example AND last_ts >= CURRENT_TIMESTAMP - INTERVAL '24 hours'.

How do I configure the audit schema connection?

In metadata/application.sl.yml, section audit, define domain (audit schema name) and sink.connectionRef (reference to a JDBC connection defined in the connections section).

How do I detect anomalies in extracted data volume?

Monitor the count field in SL_LAST_EXPORT. A sudden drop in row count may indicate a problem in the source data or connectivity.