extract-rest-data

Synopsis

starlake extract-rest-data [options]

Description

Extract data from REST API endpoints into CSV files. Supports pagination (offset, cursor, link header, page number), authentication (bearer, API key, basic, OAuth2), rate limiting, and parent-child endpoint relationships.

The extracted CSV files can then be ingested using starlake load.
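The extract-then-load sequence described above might be scripted roughly as follows. This is a minimal sketch, not an official workflow: the `run` and `extract_then_load` helpers and the `SL`, `OUT_DIR`, and `DRY_RUN` variables are hypothetical names introduced for illustration, and it assumes the launcher is on the PATH as `starlake`. Where `starlake load` looks for input files depends on project settings not covered on this page, so the output directory may need adjusting.

```shell
# Assumption: the launcher is called `starlake` (the Examples section uses `starlake.sh`).
SL="${SL:-starlake}"
OUT_DIR="${OUT_DIR:-/tmp/api-data}"

# Hypothetical helper: with DRY_RUN=1 (the default here) the command is
# printed instead of executed, so the script is safe to try without Starlake.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Extract first; only attempt the load if the extraction succeeded.
extract_then_load() {
  run "$SL" extract-rest-data --config my-rest-api --outputDir "$OUT_DIR" &&
  run "$SL" load
}

extract_then_load
```

Setting `DRY_RUN=0` makes the helper execute the commands for real.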

Examples

starlake.sh extract-rest-data --config my-rest-api --outputDir /tmp/api-data
starlake.sh extract-rest-data --config my-rest-api --outputDir /tmp/api-data --limit 1000

Parameters

| Parameter | Cardinality | Description |
| --- | --- | --- |
| --config <value> | Required | REST API extraction config file (in metadata/extract/) |
| --outputDir <value> | Required | Where to output CSV files |
| --limit <value> | Optional | Limit number of records per endpoint |
| --parallelism <value> | Optional | Parallelism level for endpoint extraction. Default: available CPU cores |
| --incremental | Optional | Only extract new data since last extraction. Uses incrementalField from endpoint config. |
| --resume | Optional | Resume extraction from where a previous run failed, skipping already-extracted pages. |
| --outputFormat <value> | Optional | Output format: csv (default) or jsonl (JSON Lines, preserves nested structures) |
| --reportFormat <value> | Optional | Report format: console, json, html |
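
The optional flags listed above can be combined on one command line. The sketch below assembles such a command and prints it rather than running it. `build_extract_cmd` is a hypothetical helper, not part of Starlake, and it assumes the launcher is named `starlake`; only flags documented on this page are used.

```shell
# Hypothetical helper: build an extract-rest-data command line from
# the documented flags. Arguments: config, output dir, format, incremental.
# Note: values containing spaces are not quoted in this simple sketch.
build_extract_cmd() {
  config="$1"
  output_dir="$2"
  format="${3:-csv}"          # csv is the documented default
  incremental="${4:-false}"

  cmd="starlake extract-rest-data --config $config --outputDir $output_dir --outputFormat $format"
  if [ "$incremental" = "true" ]; then
    cmd="$cmd --incremental"  # flag takes no value
  fi
  printf '%s\n' "$cmd"
}

# Dry run: show the command for an incremental JSON Lines extraction.
build_extract_cmd my-rest-api /tmp/api-data jsonl true
```

Printing the command first makes it easy to review the flag combination before running it against a live API.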