extract-rest-data
Synopsis
starlake extract-rest-data [options]
Description
Extract data from REST API endpoints into CSV files. Supports pagination (offset, cursor, link header, page number), authentication (bearer, API key, basic, OAuth2), rate limiting, and parent-child endpoint relationships.
The extracted CSV files can then be ingested using starlake load.
Examples
starlake.sh extract-rest-data --config my-rest-api --outputDir /tmp/api-data
starlake.sh extract-rest-data --config my-rest-api --outputDir /tmp/api-data --limit 1000
Parameters
| Parameter | Cardinality | Description |
|---|---|---|
| --config <value> | Required | REST API extraction config file (in metadata/extract/) |
| --outputDir <value> | Required | Where to output CSV files |
| --limit <value> | Optional | Limit number of records per endpoint |
| --parallelism <value> | Optional | Parallelism level for endpoint extraction. Default: available CPU cores |
| --incremental | Optional | Only extract new data since last extraction. Uses incrementalField from endpoint config. |
| --resume | Optional | Resume extraction from where a previous run failed, skipping already-extracted pages. |
| --outputFormat <value> | Optional | Output format: csv (default) or jsonl (JSON Lines, preserves nested structures) |
| --reportFormat <value> | Optional | Report format: console, json, html |
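
The optional flags can presumably be combined in a single invocation, as the `--limit` example above already does with the required ones. The following is a minimal sketch of an incremental run that writes JSON Lines. The config name and output directory are reused from the examples above; combining `--incremental` with `--outputFormat jsonl` is an assumption, not something this page confirms. The script prints the command for review and only runs it if `starlake.sh` is found on the PATH.

```shell
# Illustrative values taken from the examples above; replace with your own.
CONFIG="my-rest-api"
OUTPUT_DIR="/tmp/api-data"

# Build the argument list using only flags documented in the parameter table.
# Assumption: --incremental and --outputFormat may be used together.
set -- extract-rest-data \
  --config "$CONFIG" \
  --outputDir "$OUTPUT_DIR" \
  --incremental \
  --outputFormat jsonl

# Keep the full command as a string so it can be logged or reviewed.
CMD="starlake.sh $*"
echo "$CMD"

# Run the extraction only when starlake.sh is actually installed.
if command -v starlake.sh >/dev/null 2>&1; then
  starlake.sh "$@"
fi
```

If a run like this fails partway through, the table above suggests re-running the same command with `--resume` added so that already-extracted pages are skipped.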