Extract Data from Databases

Starlake extracts tables from any JDBC-compliant database (PostgreSQL, MySQL, Oracle, SQL Server, DuckDB, and others) and writes them as CSV files. Configuration is declarative: you define a JDBC connection in application.sl.yml and list the schemas and tables to extract in a YAML file under metadata/extract/. Extraction supports full and incremental modes, parallel reads, and wildcard table selection.

Steps to Configure and Run an Extraction

  1. Install prerequisites -- Install DuckDB and download the sample duckdb.db database into $SL_ROOT/datasets.
  2. Configure the database connection -- Edit metadata/application.sl.yml to add a JDBC connection entry with the database URL and driver class.
  3. Create the extraction YAML -- Create metadata/extract/my_extract_config.sl.yml specifying connectionRef, schema name, tables to extract, and table types.
  4. Run the extract-data command -- Execute starlake extract-data --config my_extract_config --outputDir $SL_ROOT/incoming/ to export tables as CSV files.
  5. Verify the output files -- Check the $SL_ROOT/incoming/starbake folder for the extracted CSV files.
  6. (Optional) Generate schema descriptions -- Run starlake extract-schema --config my_extract_config to produce table description files for warehouse loading.

Prerequisites

  • Install DuckDB

  • Download the sample duckdb.db database and place it in the datasets folder of your project directory, $SL_ROOT/datasets

This will bring in the following database schema:

$ duckdb $SL_ROOT/datasets/duckdb.db
v0.10.0 20b1486d11
Enter ".help" for usage hints.
D show;
┌──────────┬──────────┬────────────┬────────────────────────────────────────────────────────┬─────────────────────────────────────────────────────┬───────────┐
│ database │ schema   │ name       │ column_names                                           │ column_types                                        │ temporary │
│ varchar  │ varchar  │ varchar    │ varchar[]                                              │ varchar[]                                           │ boolean   │
├──────────┼──────────┼────────────┼────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────┼───────────┤
│ duckdb   │ starbake │ order      │ [customer_id, order_id, status, timestamp]             │ [BIGINT, BIGINT, VARCHAR, TIMESTAMP]                │ false     │
│ duckdb   │ starbake │ order_line │ [order_id, product_id, quantity, sale_price]           │ [BIGINT, BIGINT, BIGINT, DOUBLE]                    │ false     │
│ duckdb   │ starbake │ product    │ [category, cost, description, name, price, product_id] │ [VARCHAR, DOUBLE, VARCHAR, VARCHAR, DOUBLE, BIGINT] │ false     │
└──────────┴──────────┴────────────┴────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────┴───────────┘
D .quit
$

Configure the Connection to the Source Database

In your project's metadata folder, edit application.sl.yml and set the connection parameters for the source database.

metadata/application.sl.yml
version: 1
application:
  ...
  connections:
    duckdb:
      type: "jdbc" # Connection to DuckDB
      options:
        url: "jdbc:duckdb:{{SL_ROOT}}/datasets/duckdb.db" # Location of the DuckDB database
        driver: "org.duckdb.DuckDBDriver"
  ...

Create the Extraction Configuration

In the metadata/extract folder, create a new file my_extract_config.sl.yml. Set connectionRef to the connection defined in application.sl.yml, specify the schema to extract, and list the tables.

metadata/extract/my_extract_config.sl.yml
version: 1
extract:
  connectionRef: "duckdb" # The database connection to use
  jdbcSchemas:
    - schema: "starbake"
      tables:
        - name: "*" # table names or "*" to extract all tables
      tableTypes: # (optional) table types to extract
        - "TABLE"
        #- "VIEW"
        #- "SYSTEM TABLE"
        #- "GLOBAL TEMPORARY"
        #- "LOCAL TEMPORARY"
        #- "ALIAS"
        #- "SYNONYM"

The wildcard * extracts all tables from the specified schema. You can also list individual table names. The tableTypes field filters by object type (TABLE, VIEW, SYSTEM TABLE, GLOBAL TEMPORARY, LOCAL TEMPORARY, ALIAS, SYNONYM).
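For instance, to extract only the order and order_line tables instead of the whole schema, list them by name (same keys as in the configuration above):

```yaml
version: 1
extract:
  connectionRef: "duckdb"
  jdbcSchemas:
    - schema: "starbake"
      tables:
        - name: "order"
        - name: "order_line"
      tableTypes:
        - "TABLE"
```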

Run the extract-data Command

$ cd $SL_ROOT
$ starlake extract-data --config my_extract_config --outputDir $SL_ROOT/incoming/

The $SL_ROOT/incoming/starbake folder will contain the extracted data in CSV format.
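Once the files are in place, any CSV reader can inspect them. Here is a minimal Python sketch using an in-memory sample shaped like the starbake order table; it is illustrative only, since the real file names and delimiter depend on your Starlake settings.

```python
import csv
import io

# Illustrative stand-in for an extracted "order" CSV; the real files live
# under $SL_ROOT/incoming/starbake and may use a different delimiter.
sample = io.StringIO(
    "customer_id,order_id,status,timestamp\n"
    "1,101,Delivered,2024-01-15 10:30:00\n"
    "2,102,Pending,2024-01-16 09:05:00\n"
)

# DictReader maps each row to a dict keyed by the header line.
rows = list(csv.DictReader(sample))
print(len(rows))          # 2
print(rows[0]["status"])  # Delivered
```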

You are now ready to load the data into the data warehouse of your choice using the starlake load command.

Generate Table Schema Descriptions with extract-schema (Optional)

The extract-schema command generates a table schema description file in the metadata/load directory. This file allows you to load the data into a data warehouse using starlake load without relying on schema inference.

$ cd $SL_ROOT
$ starlake extract-schema --config my_extract_config # extract description

The generated schema files describe columns, types, and constraints. They are stored alongside the load configuration so starlake load can use them directly.
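As a rough, hypothetical illustration only (the real files are generated for you, and key names and type mappings may differ by Starlake version), a generated description for the order table could resemble:

```yaml
# metadata/load/starbake/order.sl.yml -- indicative shape only, not verbatim output
version: 1
table:
  pattern: "order.*"
  attributes:
    - name: "customer_id"
      type: "long"
    - name: "order_id"
      type: "long"
    - name: "status"
      type: "string"
    - name: "timestamp"
      type: "timestamp"
```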

Frequently Asked Questions

What databases does Starlake support for extraction?

Starlake supports extraction from any JDBC-compliant database, including PostgreSQL, MySQL, SQL Server, Oracle, DuckDB, and more. You configure the connection using a JDBC URL and driver in the application.sl.yml file.

Can Starlake extract data incrementally?

Yes, Starlake supports incremental extraction by specifying a timestamp column in the table configuration. Only rows with a timestamp newer than the last extraction will be fetched.
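A hedged sketch of what such a configuration might look like; the key names fullExport and partitionColumn below are assumptions to verify against the Starlake extract reference for your version:

```yaml
version: 1
extract:
  connectionRef: "duckdb"
  jdbcSchemas:
    - schema: "starbake"
      fullExport: false             # assumption: request incremental rather than full export
      tables:
        - name: "order"
          partitionColumn: "timestamp"  # assumption: column used to detect new rows
```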

What is the output format of Starlake extraction?

By default, Starlake extracts data into CSV files stored in the output directory you specify. The extracted files are organized by schema name and can be directly loaded into a data warehouse using the starlake load command.

Can I extract all tables from a schema at once?

Yes, use the wildcard * as the table name to extract all tables from a given schema. You can also filter by table type, such as TABLE, VIEW, or SYSTEM TABLE.

What is the difference between extract-data and extract-schema?

extract-data exports the actual table data as CSV files. extract-schema generates table schema description files in the metadata/load directory, which are used by starlake load to load data into a data warehouse without relying on schema inference.

Where do I configure the connection to the source database?

In the metadata/application.sl.yml file, under the connections section. Each connection requires a type (e.g., jdbc), a JDBC URL, and a driver class name.

What is the CLI command to run an extraction?

starlake extract-data --config <config_name> --outputDir <output_directory>