Configure Starlake Database Connections

Starlake supports connections to BigQuery, Snowflake, Databricks/Spark, Amazon Redshift, DuckDB, DuckLake, PostgreSQL, and any JDBC-compliant database. All connections are defined in the connections section of metadata/application.sl.yml. Each connection type has specific options for authentication, driver, and runtime settings.
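Every example below follows the same shape: `connectionRef` names the default connection, and `connections` maps each connection name to a `type` plus type-specific `options`. A minimal skeleton (the name `my_connection` is illustrative):

```yaml
# metadata/application.sl.yml
application:
  connectionRef: "my_connection"  # default connection used by loads and transforms
  connections:
    my_connection:                # any name you choose; referenced by connectionRef
      type: jdbc                  # local, bigquery, spark, or jdbc
      options: {}                 # type-specific settings (driver, url, credentials, ...)
```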

Supported Connection Types

Connect Starlake to the Local File System

The local file system connection reads and writes files to disk. Files are stored in subdirectories under datasets/. Each subdirectory represents a processing stage in the data pipeline.

application:
  connections:
    local:
      type: local

The area settings control the directory names for each stage. You can override them with environment variables.

application:
  datasets: "{{root}}/datasets" # or set with the SL_DATASETS environment variable
  area:
    pending: "pending"       # Files waiting to be loaded (SL_AREA_PENDING)
    unresolved: "unresolved" # Files that do not match any pattern (SL_AREA_UNRESOLVED)
    archive: "archive"       # Files moved after processing (SL_AREA_ARCHIVE)
    ingesting: "ingesting"   # Files currently being processed (SL_AREA_INGESTING)
    accepted: "accepted"     # Files processed and accepted (SL_AREA_ACCEPTED)
    rejected: "rejected"     # Files processed and rejected (SL_AREA_REJECTED)
    business: "business"     # Transform task results (SL_AREA_BUSINESS)
    replay: "replay"         # Rejected records in original format (SL_AREA_REPLAY)
    hiveDatabase: "${domain}_${area}" # Hive database name (SL_AREA_HIVE_DATABASE)
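The environment variables named in the comments above can be exported before launching Starlake; a sketch, with illustrative directory values only:

```shell
# Relocate the datasets root and rename the pending stage directory.
# Both values below are examples, not defaults.
export SL_DATASETS="/tmp/starlake/datasets"
export SL_AREA_PENDING="incoming"

# Starlake would now pick up incoming files from:
echo "$SL_DATASETS/$SL_AREA_PENDING"
```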

Connect Starlake to Google BigQuery

Starlake supports native and Spark BigQuery connections. The native connection is the default. To use the Spark BigQuery connector, set sparkFormat: "bigquery".

BigQuery Authentication Methods

Starlake supports three authentication methods for BigQuery:

  • APPLICATION_DEFAULT: Uses the default credentials configured in your environment (e.g., gcloud auth application-default login)
  • SERVICE_ACCOUNT_JSON_KEYFILE: Uses a JSON key file for a service account
  • ACCESS_TOKEN: Uses a direct GCP access token

BigQuery Connection Configuration

application:
  connections:
    bigquery:
      type: "bigquery"
      # Uncomment the line below to use the Spark BigQuery connector instead of the native one.
      # sparkFormat: "bigquery"
      options:
        location: "us-central1" # EU or US or any BigQuery region
        authType: "APPLICATION_DEFAULT"
        authScopes: "https://www.googleapis.com/auth/cloud-platform"
        # writeMethod: "direct"             # Only when sparkFormat is set. "direct" or "indirect"
        # temporaryGcsBucket: "bucket_name" # Only when sparkFormat is set. No "gcs://" prefix
        # Alternative: service account key file
        # authType: "SERVICE_ACCOUNT_JSON_KEYFILE"
        # jsonKeyfile: "/Users/me/.gcloud/keys/starlake-me.json"
        # Alternative: direct access token
        # authType: "ACCESS_TOKEN"
        # gcpAccessToken: "your-access-token"
      accessPolicies: # Required when applying Column Level Security
        apply: true
        location: EU
        taxonomy: RGPD

Connect Starlake to Snowflake

Starlake connects to Snowflake using the JDBC driver. Two authentication methods are available: user/password and OAuth SSO.

Snowflake User/Password Authentication

Use the native Snowflake JDBC driver for most use cases. The Spark Snowflake connector is only needed when exporting data to Excel, CSV, or Parquet files, or when loading data using the embedded Spark library.

application:
  connections:
    snowflake:
      type: jdbc
      # Uncomment the line below to use the Snowflake Spark connector instead of JDBC.
      # sparkFormat: snowflake
      options:
        url: "jdbc:snowflake://{{SNOWFLAKE_ACCOUNT}}.snowflakecomputing.com"
        driver: "net.snowflake.client.jdbc.SnowflakeDriver"
        user: "{{SNOWFLAKE_USER}}"
        password: "{{SNOWFLAKE_PASSWORD}}"
        warehouse: "{{SNOWFLAKE_WAREHOUSE}}"
        db: "{{SNOWFLAKE_DB}}"
        keep_column_case: "off"
        preActions: "alter session set TIMESTAMP_TYPE_MAPPING = 'TIMESTAMP_LTZ';ALTER SESSION SET QUOTED_IDENTIFIERS_IGNORE_CASE = true"
        # When sparkFormat is set, prefix keys with "sf":
        # sfUrl: "{{SNOWFLAKE_ACCOUNT}}.snowflakecomputing.com" # Do not prefix with jdbc:snowflake://
        # sfUser: "{{SNOWFLAKE_USER}}"
        # sfPassword: "{{SNOWFLAKE_PASSWORD}}"
        # sfWarehouse: "{{SNOWFLAKE_WAREHOUSE}}"
        # sfDatabase: "{{SNOWFLAKE_DB}}"

Snowflake OAuth Single Sign-On (SSO)

SSO is supported via the OAuth authentication type. It is only compatible with the native Snowflake JDBC driver.

When a user authenticates through Snowflake OAuth, Starlake automatically switches to the OAuth authentication type. The user and password fields in the configuration are ignored and replaced by the token generated by Snowflake.

Step 1: Create the OAuth Security Integration in Snowflake

Log in to Snowflake with the ACCOUNTADMIN role (or another role with CREATE INTEGRATION privileges) and run:

CREATE OR REPLACE SECURITY INTEGRATION STARLAKE
  TYPE = OAUTH
  ENABLED = TRUE
  OAUTH_CLIENT = CUSTOM
  OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
  OAUTH_REDIRECT_URI = '<REDIRECT_URI>'
  OAUTH_ISSUE_REFRESH_TOKENS = TRUE
  OAUTH_REFRESH_TOKEN_VALIDITY = 7776000 -- Valid for 90 days
  OAUTH_USE_SECONDARY_ROLES = 'IMPLICIT';

OAuth Redirect URI values:

  • Local development: http://localhost:8080/api/v1/auth/snowflake/callback
  • Snowflake native app: https://<account>.snowflakecomputing.com/api/v1/auth/snowflake/callback

Token validity:

  • Access token: 10 minutes
  • Refresh token: 90 days (automatically renews the access token). After 90 days, the user must reauthenticate.
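Starlake performs this renewal automatically, but for reference, the refresh exchange is a plain OAuth request against Snowflake's `/oauth/token-request` endpoint. A minimal sketch of the request it entails; account and credential values are placeholders, and the request is only built here, not sent:

```python
# Sketch: building the POST request that exchanges a Snowflake OAuth refresh
# token for a new access token. All credential values below are placeholders.
import base64
import urllib.parse
import urllib.request


def build_refresh_request(account: str, client_id: str,
                          client_secret: str, refresh_token: str) -> urllib.request.Request:
    """Build (but do not send) the token-refresh POST request."""
    url = f"https://{account}.snowflakecomputing.com/oauth/token-request"
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    }).encode()
    # The client id/secret travel as HTTP Basic credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


req = build_refresh_request("myorg-myaccount", "CLIENT_ID", "CLIENT_SECRET", "REFRESH_TOKEN")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return a JSON body containing the new `access_token` while the refresh token remains valid.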

Step 2: Retrieve Integration Credentials

Run the following SQL to extract the account name, client ID, and client secret:

WITH SECURITY_INTEGRATION AS (
  SELECT PARSE_JSON(SYSTEM$SHOW_OAUTH_CLIENT_SECRETS('STARLAKE')) AS OAUTH_CLIENT_SECRETS
)
SELECT
  CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME() AS account,
  OAUTH_CLIENT_SECRETS:"OAUTH_CLIENT_ID"::string AS client_id,
  OAUTH_CLIENT_SECRETS:"OAUTH_CLIENT_SECRET"::string AS client_secret
FROM
  SECURITY_INTEGRATION;

Step 3: Configure Starlake UI

In the Starlake UI, log in as an admin and navigate to Admin > Snowflake SSO. Enter the Account, Client ID, and Client Secret values from Step 2.

Connect Starlake to Spark and Databricks

Spark connections support three write formats: Parquet, Delta, and Iceberg. Each format requires specific Spark extensions.

application:
  defaultWriteFormat: parquet
  connections:
    spark:
      type: "spark"
      options:
        # Any Spark configuration can be set here
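For example, switching `defaultWriteFormat` to `delta` typically requires the Delta Lake session extension and catalog on the Spark side. A sketch, assuming the `delta-spark` jar is already on the classpath:

```yaml
application:
  defaultWriteFormat: delta
  connections:
    spark:
      type: "spark"
      options:
        spark.sql.extensions: "io.delta.sql.DeltaSparkSessionExtension"
        spark.sql.catalog.spark_catalog: "org.apache.spark.sql.delta.catalog.DeltaCatalog"
```

Iceberg similarly requires its own extension (`org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions`) and a catalog definition.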

Connect Starlake to Amazon Redshift

Use the Redshift JDBC driver. If running on Spark or Databricks, uncomment the sparkFormat line.

application:
  connections:
    redshift:
      type: jdbc
      # Uncomment the line below if running on Spark or Databricks
      # sparkFormat: "io.github.spark_redshift_community.spark.redshift" # Use "redshift" on Databricks
      options:
        url: "jdbc:redshift://account.region.redshift.amazonaws.com:5439/database"
        driver: com.amazon.redshift.Driver
        user: "{{REDSHIFT_USER}}"
        password: "{{REDSHIFT_PASSWORD}}"
        tempdir: "s3a://bucketName/data"
        tempdir_region: "eu-central-1" # Required only when running from outside AWS
        aws_iam_role: "arn:aws:iam::aws_account_id:role/role_name"

Connect Starlake to DuckDB and DuckLake

DuckDB is a lightweight, in-process SQL OLAP database. Starlake connects to DuckDB via JDBC. DuckLake extends DuckDB with a metadata catalog layer.

application:
  connections:
    duckdb:
      type: jdbc
      options:
        url: "jdbc:duckdb:{{DUCKDB_PATH}}"
        driver: "org.duckdb.DuckDBDriver"
        user: "{{DATABASE_USER}}"
        password: "{{DATABASE_PASSWORD}}"
        ## Uncomment and customize the line below for DuckLake
        # preActions: "ATTACH IF NOT EXISTS 'ducklake:metadata.ducklake' AS my_ducklake (DATA_PATH 'file_path/');USE my_ducklake;"

For DuckLake, the preActions setting attaches the DuckLake metadata catalog before running any queries.

Connect Starlake to Any JDBC Database

Starlake connects to any JDBC-compliant database (PostgreSQL, MySQL, etc.) using the generic jdbc connection type. The example below uses PostgreSQL.

application:
  connectionRef: "postgresql"
  connections:
    postgresql:
      type: jdbc
      # Uncomment the line below to use the Spark JDBC connector
      # sparkFormat: jdbc
      options:
        url: "jdbc:postgresql://{{POSTGRES_HOST}}:{{POSTGRES_PORT}}/{{POSTGRES_DATABASE}}"
        driver: "org.postgresql.Driver"
        user: "{{DATABASE_USER}}"
        password: "{{DATABASE_PASSWORD}}"
        quoteIdentifiers: false

Replace the connection name, URL, and driver class to connect to any JDBC-compliant database.
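As an illustration, a hypothetical MySQL connection changes only the name, URL, and driver class (and assumes the MySQL Connector/J jar is available on the classpath):

```yaml
application:
  connectionRef: "mysql"
  connections:
    mysql:
      type: jdbc
      options:
        url: "jdbc:mysql://{{MYSQL_HOST}}:{{MYSQL_PORT}}/{{MYSQL_DATABASE}}"
        driver: "com.mysql.cj.jdbc.Driver"
        user: "{{DATABASE_USER}}"
        password: "{{DATABASE_PASSWORD}}"
```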

Frequently Asked Questions

How do I configure a BigQuery connection in Starlake?

Define a connection of type bigquery in the connections section of application.sl.yml. Set the location, authType (APPLICATION_DEFAULT, SERVICE_ACCOUNT_JSON_KEYFILE, or ACCESS_TOKEN), and optionally enable the Spark BigQuery connector by setting sparkFormat to bigquery.

Does Starlake support Snowflake OAuth (SSO)?

Yes, Starlake supports Snowflake OAuth for Single Sign-On. You need to create an OAuth security integration in Snowflake, then configure the account, client ID, and client secret in the Starlake UI admin page under the Snowflake SSO tab.

Which databases can Starlake connect to?

Starlake supports BigQuery, Snowflake, Databricks/Spark, Amazon Redshift, DuckDB, DuckLake, PostgreSQL, and any JDBC-compliant database. Connections are configured in the application.sl.yml file.

Can I use Starlake with DuckDB or DuckLake?

Yes, Starlake supports DuckDB as a JDBC connection. Configure it with the DuckDB JDBC URL and driver in application.sl.yml. DuckLake is also supported by adding a preActions setting to attach the DuckLake metadata catalog.

What BigQuery authentication methods does Starlake support?

Starlake supports three methods: APPLICATION_DEFAULT (default credentials), SERVICE_ACCOUNT_JSON_KEYFILE (JSON key file), and ACCESS_TOKEN (direct GCP access token).

How do I configure a Redshift connection in Starlake?

Use a JDBC connection with url: "jdbc:redshift://<account>.<region>.redshift.amazonaws.com:5439/<database>" and driver: com.amazon.redshift.Driver. Provide the password, a temporary S3 bucket, and an IAM role ARN.