
# Starlake vs Fivetran

Starlake and Fivetran both move data into cloud warehouses, but they differ in scope, openness, and how much of the pipeline they own.

## Philosophy

| | Starlake | Fivetran |
|---|---|---|
| Approach | Declarative (YAML + SQL) | Managed connectors (UI + REST/Terraform) |
| Model | Full ELT platform (Extract, Load, Transform, Orchestrate) | EL platform (Extract, Load); Transform via Fivetran Transformations (dbt Core) |
| Configuration | YAML files (no code required) | UI-driven, REST API, or Terraform provider |
| Runtime | Multi-engine (BigQuery, Snowflake, Spark, DuckDB, JDBC) | Fivetran-managed cloud (or Hybrid Deployment agent in your VPC) |
| License | Open source | Proprietary SaaS |
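The declarative approach in practice: a Starlake table is described by a YAML file versioned alongside the pipeline's SQL. Below is a minimal sketch of such a load config; the file path and exact key names are simplified illustrations of Starlake's conventions, not verbatim reference syntax.

```yaml
# metadata/load/sales/orders.sl.yml -- minimal sketch, keys simplified
table:
  pattern: "orders.*\\.csv"      # incoming files matched by name
  metadata:
    format: "DSV"                # delimiter-separated values
    separator: ";"
    withHeader: true
  attributes:                    # typed schema, validated at load time
    - name: order_id
      type: string
      required: true
    - name: amount
      type: double
```

Because the whole pipeline lives in files like this, it can be diffed and code-reviewed like application code, where Fivetran's equivalent state lives in its UI, REST API, or Terraform provider.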

## Data Sources

| | Starlake | Fivetran |
|---|---|---|
| Connectors | Files, JDBC databases, REST APIs, Kafka | 500+ pre-built connectors (SaaS, APIs, databases, files, events) |
| Files | CSV, JSON, XML, Parquet, fixed-width | CSV, JSON, Parquet, Avro, XLSX (via S3/GCS/Azure/SFTP/Box/Dropbox) |
| Databases | JDBC extraction with incremental support | Log-based CDC for Postgres, MySQL, Oracle, SQL Server, MongoDB, etc. |
| APIs | REST API extraction (any JSON/XML API) with auth, pagination, rate limiting, incremental support | SaaS connectors (Salesforce, HubSpot, NetSuite, Workday, Stripe, etc.) |
| Streams | Kafka / Kafka Streams | Kafka, Kinesis, Confluent Cloud |
| Custom sources | OpenAPI schema extraction for automatic table generation | Connector SDK (Python) and Cloud Functions connectors |
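Starlake's JDBC extraction is configured declaratively as well. A hypothetical extraction config might look like the following; the key names here are illustrative assumptions, not the exact Starlake reference schema.

```yaml
# metadata/extract/sales.sl.yml -- hypothetical sketch, key names assumed
extract:
  connectionRef: "postgres-prod"     # JDBC connection declared in app config
  jdbcSchemas:
    - schema: "public"
      tables:
        - name: "orders"
          # incremental extraction driven by a monotonic column (illustrative)
          partitionColumn: "updated_at"
```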

## Destinations

| | Starlake | Fivetran |
|---|---|---|
| Cloud warehouses | BigQuery, Snowflake, Databricks, Redshift | BigQuery, Snowflake, Databricks, Redshift, Synapse |
| Databases | Any JDBC (PostgreSQL, MySQL, ClickHouse, etc.) | PostgreSQL, MySQL, SQL Server, MariaDB, Panoply |
| Local | DuckDB, filesystem | |
| Lake formats | Delta Lake, Parquet | Iceberg, Delta Lake (via Databricks/Managed Iceberg) |
| Other | Elasticsearch, Kafka | S3, ADLS, GCS (data lake destinations) |

## Schema Management

| | Starlake | Fivetran |
|---|---|---|
| Definition | Explicit YAML schema with typed attributes | Automatic inference from source schema |
| Evolution | Manual or via syncStrategy (NONE, ADD, ALL) | Automatic schema drift handling (allow, block, or unblock per column) |
| Nested data | struct / array types in schema | JSON columns or auto-flattening (destination-dependent) |
| Validation | Regex-based type checking per value | Source-driven type mapping |
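Nested types are declared inline in the same attribute list as scalar columns. A sketch, assuming the flag names shown in the comments:

```yaml
# Nested data in a Starlake schema -- illustrative sketch
attributes:
  - name: customer
    type: struct
    attributes:                # struct fields nest recursively
      - name: id
        type: string
        required: true
      - name: emails
        type: string
        array: true            # repeated field; flag name is an assumption
```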

## Write Strategies

| Strategy | Starlake | Fivetran |
|---|---|---|
| Append | APPEND | History Mode (append-only with _fivetran_active) |
| Overwrite | OVERWRITE | Re-sync (full refresh) |
| Upsert by key | UPSERT_BY_KEY | Default (primary-key-based merge) |
| Upsert by key + timestamp | UPSERT_BY_KEY_AND_TIMESTAMP | CDC with _fivetran_synced cursor |
| Partition overwrite | OVERWRITE_BY_PARTITION | |
| Delete then insert | DELETE_THEN_INSERT | |
| SCD2 | SCD2 | History Mode (Type 2 change tracking) |
| Adaptive (runtime) | ADAPTATIVE | |
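In Starlake the write strategy is just one more key on the table definition, whereas in Fivetran the merge behavior follows from the connector and sync mode. A sketch, assuming the `key` and `timestamp` field names used here:

```yaml
# Merge on primary key, newest record wins by timestamp -- illustrative
writeStrategy:
  type: "UPSERT_BY_KEY_AND_TIMESTAMP"
  key: [order_id]              # merge key(s)
  timestamp: updated_at        # newest row per key is kept
```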

## Transformations

| | Starlake | Fivetran |
|---|---|---|
| SQL transforms | Built-in: SELECT materialization, incremental modelling, variable substitution, dialect transpilation | Fivetran Transformations (dbt Core, scheduled or integrated) |
| Python transforms | PySpark scripts with SL_THIS view | |
| Computed columns | script property (Spark SQL expressions) | |
| Pre/Post hooks | presql / postsql | dbt pre/post hooks |
| Dependency detection | Automatic FROM/JOIN parsing → DAG | Via dbt manifest |
| Pre-built models | | Quickstart Data Models (analytics-ready dbt packages per connector) |
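A Starlake transform is a plain SELECT stored in a .sql file, with a YAML sidecar for materialization details; the SELECT's FROM/JOIN clauses are parsed to place the task in the DAG. A sketch with simplified paths and keys:

```yaml
# metadata/transform/kpi/revenue_daily.sl.yml -- illustrative sidecar
task:
  writeStrategy:
    type: "OVERWRITE_BY_PARTITION"
# revenue_daily.sql sits next to this file and contains only a SELECT;
# its FROM/JOIN references are parsed to wire the task into the DAG
```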

## Data Quality

| | Starlake | Fivetran |
|---|---|---|
| Type validation | Regex-based per value; rejected rows → audit.rejected | Source-driven type coercion |
| Expectations | 53 built-in Jinja2 macros (completeness, validity, volume, schema, uniqueness, numeric) | dbt tests (via Transformations) |
| Data contracts | YAML schema + expectations + failOnError | |
| Metrics | Continuous, discrete, text profiling per column | Sync-level metrics (MAR, rows, volume) |
| Freshness | Configurable warn/error thresholds | Connector-level scheduling, alerting, and Fivetran Platform Connector for monitoring |
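Expectations attach to the table definition and run after each load. The macro names below are illustrative stand-ins for the built-in Jinja2 macros, not verbatim names:

```yaml
# Expectations on a loaded table -- macro names are illustrative
table:
  expectations:
    - expect: "is_col_value_not_null('order_id')"
      failOnError: true        # fail the load, not just warn
    - expect: "row_count_to_be_between(1, 1000000)"
      failOnError: false       # warn only
```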

## Security & Privacy

| | Starlake | Fivetran |
|---|---|---|
| Column masking | HIDE, MD5, SHA1, SHA256, SHA512, AES, SQL expressions | Column hashing and column blocking per source |
| Row-level security | Declarative RLS with predicates and grants | |
| Column-level access | accessPolicy (BigQuery policy tags) | Column blocking (exclude from sync) |
| Table ACL | Declarative ACL with roles and grants | RBAC at workspace/connector level |
| Secrets | Environment variables | Managed secrets, PrivateLink, customer-managed keys |
| Compliance | Self-managed | SOC 2 Type II, HIPAA, GDPR, ISO 27001, PCI DSS |
| Data residency | Wherever you deploy | Multi-region (US, EU, AU, APAC, Hybrid Deployment in your VPC) |
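Masking and row-level security are declared per attribute and per table in the same YAML. A sketch; the grant string format is an assumption:

```yaml
# Column masking and RLS in a table definition -- illustrative sketch
attributes:
  - name: email
    type: string
    privacy: SHA256            # stored hashed, never in clear
  - name: ssn
    type: string
    privacy: HIDE              # value dropped at load time
rls:
  - name: us_analysts_only
    predicate: "country = 'US'"
    grants: ["user:analyst@example.com"]   # grant format is an assumption
```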

## Orchestration

| | Starlake | Fivetran |
|---|---|---|
| Built-in | DAG generation from SQL dependencies | Built-in scheduler (15-minute to 24-hour sync frequencies) |
| Airflow | Auto-generated DAGs (Bash, Cloud Run, Dataproc, Fargate) | Airflow provider (airflow-provider-fivetran) |
| Dagster | Auto-generated assets (Shell, Cloud Run, Dataproc, Fargate) | dagster-fivetran integration |
| Snowflake Tasks | Auto-generated native tasks | |
| API triggers | | REST API to trigger syncs and manage connectors |
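Starlake's DAG generation is driven by a small config that selects a rendering template per orchestrator. The template and file names below are assumptions about the naming scheme, not verbatim values:

```yaml
# DAG generation config -- template and file names are assumptions
dag:
  comment: "Load the sales domain every night"
  template: "load/scheduled_table_bash.py.j2"   # Airflow template to render
  filename: "airflow_sales_dag.py"              # generated DAG file
```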

## Testing

| | Starlake | Fivetran |
|---|---|---|
| Unit tests | Built-in: load tests + transform tests with DuckDB | dbt tests (within Transformations) |
| Test data | CSV/JSON fixtures with _expected files | |
| Reports | JUnit XML + HTML report website | Sync logs, dashboards, Fivetran Platform Connector |
| Coverage | Tested vs untested domains/tables tracking | |
| SQL transpilation | Automatic (BigQuery/Snowflake/etc. → DuckDB) | N/A |

## Deployment

| | Starlake | Fivetran |
|---|---|---|
| Install | Java CLI (starlake binary) | SaaS account (Free, Standard, Enterprise, Business Critical) |
| Runtime | Native engine or Spark | Fivetran-managed |
| Infrastructure | On-premise, Cloud Run, Dataproc, Fargate, Snowflake | Fivetran Cloud, or Hybrid Deployment (agent in customer VPC) |
| Managed offering | | Fully managed SaaS |
| Pricing | Free (open source) | Consumption-based (Monthly Active Rows) |