
Starlake vs dlt (dltHub)

Starlake and dlt are both open-source data pipeline tools, but they take fundamentally different approaches.

Philosophy

| | Starlake | dlt |
| --- | --- | --- |
| Approach | Declarative (YAML + SQL) | Imperative (Python code) |
| Model | Full ELT platform (Extract, Load, Transform, Orchestrate) | EL library (Extract, Load); delegates Transform to dbt |
| Configuration | YAML files; no code required | Python scripts with decorators and dictionaries |
| Runtime | Multi-engine (BigQuery, Snowflake, Spark, DuckDB, JDBC) | Python process with destination-specific loaders |

Data Sources

| | Starlake | dlt |
| --- | --- | --- |
| Files | CSV, JSON, XML, Parquet, fixed-width | CSV, JSON, Parquet (XML and fixed-width not native) |
| Databases | JDBC extraction with incremental support | 100+ databases via SQLAlchemy |
| APIs | — | REST API declarative source, 60+ verified connectors |
| Streams | Kafka / Kafka Streams | — |

Destinations

| | Starlake | dlt |
| --- | --- | --- |
| Cloud warehouses | BigQuery, Snowflake, Databricks, Redshift | BigQuery, Snowflake, Databricks, Redshift, Synapse |
| Databases | Any JDBC (PostgreSQL, MySQL, ClickHouse, etc.) | PostgreSQL, DuckDB, ClickHouse, SQL Server, 30+ via SQLAlchemy |
| Local | DuckDB, filesystem | DuckDB, MotherDuck, filesystem |
| Lake formats | Delta Lake, Parquet | Delta Lake, Iceberg |
| Other | Elasticsearch, Kafka | Vector databases (Weaviate, LanceDB, Qdrant) |

Schema Management

| | Starlake | dlt |
| --- | --- | --- |
| Definition | Explicit YAML schema with typed attributes | Automatic inference from data |
| Evolution | Manual or via syncStrategy (NONE, ADD, ALL) | Automatic (evolve, freeze, discard_row, discard_value) |
| Nested data | struct / array types in schema | Auto-flattening into child tables, variant columns |
| Validation | Regex-based type checking per value | Pydantic models, schema contracts |

Write Strategies

| Strategy | Starlake | dlt |
| --- | --- | --- |
| Append | APPEND | append |
| Overwrite | OVERWRITE | replace |
| Upsert by key | UPSERT_BY_KEY | merge + upsert strategy |
| Upsert by key + timestamp | UPSERT_BY_KEY_AND_TIMESTAMP | — |
| Partition overwrite | OVERWRITE_BY_PARTITION | merge + delete-insert strategy |
| Delete then insert | DELETE_THEN_INSERT | merge + delete-insert strategy |
| SCD2 | SCD2 | merge + scd2 strategy |
| Adaptive (runtime) | ADAPTATIVE | — |

Transformations

| | Starlake | dlt |
| --- | --- | --- |
| SQL transforms | Built-in: SELECT materialization, incremental modelling, variable substitution, dialect transpilation | Delegated to dbt |
| Python transforms | PySpark scripts with SL_THIS view | Pre-load Python transformations on the data stream |
| Computed columns | script property (Spark SQL expressions) | Python add_map() / add_filter() |
| Pre/Post hooks | presql / postsql | — |
| Dependency detection | Automatic FROM/JOIN parsing → DAG | Manual (via dbt or orchestrator) |

Data Quality

| | Starlake | dlt |
| --- | --- | --- |
| Type validation | Regex-based per value; rejected rows → audit.rejected | Schema contracts (evolve/freeze/discard) |
| Expectations | 53 built-in Jinja2 macros (completeness, validity, volume, schema, uniqueness, numeric) | No built-in expectations engine |
| Data contracts | YAML schema + expectations + failOnError | Pydantic models + schema contracts |
| Metrics | Continuous, discrete, text profiling per column | — |
| Freshness | Configurable warn/error thresholds | — |

Security & Privacy

| | Starlake | dlt |
| --- | --- | --- |
| Column masking | HIDE, MD5, SHA1, SHA256, SHA512, AES, SQL expressions | Pseudonymization via add_map() (manual) |
| Row-level security | Declarative RLS with predicates and grants | — |
| Column-level access | accessPolicy (BigQuery policy tags) | — |
| Table ACL | Declarative ACL with roles and grants | — |

Orchestration

| | Starlake | dlt |
| --- | --- | --- |
| Built-in | DAG generation from SQL dependencies | None; embeds in external orchestrators |
| Airflow | Auto-generated DAGs (Bash, Cloud Run, Dataproc, Fargate) | Manual DAG with dlt calls |
| Dagster | Auto-generated assets (Shell, Cloud Run, Dataproc, Fargate) | dagster-dlt library maps sources to assets |
| Snowflake Tasks | Auto-generated native tasks | — |
| Scheduling | Cron, data cycles, pre-load strategies (ACK, IMPORTED, PENDING) | — |

Testing

| | Starlake | dlt |
| --- | --- | --- |
| Unit tests | Built-in: load tests + transform tests with DuckDB | Pytest fixtures + assertion helpers |
| Test data | CSV/JSON fixtures with _expected files | Python dicts / DuckDB local destination |
| Reports | JUnit XML + HTML report website | Standard pytest output |
| Coverage | Tested vs untested domains/tables tracking | — |
| SQL transpilation | Automatic (BigQuery/Snowflake/etc. → DuckDB) | N/A (Python, not SQL) |

Deployment

| | Starlake | dlt |
| --- | --- | --- |
| Install | Java CLI (starlake binary) | Python package (pip install dlt) |
| Runtime | JVM (Spark, standalone) or native engine | Python process |
| Infrastructure | On-premise, Cloud Run, Dataproc, Fargate, Snowflake | Anywhere Python runs (Lambda, Cloud Functions, K8s, laptop) |

When to Choose

Choose Starlake when:

  • You prefer declarative YAML + SQL over Python code
  • You need built-in transformations with automatic dependency resolution
  • You need comprehensive data quality (53 expectation macros, type validation, rejection routing)
  • You need built-in security (column masking, RLS, ACL)
  • You want auto-generated orchestration DAGs
  • Your sources are primarily files (CSV, JSON, XML, fixed-width) or JDBC databases
  • You work across multiple SQL engines and need dialect transpilation