Load Fixed-Width Positional Files into Your Data Warehouse
Starlake loads fixed-width (positional) files into BigQuery, Snowflake, Databricks, DuckDB, and other warehouses. Column boundaries are defined using zero-based first and last position properties in YAML -- no delimiter is needed. Schema inference is supported via a reference file. This format is commonly used for mainframe COBOL file exports and legacy system data.
For delimiter-separated files, see the CSV/DSV documentation. For zero-config loading, see autoload.
Fixed-Width File Format
In fixed-width files, each column occupies a fixed number of characters. There is no delimiter between columns. You must know the width of each column to parse the file.
Example fixed-width file representing orders:
00001 62024-02-05T21:19:15.454ZCancelled
00002 72024-02-05T21:19:15.454ZCancelled
00003 82024-02-05T21:19:15.454ZDelivered
Column layout for this file:
| Column | Width | Characters |
|---|---|---|
order_id | 5 | 0-4 |
customer_id | 5 | 5-9 |
timestamp | 24 | 10-33 |
status | 10 | 34-43 |
Infer Schema from a Fixed-Width Reference File
To infer the schema, create a reference file where each field appears on a separate line in name:value format. The value must have the exact width of the column (including padding spaces).
order_id:00001
customer_id: 6
timestamp:2024-02-05T21:19:15.454Z
status:Cancelled
Run the infer-schema command with the --format FIXED option:
starlake infer-schema --input /my/path/fixed_width_file.txt --format FIXED --outputDir $SL_ROOT/metadata/load/starbake
The schema is saved in the directory specified by --outputDir.
Column Position Configuration (first / last)
The generated YAML defines column positions using position.first and position.last properties. Both values are 0-based and inclusive.
table:
metadata:
format: FIXED
writeStrategy:
type: APPEND
...
attributes:
- name: order_id
type: string
position: # first 5 characters
first: 0
last: 4
- name: customer_id
type: string
position: # next 5 characters
first: 5
last: 9
- name: timestamp
type: string
position: # next 24 characters
first: 10
last: 33
- name: status
type: string
position: # next 10 characters
first: 34
last: 43
A column spanning the first 5 characters has first: 0 and last: 4. The inferred schema defaults all columns to string -- edit the file to set appropriate types (int, date, timestamp, etc.).
Additional Parsing Options
The following options from CSV loading also apply to fixed-width files:
encoding: File encoding (defaultUTF-8)trim: Trim strategy per attribute (None,Left,Right,Both)filter: Condition/regex to filter lines
Run the Load Command
After reviewing and customizing the schema, load the data:
starlake load
Starlake parses each line using the defined column positions, validates records against the schema, and loads valid records into the target table. Rejected records are stored in the audit.rejected table.
Use Case: Mainframe COBOL File Exports
Fixed-width is the standard format for mainframe and COBOL system exports. To load these files:
- Map each field from your COBOL copybook to a
position.first/position.lastpair in the YAML schema. - Set appropriate types for numeric fields (
int,long,double). - Use
trim: Bothon string fields to remove padding spaces.
All Starlake write strategies work with fixed-width files: APPEND, OVERWRITE, OVERWRITE_BY_PARTITION, UPSERT_BY_KEY, UPSERT_BY_KEY_AND_TIMESTAMP, and SCD2.
Frequently Asked Questions
How does Starlake load fixed-width (positional) files?
Starlake uses the FIXED format and position properties (first and last) on each attribute to define character positions. No delimiter is needed.
How do I infer the schema of a fixed-width file?
Create a reference file with one field per line in name:value format. Run starlake infer-schema --input <file> --format FIXED --outputDir <dir>. Starlake detects column widths from the value lengths.
What does the position property look like in the YAML schema?
Each attribute has a position block with first (0-based start index) and last (0-based end index, inclusive). Example: first: 0, last: 4 reads the first 5 characters.
Are positions zero-indexed in Starlake?
Yes. The first and last values are 0-based. A column spanning characters 1 to 5 has first: 0 and last: 4.
Can I use fixed-width loading for mainframe COBOL file exports?
Yes. Fixed-width is the standard format for mainframe exports. Define column positions matching your COBOL copybook layout.
What write strategies work with fixed-width files?
All Starlake write strategies (APPEND, OVERWRITE, UPSERT_BY_KEY, SCD2, etc.) work with fixed-width files, configured the same way as for CSV or JSON.
Can I change column types after inferring a fixed-width schema?
Yes. The inferred schema defaults all columns to string. Edit the .sl.yml file to set appropriate types (int, date, timestamp, etc.).