Load Fixed-Width Positional Files into Your Data Warehouse

Q: How does Starlake load fixed-width (positional) files?

Starlake uses the FIXED format and position properties (first and last) on each attribute to define character positions. No delimiter is needed.

Q: How do I infer the schema of a fixed-width file?

Create a reference file with one field per line in name:value format. Run 'starlake infer-schema --input --format FIXED --outputDir '. Starlake detects column widths from the value lengths.

Q: What does the position property look like in the YAML schema?

Each attribute has a position block with first (0-based start index) and last (0-based end index, inclusive). Example: first: 0, last: 4 reads the first 5 characters.

Q: Are positions zero-indexed in Starlake?

Yes. The first and last values are 0-based. A column spanning characters 1 to 5 has first: 0 and last: 4.

Starlake loads fixed-width (positional) files into BigQuery, Snowflake, Databricks, DuckDB, and other warehouses. Column boundaries are defined using zero-based first and last position properties in YAML -- no delimiter is needed. Schema inference is supported via a reference file. This format is commonly used for mainframe COBOL file exports and legacy system data.

For delimiter-separated files, see the CSV/DSV documentation. For zero-config loading, see autoload.

Fixed-Width File Format

In fixed-width files, each column occupies a fixed number of characters. There is no delimiter between columns. You must know the width of each column to parse the file.

Example fixed-width file representing orders:

  62024-02-05T21:19:15.454ZCancelled
  72024-02-05T21:19:15.454ZCancelled
  82024-02-05T21:19:15.454ZDelivered

Column layout for this file:

Column	Width	Characters
`order_id`	5	0-4
`customer_id`	5	5-9
`timestamp`	24	10-33
`status`	10	34-43

Infer Schema from a Fixed-Width Reference File

To infer the schema, create a reference file where each field appears on a separate line in name:value format. The value must have the exact width of the column (including padding spaces).

Reference file for schema inference
order_id:00001
customer_id:    6
timestamp:2024-02-05T21:19:15.454Z
status:Cancelled

Run the infer-schema command with the --format FIXED option:

starlake infer-schema --input /my/path/fixed_width_file.txt --format FIXED --outputDir $SL_ROOT/metadata/load/starbake

The schema is saved in the directory specified by --outputDir.

Column Position Configuration (first / last)

The generated YAML defines column positions using position.first and position.last properties. Both values are 0-based and inclusive.

metadata/load/<domain>/<table>.sl.yml
table:
    metadata:
      format: FIXED
      writeStrategy:
        type: APPEND
    ...
    attributes:
      - name: order_id
        type: string
        position:  # first 5 characters
          first: 0
          last: 4
      - name: customer_id
        type: string
        position: # next 5 characters
          first: 5
          last: 9
      - name: timestamp
        type: string
        position: # next 24 characters
          first: 10
          last: 33
      - name: status
        type: string
        position: # next 10 characters
          first: 34
          last: 43

A column spanning the first 5 characters has first: 0 and last: 4. The inferred schema defaults all columns to string -- edit the file to set appropriate types (int, date, timestamp, etc.).

Additional Parsing Options

The following options from CSV loading also apply to fixed-width files:

encoding: File encoding (default UTF-8)
trim: Trim strategy per attribute (None, Left, Right, Both)
filter: Condition/regex to filter lines

Run the Load Command

After reviewing and customizing the schema, load the data:

starlake load

Starlake parses each line using the defined column positions, validates records against the schema, and loads valid records into the target table. Rejected records are stored in the audit.rejected table.

Use Case: Mainframe COBOL File Exports

Fixed-width is the standard format for mainframe and COBOL system exports. To load these files:

Map each field from your COBOL copybook to a position.first / position.last pair in the YAML schema.
Set appropriate types for numeric fields (int, long, double).
Use trim: Both on string fields to remove padding spaces.

All Starlake write strategies work with fixed-width files: APPEND, OVERWRITE, OVERWRITE_BY_PARTITION, UPSERT_BY_KEY, UPSERT_BY_KEY_AND_TIMESTAMP, and SCD2.

Frequently Asked Questions

How does Starlake load fixed-width (positional) files?

Starlake uses the FIXED format and position properties (first and last) on each attribute to define character positions. No delimiter is needed.

How do I infer the schema of a fixed-width file?

Create a reference file with one field per line in name:value format. Run starlake infer-schema --input <file> --format FIXED --outputDir <dir>. Starlake detects column widths from the value lengths.

What does the position property look like in the YAML schema?

Each attribute has a position block with first (0-based start index) and last (0-based end index, inclusive). Example: first: 0, last: 4 reads the first 5 characters.

Are positions zero-indexed in Starlake?

Yes. The first and last values are 0-based. A column spanning characters 1 to 5 has first: 0 and last: 4.

Can I use fixed-width loading for mainframe COBOL file exports?

Yes. Fixed-width is the standard format for mainframe exports. Define column positions matching your COBOL copybook layout.

What write strategies work with fixed-width files?

All Starlake write strategies (APPEND, OVERWRITE, UPSERT_BY_KEY, SCD2, etc.) work with fixed-width files, configured the same way as for CSV or JSON.

Can I change column types after inferring a fixed-width schema?

Yes. The inferred schema defaults all columns to string. Edit the .sl.yml file to set appropriate types (int, date, timestamp, etc.).

Fixed-Width File Format​

Infer Schema from a Fixed-Width Reference File​

Column Position Configuration (first / last)​

Additional Parsing Options​

Run the Load Command​

Use Case: Mainframe COBOL File Exports​

Frequently Asked Questions​

How does Starlake load fixed-width (positional) files?​

How do I infer the schema of a fixed-width file?​

What does the position property look like in the YAML schema?​

Are positions zero-indexed in Starlake?​

Can I use fixed-width loading for mainframe COBOL file exports?​

What write strategies work with fixed-width files?​

Can I change column types after inferring a fixed-width schema?​