Expectations

Expectations allow to test if the resulting table contains the expected data.

They are defined using the expectations attribute as follows:

table:
    ...
    attributes:
       - name: id
        type: integer
        ...
    expectations:
      - expect: "is_col_value_not_unique('id') => result(0) == 1"
        failOnError: true # or false if you want to continue

The expectations attribute is a list of strings, each string representing an expectation. The expectation is defined as a string with the following format:

<query_name>(<param>*) => <condition>

<query_name> is the name of the expectation to be tested. The available expectations are defined in the expectations directory. An query is a SELECT statement run against the target table and the result is used to test the expectation.

<param> is the parameter to be passed to the expectation. It is a value that can be read as the expected type by the expectation. Parameters are separated by commas.

<condition> is the condition to be tested. It is a string that can be parsed to a boolean by the expectation and that will be used to test the expectation:

The following variables are available in the condition:

Variable	Type	Description
count	Long	The number of rows in the result
result	Seq[Any]	The first row in the result as a collection of values (one per column)
results	Seq[Seq[Any]]	The result as a collection of rows, each row as a collection of values (one per column)

Writing expectation queries

Expectation queries are mainly jinja macro templates that are used to generate SQL queries.

{% macro is_col_value_not_unique(col, table='SL_THIS') %}
    SELECT max(cnt)
    FROM (SELECT {{ col }}, count(*) as cnt FROM {{ table }}
    GROUP BY {{ col }}
    HAVING cnt > 1)
{% endmacro %}

{% macro is_row_count_to_be_between(min_value, max_value, table_name = 'SL_THIS') -%}
    select
        case
            when count(*) between {{min_value}} and {{max_value}} then 1
        else
            0
        end
    from {{table_name}}
{%- endmacro %}

{% macro col_value_count_greater_than(col, table_name='SL_THIS') %}
    SELECT {{ col }}, count(*) FROM {{ table_name }}
    GROUP BY {{ col }}
    HAVING count(*) > {{ count }}
{% endmacro %}


{% macro count_by_value(col, value, table='SL_THIS') %}
    SELECT count(*)
    FROM {{ table }}
    WHERE {{ col }} LIKE '{{ value }}'
{% endmacro %}


{% macro column_occurs(col, times, table='SL_THIS') %}
    SELECT max(cnt)
    FROM  (
        SELECT {{ col }}, count(*) as cnt FROM {{ table }}
        GROUP BY {{ col }}
        HAVING cnt == {{ times }}
    )
{% endmacro %}

Put the queries in the expectations directory and use the .j2 extension. You may organize your queries in subdirectories.

Writing expectation queries​

Writing expectation queries