Data Build Tool: dbt Testing complete Guide

Data Build Tool: dbt Testing complete Guide

dbt Testing complete Guide

 

What is dbt Testing:

Data tests in your dbt project are assertions made about your models and other resources, such as sources, seeds, and snapshots. When running a dbt test, you will be notified of any passing or failing tests within your project. These tests can help improve the integrity of the SQL within each model by making assertions about the results it produces.

Several common generic tests include checking for duplicates, ensuring uniqueness in columns, verifying that there are no null values, testing referential integrity with parent tables, confirming data freshness based on a defined timestamp, and ensuring that columns only contain accepted or default values.

 The most common generic tests include:

 1. Duplicates – Column don’t have any duplicates (repeated) values and null values

2. Uniqueness - Column has no repeating values

3. Not Null – There are no null values in a column

4. Referential Integrity - Test that the column values having an existing relationship with a parent table

5.  Freshness: data should be freshness SLA based on a predefined timestamp

6. Accepted (Default) values – Check that column contain only pre-defined values

 

Example:

An employee table having empno, ename, sal, loc

-         empno is primary key and should not contain duplicates and null values

-         sal column not contain 0

-         loc column is accepted only default values like (INDIA,USA )

Data tests produce a collection of failed records. Test blocks are used to define generic data tests, which were previously known as schema tests.

There are two methods for defining data tests in dbt: singular data test and generic data test..

1.     Singular data test

2.     Generic data test

 Singular data test:

A singular data test involves a straightforward SQL query that identifies failing rows. This query can be saved in a .sql file within the test directory or model.

Sql file name emp_sal_zero.sal

 {{ config(materialized='table') }}

with emp_test as(

    select totalsal from analytics.emp_fact

    where totalsal <=0

)  select * from emp_test

To run an SQL file, use the following command:

If the .sql file is in the model directory:

dbt run –select emp_sal_zero

 If the .sql file is in the test directory:

dbt test

 When running the above command, it will create a table called emp_sal_zero and store the results in the table..

 Generic Data test:

 A generic data test defined two ways to test .

In the model directory, create a .yml file and specify unique, not null, and accepted values for each column. You can define any number of columns in the file.

 Example below

version: 2

 

models:

  - name: emp_fact

    database: mysql

    schema: analytics

    description: "Employee Table"

    columns:

      - name: empno

        description: "The primary key for this table"

        tests:

          - unique

          - not_null

      - name: sal_status

        description: "Checking Default values"

        tests:

          - accepted_values:

              values: ['lo','me','hg']

You may also use a predefined generic model like a macro in the .yml file along with the above steps. 

      - name: sal_status

        description: "Checking Default values"

        tests:

          - dbt_utils.not_accepted_values:

              values: ['lo','me','hg']

Some data tests are generic and can be reused multiple times. These tests are defined in a test block, which includes a parametrized query and accepts arguments. An example of a generic data test may look like:

 .sql file name assert_negative_value.sql

 {% test assert_negative_value(model, column_name) %}

     select {{ column_name }}

    from {{ model }}

    where {{ column_name }} is null

 {% endtest %}

 Add   assert_negative_value.sql file name into .yml as below

      - name: columnname

        description: "To Test negative value in sal column"

        tests:

          - assert_negative_value

 

The tests to run can be selected using the --select flag discussed

# run tests for one_specific_model

dbt test --select "one_specific_model"

 # run tests for all models in package

dbt test --select "some_package.*"

 # run only tests defined singularly

dbt test --select "test_type:singular"

 # run only tests defined generically

dbt test --select "test_type:generic"

 # run singular tests limited to one_specific_model

dbt test --select "one_specific_model,test_type:singular"

# run generic tests limited to one_specific_model

dbt test --select "one_specific_model,test_type:generic"

 

 dbt tests tutorial

 


Post a Comment

0 Comments