Data Build Tool: DBT Snapshot Complete Guide | Timestamp and Check Strategies - SCD Type 2 Implementations

Data Build Tool: DBT Snapshot Complete Guide | Timestamp and Check Strategies - SCD Type 2 Implementations

What is dbt Snapshot.

 

1.  Dbt Snapshots implement SCD type-2 over mutable source tables. 

2.  DBT Snapshots are typically used to track changes in your data over time and ensure data integrity.

On First run:

·      During the first run, dbt will generate the initial snapshot table by executing a set of select statements and adding additional columns, including dbt_valid_from and dbt_valid_to. 

·      The dbt_valid_from column will contain the load timestamp, while dbt_valid_to will be null.

On Subsequent Run:

·      On subsequent runs, dbt will check If a record already exists, dbt will update the dbt_valid_to column with the current timestamp. 

·      For newly inserted records, the dbt_valid_to column will remain null.

 

What are the dbt strageies to implement SCD Type 2?


Snapshot "strategies" define how dbt knows if a row has changed. 


There are two strategies built-in to

·      Timestamp 

·      Check


Strategy: Timestamp

·      The timestamp strategy utilizes an updated_at field to identify if a row has been altered.

·      When the updated_at column for a row is more recent than the last snapshot run, dbt will invalidate the previous record and capture the updated one.

·       If the timestamps remain the same, dbt will not make any changes.

 

Configuration:


DBT Strategy: Timestamp
{% snapshot orders_snapshot %}
{{ config 
(
      target_database='analytics',
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at =’date columns',
    )
}}
select * from source_table
{% endsnapshot %}

Strategy: Check

·      The check strategy is useful for tables which do not have a reliable updated_at column. 

·      This strategy works by comparing a list of columns between their current and historical values. 

·      If any of these columns have changed, then dbt will invalidate the old record and record the new one.

·       If the column values are identical, then dbt will not take any action.

Configuration:

DBT Strategy: Check
{% snapshot orders_snapshot_check %}
 {{  config
 (
          target_database='analytics',
          target_schema='snapshots',
          unique_key='id',
          strategy='check',
          check_cols=['status', 'is_cancelled'],
        )
    }}
select * from source_table
{% endsnapshot %}

 


Post a Comment

0 Comments