What is
dbt Snapshot.
1.
Dbt Snapshots implement SCD type-2 over mutable
source tables.
2.
DBT Snapshots are typically used to track changes
in your data over time and ensure data integrity.
On First run:
· During
the first run, dbt will generate the initial snapshot table by executing a set
of select statements and adding additional columns, including dbt_valid_from
and dbt_valid_to.
· The
dbt_valid_from column will contain the load timestamp, while dbt_valid_to will
be null.
On
Subsequent Run:
· On
subsequent runs, dbt will check If a record already exists, dbt will update the
dbt_valid_to column with the current timestamp.
· For
newly inserted records, the dbt_valid_to column will remain null.
What are
the dbt strageies to implement SCD Type 2?
Snapshot "strategies" define how dbt knows if a row has
changed.
There are two strategies built-in to
· Timestamp
· Check
Strategy: Timestamp
· The
timestamp strategy utilizes an updated_at field to identify if a row has been
altered.
· When
the updated_at column for a row is more recent than the last snapshot run, dbt
will invalidate the previous record and capture the updated one.
· If
the timestamps remain the same, dbt will not make any changes.
Configuration:
DBT Strategy: Timestamp
{% snapshot orders_snapshot %}
{{ config
(
target_database='analytics',
target_schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at =’date columns',
)
}}
select * from source_table
{% endsnapshot %}
Strategy: Check
· The
check strategy is useful for tables which do not have a reliable updated_at
column.
· This
strategy works by comparing a list of columns between their current and
historical values.
· If any
of these columns have changed, then dbt will invalidate the old record and
record the new one.
· If
the column values are identical, then dbt will not take any action.
Configuration:
DBT Strategy: Check
{% snapshot orders_snapshot_check %}
{{ config
(
target_database='analytics',
target_schema='snapshots',
unique_key='id',
strategy='check',
check_cols=['status', 'is_cancelled'],
)
}}
select * from source_table
{% endsnapshot %}
0 Comments
Thanks for your message.