Data Build Tool: dbt source Tutorial

 
Step 1: What Is a dbt Source?
 dbt (short for "data build tool") is a command-line tool that lets data analysts and engineers transform and model data in the warehouse.
 A source is a declaration of a raw table that some other process has already loaded into the warehouse. Declaring sources lets you reference, test, and document that raw data from within your dbt project.
 
-         Source properties can be declared in any .yml properties file in your models/ directory.
-         Source properties are “special properties” in that you cannot configure them in the dbt_project.yml file or with config().

Using Sources:
1.     Select from source tables in your models using the {{ source() }} function
2.     Test your assumptions about source data
3.     Calculate the freshness of your source data
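Freshness is the one use in this list that the steps below do not demonstrate, so here is a minimal sketch of a freshness check. It reuses the my_source and my_table names from the steps below. The _loaded_at column and the 12-hour and 24-hour thresholds are assumptions made for illustration and are not taken from a real project. Depending on your dbt version, loaded_at_field and freshness may need to sit under a config: block rather than directly on the table.

```yaml
version: 2

sources:
  - name: my_source
    database: my_database
    schema: public
    tables:
      - name: my_table
        # Assumed column: a timestamp recording when each row was loaded
        loaded_at_field: _loaded_at
        freshness:
          # Warn if the newest row is more than 12 hours old (illustrative threshold)
          warn_after: {count: 12, period: hour}
          # Fail if the newest row is more than 24 hours old (illustrative threshold)
          error_after: {count: 24, period: hour}
```

With a definition like this in place, running dbt source freshness compares the most recent _loaded_at value against the current time and reports a pass, warning, or error for each table.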
 
Step 2: Define Your Sources
Create a Source File: Inside your dbt project, navigate to the models/ directory, or to a subdirectory of it such as models/sources/. dbt only reads source definitions from YAML files under your configured model paths. Create a new file there named sources.yml.
 
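Define the Source: Specify the database and schema where your source tables live. If you omit database, dbt uses the target database from your profile; if you omit schema, dbt uses the source name as the schema. An example definition might look like this: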
 

version: 2

sources:
  - name: my_source
    database: my_database
    schema: public
    tables:
      - name: my_table
      - name: my_other_table

A second example with the same structure:

version: 2

sources:
  - name: analytics
    database: mysql
    schema: analytics
    tables:
      - name: emp
      - name: dept


 
 
Step 3: Referencing Sources in Models
You can use the source function in your dbt models to reference these source tables. Create a new model, e.g., my_model.sql, in the models directory:
 
with source_data as (
    select * from {{ source('my_source', 'my_table') }}
)
 
select *
from source_data
 
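This SQL model selects all records from my_table in my_source. The first argument to source() is the source name and the second is the table name; at compile time dbt replaces the call with the full relation name, for example my_database.public.my_table.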
 
Step 4: Run Your dbt Models
Build the model from the command line:
dbt run --select my_model
 
Step 5: Testing Your Sources
dbt lets you test assumptions about your source data. Declare the tests under the relevant columns in your sources.yml file, as in the example here, and run them with dbt test --select source:my_source:
version: 2
 
sources:
  - name: my_source
    database: my_database
    schema: public
    tables:
      - name: my_table
        description: "This is a table containing user data"
        columns:
          - name: id
            tests:
              - unique
              - not_null

Step 6: Documenting Your Sources
You can provide descriptions for your sources and models in your YAML files, which can then be used to generate documentation:

version: 2
 
sources:
  - name: my_source
    description: "This source contains important data for analysis"
 
To generate the documentation and browse it locally, run:
dbt docs generate
dbt docs serve
 
 
 
 
Questions:
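1.     How do you run models downstream of one source?
Use the source: selector method together with the + graph operator:
dbt run --select source:my_source+
To limit the selection to models downstream of a single table, name the table as well:
dbt run --select source:my_source.my_table+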
 
