DBT Seeds
1. The
dbt seed command will load CSV file located in the seeds path directory of dbt
project
2. seeds
are best suited to static data which changed frequently
3. Seeds
can referenced in downstream models by using ref function.
4. We
cannot put same csv file within dbt folder
1. Processing CSV Files Using dbt Seeds
In dbt, CSV files can be efficiently processed using the seed functionality. To do this, all your CSV files must be placed within the designated seeds directory, typically located at seeds/ in your dbt project folder. Once in the correct folder, you can run dbt seed to load these CSV files into your data warehouse as tables. This method is straightforward and ideal for static reference data or small datasets that do not change frequently.
2. Adding a New Column and Reprocessing the Same CSV File
If you need to modify an existing CSV file by
adding a new column and then reprocess it, you'll encounter an issue. dbt does
not support altering seed files in-place with new columns without additional
configuration. Attempting to re-run dbt seed on a CSV with an added column will
produce an error similar to:
Note: here we have processed emp file which contain deptno no
1054 (42S22): Unknown column 'deptno' in 'field list'
This happens because dbt treats seed tables as static and expects the schema to match exactly. To successfully reprocess with new columns, you must update the seed file, then re-run dbt seed after ensuring that your schema definitions are aligned. Alternatively, consider using incremental models or external table management for more flexible transformation workflows.
3. Removing an Existing Column and Reprocessing
When you remove a column from your CSV file
and reprocess it, dbt will import the data, but the removed column will be
present with null values in the resulting table. Essentially, the table schema
remains unchanged unless explicitly altered, so the removed column’s data will
be null for all records. To fully remove the column, you should update your
schema definition accordingly or use schema management tools to alter the table
structure post-seed.
4. Adjusting Column Lengths During CSV Processing
dbt's seed functionality allows flexibility in column size definitions. You can increase or decrease the length of string columns (e.g., VARCHAR) during processing without encountering errors, provided the data fits within the specified length. This enables you to adapt your schema as data requirements evolve without reconfiguring or reloading the entire dataset.
5. Processing CSV Files with Integer Columns Containing Leading Zeros (Without Additional Configuration)
When processing CSV files that have integer columns with leading zeros (e.g., "00123"), dbt will automatically strip the leading zeros and store the value as an integer (e.g., 123). This behavior occurs because, by default, integer columns do not preserve leading zeros, which are considered formatting rather than data.
6. Preserving Leading Zzeros in Seed Data
If you want to maintain leading zeros in your data — for example, storing "00123" as-is — you need to specify the column type as VARCHAR rather than INTEGER. This can be achieved through configuration files:
-Using dbt_project.yml:
dbt_project.yml
seeds:
your_project_name:
seed_test:
+column_types:
sal: varchar(10)
- Using properties.yml in the seed folder:
Create a properties.yml
file in your seed directory with the following content:
version: 2
seeds:
-
name: seed_test
config:
column_types:
sal: varchar(10)
This configuration instructs dbt to treat the
sal column as a string, preserving any leading zeros during processing.
7. Processing CSV Files Located Outside the
Seed Folder
If your CSV files are stored outside the
default seed directory, you can still process them by placing the files
anywhere within your dbt project folder structure. Then, specify the path in
your dbt_project.yml under the seed-paths setting. For example:
seed-paths: ["seeds","models"]
After configuring the seed paths, run dbt seed
to process all CSV files present in those directories.
8. Processing Multiple Files into Different
Schemas
To handle multiple seed files and load them
into different schemas, you need to configure your dbt_project.yml accordingly.
For example:
dbt_project.yml
name: your_project_name
version: 1.0
config-version: 2
seeds:
your_project_name:
seed_test:
+column_types:
sal: varchar(10)
market:
+schema: market
market_test:
+column_types:
msal: varchar(10)
This
setup:
- Enables seed processing.
- Loads seed_test into the default schema.
- Loads market seed into a specific schema
named market.
- Configures market_test seed with custom
column types.
Ensure that the seed files are named and
organized appropriately within your seed directories to match these
configurations.
**In summary**, leveraging dbt's seed
functionality allows for straightforward CSV processing, but managing schema
changes, data types, and multiple files requires careful configuration. By
adjusting your dbt_project.yml and schema definitions, you can tailor the seed
processing to meet your data governance and transformation needs effectively.
SNO |
Scenario |
Steps |
Status |
Comment |
1 |
Process CSV File |
Normal CSV File |
Pass |
|
Remove Existing
Column |
Failed |
NULL values
loaded in removed column |
||
Add New Column |
Failed |
Column
"new" of relation "file" doesnot exist |
||
Change Column
Data type INT to CHAR |
Pass |
Invalid input
syntax for type integer |
||
Change Column
Data type CHAR to INT |
Pass |
data will load |
||
Increase/Descrease
Column data length |
Pass |
data will load |
||
2 |
Process CSV File other than Seed folder |
|
|
|
Pleace csv file
into any dbt folder |
Pass |
configure
folder into dbt_project.yml{seed} then it will load |
||
3 |
How to process leading zero values |
|
|
|
configure
column data type char in dbt_project.yml file |
Need to check |
|
||
configure
column data type char in properties.yml file |
Pass |
|
||
4 |
Process same two csv files from different
folders |
|
Unable to
process |
since these
resources have the same name, dbt will be unable to find the correct resource |
5 |
Process two different csv files into
different schema ,tables |
Ex-
seed/marketing/market.csv |
|
|
configure
dbt_project.yml |
Pass |
|
||
configure
properties.yml |
Pass |
need to check |
||
6 |
Test seed file |
|
|
|
add file
details in schema.yml in seed folder |
Pass |
|
0 Comments
Thanks for your message.