dbt Cheat Sheet
Overview
dbt (data build tool) is an open-source command-line tool that enables analytics engineers to transform data in their warehouse by writing SQL select statements. Rather than writing boilerplate DDL/DML, engineers define models as select statements and dbt handles materialization, dependency management, testing, and documentation. dbt follows the ELT pattern where raw data is first loaded into the warehouse, then transformed in place.
dbt supports all major data warehouses including Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, and more through adapters. It brings software engineering practices to analytics work including version control, testing, CI/CD, documentation generation, and modular code organization. dbt Cloud provides a hosted IDE and scheduling, while dbt Core is the free open-source CLI. The tool has become the standard for the “T” in ELT pipelines.
Installation
Install dbt Core
# Install dbt with your warehouse adapter
pip install dbt-core dbt-snowflake # Snowflake
pip install dbt-core dbt-bigquery # BigQuery
pip install dbt-core dbt-redshift # Redshift
pip install dbt-core dbt-postgres # PostgreSQL
pip install dbt-core dbt-databricks # Databricks
pip install dbt-core dbt-duckdb # DuckDB
# Verify installation
dbt --version
Initialize a Project
# Create a new dbt project
dbt init my_project
# Navigate to project
cd my_project
# Test warehouse connection
dbt debug
Configure profiles.yml
# ~/.dbt/profiles.yml
my_project:
target: dev
outputs:
dev:
type: snowflake
account: my_account.us-east-1
user: "{{ env_var('DBT_USER') }}"
password: "{{ env_var('DBT_PASSWORD') }}"
role: TRANSFORMER
database: ANALYTICS
warehouse: TRANSFORM_WH
schema: DEV_{{ env_var('DBT_USER') }}
threads: 4
prod:
type: snowflake
account: my_account.us-east-1
user: "{{ env_var('DBT_PROD_USER') }}"
password: "{{ env_var('DBT_PROD_PASSWORD') }}"
role: TRANSFORMER
database: ANALYTICS
warehouse: TRANSFORM_WH
schema: PROD
threads: 8
Core Commands
| Command | Description |
|---|---|
dbt run | Execute all models |
dbt run --select model_name | Run a specific model |
dbt run --select +model_name | Run a model and all upstream dependencies |
dbt run --select model_name+ | Run a model and all downstream dependents |
dbt run --select tag:daily | Run models with a specific tag |
dbt run --full-refresh | Rebuild incremental models from scratch |
dbt test | Run all tests |
dbt test --select model_name | Run tests for a specific model |
dbt build | Run models, tests, snapshots, and seeds in order |
dbt compile | Compile SQL without executing |
dbt seed | Load CSV files into the warehouse |
dbt snapshot | Run snapshot models (SCD Type-2) |
dbt docs generate | Generate documentation |
dbt docs serve | Serve documentation site locally |
dbt clean | Delete compiled files and packages |
dbt deps | Install dbt packages from packages.yml |
dbt source freshness | Check source data freshness |
dbt debug | Test connection and show config |
dbt ls --select model_name+ | List models matching selector |
dbt retry | Retry the last failed run |
Model Definition
Staging Model
-- models/staging/stg_orders.sql
with source as (
select * from {{ source('raw', 'orders') }}
),
renamed as (
select
id as order_id,
user_id as customer_id,
order_date,
status as order_status,
amount as order_amount,
_loaded_at as loaded_at
from source
where order_date is not null
)
select * from renamed
Intermediate Model
-- models/intermediate/int_orders_enriched.sql
with orders as (
select * from {{ ref('stg_orders') }}
),
customers as (
select * from {{ ref('stg_customers') }}
),
enriched as (
select
o.order_id,
o.customer_id,
c.customer_name,
c.customer_segment,
o.order_date,
o.order_status,
o.order_amount
from orders o
left join customers c on o.customer_id = c.customer_id
)
select * from enriched
Mart Model (Incremental)
-- models/marts/fct_orders.sql
{{
config(
materialized='incremental',
unique_key='order_id',
on_schema_change='sync_all_columns',
cluster_by=['order_date']
)
}}
with orders as (
select * from {{ ref('int_orders_enriched') }}
{% if is_incremental() %}
where order_date > (select max(order_date) from {{ this }})
{% endif %}
)
select
order_id,
customer_id,
customer_name,
customer_segment,
order_date,
order_status,
order_amount,
current_timestamp() as dbt_updated_at
from orders
Configuration
dbt_project.yml
name: 'my_project'
version: '1.0.0'
config-version: 2
profile: 'my_project'
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
models:
my_project:
staging:
+materialized: view
+schema: staging
intermediate:
+materialized: ephemeral
marts:
+materialized: table
+schema: marts
finance:
+tags: ['finance', 'daily']
marketing:
+tags: ['marketing', 'hourly']
seeds:
my_project:
+schema: seeds
vars:
start_date: '2023-01-01'
default_currency: 'USD'
Schema and Tests (YAML)
# models/staging/_schema.yml
version: 2
sources:
- name: raw
database: RAW_DB
schema: PUBLIC
freshness:
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}
loaded_at_field: _loaded_at
tables:
- name: orders
columns:
- name: id
tests:
- unique
- not_null
- name: customers
models:
- name: stg_orders
description: "Cleaned and renamed orders from source"
columns:
- name: order_id
description: "Primary key"
tests:
- unique
- not_null
- name: customer_id
tests:
- not_null
- relationships:
to: ref('stg_customers')
field: customer_id
- name: order_amount
tests:
- not_null
- dbt_utils.accepted_range:
min_value: 0
max_value: 100000
Advanced Usage
Macros
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
round({{ column_name }} / 100.0, 2)
{% endmacro %}
-- Usage in model:
-- select {{ cents_to_dollars('amount_cents') }} as amount_dollars
Custom Generic Tests
-- tests/generic/test_positive_value.sql
{% test positive_value(model, column_name) %}
select *
from {{ model }}
where {{ column_name }} < 0
{% endtest %}
Snapshots (SCD Type-2)
-- snapshots/snap_customers.sql
{% snapshot snap_customers %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at',
invalidate_hard_deletes=True,
)
}}
select * from {{ source('raw', 'customers') }}
{% endsnapshot %}
Packages (packages.yml)
packages:
- package: dbt-labs/dbt_utils
version: [">=1.0.0", "<2.0.0"]
- package: dbt-labs/codegen
version: 0.12.0
- package: calogica/dbt_expectations
version: 0.10.0
- git: "https://github.com/my-org/dbt-internal-utils.git"
revision: v0.3.0
Pre/Post Hooks
{{
config(
materialized='table',
pre_hook="ALTER WAREHOUSE transform_wh SET WAREHOUSE_SIZE = 'XLARGE'",
post_hook=[
"ALTER WAREHOUSE transform_wh SET WAREHOUSE_SIZE = 'SMALL'",
"GRANT SELECT ON {{ this }} TO ROLE ANALYST"
]
)
}}
Troubleshooting
| Issue | Solution |
|---|---|
dbt debug fails | Check profiles.yml path (~/.dbt/). Verify credentials and network access |
| Model compilation error | Run dbt compile to see generated SQL. Check Jinja syntax |
| Circular dependency | Use dbt ls --select +model+ to visualize the DAG. Refactor to break cycles |
| Incremental model drift | Run dbt run --full-refresh -s model_name to rebuild |
| Source freshness failure | Check if upstream loads completed. Adjust warn_after/error_after thresholds |
| Package install fails | Clear dbt_packages/ dir and rerun dbt deps. Check version compatibility |
| Test failures | Run dbt test --select model_name with --store-failures to inspect bad rows |
| Slow model performance | Change materialization, add cluster keys, or use incremental strategy |
| Schema change errors | Set on_schema_change='sync_all_columns' for incremental models |
| Permission errors | Verify role grants in warehouse. Check schema config in dbt_project.yml |