Ir al contenido

dbt Cheat Sheet

Overview

dbt (data build tool) is an open-source command-line tool that enables analytics engineers to transform data in their warehouse by writing SQL select statements. Rather than writing boilerplate DDL/DML, engineers define models as select statements and dbt handles materialization, dependency management, testing, and documentation. dbt follows the ELT pattern where raw data is first loaded into the warehouse, then transformed in place.

dbt supports all major data warehouses including Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, and more through adapters. It brings software engineering practices to analytics work including version control, testing, CI/CD, documentation generation, and modular code organization. dbt Cloud provides a hosted IDE and scheduling, while dbt Core is the free open-source CLI. The tool has become the standard for the “T” in ELT pipelines.

Installation

Install dbt Core

# Install dbt with your warehouse adapter
pip install dbt-core dbt-snowflake       # Snowflake
pip install dbt-core dbt-bigquery        # BigQuery
pip install dbt-core dbt-redshift        # Redshift
pip install dbt-core dbt-postgres        # PostgreSQL
pip install dbt-core dbt-databricks      # Databricks
pip install dbt-core dbt-duckdb          # DuckDB

# Verify installation
dbt --version

Initialize a Project

# Create a new dbt project
dbt init my_project

# Navigate to project
cd my_project

# Test warehouse connection
dbt debug

Configure profiles.yml

# ~/.dbt/profiles.yml
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: my_account.us-east-1
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      role: TRANSFORMER
      database: ANALYTICS
      warehouse: TRANSFORM_WH
      schema: DEV_{{ env_var('DBT_USER') }}
      threads: 4
    prod:
      type: snowflake
      account: my_account.us-east-1
      user: "{{ env_var('DBT_PROD_USER') }}"
      password: "{{ env_var('DBT_PROD_PASSWORD') }}"
      role: TRANSFORMER
      database: ANALYTICS
      warehouse: TRANSFORM_WH
      schema: PROD
      threads: 8

Core Commands

CommandDescription
dbt runExecute all models
dbt run --select model_nameRun a specific model
dbt run --select +model_nameRun a model and all upstream dependencies
dbt run --select model_name+Run a model and all downstream dependents
dbt run --select tag:dailyRun models with a specific tag
dbt run --full-refreshRebuild incremental models from scratch
dbt testRun all tests
dbt test --select model_nameRun tests for a specific model
dbt buildRun models, tests, snapshots, and seeds in order
dbt compileCompile SQL without executing
dbt seedLoad CSV files into the warehouse
dbt snapshotRun snapshot models (SCD Type-2)
dbt docs generateGenerate documentation
dbt docs serveServe documentation site locally
dbt cleanDelete compiled files and packages
dbt depsInstall dbt packages from packages.yml
dbt source freshnessCheck source data freshness
dbt debugTest connection and show config
dbt ls --select model_name+List models matching selector
dbt retryRetry the last failed run

Model Definition

Staging Model

-- models/staging/stg_orders.sql
with source as (
    select * from {{ source('raw', 'orders') }}
),

renamed as (
    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status as order_status,
        amount as order_amount,
        _loaded_at as loaded_at
    from source
    where order_date is not null
)

select * from renamed

Intermediate Model

-- models/intermediate/int_orders_enriched.sql
with orders as (
    select * from {{ ref('stg_orders') }}
),

customers as (
    select * from {{ ref('stg_customers') }}
),

enriched as (
    select
        o.order_id,
        o.customer_id,
        c.customer_name,
        c.customer_segment,
        o.order_date,
        o.order_status,
        o.order_amount
    from orders o
    left join customers c on o.customer_id = c.customer_id
)

select * from enriched

Mart Model (Incremental)

-- models/marts/fct_orders.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        on_schema_change='sync_all_columns',
        cluster_by=['order_date']
    )
}}

with orders as (
    select * from {{ ref('int_orders_enriched') }}
    {% if is_incremental() %}
        where order_date > (select max(order_date) from {{ this }})
    {% endif %}
)

select
    order_id,
    customer_id,
    customer_name,
    customer_segment,
    order_date,
    order_status,
    order_amount,
    current_timestamp() as dbt_updated_at
from orders

Configuration

dbt_project.yml

name: 'my_project'
version: '1.0.0'
config-version: 2
profile: 'my_project'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

models:
  my_project:
    staging:
      +materialized: view
      +schema: staging
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
      +schema: marts
      finance:
        +tags: ['finance', 'daily']
      marketing:
        +tags: ['marketing', 'hourly']

seeds:
  my_project:
    +schema: seeds

vars:
  start_date: '2023-01-01'
  default_currency: 'USD'

Schema and Tests (YAML)

# models/staging/_schema.yml
version: 2

sources:
  - name: raw
    database: RAW_DB
    schema: PUBLIC
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    loaded_at_field: _loaded_at
    tables:
      - name: orders
        columns:
          - name: id
            tests:
              - unique
              - not_null
      - name: customers

models:
  - name: stg_orders
    description: "Cleaned and renamed orders from source"
    columns:
      - name: order_id
        description: "Primary key"
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id
      - name: order_amount
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 100000

Advanced Usage

Macros

-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
    round({{ column_name }} / 100.0, 2)
{% endmacro %}

-- Usage in model:
-- select {{ cents_to_dollars('amount_cents') }} as amount_dollars

Custom Generic Tests

-- tests/generic/test_positive_value.sql
{% test positive_value(model, column_name) %}
select *
from {{ model }}
where {{ column_name }} < 0
{% endtest %}

Snapshots (SCD Type-2)

-- snapshots/snap_customers.sql
{% snapshot snap_customers %}
{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at',
        invalidate_hard_deletes=True,
    )
}}

select * from {{ source('raw', 'customers') }}

{% endsnapshot %}

Packages (packages.yml)

packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  - package: dbt-labs/codegen
    version: 0.12.0
  - package: calogica/dbt_expectations
    version: 0.10.0
  - git: "https://github.com/my-org/dbt-internal-utils.git"
    revision: v0.3.0

Pre/Post Hooks

{{
    config(
        materialized='table',
        pre_hook="ALTER WAREHOUSE transform_wh SET WAREHOUSE_SIZE = 'XLARGE'",
        post_hook=[
            "ALTER WAREHOUSE transform_wh SET WAREHOUSE_SIZE = 'SMALL'",
            "GRANT SELECT ON {{ this }} TO ROLE ANALYST"
        ]
    )
}}

Troubleshooting

IssueSolution
dbt debug failsCheck profiles.yml path (~/.dbt/). Verify credentials and network access
Model compilation errorRun dbt compile to see generated SQL. Check Jinja syntax
Circular dependencyUse dbt ls --select +model+ to visualize the DAG. Refactor to break cycles
Incremental model driftRun dbt run --full-refresh -s model_name to rebuild
Source freshness failureCheck if upstream loads completed. Adjust warn_after/error_after thresholds
Package install failsClear dbt_packages/ dir and rerun dbt deps. Check version compatibility
Test failuresRun dbt test --select model_name with --store-failures to inspect bad rows
Slow model performanceChange materialization, add cluster keys, or use incremental strategy
Schema change errorsSet on_schema_change='sync_all_columns' for incremental models
Permission errorsVerify role grants in warehouse. Check schema config in dbt_project.yml