Meltano Cheat Sheet
Overview
Meltano is an open-source platform for the full data lifecycle: Extract, Load, Transform, and Orchestrate. Built by GitLab’s data team and now maintained by the Meltano community, it uses the Singer protocol for data integration (taps for extraction, targets for loading) and integrates with dbt for transformations and Airflow for orchestration. A project is described in a declarative YAML file (meltano.yml) and driven from the meltano CLI.
Meltano provides a plugin-based architecture with access to hundreds of Singer taps and targets from the MeltanoHub registry. It handles virtual environment management, configuration, secrets, and pipeline execution. Projects are version-controlled, making it easy to manage data infrastructure as code. Meltano supports environments (dev, staging, prod), scheduled pipelines, and can be deployed to any infrastructure including Docker, Kubernetes, and cloud platforms.
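To make the tap/target split concrete, here is a minimal sketch of the three Singer message types that a tap writes to stdout and a target reads from stdin. The stream name, fields, and bookmark are invented for illustration; real taps emit richer schemas.

```python
import json


def singer_messages(stream, key, rows, bookmark):
    """Yield the JSON lines a tap would emit for one stream (illustrative only)."""
    # SCHEMA describes the stream before any records are sent.
    yield json.dumps({
        "type": "SCHEMA",
        "stream": stream,
        "schema": {"type": "object", "properties": {key: {"type": "integer"}}},
        "key_properties": [key],
    })
    # One RECORD message per row.
    for row in rows:
        yield json.dumps({"type": "RECORD", "stream": stream, "record": row})
    # STATE carries the bookmark so a later run can resume incrementally.
    yield json.dumps({"type": "STATE", "value": {"bookmarks": {stream: bookmark}}})


for line in singer_messages("orders", "id", [{"id": 1}, {"id": 2}], {"updated_at": "2024-06-01"}):
    print(line)
```

Meltano pipes these lines from the tap process to the target process and stores the STATE payload so the next run can pick up where the last one stopped.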
Installation
# Install Meltano
pip install meltano
# Or with pipx (recommended for isolation)
pipx install meltano
# Initialize a new project
meltano init my_data_project
cd my_data_project
# Verify installation
meltano --version
# Start the web UI (optional; Meltano 2.x and earlier only, the UI was removed in 3.0)
meltano ui
Docker Installation
# Use the official Docker image (quote the path so directories with spaces work)
docker run -v "$(pwd)":/project -w /project meltano/meltano init my_project
# Run with Docker Compose
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  meltano:
    image: meltano/meltano:latest
    volumes:
      - .:/project
    working_dir: /project
    ports:
      - "5000:5000"
    environment:
      - MELTANO_DATABASE_URI=postgresql://meltano:meltano@db:5432/meltano
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: meltano
      POSTGRES_PASSWORD: meltano
      POSTGRES_DB: meltano
EOF
docker compose up -d
Project Structure
my_data_project/
├── meltano.yml # Main configuration file
├── .env # Environment variables (secrets)
├── requirements.txt # Python dependencies
├── analyze/ # Analysis files
├── extract/ # Custom extractor configs
├── load/ # Custom loader configs
├── transform/ # dbt project
│ ├── dbt_project.yml
│ ├── models/
│ └── macros/
├── orchestrate/ # Airflow DAGs
│ └── dags/
├── plugins/ # Plugin lock files
├── output/ # Pipeline output
└── .meltano/ # Internal state (gitignored)
CLI Commands
| Command | Description |
|---|---|
| meltano init <name> | Create new project |
| meltano add extractor <tap> | Add a Singer tap |
| meltano add loader <target> | Add a Singer target |
| meltano add transformer dbt | Add dbt transformer |
| meltano add orchestrator airflow | Add Airflow orchestrator |
| meltano add utility <name> | Add a utility plugin |
| meltano install | Install all plugins |
| meltano install extractor <tap> | Install specific extractor |
| meltano config <plugin> set <key> <value> | Set plugin config |
| meltano config <plugin> list | List plugin config |
| meltano config <plugin> test | Test plugin config |
| meltano select <tap> <entity> <attr> | Select entities/attributes to extract |
| meltano select <tap> --list | List selected entities |
| meltano run <tap> <target> | Run an EL pipeline |
| meltano run <tap> <target> dbt-postgres:run | Run ELT pipeline |
| meltano elt <tap> <target> --transform=run | Legacy ELT command |
| meltano invoke <plugin> <args> | Run plugin command directly |
| meltano schedule add <name> --job <job> --interval <interval> | Add a schedule for a job |
| meltano schedule list | List schedules |
| meltano job add <name> --tasks "<tap> <target>" | Define a job |
| meltano job list | List jobs |
| meltano environment add <name> | Add an environment |
| meltano test | Run data tests |
| meltano discover extractors | Browse available extractors |
| meltano discover loaders | Browse available loaders |
| meltano lock --update --all | Update all plugin lock files |
Configuration
meltano.yml
version: 1
project_id: my-data-project
default_environment: dev
environments:
  - name: dev
  - name: staging
  - name: prod
plugins:
  extractors:
    - name: tap-postgres
      variant: meltanolabs
      pip_url: meltanolabs-tap-postgres
      config:
        host: $PG_HOST
        port: 5432
        database: $PG_DATABASE
        user: $PG_USER
        password: $PG_PASSWORD
        filter_schemas:
          - public
          - analytics
        default_replication_method: INCREMENTAL
      select:
        - public-orders.*
        - public-customers.*
        - public-products.*
        - analytics-events.*
    - name: tap-github
      variant: meltanolabs
      pip_url: meltanolabs-tap-github
      config:
        auth_token: $GITHUB_TOKEN
        repositories:
          - org/repo-1
          - org/repo-2
        start_date: "2024-01-01T00:00:00Z"
    - name: tap-salesforce
      variant: meltanolabs
      pip_url: tap-salesforce
      config:
        client_id: $SALESFORCE_CLIENT_ID
        client_secret: $SALESFORCE_CLIENT_SECRET
        refresh_token: $SALESFORCE_REFRESH_TOKEN
        api_type: BULK
  loaders:
    - name: target-snowflake
      variant: meltanolabs
      pip_url: meltanolabs-target-snowflake
      config:
        account: $SNOWFLAKE_ACCOUNT
        user: $SNOWFLAKE_USER
        password: $SNOWFLAKE_PASSWORD
        database: RAW
        warehouse: LOADING_WH
        role: LOADER
        default_target_schema: $MELTANO_EXTRACT__LOAD_SCHEMA
    - name: target-postgres
      variant: meltanolabs
      pip_url: meltanolabs-target-postgres
      config:
        host: $TARGET_PG_HOST
        port: 5432
        database: warehouse
        user: $TARGET_PG_USER
        password: $TARGET_PG_PASSWORD
        default_target_schema: raw
  transformers:
    - name: dbt-snowflake
      variant: dbt-labs
      pip_url: dbt-core~=1.7.0 dbt-snowflake~=1.7.0
      config:
        account: $SNOWFLAKE_ACCOUNT
        user: $SNOWFLAKE_USER
        password: $SNOWFLAKE_PASSWORD
        database: ANALYTICS
        warehouse: TRANSFORM_WH
        role: TRANSFORMER
        schema: PROD
  utilities:
    - name: great_expectations
      variant: great-expectations
      pip_url: great_expectations
schedules:
  - name: daily-postgres-sync
    job: postgres-to-snowflake
    interval: "@daily"
  - name: hourly-github-sync
    job: github-to-snowflake
    interval: "0 * * * *"
jobs:
  - name: postgres-to-snowflake
    tasks:
      - tap-postgres target-snowflake
      - dbt-snowflake:run
      - dbt-snowflake:test
  - name: github-to-snowflake
    tasks:
      - tap-github target-snowflake
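Because the configuration above leans on `$VAR` references, a quick pre-flight check can catch unset variables before a run. The following is a minimal sketch, not part of Meltano: the function name is hypothetical, it inspects only the mapping it is given (by default the current process environment, so values that live only in .env are not seen), and it assumes that names starting with `MELTANO_` are supplied by Meltano at run time.

```python
import os
import re

# Matches $VAR and ${VAR} references with upper-case names, as used in the config above.
VAR_PATTERN = re.compile(r"\$\{?([A-Z][A-Z0-9_]*)\}?")


def missing_env_vars(yaml_text, environ=None):
    """Return referenced variable names that are not set, sorted and de-duplicated."""
    environ = os.environ if environ is None else environ
    referenced = set(VAR_PATTERN.findall(yaml_text))
    return sorted(
        name
        for name in referenced
        # Assumption: MELTANO_* variables are injected by Meltano itself.
        if name not in environ and not name.startswith("MELTANO_")
    )


sample = "host: $PG_HOST\npassword: ${PG_PASSWORD}\nschema: $MELTANO_EXTRACT__LOAD_SCHEMA\n"
print(missing_env_vars(sample, {"PG_HOST": "localhost"}))
```

To check a real project, pass the contents of meltano.yml as `yaml_text`.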
Environment-Specific Config
environments:
  - name: dev
    config:
      plugins:
        extractors:
          - name: tap-postgres
            config:
              host: localhost
              database: dev_db
        loaders:
          - name: target-postgres
            config:
              host: localhost
              database: dev_warehouse
  - name: prod
    config:
      plugins:
        extractors:
          - name: tap-postgres
            config:
              host: prod-db.example.com
              database: production
        loaders:
          - name: target-snowflake
            config:
              warehouse: PROD_LOADING_WH
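The effect of an environment block can be pictured as an overlay: keys set under the active environment replace the plugin's base values, and everything else is inherited. The sketch below is a simplified model for illustration only, not Meltano's actual resolution logic, and it ignores other configuration layers such as environment variables.

```python
def effective_config(base, env_override):
    """Overlay environment-level settings on top of a plugin's base config."""
    merged = dict(base)          # copy so the base config is left untouched
    merged.update(env_override)  # environment values win on conflicts
    return merged


base = {"host": "$PG_HOST", "port": 5432, "database": "$PG_DATABASE"}
dev = {"host": "localhost", "database": "dev_db"}

print(effective_config(base, dev))
```

Here `port` is inherited from the base definition while `host` and `database` come from the dev environment.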
Running Pipelines
Basic EL Pipeline
# Run extraction and loading
meltano run tap-postgres target-snowflake
# Run with specific environment
meltano --environment=prod run tap-postgres target-snowflake
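# Re-run incrementally (state is saved and reused automatically between runs)
meltano run tap-postgres target-snowflake
# Ignore saved state and re-extract everything
meltano run --full-refresh tap-postgres target-snowflake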
# Run full pipeline (ELT)
meltano run tap-postgres target-snowflake dbt-snowflake:run dbt-snowflake:test
# Run with debug logging
meltano --log-level=debug run tap-postgres target-snowflake
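When pipelines are launched from a script, it helps to build the command as an argument list so that global options land before the `run` subcommand, as in the commands above. This is a small sketch with a hypothetical helper name; it only assembles the arguments and does not invoke Meltano.

```python
import shlex


def meltano_run_command(blocks, environment=None, log_level=None):
    """Build the argv list for `meltano run`, with global options before the subcommand."""
    argv = ["meltano"]
    if environment:
        argv.append(f"--environment={environment}")
    if log_level:
        argv.append(f"--log-level={log_level}")
    argv.append("run")
    argv.extend(blocks)
    return argv


cmd = meltano_run_command(
    ["tap-postgres", "target-snowflake", "dbt-snowflake:run"],
    environment="prod",
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the project directory.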
Entity Selection
# Select specific tables/streams
meltano select tap-postgres public-orders "*"
meltano select tap-postgres public-customers "id,name,email"
# Exclude entities
meltano select tap-postgres --exclude public-audit_logs "*"
# List current selection
meltano select tap-postgres --list
# Select with replication method
meltano config tap-postgres set _metadata public-orders replication-method INCREMENTAL
meltano config tap-postgres set _metadata public-orders replication-key updated_at
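Selection patterns are globs over `entity.attribute` names, with a leading `!` marking an exclusion. The sketch below models that matching with Python's `fnmatch` to show how include and exclude patterns interact. It is an approximation for illustration: the helper name is hypothetical, and real selection is resolved against the tap's discovered catalog, which may apply additional rules.

```python
from fnmatch import fnmatchcase


def is_selected(entity, attribute, patterns):
    """Return True if entity.attribute matches an include pattern and no '!' exclude pattern."""
    name = f"{entity}.{attribute}"
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    # Exclusions take priority over inclusions.
    if any(fnmatchcase(name, p) for p in excludes):
        return False
    return any(fnmatchcase(name, p) for p in includes)


patterns = ["public-orders.*", "public-customers.id", "!public-audit_logs.*"]
print(is_selected("public-orders", "total", patterns))
print(is_selected("public-customers", "email", patterns))
```

The pattern strings use the same form as the `select:` list in meltano.yml.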
Advanced Usage
Custom Extractors
# Create custom extractor from SDK
pip install cookiecutter
cookiecutter https://github.com/meltano/sdk --directory="cookiecutter/tap-template"
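# tap_custom_api/tap.py
import requests
from singer_sdk import Stream, Tap
from singer_sdk.typing import (
    DateTimeType,
    IntegerType,
    PropertiesList,
    Property,
    StringType,
)


class CustomAPIStream(Stream):
    """Stream of records from the custom API."""

    name = "records"
    primary_keys = ["id"]
    replication_key = "updated_at"
    schema = PropertiesList(
        Property("id", IntegerType, required=True),
        Property("name", StringType),
        Property("status", StringType),
        # The replication key must be declared in the schema.
        Property("updated_at", DateTimeType),
    ).to_dict()

    def get_records(self, context):
        # The base Stream class has no HTTP client, so call the API directly.
        # The Bearer header is an assumption about this API's auth scheme.
        response = requests.get(
            f"{self.config['base_url']}/records",
            headers={"Authorization": f"Bearer {self.config['api_key']}"},
            timeout=30,
        )
        response.raise_for_status()
        yield from response.json()["data"]


class TapCustomAPI(Tap):
    """Singer tap for the custom API."""

    name = "tap-custom-api"
    config_jsonschema = PropertiesList(
        Property("api_key", StringType, required=True),
        Property("base_url", StringType, required=True),
    ).to_dict()

    def discover_streams(self):
        return [CustomAPIStream(self)]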
Orchestration with Airflow
# Add Airflow orchestrator
meltano add orchestrator airflow
meltano invoke airflow:initialize
# Start Airflow scheduler
meltano invoke airflow scheduler &
meltano invoke airflow webserver &
# Create schedule
meltano schedule add daily-sync --job full-elt --interval "@daily"
# Airflow DAG is auto-generated from schedules
State Management
# View pipeline state
meltano state list
meltano state get dev:tap-postgres-to-target-snowflake
# Clear state (force full resync)
meltano state clear dev:tap-postgres-to-target-snowflake
# Set custom state (the payload is wrapped in a top-level "singer_state" key)
meltano state set dev:tap-postgres-to-target-snowflake '{"singer_state": {"bookmarks": {}}}'
# Merge state
meltano state merge dev:tap-postgres-to-target-snowflake '{"singer_state": {"bookmarks": {"public-orders": {"replication_key_value": "2024-06-01"}}}}'
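State IDs follow the pattern `<environment>:<tap>-to-<target>`, optionally followed by `:<suffix>`, so both the ID and the JSON payload can be generated rather than typed by hand. The helpers below are a hypothetical sketch that only builds the command; the payload is wrapped in a `singer_state` key, which is the shape Meltano's state commands expect, and the bookmark layout shown is what SDK-based taps typically use.

```python
import json
import shlex


def state_id(environment, tap, target, suffix=None):
    """Compose a state ID of the form <environment>:<tap>-to-<target>[:<suffix>]."""
    base = f"{environment}:{tap}-to-{target}"
    return f"{base}:{suffix}" if suffix else base


def merge_command(environment, tap, target, stream, key_value):
    """Build a `meltano state merge` command that moves one stream's bookmark."""
    payload = {"singer_state": {"bookmarks": {stream: {"replication_key_value": key_value}}}}
    return ["meltano", "state", "merge", state_id(environment, tap, target), json.dumps(payload)]


cmd = merge_command("dev", "tap-postgres", "target-snowflake", "public-orders", "2024-06-01")
print(shlex.join(cmd))
```

Printing with `shlex.join` produces a line that can be pasted into a shell with the JSON safely quoted.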
Troubleshooting
| Issue | Solution |
|---|---|
| Plugin installation fails | Check Python version compatibility. Try meltano install --clean |
| Connection refused | Verify credentials in .env. Test with meltano config <plugin> test |
| No data extracted | Check entity selection with meltano select <tap> --list. Verify source has data |
| State not persisting | Check MELTANO_DATABASE_URI is set. Default uses SQLite in .meltano/ |
| Incremental sync not working | Verify replication_key is set. Check state with meltano state get |
| Schema conflicts at target | Set default_target_schema differently per tap. Use add_record_metadata |
| dbt transform errors | Run meltano invoke dbt-snowflake debug. Check model SQL syntax |
| Environment variables not loading | Verify .env file exists. Use $VAR_NAME syntax in meltano.yml |
| Schedule not running | Ensure Airflow is initialized and scheduler is running. Check DAG parsing |
| Lock file conflicts | Run meltano lock --update --all to regenerate lock files |