CloudQuery - Cloud Asset Inventory as SQL Cheatsheet
CloudQuery is an open-source, plugin-based data movement framework that extracts configuration from cloud providers and SaaS APIs (AWS, GCP, Azure, Kubernetes, GitHub, and more) and loads it into a destination — most commonly PostgreSQL — so you can query your entire infrastructure with SQL. Security and platform teams use it for asset inventory, posture management, compliance evidence, and answering “what do we actually have running?” with a query instead of a console click-through.
Architecture
| Component | Role |
|---|
| Source plugin | Pulls data from an API (aws, gcp, azure, k8s, github, …) |
| Destination plugin | Writes data to a store (postgresql, bigquery, sqlite, file, …) |
| Sync | One run that extracts from sources and loads to destinations |
| Config | YAML files describing sources and destinations |
Installation
| Method | Command |
|---|
| Homebrew | brew install cloudquery/tap/cloudquery |
| Script | curl -L https://github.com/cloudquery/cloudquery/releases/latest/download/cloudquery_linux_amd64 -o cloudquery && chmod +x cloudquery |
| Docker | docker run ghcr.io/cloudquery/cloudquery:latest |
| Verify | cloudquery --version |
Configuration
# aws-to-postgres.yaml
kind: source
spec:
name: aws
path: cloudquery/aws
version: "VERSION"
destinations: ["postgresql"]
tables: ["aws_ec2_instances", "aws_s3_buckets", "aws_iam_*"]
---
kind: destination
spec:
name: postgresql
path: cloudquery/postgresql
version: "VERSION"
spec:
connection_string: "postgresql://user:pass@localhost:5432/cq"
Core Commands
| Command | Description |
|---|
cloudquery sync config.yaml | Run a sync (extract → load) |
cloudquery sync aws.yaml pg.yaml | Combine multiple config files |
cloudquery init --source aws --destination postgresql | Scaffold a config |
cloudquery tables config.yaml | List tables a source provides |
cloudquery migrate config.yaml | Apply schema migrations only |
cloudquery plugin install config.yaml | Pre-install plugins |
cloudquery --log-level debug sync ... | Verbose logging |
Querying the Inventory
Once synced, query with plain SQL:
-- Public S3 buckets
SELECT name, region FROM aws_s3_buckets
WHERE block_public_acls = false;
-- EC2 instances missing a required tag
SELECT instance_id, region FROM aws_ec2_instances
WHERE tags->>'Owner' IS NULL;
-- IAM users without MFA
SELECT user_name FROM aws_iam_users
WHERE mfa_active = false;
-- Cross-cloud: count compute by provider
SELECT 'aws' AS cloud, count(*) FROM aws_ec2_instances
UNION ALL SELECT 'gcp', count(*) FROM gcp_compute_instances;
Common Source Plugins
| Plugin | Covers |
|---|
cloudquery/aws | EC2, S3, IAM, VPC, RDS, Lambda, … |
cloudquery/gcp | Compute, Storage, IAM, GKE, … |
cloudquery/azure | VMs, Storage, AAD, … |
cloudquery/k8s | Pods, Deployments, RBAC, … |
cloudquery/github | Repos, members, branch protection |
cloudquery/cloudflare, okta, gcp… | SaaS posture |
Scheduling & CI
| Approach | How |
|---|
| Cron | Run cloudquery sync on a schedule |
| CI pipeline | Sync then run SQL policy checks, fail on violations |
| Incremental | Many tables support incremental syncs to reduce cost |
| Policies | Pair with SQL queries as compliance controls |
Common Workflows
# Nightly inventory refresh into Postgres
cloudquery sync aws.yaml gcp.yaml azure.yaml postgres.yaml
# Quick local exploration into SQLite (no DB server)
cloudquery sync aws.yaml sqlite.yaml
sqlite3 cq.db "SELECT name FROM aws_s3_buckets"
# List what an AWS source exposes before syncing
cloudquery tables aws.yaml
CloudQuery vs Other Approaches
| Aspect | CloudQuery | Steampipe | Native CLIs |
|---|
| Model | Sync to a DB, then SQL | Live SQL over APIs | Imperative per-call |
| Best for | Inventory, history, joins at scale | Ad-hoc live queries | One-off lookups |
| Persistence | Yes (your database) | Query-time | None |
| Cross-cloud joins | Yes | Yes | Manual |
Resources