CloudQuery - Cloud Asset Inventory as SQL Cheatsheet

CloudQuery is an open-source, plugin-based data movement framework that extracts configuration from cloud providers and SaaS APIs (AWS, GCP, Azure, Kubernetes, GitHub, and more) and loads it into a destination — most commonly PostgreSQL — so you can query your entire infrastructure with SQL. Security and platform teams use it for asset inventory, posture management, compliance evidence, and answering “what do we actually have running?” with a query instead of a console click-through.

Architecture

| Component | Role |
| --- | --- |
| Source plugin | Pulls data from an API (aws, gcp, azure, k8s, github, …) |
| Destination plugin | Writes data to a store (postgresql, bigquery, sqlite, file, …) |
| Sync | One run that extracts from sources and loads to destinations |
| Config | YAML files describing sources and destinations |

Installation

| Method | Command |
| --- | --- |
| Homebrew | brew install cloudquery/tap/cloudquery |
| Script | curl -L https://github.com/cloudquery/cloudquery/releases/latest/download/cloudquery_linux_amd64 -o cloudquery && chmod +x cloudquery |
| Docker | docker run ghcr.io/cloudquery/cloudquery:latest |
| Verify | cloudquery --version |

Configuration

# aws-to-postgres.yaml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "VERSION"
  destinations: ["postgresql"]
  tables: ["aws_ec2_instances", "aws_s3_buckets", "aws_iam_*"]
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  version: "VERSION"
  spec:
    connection_string: "postgresql://user:pass@localhost:5432/cq"
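
The connection string above embeds credentials directly in the file. One way to avoid committing them is to assemble the string from environment-style settings and render the destination document at deploy time. The following is a minimal sketch of that idea: the `CQ_PG_*` variable names and both helper functions are hypothetical, and `VERSION` stays a placeholder exactly as in the config above.

```python
from urllib.parse import quote


def postgres_dsn(env):
    """Build a PostgreSQL connection string from a mapping of settings.

    The CQ_PG_* keys are hypothetical names chosen for this sketch.
    In real use, pass os.environ as `env`.
    """
    # Percent-encode user and password so characters like @ or : are safe.
    user = quote(env["CQ_PG_USER"], safe="")
    password = quote(env["CQ_PG_PASSWORD"], safe="")
    host = env.get("CQ_PG_HOST", "localhost")
    port = env.get("CQ_PG_PORT", "5432")
    database = env.get("CQ_PG_DATABASE", "cq")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


def render_destination(dsn, version="VERSION"):
    """Render the destination document from the config above as YAML text."""
    return (
        "kind: destination\n"
        "spec:\n"
        "  name: postgresql\n"
        "  path: cloudquery/postgresql\n"
        f'  version: "{version}"\n'
        "  spec:\n"
        f'    connection_string: "{dsn}"\n'
    )


# Demo values only; nothing here talks to a database.
demo_env = {"CQ_PG_USER": "cq", "CQ_PG_PASSWORD": "p@ss"}
print(render_destination(postgres_dsn(demo_env)))
```

Writing the rendered text to a file next to the source config keeps the checked-in YAML free of secrets while producing the same structure CloudQuery reads.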

Core Commands

| Command | Description |
| --- | --- |
| cloudquery sync config.yaml | Run a sync (extract → load) |
| cloudquery sync aws.yaml pg.yaml | Combine multiple config files |
| cloudquery init --source aws --destination postgresql | Scaffold a config |
| cloudquery tables config.yaml | List tables a source provides |
| cloudquery migrate config.yaml | Apply schema migrations only |
| cloudquery plugin install config.yaml | Pre-install plugins |
| cloudquery --log-level debug sync ... | Verbose logging |
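
When a sync is driven from a script rather than typed by hand, it helps to build the argument list in one place. This sketch wraps the `sync` invocation shown above; it assumes the `cloudquery` binary is on `PATH` and that a failed sync exits with a non-zero code. The function names are illustrative and not part of CloudQuery.

```python
import subprocess


def build_sync_command(config_files, log_level=None):
    """Return the argv list for `cloudquery sync` over one or more configs."""
    cmd = ["cloudquery"]
    if log_level:
        # Global flag goes before the subcommand, as in the table above.
        cmd += ["--log-level", log_level]
    cmd.append("sync")
    cmd += list(config_files)
    return cmd


def run_sync(config_files, log_level=None):
    """Run the sync and return the process exit code (assumes cloudquery on PATH)."""
    result = subprocess.run(build_sync_command(config_files, log_level), check=False)
    return result.returncode


# Print the command instead of running it, so this demo has no side effects.
print(" ".join(build_sync_command(["aws.yaml", "pg.yaml"], log_level="debug")))
```

Passing the arguments as a list avoids shell quoting problems with config paths that contain spaces.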

Querying the Inventory

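Once synced, query the destination with plain SQL. Column names depend on the plugin version, so confirm them against your synced schema before relying on these examples: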

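-- S3 buckets without Block Public ACLs enabled (candidates for public exposure)
-- IS NOT TRUE also catches buckets where the setting is NULL (not configured)
SELECT name, region FROM aws_s3_buckets
WHERE block_public_acls IS NOT TRUE;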

-- EC2 instances missing a required tag
SELECT instance_id, region FROM aws_ec2_instances
WHERE tags->>'Owner' IS NULL;

-- IAM users without MFA
SELECT user_name FROM aws_iam_users
WHERE mfa_active = false;

-- Cross-cloud: count compute by provider
SELECT 'aws' AS cloud, count(*) FROM aws_ec2_instances
UNION ALL SELECT 'gcp', count(*) FROM gcp_compute_instances;
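
To try the cross-cloud count without a live sync, the query can be run against a throwaway in-memory SQLite database. This is a sketch under stated assumptions: the two tables are reduced to a couple of made-up columns and rows, and a real synced schema has many more columns.

```python
import sqlite3

# Same shape as the cross-cloud query above; portable to SQLite and PostgreSQL.
CROSS_CLOUD_COUNT = """
SELECT 'aws' AS cloud, count(*) AS instances FROM aws_ec2_instances
UNION ALL
SELECT 'gcp', count(*) FROM gcp_compute_instances
"""


def make_sample_inventory():
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE aws_ec2_instances (instance_id TEXT, region TEXT);
        CREATE TABLE gcp_compute_instances (name TEXT, zone TEXT);
        INSERT INTO aws_ec2_instances VALUES ('i-001', 'us-east-1');
        INSERT INTO aws_ec2_instances VALUES ('i-002', 'eu-west-1');
        INSERT INTO gcp_compute_instances VALUES ('vm-a', 'us-central1-a');
        """
    )
    return conn


def count_by_cloud(conn):
    """Return a {cloud: instance_count} mapping."""
    return dict(conn.execute(CROSS_CLOUD_COUNT).fetchall())


print(count_by_cloud(make_sample_inventory()))
```

Against a real destination, only the connection changes; the query text stays the same.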

Common Source Plugins

| Plugin | Covers |
| --- | --- |
| cloudquery/aws | EC2, S3, IAM, VPC, RDS, Lambda, … |
| cloudquery/gcp | Compute, Storage, IAM, GKE, … |
| cloudquery/azure | VMs, Storage, AAD, … |
| cloudquery/k8s | Pods, Deployments, RBAC, … |
| cloudquery/github | Repos, members, branch protection |
| cloudquery/cloudflare, cloudquery/okta | SaaS posture |

Scheduling & CI

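| Approach | How |
| --- | --- |
| Cron | Run cloudquery sync on a schedule |
| CI pipeline | Sync, then run SQL policy checks and fail the build on violations |
| Incremental | Some tables support incremental syncs, which fetch only new data and reduce API calls and cost |
| Policies | Treat SQL queries as compliance controls: any row returned is a violation |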
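
The CI approach in the table amounts to running a set of SQL queries where any returned row counts as a violation, then turning the result into an exit code. The sketch below shows that loop against an in-memory SQLite database with made-up rows. The policy names, helper functions, and the simplified two-column tables are assumptions for illustration; the columns mirror the example queries in "Querying the Inventory" and may not match your plugin version.

```python
import sqlite3

# Each policy is a query that returns one identifier per violating resource.
# SQLite stores booleans as 0/1, hence "= 0" instead of "= false".
POLICIES = {
    "s3-block-public-acls": "SELECT name FROM aws_s3_buckets WHERE block_public_acls = 0",
    "iam-mfa": "SELECT user_name FROM aws_iam_users WHERE mfa_active = 0",
}


def make_sample_inventory():
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE aws_s3_buckets (name TEXT, block_public_acls INTEGER);
        CREATE TABLE aws_iam_users (user_name TEXT, mfa_active INTEGER);
        INSERT INTO aws_s3_buckets VALUES ('logs', 1);
        INSERT INTO aws_s3_buckets VALUES ('public-assets', 0);
        INSERT INTO aws_iam_users VALUES ('alice', 1);
        INSERT INTO aws_iam_users VALUES ('bob', 0);
        """
    )
    return conn


def run_policies(conn, policies):
    """Return {policy_name: [violating identifiers]} for policies with hits."""
    violations = {}
    for name, query in policies.items():
        rows = [row[0] for row in conn.execute(query)]
        if rows:
            violations[name] = rows
    return violations


def exit_code_for(violations):
    """Map the violations dict to a process exit code for CI."""
    return 1 if violations else 0


found = run_policies(make_sample_inventory(), POLICIES)
for policy, resources in found.items():
    print(f"FAIL {policy}: {', '.join(resources)}")
print("exit code:", exit_code_for(found))
```

In a pipeline, the last step would pass `exit_code_for(found)` to `sys.exit` so the job fails when any policy returns rows.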

Common Workflows

# Nightly inventory refresh into Postgres
cloudquery sync aws.yaml gcp.yaml azure.yaml postgres.yaml

# Quick local exploration into SQLite (no DB server)
cloudquery sync aws.yaml sqlite.yaml
sqlite3 cq.db "SELECT name FROM aws_s3_buckets"

# List what an AWS source exposes before syncing
cloudquery tables aws.yaml
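
After a local SQLite sync, a quick way to see what landed is to list the tables in the database file. This sketch reads SQLite's built-in `sqlite_master` catalog; the `list_tables` helper is hypothetical, and the demo uses an in-memory database with empty stand-in tables instead of a real `cq.db`.

```python
import sqlite3


def list_tables(conn, prefix=""):
    """Return table names in the database, optionally filtered by prefix."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    # Filter in Python: in SQL LIKE, "_" is a wildcard and would over-match.
    return [name for (name,) in rows if name.startswith(prefix)]


# Stand-in for sqlite3.connect("cq.db") after a real sync.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE aws_s3_buckets (name TEXT);
    CREATE TABLE aws_ec2_instances (instance_id TEXT);
    CREATE TABLE gcp_compute_instances (name TEXT);
    """
)
print(list_tables(demo, prefix="aws_"))
```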

CloudQuery vs Other Approaches

| Aspect | CloudQuery | Steampipe | Native CLIs |
| --- | --- | --- | --- |
| Model | Sync to a DB, then SQL | Live SQL over APIs | Imperative per-call |
| Best for | Inventory, history, joins at scale | Ad-hoc live queries | One-off lookups |
| Persistence | Yes (your database) | Query-time | None |
| Cross-cloud joins | Yes | Yes | Manual |

Resources