Apache Cassandra Cheatsheet

Apache Cassandra - Distributed NoSQL Database

Apache Cassandra is a free and open-source, distributed, wide-column NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

Installation

Ubuntu/Debian Installation

# Add Apache Cassandra repository
echo "deb https://debian.cassandra.apache.org 40x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -

# Update package index
sudo apt update

# Install Cassandra
sudo apt install cassandra

# Start Cassandra service
sudo systemctl start cassandra
sudo systemctl enable cassandra

# Check status
sudo systemctl status cassandra

# Verify installation
nodetool status

CentOS/RHEL Installation

# Create repository file
sudo tee /etc/yum.repos.d/cassandra.repo << EOF
[cassandra]
name=Apache Cassandra
baseurl=https://redhat.cassandra.apache.org/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
EOF

# Install Cassandra
sudo yum install cassandra

# Start Cassandra service
sudo systemctl start cassandra
sudo systemctl enable cassandra

# Check status
nodetool status

Docker Installation

# Pull Cassandra image
docker pull cassandra:latest

# Run Cassandra container
# (9042 = CQL native transport, 7000/7001 = inter-node, 7199 = JMX;
# the legacy Thrift port 9160 was removed in Cassandra 4.0)
docker run --name cassandra-node \
  -p 9042:9042 \
  -p 7000:7000 \
  -p 7001:7001 \
  -p 7199:7199 \
  -d cassandra:latest

# Connect to Cassandra
docker exec -it cassandra-node cqlsh

# Docker Compose for cluster
cat > docker-compose.yml << EOF
version: '3.8'
services:
  cassandra-1:
    image: cassandra:latest
    container_name: cassandra-1
    ports:
      - "9042:9042"
    environment:
      - CASSANDRA_CLUSTER_NAME=MyCluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=datacenter1
    volumes:
      - cassandra-1-data:/var/lib/cassandra

  cassandra-2:
    image: cassandra:latest
    container_name: cassandra-2
    environment:
      - CASSANDRA_CLUSTER_NAME=MyCluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=datacenter1
      - CASSANDRA_SEEDS=cassandra-1
    depends_on:
      - cassandra-1
    volumes:
      - cassandra-2-data:/var/lib/cassandra

  cassandra-3:
    image: cassandra:latest
    container_name: cassandra-3
    environment:
      - CASSANDRA_CLUSTER_NAME=MyCluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=datacenter1
      - CASSANDRA_SEEDS=cassandra-1
    depends_on:
      - cassandra-1
    volumes:
      - cassandra-3-data:/var/lib/cassandra

volumes:
  cassandra-1-data:
  cassandra-2-data:
  cassandra-3-data:
EOF

docker-compose up -d
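
Once the containers are up, a quick way to confirm the ring has formed is to run nodetool inside one of them (container names as defined in the compose file above):

docker exec cassandra-1 nodetool status
docker exec cassandra-1 nodetool describecluster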

Manual Installation

# Download Cassandra (adjust the version to a current release)
wget https://downloads.apache.org/cassandra/4.0.7/apache-cassandra-4.0.7-bin.tar.gz

# Extract
tar -xzf apache-cassandra-4.0.7-bin.tar.gz
sudo mv apache-cassandra-4.0.7 /opt/cassandra

# Set environment variables
echo 'export CASSANDRA_HOME=/opt/cassandra' >> ~/.bashrc
echo 'export PATH=$PATH:$CASSANDRA_HOME/bin' >> ~/.bashrc
source ~/.bashrc

# Start Cassandra
cassandra -f  # Foreground mode
# or
cassandra     # Background mode

# Connect to Cassandra
cqlsh

Basic Commands

Connecting to Cassandra

# Connect to local Cassandra
cqlsh

# Connect to remote Cassandra
cqlsh hostname
cqlsh hostname port

# Connect with credentials
cqlsh -u username -p password

# Connect with SSL
cqlsh --ssl

# Execute CQL from file
cqlsh -f script.cql

# Execute single command
cqlsh -e "DESCRIBE KEYSPACES;"

CQL Shell Commands

-- Show help
HELP;

-- Show current keyspace (cqlsh displays it in the prompt after USE;
-- there is no CQL statement that returns it)

-- Show cluster information
DESCRIBE CLUSTER;

-- Show keyspaces
DESCRIBE KEYSPACES;

-- Show tables in keyspace
DESCRIBE TABLES;

-- Show table structure
DESCRIBE TABLE table_name;

-- Show version
SELECT release_version FROM system.local;

-- Exit CQL shell
EXIT;
QUIT;

-- Enable/disable tracing
TRACING ON;
TRACING OFF;

-- Show current consistency level
CONSISTENCY;

System Information

-- Show cluster information
SELECT cluster_name, listen_address, data_center, rack, release_version 
FROM system.local;

-- Show all nodes
SELECT peer, data_center, rack, release_version, tokens 
FROM system.peers;

-- Show keyspace information
SELECT keyspace_name, durable_writes, replication 
FROM system_schema.keyspaces;

-- Show table information
SELECT keyspace_name, table_name, bloom_filter_fp_chance, 
       caching, comment, compaction, compression 
FROM system_schema.tables 
WHERE keyspace_name = 'mykeyspace';

-- Show column information
SELECT keyspace_name, table_name, column_name, clustering_order, 
       column_name_bytes, kind, position, type 
FROM system_schema.columns 
WHERE keyspace_name = 'mykeyspace' AND table_name = 'mytable';

Keyspace Operations

Creating Keyspaces

-- Simple keyspace creation
CREATE KEYSPACE mykeyspace 
WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 3
};

-- Keyspace with NetworkTopologyStrategy
CREATE KEYSPACE mykeyspace 
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3,
    'datacenter2': 2
} 
AND DURABLE_WRITES = true;

-- Keyspace for development (single node)
CREATE KEYSPACE dev_keyspace 
WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
};

-- Keyspace with custom options
CREATE KEYSPACE analytics 
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3
} 
AND DURABLE_WRITES = false;

Managing Keyspaces

-- Use keyspace
USE mykeyspace;

-- Alter keyspace replication
ALTER KEYSPACE mykeyspace 
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 5,
    'datacenter2': 3
};

-- Alter durable writes
ALTER KEYSPACE mykeyspace 
WITH DURABLE_WRITES = false;

-- Drop keyspace
DROP KEYSPACE mykeyspace;

-- Show keyspace details
DESCRIBE KEYSPACE mykeyspace;

-- Show all keyspaces
SELECT keyspace_name, durable_writes, replication 
FROM system_schema.keyspaces;

Table Operations

Creating Tables

-- Basic table creation
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    username TEXT,
    email TEXT,
    first_name TEXT,
    last_name TEXT,
    created_at TIMESTAMP
);

-- Table with composite primary key
CREATE TABLE user_posts (
    user_id UUID,
    post_id TIMEUUID,
    title TEXT,
    content TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, post_id)
);

-- Table with clustering order
CREATE TABLE time_series (
    sensor_id UUID,
    timestamp TIMESTAMP,
    temperature DOUBLE,
    humidity DOUBLE,
    PRIMARY KEY (sensor_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

-- Table with multiple clustering columns
CREATE TABLE user_activities (
    user_id UUID,
    year INT,
    month INT,
    day INT,
    activity_id TIMEUUID,
    activity_type TEXT,
    description TEXT,
    PRIMARY KEY (user_id, year, month, day, activity_id)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, activity_id DESC);

-- Table with options
CREATE TABLE products (
    product_id UUID PRIMARY KEY,
    name TEXT,
    description TEXT,
    price DECIMAL,
    category TEXT,
    created_at TIMESTAMP
) WITH comment = 'Product catalog table'
AND gc_grace_seconds = 864000
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': 32, 'min_threshold': 4}
AND compression = {'chunk_length_in_kb': 64, 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'};

Modifying Tables

-- Add column
ALTER TABLE users ADD phone TEXT;

-- Add multiple columns
ALTER TABLE users ADD (
    address TEXT,
    city TEXT,
    country TEXT
);

-- Drop column
ALTER TABLE users DROP phone;

-- Rename column (only primary key columns can be renamed)
ALTER TABLE user_posts RENAME post_id TO post_uuid;

-- Changing a column's type is not supported in modern Cassandra;
-- add a new column and migrate the data instead

-- Add table options
ALTER TABLE users WITH comment = 'User information table';

-- Modify compaction strategy
ALTER TABLE users WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': 160
};

-- Modify compression
ALTER TABLE users WITH compression = {
    'class': 'DeflateCompressor',
    'chunk_length_in_kb': 64
};

-- Drop table
DROP TABLE users;

Table Information

-- Describe table
DESCRIBE TABLE users;

-- Show table schema
SELECT keyspace_name, table_name, column_name, clustering_order, 
       column_name_bytes, kind, position, type 
FROM system_schema.columns 
WHERE keyspace_name = 'mykeyspace' AND table_name = 'users';

-- Show table options
SELECT keyspace_name, table_name, bloom_filter_fp_chance, 
       caching, comment, compaction, compression, 
       crc_check_chance, dclocal_read_repair_chance, 
       default_time_to_live, gc_grace_seconds, 
       max_index_interval, memtable_flush_period_in_ms, 
       min_index_interval, read_repair_chance, speculative_retry 
FROM system_schema.tables 
WHERE keyspace_name = 'mykeyspace' AND table_name = 'users';

Data Types

Basic Data Types

-- Numeric types
CREATE TABLE numeric_examples (
    id UUID PRIMARY KEY,
    tiny_int_val TINYINT,      -- 8-bit signed integer
    small_int_val SMALLINT,    -- 16-bit signed integer
    int_val INT,               -- 32-bit signed integer
    big_int_val BIGINT,        -- 64-bit signed integer
    varint_val VARINT,         -- Arbitrary precision integer
    decimal_val DECIMAL,       -- Variable precision decimal
    float_val FLOAT,           -- 32-bit IEEE-754 floating point
    double_val DOUBLE          -- 64-bit IEEE-754 floating point
);

-- Text and binary types
CREATE TABLE text_examples (
    id UUID PRIMARY KEY,
    ascii_val ASCII,           -- ASCII string
    text_val TEXT,             -- UTF-8 string
    varchar_val VARCHAR,       -- UTF-8 string (alias for TEXT)
    blob_val BLOB              -- Binary data
);

-- Date and time types
CREATE TABLE datetime_examples (
    id UUID PRIMARY KEY,
    date_val DATE,             -- Date without time
    time_val TIME,             -- Time without date
    timestamp_val TIMESTAMP,   -- Date and time
    duration_val DURATION      -- Duration (Cassandra 3.10+)
);

-- Boolean and UUID types
CREATE TABLE other_examples (
    id UUID PRIMARY KEY,
    bool_val BOOLEAN,          -- True or false
    uuid_val UUID,             -- Random UUID
    timeuuid_val TIMEUUID,     -- Time-based UUID
    inet_val INET              -- IP address (IPv4 or IPv6)
);

Collection Data Types

-- Set type
CREATE TABLE user_interests (
    user_id UUID PRIMARY KEY,
    interests SET<TEXT>
);

-- List type
CREATE TABLE user_addresses (
    user_id UUID PRIMARY KEY,
    addresses LIST<TEXT>
);

-- Map type
CREATE TABLE user_attributes (
    user_id UUID PRIMARY KEY,
    attributes MAP<TEXT, TEXT>
);

-- Frozen collections
CREATE TABLE user_profile (
    user_id UUID PRIMARY KEY,
    tags FROZEN<SET<TEXT>>,
    metadata FROZEN<MAP<TEXT, TEXT>>
);

-- Nested collections (with FROZEN)
CREATE TABLE complex_data (
    id UUID PRIMARY KEY,
    nested_map MAP<TEXT, FROZEN<LIST<TEXT>>>,
    list_of_sets LIST<FROZEN<SET<INT>>>
);

User Defined Types

-- Create user defined type
CREATE TYPE address (
    street TEXT,
    city TEXT,
    state TEXT,
    zip_code TEXT,
    country TEXT
);

-- Create another UDT
CREATE TYPE phone (
    country_code TEXT,
    number TEXT,
    type TEXT
);

-- Use UDT in table
CREATE TABLE contacts (
    contact_id UUID PRIMARY KEY,
    name TEXT,
    home_address address,
    work_address address,
    phones LIST<phone>
);

-- Frozen UDT
CREATE TABLE user_profiles (
    user_id UUID PRIMARY KEY,
    personal_info FROZEN<address>,
    contact_numbers FROZEN<LIST<phone>>
);

CRUD Operations

Insert Operations

-- Basic insert
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'john_doe', 'john@example.com', 'John', 'Doe', toTimestamp(now()));

-- Insert with TTL (Time To Live)
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'jane_doe', 'jane@example.com', 'Jane', 'Doe', toTimestamp(now()))
USING TTL 86400;

-- Insert with timestamp
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'bob_smith', 'bob@example.com', 'Bob', 'Smith', toTimestamp(now()))
USING TIMESTAMP 1640995200000000;

-- Insert into collection columns
INSERT INTO user_interests (user_id, interests)
VALUES (uuid(), {'technology', 'music', 'sports'});

INSERT INTO user_addresses (user_id, addresses)
VALUES (uuid(), ['123 Main St', '456 Oak Ave', '789 Pine Rd']);

INSERT INTO user_attributes (user_id, attributes)
VALUES (uuid(), {'theme': 'dark', 'language': 'en', 'timezone': 'UTC'});

-- Insert with UDT
INSERT INTO contacts (contact_id, name, home_address, phones)
VALUES (
    uuid(),
    'John Doe',
    {street: '123 Main St', city: 'New York', state: 'NY', zip_code: '10001', country: 'USA'},
    [{country_code: '+1', number: '555-1234', type: 'mobile'}]
);

-- Conditional insert (IF NOT EXISTS)
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'unique_user', 'unique@example.com', 'Unique', 'User', toTimestamp(now()))
IF NOT EXISTS;

Select Operations

-- Basic select
SELECT * FROM users;

-- Select specific columns
SELECT user_id, username, email FROM users;

-- Select with WHERE clause (must include partition key)
SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Select with clustering column conditions
SELECT * FROM user_posts WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

SELECT * FROM user_posts 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000 
AND post_id > minTimeuuid('2023-01-01');

-- Select with LIMIT
SELECT * FROM users LIMIT 10;

-- Select with ORDER BY (only on clustering columns)
SELECT * FROM user_posts 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000 
ORDER BY post_id DESC;

-- Select with ALLOW FILTERING (use carefully)
SELECT * FROM users WHERE email = 'john@example.com' ALLOW FILTERING;

-- Select with token function for pagination
SELECT * FROM users WHERE token(user_id) > token(123e4567-e89b-12d3-a456-426614174000);

-- Select collection elements
SELECT user_id, interests FROM user_interests;
-- Element selection works for sets and maps (Cassandra 4.0+), not for list indexes
SELECT user_id, interests['technology'] FROM user_interests;

-- Select from UDT
SELECT contact_id, name, home_address.city, home_address.state FROM contacts;

-- Select with functions
SELECT user_id, username, toTimestamp(now()) as current_time FROM users;
-- Extract the timestamp embedded in a TIMEUUID column
SELECT user_id, post_id, dateOf(post_id) as created_date FROM user_posts
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

Update Operations

-- Basic update
UPDATE users 
SET email = 'newemail@example.com' 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Update multiple columns
UPDATE users 
SET email = 'john.doe@example.com', 
    first_name = 'Jonathan' 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Update with TTL
UPDATE users USING TTL 3600 
SET email = 'temp@example.com' 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Update with timestamp
UPDATE users USING TIMESTAMP 1640995200000000 
SET email = 'timestamped@example.com' 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Update collection - add to set
UPDATE user_interests 
SET interests = interests + {'reading'} 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Update collection - remove from set
UPDATE user_interests 
SET interests = interests - {'sports'} 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Update collection - append to list
UPDATE user_addresses 
SET addresses = addresses + ['999 New St'] 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Update collection - prepend to list
UPDATE user_addresses 
SET addresses = ['000 First St'] + addresses 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Update map
UPDATE user_attributes 
SET attributes['theme'] = 'light' 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Update UDT field
UPDATE contacts 
SET home_address.city = 'Los Angeles' 
WHERE contact_id = 123e4567-e89b-12d3-a456-426614174000;

-- Conditional update
UPDATE users 
SET email = 'conditional@example.com' 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000 
IF email = 'old@example.com';

-- Update with IF EXISTS
UPDATE users 
SET email = 'exists@example.com' 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000 
IF EXISTS;

Delete Operations

-- Delete entire row
DELETE FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Delete specific columns
DELETE email, phone FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Delete with timestamp
DELETE FROM users USING TIMESTAMP 1640995200000000 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Remove a set element (sets use UPDATE with '-'; the DELETE
-- element syntax below applies to maps and list indexes)
UPDATE user_interests 
SET interests = interests - {'technology'} 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Delete list element by index
DELETE addresses[0] FROM user_addresses 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Delete map key
DELETE attributes['theme'] FROM user_attributes 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Delete (null out) a field of a non-frozen UDT
DELETE home_address.state FROM contacts 
WHERE contact_id = 123e4567-e89b-12d3-a456-426614174000;

-- Conditional delete
DELETE FROM users 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000 
IF email = 'delete@example.com';

-- Delete with IF EXISTS
DELETE FROM users 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000 
IF EXISTS;

-- Delete range (clustering columns)
DELETE FROM user_posts 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000 
AND post_id > minTimeuuid('2023-01-01') 
AND post_id < maxTimeuuid('2023-12-31');

Query Operations

Advanced Querying

-- Range queries on clustering columns
SELECT * FROM time_series 
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000 
AND timestamp >= '2023-01-01' 
AND timestamp < '2023-02-01';

-- IN queries
SELECT * FROM users WHERE user_id IN (
    123e4567-e89b-12d3-a456-426614174000,
    456e7890-e89b-12d3-a456-426614174001,
    789e0123-e89b-12d3-a456-426614174002
);

-- Token-based queries for pagination
SELECT * FROM users WHERE token(user_id) > token(123e4567-e89b-12d3-a456-426614174000) LIMIT 1000;

-- CONTAINS queries (need an index on the collection, or ALLOW FILTERING)
SELECT * FROM user_interests WHERE interests CONTAINS 'technology' ALLOW FILTERING;

-- CONTAINS KEY queries for maps
SELECT * FROM user_attributes WHERE attributes CONTAINS KEY 'theme' ALLOW FILTERING;

-- Slice queries
SELECT * FROM user_activities 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000 
AND year = 2023 
AND month >= 6;

-- Multi-column slice
SELECT * FROM user_activities 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000 
AND (year, month) >= (2023, 6) 
AND (year, month) <= (2023, 12);

Aggregation Functions

-- Count
SELECT COUNT(*) FROM users;

-- Count with WHERE clause
SELECT COUNT(*) FROM user_posts 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Min and Max (Cassandra 2.2+)
SELECT MIN(temperature), MAX(temperature) FROM time_series 
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000;

-- Sum and Average (Cassandra 2.2+)
SELECT SUM(temperature), AVG(temperature) FROM time_series 
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000;

-- GROUP BY (Cassandra 3.10+; only partition key or primary key prefix)
SELECT sensor_id, COUNT(*) FROM time_series GROUP BY sensor_id;

Built-in Functions

-- UUID functions
SELECT uuid() as new_uuid;
SELECT now() as current_timeuuid;
SELECT minTimeuuid('2023-01-01') as min_uuid;
SELECT maxTimeuuid('2023-12-31') as max_uuid;

-- Date and time functions
SELECT toTimestamp(now()) as current_timestamp;
SELECT toDate(now()) as current_date;
SELECT toUnixTimestamp(now()) as unix_timestamp;

-- Blob functions
SELECT textAsBlob('hello') as text_blob;
SELECT blobAsText(textAsBlob('hello')) as blob_text;

-- Token function
SELECT token(user_id) FROM users;

-- TTL and WRITETIME functions
SELECT TTL(email), WRITETIME(email) FROM users 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Type conversion functions
SELECT toJson(interests) FROM user_interests;
-- fromJson() is used in INSERT/UPDATE values, not in a SELECT clause
INSERT INTO user_interests (user_id, interests)
VALUES (uuid(), fromJson('["technology", "music"]'));

Indexes

Secondary Indexes

-- Create secondary index
CREATE INDEX ON users (email);

-- Create named index
CREATE INDEX user_email_idx ON users (email);

-- Create index on collection
CREATE INDEX ON user_interests (interests);

-- Create index on map keys
CREATE INDEX ON user_attributes (KEYS(attributes));

-- Create index on map values
CREATE INDEX ON user_attributes (VALUES(attributes));

-- Create index on map entries
CREATE INDEX ON user_attributes (ENTRIES(attributes));

-- Create index on a frozen UDT column (indexes the whole value;
-- non-frozen UDT columns cannot be indexed)
CREATE INDEX ON user_profiles (personal_info);

-- Show indexes
DESCRIBE INDEX user_email_idx;

-- Drop index
DROP INDEX user_email_idx;

SASI Indexes (Cassandra 3.4+)

-- Create SASI index for text search
CREATE CUSTOM INDEX user_name_sasi_idx ON users (first_name) 
USING 'org.apache.cassandra.index.sasi.SASIIndex' 
WITH OPTIONS = {
    'mode': 'CONTAINS',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
    'case_sensitive': 'false'
};

-- Create SASI index for numeric range queries
CREATE CUSTOM INDEX product_price_sasi_idx ON products (price) 
USING 'org.apache.cassandra.index.sasi.SASIIndex' 
WITH OPTIONS = {'mode': 'SPARSE'};

-- Query with SASI index
SELECT * FROM users WHERE first_name LIKE '%john%';
SELECT * FROM products WHERE price >= 100 AND price <= 500;

User Defined Types

Creating and Managing UDTs

-- Create UDT
CREATE TYPE address (
    street TEXT,
    city TEXT,
    state TEXT,
    zip_code TEXT,
    country TEXT
);

-- Create nested UDT
CREATE TYPE contact_info (
    email TEXT,
    phone TEXT,
    address address
);

-- Alter UDT - add field
ALTER TYPE address ADD region TEXT;

-- Alter UDT - rename field
ALTER TYPE address RENAME zip_code TO postal_code;

-- Show UDT
DESCRIBE TYPE address;

-- Drop UDT (fails while any table or type still uses it)
DROP TYPE address;

-- Use UDT in table
CREATE TABLE customers (
    customer_id UUID PRIMARY KEY,
    name TEXT,
    billing_address address,
    shipping_address address,
    contact contact_info
);

-- Insert with UDT
INSERT INTO customers (customer_id, name, billing_address, contact)
VALUES (
    uuid(),
    'John Doe',
    {
        street: '123 Main St',
        city: 'New York',
        state: 'NY',
        postal_code: '10001',
        country: 'USA',
        region: 'Northeast'
    },
    {
        email: 'john@example.com',
        phone: '+1-555-1234',
        address: {
            street: '456 Work Ave',
            city: 'New York',
            state: 'NY',
            postal_code: '10002',
            country: 'USA',
            region: 'Northeast'
        }
    }
);

-- Query UDT fields
SELECT customer_id, name, billing_address.city, contact.email FROM customers;

-- Update UDT field
UPDATE customers 
SET billing_address.city = 'Los Angeles' 
WHERE customer_id = 123e4567-e89b-12d3-a456-426614174000;

Functions

User Defined Functions (UDF)

-- Enable UDF (in cassandra.yaml: enable_user_defined_functions: true)

-- Create simple UDF
CREATE OR REPLACE FUNCTION fLog (input double) 
CALLED ON NULL INPUT 
RETURNS double 
LANGUAGE java 
AS 'return Double.valueOf(Math.log(input.doubleValue()));';

-- Create UDF with multiple parameters
CREATE OR REPLACE FUNCTION fAdd (x int, y int) 
CALLED ON NULL INPUT 
RETURNS int 
LANGUAGE java 
AS 'return Integer.valueOf(x.intValue() + y.intValue());';

-- Create UDF for text processing
CREATE OR REPLACE FUNCTION toUpperCase (input text) 
CALLED ON NULL INPUT 
RETURNS text 
LANGUAGE java 
AS 'return input.toUpperCase();';

-- Create UDF for collection processing
CREATE OR REPLACE FUNCTION listSize (input list<text>) 
CALLED ON NULL INPUT 
RETURNS int 
LANGUAGE java 
AS 'return Integer.valueOf(input.size());';

-- Use UDF in queries
SELECT user_id, fAdd(5, 10) as sum FROM users;
SELECT user_id, toUpperCase(first_name) as upper_name FROM users;
SELECT user_id, listSize(addresses) as address_count FROM user_addresses;

-- Show functions
DESCRIBE FUNCTION fLog;

-- Drop function
DROP FUNCTION fLog;

User Defined Aggregates (UDA)

-- Create state function for UDA
CREATE OR REPLACE FUNCTION avgState (state tuple<int,bigint>, val int) 
CALLED ON NULL INPUT 
RETURNS tuple<int,bigint> 
LANGUAGE java 
AS '
    if (val != null) {
        state.setInt(0, state.getInt(0) + 1);
        state.setLong(1, state.getLong(1) + val.intValue());
    }
    return state;
';

-- Create final function for UDA
CREATE OR REPLACE FUNCTION avgFinal (state tuple<int,bigint>) 
CALLED ON NULL INPUT 
RETURNS double 
LANGUAGE java 
AS '
    double r = 0;
    if (state.getInt(0) == 0) return null;
    r = state.getLong(1);
    r /= state.getInt(0);
    return Double.valueOf(r);
';

-- Create UDA
CREATE OR REPLACE AGGREGATE average (int)
SFUNC avgState
STYPE tuple<int,bigint>
FINALFUNC avgFinal
INITCOND (0, 0);

-- Use UDA (as defined, average() takes INT inputs, so pass an INT column)
SELECT user_id, average(year) FROM user_activities GROUP BY user_id;

-- Drop UDA
DROP AGGREGATE average;

Materialized Views

Creating Materialized Views

-- Create materialized view
CREATE MATERIALIZED VIEW users_by_email AS
SELECT user_id, username, email, first_name, last_name, created_at
FROM users
WHERE email IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (email, user_id);

-- Materialized view with different clustering
CREATE MATERIALIZED VIEW posts_by_title AS
SELECT user_id, post_id, title, content, created_at
FROM user_posts
WHERE title IS NOT NULL AND user_id IS NOT NULL AND post_id IS NOT NULL
PRIMARY KEY (title, user_id, post_id);

-- Materialized view with filtering
CREATE MATERIALIZED VIEW recent_posts AS
SELECT user_id, post_id, title, content, created_at
FROM user_posts
WHERE created_at IS NOT NULL AND user_id IS NOT NULL AND post_id IS NOT NULL
AND created_at >= '2023-01-01'
PRIMARY KEY (created_at, user_id, post_id)
WITH CLUSTERING ORDER BY (user_id ASC, post_id DESC);

-- Show materialized view
DESCRIBE MATERIALIZED VIEW users_by_email;

-- Query materialized view
SELECT * FROM users_by_email WHERE email = 'john@example.com';

-- Drop materialized view
DROP MATERIALIZED VIEW users_by_email;

Materialized View Considerations

-- Materialized views are automatically maintained
-- Base table changes are propagated to views

-- Insert into base table
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'new_user', 'new@example.com', 'New', 'User', toTimestamp(now()));

-- Query view to see automatic update
SELECT * FROM users_by_email WHERE email = 'new@example.com';

-- Update base table
UPDATE users SET first_name = 'Updated' WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- View is automatically updated
SELECT * FROM users_by_email WHERE email = 'john@example.com';

-- Show all materialized views
SELECT keyspace_name, view_name, base_table_name 
FROM system_schema.views;

Batch Operations

Batch Statements

-- Logged batch (default)
BEGIN BATCH
    INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
    VALUES (uuid(), 'batch_user1', 'batch1@example.com', 'Batch', 'User1', toTimestamp(now()));

    INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
    VALUES (uuid(), 'batch_user2', 'batch2@example.com', 'Batch', 'User2', toTimestamp(now()));

    UPDATE users SET email = 'updated@example.com' 
    WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
APPLY BATCH;

-- Unlogged batch (better performance, no atomicity guarantee)
BEGIN UNLOGGED BATCH
    INSERT INTO user_posts (user_id, post_id, title, content, created_at)
    VALUES (123e4567-e89b-12d3-a456-426614174000, now(), 'Post 1', 'Content 1', toTimestamp(now()));

    INSERT INTO user_posts (user_id, post_id, title, content, created_at)
    VALUES (123e4567-e89b-12d3-a456-426614174000, now(), 'Post 2', 'Content 2', toTimestamp(now()));
APPLY BATCH;

-- Batch with TTL and timestamp
BEGIN BATCH USING TTL 3600 AND TIMESTAMP 1640995200000000
    INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
    VALUES (uuid(), 'temp_user', 'temp@example.com', 'Temp', 'User', toTimestamp(now()));

    UPDATE users SET phone = '+1-555-9999' 
    WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
APPLY BATCH;

-- Conditional batch
BEGIN BATCH
    INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
    VALUES (uuid(), 'conditional_user', 'conditional@example.com', 'Conditional', 'User', toTimestamp(now()))
    IF NOT EXISTS;

    UPDATE users SET email = 'conditional_update@example.com' 
    WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
    IF email = 'old@example.com';
APPLY BATCH;

-- Counter batch
BEGIN COUNTER BATCH
    UPDATE page_views SET views = views + 1 WHERE page_id = 'home';
    UPDATE page_views SET views = views + 1 WHERE page_id = 'about';
APPLY BATCH;

Consistency Levels

Setting Consistency Levels

-- Show current consistency level
CONSISTENCY;

-- Set consistency level for reads
CONSISTENCY ONE;
CONSISTENCY QUORUM;
CONSISTENCY ALL;
CONSISTENCY LOCAL_QUORUM;
CONSISTENCY EACH_QUORUM;
CONSISTENCY LOCAL_ONE;

-- Set consistency level for writes
CONSISTENCY ONE;
CONSISTENCY QUORUM;
CONSISTENCY ALL;
CONSISTENCY LOCAL_QUORUM;
CONSISTENCY EACH_QUORUM;

-- Consistency levels explanation:
-- ONE: One replica responds
-- QUORUM: Majority of replicas respond
-- ALL: All replicas respond
-- LOCAL_QUORUM: Majority of replicas in local datacenter
-- EACH_QUORUM: Majority of replicas in each datacenter
-- LOCAL_ONE: One replica in local datacenter
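
-- Quorum arithmetic: quorum = (sum of replication factors / 2) + 1,
-- rounded down, so with RF = 3 a QUORUM read or write needs 2 replicas;
-- reads and writes at QUORUM overlap (2 + 2 > 3), giving strong consistency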

-- Example queries with different consistency levels
CONSISTENCY QUORUM;
SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

CONSISTENCY LOCAL_QUORUM;
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'consistent_user', 'consistent@example.com', 'Consistent', 'User', toTimestamp(now()));

-- Serial consistency for lightweight transactions
SERIAL CONSISTENCY SERIAL;
SERIAL CONSISTENCY LOCAL_SERIAL;

-- Lightweight transaction with serial consistency
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'lwt_user', 'lwt@example.com', 'LWT', 'User', toTimestamp(now()))
IF NOT EXISTS;

Cluster Management

Node Operations

# Check cluster status
nodetool status

# Check node information
nodetool info

# Check ring information
nodetool ring

# Describe cluster
nodetool describecluster

# Check gossip information
nodetool gossipinfo

# Flush memtables to disk
nodetool flush

# Compact SSTables
nodetool compact

# Cleanup after replication changes
nodetool cleanup

# Repair data
nodetool repair

# Repair specific keyspace
nodetool repair mykeyspace

# Repair specific table
nodetool repair mykeyspace mytable

# Incremental repair (the default since Cassandra 2.2)
nodetool repair

# Full repair
nodetool repair -full

Cluster Monitoring

# Check node stats
nodetool netstats

# Check compaction stats
nodetool compactionstats

# Check thread pool stats
nodetool tpstats

# Check per-table stats (cfstats is the legacy alias for tablestats)
nodetool cfstats

# Check table stats
nodetool tablestats mykeyspace.mytable

# Check proxyhistograms
nodetool proxyhistograms

# Check gossip state
nodetool gossipinfo

# Check schema version
nodetool describecluster

# Check pending compaction tasks
nodetool compactionstats

# Check streaming operations
nodetool netstats

Backup and Snapshot

# Create snapshot
nodetool snapshot

# Create snapshot with name
nodetool snapshot -t backup_20231201

# Create snapshot for specific keyspace
nodetool snapshot mykeyspace

# List snapshots
nodetool listsnapshots

# Clear snapshot
nodetool clearsnapshot

# Clear specific snapshot
nodetool clearsnapshot -t backup_20231201

# Incremental backup (enable in cassandra.yaml)
# incremental_backups: true

# Restore from snapshot
# 1. Stop Cassandra
# 2. Clear data directory
# 3. Copy snapshot files to data directory
# 4. Start Cassandra
# 5. Run nodetool refresh
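
A hedged sketch of the copy-back in step 3, assuming the default data directory and the snapshot tag used above (the table directory carries a UUID suffix):

# Copy snapshot files back into the table's data directory
sudo cp -r /var/lib/cassandra/data/mykeyspace/users-*/snapshots/backup_20231201/* \
    /var/lib/cassandra/data/mykeyspace/users-*/

# Make the restored SSTables visible (after Cassandra is started)
nodetool refresh mykeyspace users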

Adding/Removing Nodes

# Add new node to cluster
# 1. Install Cassandra on new node
# 2. Configure cassandra.yaml with cluster settings
# 3. Set seeds to existing nodes
# 4. Start Cassandra
# 5. Check status: nodetool status

# Resume an interrupted bootstrap
nodetool bootstrap resume

# Decommission node (graceful removal)
nodetool decommission

# Remove dead node (forced removal)
nodetool removenode <node_id>

# Move node to new token
nodetool move <new_token>

# Drain node before shutdown
nodetool drain

# Stop gossip
nodetool disablegossip

# Start gossip
nodetool enablegossip

# Stop/start Thrift (Cassandra 3.x and earlier; removed in 4.0)
nodetool disablethrift
nodetool enablethrift

Performance Tuning

Query Optimization

-- Use appropriate partition keys
-- Good: Query by partition key
SELECT * FROM user_posts WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Bad: Query without partition key (requires ALLOW FILTERING)
SELECT * FROM user_posts WHERE title = 'Some Title' ALLOW FILTERING;

-- Use clustering columns for range queries
SELECT * FROM time_series 
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000 
AND timestamp >= '2023-01-01' 
AND timestamp < '2023-02-01';

-- Limit result size
SELECT * FROM users LIMIT 1000;

-- Use token-based pagination
SELECT * FROM users WHERE token(user_id) > token(123e4567-e89b-12d3-a456-426614174000) LIMIT 1000;

-- Avoid ALLOW FILTERING when possible
-- Create secondary index instead
CREATE INDEX ON users (email);
SELECT * FROM users WHERE email = 'john@example.com';

-- Use prepared statements in applications; they are created through
-- the driver API (cqlsh has no PREPARE/EXECUTE statement), e.g. prepare
-- "SELECT * FROM users WHERE user_id = ?" once and bind values per execution

Table Design Optimization

-- Design tables for your queries
-- Query: Get all posts by user, ordered by date
CREATE TABLE user_posts_by_date (
    user_id UUID,
    post_date DATE,
    post_id TIMEUUID,
    title TEXT,
    content TEXT,
    PRIMARY KEY (user_id, post_date, post_id)
) WITH CLUSTERING ORDER BY (post_date DESC, post_id DESC);

-- Denormalize for read performance
CREATE TABLE user_post_summary (
    user_id UUID,
    post_count COUNTER,
    last_post_date TIMESTAMP,
    PRIMARY KEY (user_id)
);

-- Use appropriate compaction strategy
ALTER TABLE large_table WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': 160
};

-- Use appropriate compression
ALTER TABLE large_table WITH compression = {
    'class': 'LZ4Compressor',
    'chunk_length_in_kb': 64
};

-- Set appropriate bloom filter
ALTER TABLE large_table WITH bloom_filter_fp_chance = 0.01;

-- Configure caching
ALTER TABLE frequently_read_table WITH caching = {
    'keys': 'ALL',
    'rows_per_partition': '100'
};

Performance Monitoring

# Monitor read/write latency
nodetool proxyhistograms

# Monitor compaction
nodetool compactionstats

# Monitor thread pools
nodetool tpstats

# Monitor table statistics
nodetool tablestats

# Monitor JVM
nodetool gcstats

# Monitor heap usage
nodetool info

# Enable tracing in CQL
TRACING ON;
SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
TRACING OFF;

# Check slow queries (enable in cassandra.yaml)
# slow_query_log_timeout_in_ms: 500

# Monitor with external tools
# - DataStax OpsCenter
# - Prometheus + Grafana
# - Custom JMX monitoring

Monitoring

System Monitoring

# Cluster overview
nodetool status
nodetool info
nodetool describecluster

# Node health
nodetool gossipinfo
nodetool version
nodetool info    # includes uptime

# Performance metrics
nodetool tpstats
nodetool cfstats
nodetool proxyhistograms
nodetool tablehistograms mykeyspace.mytable

# Compaction monitoring
nodetool compactionstats
nodetool compactionhistory

# Memory and GC
nodetool gcstats
nodetool info | grep -E "(Heap|Off-heap)"

# Network and streaming
nodetool netstats
nodetool getstreamthroughput

# Pending operations
nodetool compactionstats
nodetool tpstats

JMX Monitoring

# Enable JMX (in cassandra-env.sh)
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=7199"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=7199"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"

# Connect with JConsole
jconsole localhost:7199

# Key JMX metrics to monitor:
# - org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency
# - org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency
# - org.apache.cassandra.metrics:type=Storage,name=Load
# - org.apache.cassandra.metrics:type=Compaction,name=PendingTasks
# - java.lang:type=Memory
# - java.lang:type=GarbageCollector

Log Monitoring

# System log location
tail -f /var/log/cassandra/system.log

# Debug log
tail -f /var/log/cassandra/debug.log

# GC log
tail -f /var/log/cassandra/gc.log

# Key log patterns to monitor:
# - "OutOfMemoryError"
# - "Dropping mutation"
# - "Timeout"
# - "Unable to gossip"
# - "Compaction interrupted"
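
# A simple sketch combining the patterns above into a single watch:
tail -f /var/log/cassandra/system.log | \
    grep -E "OutOfMemoryError|Dropping mutation|Timeout|Unable to gossip|Compaction interrupted"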

# Configure log levels (logback.xml)
# DEBUG, INFO, WARN, ERROR

# Monitor specific packages
# <logger name="org.apache.cassandra.service.StorageService" level="DEBUG"/>
# <logger name="org.apache.cassandra.gms.Gossiper" level="INFO"/>

Application Monitoring

-- Monitor query performance
TRACING ON;
-- Your queries here
TRACING OFF;

-- Check table statistics
SELECT * FROM system.size_estimates WHERE keyspace_name = 'mykeyspace';

-- Monitor tombstones
-- Enable in cassandra.yaml:
-- tombstone_warn_threshold: 1000
-- tombstone_failure_threshold: 100000

-- Monitor large partitions
-- Enable in cassandra.yaml:
-- compaction_large_partition_warning_threshold_mb: 100

-- Monitor slow queries
-- Enable in cassandra.yaml:
-- slow_query_log_timeout_in_ms: 500

-- Check system tables for monitoring
SELECT * FROM system.local;
SELECT * FROM system.peers;
SELECT * FROM system_schema.keyspaces;
SELECT * FROM system_schema.tables;

Security

Authentication

-- Enable authentication (in cassandra.yaml)
-- authenticator: PasswordAuthenticator

-- Create superuser (default: cassandra/cassandra)
-- Change default password
ALTER USER cassandra WITH PASSWORD 'new_secure_password';

-- Create new user
CREATE USER app_user WITH PASSWORD 'secure_password' NOSUPERUSER;

-- Create superuser
CREATE USER admin_user WITH PASSWORD 'admin_password' SUPERUSER;

-- Alter user password
ALTER USER app_user WITH PASSWORD 'new_password';

-- List users
LIST USERS;

-- Drop user
DROP USER app_user;

Authorization

-- Enable authorization (in cassandra.yaml)
-- authorizer: CassandraAuthorizer

-- Grant permissions
GRANT SELECT ON KEYSPACE mykeyspace TO app_user;
GRANT MODIFY ON KEYSPACE mykeyspace TO app_user;
GRANT CREATE ON KEYSPACE mykeyspace TO app_user;
GRANT DROP ON KEYSPACE mykeyspace TO app_user;
GRANT ALTER ON KEYSPACE mykeyspace TO app_user;

-- Grant table-level permissions
GRANT SELECT ON mykeyspace.users TO app_user;
GRANT MODIFY ON mykeyspace.users TO app_user;

-- Grant all permissions
GRANT ALL PERMISSIONS ON KEYSPACE mykeyspace TO app_user;

-- Revoke permissions
REVOKE SELECT ON KEYSPACE mykeyspace FROM app_user;

-- List permissions
LIST ALL PERMISSIONS;
LIST ALL PERMISSIONS OF app_user;

-- Create role (Cassandra 2.2+)
CREATE ROLE app_role;

-- Grant role to user
GRANT app_role TO app_user;

-- Grant permissions to role
GRANT SELECT ON KEYSPACE mykeyspace TO app_role;

-- List roles
LIST ROLES;

SSL/TLS Encryption

# Client-to-node encryption (in cassandra.yaml)
client_encryption_options:
    enabled: true
    optional: false
    keystore: /path/to/keystore.jks
    keystore_password: keystore_password
    truststore: /path/to/truststore.jks
    truststore_password: truststore_password
    protocol: TLS
    algorithm: SunX509
    store_type: JKS
    cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA]

# Node-to-node encryption
server_encryption_options:
    internode_encryption: all
    keystore: /path/to/keystore.jks
    keystore_password: keystore_password
    truststore: /path/to/truststore.jks
    truststore_password: truststore_password
    protocol: TLS
    algorithm: SunX509
    store_type: JKS
    cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA]
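
A minimal sketch for generating the keystore and truststore referenced above with the JDK's keytool (the alias, passwords, and DN are placeholders):

# Per-node keypair
keytool -genkeypair -keyalg RSA -alias node1 -validity 3650 \
    -keystore keystore.jks -storepass keystore_password \
    -dname "CN=node1, OU=cluster, O=example, C=US"

# Export the certificate and import it into the shared truststore
keytool -exportcert -alias node1 -keystore keystore.jks \
    -storepass keystore_password -file node1.cer
keytool -importcert -noprompt -alias node1 -file node1.cer \
    -keystore truststore.jks -storepass truststore_password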

Network Security

# Configure listen addresses (in cassandra.yaml)
listen_address: 10.0.0.1
rpc_address: 10.0.0.1
broadcast_address: 10.0.0.1
broadcast_rpc_address: 10.0.0.1

# Configure ports
storage_port: 7000
ssl_storage_port: 7001
native_transport_port: 9042
rpc_port: 9160            # Thrift; removed in Cassandra 4.0

# Enable native protocol SSL
native_transport_port_ssl: 9142

# Firewall rules (example for iptables)
# Allow Cassandra ports from cluster nodes only
# iptables -A INPUT -p tcp --dport 7000 -s 10.0.0.0/24 -j ACCEPT
# iptables -A INPUT -p tcp --dport 7001 -s 10.0.0.0/24 -j ACCEPT
# iptables -A INPUT -p tcp --dport 9042 -s 10.0.0.0/24 -j ACCEPT

Best Practices

Data Modeling Best Practices

-- Design tables around queries, not entities
-- Query: Get user's posts ordered by date
CREATE TABLE user_posts (
    user_id UUID,
    post_date DATE,
    post_id TIMEUUID,
    title TEXT,
    content TEXT,
    PRIMARY KEY (user_id, post_date, post_id)
) WITH CLUSTERING ORDER BY (post_date DESC, post_id DESC);

-- Denormalize for performance
-- Instead of joins, duplicate data
CREATE TABLE post_comments (
    post_id UUID,
    comment_id TIMEUUID,
    user_id UUID,
    username TEXT,  -- Denormalized from users table
    comment TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (post_id, comment_id)
);

-- Use appropriate partition sizes
-- Aim for 10MB-100MB partitions
-- Avoid hotspots with good partition key distribution

-- Use time-based partitioning for time series data
CREATE TABLE sensor_data (
    sensor_id UUID,
    year INT,
    month INT,
    day INT,
    timestamp TIMESTAMP,
    temperature DOUBLE,
    humidity DOUBLE,
    PRIMARY KEY ((sensor_id, year, month, day), timestamp)
);

-- Use counters appropriately
CREATE TABLE page_views (
    page_id TEXT PRIMARY KEY,
    views COUNTER
);
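
-- Counters can only be modified through UPDATE (INSERT is not allowed)
-- and are incremented/decremented server-side:
UPDATE page_views SET views = views + 1 WHERE page_id = 'home';
UPDATE page_views SET views = views - 1 WHERE page_id = 'home';
SELECT page_id, views FROM page_views WHERE page_id = 'home';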

-- Avoid large collections
-- Use separate tables instead of large lists/sets/maps
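
For example, a clustering column gives the same one-to-many shape as a collection without the per-partition size limits; a sketch of replacing a large interests set (table name here is illustrative):

CREATE TABLE user_interests_by_user (
    user_id UUID,
    interest TEXT,
    PRIMARY KEY (user_id, interest)
);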

Query Best Practices

-- Always include partition key in WHERE clause
-- Good
SELECT * FROM user_posts WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Bad (requires ALLOW FILTERING)
SELECT * FROM user_posts WHERE title = 'Some Title' ALLOW FILTERING;

-- Use clustering columns for range queries
SELECT * FROM user_posts 
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000 
AND post_date >= '2023-01-01' 
AND post_date <= '2023-12-31';

-- Use LIMIT to control result size
SELECT * FROM users LIMIT 1000;

-- Use prepared statements in applications
-- Reduces parsing overhead and prevents injection attacks

-- Avoid SELECT * in production
-- Select only needed columns
SELECT user_id, username, email FROM users;

-- Use appropriate consistency levels
-- QUORUM for critical reads/writes
-- ONE for non-critical operations
-- LOCAL_QUORUM for multi-datacenter setups

Performance Best Practices

-- Choose appropriate compaction strategy
-- SizeTieredCompactionStrategy: Write-heavy workloads
-- LeveledCompactionStrategy: Read-heavy workloads
-- TimeWindowCompactionStrategy: Time series data

ALTER TABLE time_series WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
};

-- Configure appropriate caching
ALTER TABLE frequently_read_table WITH caching = {
    'keys': 'ALL',
    'rows_per_partition': '100'
};

-- Use compression for large tables
ALTER TABLE large_table WITH compression = {
    'class': 'LZ4Compressor',
    'chunk_length_in_kb': 64
};

-- Monitor and tune bloom filters
ALTER TABLE large_table WITH bloom_filter_fp_chance = 0.01;

-- Use TTL for temporary data
INSERT INTO session_data (session_id, user_id, data, created_at)
VALUES (uuid(), 123e4567-e89b-12d3-a456-426614174000, 'session_data', toTimestamp(now()))
USING TTL 3600;  -- 1 hour TTL
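
The remaining lifetime can be checked with TTL(); the session_data table and placeholder UUID here just follow the insert above:

SELECT TTL(data) FROM session_data
WHERE session_id = 123e4567-e89b-12d3-a456-426614174000;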

Operational Best Practices

# Regular maintenance
# Run repair regularly (weekly for critical data; incremental is the default)
nodetool repair

# Monitor compaction
nodetool compactionstats

# Clean up after topology changes
nodetool cleanup

# Monitor disk space
df -h

# Monitor logs
tail -f /var/log/cassandra/system.log

# Backup strategy
# Take snapshots before major changes
nodetool snapshot -t pre_upgrade_backup

# Monitor cluster health
nodetool status
nodetool describecluster

# Capacity planning
# Monitor partition sizes
# Plan for 3x growth
# Monitor read/write patterns
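
# Partition-size percentiles for a table (helps spot oversized partitions):
nodetool tablehistograms mykeyspace mytable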

# Security practices
# Change default passwords
# Use SSL/TLS encryption
# Implement proper authentication/authorization
# Regular security updates
# Network segmentation
# Monitor access logs

Development Best Practices

-- Use UUIDs for primary keys
CREATE TABLE orders (
    order_id UUID PRIMARY KEY,
    customer_id UUID,
    order_date TIMESTAMP,
    total_amount DECIMAL
);

-- Use TIMEUUID for time-ordered data
CREATE TABLE events (
    event_id TIMEUUID PRIMARY KEY,
    event_type TEXT,
    event_data TEXT,
    created_at TIMESTAMP
);

-- Handle null values appropriately
-- Cassandra treats null as "no value"
-- Use default values or application logic
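
-- A minimal sketch: writing an explicit null creates a tombstone,
-- while omitting the column writes nothing at all
INSERT INTO orders (order_id, customer_id) VALUES (uuid(), null);  -- tombstone for customer_id
INSERT INTO orders (order_id) VALUES (uuid());                     -- no tombstone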

-- Use batch operations carefully
-- Only for related data in same partition
-- Avoid large batches (>100 statements)

-- Test with realistic data volumes
-- Performance characteristics change with scale
-- Test partition sizes and query patterns

-- Monitor application metrics
-- Track query latency
-- Monitor connection pool usage
-- Track error rates
-- Monitor timeout exceptions

Summary

Apache Cassandra is a highly scalable, distributed NoSQL database designed for handling large amounts of data across many commodity servers with no single point of failure. This cheatsheet provides comprehensive coverage of Cassandra operations from basic administration to advanced features.

Key Strengths:
- Scalability: linear scaling with no single point of failure
- High Availability: multi-datacenter replication with tunable consistency
- Performance: optimized for write-heavy workloads with fast reads
- Fault Tolerance: automatic data replication and failure detection
- Flexible Data Model: wide-column store supporting various data types

Best Use Cases:
- Time series data and IoT applications
- Real-time analytics and logging
- Content management and catalogs
- Messaging and social media platforms
- Applications requiring high write throughput

Important Considerations:
- Data modeling requires a query-first approach
- Limited support for complex queries and joins
- The eventual consistency model requires careful consideration
- Operational complexity increases with cluster size
- Requires understanding of distributed systems concepts

By following the practices and techniques outlined in this cheatsheet, you can effectively design, implement, and maintain Cassandra databases that provide high performance, scalability, and availability for modern distributed applications.