# Apache Cassandra Cheatsheet

Apache Cassandra - Distributed NoSQL Database

Apache Cassandra is a free and open-source, distributed, wide-column NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

## Table of Contents
- Installation
- Basic Commands
- Keyspace Operations
- Table Operations
- Data Types
- CRUD Operations
- Query Operations
- Indexes
- User Defined Types
- Functions
- Materialized Views
- Batch Operations
- Consistency Levels
- Cluster Management
- Performance Tuning
- Monitoring
- Security
- Best Practices
## Installation

### Ubuntu/Debian Installation

```bash
# Add Apache Cassandra repository
echo "deb https://debian.cassandra.apache.org 40x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -

# Update package index
sudo apt update

# Install Cassandra
sudo apt install cassandra

# Start Cassandra service
sudo systemctl start cassandra
sudo systemctl enable cassandra

# Check status
sudo systemctl status cassandra

# Verify installation
nodetool status
```
### CentOS/RHEL Installation
```bash
# Create repository file
sudo tee /etc/yum.repos.d/cassandra.repo << EOF
[cassandra]
name=Apache Cassandra
baseurl=https://redhat.cassandra.apache.org/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
EOF
# Install Cassandra
sudo yum install cassandra
# Start Cassandra service
sudo systemctl start cassandra
sudo systemctl enable cassandra
# Check status
nodetool status
```
### Docker Installation

```bash
# Pull Cassandra image
docker pull cassandra:latest
# Run Cassandra container
docker run --name cassandra-node \
-p 9042:9042 \
-p 7000:7000 \
-p 7001:7001 \
-p 7199:7199 \
-p 9160:9160 \
-d cassandra:latest
# Connect to Cassandra
docker exec -it cassandra-node cqlsh
# Docker Compose for cluster
cat > docker-compose.yml << EOF
version: '3.8'
services:
  cassandra-1:
    image: cassandra:latest
    container_name: cassandra-1
    ports:
      - "9042:9042"
    environment:
      - CASSANDRA_CLUSTER_NAME=MyCluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=datacenter1
    volumes:
      - cassandra-1-data:/var/lib/cassandra

  cassandra-2:
    image: cassandra:latest
    container_name: cassandra-2
    environment:
      - CASSANDRA_CLUSTER_NAME=MyCluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=datacenter1
      - CASSANDRA_SEEDS=cassandra-1
    depends_on:
      - cassandra-1
    volumes:
      - cassandra-2-data:/var/lib/cassandra

  cassandra-3:
    image: cassandra:latest
    container_name: cassandra-3
    environment:
      - CASSANDRA_CLUSTER_NAME=MyCluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=datacenter1
      - CASSANDRA_SEEDS=cassandra-1
    depends_on:
      - cassandra-1
    volumes:
      - cassandra-3-data:/var/lib/cassandra

volumes:
  cassandra-1-data:
  cassandra-2-data:
  cassandra-3-data:
EOF

docker-compose up -d
```
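Once Compose brings the containers up, node startup and ring formation can take a minute or two; a quick check is to run nodetool inside the seed container (named cassandra-1 above) and confirm all three nodes report UN (Up/Normal):

```bash
# Verify that all three nodes have joined the cluster
docker exec -it cassandra-1 nodetool status
```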
### Manual Installation

```bash
# Download Cassandra
wget https://downloads.apache.org/cassandra/4.0.7/apache-cassandra-4.0.7-bin.tar.gz
# Extract
tar -xzf apache-cassandra-4.0.7-bin.tar.gz
sudo mv apache-cassandra-4.0.7 /opt/cassandra
# Set environment variables
echo 'export CASSANDRA_HOME=/opt/cassandra' >> ~/.bashrc
echo 'export PATH=$PATH:$CASSANDRA_HOME/bin' >> ~/.bashrc
source ~/.bashrc
# Start Cassandra
cassandra -f # Foreground mode
# or
cassandra # Background mode
# Connect to Cassandra
cqlsh
```

## Basic Commands

### Connecting to Cassandra

```bash
# Connect to local Cassandra
cqlsh
# Connect to remote Cassandra
cqlsh hostname
cqlsh hostname port
# Connect with credentials
cqlsh -u username -p password
# Connect with SSL
cqlsh --ssl
# Execute CQL from file
cqlsh -f script.cql
# Execute single command
cqlsh -e "DESCRIBE KEYSPACES;"
```

### CQL Shell Commands

```cql
-- Show help
HELP;
-- Show current keyspace
SELECT keyspace_name FROM system.local;
-- Show cluster information
DESCRIBE CLUSTER;
-- Show keyspaces
DESCRIBE KEYSPACES;
-- Show tables in keyspace
DESCRIBE TABLES;
-- Show table structure
DESCRIBE TABLE table_name;
-- Show version
SELECT release_version FROM system.local;
-- Exit CQL shell
EXIT;
QUIT;
-- Enable/disable tracing
TRACING ON;
TRACING OFF;
-- Show current consistency level
CONSISTENCY;
```

### System Information

```cql
-- Show cluster information
SELECT cluster_name, listen_address, data_center, rack, release_version
FROM system.local;
-- Show all nodes
SELECT peer, data_center, rack, release_version, tokens
FROM system.peers;
-- Show keyspace information
SELECT keyspace_name, durable_writes, replication
FROM system_schema.keyspaces;
-- Show table information
SELECT keyspace_name, table_name, bloom_filter_fp_chance,
caching, comment, compaction, compression
FROM system_schema.tables
WHERE keyspace_name = 'mykeyspace';
-- Show column information
SELECT keyspace_name, table_name, column_name, clustering_order,
column_name_bytes, kind, position, type
FROM system_schema.columns
WHERE keyspace_name = 'mykeyspace' AND table_name = 'mytable';
```

## Keyspace Operations

### Creating Keyspaces

```cql
-- Simple keyspace creation
CREATE KEYSPACE mykeyspace
WITH REPLICATION = {
'class': 'SimpleStrategy',
'replication_factor': 3
};
-- Keyspace with NetworkTopologyStrategy
CREATE KEYSPACE mykeyspace
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'datacenter1': 3,
'datacenter2': 2
}
AND DURABLE_WRITES = true;
-- Keyspace for development (single node)
CREATE KEYSPACE dev_keyspace
WITH REPLICATION = {
'class': 'SimpleStrategy',
'replication_factor': 1
};
-- Keyspace with custom options
CREATE KEYSPACE analytics
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'datacenter1': 3
}
AND DURABLE_WRITES = false;
```

### Managing Keyspaces

```cql
-- Use keyspace
USE mykeyspace;
-- Alter keyspace replication
ALTER KEYSPACE mykeyspace
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'datacenter1': 5,
'datacenter2': 3
};
-- Alter durable writes
ALTER KEYSPACE mykeyspace
WITH DURABLE_WRITES = false;
-- Drop keyspace
DROP KEYSPACE mykeyspace;
-- Show keyspace details
DESCRIBE KEYSPACE mykeyspace;
-- Show all keyspaces
SELECT keyspace_name, durable_writes, replication
FROM system_schema.keyspaces;
```
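Note that altering replication does not move existing data by itself: after changing the replication factor or adding a datacenter, the new replicas only receive their data once a repair runs (see Cluster Management):

```bash
# Stream data to the new replicas after an ALTER KEYSPACE replication change
nodetool repair mykeyspace
```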
## Table Operations

### Creating Tables

```cql
-- Basic table creation
CREATE TABLE users (
user_id UUID PRIMARY KEY,
username TEXT,
email TEXT,
first_name TEXT,
last_name TEXT,
created_at TIMESTAMP
);
-- Table with composite primary key
CREATE TABLE user_posts (
user_id UUID,
post_id TIMEUUID,
title TEXT,
content TEXT,
created_at TIMESTAMP,
PRIMARY KEY (user_id, post_id)
);
-- Table with clustering order
CREATE TABLE time_series (
sensor_id UUID,
timestamp TIMESTAMP,
temperature DOUBLE,
humidity DOUBLE,
PRIMARY KEY (sensor_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
-- Table with multiple clustering columns
CREATE TABLE user_activities (
user_id UUID,
year INT,
month INT,
day INT,
activity_id TIMEUUID,
activity_type TEXT,
description TEXT,
PRIMARY KEY (user_id, year, month, day, activity_id)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, activity_id DESC);
-- Table with options
CREATE TABLE products (
product_id UUID PRIMARY KEY,
name TEXT,
description TEXT,
price DECIMAL,
category TEXT,
created_at TIMESTAMP
) WITH comment = 'Product catalog table'
AND gc_grace_seconds = 864000
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': 32, 'min_threshold': 4}
AND compression = {'chunk_length_in_kb': 64, 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'};
```
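A reminder on how the composite keys above behave: the first component of PRIMARY KEY is the partition key, which determines data placement, and the remaining components are clustering columns, which order rows within the partition. A minimal sketch:

```cql
-- Valid: the partition key pins the query to one partition
SELECT * FROM user_posts
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Clustering columns can then narrow or order within that partition
SELECT * FROM user_posts
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
ORDER BY post_id DESC;
```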
### Modifying Tables

```cql
-- Add column
ALTER TABLE users ADD phone TEXT;
-- Add multiple columns
ALTER TABLE users ADD (
address TEXT,
city TEXT,
country TEXT
);
-- Drop column
ALTER TABLE users DROP phone;
-- Rename column (only primary key columns can be renamed, Cassandra 3.0+)
ALTER TABLE user_posts RENAME post_id TO post_uuid;
-- Alter column type (legacy versions only; removed in Cassandra 3.x)
ALTER TABLE users ALTER created_at TYPE TIMESTAMP;
-- Add table options
ALTER TABLE users WITH comment = 'User information table';
-- Modify compaction strategy
ALTER TABLE users WITH compaction = {
'class': 'LeveledCompactionStrategy',
'sstable_size_in_mb': 160
};
-- Modify compression
ALTER TABLE users WITH compression = {
'class': 'DeflateCompressor',
'chunk_length_in_kb': 64
};
-- Drop table
DROP TABLE users;
```

### Table Information

```cql
-- Describe table
DESCRIBE TABLE users;
-- Show table schema
SELECT keyspace_name, table_name, column_name, clustering_order,
column_name_bytes, kind, position, type
FROM system_schema.columns
WHERE keyspace_name = 'mykeyspace' AND table_name = 'users';
-- Show table options
SELECT keyspace_name, table_name, bloom_filter_fp_chance,
caching, comment, compaction, compression,
crc_check_chance, dclocal_read_repair_chance,
default_time_to_live, gc_grace_seconds,
max_index_interval, memtable_flush_period_in_ms,
min_index_interval, read_repair_chance, speculative_retry
FROM system_schema.tables
WHERE keyspace_name = 'mykeyspace' AND table_name = 'users';
```

## Data Types

### Basic Data Types

```cql
-- Numeric types
CREATE TABLE numeric_examples (
id UUID PRIMARY KEY,
tiny_int_val TINYINT, -- 8-bit signed integer
small_int_val SMALLINT, -- 16-bit signed integer
int_val INT, -- 32-bit signed integer
big_int_val BIGINT, -- 64-bit signed integer
varint_val VARINT, -- Arbitrary precision integer
decimal_val DECIMAL, -- Variable precision decimal
float_val FLOAT, -- 32-bit IEEE-754 floating point
double_val DOUBLE -- 64-bit IEEE-754 floating point
);
-- Text and binary types
CREATE TABLE text_examples (
id UUID PRIMARY KEY,
ascii_val ASCII, -- ASCII string
text_val TEXT, -- UTF-8 string
varchar_val VARCHAR, -- UTF-8 string (alias for TEXT)
blob_val BLOB -- Binary data
);
-- Date and time types
CREATE TABLE datetime_examples (
id UUID PRIMARY KEY,
date_val DATE, -- Date without time
time_val TIME, -- Time without date
timestamp_val TIMESTAMP, -- Date and time
duration_val DURATION -- Duration (Cassandra 3.10+)
);
-- Boolean and UUID types
CREATE TABLE other_examples (
id UUID PRIMARY KEY,
bool_val BOOLEAN, -- True or false
uuid_val UUID, -- Random UUID
timeuuid_val TIMEUUID, -- Time-based UUID
inet_val INET -- IP address (IPv4 or IPv6)
);
```

### Collection Data Types

```cql
-- Set type
CREATE TABLE user_interests (
user_id UUID PRIMARY KEY,
interests SET<TEXT>
);
-- List type
CREATE TABLE user_addresses (
user_id UUID PRIMARY KEY,
addresses LIST<TEXT>
);
-- Map type
CREATE TABLE user_attributes (
user_id UUID PRIMARY KEY,
attributes MAP<TEXT, TEXT>
);
-- Frozen collections
CREATE TABLE user_profile (
user_id UUID PRIMARY KEY,
tags FROZEN<SET<TEXT>>,
metadata FROZEN<MAP<TEXT, TEXT>>
);
-- Nested collections (with FROZEN)
CREATE TABLE complex_data (
id UUID PRIMARY KEY,
nested_map MAP<TEXT, FROZEN<LIST<TEXT>>>,
list_of_sets LIST<FROZEN<SET<INT>>>
);
```
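One behavioral note on the FROZEN columns above: a frozen collection is serialized and stored as a single value, so it can only be replaced wholesale, while non-frozen collections allow per-element additions and removals:

```cql
-- Frozen collections: whole-value replacement only
UPDATE user_profile
SET tags = {'cassandra', 'nosql'}
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
```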
### User Defined Types

```cql
-- Create user defined type
CREATE TYPE address (
street TEXT,
city TEXT,
state TEXT,
zip_code TEXT,
country TEXT
);
-- Create another UDT
CREATE TYPE phone (
country_code TEXT,
number TEXT,
type TEXT
);
-- Use UDT in table
CREATE TABLE contacts (
contact_id UUID PRIMARY KEY,
name TEXT,
home_address address,
work_address address,
phones LIST<phone>
);
-- Frozen UDT
CREATE TABLE user_profiles (
user_id UUID PRIMARY KEY,
personal_info FROZEN<address>,
contact_numbers FROZEN<LIST<phone>>
);
```

## CRUD Operations

### Insert Operations

```cql
-- Basic insert
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'john_doe', 'john@example.com', 'John', 'Doe', toTimestamp(now()));
-- Insert with TTL (Time To Live)
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'jane_doe', 'jane@example.com', 'Jane', 'Doe', toTimestamp(now()))
USING TTL 86400;
-- Insert with timestamp
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'bob_smith', 'bob@example.com', 'Bob', 'Smith', toTimestamp(now()))
USING TIMESTAMP 1640995200000000;
-- Insert into collection columns
INSERT INTO user_interests (user_id, interests)
VALUES (uuid(), {'technology', 'music', 'sports'});
INSERT INTO user_addresses (user_id, addresses)
VALUES (uuid(), ['123 Main St', '456 Oak Ave', '789 Pine Rd']);
INSERT INTO user_attributes (user_id, attributes)
VALUES (uuid(), {'theme': 'dark', 'language': 'en', 'timezone': 'UTC'});
-- Insert with UDT
INSERT INTO contacts (contact_id, name, home_address, phones)
VALUES (
uuid(),
'John Doe',
{street: '123 Main St', city: 'New York', state: 'NY', zip_code: '10001', country: 'USA'},
[{country_code: '+1', number: '555-1234', type: 'mobile'}]
);
-- Conditional insert (IF NOT EXISTS)
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'unique_user', 'unique@example.com', 'Unique', 'User', toTimestamp(now()))
IF NOT EXISTS;
```

### Select Operations

```cql
-- Basic select
SELECT * FROM users;
-- Select specific columns
SELECT user_id, username, email FROM users;
-- Select with WHERE clause (must include partition key)
SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Select with clustering column conditions
SELECT * FROM user_posts WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
SELECT * FROM user_posts
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
AND post_id > minTimeuuid('2023-01-01');
-- Select with LIMIT
SELECT * FROM users LIMIT 10;
-- Select with ORDER BY (only on clustering columns)
SELECT * FROM user_posts
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
ORDER BY post_id DESC;
-- Select with ALLOW FILTERING (use carefully)
SELECT * FROM users WHERE email = 'john@example.com' ALLOW FILTERING;
-- Select with token function for pagination
SELECT * FROM users WHERE token(user_id) > token(123e4567-e89b-12d3-a456-426614174000);
-- Select collection columns (collections are read whole; individual
-- set/list elements cannot be selected by index in CQL)
SELECT user_id, interests FROM user_interests;
-- Select from UDT
SELECT contact_id, name, home_address.city, home_address.state FROM contacts;
-- Select with functions
SELECT user_id, username, toTimestamp(now()) AS current_time FROM users;
-- dateOf() extracts the date from TIMEUUID columns
SELECT user_id, post_id, dateOf(post_id) AS post_date FROM user_posts
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
```
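The token() pattern above is the standard way to page through an entire table: rows are stored in token order, not key order, so each page resumes from the token of the last partition key seen:

```cql
-- First page
SELECT user_id, username FROM users LIMIT 100;

-- Next page: resume after the last user_id returned by the previous page
SELECT user_id, username FROM users
WHERE token(user_id) > token(123e4567-e89b-12d3-a456-426614174000)
LIMIT 100;
```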
### Update Operations

```cql
-- Basic update
UPDATE users
SET email = 'newemail@example.com'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Update multiple columns
UPDATE users
SET email = 'john.doe@example.com',
first_name = 'Jonathan'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Update with TTL
UPDATE users USING TTL 3600
SET email = 'temp@example.com'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Update with timestamp
UPDATE users USING TIMESTAMP 1640995200000000
SET email = 'timestamped@example.com'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Update collection - add to set
UPDATE user_interests
SET interests = interests + {'reading'}
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Update collection - remove from set
UPDATE user_interests
SET interests = interests - {'sports'}
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Update collection - append to list
UPDATE user_addresses
SET addresses = addresses + ['999 New St']
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Update collection - prepend to list
UPDATE user_addresses
SET addresses = ['000 First St'] + addresses
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Update map
UPDATE user_attributes
SET attributes['theme'] = 'light'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Update UDT field
UPDATE contacts
SET home_address.city = 'Los Angeles'
WHERE contact_id = 123e4567-e89b-12d3-a456-426614174000;
-- Conditional update
UPDATE users
SET email = 'conditional@example.com'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
IF email = 'old@example.com';
-- Update with IF EXISTS
UPDATE users
SET email = 'exists@example.com'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
IF EXISTS;
```

### Delete Operations

```cql
-- Delete entire row
DELETE FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Delete specific columns
DELETE email, phone FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Delete with timestamp
DELETE FROM users USING TIMESTAMP 1640995200000000
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Remove an element from a set (sets use UPDATE with "-", not DELETE [..])
UPDATE user_interests
SET interests = interests - {'technology'}
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Delete list element by index
DELETE addresses[0] FROM user_addresses
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Delete map key
DELETE attributes['theme'] FROM user_attributes
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Delete UDT field (non-frozen UDTs; the field must exist on the type)
DELETE home_address.city FROM contacts
WHERE contact_id = 123e4567-e89b-12d3-a456-426614174000;
-- Conditional delete
DELETE FROM users
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
IF email = 'delete@example.com';
-- Delete with IF EXISTS
DELETE FROM users
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
IF EXISTS;
-- Delete range (clustering columns)
DELETE FROM user_posts
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
AND post_id > minTimeuuid('2023-01-01')
AND post_id < maxTimeuuid('2023-12-31');
```
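All of the deletes above write tombstones rather than removing data immediately; compaction can only purge a tombstone after the table's gc_grace_seconds window (default 864000 seconds, i.e. 10 days), which gives repair time to propagate the deletion to every replica:

```cql
-- Tune how long tombstones are kept before becoming eligible for purge
ALTER TABLE user_posts WITH gc_grace_seconds = 864000;
```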
## Query Operations

### Advanced Querying

```cql
-- Range queries on clustering columns
SELECT * FROM time_series
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
AND timestamp >= '2023-01-01'
AND timestamp < '2023-02-01';
-- IN queries
SELECT * FROM users WHERE user_id IN (
123e4567-e89b-12d3-a456-426614174000,
456e7890-e89b-12d3-a456-426614174001,
789e0123-e89b-12d3-a456-426614174002
);
-- Token-based queries for pagination
SELECT * FROM users WHERE token(user_id) > token(123e4567-e89b-12d3-a456-426614174000) LIMIT 1000;
-- CONTAINS queries (need a collection index, or ALLOW FILTERING as here)
SELECT * FROM user_interests WHERE interests CONTAINS 'technology' ALLOW FILTERING;
-- CONTAINS KEY queries for maps
SELECT * FROM user_attributes WHERE attributes CONTAINS KEY 'theme' ALLOW FILTERING;
-- Slice queries
SELECT * FROM user_activities
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
AND year = 2023
AND month >= 6;
-- Multi-column slice
SELECT * FROM user_activities
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
AND (year, month) >= (2023, 6)
AND (year, month) <= (2023, 12);
```

### Aggregation Functions

```cql
-- Count
SELECT COUNT(*) FROM users;
-- Count with WHERE clause
SELECT COUNT(*) FROM user_posts
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Min and Max (Cassandra 2.2+)
SELECT MIN(temperature), MAX(temperature) FROM time_series
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000;
-- Sum and Average (Cassandra 2.2+)
SELECT SUM(temperature), AVG(temperature) FROM time_series
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000;
-- GROUP BY (Cassandra 3.10+; only on partition key or clustering-column prefixes)
SELECT sensor_id, COUNT(*) FROM time_series GROUP BY sensor_id;
```

### Built-in Functions

```cql
-- UUID functions (CQL SELECT needs a FROM clause; system.local works as a dummy)
SELECT uuid() AS new_uuid FROM system.local;
SELECT now() AS current_timeuuid FROM system.local;
SELECT minTimeuuid('2023-01-01') AS min_uuid FROM system.local;
SELECT maxTimeuuid('2023-12-31') AS max_uuid FROM system.local;

-- Date and time functions
SELECT toTimestamp(now()) AS current_timestamp FROM system.local;
SELECT dateOf(now()) AS current_date FROM system.local;
SELECT unixTimestampOf(now()) AS unix_timestamp FROM system.local;

-- Blob functions
SELECT textAsBlob('hello') AS text_blob FROM system.local;
SELECT blobAsText(textAsBlob('hello')) AS blob_text FROM system.local;
-- Token function
SELECT token(user_id) FROM users;
-- TTL and WRITETIME functions
SELECT TTL(email), WRITETIME(email) FROM users
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Type conversion functions
SELECT toJson(interests) FROM user_interests;
-- fromJson() is only valid in INSERT/UPDATE values, not in SELECT
INSERT INTO user_interests (user_id, interests)
VALUES (uuid(), fromJson('["technology", "music"]'));
```

## Indexes

### Secondary Indexes

```cql
-- Create secondary index
CREATE INDEX ON users (email);
-- Create named index
CREATE INDEX user_email_idx ON users (email);
-- Create index on collection
CREATE INDEX ON user_interests (interests);
-- Create index on map keys
CREATE INDEX ON user_attributes (KEYS(attributes));
-- Create index on map values
CREATE INDEX ON user_attributes (VALUES(attributes));
-- Create index on map entries
CREATE INDEX ON user_attributes (ENTRIES(attributes));
-- Index a UDT column (indexes the whole value; the UDT must be frozen)
CREATE INDEX ON user_profiles (personal_info);
-- Show indexes
DESCRIBE INDEX user_email_idx;
-- Drop index
DROP INDEX user_email_idx;
```

### SASI Indexes (Cassandra 3.4+)

```cql
-- Create SASI index for text search
CREATE CUSTOM INDEX user_name_sasi_idx ON users (first_name)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'CONTAINS',
'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'case_sensitive': 'false'
};
-- Create SASI index for numeric range queries
CREATE CUSTOM INDEX product_price_sasi_idx ON products (price)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'SPARSE'};
-- Query with SASI index
SELECT * FROM users WHERE first_name LIKE '%john%';
SELECT * FROM products WHERE price >= 100 AND price <= 500;
```

## User Defined Types

### Creating and Managing UDTs

```cql
-- Create UDT
CREATE TYPE address (
street TEXT,
city TEXT,
state TEXT,
zip_code TEXT,
country TEXT
);
-- Create nested UDT
CREATE TYPE contact_info (
email TEXT,
phone TEXT,
address address
);
-- Alter UDT - add field
ALTER TYPE address ADD region TEXT;
-- Alter UDT - rename field
ALTER TYPE address RENAME zip_code TO postal_code;
-- Show UDT
DESCRIBE TYPE address;
-- Drop UDT
DROP TYPE address;
-- Use UDT in table
CREATE TABLE customers (
customer_id UUID PRIMARY KEY,
name TEXT,
billing_address address,
shipping_address address,
contact contact_info
);
-- Insert with UDT
INSERT INTO customers (customer_id, name, billing_address, contact)
VALUES (
uuid(),
'John Doe',
{
street: '123 Main St',
city: 'New York',
state: 'NY',
postal_code: '10001',
country: 'USA',
region: 'Northeast'
},
{
email: 'john@example.com',
phone: '+1-555-1234',
address: {
street: '456 Work Ave',
city: 'New York',
state: 'NY',
postal_code: '10002',
country: 'USA',
region: 'Northeast'
}
}
);
-- Query UDT fields
SELECT customer_id, name, billing_address.city, contact.email FROM customers;
-- Update UDT field
UPDATE customers
SET billing_address.city = 'Los Angeles'
WHERE customer_id = 123e4567-e89b-12d3-a456-426614174000;
```

## Functions

### User Defined Functions (UDF)

```cql
-- Enable UDF (in cassandra.yaml: enable_user_defined_functions: true)
-- Create simple UDF
CREATE OR REPLACE FUNCTION fLog (input double)
CALLED ON NULL INPUT
RETURNS double
LANGUAGE java
AS 'return Double.valueOf(Math.log(input.doubleValue()));';
-- Create UDF with multiple parameters
CREATE OR REPLACE FUNCTION fAdd (x int, y int)
CALLED ON NULL INPUT
RETURNS int
LANGUAGE java
AS 'return Integer.valueOf(x.intValue() + y.intValue());';
-- Create UDF for text processing
CREATE OR REPLACE FUNCTION toUpperCase (input text)
CALLED ON NULL INPUT
RETURNS text
LANGUAGE java
AS 'return input.toUpperCase();';
-- Create UDF for collection processing
CREATE OR REPLACE FUNCTION listSize (input list<text>)
CALLED ON NULL INPUT
RETURNS int
LANGUAGE java
AS 'return Integer.valueOf(input.size());';
-- Use UDF in queries
SELECT user_id, fAdd(5, 10) as sum FROM users;
SELECT user_id, toUpperCase(first_name) as upper_name FROM users;
SELECT user_id, listSize(addresses) as address_count FROM user_addresses;
-- Show functions
DESCRIBE FUNCTION fLog;
-- Drop function
DROP FUNCTION fLog;
```

### User Defined Aggregates (UDA)

```cql
-- Create state function for UDA
CREATE OR REPLACE FUNCTION avgState (state tuple<int,bigint>, val int)
CALLED ON NULL INPUT
RETURNS tuple<int,bigint>
LANGUAGE java
AS '
if (val != null) {
state.setInt(0, state.getInt(0) + 1);
state.setLong(1, state.getLong(1) + val.intValue());
}
return state;
';
-- Create final function for UDA
CREATE OR REPLACE FUNCTION avgFinal (state tuple<int,bigint>)
CALLED ON NULL INPUT
RETURNS double
LANGUAGE java
AS '
double r = 0;
if (state.getInt(0) == 0) return null;
r = state.getLong(1);
r /= state.getInt(0);
return Double.valueOf(r);
';
-- Create UDA
CREATE OR REPLACE AGGREGATE average (int)
SFUNC avgState
STYPE tuple<int,bigint>
FINALFUNC avgFinal
INITCOND (0, 0);
-- Use UDA (the column type must match the aggregate's signature: INT here)
SELECT user_id, average(year) FROM user_activities GROUP BY user_id;
-- Drop UDA
DROP AGGREGATE average;
```

## Materialized Views

### Creating Materialized Views

```cql
-- Create materialized view
CREATE MATERIALIZED VIEW users_by_email AS
SELECT user_id, username, email, first_name, last_name, created_at
FROM users
WHERE email IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (email, user_id);
-- Materialized view with different clustering
CREATE MATERIALIZED VIEW posts_by_title AS
SELECT user_id, post_id, title, content, created_at
FROM user_posts
WHERE title IS NOT NULL AND user_id IS NOT NULL AND post_id IS NOT NULL
PRIMARY KEY (title, user_id, post_id);
-- Materialized view with filtering
CREATE MATERIALIZED VIEW recent_posts AS
SELECT user_id, post_id, title, content, created_at
FROM user_posts
WHERE created_at IS NOT NULL AND user_id IS NOT NULL AND post_id IS NOT NULL
AND created_at >= '2023-01-01'
PRIMARY KEY (created_at, user_id, post_id)
WITH CLUSTERING ORDER BY (user_id ASC, post_id DESC);
-- Show materialized view
DESCRIBE MATERIALIZED VIEW users_by_email;
-- Query materialized view
SELECT * FROM users_by_email WHERE email = 'john@example.com';
-- Drop materialized view
DROP MATERIALIZED VIEW users_by_email;
```

### Materialized View Considerations

```cql
-- Materialized views are automatically maintained
-- Base table changes are propagated to views
-- Insert into base table
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'new_user', 'new@example.com', 'New', 'User', toTimestamp(now()));
-- Query view to see automatic update
SELECT * FROM users_by_email WHERE email = 'new@example.com';
-- Update base table
UPDATE users SET first_name = 'Updated' WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- View is automatically updated
SELECT * FROM users_by_email WHERE email = 'john@example.com';
-- Show all materialized views
SELECT keyspace_name, view_name, base_table_name
FROM system_schema.views;
```

## Batch Operations

### Batch Statements

```cql
-- Logged batch (default)
BEGIN BATCH
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'batch_user1', 'batch1@example.com', 'Batch', 'User1', toTimestamp(now()));
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'batch_user2', 'batch2@example.com', 'Batch', 'User2', toTimestamp(now()));
UPDATE users SET email = 'updated@example.com'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
APPLY BATCH;
-- Unlogged batch (better performance, no atomicity guarantee)
BEGIN UNLOGGED BATCH
INSERT INTO user_posts (user_id, post_id, title, content, created_at)
VALUES (123e4567-e89b-12d3-a456-426614174000, now(), 'Post 1', 'Content 1', toTimestamp(now()));
INSERT INTO user_posts (user_id, post_id, title, content, created_at)
VALUES (123e4567-e89b-12d3-a456-426614174000, now(), 'Post 2', 'Content 2', toTimestamp(now()));
APPLY BATCH;
-- Batch with TTL and timestamp
BEGIN BATCH USING TTL 3600 AND TIMESTAMP 1640995200000000
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'temp_user', 'temp@example.com', 'Temp', 'User', toTimestamp(now()));
UPDATE users SET phone = '+1-555-9999'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
APPLY BATCH;
-- Conditional batch
BEGIN BATCH
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'conditional_user', 'conditional@example.com', 'Conditional', 'User', toTimestamp(now()))
IF NOT EXISTS;
UPDATE users SET email = 'conditional_update@example.com'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
IF email = 'old@example.com';
APPLY BATCH;
-- Counter batch
BEGIN COUNTER BATCH
UPDATE page_views SET views = views + 1 WHERE page_id = 'home';
UPDATE page_views SET views = views + 1 WHERE page_id = 'about';
APPLY BATCH;
```
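The counter batch assumes a dedicated counter table: counter columns cannot coexist with regular columns, so apart from the primary key, every column in such a table must be a COUNTER. The page_views table it targets is defined under Best Practices; for reference:

```cql
-- Counter table: all non-primary-key columns must be counters
CREATE TABLE page_views (
    page_id TEXT PRIMARY KEY,
    views COUNTER
);
```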
## Consistency Levels

### Setting Consistency Levels

```cql
-- Show current consistency level
CONSISTENCY;
-- Set consistency level for reads
CONSISTENCY ONE;
CONSISTENCY QUORUM;
CONSISTENCY ALL;
CONSISTENCY LOCAL_QUORUM;
CONSISTENCY EACH_QUORUM;
CONSISTENCY LOCAL_ONE;
-- Set consistency level for writes
CONSISTENCY ONE;
CONSISTENCY QUORUM;
CONSISTENCY ALL;
CONSISTENCY LOCAL_QUORUM;
CONSISTENCY EACH_QUORUM;
-- Consistency levels explanation:
-- ONE: One replica responds
-- QUORUM: Majority of replicas respond
-- ALL: All replicas respond
-- LOCAL_QUORUM: Majority of replicas in local datacenter
-- EACH_QUORUM: Majority of replicas in each datacenter
-- LOCAL_ONE: One replica in local datacenter
-- Example queries with different consistency levels
CONSISTENCY QUORUM;
SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
CONSISTENCY LOCAL_QUORUM;
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'consistent_user', 'consistent@example.com', 'Consistent', 'User', toTimestamp(now()));
-- Serial consistency for lightweight transactions
SERIAL CONSISTENCY SERIAL;
SERIAL CONSISTENCY LOCAL_SERIAL;
-- Lightweight transaction with serial consistency
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'lwt_user', 'lwt@example.com', 'LWT', 'User', toTimestamp(now()))
IF NOT EXISTS;
```
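The arithmetic behind these levels: QUORUM is floor(RF / 2) + 1 replicas, and a read level R paired with a write level W is strongly consistent whenever R + W > RF. A sketch with RF = 3:

```cql
-- RF = 3  =>  QUORUM = floor(3/2) + 1 = 2 replicas
-- Writing and reading at QUORUM: 2 + 2 > 3, so reads see the latest write
CONSISTENCY QUORUM;
INSERT INTO users (user_id, username) VALUES (uuid(), 'quorum_user');
SELECT username FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
```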
## Cluster Management

### Node Operations

```bash
# Check cluster status
nodetool status
# Check node information
nodetool info
# Check ring information
nodetool ring
# Describe cluster
nodetool describecluster
# Check gossip information
nodetool gossipinfo
# Flush memtables to disk
nodetool flush
# Compact SSTables
nodetool compact
# Cleanup after replication changes
nodetool cleanup
# Repair data
nodetool repair
# Repair specific keyspace
nodetool repair mykeyspace
# Repair specific table
nodetool repair mykeyspace mytable
# Incremental repair
nodetool repair -inc
# Full repair
nodetool repair -full
```

### Cluster Monitoring

```bash
# Check node stats
nodetool netstats
# Check compaction stats
nodetool compactionstats
# Check thread pool stats
nodetool tpstats
# Check per-table latency histograms
nodetool tablehistograms mykeyspace.mytable
# Check table stats
nodetool tablestats mykeyspace.mytable
# Check proxyhistograms
nodetool proxyhistograms
# Check gossip state
nodetool gossipinfo
# Check schema version
nodetool describecluster
# Check pending compactions (reported by compactionstats)
nodetool compactionstats
# Check streaming operations
nodetool netstats
```

### Backup and Snapshot

```bash
# Create snapshot
nodetool snapshot
# Create snapshot with name
nodetool snapshot -t backup_20231201
# Create snapshot for specific keyspace
nodetool snapshot mykeyspace
# List snapshots
nodetool listsnapshots
# Clear snapshot
nodetool clearsnapshot
# Clear specific snapshot
nodetool clearsnapshot -t backup_20231201
# Incremental backup (enable in cassandra.yaml)
# incremental_backups: true
# Restore from snapshot
# 1. Stop Cassandra
# 2. Clear data directory
# 3. Copy snapshot files to data directory
# 4. Start Cassandra
# 5. Run nodetool refresh
```
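A concrete version of that restore flow, as a hedged sketch (the data directory layout and snapshot name are illustrative; the real table directory carries a UUID suffix):

```bash
# Restore one table from a snapshot (adjust paths to your installation)
sudo systemctl stop cassandra
DATA_DIR=/var/lib/cassandra/data/mykeyspace/mytable-<table_uuid>
sudo rm -f "$DATA_DIR"/*.db
sudo cp "$DATA_DIR"/snapshots/backup_20231201/*.db "$DATA_DIR"/
sudo systemctl start cassandra
nodetool refresh mykeyspace mytable
```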
### Adding/Removing Nodes

```bash
# Add new node to cluster
# 1. Install Cassandra on new node
# 2. Configure cassandra.yaml with cluster settings
# 3. Set seeds to existing nodes
# 4. Start Cassandra
# 5. Check status: nodetool status
# Resume an interrupted bootstrap on a new node
nodetool bootstrap resume
# Decommission node (graceful removal)
nodetool decommission
# Remove dead node (forced removal)
nodetool removenode <node_id>
# Move node to new token
nodetool move <new_token>
# Drain node before shutdown
nodetool drain
# Stop gossip
nodetool disablegossip
# Start gossip
nodetool enablegossip
# Stop thrift (Cassandra 3.x and earlier; Thrift was removed in 4.0)
nodetool disablethrift
# Start thrift
nodetool enablethrift
```

## Performance Tuning

### Query Optimization

```cql
-- Use appropriate partition keys
-- Good: Query by partition key
SELECT * FROM user_posts WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Bad: Query without partition key (requires ALLOW FILTERING)
SELECT * FROM user_posts WHERE title = 'Some Title' ALLOW FILTERING;
-- Use clustering columns for range queries
SELECT * FROM time_series
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
AND timestamp >= '2023-01-01'
AND timestamp < '2023-02-01';
-- Limit result size
SELECT * FROM users LIMIT 1000;
-- Use token-based pagination
SELECT * FROM users WHERE token(user_id) > token(123e4567-e89b-12d3-a456-426614174000) LIMIT 1000;
-- Avoid ALLOW FILTERING when possible
-- Create secondary index instead
CREATE INDEX ON users (email);
SELECT * FROM users WHERE email = 'john@example.com';
-- Use prepared statements in applications (a driver-side feature;
-- cqlsh has no PREPARE/EXECUTE, so the lines below are pseudo-CQL)
PREPARE get_user AS
    SELECT * FROM users WHERE user_id = ?;
EXECUTE get_user USING 123e4567-e89b-12d3-a456-426614174000;
```
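Within cqlsh itself, paging is handled by the PAGING shell command rather than manual token arithmetic:

```cql
-- cqlsh shell commands (not CQL sent to the server)
PAGING 100;   -- fetch results 100 rows at a time
PAGING OFF;   -- disable paging (use with care on large result sets)
PAGING ON;    -- restore the default page size
```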
### Table Design Optimization

```cql
-- Design tables for your queries
-- Query: Get all posts by user, ordered by date
CREATE TABLE user_posts_by_date (
user_id UUID,
post_date DATE,
post_id TIMEUUID,
title TEXT,
content TEXT,
PRIMARY KEY (user_id, post_date, post_id)
) WITH CLUSTERING ORDER BY (post_date DESC, post_id DESC);
-- Denormalize for read performance
CREATE TABLE user_post_summary (
user_id UUID,
post_count COUNTER,
last_post_date TIMESTAMP,
PRIMARY KEY (user_id)
);
-- Use appropriate compaction strategy
ALTER TABLE large_table WITH compaction = {
'class': 'LeveledCompactionStrategy',
'sstable_size_in_mb': 160
};
-- Use appropriate compression
ALTER TABLE large_table WITH compression = {
'class': 'LZ4Compressor',
'chunk_length_in_kb': 64
};
-- Set appropriate bloom filter
ALTER TABLE large_table WITH bloom_filter_fp_chance = 0.01;
-- Configure caching
ALTER TABLE frequently_read_table WITH caching = {
'keys': 'ALL',
'rows_per_partition': '100'
};
```

### Performance Monitoring

```bash
# Monitor read/write latency
nodetool proxyhistograms

# Monitor compaction
nodetool compactionstats

# Monitor thread pools
nodetool tpstats

# Monitor table statistics
nodetool tablestats

# Monitor JVM garbage collection
nodetool gcstats

# Monitor heap usage
nodetool info

# Enable tracing in CQL:
# TRACING ON; SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000; TRACING OFF;

# Check slow queries (enable in cassandra.yaml):
# slow_query_log_timeout_in_ms: 500

# Monitor with external tools:
# - DataStax OpsCenter
# - Prometheus + Grafana
# - Custom JMX monitoring
```
## Monitoring

```bash
# Cluster overview
nodetool status
nodetool info
nodetool describecluster

# Node health (uptime is included in nodetool info output)
nodetool gossipinfo
nodetool version

# Performance metrics
nodetool tpstats
nodetool cfstats
nodetool proxyhistograms
nodetool tablehistograms mykeyspace.mytable

# Compaction monitoring
nodetool compactionstats
nodetool compactionhistory

# Memory and GC
nodetool gcstats
nodetool info | grep -E "(Heap|Off-heap)"

# Network and streaming
nodetool netstats
nodetool getstreamthroughput

# Pending operations
nodetool compactionstats   # pending compaction tasks
nodetool netstats          # pending streams
```
### JMX Monitoring

```bash
# Enable JMX (in cassandra-env.sh)
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=7199"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=7199"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"

# Connect with JConsole
jconsole localhost:7199

# Key JMX metrics to monitor:
# - org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency
# - org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency
# - org.apache.cassandra.metrics:type=Storage,name=Load
# - org.apache.cassandra.metrics:type=Compaction,name=PendingTasks
# - java.lang:type=Memory
# - java.lang:type=GarbageCollector
```
### Log Monitoring

```bash
# System log location
tail -f /var/log/cassandra/system.log

# Debug log
tail -f /var/log/cassandra/debug.log

# GC log
tail -f /var/log/cassandra/gc.log

# Key log patterns to monitor:
# - "OutOfMemoryError"
# - "Dropping mutation"
# - "Timeout"
# - "Unable to gossip"
# - "Compaction interrupted"

# Configure log levels (DEBUG, INFO, WARN, ERROR) and
# per-package loggers in logback.xml
```
### Application Monitoring

```cql
-- Monitor query performance
TRACING ON;
-- Your queries here
TRACING OFF;

-- Check table size estimates
SELECT * FROM system.size_estimates WHERE keyspace_name = 'mykeyspace';

-- Monitor tombstones (enable in cassandra.yaml):
-- tombstone_warn_threshold: 1000
-- tombstone_failure_threshold: 100000

-- Monitor large partitions (enable in cassandra.yaml):
-- compaction_large_partition_warning_threshold_mb: 100

-- Monitor slow queries (enable in cassandra.yaml):
-- slow_query_log_timeout_in_ms: 500

-- Check system tables for monitoring
SELECT * FROM system.local;
SELECT * FROM system.peers;
SELECT * FROM system_schema.keyspaces;
SELECT * FROM system_schema.tables;
```
## Security

### Authentication

```cql
-- Enable authentication (in cassandra.yaml)
-- authenticator: PasswordAuthenticator

-- The default superuser is cassandra/cassandra; change its password first
ALTER USER cassandra WITH PASSWORD 'new_secure_password';

-- Create new user
CREATE USER app_user WITH PASSWORD 'secure_password' NOSUPERUSER;

-- Create superuser
CREATE USER admin_user WITH PASSWORD 'admin_password' SUPERUSER;

-- Alter user password
ALTER USER app_user WITH PASSWORD 'new_password';

-- List users
LIST USERS;

-- Drop user
DROP USER app_user;
```
### Authorization

```cql
-- Enable authorization (in cassandra.yaml)
-- authorizer: CassandraAuthorizer

-- Grant keyspace-level permissions
GRANT SELECT ON KEYSPACE mykeyspace TO app_user;
GRANT MODIFY ON KEYSPACE mykeyspace TO app_user;
GRANT CREATE ON KEYSPACE mykeyspace TO app_user;
GRANT DROP ON KEYSPACE mykeyspace TO app_user;
GRANT ALTER ON KEYSPACE mykeyspace TO app_user;

-- Grant table-level permissions
GRANT SELECT ON mykeyspace.users TO app_user;
GRANT MODIFY ON mykeyspace.users TO app_user;

-- Grant all permissions
GRANT ALL PERMISSIONS ON KEYSPACE mykeyspace TO app_user;

-- Revoke permissions
REVOKE SELECT ON KEYSPACE mykeyspace FROM app_user;

-- List permissions
LIST ALL PERMISSIONS;
LIST ALL PERMISSIONS OF app_user;

-- Create role (Cassandra 2.2+)
CREATE ROLE app_role;

-- Grant role to user
GRANT app_role TO app_user;

-- Grant permissions to role
GRANT SELECT ON KEYSPACE mykeyspace TO app_role;

-- List roles
LIST ROLES;
```
### SSL/TLS Encryption

```yaml
# Client-to-node encryption (in cassandra.yaml)
client_encryption_options:
  enabled: true
  optional: false
  keystore: /path/to/keystore.jks
  keystore_password: keystore_password
  truststore: /path/to/truststore.jks
  truststore_password: truststore_password
  protocol: TLS
  algorithm: SunX509
  store_type: JKS
  cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA]

# Node-to-node encryption
server_encryption_options:
  internode_encryption: all
  keystore: /path/to/keystore.jks
  keystore_password: keystore_password
  truststore: /path/to/truststore.jks
  truststore_password: truststore_password
  protocol: TLS
  algorithm: SunX509
  store_type: JKS
  cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA]
```
### Network Security

```yaml
# Configure listen addresses (in cassandra.yaml)
listen_address: 10.0.0.1
rpc_address: 10.0.0.1
broadcast_address: 10.0.0.1
broadcast_rpc_address: 10.0.0.1

# Configure ports
storage_port: 7000
ssl_storage_port: 7001
native_transport_port: 9042
rpc_port: 9160

# Enable native protocol SSL
native_transport_port_ssl: 9142

# Firewall rules (example for iptables):
# allow Cassandra ports from cluster nodes only
# iptables -A INPUT -p tcp --dport 7000 -s 10.0.0.0/24 -j ACCEPT
# iptables -A INPUT -p tcp --dport 7001 -s 10.0.0.0/24 -j ACCEPT
# iptables -A INPUT -p tcp --dport 9042 -s 10.0.0.0/24 -j ACCEPT
```
## Best Practices

### Data Modeling Best Practices

```cql
-- Design tables around queries, not entities
-- Query: Get user's posts ordered by date
CREATE TABLE user_posts (
    user_id UUID,
    post_date DATE,
    post_id TIMEUUID,
    title TEXT,
    content TEXT,
    PRIMARY KEY (user_id, post_date, post_id)
) WITH CLUSTERING ORDER BY (post_date DESC, post_id DESC);

-- Denormalize for performance: instead of joins, duplicate data
CREATE TABLE post_comments (
    post_id UUID,
    comment_id TIMEUUID,
    user_id UUID,
    username TEXT,  -- Denormalized from users table
    comment TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (post_id, comment_id)
);

-- Use appropriate partition sizes
-- Aim for 10MB-100MB partitions
-- Avoid hotspots with good partition key distribution

-- Use time-based partitioning for time series data
CREATE TABLE sensor_data (
    sensor_id UUID,
    year INT,
    month INT,
    day INT,
    timestamp TIMESTAMP,
    temperature DOUBLE,
    humidity DOUBLE,
    PRIMARY KEY ((sensor_id, year, month, day), timestamp)
);

-- Use counters appropriately
CREATE TABLE page_views (
    page_id TEXT PRIMARY KEY,
    views COUNTER
);

-- Avoid large collections
-- Use separate tables instead of large lists/sets/maps
```
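The composite partition key in sensor_data above buckets each sensor's readings by day, keeping partitions bounded; queries must then address the bucket explicitly with every partition key column:

```cql
-- Read part of one day's bucket for a sensor
SELECT timestamp, temperature, humidity FROM sensor_data
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
  AND year = 2023 AND month = 6 AND day = 15
  AND timestamp >= '2023-06-15 08:00:00';
```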
### Query Best Practices

```cql
-- Always include partition key in WHERE clause
-- Good
SELECT * FROM user_posts WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Bad (requires ALLOW FILTERING)
SELECT * FROM user_posts WHERE title = 'Some Title' ALLOW FILTERING;

-- Use clustering columns for range queries
SELECT * FROM user_posts
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
AND post_date >= '2023-01-01' AND post_date <= '2023-12-31';

-- Use LIMIT to control result size
SELECT * FROM users LIMIT 1000;

-- Use prepared statements in applications
-- (reduces parsing overhead and prevents injection attacks)

-- Avoid SELECT * in production; select only needed columns
SELECT user_id, username, email FROM users;

-- Use appropriate consistency levels:
-- QUORUM for critical reads/writes
-- ONE for non-critical operations
-- LOCAL_QUORUM for multi-datacenter setups
```
### Performance Best Practices

```cql
-- Choose appropriate compaction strategy:
-- SizeTieredCompactionStrategy: write-heavy workloads
-- LeveledCompactionStrategy: read-heavy workloads
-- TimeWindowCompactionStrategy: time series data
ALTER TABLE time_series WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
};

-- Configure appropriate caching
ALTER TABLE frequently_read_table WITH caching = {
    'keys': 'ALL',
    'rows_per_partition': '100'
};

-- Use compression for large tables
ALTER TABLE large_table WITH compression = {
    'class': 'LZ4Compressor',
    'chunk_length_in_kb': 64
};

-- Monitor and tune bloom filters
ALTER TABLE large_table WITH bloom_filter_fp_chance = 0.01;

-- Use TTL for temporary data
INSERT INTO session_data (session_id, user_id, data, created_at)
VALUES (uuid(), 123e4567-e89b-12d3-a456-426614174000, 'session_data', toTimestamp(now()))
USING TTL 3600; -- 1 hour TTL
```
### Operational Best Practices

```bash
# Regular maintenance

# Run repair regularly (weekly for critical data)
nodetool repair -inc

# Monitor compaction
nodetool compactionstats

# Clean up after topology changes
nodetool cleanup

# Monitor disk space
df -h

# Monitor logs
tail -f /var/log/cassandra/system.log

# Backup strategy: take snapshots before major changes
nodetool snapshot -t pre_upgrade_backup

# Monitor cluster health
nodetool status
nodetool describecluster

# Capacity planning:
# - Monitor partition sizes
# - Plan for 3x growth
# - Monitor read/write patterns

# Security practices:
# - Change default passwords
# - Use SSL/TLS encryption
# - Implement proper authentication/authorization
# - Regular security updates
# - Network segmentation
# - Monitor access logs
```
### Development Best Practices

```cql
-- Use UUIDs for primary keys
CREATE TABLE orders (
    order_id UUID PRIMARY KEY,
    customer_id UUID,
    order_date TIMESTAMP,
    total_amount DECIMAL
);

-- Use TIMEUUID for time-ordered data
CREATE TABLE events (
    event_id TIMEUUID PRIMARY KEY,
    event_type TEXT,
    event_data TEXT,
    created_at TIMESTAMP
);

-- Handle null values appropriately
-- Cassandra treats null as "no value"
-- Use default values or application logic

-- Use batch operations carefully
-- Only for related data in the same partition
-- Avoid large batches (>100 statements)

-- Test with realistic data volumes
-- Performance characteristics change with scale
-- Test partition sizes and query patterns

-- Monitor application metrics:
-- Track query latency
-- Monitor connection pool usage
-- Track error rates
-- Monitor timeout exceptions
```