Anubis
Overview
Anubis is an open-source Web AI Firewall and anti-scraping reverse proxy that protects upstream resources from AI crawlers, scraper bots, and automated threats. It implements proof-of-work (SHA-256) challenges delivered via JavaScript to verify that requests come from legitimate browsers rather than AI crawlers or bot networks.
Xe Iaso created Anubis after Amazon's crawlers overloaded their Git server and exhausted its resources. Written in Go, it is a lightweight protection layer that sits between user traffic and your application and filters automated access before it reaches the upstream.
GitHub: TecharoHQ/anubis
License: MIT
Built With: Go, JavaScript
Installation
Prerequisites
- Go 1.19+ or Docker
- Upstream server to protect
- TLS certificates (for HTTPS protection)
Build from Source
# Clone repository
git clone https://github.com/TecharoHQ/anubis.git
cd anubis
# Build binary
go build -o anubis ./cmd/anubis
# Verify installation
./anubis --version
Docker Installation
# Pull Docker image
docker pull techarohq/anubis:latest
# Run container
docker run -d \
-p 8080:8080 \
-e UPSTREAM_URL=http://backend:3000 \
techarohq/anubis:latest
Docker Compose Setup
version: '3.8'
services:
  anubis:
    image: techarohq/anubis:latest
    ports:
      - "8080:8080"
      - "8443:8443"
    environment:
      UPSTREAM_URL: http://backend:3000
      ENABLE_HTTPS: "true"
      CHALLENGE_DIFFICULTY: "medium"
      LOG_LEVEL: "info"
    volumes:
      - ./certs:/etc/anubis/certs
    restart: unless-stopped
  backend:
    image: myapp:latest
    expose:
      - "3000"
System Service (Linux)
# Copy binary to system location
sudo cp anubis /usr/local/bin/
# Create systemd service
sudo tee /etc/systemd/system/anubis.service > /dev/null << 'EOF'
[Unit]
Description=Anubis Web AI Firewall
After=network.target
[Service]
Type=simple
User=anubis
ExecStart=/usr/local/bin/anubis -config /etc/anubis/config.yaml
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable anubis
sudo systemctl start anubis
Configuration
Basic Configuration
# config.yaml
server:
  listen: ":8080"
  read_timeout: 30s
  write_timeout: 30s
  idle_timeout: 60s
upstream:
  url: "http://localhost:3000"
  timeout: 30s
  max_idle_conns: 100
challenge:
  enabled: true
  difficulty: "medium"
  timeout: 300s
  cache_size: 10000
logging:
  level: "info"
  format: "json"
  output: "stdout"
Environment Variables
# Core settings
UPSTREAM_URL=http://localhost:3000
LISTEN_ADDR=:8080
LOG_LEVEL=info
# Challenge settings
CHALLENGE_ENABLED=true
CHALLENGE_DIFFICULTY=medium
CHALLENGE_TIMEOUT=300
# Performance settings
MAX_IDLE_CONNS=100
REQUEST_TIMEOUT=30
CACHE_SIZE=10000
Core Commands
| Command | Purpose | Example |
|---|---|---|
| anubis | Start with default config | anubis |
| anubis -config | Start with custom config | anubis -config /etc/anubis/config.yaml |
| anubis -upstream | Set upstream URL | anubis -upstream http://app:3000 |
| anubis -listen | Set listen address | anubis -listen :8443 |
| anubis -help | Show help | anubis -help |
| anubis -version | Show version | anubis -version |
Proof-of-Work Challenge System
How Challenges Work
Anubis challenges requests with a SHA-256 proof-of-work puzzle:
- The browser receives the challenge as HTML/JavaScript
- Client-side JavaScript computes SHA-256 hashes over candidate nonces
- Once a valid nonce is found (one whose hash meets the difficulty target), the request continues
- The server validates the proof-of-work before proxying to the upstream
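The steps above can be sketched in a few lines of Node.js. This is an illustration rather than Anubis's actual implementation: the names `hashAttempt`, `solve`, and `verify` are hypothetical, and both the `challenge + nonce` concatenation and the "leading hex zeros" difficulty measure are assumptions made for the example.

```javascript
const { createHash } = require("node:crypto");

// Hex SHA-256 of challenge + nonce (the concatenation scheme is an assumption).
function hashAttempt(challenge, nonce) {
  return createHash("sha256").update(challenge + String(nonce)).digest("hex");
}

// Client side: try nonces in order until the hash starts with `zeros` hex zeros.
function solve(challenge, zeros) {
  const prefix = "0".repeat(zeros);
  for (let nonce = 0; ; nonce++) {
    const hash = hashAttempt(challenge, nonce);
    if (hash.startsWith(prefix)) return { nonce, hash };
  }
}

// Server side: a single hash recomputation is enough to check the work.
function verify(challenge, nonce, zeros) {
  return hashAttempt(challenge, nonce).startsWith("0".repeat(zeros));
}

const proof = solve("example-challenge", 3);
console.log(proof.nonce, verify("example-challenge", proof.nonce, 3));
```

The asymmetry is the point of the scheme: the client performs many hash attempts, while the server checks the result with one.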
Challenge Difficulty Levels
challenge:
  difficulty: easy    # ~0.5 seconds CPU (home connections)
  # OR
  difficulty: medium  # ~2 seconds CPU (default, bots filtered)
  # OR
  difficulty: hard    # ~10 seconds CPU (heavy protection)
Challenge Response Example
// Browser receives challenge
{
  "challenge": "find_nonce_for_this_hash",
  "target": "00001234abcd...",
  "difficulty": "medium"
}
// Browser solves and returns
{
  "challenge": "...",
  "nonce": "12345",
  "proof": "valid_sha256_hash"
}
Request Flow
Normal Browser Request
┌─────────────┐
│   Browser   │
└──────┬──────┘
       │ HTTP GET /page
       ▼
┌──────────────────┐
│ Anubis Firewall  │ ◄─── JavaScript Challenge
├──────────────────┤      (SHA-256 PoW)
│ Challenge System │
│ Cache            │
└──────┬───────────┘
       │ HTTP Request (with PoW token)
       ▼
┌──────────────────┐
│ Upstream Server  │
└──────────────────┘
Blocked AI Crawler Request
┌─────────────┐
│ AI Crawler  │
└──────┬──────┘
       │ HTTP GET /page
       ▼
┌──────────────────┐
│ Anubis Firewall  │
├──────────────────┤
│ JavaScript       │
│ Not Executed ✗   │
└──────────────────┘
       ▼
403 Forbidden (PoW Required)
Advanced Configuration
Rate Limiting Integration
rate_limit:
  enabled: true
  requests_per_second: 100
  burst: 10
  per_ip: true
challenge:
  difficulty: medium
  # Higher difficulty for repeated failures
  escalate_on_failure: true
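The `requests_per_second` and `burst` settings read like token-bucket parameters. Whether Anubis implements rate limiting this way is an assumption. The sketch below shows how a per-IP token bucket with those two values would behave. `TokenBucket` and `allowRequest` are hypothetical names, and timestamps are passed in explicitly so the behaviour is deterministic.

```javascript
// Token bucket: refills at `ratePerSecond`, holds at most `burst` tokens.
class TokenBucket {
  constructor(ratePerSecond, burst, nowMs = 0) {
    this.rate = ratePerSecond;
    this.capacity = burst;
    this.tokens = burst; // start full so an initial burst is allowed
    this.last = nowMs;
  }

  // Returns true and spends one token if the request is allowed at `nowMs`.
  allow(nowMs) {
    const elapsedSeconds = Math.max(0, nowMs - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.rate);
    this.last = Math.max(this.last, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// per_ip: true corresponds to keeping one bucket per client address.
const buckets = new Map();
function allowRequest(ip, nowMs, ratePerSecond = 100, burst = 10) {
  if (!buckets.has(ip)) buckets.set(ip, new TokenBucket(ratePerSecond, burst, nowMs));
  return buckets.get(ip).allow(nowMs);
}

console.log(allowRequest("203.0.113.5", Date.now()));
```

With `burst: 10`, a client can send ten requests back to back before it is throttled to the steady refill rate.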
Custom Challenge Difficulty
challenge:
  difficulty: "custom"
  custom_difficulty_bits: 18 # Adjust PoW difficulty in bits
  timeout: 600
  # Difficulty scaling based on time of day
  schedules:
    - time: "08:00-18:00"
      difficulty: "easy"
    - time: "18:00-08:00"
      difficulty: "hard"
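Two details of this configuration are easy to get wrong: a window such as `18:00-08:00` wraps past midnight, and each extra difficulty bit doubles the expected work. The sketch below illustrates both. `toMinutes`, `difficultyAt`, and `expectedAttempts` are hypothetical helpers, and treating windows as start-inclusive and end-exclusive is an assumption.

```javascript
// Convert "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [hours, minutes] = hhmm.split(":").map(Number);
  return hours * 60 + minutes;
}

// Return the difficulty of the first schedule window containing `hhmm`,
// or `fallback` if no window matches. Windows may wrap past midnight.
function difficultyAt(schedules, hhmm, fallback) {
  const now = toMinutes(hhmm);
  for (const { time, difficulty } of schedules) {
    const [start, end] = time.split("-").map(toMinutes);
    const inWindow = start <= end
      ? now >= start && now < end   // same-day window
      : now >= start || now < end;  // window that crosses midnight
    if (inWindow) return difficulty;
  }
  return fallback;
}

// A uniformly random hash has `bits` leading zero bits with probability
// 2^-bits, so the expected number of attempts is 2^bits.
function expectedAttempts(bits) {
  return 2 ** bits;
}

const schedules = [
  { time: "08:00-18:00", difficulty: "easy" },
  { time: "18:00-08:00", difficulty: "hard" },
];
console.log(difficultyAt(schedules, "23:15", "medium"), expectedAttempts(18));
```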
Whitelist & Blacklist
acl:
  whitelist:
    - "203.0.113.0/24" # Trusted networks
    - "user-agent:GoogleBot" # Legitimate crawlers
  blacklist:
    - "1.2.3.4" # Known bad IPs
    - "user-agent:BadBot" # Known malicious bots
  # Whitelist never challenges
  # Blacklist always blocked
  challenges_required_for_others: true
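To make the matching rules concrete, here is a minimal sketch of how such entries could be evaluated. It is an illustration with hypothetical names (`ipToInt`, `matchesEntry`, `decide`). It assumes IPv4 only, does no input validation, matches User-Agent entries as case-insensitive substrings, and checks the blacklist before the whitelist; none of that is confirmed Anubis behaviour.

```javascript
// Convert a dotted IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// Match one ACL entry: "user-agent:<text>", a single IP, or a CIDR range.
function matchesEntry(entry, ip, userAgent) {
  if (entry.startsWith("user-agent:")) {
    const needle = entry.slice("user-agent:".length).toLowerCase();
    return userAgent.toLowerCase().includes(needle);
  }
  const [network, bitsText] = entry.split("/");
  const bits = bitsText === undefined ? 32 : Number(bitsText);
  // Shifting by 32 is a no-op in JavaScript, so /0 needs its own case.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(network) & mask) >>> 0);
}

// Blacklist wins over whitelist (an assumed precedence); everyone else is challenged.
function decide(acl, ip, userAgent) {
  if (acl.blacklist.some((entry) => matchesEntry(entry, ip, userAgent))) return "block";
  if (acl.whitelist.some((entry) => matchesEntry(entry, ip, userAgent))) return "allow";
  return "challenge";
}

const acl = {
  whitelist: ["203.0.113.0/24", "user-agent:GoogleBot"],
  blacklist: ["1.2.3.4", "user-agent:BadBot"],
};
console.log(decide(acl, "203.0.113.77", "Mozilla/5.0"));
```

User-Agent strings are trivially spoofed, so a production setup would normally pair a User-Agent allow rule with an IP range check.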
HTTPS/TLS Configuration
server:
  listen: ":8443"
  use_tls: true
  tls:
    cert_file: "/etc/anubis/cert.pem"
    key_file: "/etc/anubis/key.pem"
    # Auto-renew with Let's Encrypt
    auto_renew: true
    acme_email: "admin@example.com"
Deployment Examples
Nginx Reverse Proxy + Anubis
upstream anubis {
    server localhost:8080;
}
server {
    listen 80;
    server_name example.com;
    location / {
        proxy_pass http://anubis;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: anubis
spec:
  replicas: 3
  selector:
    matchLabels:
      app: anubis
  template:
    metadata:
      labels:
        app: anubis
    spec:
      containers:
        - name: anubis
          image: techarohq/anubis:latest
          ports:
            - containerPort: 8080
          env:
            - name: UPSTREAM_URL
              value: "http://backend-service:3000"
            - name: CHALLENGE_DIFFICULTY
              value: "medium"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
Docker Swarm Deployment
# Create Anubis service
docker service create \
--name anubis \
--publish 8080:8080 \
--replicas 3 \
--env UPSTREAM_URL=http://backend:3000 \
--env CHALLENGE_DIFFICULTY=medium \
techarohq/anubis:latest
# Scale service
docker service scale anubis=5
Monitoring & Observability
Health Check Endpoint
# Check Anubis status
curl http://localhost:8080/health
# Response
{
  "status": "healthy",
  "upstream_healthy": true,
  "challenges_served": 1542,
  "challenges_solved": 1389,
  "uptime_seconds": 86400
}
Metrics Endpoint
# Prometheus-compatible metrics
curl http://localhost:8080/metrics
# Output includes:
# anubis_requests_total
# anubis_challenges_issued
# anubis_challenges_solved
# anubis_upstream_latency_ms
# anubis_bot_requests_blocked
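Counters like these are most useful as ratios. The sketch below parses Prometheus text-format output and derives a challenge solve rate from the two metric names listed above. `parseMetrics` and `solveRate` are hypothetical helpers, and the parser assumes sample lines of the form `name value` or `name{labels} value` with no trailing timestamp.

```javascript
// Parse Prometheus text exposition lines into a Map of name -> numeric value.
// Comment lines (# HELP, # TYPE) and blank lines are skipped.
function parseMetrics(text) {
  const metrics = new Map();
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const idx = trimmed.lastIndexOf(" ");
    if (idx === -1) continue; // no value present on this line
    const name = trimmed.slice(0, idx).trim();
    const value = Number(trimmed.slice(idx + 1));
    if (!Number.isNaN(value)) metrics.set(name, value);
  }
  return metrics;
}

// Fraction of issued challenges that were solved (0 when none were issued).
function solveRate(metrics) {
  const issued = metrics.get("anubis_challenges_issued") || 0;
  const solved = metrics.get("anubis_challenges_solved") || 0;
  return issued === 0 ? 0 : solved / issued;
}

const sample = [
  "# TYPE anubis_challenges_issued counter",
  "anubis_challenges_issued 1542",
  "anubis_challenges_solved 1389",
].join("\n");
console.log(solveRate(parseMetrics(sample)).toFixed(3));
```

A sudden drop in this ratio suggests that more non-browser clients are hitting the challenge page.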
Logging Configuration
logging:
  level: "info"
  format: "json"
  # Log to file
  file:
    enabled: true
    path: "/var/log/anubis/anubis.log"
    max_size_mb: 100
    max_backups: 10
  # Structured logging
  fields:
    request_id: true
    user_agent: true
    remote_ip: true
    response_time: true
    upstream_latency: true
Performance Tuning
Connection Pooling
upstream:
  max_idle_conns: 200 # Increased from default
  max_conns_per_host: 100
  idle_conn_timeout: 90s
Challenge Caching
challenge:
  cache_size: 50000 # More cache for frequent users
  cache_ttl: 3600s
  cache_backend: "redis" # Optional: use Redis for distributed deployments
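The interaction between `cache_size` and `cache_ttl` can be illustrated with a small in-memory cache that is bounded by both entry count and age. This is only a sketch with a hypothetical `TtlCache` class. Evicting the oldest inserted entry when the cache is full is an assumed policy, and timestamps are passed in explicitly to keep the example deterministic.

```javascript
// Size-bounded cache whose entries expire `ttlMs` after they are written.
class TtlCache {
  constructor(maxSize, ttlMs) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.entries = new Map(); // Map preserves insertion order
  }

  set(key, value, nowMs) {
    this.entries.delete(key); // re-insert so the key becomes the newest entry
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs });
    if (this.entries.size > this.maxSize) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }

  get(key, nowMs) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (nowMs >= entry.expiresAt) {
      this.entries.delete(key); // drop expired entries lazily on read
      return undefined;
    }
    return entry.value;
  }
}

// Mirrors cache_size: 50000 and cache_ttl: 3600s from the config above.
const solved = new TtlCache(50000, 3600 * 1000);
solved.set("client-token", true, Date.now());
console.log(solved.get("client-token", Date.now()));
```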
Request Optimization
server:
  read_timeout: 20s
  write_timeout: 20s
  idle_timeout: 30s
  # Gzip compression
  gzip:
    enabled: true
    level: 6
    min_size: 1024
Bot Detection & Blocking
User-Agent Based Rules
bot_detection:
  block_headless_browsers: true
  block_curl_wget: true
  block_agents:
    - "python-requests"
    - "scrapy"
    - "selenium"
    - "beautifulsoup"
    - "mechanize"
  allow_agents:
    - "googlebot"
    - "bingbot"
    - "applebot"
Behavioral Analysis
bot_detection:
  # Detect non-human-like behavior
  require_js_execution: true
  detect_headless: true
  # Patterns that trigger challenges
  patterns:
    rapid_requests: "10/second"
    sequential_urls: true
    missing_referer: true
    suspicious_headers: true
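A threshold such as `rapid_requests: "10/second"` can be read as "more than 10 requests from one client within any one-second window". The sketch below implements that reading with a sliding window. `parseRate` and `RapidRequestDetector` are hypothetical names, and both the supported units and the "strictly more than the limit" rule are assumptions.

```javascript
// Parse a spec like "10/second" into a request limit and a window length.
function parseRate(spec) {
  const [countText, unit] = spec.split("/");
  const windowMs = { second: 1000, minute: 60000, hour: 3600000 }[unit];
  const limit = Number(countText);
  if (windowMs === undefined || !Number.isInteger(limit) || limit <= 0) {
    throw new Error(`invalid rate spec: ${spec}`);
  }
  return { limit, windowMs };
}

// Tracks recent request timestamps per client in a sliding window.
class RapidRequestDetector {
  constructor(spec) {
    const { limit, windowMs } = parseRate(spec);
    this.limit = limit;
    this.windowMs = windowMs;
    this.seen = new Map();
  }

  // Record one request; returns true when the client exceeds the limit.
  record(clientId, nowMs) {
    const recent = (this.seen.get(clientId) || []).filter(
      (timestamp) => nowMs - timestamp < this.windowMs
    );
    recent.push(nowMs);
    this.seen.set(clientId, recent);
    return recent.length > this.limit;
  }
}

const detector = new RapidRequestDetector("10/second");
console.log(detector.record("203.0.113.5", Date.now()));
```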
IP Reputation Integration
threat_intelligence:
  enabled: true
  # External threat feeds
  sources:
    - "abuseipdb"
    - "maxmind"
    - "custom_internal_feed"
  # Action on known bad IPs
  actions:
    reputation_score_above: 50
    action: "block"
Integration with Applications
Passing Challenge Info to Backend
headers_to_upstream:
  x-anubis-challenge-solved: true
  x-anubis-solved-timestamp: "2024-01-15T10:30:00Z"
  x-anubis-client-ip: true
  x-anubis-difficulty-level: true
Custom Headers in JavaScript
// Browser JavaScript can access challenge info
const challengeInfo = {
  solved: true,
  difficulty: "medium",
  duration_ms: 1234
};
// Send with next request
const url = "/api/example"; // hypothetical endpoint, replace with your own
fetch(url, {
  headers: {
    // Header values are strings, so convert the number explicitly
    'X-Challenge-Duration': String(challengeInfo.duration_ms)
  }
});
Troubleshooting
High CPU Usage
# Reduce challenge difficulty
challenge:
  difficulty: "easy"
# Or increase cache size
challenge:
  cache_size: 100000
# Check upstream performance
upstream:
  timeout: 60s # Increase if backend slow
Challenge Failures
# Enable debug logging
LOG_LEVEL=debug anubis
# Check JavaScript delivery
curl -v http://localhost:8080/
# Verify challenge endpoint
curl http://localhost:8080/challenge/verify
Backend Timeout Issues
upstream:
  timeout: 60s
  keepalive_timeout: 120s
server:
  read_timeout: 30s
  write_timeout: 30s
Redis Cache Issues (if enabled)
# Check Redis connection
redis-cli PING
# Monitor cache
redis-cli MONITOR
# Clear cache if needed
redis-cli FLUSHDB
Best Practices
Security
- Always use HTTPS for production deployments
- Whitelist legitimate crawlers if needed (e.g., Google, Bing)
- Monitor challenge metrics for anomalies
- Rotate TLS certificates regularly
- Keep the upstream address private; don’t expose it in logs or error messages
Performance
- Use Redis for distributed deployments
- Implement proper health checks in load balancers
- Cache aggressively - challenges are stateless
- Monitor upstream latency - Anubis is lightweight
- Scale horizontally - stateless design supports it
Operations
# Monitor challenge rate
curl http://localhost:8080/metrics | grep challenges
# Check error rates
curl http://localhost:8080/metrics | grep errors
# Validate config before deployment
anubis -config config.yaml -validate
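FAQ
Q: Does Anubis block legitimate users?
A: Generally no. Browsers with JavaScript enabled solve the challenge automatically (about 2 seconds of CPU time at medium difficulty). Clients that do not execute JavaScript, such as CLI tools and simple scrapers, cannot pass.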
Q: What about accessibility?
A: Implement fallback mechanisms for users who can’t complete challenges (optional contact form).
Q: Can it block specific content?
A: Anubis only protects with PoW challenges. Use WAF/firewall rules for content blocking.
Q: Performance impact?
A: Minimal (~1-2ms latency). Challenge computation happens client-side.
Resources
- GitHub: https://github.com/TecharoHQ/anubis
- Documentation: https://anubis.techaro.lol
- Issue Tracker: https://github.com/TecharoHQ/anubis/issues
- Discussions: https://github.com/TecharoHQ/anubis/discussions
Related Tools
- Cloudflare Challenges (proprietary)
- AWS WAF (managed service)
- Nginx ModSecurity (open-source WAF)
- Datadome (bot management)