Anubis
Overview
Anubis is an open-source Web AI Firewall and anti-scraping reverse proxy that protects upstream resources from AI crawlers, scraper bots, and automated threats. It implements proof-of-work (SHA-256) challenges delivered via JavaScript to verify that requests come from legitimate browsers rather than AI crawlers or bot networks.
Created by Xe Iaso after experiencing significant resource exhaustion when Amazon crawlers overloaded their Git server, Anubis provides a lightweight, efficient protection layer written in Go. It sits between user traffic and your application, transparently filtering malicious automated access.
GitHub: TecharoHQ/anubis
License: MIT
Built With: Go, JavaScript
Installation
Prerequisites
- Go 1.19+ or Docker
- Upstream server to protect
- TLS certificates (for HTTPS protection)
Build from Source
# Clone repository
git clone https://github.com/TecharoHQ/anubis.git
cd anubis
# Build binary
go build -o anubis ./cmd/anubis
# Verify installation
./anubis --version
Docker Installation
# Pull Docker image
docker pull techarohq/anubis:latest
# Run container
docker run -d \
  -p 8080:8080 \
  -e UPSTREAM_URL=http://backend:3000 \
  techarohq/anubis:latest
Docker Compose Setup
version: '3.8'
services:
  anubis:
    image: techarohq/anubis:latest
    ports:
      - "8080:8080"
      - "8443:8443"
    environment:
      UPSTREAM_URL: http://backend:3000
      ENABLE_HTTPS: "true"
      CHALLENGE_DIFFICULTY: "medium"
      LOG_LEVEL: "info"
    volumes:
      - ./certs:/etc/anubis/certs
    restart: unless-stopped
  backend:
    image: myapp:latest
    expose:
      - "3000"
System Service (Linux)
# Copy binary to system location
sudo cp anubis /usr/local/bin/
# Create systemd service
sudo tee /etc/systemd/system/anubis.service > /dev/null << 'EOF'
[Unit]
Description=Anubis Web AI Firewall
After=network.target
[Service]
Type=simple
User=anubis
ExecStart=/usr/local/bin/anubis -config /etc/anubis/config.yaml
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable anubis
sudo systemctl start anubis
Configuration
Basic Configuration
# config.yaml
server:
  listen: ":8080"
  read_timeout: 30s
  write_timeout: 30s
  idle_timeout: 60s
upstream:
  url: "http://localhost:3000"
  timeout: 30s
  max_idle_conns: 100
challenge:
  enabled: true
  difficulty: "medium"
  timeout: 300s
  cache_size: 10000
logging:
  level: "info"
  format: "json"
  output: "stdout"
Environment Variables
# Core settings
UPSTREAM_URL=http://localhost:3000
LISTEN_ADDR=:8080
LOG_LEVEL=info
# Challenge settings
CHALLENGE_ENABLED=true
CHALLENGE_DIFFICULTY=medium
CHALLENGE_TIMEOUT=300
# Performance settings
MAX_IDLE_CONNS=100
REQUEST_TIMEOUT=30
CACHE_SIZE=10000
Core Commands
| Command | Purpose | Example |
|---|---|---|
| anubis | Start with default config | anubis |
| anubis -config | Start with custom config | anubis -config /etc/anubis/config.yaml |
| anubis -upstream | Set upstream URL | anubis -upstream http://app:3000 |
| anubis -listen | Set listen address | anubis -listen :8443 |
| anubis -help | Show help | anubis -help |
| anubis -version | Show version | anubis -version |
Proof-of-Work Challenge System
How Challenges Work
Anubis challenges requests with a SHA-256 proof-of-work puzzle:
- Browser receives challenge HTML/JavaScript
- Client-side JavaScript computes SHA-256 hashes
- Once valid nonce found (matching difficulty), request continues
- Server validates proof-of-work before proxying
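The steps above can be sketched in a few lines. This is a minimal illustration, assuming difficulty is measured in leading zero bits and that the hashed input is the challenge string followed by a decimal nonce; Anubis's actual encoding may differ, and the names `solve`, `verify`, and `leading_zero_bits` are hypothetical.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() is the position of the highest set bit in this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check a claimed nonce."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force nonces until the hash meets the difficulty."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    nonce = solve("example-challenge", 12)
    print("nonce:", nonce, "valid:", verify("example-challenge", nonce, 12))
```

The asymmetry is the point of the design: the client performs many hashes on average, while the server verifies the result with a single hash.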
Challenge Difficulty Levels
challenge:
  difficulty: medium     # ~2 seconds CPU (default, bots filtered)
  # Alternatives (set exactly one value):
  # difficulty: easy     # ~0.5 seconds CPU (home connections)
  # difficulty: hard     # ~10 seconds CPU (heavy protection)
Challenge Response Example
// Browser receives challenge
{
  "challenge": "find_nonce_for_this_hash",
  "target": "00001234abcd...",
  "difficulty": "medium"
}
// Browser solves and returns
{
  "challenge": "...",
  "nonce": "12345",
  "proof": "valid_sha256_hash"
}
Request Flow
Normal Browser Request
┌─────────────┐
│   Browser   │
└──────┬──────┘
       │ HTTP GET /page
       ▼
┌──────────────────┐
│ Anubis Firewall  │ ◄─── JavaScript Challenge
├──────────────────┤      (SHA-256 PoW)
│ Challenge System │
│ Cache            │
└──────┬───────────┘
       │ HTTP Request (with PoW token)
       ▼
┌──────────────────┐
│ Upstream Server  │
└──────────────────┘
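The "PoW token" in the diagram lets a client that has already solved a challenge skip it on later requests. One generic way to build such a token is an HMAC-signed value with an expiry. The sketch below illustrates that idea only; it is not Anubis's actual token format, and `issue_token`, `check_token`, and the `client_id|expiry|signature` layout are hypothetical.

```python
import hashlib
import hmac


def issue_token(secret: bytes, client_id: str, expires_at: int) -> str:
    """Sign "client_id|expires_at" so the server can later trust it statelessly."""
    payload = f"{client_id}|{expires_at}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def check_token(secret: bytes, token: str, client_id: str, now: int) -> bool:
    """Accept only an unexpired token with a valid signature for this client."""
    try:
        token_client, expires_str, sig = token.rsplit("|", 2)
        expires_at = int(expires_str)
    except ValueError:
        return False  # malformed token
    payload = f"{token_client}|{expires_str}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the signature through timing differences.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    return token_client == client_id and now < expires_at


if __name__ == "__main__":
    secret = b"change-me"
    token = issue_token(secret, "203.0.113.7", expires_at=1_700_003_600)
    print(check_token(secret, token, "203.0.113.7", now=1_700_000_000))
```

Because the signature covers both the client identifier and the expiry, no server-side session store is needed to validate the token.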
Blocked AI Crawler Request
┌─────────────┐
│ AI Crawler  │
└──────┬──────┘
       │ HTTP GET /page
       ▼
┌──────────────────┐
│ Anubis Firewall  │
├──────────────────┤
│ JavaScript       │
│ Not Executed ✗   │
└──────────────────┘
       ▼
403 Forbidden (PoW Required)
Advanced Configuration
Rate Limiting Integration
rate_limit:
  enabled: true
  requests_per_second: 100
  burst: 10
  per_ip: true
challenge:
  difficulty: medium
  # Higher difficulty for repeated failures
  escalate_on_failure: true
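Settings such as `requests_per_second`, `burst`, and `per_ip` are commonly implemented with a token bucket per client address. The following is a minimal sketch of that technique under this assumption; it is not taken from the Anubis source, and `TokenBucket` and `PerIPLimiter` are hypothetical names.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so a new client may burst
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerIPLimiter:
    """Keeps one bucket per client IP (the `per_ip: true` behaviour)."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.buckets = {}

    def allow(self, ip: str, now: float) -> bool:
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = self.buckets[ip] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = PerIPLimiter(rate=100, burst=10)
    results = [limiter.allow("203.0.113.7", now=0.0) for _ in range(12)]
    print(results.count(True), "allowed,", results.count(False), "rejected")
```

Timestamps are passed in explicitly to keep the sketch deterministic; a real deployment would pass `time.monotonic()`.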
Custom Challenge Difficulty
challenge:
  difficulty: "custom"
  custom_difficulty_bits: 18  # Adjust PoW difficulty in bits
  timeout: 600
  # Difficulty scaling based on time of day
  schedules:
    - time: "08:00-18:00"
      difficulty: "easy"
    - time: "18:00-08:00"
      difficulty: "hard"
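Two details of this configuration benefit from a worked example: how much work a bit count implies, and how a window such as "18:00-08:00" that wraps past midnight can be matched. The sketch below assumes each difficulty bit halves the chance that a hash qualifies, so about 2^bits hashes are needed on average, and that windows include their start time and exclude their end time. The helper names are hypothetical.

```python
from datetime import time


def expected_hashes(bits: int) -> int:
    """Average number of hashes needed to find `bits` leading zero bits."""
    return 2 ** bits


def in_window(now: time, window: str) -> bool:
    """Check whether `now` falls in an "HH:MM-HH:MM" window."""
    start_str, end_str = window.split("-")
    start, end = time.fromisoformat(start_str), time.fromisoformat(end_str)
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, e.g. 18:00-08:00.
    return now >= start or now < end


def difficulty_at(now: time, schedules: list, default: str = "medium") -> str:
    """Return the difficulty of the first schedule entry matching `now`."""
    for entry in schedules:
        if in_window(now, entry["time"]):
            return entry["difficulty"]
    return default


if __name__ == "__main__":
    schedules = [
        {"time": "08:00-18:00", "difficulty": "easy"},
        {"time": "18:00-08:00", "difficulty": "hard"},
    ]
    print(expected_hashes(18))                      # 262144
    print(difficulty_at(time(23, 0), schedules))    # hard
```

At 18 bits the client needs roughly 262,144 hashes on average, so the wall-clock cost depends heavily on the visitor's device.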
Whitelist & Blacklist
acl:
  whitelist:
    - "203.0.113.0/24"        # Trusted networks
    - "user-agent:GoogleBot"  # Legitimate crawlers
  blacklist:
    - "1.2.3.4"               # Known bad IPs
    - "user-agent:BadBot"     # Known malicious bots
  # Whitelist never challenges
  # Blacklist always blocked
  challenges_required_for_others: true
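The rule evaluation implied by this block can be sketched with Python's standard `ipaddress` module. The order used here (blacklist first, then whitelist, then challenge) and the case-insensitive substring match for `user-agent:` entries are assumptions for illustration; the configuration above does not specify them, and `matches` and `decide` are hypothetical names.

```python
import ipaddress

WHITELIST = ["203.0.113.0/24", "user-agent:GoogleBot"]
BLACKLIST = ["1.2.3.4", "user-agent:BadBot"]
UA_PREFIX = "user-agent:"


def matches(entry: str, ip: str, user_agent: str) -> bool:
    """Match one ACL entry against the request's IP or User-Agent."""
    if entry.startswith(UA_PREFIX):
        return entry[len(UA_PREFIX):].lower() in user_agent.lower()
    # A bare address such as "1.2.3.4" is parsed as a /32 network.
    network = ipaddress.ip_network(entry, strict=False)
    return ipaddress.ip_address(ip) in network


def decide(ip: str, user_agent: str) -> str:
    """Return "block", "allow", or "challenge" for a request."""
    if any(matches(e, ip, user_agent) for e in BLACKLIST):
        return "block"
    if any(matches(e, ip, user_agent) for e in WHITELIST):
        return "allow"
    return "challenge"


if __name__ == "__main__":
    print(decide("203.0.113.7", "Mozilla/5.0"))    # allow
    print(decide("198.51.100.9", "Mozilla/5.0"))   # challenge
```

User-Agent strings are trivially spoofed, so a User-Agent allow rule on its own lets any client claiming to be a search crawler bypass the challenge. Pairing such rules with IP ranges or reverse DNS verification is safer.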
HTTPS/TLS Configuration
server:
  listen: ":8443"
  use_tls: true
  tls:
    cert_file: "/etc/anubis/cert.pem"
    key_file: "/etc/anubis/key.pem"
    # Auto-renew with Let's Encrypt
    auto_renew: true
    acme_email: "admin@example.com"
Deployment Examples
Nginx Reverse Proxy + Anubis
upstream anubis {
    server localhost:8080;
}
server {
    listen 80;
    server_name example.com;
    location / {
        proxy_pass http://anubis;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: anubis
spec:
  replicas: 3
  selector:
    matchLabels:
      app: anubis
  template:
    metadata:
      labels:
        app: anubis
    spec:
      containers:
        - name: anubis
          image: techarohq/anubis:latest
          ports:
            - containerPort: 8080
          env:
            - name: UPSTREAM_URL
              value: "http://backend-service:3000"
            - name: CHALLENGE_DIFFICULTY
              value: "medium"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
Docker Swarm Deployment
# Create Anubis service
docker service create \
  --name anubis \
  --publish 8080:8080 \
  --replicas 3 \
  --env UPSTREAM_URL=http://backend:3000 \
  --env CHALLENGE_DIFFICULTY=medium \
  techarohq/anubis:latest
# Scale service
docker service scale anubis=5
Monitoring & Observability
Health Check Endpoint
# Check Anubis status
curl http://localhost:8080/health
# Response
{
  "status": "healthy",
  "upstream_healthy": true,
  "challenges_served": 1542,
  "challenges_solved": 1389,
  "uptime_seconds": 86400
}
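The two challenge counters in this response give a quick solve rate, which is a useful signal when tuning difficulty. The sketch below assumes the response has exactly the fields shown above; `solve_rate` is a hypothetical helper, not part of Anubis.

```python
import json


def solve_rate(health_json: str) -> float:
    """Fraction of served challenges that were solved (0.0 if none served)."""
    data = json.loads(health_json)
    served = data["challenges_served"]
    if served == 0:
        return 0.0
    return data["challenges_solved"] / served


if __name__ == "__main__":
    sample = '{"challenges_served": 1542, "challenges_solved": 1389}'
    print(f"{solve_rate(sample):.1%}")  # 90.1%
```

A falling solve rate can mean more bot traffic is being stopped, or that the difficulty is too high for real visitors, so it is best read alongside other traffic metrics.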
Metrics Endpoint
# Prometheus-compatible metrics
curl http://localhost:8080/metrics
# Output includes:
# anubis_requests_total
# anubis_challenges_issued
# anubis_challenges_solved
# anubis_upstream_latency_ms
# anubis_bot_requests_blocked
Logging Configuration
logging:
  level: "info"
  format: "json"
  # Log to file
  file:
    enabled: true
    path: "/var/log/anubis/anubis.log"
    max_size_mb: 100
    max_backups: 10
  # Structured logging
  fields:
    request_id: true
    user_agent: true
    remote_ip: true
    response_time: true
    upstream_latency: true
Performance Tuning
Connection Pooling
upstream:
  max_idle_conns: 200  # Increased from default
  max_conns_per_host: 100
  idle_conn_timeout: 90s
Challenge Caching
challenge:
  cache_size: 50000       # More cache for frequent users
  cache_ttl: 3600s
  cache_backend: "redis"  # Optional: use Redis for distributed
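The interaction between `cache_size` and `cache_ttl` can be illustrated with a small in-memory cache that expires entries after a TTL and evicts the oldest entry when full. This is a sketch of the general technique, not the Anubis cache implementation; `TTLCache` and its methods are hypothetical.

```python
from collections import OrderedDict


class TTLCache:
    """Bounded cache of solved-challenge keys with per-entry expiry."""

    def __init__(self, max_size: int, ttl: float):
        self.max_size = max_size
        self.ttl = ttl
        self._items = OrderedDict()  # key -> expiry timestamp

    def put(self, key: str, now: float) -> None:
        self._items[key] = now + self.ttl
        self._items.move_to_end(key)
        # Evict the oldest entries once the size limit is exceeded.
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)

    def contains(self, key: str, now: float) -> bool:
        expires = self._items.get(key)
        if expires is None:
            return False
        if now >= expires:
            del self._items[key]  # expired entries are dropped lazily
            return False
        return True


if __name__ == "__main__":
    cache = TTLCache(max_size=50000, ttl=3600)
    cache.put("client-a", now=0)
    print(cache.contains("client-a", now=1800))  # True
    print(cache.contains("client-a", now=3600))  # False
```

An in-process cache like this is private to each instance. That is why a shared backend such as Redis matters when several replicas sit behind a load balancer.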
Request Optimization
server:
  read_timeout: 20s
  write_timeout: 20s
  idle_timeout: 30s
  # Gzip compression
  gzip:
    enabled: true
    level: 6
    min_size: 1024
Bot Detection & Blocking
User-Agent Based Rules
bot_detection:
  block_headless_browsers: true
  block_curl_wget: true
  block_agents:
    - "python-requests"
    - "scrapy"
    - "selenium"
    - "beautifulsoup"
    - "mechanize"
  allow_agents:
    - "googlebot"
    - "bingbot"
    - "applebot"
Behavioral Analysis
bot_detection:
  # Detect non-human-like behavior
  require_js_execution: true
  detect_headless: true
  # Patterns that trigger challenges
  patterns:
    rapid_requests: "10/second"
    sequential_urls: true
    missing_referer: true
    suspicious_headers: true
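A threshold such as `rapid_requests: "10/second"` can be checked with a sliding window of recent request timestamps per client. The sketch below assumes the "count/unit" format shown above and flags a client once it exceeds the count within one window; `parse_rate` and `RapidRequestDetector` are hypothetical names.

```python
from collections import deque

UNIT_SECONDS = {"second": 1.0, "minute": 60.0}


def parse_rate(spec: str):
    """Parse "10/second" into (10, 1.0)."""
    count, unit = spec.split("/")
    return int(count), UNIT_SECONDS[unit]


class RapidRequestDetector:
    """Flags a client that sends more than `limit` requests per window."""

    def __init__(self, spec: str):
        self.limit, self.window = parse_rate(spec)
        self.history = {}  # ip -> deque of recent request timestamps

    def observe(self, ip: str, now: float) -> bool:
        timestamps = self.history.setdefault(ip, deque())
        timestamps.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        return len(timestamps) > self.limit


if __name__ == "__main__":
    detector = RapidRequestDetector("10/second")
    flags = [detector.observe("198.51.100.9", i * 0.01) for i in range(11)]
    print(flags[-1])  # True: the 11th request within one second is flagged
```

Unlike a rate limiter, this only classifies traffic. The result would then feed a decision such as issuing a harder challenge.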
IP Reputation Integration
threat_intelligence:
  enabled: true
  # External threat feeds
  sources:
    - "abuseipdb"
    - "maxmind"
    - "custom_internal_feed"
  # Action on known bad IPs
  actions:
    reputation_score_above: 50
    action: "block"
Integration with Applications
Passing Challenge Info to Backend
headers_to_upstream:
  x-anubis-challenge-solved: true
  x-anubis-solved-timestamp: "2024-01-15T10:30:00Z"
  x-anubis-client-ip: true
  x-anubis-difficulty-level: true
Custom Headers in JavaScript
// Browser JavaScript can access challenge info
const challengeInfo = {
  solved: true,
  difficulty: "medium",
  duration_ms: 1234
};
// Send with next request
fetch(url, {
  headers: {
    'X-Challenge-Duration': challengeInfo.duration_ms
  }
});
Troubleshooting
High CPU Usage
# Reduce challenge difficulty and, if needed, increase the cache size
challenge:
  difficulty: "easy"
  cache_size: 100000
# Check upstream performance
upstream:
  timeout: 60s  # Increase if backend slow
Challenge Failures
# Enable debug logging
LOG_LEVEL=debug anubis
# Check JavaScript delivery
curl -v http://localhost:8080/
# Verify challenge endpoint
curl http://localhost:8080/challenge/verify
Backend Timeout Issues
upstream:
  timeout: 60s
  keepalive_timeout: 120s
server:
  read_timeout: 30s
  write_timeout: 30s
Redis Cache Issues (if enabled)
# Check Redis connection
redis-cli PING
# Monitor cache
redis-cli MONITOR
# Clear cache if needed
redis-cli FLUSHDB
Best Practices
Security
- Always use HTTPS for production deployments
- Whitelist legitimate crawlers if needed (e.g., Google, Bing)
- Monitor challenge metrics for anomalies
- Rotate TLS certificates regularly
- Keep the upstream address private: do not expose it in logs or error pages
Performance
- Use Redis for distributed deployments
- Implement proper health checks in load balancers
- Cache aggressively - challenges are stateless
- Monitor upstream latency - Anubis is lightweight
- Scale horizontally - stateless design supports it
Operations
# Monitor challenge rate
curl http://localhost:8080/metrics | grep challenges
# Check error rates
curl http://localhost:8080/metrics | grep errors
# Validate config before deployment
anubis -config config.yaml -validate
FAQ
Q: Does Anubis block legitimate users?
A: Generally not. Modern browsers solve the JavaScript challenge automatically. Clients that cannot execute JavaScript, such as CLI tools and simple scrapers, fail it, and so do users who browse with JavaScript disabled.
Q: What about accessibility?
A: Provide a fallback for users who cannot complete challenges, such as an optional contact form.
Q: Can it block specific content?
A: Anubis only protects with PoW challenges. Use WAF/firewall rules for content blocking.
Q: Performance impact?
A: Minimal (~1-2ms latency). Challenge computation happens client-side.
Resources
- GitHub: https://github.com/TecharoHQ/anubis
- Documentation: https://anubis.techaro.lol
- Issue Tracker: https://github.com/TecharoHQ/anubis/issues
- Discussions: https://github.com/TecharoHQ/anubis/discussions
Related Tools
- Cloudflare Challenges (proprietary)
- AWS WAF (managed service)
- Nginx ModSecurity (open-source WAF)
- Datadome (bot management)