Anubis

Anubis is an open-source Web AI Firewall and anti-scraping reverse proxy that protects upstream resources from AI crawlers, scraper bots, and automated threats. It implements proof-of-work (SHA-256) challenges delivered via JavaScript to verify that requests come from legitimate browsers rather than AI crawlers or bot networks.

Created by Xe Iaso after experiencing significant resource exhaustion when Amazon crawlers overloaded their Git server, Anubis provides a lightweight, efficient protection layer written in Go. It sits between user traffic and your application, transparently filtering malicious automated access.

GitHub: TecharoHQ/anubis
License: MIT/Apache 2.0
Built With: Go, JavaScript

  • Go 1.19+ or Docker
  • Upstream server to protect
  • TLS certificates (for HTTPS protection)
# Clone repository
git clone https://github.com/TecharoHQ/anubis.git
cd anubis

# Build binary
go build -o anubis ./cmd/anubis

# Verify installation
./anubis --version
# Pull Docker image
docker pull techarohq/anubis:latest

# Run container
docker run -d \
  -p 8080:8080 \
  -e UPSTREAM_URL=http://backend:3000 \
  techarohq/anubis:latest
version: '3.8'

services:
  anubis:
    image: techarohq/anubis:latest
    ports:
      - "8080:8080"
      - "8443:8443"
    environment:
      UPSTREAM_URL: http://backend:3000
      ENABLE_HTTPS: "true"
      CHALLENGE_DIFFICULTY: "medium"
      LOG_LEVEL: "info"
    volumes:
      - ./certs:/etc/anubis/certs
    restart: unless-stopped

  backend:
    image: myapp:latest
    expose:
      - "3000"
# Copy binary to system location
sudo cp anubis /usr/local/bin/

# Create systemd service
sudo tee /etc/systemd/system/anubis.service > /dev/null << 'EOF'
[Unit]
Description=Anubis Web AI Firewall
After=network.target

[Service]
Type=simple
User=anubis
ExecStart=/usr/local/bin/anubis -config /etc/anubis/config.yaml
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable anubis
sudo systemctl start anubis
# config.yaml
server:
  listen: ":8080"
  read_timeout: 30s
  write_timeout: 30s
  idle_timeout: 60s

upstream:
  url: "http://localhost:3000"
  timeout: 30s
  max_idle_conns: 100

challenge:
  enabled: true
  difficulty: "medium"
  timeout: 300s
  cache_size: 10000

logging:
  level: "info"
  format: "json"
  output: "stdout"
# Core settings
UPSTREAM_URL=http://localhost:3000
LISTEN_ADDR=:8080
LOG_LEVEL=info

# Challenge settings
CHALLENGE_ENABLED=true
CHALLENGE_DIFFICULTY=medium
CHALLENGE_TIMEOUT=300

# Performance settings
MAX_IDLE_CONNS=100
REQUEST_TIMEOUT=30
CACHE_SIZE=10000
Command          | Purpose                    | Example
anubis           | Start with default config  | anubis
anubis -config   | Start with custom config   | anubis -config /etc/anubis/config.yaml
anubis -upstream | Set upstream URL           | anubis -upstream http://app:3000
anubis -listen   | Set listen address         | anubis -listen :8443
anubis -help     | Show help                  | anubis -help
anubis -version  | Show version               | anubis -version

Anubis challenges requests with a SHA-256 proof-of-work puzzle:

  1. Browser receives challenge HTML/JavaScript
  2. Client-side JavaScript computes SHA-256 hashes
  3. Once a valid nonce is found (one whose hash meets the difficulty target), the request continues
  4. Server validates proof-of-work before proxying
challenge:
  difficulty: easy      # ~0.5 seconds CPU (gentle on low-end devices)
  # OR
  difficulty: medium    # ~2 seconds CPU (default; filters most bots)
  # OR
  difficulty: hard      # ~10 seconds CPU (aggressive protection)
// Browser receives challenge
{
  "challenge": "find_nonce_for_this_hash",
  "target": "00001234abcd...",
  "difficulty": "medium"
}

// Browser solves and returns
{
  "challenge": "...",
  "nonce": "12345",
  "proof": "valid_sha256_hash"
}
┌─────────────┐
│  Browser    │
└──────┬──────┘
       │ HTTP GET /page
       ▼
┌──────────────────┐
│  Anubis Firewall │  ◄─── JavaScript Challenge
├──────────────────┤       (SHA-256 PoW)
│ Challenge System │
│ Cache            │
└──────┬───────────┘
       │ HTTP Request (with PoW token)
       ▼
┌──────────────────┐
│  Upstream Server │
└──────────────────┘
┌─────────────┐
│  AI Crawler │
└──────┬──────┘
       │ HTTP GET /page
       ▼
┌──────────────────┐
│  Anubis Firewall │
├──────────────────┤
│ JavaScript       │
│ Not Executed ✗   │
└──────────────────┘

    403 Forbidden (PoW Required)
rate_limit:
  enabled: true
  requests_per_second: 100
  burst: 10
  per_ip: true

challenge:
  difficulty: medium
  # Higher difficulty for repeated failures
  escalate_on_failure: true
challenge:
  difficulty: "custom"
  custom_difficulty_bits: 18  # Adjust PoW difficulty in bits
  timeout: 600
  
  # Difficulty scaling based on time of day
  schedules:
    - time: "08:00-18:00"
      difficulty: "easy"
    - time: "18:00-08:00"
      difficulty: "hard"
acl:
  whitelist:
    - "203.0.113.0/24"        # Trusted networks
    - "user-agent:GoogleBot"  # Legitimate crawlers
    
  blacklist:
    - "1.2.3.4"               # Known bad IPs
    - "user-agent:BadBot"     # Known malicious bots
    
  # Whitelist never challenges
  # Blacklist always blocked
  challenges_required_for_others: true
server:
  listen: ":8443"
  use_tls: true
  
tls:
  cert_file: "/etc/anubis/cert.pem"
  key_file: "/etc/anubis/key.pem"
  # Auto-renew with Let's Encrypt
  auto_renew: true
  acme_email: "admin@example.com"
upstream anubis {
    server localhost:8080;
}

server {
    listen 80;
    server_name example.com;
    
    location / {
        proxy_pass http://anubis;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: anubis
spec:
  replicas: 3
  selector:
    matchLabels:
      app: anubis
  template:
    metadata:
      labels:
        app: anubis
    spec:
      containers:
      - name: anubis
        image: techarohq/anubis:latest
        ports:
        - containerPort: 8080
        env:
        - name: UPSTREAM_URL
          value: "http://backend-service:3000"
        - name: CHALLENGE_DIFFICULTY
          value: "medium"
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
# Create Anubis service
docker service create \
  --name anubis \
  --publish 8080:8080 \
  --replicas 3 \
  --env UPSTREAM_URL=http://backend:3000 \
  --env CHALLENGE_DIFFICULTY=medium \
  techarohq/anubis:latest

# Scale service
docker service scale anubis=5
# Check Anubis status
curl http://localhost:8080/health

# Response
{
  "status": "healthy",
  "upstream_healthy": true,
  "challenges_served": 1542,
  "challenges_solved": 1389,
  "uptime_seconds": 86400
}
# Prometheus-compatible metrics
curl http://localhost:8080/metrics

# Output includes:
# anubis_requests_total
# anubis_challenges_issued
# anubis_challenges_solved
# anubis_upstream_latency_ms
# anubis_bot_requests_blocked
logging:
  level: "info"
  format: "json"
  
  # Log to file
  file:
    enabled: true
    path: "/var/log/anubis/anubis.log"
    max_size_mb: 100
    max_backups: 10
    
  # Structured logging
  fields:
    request_id: true
    user_agent: true
    remote_ip: true
    response_time: true
    upstream_latency: true
upstream:
  max_idle_conns: 200          # Increased from default
  max_conns_per_host: 100
  idle_conn_timeout: 90s
challenge:
  cache_size: 50000            # More cache for frequent users
  cache_ttl: 3600s
  cache_backend: "redis"       # Optional: use Redis for distributed
server:
  read_timeout: 20s
  write_timeout: 20s
  idle_timeout: 30s
  
  # Gzip compression
  gzip:
    enabled: true
    level: 6
    min_size: 1024
bot_detection:
  block_headless_browsers: true
  block_curl_wget: true
  
  block_agents:
    - "python-requests"
    - "scrapy"
    - "selenium"
    - "beautifulsoup"
    - "mechanize"
  
  allow_agents:
    - "googlebot"
    - "bingbot"
    - "applebot"
bot_detection:
  # Detect non-human-like behavior
  require_js_execution: true
  detect_headless: true
  
  # Patterns that trigger challenges
  patterns:
    rapid_requests: "10/second"
    sequential_urls: true
    missing_referer: true
    suspicious_headers: true
threat_intelligence:
  enabled: true
  
  # External threat feeds
  sources:
    - "abuseipdb"
    - "maxmind"
    - "custom_internal_feed"
  
  # Action on known bad IPs
  actions:
    reputation_score_above: 50
    action: "block"
headers_to_upstream:
  x-anubis-challenge-solved: true
  x-anubis-solved-timestamp: "2024-01-15T10:30:00Z"
  x-anubis-client-ip: true
  x-anubis-difficulty-level: true
// Browser JavaScript can access challenge info
const challengeInfo = {
  solved: true,
  difficulty: "medium",
  duration_ms: 1234
};

// Send with next request
fetch(url, {
  headers: {
    'X-Challenge-Duration': challengeInfo.duration_ms
  }
});
# Reduce challenge difficulty
challenge:
  difficulty: "easy"
  
# Or increase cache size
challenge:
  cache_size: 100000

# Check upstream performance
upstream:
  timeout: 60s  # Increase if backend slow
# Enable debug logging
LOG_LEVEL=debug anubis

# Check JavaScript delivery
curl -v http://localhost:8080/

# Verify challenge endpoint
curl http://localhost:8080/challenge/verify
upstream:
  timeout: 60s
  keepalive_timeout: 120s
  
server:
  read_timeout: 30s
  write_timeout: 30s
# Check Redis connection
redis-cli PING

# Monitor cache
redis-cli MONITOR

# Clear cache if needed
redis-cli FLUSHDB
  1. Always use HTTPS for production deployments
  2. Whitelist legitimate crawlers if needed (e.g., Google, Bing)
  3. Monitor challenge metrics for anomalies
  4. Rotate TLS certificates regularly
  5. Keep the upstream address private; don't expose it in logs or error pages

  1. Use Redis for distributed deployments
  2. Implement proper health checks in load balancers
  3. Cache aggressively; challenge validation is stateless, so solved challenges cache well
  4. Monitor upstream latency; Anubis itself adds little, so slowdowns usually come from the backend
  5. Scale horizontally; the stateless design supports it
# Monitor challenge rate
curl http://localhost:8080/metrics | grep challenges

# Check error rates
curl http://localhost:8080/metrics | grep errors

# Validate config before deployment
anubis -config config.yaml -validate

Q: Does Anubis block legitimate users?
A: No. Modern browsers execute the JavaScript challenge transparently, so human visitors see at most a brief delay on first visit. Clients that cannot run JavaScript, such as headless scrapers and CLI tools, fail the challenge and are blocked.

Q: What about accessibility?
A: Implement fallback mechanisms for users who can't complete challenges (for example, an optional contact form).

Q: Can it block specific content?
A: Anubis only protects with PoW challenges. Use WAF/firewall rules for content blocking.

Q: Performance impact?
A: Minimal (~1-2ms latency). Challenge computation happens client-side.

  • Cloudflare Challenges (proprietary)
  • AWS WAF (managed service)
  • Nginx ModSecurity (open-source WAF)
  • DataDome (bot management)