SpiderFoot Cheat Sheet

Overview

SpiderFoot is an open-source intelligence (OSINT) automation tool that performs reconnaissance and information gathering on targets such as IP addresses, domain names, email addresses, and names. It integrates with over 200 data sources to collect intelligence and identify security risks, making it an essential tool for penetration testers, security researchers, and threat intelligence analysts.

⚠️ Note: Open-source tool with commercial HX edition available. Always ensure you have proper authorization before scanning targets.

Installation and Setup

Installation Methods

# Method 1: Install from PyPI
pip3 install spiderfoot

# Method 2: Install from GitHub (latest development version)
git clone https://github.com/smicallef/spiderfoot.git
cd spiderfoot
pip3 install -r requirements.txt

# Method 3: Docker installation (community image; the repo also ships a
# Dockerfile you can build yourself)
docker pull spiderfoot/spiderfoot
docker run -p 5001:5001 spiderfoot/spiderfoot

# Method 4: Using distribution packages
# Kali Linux (SpiderFoot ships in the Kali repos)
sudo apt update && sudo apt install spiderfoot

# Arch Linux (AUR)
yay -S spiderfoot

# macOS: no distribution package; install via pip3 or run from a git checkout

# Note: this sheet uses `spiderfoot` as the command name; for a git
# checkout, substitute `python3 ./sf.py` (same flags).

Initial Configuration

# SpiderFoot creates its working directory automatically on first run
mkdir -p ~/.spiderfoot
ls ~/.spiderfoot   # spiderfoot.db (SQLite), logs/

# There is no configuration file to generate: settings are edited in the
# web UI (Settings tab) and stored in the database. Note that in
# SpiderFoot 4.x, -C <scanID> runs correlation rules against an existing
# scan; it is not a config generator.

# Where the key settings live:
# - Web interface bind address/port: given on the command line,
#   e.g. spiderfoot -l 127.0.0.1:5001
# - Database and logs: under ~/.spiderfoot/
# - Module options, API keys, user agent, timeouts: web UI -> Settings
#   (a query sketch for the stored settings follows below)
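
Stored settings can also be reviewed straight from the database; a quick
sketch (the tbl_config table and its scope/opt/val columns reflect the
current schema and may differ between versions):

sqlite3 ~/.spiderfoot/spiderfoot.db \
  "SELECT scope, opt, val FROM tbl_config LIMIT 20;"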

API Keys Configuration

# API keys are configured per module in the web UI:
# Settings -> select the module (e.g. VirusTotal) -> paste the key -> Save.
# Keys are stored in the database, and the Settings page can export and
# import them in bulk. A key-verification example follows this list.

# Modules that benefit from API keys include:
# sfp_virustotal       VirusTotal
# sfp_shodan           Shodan
# sfp_haveibeenpwned   Have I Been Pwned
# sfp_securitytrails   SecurityTrails
# sfp_passivetotal     RiskIQ PassiveTotal (API key + username)
# sfp_alienvault       AlienVault OTX
# sfp_censys           Censys (API ID + secret)
# sfp_hunter           Hunter.io
# sfp_clearbit         Clearbit
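
Before pasting a key into Settings, it can be worth verifying it against
the provider directly; for example, VirusTotal's v3 API takes the key in
an x-apikey header (VT_API_KEY below stands in for your own key):

curl -s -H "x-apikey: $VT_API_KEY" \
  "https://www.virustotal.com/api/v3/domains/example.com" | jq -r '.data.id'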

Command Line Usage

Basic Scanning Operations

# Basic domain scan (the target type is auto-detected; -u selects modules
# by use case: all, footprint, investigate or passive)
spiderfoot -s example.com -u all

# IP address scan
spiderfoot -s 192.168.1.1 -u all

# Email address investigation
spiderfoot -s user@example.com -u footprint

# Human name investigation (the double quotes mark the target as a name,
# so they must reach SpiderFoot intact, hence the single-quote wrapping)
spiderfoot -s '"John Smith"' -u investigate

# Scans take one target each; loop for multiple targets
for t in example.com 192.168.1.1; do spiderfoot -s "$t" -u all; done

# Scan with specific modules only (there is no module-exclusion flag)
spiderfoot -s example.com -m sfp_dnsresolve,sfp_whois,sfp_virustotal

# Strict mode: with -t, only modules that can directly consume the target
# run, and only the requested event types are reported
spiderfoot -s example.com -t INTERNET_NAME -x

Advanced Scanning Options

# Passive modules only (no active probing of the target)
spiderfoot -s example.com -u passive

# Limit the number of concurrently running modules
spiderfoot -s example.com -u all -max-threads 10

# Stream results to a file (output formats: tab, csv, json)
spiderfoot -s example.com -u all -o json > scan_results.json

# Collect only specific event types (modules selected automatically)
spiderfoot -s example.com -t INTERNET_NAME,IP_ADDRESS,EMAILADDR

# Restrict displayed output to selected event types
spiderfoot -s example.com -u all -F INTERNET_NAME,IP_ADDRESS

# Custom user agents, timeouts and proxies are global settings rather
# than CLI flags: set them in the web UI under Settings -> Global

Scan Management

# List available modules
spiderfoot -M

# List available event (data) types
spiderfoot -T

# Run a scan with logging disabled
spiderfoot -s example.com -u all -q

# The CLI has no flags for listing, stopping, deleting or exporting
# stored scans; manage them from the web UI (Scans tab) or its HTTP
# endpoints, as sketched below.
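
The web UI drives scan management through plain HTTP routes, so the same
operations can be scripted. The endpoint names below mirror the current
web interface and are not a stable API, so verify them against your
version (the browser's network tab shows the exact calls):

SF=http://127.0.0.1:5001
# SCAN_ID comes from the scan's URL in the web UI

# Stop a running scan
curl -s "$SF/stopscan?id=$SCAN_ID"

# Delete a scan and its data
curl -s "$SF/scandelete?id=$SCAN_ID"

# Export all events for one or more scans as JSON
curl -s "$SF/scanexportjsonmulti?ids=$SCAN_ID" > scan_results.json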

Web Interface Usage

Starting the Web Interface

# Start the web interface, listening on 127.0.0.1:5001
spiderfoot -l 127.0.0.1:5001

# Start web interface on all interfaces
spiderfoot -l 0.0.0.0:5001

# Start with debug logging (settings themselves are managed in the web UI)
spiderfoot -d -l 127.0.0.1:5001

# Start in background
nohup spiderfoot -l 127.0.0.1:5001 &

# Start with Docker
docker run -p 5001:5001 -d spiderfoot/spiderfoot

Web Interface Navigation

# Access web interface
# Navigate to http://localhost:5001

# Main sections:
# - New Scan: Create new scans
# - Scans: View and manage existing scans
# - Browse: Browse scan results by data type
# - Search: Search across all scan data
# - Settings: Configure modules and API keys
# - About: System information and statistics

# Scan creation workflow:
# 1. Enter target (domain, IP, email, etc.)
# 2. Select scan modules or use presets
# 3. Configure scan options
# 4. Start scan and monitor progress
# 5. Review results and export data

Scan Configuration Presets

The presets below are a convention of this cheat sheet rather than a
SpiderFoot feature; treat the module lists as indicative, confirm exact
names with spiderfoot -M, and see the loader sketch after the JSON.

{
  "scan_presets": {
    "passive_recon": {
      "description": "Passive reconnaissance without active probing",
      "modules": [
        "sfp_dnsresolve",
        "sfp_whois",
        "sfp_virustotal",
        "sfp_shodan",
        "sfp_censys",
        "sfp_passivetotal",
        "sfp_securitytrails",
        "sfp_threatcrowd",
        "sfp_otx"
      ],
      "passive_only": true
    },
    "comprehensive_domain": {
      "description": "Comprehensive domain investigation",
      "modules": [
        "sfp_dnsresolve",
        "sfp_dnsbrute",
        "sfp_whois",
        "sfp_virustotal",
        "sfp_shodan",
        "sfp_censys",
        "sfp_subdomain_enum",
        "sfp_ssl_analyze",
        "sfp_port_scan",
        "sfp_web_crawl"
      ]
    },
    "email_investigation": {
      "description": "Email address and person investigation",
      "modules": [
        "sfp_hunter",
        "sfp_clearbit",
        "sfp_hibp",
        "sfp_social",
        "sfp_pgp",
        "sfp_gravatar",
        "sfp_fullcontact",
        "sfp_pipl"
      ]
    },
    "threat_intelligence": {
      "description": "Threat intelligence gathering",
      "modules": [
        "sfp_virustotal",
        "sfp_otx",
        "sfp_threatcrowd",
        "sfp_malwaredomains",
        "sfp_reputation",
        "sfp_blacklist",
        "sfp_phishing",
        "sfp_malware"
      ]
    }
  }
}
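
Since the preset file is our own convention, a short sketch to expand a
preset into a CLI command (assumes the JSON above is saved as
scan_presets.json):

python3 - << 'EOF'
import json

presets = json.load(open("scan_presets.json"))["scan_presets"]
preset = presets["passive_recon"]

# Build the module list for spiderfoot -m; these preset modules are
# passive ones, matching the preset's passive_only intent
print("spiderfoot -s example.com -m " + ",".join(preset["modules"]))
EOF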

Module Configuration and Customization

Core Modules Overview

# Module names below come from the current module set; confirm the exact
# names shipped with your version via spiderfoot -M (a sanity-check loop
# follows this list)

# DNS and Domain Modules
sfp_dnsresolve      # DNS resolution and record enumeration
sfp_dnsbrute        # DNS subdomain brute forcing
sfp_whois           # WHOIS information gathering
sfp_crt             # Subdomains from certificate transparency logs

# Network and Infrastructure
sfp_portscan_tcp    # TCP port scanning
sfp_sslcert         # SSL/TLS certificate analysis
sfp_bgpview         # ASN and netblock lookups via BGPView

# Threat Intelligence
sfp_virustotal      # VirusTotal integration
sfp_alienvault      # AlienVault OTX integration
sfp_threatcrowd     # ThreatCrowd integration

# Search Engines and OSINT
sfp_googlesearch    # Google search integration
sfp_bingsearch      # Bing search integration
sfp_shodan          # Shodan integration
sfp_censys          # Censys integration

# Social Media and People
sfp_social          # Social media profile discovery
sfp_accounts        # Username/account discovery across sites
sfp_hunter          # Email address discovery (Hunter.io)
sfp_haveibeenpwned  # Have I Been Pwned breach lookups

# Web Application
sfp_spider          # Web crawling (fetches robots.txt as part of spidering)
sfp_webanalytics    # Web analytics and tracking ID identification
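
A quick sanity check that the modules you plan to pass with -m exist in
your install (grepping the -M listing; its exact output format may vary):

for m in sfp_dnsresolve sfp_sslcert sfp_spider; do
  spiderfoot -M | grep -qw "$m" && echo "ok       $m" || echo "missing  $m"
done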

Custom Module Development

# Example custom module: sfp_custom_example.py
import requests

from spiderfoot import SpiderFootEvent, SpiderFootPlugin

class sfp_custom_example(SpiderFootPlugin):
    """Custom SpiderFoot module example"""

    meta = {
        'name': "Custom Example Module",
        'summary': "Example custom module for demonstration",
        'flags': [""],
        'useCases': ["Investigate", "Passive"],
        'categories': ["Search Engines"],
        'dataSource': {
            'website': "https://example.com",
            'model': "FREE_NOAUTH_UNLIMITED",
            'references': ["https://example.com/api"],
            'favIcon': "https://example.com/favicon.ico",
            'logo': "https://example.com/logo.png",
            'description': "Example data source for custom module"
        }
    }

    # Default options
    opts = {
        'api_key': '',
        'max_results': 100,
        'timeout': 30
    }

    # Option descriptions
    optdescs = {
        'api_key': "API key for the service",
        'max_results': "Maximum number of results to return",
        'timeout': "Request timeout in seconds"
    }

    # What events this module accepts for input
    events = {
        'DOMAIN_NAME': ['SUBDOMAIN', 'EMAILADDR'],
        'IP_ADDRESS': ['GEOINFO', 'NETBLOCK_OWNER']
    }

    def setup(self, sfc, userOpts=dict()):
        self.sf = sfc
        self.results = self.tempStorage()

        # Override default options with user settings
        for opt in list(userOpts.keys()):
            self.opts[opt] = userOpts[opt]

    def watchedEvents(self):
        """Events this module will accept as input"""
        return list(self.events.keys())

    def producedEvents(self):
        """Events this module will produce"""
        evts = []
        for eventType in self.events:
            evts.extend(self.events[eventType])
        return evts

    def handleEvent(self, event):
        """Handle incoming events"""
        eventName = event.eventType
        srcModuleName = event.module
        eventData = event.data

        # Don't process events from ourselves
        if srcModuleName == "sfp_custom_example":
            return

        # Check if we've already processed this data
        if eventData in self.results:
            return

        self.results[eventData] = True

        self.sf.debug(f"Received event, {eventName}, from {srcModuleName}")

        # Process different event types
        if eventName == 'DOMAIN_NAME':
            self.processDomain(eventData, event)
        elif eventName == 'IP_ADDRESS':
            self.processIP(eventData, event)

    def processDomain(self, domain, parentEvent):
        """Process domain name events"""
        try:
            # Example: Query custom API for domain information
            url = f"https://api.example.com/domain/{domain}"
            headers = {'Authorization': f'Bearer {self.opts["api_key"]}'}

            response = requests.get(
                url, 
                headers=headers, 
                timeout=self.opts['timeout']
            )

            if response.status_code == 200:
                data = response.json()

                # Extract subdomains
                if 'subdomains' in data:
                    for subdomain in data['subdomains'][:self.opts['max_results']]:
                        evt = SpiderFootEvent(
                            'SUBDOMAIN', 
                            subdomain, 
                            self.__name__, 
                            parentEvent
                        )
                        self.notifyListeners(evt)

                # Extract email addresses
                if 'emails' in data:
                    for email in data['emails'][:self.opts['max_results']]:
                        evt = SpiderFootEvent(
                            'EMAILADDR', 
                            email, 
                            self.__name__, 
                            parentEvent
                        )
                        self.notifyListeners(evt)

        except Exception as e:
            self.sf.error(f"Error processing domain {domain}: {str(e)}")

    def processIP(self, ip, parentEvent):
        """Process IP address events"""
        try:
            # Example: Query custom API for IP information
            url = f"https://api.example.com/ip/{ip}"
            headers = {'Authorization': f'Bearer {self.opts["api_key"]}'}

            response = requests.get(
                url, 
                headers=headers, 
                timeout=self.opts['timeout']
            )

            if response.status_code == 200:
                data = response.json()

                # Extract geolocation info
                if 'location' in data:
                    location = f"{data['location']['city']}, {data['location']['country']}"
                    evt = SpiderFootEvent(
                        'GEOINFO', 
                        location, 
                        self.__name__, 
                        parentEvent
                    )
                    self.notifyListeners(evt)

        except Exception as e:
            self.sf.error(f"Error processing IP {ip}: {str(e)}")

# Install custom module:
# 1. Save as sfp_custom_example.py in modules/ directory
# 2. Restart SpiderFoot
# 3. Module will appear in available modules list
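
A hypothetical smoke test, run from the repository root, to confirm the
module imports and declares its event types before restarting SpiderFoot:

python3 - << 'EOF'
import sys
sys.path.insert(0, ".")  # repo root so `spiderfoot` and `modules` resolve

from modules.sfp_custom_example import sfp_custom_example

m = sfp_custom_example()
print("watches: ", m.watchedEvents())
print("produces:", m.producedEvents())
EOF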

Module Configuration Files

# Per-module options (API keys, result limits, delays) are edited in the
# web UI under Settings -> <module name> and persisted in the SQLite
# database; there is no modules.conf file. The Settings page can export
# and import the full option set as flat module:option pairs, e.g.:

sfp_virustotal:api_key=your_virustotal_api_key
sfp_shodan:api_key=your_shodan_api_key
sfp_hunter:api_key=your_hunter_api_key

# Exact option names differ per module (Censys uses an API ID/secret
# pair, PassiveTotal a key plus username, HIBP a single key); the
# module's Settings page shows the authoritative field names.
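
A small sketch for building an importable key file from environment
variables, so keys stay out of shell history and version control (file
format as described above; adjust option names to your modules):

cat > sf_api_keys.txt << EOF
sfp_virustotal:api_key=${VT_API_KEY}
sfp_shodan:api_key=${SHODAN_API_KEY}
sfp_hunter:api_key=${HUNTER_API_KEY}
EOF
chmod 600 sf_api_keys.txt
# Then import via the web UI: Settings -> Import API Keys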

Advanced OSINT Techniques

Comprehensive Domain Investigation

# Multi-stage domain investigation. The target type is auto-detected, so
# no type flag is needed; module names are from the current set (confirm
# with spiderfoot -M), and hostnames are reported as INTERNET_NAME events
# (list all type names with spiderfoot -T).

# Stage 1: Passive reconnaissance
spiderfoot -s target.com -m sfp_dnsresolve,sfp_whois,sfp_virustotal,sfp_passivetotal

# Stage 2: Subdomain enumeration
spiderfoot -s target.com -m sfp_dnsbrute,sfp_crt,sfp_googlesearch

# Stage 3: Infrastructure analysis
spiderfoot -s target.com -m sfp_portscan_tcp,sfp_sslcert,sfp_spider

# Stage 4: Threat intelligence
spiderfoot -s target.com -m sfp_alienvault,sfp_threatcrowd

# Combined scan, with output restricted to the event types of interest
# (a driver script for the staged approach follows below)
spiderfoot -s target.com \
  -m sfp_dnsresolve,sfp_dnsbrute,sfp_whois,sfp_virustotal,sfp_shodan,sfp_censys,sfp_passivetotal,sfp_crt,sfp_sslcert,sfp_portscan_tcp,sfp_spider \
  -F INTERNET_NAME,IP_ADDRESS,EMAILADDR
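
A sketch that runs the stages sequentially and keeps per-stage JSON
output (same module lists as above; adjust to taste):

target=target.com
stages=(
  "sfp_dnsresolve,sfp_whois,sfp_virustotal,sfp_passivetotal"
  "sfp_dnsbrute,sfp_crt,sfp_googlesearch"
  "sfp_portscan_tcp,sfp_sslcert,sfp_spider"
  "sfp_alienvault,sfp_threatcrowd"
)
for i in "${!stages[@]}"; do
  echo "[*] Stage $((i + 1)): ${stages[$i]}"
  spiderfoot -s "$target" -m "${stages[$i]}" -o json > "stage$((i + 1)).json"
done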

Email and Person Investigation

# Email address investigation (target type auto-detected)
spiderfoot -s user@target.com \
  -m sfp_hunter,sfp_clearbit,sfp_haveibeenpwned,sfp_social,sfp_pgp,sfp_gravatar,sfp_fullcontact

# Person name investigation (quotes mark the target as a human name)
spiderfoot -s '"John Smith"' \
  -m sfp_social,sfp_fullcontact,sfp_clearbit,sfp_googlesearch,sfp_bingsearch

# Company investigation: company names are not a seed target type, so
# start from the company's primary domain and pivot on the COMPANY_NAME,
# EMAILADDR and SOCIAL_MEDIA events the scan produces
spiderfoot -s example-corp.com -u investigate

IP Address and Network Analysis

# IP address investigation
spiderfoot -s 192.168.1.1 \
  -m sfp_shodan,sfp_censys,sfp_virustotal,sfp_portscan_tcp \
  -F GEOINFO,NETBLOCK_OWNER,TCP_PORT_OPEN

# Network range investigation (netblock targets are auto-detected)
spiderfoot -s 192.168.1.0/24 -m sfp_shodan,sfp_censys,sfp_portscan_tcp

# ASN investigation (targets of the form AS12345 are auto-detected)
spiderfoot -s AS12345 -m sfp_bgpview,sfp_shodan,sfp_censys

Social Media and Dark Web Investigation

# Social media and account discovery
spiderfoot -s target.com -m sfp_social,sfp_accounts,sfp_gravatar

# Dark web / Tor presence (several Tor search modules ship with
# SpiderFoot; confirm exact names with spiderfoot -M)
spiderfoot -s target.com -m sfp_ahmia,sfp_torch

# Cryptocurrency investigation (bitcoin addresses are auto-detected)
spiderfoot -s 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa -m sfp_blockchain,sfp_bitcoinabuse

API Integration and Automation

Python API Usage

# SpiderFoot web-UI API integration.
# NOTE: routes like /startscan, /scanstatus and /scaneventresults mirror
# the web interface's internal endpoints; they are not a stable,
# versioned API, so verify paths and response shapes against your version.
import time
from typing import Dict, List

import requests

class SpiderFootAPI:
    def __init__(self, base_url: str = "http://localhost:5001"):
        self.base_url = base_url.rstrip('/')
        self.session = requests.Session()

    def start_scan(self, target: str, target_type: str, modules: List[str] = None) -> str:
        """Start a new scan and return scan ID"""

        scan_data = {
            'scanname': f"API Scan - {target}",
            'scantarget': target,
            'targettype': target_type,
            'modulelist': ','.join(modules) if modules else '',
            'typelist': 'all'
        }

        response = self.session.post(
            f"{self.base_url}/startscan",
            data=scan_data
        )

        if response.status_code == 200:
            result = response.json()
            return result.get('id')
        else:
            raise Exception(f"Failed to start scan: {response.text}")

    def get_scan_status(self, scan_id: str) -> Dict:
        """Get scan status and progress"""
        response = self.session.get(f"{self.base_url}/scanstatus?id={scan_id}")

        if response.status_code == 200:
            return response.json()
        else:
            raise Exception(f"Failed to get scan status: {response.text}")

    def wait_for_scan_completion(self, scan_id: str, timeout: int = 3600) -> Dict:
        """Wait for scan to complete with timeout"""
        start_time = time.time()

        while time.time() - start_time < timeout:
            status = self.get_scan_status(scan_id)

            if status.get('status') in ['FINISHED', 'ABORTED', 'FAILED']:
                return status

            print(f"Scan progress: {status.get('status')} - {status.get('progress', 0)}%")
            time.sleep(30)

        raise TimeoutError("Scan did not complete within timeout")

    def get_scan_results(self, scan_id: str, data_type: str = None) -> List[Dict]:
        """Get scan results, optionally filtered by data type"""
        params = {'id': scan_id}
        if data_type:
            params['type'] = data_type

        response = self.session.get(f"{self.base_url}/scaneventresults", params=params)

        if response.status_code == 200:
            return response.json()
        else:
            raise Exception(f"Failed to get scan results: {response.text}")

    def export_scan_results(self, scan_id: str, format: str = 'json') -> str:
        """Export scan results in specified format"""
        params = {
            'id': scan_id,
            'type': format
        }

        response = self.session.get(f"{self.base_url}/scaneventresultexport", params=params)

        if response.status_code == 200:
            return response.text
        else:
            raise Exception(f"Failed to export scan results: {response.text}")

    def delete_scan(self, scan_id: str) -> bool:
        """Delete scan and its data"""
        response = self.session.get(f"{self.base_url}/scandelete?id={scan_id}")

        return response.status_code == 200

    def get_available_modules(self) -> Dict:
        """Get list of available modules"""
        response = self.session.get(f"{self.base_url}/modules")

        if response.status_code == 200:
            return response.json()
        else:
            raise Exception(f"Failed to get modules: {response.text}")

    def bulk_scan(self, targets: List[Dict], wait_for_completion: bool = True) -> List[Dict]:
        """Perform bulk scanning of multiple targets"""
        scan_results = []

        for target_info in targets:
            target = target_info['target']
            target_type = target_info['type']
            modules = target_info.get('modules', [])

            print(f"Starting scan for {target}")

            try:
                scan_id = self.start_scan(target, target_type, modules)

                scan_info = {
                    'target': target,
                    'scan_id': scan_id,
                    'status': 'started'
                }

                if wait_for_completion:
                    final_status = self.wait_for_scan_completion(scan_id)
                    scan_info['status'] = final_status.get('status')
                    scan_info['results'] = self.get_scan_results(scan_id)

                scan_results.append(scan_info)

            except Exception as e:
                print(f"Failed to scan {target}: {str(e)}")
                scan_results.append({
                    'target': target,
                    'status': 'failed',
                    'error': str(e)
                })

        return scan_results

# Usage example
api = SpiderFootAPI("http://localhost:5001")

# Single target scan
scan_id = api.start_scan("example.com", "DOMAIN", ["sfp_dnsresolve", "sfp_whois", "sfp_virustotal"])
print(f"Started scan: {scan_id}")

# Wait for completion
final_status = api.wait_for_scan_completion(scan_id)
print(f"Scan completed: {final_status['status']}")

# Get results
results = api.get_scan_results(scan_id)
print(f"Found {len(results)} events")

# Export results
json_results = api.export_scan_results(scan_id, 'json')
with open(f"scan_{scan_id}_results.json", 'w') as f:
    f.write(json_results)

# Bulk scanning
targets = [
    {'target': 'example.com', 'type': 'DOMAIN', 'modules': ['sfp_dnsresolve', 'sfp_whois']},
    {'target': 'test.com', 'type': 'DOMAIN', 'modules': ['sfp_dnsresolve', 'sfp_virustotal']},
    {'target': '192.168.1.1', 'type': 'IP_ADDRESS', 'modules': ['sfp_shodan', 'sfp_censys']}
]

bulk_results = api.bulk_scan(targets)
print(f"Completed {len(bulk_results)} scans")

Automated Reporting and Analysis

# Automated SpiderFoot reporting and analysis
from collections import Counter
from typing import Dict, List

import pandas as pd

class SpiderFootAnalyzer:
    def __init__(self, api: SpiderFootAPI):
        self.api = api

    def analyze_scan_results(self, scan_id: str) -> Dict:
        """Comprehensive analysis of scan results"""

        # Get all scan results (rows are assumed normalized to dicts with
        # 'type', 'data', 'module' and 'confidence' keys; the raw endpoint
        # may return lists that need mapping first)
        results = self.api.get_scan_results(scan_id)

        # Convert to DataFrame for analysis
        df = pd.DataFrame(results)

        # Basic statistics
        stats = {
            'total_events': len(results),
            'unique_data_types': df['type'].nunique(),
            'data_type_distribution': df['type'].value_counts().to_dict(),
            'module_distribution': df['module'].value_counts().to_dict(),
            'confidence_distribution': df['confidence'].value_counts().to_dict()
        }

        # Risk analysis
        risk_analysis = self.analyze_risk_indicators(results)

        # Network analysis
        network_analysis = self.analyze_network_data(results)

        # Email and person analysis
        person_analysis = self.analyze_person_data(results)

        # Threat intelligence analysis
        threat_analysis = self.analyze_threat_intelligence(results)

        return {
            'statistics': stats,
            'risk_analysis': risk_analysis,
            'network_analysis': network_analysis,
            'person_analysis': person_analysis,
            'threat_analysis': threat_analysis
        }

    def analyze_risk_indicators(self, results: List[Dict]) -> Dict:
        """Analyze risk indicators from scan results"""

        risk_indicators = {
            'high_risk': [],
            'medium_risk': [],
            'low_risk': [],
            'informational': []
        }

        # Define risk patterns
        high_risk_patterns = [
            'VULNERABILITY',
            'MALWARE',
            'BLACKLISTED',
            'BREACH_DATA',
            'DARKWEB_MENTION'
        ]

        medium_risk_patterns = [
            'OPEN_TCP_PORT',
            'SSL_CERTIFICATE_EXPIRED',
            'WEBSERVER_TECHNOLOGY',
            'OPERATING_SYSTEM'
        ]

        for result in results:
            data_type = result.get('type', '')
            data_value = result.get('data', '')

            if any(pattern in data_type for pattern in high_risk_patterns):
                risk_indicators['high_risk'].append(result)
            elif any(pattern in data_type for pattern in medium_risk_patterns):
                risk_indicators['medium_risk'].append(result)
            elif data_type in ['SUBDOMAIN', 'IP_ADDRESS', 'EMAILADDR']:
                risk_indicators['low_risk'].append(result)
            else:
                risk_indicators['informational'].append(result)

        # Calculate risk score
        risk_score = (
            len(risk_indicators['high_risk']) * 10 +
            len(risk_indicators['medium_risk']) * 5 +
            len(risk_indicators['low_risk']) * 1
        )

        return {
            'risk_score': risk_score,
            'risk_indicators': risk_indicators,
            'risk_summary': {
                'high_risk_count': len(risk_indicators['high_risk']),
                'medium_risk_count': len(risk_indicators['medium_risk']),
                'low_risk_count': len(risk_indicators['low_risk']),
                'informational_count': len(risk_indicators['informational'])
            }
        }

    def analyze_network_data(self, results: List[Dict]) -> Dict:
        """Analyze network-related data"""

        network_data = {
            'subdomains': [],
            'ip_addresses': [],
            'open_ports': [],
            'ssl_certificates': [],
            'web_technologies': []
        }

        for result in results:
            data_type = result.get('type', '')
            data_value = result.get('data', '')

            if data_type == 'SUBDOMAIN':
                network_data['subdomains'].append(data_value)
            elif data_type == 'IP_ADDRESS':
                network_data['ip_addresses'].append(data_value)
            elif data_type == 'OPEN_TCP_PORT':
                network_data['open_ports'].append(data_value)
            elif data_type == 'SSL_CERTIFICATE':
                network_data['ssl_certificates'].append(data_value)
            elif data_type == 'WEBSERVER_TECHNOLOGY':
                network_data['web_technologies'].append(data_value)

        # Analysis
        analysis = {
            'subdomain_count': len(set(network_data['subdomains'])),
            'ip_count': len(set(network_data['ip_addresses'])),
            'unique_ports': list(set([port.split(':')[-1] for port in network_data['open_ports']])),
            'technology_stack': Counter(network_data['web_technologies']),
            'attack_surface': {
                'external_subdomains': len(set(network_data['subdomains'])),
                'exposed_services': len(network_data['open_ports']),
                'ssl_endpoints': len(network_data['ssl_certificates'])
            }
        }

        return analysis

    def analyze_person_data(self, results: List[Dict]) -> Dict:
        """Analyze person and email related data"""

        person_data = {
            'email_addresses': [],
            'social_media': [],
            'phone_numbers': [],
            'physical_addresses': [],
            'breach_data': []
        }

        for result in results:
            data_type = result.get('type', '')
            data_value = result.get('data', '')

            if data_type == 'EMAILADDR':
                person_data['email_addresses'].append(data_value)
            elif data_type == 'SOCIAL_MEDIA':
                person_data['social_media'].append(data_value)
            elif data_type == 'PHONE_NUMBER':
                person_data['phone_numbers'].append(data_value)
            elif data_type == 'PHYSICAL_ADDRESS':
                person_data['physical_addresses'].append(data_value)
            elif data_type == 'BREACH_DATA':
                person_data['breach_data'].append(data_value)

        # Email domain analysis
        email_domains = [email.split('@')[1] for email in person_data['email_addresses'] if '@' in email]

        analysis = {
            'email_count': len(set(person_data['email_addresses'])),
            'email_domains': Counter(email_domains),
            'social_media_count': len(person_data['social_media']),
            'breach_exposure': len(person_data['breach_data']),
            'contact_info_exposure': {
                'emails': len(person_data['email_addresses']),
                'phones': len(person_data['phone_numbers']),
                'addresses': len(person_data['physical_addresses'])
            }
        }

        return analysis

    def analyze_threat_intelligence(self, results: List[Dict]) -> Dict:
        """Summarize threat-intelligence findings from scan results"""

        # Event types whose names carry these markers are treated as
        # threat findings (matches the generic typing used in this sheet)
        threat_markers = ['MALICIOUS', 'BLACKLIST', 'MALWARE', 'BREACH_DATA']
        findings = [
            r for r in results
            if any(marker in r.get('type', '') for marker in threat_markers)
        ]

        return {
            'finding_count': len(findings),
            'findings_by_type': Counter(r.get('type', '') for r in findings),
            'findings': findings
        }

    def generate_report(self, scan_id: str, output_file: str = None) -> str:
        """Generate comprehensive HTML report"""

        analysis = self.analyze_scan_results(scan_id)

        html_report = f"""
        <!DOCTYPE html>
        <html>
        <head>
            <title>SpiderFoot Scan Analysis Report</title>
            <style>
                body {{ font-family: Arial, sans-serif; margin: 40px; }}
                .header {{ background-color: #2c3e50; color: white; padding: 20px; }}
                .section {{ margin: 20px 0; padding: 15px; border-left: 4px solid #3498db; }}
                .risk-high {{ border-left-color: #e74c3c; }}
                .risk-medium {{ border-left-color: #f39c12; }}
                .risk-low {{ border-left-color: #27ae60; }}
                table {{ border-collapse: collapse; width: 100%; }}
                th, td {{ border: 1px solid #ddd; padding: 8px; text-align: left; }}
                th {{ background-color: #f2f2f2; }}
            </style>
        </head>
        <body>
            <div class="header">
                <h1>SpiderFoot Scan Analysis Report</h1>
                <p>Scan ID: {scan_id}</p>
                <p>Generated: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
            </div>

            <div class="section">
                <h2>Executive Summary</h2>
                <p><strong>Risk Score:</strong> {analysis['risk_analysis']['risk_score']}</p>
                <p><strong>Total Events:</strong> {analysis['statistics']['total_events']}</p>
                <p><strong>High Risk Findings:</strong> {analysis['risk_analysis']['risk_summary']['high_risk_count']}</p>
                <p><strong>Attack Surface:</strong> {analysis['network_analysis']['attack_surface']['external_subdomains']} subdomains, {analysis['network_analysis']['attack_surface']['exposed_services']} exposed services</p>
            </div>

            <div class="section risk-high">
                <h2>High Risk Findings</h2>
                <p>Found {analysis['risk_analysis']['risk_summary']['high_risk_count']} high-risk indicators requiring immediate attention.</p>
            </div>

            <div class="section">
                <h2>Network Analysis</h2>
                <table>
                    <tr><th>Metric</th><th>Count</th></tr>
                    <tr><td>Unique Subdomains</td><td>{analysis['network_analysis']['subdomain_count']}</td></tr>
                    <tr><td>IP Addresses</td><td>{analysis['network_analysis']['ip_count']}</td></tr>
                    <tr><td>Exposed Services</td><td>{analysis['network_analysis']['attack_surface']['exposed_services']}</td></tr>
                    <tr><td>SSL Endpoints</td><td>{analysis['network_analysis']['attack_surface']['ssl_endpoints']}</td></tr>
                </table>
            </div>

            <div class="section">
                <h2>Data Exposure Analysis</h2>
                <table>
                    <tr><th>Data Type</th><th>Count</th></tr>
                    <tr><td>Email Addresses</td><td>{analysis['person_analysis']['email_count']}</td></tr>
                    <tr><td>Social Media Profiles</td><td>{analysis['person_analysis']['social_media_count']}</td></tr>
                    <tr><td>Breach Exposures</td><td>{analysis['person_analysis']['breach_exposure']}</td></tr>
                </table>
            </div>

            <div class="section">
                <h2>Recommendations</h2>
                <ul>
                    <li>Review and remediate high-risk findings immediately</li>
                    <li>Implement subdomain monitoring for {analysis['network_analysis']['subdomain_count']} discovered subdomains</li>
                    <li>Secure exposed services and unnecessary open ports</li>
                    <li>Monitor for data breaches affecting discovered email addresses</li>
                    <li>Implement security awareness training for exposed personnel</li>
                </ul>
            </div>
        </body>
        </html>
        """

        if output_file:
            with open(output_file, 'w') as f:
                f.write(html_report)
            return output_file
        else:
            return html_report

# Usage example
api = SpiderFootAPI()
analyzer = SpiderFootAnalyzer(api)

# Analyze scan results
analysis = analyzer.analyze_scan_results("scan_123")
print(f"Risk Score: {analysis['risk_analysis']['risk_score']}")
print(f"High Risk Findings: {analysis['risk_analysis']['risk_summary']['high_risk_count']}")

# Generate report
report_file = analyzer.generate_report("scan_123", "spiderfoot_analysis_report.html")
print(f"Report generated: {report_file}")

Integration with Other Tools

Integration with Metasploit

# Export SpiderFoot results for Metasploit. The CLI has no export flag,
# so export the scan as CSV from the web UI (or its export endpoint)
# first, then filter out hosts and open ports:
grep -E "IP_ADDRESS|TCP_PORT_OPEN" scan_results.csv > targets.csv

# Convert to Metasploit workspace format
python3 << 'EOF'
import csv
import xml.etree.ElementTree as ET

# Create Metasploit XML import format
root = ET.Element("nmaprun")
hosts = {}

with open('targets.csv', 'r') as f:
    reader = csv.DictReader(f)
    # Column names depend on SpiderFoot's CSV export format; adjust
    # 'type' and 'data' below to match the actual header row
    for row in reader:
        if row['type'] == 'IP_ADDRESS':
            ip = row['data']
            if ip not in hosts:
                hosts[ip] = {'ports': []}
        elif row['type'] == 'TCP_PORT_OPEN':
            ip, port = row['data'].split(':')
            if ip in hosts:
                hosts[ip]['ports'].append(port)

for ip, data in hosts.items():
    host = ET.SubElement(root, "host")
    address = ET.SubElement(host, "address")
    address.set("addr", ip)
    address.set("addrtype", "ipv4")

    ports_elem = ET.SubElement(host, "ports")
    for port in data['ports']:
        port_elem = ET.SubElement(ports_elem, "port")
        port_elem.set("portid", port)
        port_elem.set("protocol", "tcp")

        state = ET.SubElement(port_elem, "state")
        state.set("state", "open")

tree = ET.ElementTree(root)
tree.write("spiderfoot_targets.xml")
print("Metasploit import file created: spiderfoot_targets.xml")
EOF

# Import into Metasploit
msfconsole -q -x "
workspace -a spiderfoot_scan;
db_import spiderfoot_targets.xml;
hosts;
services;
exit"

Integration with Nmap

# Extract IP addresses and ports for Nmap scanning (export the scan as
# JSON from the web UI first, e.g. to scan_results.json)
jq -r '.[] | select(.type=="IP_ADDRESS") | .data' scan_results.json | sort -u > ips.txt
jq -r '.[] | select(.type=="TCP_PORT_OPEN") | .data | split(":")[1]' scan_results.json | sort -u > ports.txt

# Perform targeted Nmap scan
nmap -iL ips.txt -p $(cat ports.txt | tr '\n' ',' | sed 's/,$//') -sV -sC -oA spiderfoot_nmap

# Combine results
echo "SpiderFoot discovered $(cat ips.txt | wc -l) IP addresses and $(cat ports.txt | wc -l) unique ports"
echo "Nmap scan results saved to spiderfoot_nmap.*"

Integration with TheHarvester

# Use SpiderFoot domains with TheHarvester (scan_results.json exported
# from the web UI, as above)
jq -r '.[] | select(.type=="DOMAIN_NAME") | .data' scan_results.json | sort -u > domains.txt

# Run TheHarvester on discovered domains
while read domain; do
    echo "Harvesting $domain..."
    theHarvester -d "$domain" -b all -f "${domain}_harvest"
done < domains.txt

# Combine email results
cat *_harvest.json | jq -r '.emails[]?' | sort -u > combined_emails.txt
echo "Found $(cat combined_emails.txt | wc -l) unique email addresses"

Integration with Amass

# Use SpiderFoot results to seed Amass enumeration (export the scan as
# JSON from the web UI first; hostnames appear as INTERNET_NAME events)
jq -r '.[] | select(.type=="DOMAIN_NAME") | .data' scan_results.json | sort -u > seed_domains.txt

# Run Amass with SpiderFoot seeds
amass enum -df seed_domains.txt -active -brute -o amass_results.txt

# Compare results
jq -r '.[] | select(.type=="INTERNET_NAME") | .data' scan_results.json | sort -u > sf_subdomains.txt
echo "SpiderFoot found $(wc -l < sf_subdomains.txt) subdomains"
echo "Amass found $(wc -l < amass_results.txt) subdomains"

# Find new subdomains discovered by Amass
comm -13 sf_subdomains.txt <(sort -u amass_results.txt) > new_subdomains.txt
echo "Amass discovered $(wc -l < new_subdomains.txt) additional subdomains"

Performance Optimization and Troubleshooting

Performance Tuning

# Most tuning happens in the web UI (Settings -> Global Settings), since
# options are stored in the database rather than an INI file.

# Raise or cap the number of concurrently running modules (CLI)
spiderfoot -s example.com -u all -max-threads 20

# Prefer a focused module list: a targeted -m selection is much faster
# than -u all

# The SQLite database lives under ~/.spiderfoot; keep it on fast storage
# and enable WAL mode for better write concurrency
sqlite3 ~/.spiderfoot/spiderfoot.db "PRAGMA journal_mode=WAL;"
sqlite3 ~/.spiderfoot/spiderfoot.db "PRAGMA synchronous=NORMAL;"

# For API-backed modules, lower per-module maximum results and respect
# provider rate limits (per-module options in the web UI)

Monitoring and Logging

# Enable detailed logging
spiderfoot -l 127.0.0.1:5001 -d

# Monitor scan progress (log location may vary slightly by version)
tail -f ~/.spiderfoot/logs/spiderfoot.log

# Check system resources
ps aux | grep spiderfoot
netstat -tulpn | grep 5001

# Database optimization
sqlite3 ~/.spiderfoot/spiderfoot.db "VACUUM;"
sqlite3 ~/.spiderfoot/spiderfoot.db "ANALYZE;"

# Clean old scan data: delete scans from the web interface (Scans tab),
# or remove the database file to start completely fresh
rm ~/.spiderfoot/spiderfoot.db

Common Issues and Solutions

# Issue: API rate limiting
# Solution: add API keys and lower per-module request rates / maximum
# results in the web UI settings for the affected modules

# Issue: Memory usage too high
# Solution: limit concurrent modules and prefer passive modules
spiderfoot -s target.com -u passive -max-threads 5

# Issue: Slow database performance
# Solution: Use WAL mode and optimize database
sqlite3 ~/.spiderfoot/spiderfoot.db "PRAGMA journal_mode=WAL;"
sqlite3 ~/.spiderfoot/spiderfoot.db "PRAGMA synchronous=NORMAL;"

# Issue: Module errors
# Solution: check the module is present and its API key is configured
spiderfoot -M | grep virustotal
grep "ERROR" ~/.spiderfoot/logs/spiderfoot.log | tail -20

# Issue: Web interface not accessible
# Solution: Check binding and firewall
netstat -tulpn | grep 5001
sudo ufw allow 5001/tcp  # If using UFW firewall
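
A small health-check sketch tying these checks together (assumes the
default bind address and log location used throughout this sheet):

SF=http://127.0.0.1:5001

# Web interface reachable?
curl -sf "$SF" > /dev/null && echo "web UI: up" || echo "web UI: down"

# Database intact and readable?
sqlite3 ~/.spiderfoot/spiderfoot.db "PRAGMA integrity_check;" | head -1

# Recent errors in the logs?
grep -c "ERROR" ~/.spiderfoot/logs/spiderfoot.log 2>/dev/null || echo "no log file"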

Custom Deployment and Scaling

# Docker deployment with custom configuration. The SF_* variables and
# mount paths are illustrative: the stock image is configured through the
# web UI (settings persist in the mounted data directory), so check the
# image's documentation for its actual data path.
docker run -d \
  --name spiderfoot \
  -p 5001:5001 \
  -v /path/to/config:/home/spiderfoot/.spiderfoot \
  -v /path/to/data:/home/spiderfoot/data \
  -e SF_THREADS=20 \
  -e SF_DELAY=1 \
  spiderfoot/spiderfoot

# Docker Compose for production deployment
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  spiderfoot:
    image: spiderfoot/spiderfoot:latest
    ports:
      - "5001:5001"
    volumes:
      - ./config:/home/spiderfoot/.spiderfoot
      - ./data:/home/spiderfoot/data
    # SF_* variables are illustrative; see the note on the docker run
    # example above
    environment:
      - SF_THREADS=20
      - SF_DELAY=1
      - SF_TIMEOUT=30
    restart: unless-stopped

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - spiderfoot
    restart: unless-stopped
EOF
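
The compose file mounts an nginx.conf; a minimal reverse-proxy sketch for
it (self-signed certificates in ./ssl assumed; put authentication in
front of it, since SpiderFoot's web UI has none by default):

cat > nginx.conf << 'EOF'
events {}
http {
  server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/ssl/server.crt;
    ssl_certificate_key /etc/nginx/ssl/server.key;

    location / {
      proxy_pass http://spiderfoot:5001;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
    }
  }
  server {
    listen 80;
    return 301 https://$host$request_uri;
  }
}
EOF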

# Kubernetes deployment
cat > spiderfoot-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spiderfoot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spiderfoot
  template:
    metadata:
      labels:
        app: spiderfoot
    spec:
      containers:
      - name: spiderfoot
        image: spiderfoot/spiderfoot:latest
        ports:
        - containerPort: 5001
        env:
        # SF_* values are illustrative; see the note on the Docker
        # examples above
        - name: SF_THREADS
          value: "20"
        - name: SF_DELAY
          value: "1"
        volumeMounts:
        - name: config
          mountPath: /home/spiderfoot/.spiderfoot
        - name: data
          mountPath: /home/spiderfoot/data
      volumes:
      - name: config
        configMap:
          name: spiderfoot-config
      - name: data
        persistentVolumeClaim:
          claimName: spiderfoot-data
---
apiVersion: v1
kind: Service
metadata:
  name: spiderfoot-service
spec:
  selector:
    app: spiderfoot
  ports:
  - port: 80
    targetPort: 5001
  type: LoadBalancer
EOF

kubectl apply -f spiderfoot-deployment.yaml
