SpiderFoot Cheat Sheet
"Clase de la hoja" id="copy-btn" class="copy-btn" onclick="copyAllCommands()" Copiar todos los comandos id="pdf-btn" class="pdf-btn" onclick="generatePDF()" Generar PDF seleccionado/button ■/div titulada
Overview
SpiderFoot is an open-source intelligence (OSINT) automation tool that performs reconnaissance and gathers information on targets such as IP addresses, domain names, email addresses, and people's names. It integrates with more than 200 data sources to collect intelligence and identify security risks, making it an essential tool for penetration testers, security researchers, and threat intelligence analysts.
Note: Open-source tool with a commercial HX edition available. Always ensure you have proper authorization before scanning targets.
Installation and Setup
Installation Methods
# Method 1: Install from PyPI
pip3 install spiderfoot
# Method 2: Install from GitHub (latest development version)
git clone https://github.com/smicallef/spiderfoot.git
cd spiderfoot
pip3 install -r requirements.txt
# Method 3: Docker installation
docker pull spiderfoot/spiderfoot
docker run -p 5001:5001 spiderfoot/spiderfoot
# Method 4: Using package managers
# Ubuntu/Debian
sudo apt update && sudo apt install spiderfoot
# Arch Linux
yay -S spiderfoot
# macOS with Homebrew
brew install spiderfoot
Initial Configuration
# Create configuration directory
mkdir -p ~/.spiderfoot
cd ~/.spiderfoot
# Generate default configuration
spiderfoot -C
# Edit configuration file
nano spiderfoot.conf
# Key configuration options:
# - __webaddr: Web interface bind address (default: 127.0.0.1)
# - __webport: Web interface port (default: 5001)
# - __database: Database file location
# - __logfile: Log file location
# - __modules: Module directory path
API Key Configuration
# Configure API keys for enhanced data sources
# Edit ~/.spiderfoot/spiderfoot.conf
# Common API configurations:
[api_keys]
# VirusTotal API key
virustotal_api_key = your_virustotal_api_key
# Shodan API key
shodan_api_key = your_shodan_api_key
# Have I Been Pwned API key
hibp_api_key = your_hibp_api_key
# SecurityTrails API key
securitytrails_api_key = your_securitytrails_api_key
# PassiveTotal API key
passivetotal_api_key = your_passivetotal_api_key
passivetotal_username = your_passivetotal_username
# AlienVault OTX API key
otx_api_key = your_otx_api_key
# Censys API credentials
censys_api_id = your_censys_api_id
censys_api_secret = your_censys_api_secret
# Hunter.io API key
hunter_api_key = your_hunter_api_key
# Clearbit API key
clearbit_api_key = your_clearbit_api_key
Command-Line Usage
Basic Scan Operations
# Basic domain scan
spiderfoot -s example.com -t DOMAIN
# IP address scan
spiderfoot -s 192.168.1.1 -t IP_ADDRESS
# Email address investigation
spiderfoot -s user@example.com -t EMAILADDR
# Human name investigation
spiderfoot -s "John Smith" -t HUMAN_NAME
# Multiple targets scan
spiderfoot -s "example.com,192.168.1.1" -t DOMAIN,IP_ADDRESS
# Scan with specific modules only
spiderfoot -s example.com -t DOMAIN -m sfp_dnsresolve,sfp_whois,sfp_virustotal
# Exclude specific modules
spiderfoot -s example.com -t DOMAIN -x sfp_social,sfp_pgp
Advanced Scan Options
# Passive scan only (no active probing)
spiderfoot -s example.com -t DOMAIN -p
# Scan with custom user agent
spiderfoot -s example.com -t DOMAIN -u "Mozilla/5.0 Custom Agent"
# Scan with proxy
spiderfoot -s example.com -t DOMAIN -y http://proxy.example.com:8080
# Scan with custom timeout
spiderfoot -s example.com -t DOMAIN -w 30
# Scan with maximum threads
spiderfoot -s example.com -t DOMAIN -T 10
# Scan with output to file
spiderfoot -s example.com -t DOMAIN -o json > scan_results.json
# Scan with specific data types
spiderfoot -s example.com -t DOMAIN -f SUBDOMAIN,IP_ADDRESS,EMAILADDR
Scan Management
# List available modules
spiderfoot -M
# Get module information
spiderfoot -M sfp_virustotal
# List available data types
spiderfoot -F
# List running scans
spiderfoot -l
# Stop a running scan
spiderfoot -q scan_id
# Delete scan data
spiderfoot -d scan_id
# Export scan results
spiderfoot -e scan_id -o json > results.json
spiderfoot -e scan_id -o csv > results.csv
spiderfoot -e scan_id -o xml > results.xml
Web Interface Usage
Starting the Web Interface
# Start web interface (default: http://127.0.0.1:5001)
spiderfoot -l 127.0.0.1:5001
# Start web interface on all interfaces
spiderfoot -l 0.0.0.0:5001
# Start with custom configuration
spiderfoot -c /path/to/custom.conf -l 127.0.0.1:5001
# Start in background
nohup spiderfoot -l 127.0.0.1:5001 &
# Start with Docker
docker run -p 5001:5001 -d spiderfoot/spiderfoot
Web Interface Navigation
# Access web interface
# Navigate to http://localhost:5001
# Main sections:
# - New Scan: Create new scans
# - Scans: View and manage existing scans
# - Browse: Browse scan results by data type
# - Search: Search across all scan data
# - Settings: Configure modules and API keys
# - About: System information and statistics
# Scan creation workflow:
# 1. Enter target (domain, IP, email, etc.)
# 2. Select scan modules or use presets
# 3. Configure scan options
# 4. Start scan and monitor progress
# 5. Review results and export data
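Scans can also be created programmatically against the web interface. A minimal sketch using the /startscan endpoint (the same endpoint the Python API class later in this sheet uses; field names mirror that class and may vary between SpiderFoot versions):
# Sketch: start a scan through the web interface's /startscan endpoint.
import requests

resp = requests.post("http://127.0.0.1:5001/startscan", data={
    'scanname': 'Quick scan - example.com',
    'scantarget': 'example.com',
    'targettype': 'DOMAIN',
    'modulelist': 'sfp_dnsresolve,sfp_whois',
    'typelist': 'all'
})
print(resp.status_code, resp.text[:200])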
Scan Configuration Presets
{
"scan_presets": {
"passive_recon": {
"description": "Passive reconnaissance without active probing",
"modules": [
"sfp_dnsresolve",
"sfp_whois",
"sfp_virustotal",
"sfp_shodan",
"sfp_censys",
"sfp_passivetotal",
"sfp_securitytrails",
"sfp_threatcrowd",
"sfp_otx"
],
"passive_only": true
},
"comprehensive_domain": {
"description": "Comprehensive domain investigation",
"modules": [
"sfp_dnsresolve",
"sfp_dnsbrute",
"sfp_whois",
"sfp_virustotal",
"sfp_shodan",
"sfp_censys",
"sfp_subdomain_enum",
"sfp_ssl_analyze",
"sfp_port_scan",
"sfp_web_crawl"
]
},
"email_investigation": {
"description": "Email address and person investigation",
"modules": [
"sfp_hunter",
"sfp_clearbit",
"sfp_hibp",
"sfp_social",
"sfp_pgp",
"sfp_gravatar",
"sfp_fullcontact",
"sfp_pipl"
]
},
"threat_intelligence": {
"description": "Threat intelligence gathering",
"modules": [
"sfp_virustotal",
"sfp_otx",
"sfp_threatcrowd",
"sfp_malwaredomains",
"sfp_reputation",
"sfp_blacklist",
"sfp_phishing",
"sfp_malware"
]
}
}
}
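A preset can be turned into a CLI invocation with a small helper. A sketch assuming the JSON above is saved as presets.json (hypothetical file name), using the -s/-t/-m/-p flags shown earlier in this sheet:
# Hypothetical helper: build a spiderfoot CLI command from a preset.
import json
import shlex

with open('presets.json') as f:
    presets = json.load(f)['scan_presets']

def build_command(preset_name, target, target_type='DOMAIN'):
    preset = presets[preset_name]
    cmd = ['spiderfoot', '-s', target, '-t', target_type,
           '-m', ','.join(preset['modules'])]
    if preset.get('passive_only'):
        cmd.append('-p')  # passive-only flag, as used earlier in this sheet
    return ' '.join(shlex.quote(part) for part in cmd)

print(build_command('passive_recon', 'example.com'))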
Module Configuration and Customization
Core Modules Overview
# DNS and Domain Modules
sfp_dnsresolve # DNS resolution and record enumeration
sfp_dnsbrute # DNS subdomain brute forcing
sfp_whois # WHOIS information gathering
sfp_subdomain_enum # Subdomain enumeration from various sources
# Network and Infrastructure
sfp_port_scan # Port scanning and service detection
sfp_ssl_analyze # SSL/TLS certificate analysis
sfp_banner_grab # Service banner grabbing
sfp_traceroute # Network path tracing
# Threat Intelligence
sfp_virustotal # VirusTotal integration
sfp_otx # AlienVault OTX integration
sfp_threatcrowd # ThreatCrowd integration
sfp_reputation # Reputation checking across sources
# Search Engines and OSINT
sfp_google # Google search integration
sfp_bing # Bing search integration
sfp_shodan # Shodan integration
sfp_censys # Censys integration
# Social Media and People
sfp_social # Social media profile discovery
sfp_hunter # Email address discovery
sfp_clearbit # Company and person enrichment
sfp_hibp # Have I Been Pwned integration
# Web Application
sfp_web_crawl # Web application crawling
sfp_web_analyze # Web technology identification
sfp_robots # robots.txt analysis
sfp_sitemap # Sitemap discovery and analysis
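To find modules by keyword from a script, the -M listing can be filtered. A sketch assuming spiderfoot -M prints one module name per line, as shown in the scan-management commands above:
# Sketch: filter the module listing by keyword.
import subprocess

out = subprocess.run(['spiderfoot', '-M'], capture_output=True, text=True)
dns_modules = [line for line in out.stdout.splitlines() if 'dns' in line.lower()]
print('\n'.join(dns_modules))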
Custom Module Development
# Example custom module: sfp_custom_example.py
import requests
from spiderfoot import SpiderFootEvent, SpiderFootPlugin
class sfp_custom_example(SpiderFootPlugin):
"""Custom SpiderFoot module example"""
meta = {
'name': "Custom Example Module",
'summary': "Example custom module for demonstration",
'flags': [],
'useCases': ["Investigate", "Passive"],
'categories': ["Search Engines"],
'dataSource': {
'website': "https://example.com",
'model': "FREE_NOAUTH_UNLIMITED",
'references': ["https://example.com/api"],
'favIcon': "https://example.com/favicon.ico",
'logo': "https://example.com/logo.png",
'description': "Example data source for custom module"
}
}
# Default options
opts = {
'api_key': '',
'max_results': 100,
'timeout': 30
}
# Option descriptions
optdescs = {
'api_key': "API key for the service",
'max_results': "Maximum number of results to return",
'timeout': "Request timeout in seconds"
}
# What events this module accepts for input
events = {
'DOMAIN_NAME': ['SUBDOMAIN', 'EMAILADDR'],
'IP_ADDRESS': ['GEOINFO', 'NETBLOCK_OWNER']
}
def setup(self, sfc, userOpts=dict()):
self.sf = sfc
self.results = self.tempStorage()
# Override default options with user settings
for opt in list(userOpts.keys()):
self.opts[opt] = userOpts[opt]
def watchedEvents(self):
"""Events this module will accept as input"""
return list(self.events.keys())
def producedEvents(self):
"""Events this module will produce"""
evts = []
for eventType in self.events:
evts.extend(self.events[eventType])
return evts
def handleEvent(self, event):
"""Handle incoming events"""
eventName = event.eventType
srcModuleName = event.module
eventData = event.data
# Don't process events from ourselves
if srcModuleName == "sfp_custom_example":
return
# Check if we've already processed this data
if eventData in self.results:
return
self.results[eventData] = True
self.sf.debug(f"Received event, {eventName}, from {srcModuleName}")
# Process different event types
if eventName == 'DOMAIN_NAME':
self.processDomain(eventData, event)
elif eventName == 'IP_ADDRESS':
self.processIP(eventData, event)
def processDomain(self, domain, parentEvent):
"""Process domain name events"""
try:
# Example: Query custom API for domain information
url = f"https://api.example.com/domain/{domain}"
headers = {'Authorization': f'Bearer {self.opts["api_key"]}'}
response = requests.get(
url,
headers=headers,
timeout=self.opts['timeout']
)
if response.status_code == 200:
data = response.json()
# Extract subdomains
if 'subdomains' in data:
for subdomain in data['subdomains'][:self.opts['max_results']]:
evt = SpiderFootEvent(
'SUBDOMAIN',
subdomain,
self.__name__,
parentEvent
)
self.notifyListeners(evt)
# Extract email addresses
if 'emails' in data:
for email in data['emails'][:self.opts['max_results']]:
evt = SpiderFootEvent(
'EMAILADDR',
email,
self.__name__,
parentEvent
)
self.notifyListeners(evt)
except Exception as e:
self.sf.error(f"Error processing domain {domain}: {str(e)}")
def processIP(self, ip, parentEvent):
"""Process IP address events"""
try:
# Example: Query custom API for IP information
url = f"https://api.example.com/ip/{ip}"
headers = {'Authorization': f'Bearer {self.opts["api_key"]}'}
response = requests.get(
url,
headers=headers,
timeout=self.opts['timeout']
)
if response.status_code == 200:
data = response.json()
# Extract geolocation info
if 'location' in data:
location = f"{data['location']['city']}, {data['location']['country']}"
evt = SpiderFootEvent(
'GEOINFO',
location,
self.__name__,
parentEvent
)
self.notifyListeners(evt)
except Exception as e:
self.sf.error(f"Error processing IP {ip}: {str(e)}")
# Install custom module:
# 1. Save as sfp_custom_example.py in modules/ directory
# 2. Restart SpiderFoot
# 3. Module will appear in available modules list
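Before restarting, the module's declared event interface can be smoke-tested outside the framework. A sketch, assuming the spiderfoot package is importable and the file sits in modules/ (instantiation details may differ between versions):
# Sketch: import the custom module and inspect its event interface.
import sys
sys.path.insert(0, 'modules')
from sfp_custom_example import sfp_custom_example

mod = sfp_custom_example()  # may need framework setup in some versions
print('watches:', mod.watchedEvents())
print('produces:', mod.producedEvents())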
Module Configuration Files
# ~/.spiderfoot/modules.conf
# Module-specific configuration
[sfp_virustotal]
api_key = your_virustotal_api_key
max_results = 100
timeout = 30
[sfp_shodan]
api_key = your_shodan_api_key
max_results = 50
timeout = 20
[sfp_hunter]
api_key = your_hunter_api_key
max_results = 25
verify_emails = true
[sfp_censys]
api_id = your_censys_api_id
api_secret = your_censys_api_secret
max_results = 100
[sfp_passivetotal]
api_key = your_passivetotal_api_key
username = your_passivetotal_username
max_results = 200
[sfp_hibp]
api_key = your_hibp_api_key
check_pastes = true
truncate_response = false
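A quick sanity check can flag placeholder or empty credentials before scanning. A sketch using Python's configparser against the file above:
# Sketch: flag empty or placeholder API credentials in modules.conf.
import configparser
import os

cfg = configparser.ConfigParser()
cfg.read(os.path.expanduser('~/.spiderfoot/modules.conf'))

for section in cfg.sections():
    for opt in ('api_key', 'api_id', 'api_secret'):
        if cfg.has_option(section, opt):
            value = cfg.get(section, opt).strip()
            if not value or value.startswith('your_'):
                print(f'[{section}] {opt} looks unset: {value!r}')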
Advanced OSINT Techniques
Comprehensive Domain Investigation
# Multi-stage domain investigation
# Stage 1: Passive reconnaissance
spiderfoot -s target.com -t DOMAIN -m sfp_dnsresolve,sfp_whois,sfp_virustotal,sfp_passivetotal -p
# Stage 2: Subdomain enumeration
spiderfoot -s target.com -t DOMAIN -m sfp_dnsbrute,sfp_subdomain_enum,sfp_crt,sfp_google
# Stage 3: Infrastructure analysis
spiderfoot -s target.com -t DOMAIN -m sfp_port_scan,sfp_ssl_analyze,sfp_banner_grab,sfp_web_analyze
# Stage 4: Threat intelligence
spiderfoot -s target.com -t DOMAIN -m sfp_reputation,sfp_blacklist,sfp_malware,sfp_phishing
# Combined comprehensive scan
spiderfoot -s target.com -t DOMAIN \
-m sfp_dnsresolve,sfp_dnsbrute,sfp_whois,sfp_virustotal,sfp_shodan,sfp_censys,sfp_passivetotal,sfp_subdomain_enum,sfp_ssl_analyze,sfp_port_scan,sfp_web_crawl,sfp_reputation \
-f SUBDOMAIN,IP_ADDRESS,EMAILADDR,SSL_CERTIFICATE,WEBSERVER_TECHNOLOGY,VULNERABILITY
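The staged commands can also be driven from a small wrapper so each stage is logged as it runs. A sketch reusing the module lists above:
# Hypothetical wrapper: run the four investigation stages in sequence.
import subprocess

stages = {
    'passive': 'sfp_dnsresolve,sfp_whois,sfp_virustotal,sfp_passivetotal',
    'subdomains': 'sfp_dnsbrute,sfp_subdomain_enum,sfp_crt,sfp_google',
    'infrastructure': 'sfp_port_scan,sfp_ssl_analyze,sfp_banner_grab,sfp_web_analyze',
    'threat_intel': 'sfp_reputation,sfp_blacklist,sfp_malware,sfp_phishing',
}

for name, modules in stages.items():
    print(f'=== Stage: {name} ===')
    # Stage 1 originally adds -p for passive-only; append it if desired
    subprocess.run(['spiderfoot', '-s', 'target.com', '-t', 'DOMAIN', '-m', modules])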
Email and Person Investigation
# Email address investigation
spiderfoot -s user@target.com -t EMAILADDR \
-m sfp_hunter,sfp_clearbit,sfp_hibp,sfp_social,sfp_pgp,sfp_gravatar,sfp_fullcontact \
-f EMAILADDR,SOCIAL_MEDIA,PHONE_NUMBER,PHYSICAL_ADDRESS,BREACH_DATA
# Person name investigation
spiderfoot -s "John Smith" -t HUMAN_NAME \
-m sfp_social,sfp_pipl,sfp_fullcontact,sfp_clearbit,sfp_google,sfp_bing \
-f SOCIAL_MEDIA,EMAILADDR,PHONE_NUMBER,PHYSICAL_ADDRESS,COMPANY_NAME
# Company investigation
spiderfoot -s "Example Corp" -t COMPANY_NAME \
-m sfp_clearbit,sfp_hunter,sfp_google,sfp_bing,sfp_social \
-f EMAILADDR,DOMAIN_NAME,PHYSICAL_ADDRESS,PHONE_NUMBER,SOCIAL_MEDIA
IP Address and Network Analysis
# IP address investigation
spiderfoot -s 192.168.1.1 -t IP_ADDRESS \
-m sfp_shodan,sfp_censys,sfp_virustotal,sfp_reputation,sfp_geoip,sfp_port_scan \
-f GEOINFO,NETBLOCK_OWNER,OPEN_TCP_PORT,WEBSERVER_TECHNOLOGY,VULNERABILITY
# Network range investigation
spiderfoot -s 192.168.1.0/24 -t NETBLOCK_OWNER \
-m sfp_shodan,sfp_censys,sfp_port_scan,sfp_banner_grab \
-f IP_ADDRESS,OPEN_TCP_PORT,WEBSERVER_TECHNOLOGY,OPERATING_SYSTEM
# ASN investigation
spiderfoot -s AS12345 -t BGP_AS_OWNER \
-m sfp_bgpview,sfp_shodan,sfp_censys \
-f NETBLOCK_OWNER,IP_ADDRESS,DOMAIN_NAME
Social Media and Dark Web Investigation
# Social media investigation
spiderfoot -s target.com -t DOMAIN \
-m sfp_social,sfp_twitter,sfp_linkedin,sfp_facebook,sfp_instagram \
-f SOCIAL_MEDIA,EMAILADDR,HUMAN_NAME,PHONE_NUMBER
# Dark web monitoring
spiderfoot -s target.com -t DOMAIN \
-m sfp_darkweb,sfp_onion,sfp_pastebin,sfp_leakdb \
-f DARKWEB_MENTION,BREACH_DATA,LEAKED_DOCUMENT,PASTE_SITE
# Cryptocurrency investigation
spiderfoot -s 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa -t BITCOIN_ADDRESS \
-m sfp_blockchain,sfp_bitcoinabuse,sfp_cryptocurrency \
-f BITCOIN_ADDRESS,CRYPTOCURRENCY_ADDRESS,MALICIOUS_CRYPTOCURRENCY
API Integration and Automation
Python API Usage
# SpiderFoot Python API integration
import requests
import json
import time
from typing import Dict, List, Optional
class SpiderFootAPI:
def __init__(self, base_url: str = "http://localhost:5001"):
self.base_url = base_url.rstrip('/')
self.session = requests.Session()
def start_scan(self, target: str, target_type: str, modules: List[str] = None) -> str:
"""Start a new scan and return scan ID"""
scan_data = {
'scanname': f"API Scan - {target}",
'scantarget': target,
'targettype': target_type,
'modulelist': ','.join(modules) if modules else '',
'typelist': 'all'
}
response = self.session.post(
f"{self.base_url}/startscan",
data=scan_data
)
if response.status_code == 200:
result = response.json()
return result.get('id')
else:
raise Exception(f"Failed to start scan: {response.text}")
def get_scan_status(self, scan_id: str) -> Dict:
"""Get scan status and progress"""
response = self.session.get(f"{self.base_url}/scanstatus?id={scan_id}")
if response.status_code == 200:
return response.json()
else:
raise Exception(f"Failed to get scan status: {response.text}")
def wait_for_scan_completion(self, scan_id: str, timeout: int = 3600) -> Dict:
"""Wait for scan to complete with timeout"""
start_time = time.time()
while time.time() - start_time < timeout:
status = self.get_scan_status(scan_id)
if status.get('status') in ['FINISHED', 'ABORTED', 'FAILED']:
return status
print(f"Scan progress: {status.get('status')} - {status.get('progress', 0)}%")
time.sleep(30)
raise TimeoutError("Scan did not complete within timeout")
def get_scan_results(self, scan_id: str, data_type: str = None) -> List[Dict]:
"""Get scan results, optionally filtered by data type"""
params = {'id': scan_id}
if data_type:
params['type'] = data_type
response = self.session.get(f"{self.base_url}/scaneventresults", params=params)
if response.status_code == 200:
return response.json()
else:
raise Exception(f"Failed to get scan results: {response.text}")
def export_scan_results(self, scan_id: str, format: str = 'json') -> str:
"""Export scan results in specified format"""
params = {
'id': scan_id,
'type': format
}
response = self.session.get(f"{self.base_url}/scaneventresultexport", params=params)
if response.status_code == 200:
return response.text
else:
raise Exception(f"Failed to export scan results: {response.text}")
def delete_scan(self, scan_id: str) -> bool:
"""Delete scan and its data"""
response = self.session.get(f"{self.base_url}/scandelete?id={scan_id}")
return response.status_code == 200
def get_available_modules(self) -> Dict:
"""Get list of available modules"""
response = self.session.get(f"{self.base_url}/modules")
if response.status_code == 200:
return response.json()
else:
raise Exception(f"Failed to get modules: {response.text}")
def bulk_scan(self, targets: List[Dict], wait_for_completion: bool = True) -> List[Dict]:
"""Perform bulk scanning of multiple targets"""
scan_results = []
for target_info in targets:
target = target_info['target']
target_type = target_info['type']
modules = target_info.get('modules', [])
print(f"Starting scan for {target}")
try:
scan_id = self.start_scan(target, target_type, modules)
scan_info = {
'target': target,
'scan_id': scan_id,
'status': 'started'
}
if wait_for_completion:
final_status = self.wait_for_scan_completion(scan_id)
scan_info['status'] = final_status.get('status')
scan_info['results'] = self.get_scan_results(scan_id)
scan_results.append(scan_info)
except Exception as e:
print(f"Failed to scan {target}: {str(e)}")
scan_results.append({
'target': target,
'status': 'failed',
'error': str(e)
})
return scan_results
# Usage example
api = SpiderFootAPI("http://localhost:5001")
# Single target scan
scan_id = api.start_scan("example.com", "DOMAIN", ["sfp_dnsresolve", "sfp_whois", "sfp_virustotal"])
print(f"Started scan: {scan_id}")
# Wait for completion
final_status = api.wait_for_scan_completion(scan_id)
print(f"Scan completed: {final_status['status']}")
# Get results
results = api.get_scan_results(scan_id)
print(f"Found {len(results)} events")
# Export results
json_results = api.export_scan_results(scan_id, 'json')
with open(f"scan_{scan_id}_results.json", 'w') as f:
f.write(json_results)
# Bulk scanning
targets = [
{'target': 'example.com', 'type': 'DOMAIN', 'modules': ['sfp_dnsresolve', 'sfp_whois']},
{'target': 'test.com', 'type': 'DOMAIN', 'modules': ['sfp_dnsresolve', 'sfp_virustotal']},
{'target': '192.168.1.1', 'type': 'IP_ADDRESS', 'modules': ['sfp_shodan', 'sfp_geoip']}
]
bulk_results = api.bulk_scan(targets)
print(f"Completed {len(bulk_results)} scans")
Automated Reporting and Analysis
# Automated SpiderFoot reporting and analysis
import pandas as pd
from collections import Counter
from typing import Dict, List
class SpiderFootAnalyzer:
def __init__(self, api: SpiderFootAPI):
self.api = api
def analyze_scan_results(self, scan_id: str) -> Dict:
"""Comprehensive analysis of scan results"""
# Get all scan results
results = self.api.get_scan_results(scan_id)
# Convert to DataFrame for analysis
df = pd.DataFrame(results)
# Basic statistics
stats = {
'total_events': len(results),
'unique_data_types': df['type'].nunique(),
'data_type_distribution': df['type'].value_counts().to_dict(),
'module_distribution': df['module'].value_counts().to_dict(),
'confidence_distribution': df['confidence'].value_counts().to_dict()
}
# Risk analysis
risk_analysis = self.analyze_risk_indicators(results)
# Network analysis
network_analysis = self.analyze_network_data(results)
# Email and person analysis
person_analysis = self.analyze_person_data(results)
# Threat intelligence analysis
threat_analysis = self.analyze_threat_intelligence(results)
return {
'statistics': stats,
'risk_analysis': risk_analysis,
'network_analysis': network_analysis,
'person_analysis': person_analysis,
'threat_analysis': threat_analysis
}
def analyze_risk_indicators(self, results: List[Dict]) -> Dict:
"""Analyze risk indicators from scan results"""
risk_indicators = {
'high_risk': [],
'medium_risk': [],
'low_risk': [],
'informational': []
}
# Define risk patterns
high_risk_patterns = [
'VULNERABILITY',
'MALWARE',
'BLACKLISTED',
'BREACH_DATA',
'DARKWEB_MENTION'
]
medium_risk_patterns = [
'OPEN_TCP_PORT',
'SSL_CERTIFICATE_EXPIRED',
'WEBSERVER_TECHNOLOGY',
'OPERATING_SYSTEM'
]
for result in results:
data_type = result.get('type', '')
data_value = result.get('data', '')
if any(pattern in data_type for pattern in high_risk_patterns):
risk_indicators['high_risk'].append(result)
elif any(pattern in data_type for pattern in medium_risk_patterns):
risk_indicators['medium_risk'].append(result)
elif data_type in ['SUBDOMAIN', 'IP_ADDRESS', 'EMAILADDR']:
risk_indicators['low_risk'].append(result)
else:
risk_indicators['informational'].append(result)
# Calculate risk score
risk_score = (
len(risk_indicators['high_risk']) * 10 +
len(risk_indicators['medium_risk']) * 5 +
len(risk_indicators['low_risk']) * 1
)
return {
'risk_score': risk_score,
'risk_indicators': risk_indicators,
'risk_summary': {
'high_risk_count': len(risk_indicators['high_risk']),
'medium_risk_count': len(risk_indicators['medium_risk']),
'low_risk_count': len(risk_indicators['low_risk']),
'informational_count': len(risk_indicators['informational'])
}
}
def analyze_network_data(self, results: List[Dict]) -> Dict:
"""Analyze network-related data"""
network_data = {
'subdomains': [],
'ip_addresses': [],
'open_ports': [],
'ssl_certificates': [],
'web_technologies': []
}
for result in results:
data_type = result.get('type', '')
data_value = result.get('data', '')
if data_type == 'SUBDOMAIN':
network_data['subdomains'].append(data_value)
elif data_type == 'IP_ADDRESS':
network_data['ip_addresses'].append(data_value)
elif data_type == 'OPEN_TCP_PORT':
network_data['open_ports'].append(data_value)
elif data_type == 'SSL_CERTIFICATE':
network_data['ssl_certificates'].append(data_value)
elif data_type == 'WEBSERVER_TECHNOLOGY':
network_data['web_technologies'].append(data_value)
# Analysis
analysis = {
'subdomain_count': len(set(network_data['subdomains'])),
'ip_count': len(set(network_data['ip_addresses'])),
'unique_ports': list(set([port.split(':')[-1] for port in network_data['open_ports']])),
'technology_stack': Counter(network_data['web_technologies']),
'attack_surface': {
'external_subdomains': len(set(network_data['subdomains'])),
'exposed_services': len(network_data['open_ports']),
'ssl_endpoints': len(network_data['ssl_certificates'])
}
}
return analysis
def analyze_person_data(self, results: List[Dict]) -> Dict:
"""Analyze person and email related data"""
person_data = {
'email_addresses': [],
'social_media': [],
'phone_numbers': [],
'physical_addresses': [],
'breach_data': []
}
for result in results:
data_type = result.get('type', '')
data_value = result.get('data', '')
if data_type == 'EMAILADDR':
person_data['email_addresses'].append(data_value)
elif data_type == 'SOCIAL_MEDIA':
person_data['social_media'].append(data_value)
elif data_type == 'PHONE_NUMBER':
person_data['phone_numbers'].append(data_value)
elif data_type == 'PHYSICAL_ADDRESS':
person_data['physical_addresses'].append(data_value)
elif data_type == 'BREACH_DATA':
person_data['breach_data'].append(data_value)
# Email domain analysis
email_domains = [email.split('@')[1] for email in person_data['email_addresses'] if '@' in email]
analysis = {
'email_count': len(set(person_data['email_addresses'])),
'email_domains': Counter(email_domains),
'social_media_count': len(person_data['social_media']),
'breach_exposure': len(person_data['breach_data']),
'contact_info_exposure': {
'emails': len(person_data['email_addresses']),
'phones': len(person_data['phone_numbers']),
'addresses': len(person_data['physical_addresses'])
}
}
return analysis
    def analyze_threat_intelligence(self, results: List[Dict]) -> Dict:
        """Summarize threat-intelligence findings.
        Minimal implementation added for completeness; the event type
        keywords below are indicative and can be extended."""
        threat_keywords = ['MALICIOUS', 'BLACKLISTED', 'MALWARE', 'VULNERABILITY', 'PHISHING']
        findings = [r for r in results
                    if any(k in r.get('type', '') for k in threat_keywords)]
        return {
            'finding_count': len(findings),
            'findings_by_type': dict(Counter(r.get('type', '') for r in findings))
        }
def generate_report(self, scan_id: str, output_file: str = None) -> str:
"""Generate comprehensive HTML report"""
analysis = self.analyze_scan_results(scan_id)
html_report = f"""
<!DOCTYPE html>
<html>
<head>
<title>SpiderFoot Scan Analysis Report</title>
<style>
body {{ font-family: Arial, sans-serif; margin: 40px; }}
.header {{ background-color: #2c3e50; color: white; padding: 20px; }}
.section {{ margin: 20px 0; padding: 15px; border-left: 4px solid #3498db; }}
.risk-high {{ border-left-color: #e74c3c; }}
.risk-medium {{ border-left-color: #f39c12; }}
.risk-low {{ border-left-color: #27ae60; }}
table {{ border-collapse: collapse; width: 100%; }}
th, td {{ border: 1px solid #ddd; padding: 8px; text-align: left; }}
th {{ background-color: #f2f2f2; }}
</style>
</head>
<body>
<div class="header">
<h1>SpiderFoot Scan Analysis Report</h1>
<p>Scan ID: {scan_id}</p>
<p>Generated: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
</div>
<div class="section">
<h2>Executive Summary</h2>
<p><strong>Risk Score:</strong> {analysis['risk_analysis']['risk_score']}</p>
<p><strong>Total Events:</strong> {analysis['statistics']['total_events']}</p>
<p><strong>High Risk Findings:</strong> {analysis['risk_analysis']['risk_summary']['high_risk_count']}</p>
<p><strong>Attack Surface:</strong> {analysis['network_analysis']['attack_surface']['external_subdomains']} subdomains, {analysis['network_analysis']['attack_surface']['exposed_services']} exposed services</p>
</div>
<div class="section risk-high">
<h2>High Risk Findings</h2>
<p>Found {analysis['risk_analysis']['risk_summary']['high_risk_count']} high-risk indicators requiring immediate attention.</p>
</div>
<div class="section">
<h2>Network Analysis</h2>
<table>
<tr><th>Metric</th><th>Count</th></tr>
<tr><td>Unique Subdomains</td><td>{analysis['network_analysis']['subdomain_count']}</td></tr>
<tr><td>IP Addresses</td><td>{analysis['network_analysis']['ip_count']}</td></tr>
<tr><td>Exposed Services</td><td>{analysis['network_analysis']['attack_surface']['exposed_services']}</td></tr>
<tr><td>SSL Endpoints</td><td>{analysis['network_analysis']['attack_surface']['ssl_endpoints']}</td></tr>
</table>
</div>
<div class="section">
<h2>Data Exposure Analysis</h2>
<table>
<tr><th>Data Type</th><th>Count</th></tr>
<tr><td>Email Addresses</td><td>{analysis['person_analysis']['email_count']}</td></tr>
<tr><td>Social Media Profiles</td><td>{analysis['person_analysis']['social_media_count']}</td></tr>
<tr><td>Breach Exposures</td><td>{analysis['person_analysis']['breach_exposure']}</td></tr>
</table>
</div>
<div class="section">
<h2>Recommendations</h2>
<ul>
<li>Review and remediate high-risk findings immediately</li>
<li>Implement subdomain monitoring for {analysis['network_analysis']['subdomain_count']} discovered subdomains</li>
<li>Secure exposed services and unnecessary open ports</li>
<li>Monitor for data breaches affecting discovered email addresses</li>
<li>Implement security awareness training for exposed personnel</li>
</ul>
</div>
</body>
</html>
"""
if output_file:
with open(output_file, 'w') as f:
f.write(html_report)
return output_file
else:
return html_report
# Usage example
api = SpiderFootAPI()
analyzer = SpiderFootAnalyzer(api)
# Analyze scan results
analysis = analyzer.analyze_scan_results("scan_123")
print(f"Risk Score: {analysis['risk_analysis']['risk_score']}")
print(f"High Risk Findings: {analysis['risk_analysis']['risk_summary']['high_risk_count']}")
# Generate report
report_file = analyzer.generate_report("scan_123", "spiderfoot_analysis_report.html")
print(f"Report generated: {report_file}")
Integration with Other Tools
Metasploit Integration
# Export SpiderFoot results for Metasploit
spiderfoot -e scan_id -o csv | grep -E "IP_ADDRESS|OPEN_TCP_PORT" > targets.csv
# Convert to Metasploit workspace format
python3 << 'EOF'
import csv
import xml.etree.ElementTree as ET
# Create Metasploit XML import format
root = ET.Element("nmaprun")
hosts = {}
with open('targets.csv', 'r') as f:
reader = csv.DictReader(f)
for row in reader:
if row['type'] == 'IP_ADDRESS':
ip = row['data']
if ip not in hosts:
hosts[ip] = {'ports': []}
elif row['type'] == 'OPEN_TCP_PORT':
ip, port = row['data'].split(':')
if ip in hosts:
hosts[ip]['ports'].append(port)
for ip, data in hosts.items():
host = ET.SubElement(root, "host")
address = ET.SubElement(host, "address")
address.set("addr", ip)
address.set("addrtype", "ipv4")
ports_elem = ET.SubElement(host, "ports")
for port in data['ports']:
port_elem = ET.SubElement(ports_elem, "port")
port_elem.set("portid", port)
port_elem.set("protocol", "tcp")
state = ET.SubElement(port_elem, "state")
state.set("state", "open")
tree = ET.ElementTree(root)
tree.write("spiderfoot_targets.xml")
print("Metasploit import file created: spiderfoot_targets.xml")
EOF
# Import into Metasploit
msfconsole -q -x "
workspace -a spiderfoot_scan;
db_import spiderfoot_targets.xml;
hosts;
services;
exit"
Nmap Integration
# Extract IP addresses and ports for Nmap scanning
spiderfoot -e scan_id -o json | jq -r '.[] | select(.type=="IP_ADDRESS") | .data' | sort -u > ips.txt
spiderfoot -e scan_id -o json | jq -r '.[] | select(.type=="OPEN_TCP_PORT") | .data | split(":")[1]' | sort -u > ports.txt
# Perform targeted Nmap scan
nmap -iL ips.txt -p $(cat ports.txt | tr '\n' ',' | sed 's/,$//') -sV -sC -oA spiderfoot_nmap
# Combine results
echo "SpiderFoot discovered $(cat ips.txt | wc -l) IP addresses and $(cat ports.txt | wc -l) unique ports"
echo "Nmap scan results saved to spiderfoot_nmap.*"
TheHarvester Integration
# Use SpiderFoot domains with TheHarvester
spiderfoot -e scan_id -o json | jq -r '.[] | select(.type=="DOMAIN_NAME") | .data' | sort -u > domains.txt
# Run TheHarvester on discovered domains
while read domain; do
echo "Harvesting $domain..."
theHarvester -d "$domain" -b all -f "${domain}_harvest"
done < domains.txt
# Combine email results
cat *_harvest.json | jq -r '.emails[]?' | sort -u > combined_emails.txt
echo "Found $(cat combined_emails.txt | wc -l) unique email addresses"
Amass Integration
# Use SpiderFoot results to seed Amass enumeration
spiderfoot -e scan_id -o json | jq -r '.[] | select(.type=="DOMAIN_NAME") | .data' | sort -u > seed_domains.txt
# Run Amass with SpiderFoot seeds
amass enum -df seed_domains.txt -active -brute -o amass_results.txt
# Compare results
| echo "SpiderFoot found $(spiderfoot -e scan_id -o json | jq -r '.[] | select(.type=="SUBDOMAIN") | .data' | sort -u | wc -l) subdomains" |
echo "Amass found $(cat amass_results.txt | wc -l) subdomains"
# Find new subdomains discovered by Amass
comm -13 <(spiderfoot -e scan_id -o json | jq -r '.[] | select(.type=="SUBDOMAIN") | .data' | sort -u) <(sort -u amass_results.txt) > new_subdomains.txt
echo "Amass discovered $(cat new_subdomains.txt | wc -l) additional subdomains"
Performance Optimization and Troubleshooting
Performance Tuning
# Optimize SpiderFoot performance
# Edit ~/.spiderfoot/spiderfoot.conf
# Increase thread count for faster scanning
__threads = 20
# Adjust request delays to avoid rate limiting
__delay = 1
# Increase timeout for slow responses
__timeout = 30
# Optimize database settings
__database = /tmp/spiderfoot.db # Use faster storage
__dbpragmas = journal_mode=WAL,synchronous=NORMAL,cache_size=10000
# Memory optimization
__maxmemory = 2048 # MB
# Network optimization
__useragent = Mozilla/5.0 (compatible; SpiderFoot)
__proxy = http://proxy.example.com:8080 # Route requests through a proxy if required
Monitoring and Logging
# Enable detailed logging
spiderfoot -l 127.0.0.1:5001 -d
# Monitor scan progress
tail -f ~/.spiderfoot/spiderfoot.log
# Check system resources
ps aux | grep spiderfoot
netstat -tulpn | grep 5001
# Database optimization
sqlite3 ~/.spiderfoot/spiderfoot.db "VACUUM;"
sqlite3 ~/.spiderfoot/spiderfoot.db "ANALYZE;"
# Clean old scan data
spiderfoot -D # Delete all scan data
# Or delete specific scans via web interface
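The maintenance commands above can be scripted for periodic runs. A sketch equivalent to the sqlite3 VACUUM/ANALYZE calls, assuming the default database path:
# Sketch: periodic database maintenance.
import os
import sqlite3

db = os.path.expanduser('~/.spiderfoot/spiderfoot.db')
conn = sqlite3.connect(db)
conn.execute('VACUUM')
conn.execute('ANALYZE')
conn.close()
print('Database maintenance complete:', db)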
Common Issues and Solutions
# Issue: API rate limiting
# Solution: Configure delays and use API keys
echo "api_delay = 2" >> ~/.spiderfoot/spiderfoot.conf
# Issue: Memory usage too high
# Solution: Limit concurrent modules and use passive scanning
spiderfoot -s target.com -t DOMAIN -T 5 -p
# Issue: Slow database performance
# Solution: Use WAL mode and optimize database
sqlite3 ~/.spiderfoot/spiderfoot.db "PRAGMA journal_mode=WAL;"
sqlite3 ~/.spiderfoot/spiderfoot.db "PRAGMA synchronous=NORMAL;"
# Issue: Module errors
# Solution: Check module configuration and API keys
spiderfoot -M sfp_virustotal # Check specific module
grep "ERROR" ~/.spiderfoot/spiderfoot.log | tail -20
# Issue: Web interface not accessible
# Solution: Check binding and firewall
netstat -tulpn | grep 5001
sudo ufw allow 5001/tcp # If using UFW firewall
Custom Deployment and Scaling
# Docker deployment with custom configuration
docker run -d \
--name spiderfoot \
-p 5001:5001 \
-v /path/to/config:/home/spiderfoot/.spiderfoot \
-v /path/to/data:/home/spiderfoot/data \
-e SF_THREADS=20 \
-e SF_DELAY=1 \
spiderfoot/spiderfoot
# Docker Compose for production deployment
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
spiderfoot:
image: spiderfoot/spiderfoot:latest
ports:
- "5001:5001"
volumes:
- ./config:/home/spiderfoot/.spiderfoot
- ./data:/home/spiderfoot/data
environment:
- SF_THREADS=20
- SF_DELAY=1
- SF_TIMEOUT=30
restart: unless-stopped
nginx:
image: nginx:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf
- ./ssl:/etc/nginx/ssl
depends_on:
- spiderfoot
restart: unless-stopped
EOF
# Kubernetes deployment
cat > spiderfoot-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: spiderfoot
spec:
replicas: 1
selector:
matchLabels:
app: spiderfoot
template:
metadata:
labels:
app: spiderfoot
spec:
containers:
- name: spiderfoot
image: spiderfoot/spiderfoot:latest
ports:
- containerPort: 5001
env:
- name: SF_THREADS
value: "20"
- name: SF_DELAY
value: "1"
volumeMounts:
- name: config
mountPath: /home/spiderfoot/.spiderfoot
- name: data
mountPath: /home/spiderfoot/data
volumes:
- name: config
configMap:
name: spiderfoot-config
- name: data
persistentVolumeClaim:
claimName: spiderfoot-data
---
apiVersion: v1
kind: Service
metadata:
name: spiderfoot-service
spec:
selector:
app: spiderfoot
ports:
- port: 80
targetPort: 5001
type: LoadBalancer
EOF
kubectl apply -f spiderfoot-deployment.yaml
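After deployment, a simple reachability check confirms the interface is up. A sketch (adjust the URL to your Service or Ingress address):
# Sketch: basic reachability check for a deployed instance.
import requests

try:
    r = requests.get('http://localhost:5001/', timeout=5)
    print('SpiderFoot reachable:', r.status_code == 200)
except requests.RequestException as exc:
    print('SpiderFoot unreachable:', exc)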