Wappalyzer Cheat Sheet
Overview
Wappalyzer is a technology profiler that identifies the technologies used on websites. It detects content management systems, e-commerce platforms, web frameworks, server software, analytics tools, and many other technologies. Available as a browser extension, CLI tool, and API, Wappalyzer is a staple for reconnaissance, competitive analysis, and security assessments.
💡 Key Features: Technology detection, browser extension, CLI tool, API access, bulk analysis, detailed reporting, and integration with security workflows.
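Detection relies on fingerprint matching: patterns for HTTP headers, HTML content, script URLs, cookies, and JavaScript globals are compared against each page. A minimal Python sketch of the idea, using two hypothetical fingerprints (not Wappalyzer's actual database or engine):

```python
import re

# Hypothetical fingerprints, loosely in the spirit of technologies.json entries
FINGERPRINTS = {
    "WordPress": {"html": r'<meta name="generator" content="WordPress'},
    "Nginx": {"headers": {"Server": r"nginx"}},
}

def detect(headers, html):
    """Return names of technologies whose header/HTML patterns match."""
    found = []
    for name, fp in FINGERPRINTS.items():
        if "html" in fp and re.search(fp["html"], html, re.I):
            found.append(name)
            continue
        for header, pattern in fp.get("headers", {}).items():
            if re.search(pattern, headers.get(header, ""), re.I):
                found.append(name)
                break
    return found

print(detect({"Server": "nginx/1.25.3"},
             '<meta name="generator" content="WordPress 6.4" />'))
# → ['WordPress', 'Nginx']
```

The real engine additionally extracts version capture groups and confidence scores from each pattern.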
Installation and Setup
Browser Extension Installation
# Chrome/Chromium
# Visit: https://chrome.google.com/webstore/detail/wappalyzer/gppongmhjkpfnbhagpmjfkannfbllamg
# Click "Add to Chrome"
# Firefox
# Visit: https://addons.mozilla.org/en-US/firefox/addon/wappalyzer/
# Click "Add to Firefox"
# Edge
# Visit: https://microsoftedge.microsoft.com/addons/detail/wappalyzer/mnbndgmknlpdjdnjfmfcdjoegcckoikn
# Click "Get"
# Safari
# Visit: https://apps.apple.com/app/wappalyzer/id1520333300
# Install from App Store
# Manual installation for development
git clone https://github.com/wappalyzer/wappalyzer.git
cd wappalyzer
npm install
npm run build
# Load unpacked extension from src/drivers/webextension/
CLI Tool Installation
# Install via npm (Node.js required)
npm install -g wappalyzer
# Verify installation
wappalyzer --version
# Install specific version
npm install -g wappalyzer@6.10.66
# Install locally in project
npm install wappalyzer
npx wappalyzer --version
# Update to latest version
npm update -g wappalyzer
# Uninstall
npm uninstall -g wappalyzer
Docker Installation
# Pull official Docker image
docker pull wappalyzer/cli
# Run Wappalyzer in Docker
docker run --rm wappalyzer/cli https://example.com
# Save output to a file (results go to stdout; redirect on the host)
docker run --rm wappalyzer/cli https://example.com --pretty > results.json
# Create alias for easier usage
echo 'alias wappalyzer="docker run --rm wappalyzer/cli"' >> ~/.bashrc
source ~/.bashrc
# Build custom Docker image
cat > Dockerfile << 'EOF'
FROM node:16-alpine
# The CLI drives headless Chromium via Puppeteer; install a system Chromium
# and point Puppeteer at it (the bundled Chromium download fails on Alpine)
RUN apk add --no-cache chromium
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV CHROMIUM_BIN=/usr/bin/chromium-browser
RUN npm install -g wappalyzer
WORKDIR /app
ENTRYPOINT ["wappalyzer"]
EOF
docker build -t custom-wappalyzer .
API Setup
# Sign up for API access at https://www.wappalyzer.com/api/
# Get API key from dashboard
# Set environment variable
export WAPPALYZER_API_KEY="your_api_key_here"
# Test API access
curl -H "x-api-key: $WAPPALYZER_API_KEY" \
"https://api.wappalyzer.com/v2/lookup/?urls=https://example.com"
# Create a configuration file for your own scripts (a convention; the official CLI does not read this file)
cat > ~/.wappalyzer-config.json << 'EOF'
{
"api_key": "your_api_key_here",
"api_url": "https://api.wappalyzer.com/v2/",
"timeout": 30,
"max_retries": 3,
"rate_limit": 100
}
EOF
# Set configuration path
export WAPPALYZER_CONFIG=~/.wappalyzer-config.json
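The lookup endpoint above can also be driven from Python. A small stdlib-only sketch that builds the request (endpoint and x-api-key header match the curl example; the key is a placeholder):

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_lookup_request(api_key, urls):
    """Build (but do not send) a Wappalyzer v2 lookup request."""
    query = urlencode({"urls": ",".join(urls)})
    return Request(
        f"https://api.wappalyzer.com/v2/lookup/?{query}",
        headers={"x-api-key": api_key},
    )

req = build_lookup_request("demo-key", ["https://example.com"])
print(req.full_url)
# → https://api.wappalyzer.com/v2/lookup/?urls=https%3A%2F%2Fexample.com
```

Send it with `urllib.request.urlopen(req)` (or swap in `requests`); mind the plan's rate limits when looking up many URLs.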
Development Setup
# Clone repository for development
git clone https://github.com/wappalyzer/wappalyzer.git
cd wappalyzer
# Install dependencies
npm install
# Build the project
npm run build
# Run tests
npm test
# Start development server
npm run dev
# Create custom technology definitions
mkdir -p custom-technologies
cat > custom-technologies/custom.json << 'EOF'
{
"Custom Framework": {
"cats": [18],
"description": "Custom web framework",
"icon": "custom.png",
"website": "https://custom-framework.com",
"headers": {
"X-Powered-By": "Custom Framework"
},
"html": "<meta name=\"generator\" content=\"Custom Framework",
"js": {
"CustomFramework": ""
},
"implies": "PHP"
}
}
EOF
# Validate custom technology definitions
npm run validate -- custom-technologies/custom.json
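Custom definitions are easy to sanity-check before loading them. A simplified validator sketch (the upstream schema enforces far more; the required/optional key sets here are assumptions based on the fields used above):

```python
import json

REQUIRED = {"cats"}
OPTIONAL = {"description", "icon", "website", "headers", "html", "js", "implies"}

def validate_definitions(text):
    """Check each technology entry has 'cats' and only known keys."""
    errors = []
    for name, entry in json.loads(text).items():
        missing = REQUIRED - entry.keys()
        unknown = entry.keys() - REQUIRED - OPTIONAL
        if missing:
            errors.append(f"{name}: missing {sorted(missing)}")
        if unknown:
            errors.append(f"{name}: unknown keys {sorted(unknown)}")
    return errors

sample = '{"Custom Framework": {"cats": [18], "website": "https://custom-framework.com"}}'
print(validate_definitions(sample))  # [] means the entry passed
```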
Basic Usage and Commands
CLI Basic Commands
# Analyze a single website (JSON results go to stdout)
wappalyzer https://example.com
# Pretty-print the JSON output
wappalyzer https://example.com --pretty
# Save results to file (there is no --output flag; redirect stdout)
wappalyzer https://example.com --pretty > results.json
# Analyze multiple URLs (the CLI takes one URL per run; loop over a list)
echo -e "https://example.com\nhttps://test.com" > urls.txt
while read -r url; do
wappalyzer "$url" > "$(echo "$url" | tr -c 'a-zA-Z0-9\n' '_').json"
done < urls.txt
# Set custom user agent
wappalyzer https://example.com --user-agent "Custom Agent 1.0"
# Wait no more than 30 seconds for page resources to load
wappalyzer https://example.com --max-wait 30000
# Redirects are followed by default; disable with:
wappalyzer https://example.com --no-redirect
# Disable JavaScript processing (faster, less thorough)
wappalyzer https://example.com --no-scripts
Advanced CLI Options
# Analyze with custom headers
wappalyzer https://example.com --header "Authorization: Bearer token123"
# Crawl the site, limiting the number of analyzed pages
wappalyzer https://example.com --recursive --max-urls 10
# Limit crawl depth
wappalyzer https://example.com --recursive --max-depth 3
# Analyze with proxy
wappalyzer https://example.com --proxy http://127.0.0.1:8080
# Set delay between requests (milliseconds)
wappalyzer https://example.com --delay 1000
# Analyze with authentication cookies (sent as a header)
wappalyzer https://example.com --header "Cookie: session=abc123; auth=xyz789"
# Convert the JSON output with jq (no built-in CSV/XML export)
wappalyzer https://example.com | jq -r '.technologies[] | [.name, .version // ""] | @csv' > results.csv
# Debug output for troubleshooting
wappalyzer https://example.com --debug
# Filter specific categories with jq (no --categories flag)
wappalyzer https://example.com | jq '[.technologies[] | select(any(.categories[]; .name == "CMS"))]'
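Since the CLI emits JSON, category filtering is straightforward to do downstream in a script as well. A sketch assuming the v6 output shape (a top-level `technologies` array whose `categories` entries are objects with `name` fields):

```python
import json

def filter_by_category(report_text, wanted):
    """Keep technology names whose category names intersect `wanted`."""
    report = json.loads(report_text)
    wanted = set(wanted)
    return [
        t["name"]
        for t in report.get("technologies", [])
        if wanted & {c["name"] for c in t.get("categories", [])}
    ]

sample = json.dumps({"technologies": [
    {"name": "WordPress", "categories": [{"id": 1, "name": "CMS"}]},
    {"name": "jQuery", "categories": [{"id": 59, "name": "JavaScript libraries"}]},
]})
print(filter_by_category(sample, ["CMS", "Web frameworks"]))  # → ['WordPress']
```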
Bulk Analysis
# Analyze multiple domains from file
cat > domains.txt << 'EOF'
example.com
test.com
demo.com
sample.com
EOF
# Basic bulk analysis (one CLI invocation per URL)
while read -r domain; do
wappalyzer "https://$domain" > "results_${domain}.json"
done < domains.txt
# Parallel bulk analysis with xargs
xargs -P 10 -I{} sh -c 'wappalyzer "https://{}" > "results_{}.json"' < domains.txt
# Bulk analysis with rate limiting
while read -r domain; do
wappalyzer "https://$domain" > "results_${domain}.json"
sleep 2
done < domains.txt
# Analyze subdomains (prepend a scheme; the CLI expects full URLs)
subfinder -d example.com -silent | head -100 | sed 's|^|https://|' > subdomains.txt
while read -r url; do wappalyzer "$url" > "$(echo "$url" | tr -c 'a-zA-Z0-9\n' '_').json"; done < subdomains.txt
# Combine with other tools (httpx already emits full URLs)
echo "example.com" | subfinder -silent | httpx -silent | head -50 > live_urls.txt
while read -r url; do wappalyzer "$url" > "$(echo "$url" | tr -c 'a-zA-Z0-9\n' '_').json"; done < live_urls.txt
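Bulk input files often mix bare domains, full URLs, comments, and duplicates. A small helper that normalizes them into unique https:// targets before feeding the scanner:

```python
from urllib.parse import urlparse

def normalize_targets(lines):
    """Turn a mixed list of domains/URLs into unique https:// URLs,
    preserving input order so results map back to the source file."""
    seen, out = set(), []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        url = line if line.startswith(("http://", "https://")) else f"https://{line}"
        host = urlparse(url).netloc.lower()
        if host and host not in seen:
            seen.add(host)
            out.append(url)
    return out

print(normalize_targets(["example.com", "https://Example.com/", "# comment", "test.com"]))
# → ['https://example.com', 'https://test.com']
```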
Advanced Technology Detection
Custom Technology Detection
#!/usr/bin/env python3
# Advanced Wappalyzer automation and custom detection
import json
import subprocess
import requests
import threading
import time
import re
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlparse, urljoin
import os
class WappalyzerAnalyzer:
def __init__(self, api_key=None, max_workers=10):
self.api_key = api_key
self.max_workers = max_workers
self.results = []
self.lock = threading.Lock()
self.api_url = "https://api.wappalyzer.com/v2/"
def analyze_url_cli(self, url, options=None):
"""Analyze URL using Wappalyzer CLI"""
if options is None:
options = {}
try:
# Build command using real CLI flags (--max-wait, --user-agent,
# --header, --proxy, --delay, --recursive/--max-urls, --no-redirect)
cmd = ['wappalyzer', url]
if options.get('timeout'):
cmd.extend(['--max-wait', str(options['timeout'])])
if options.get('user_agent'):
cmd.extend(['--user-agent', options['user_agent']])
if options.get('headers'):
for header in options['headers']:
cmd.extend(['--header', header])
if options.get('proxy'):
cmd.extend(['--proxy', options['proxy']])
if options.get('delay'):
cmd.extend(['--delay', str(options['delay'])])
if options.get('max_pages'):
cmd.extend(['--recursive', '--max-urls', str(options['max_pages'])])
# Redirects are followed by default; 'follow_redirect' needs no flag
if options.get('no_redirect'):
cmd.append('--no-redirect')
# Run Wappalyzer (--max-wait is milliseconds; subprocess timeout is seconds)
result = subprocess.run(
cmd,
capture_output=True, text=True,
timeout=options.get('timeout', 30000) / 1000 + 30
)
if result.returncode == 0:
try:
# CLI output is an object with "urls" and "technologies" keys
technologies = json.loads(result.stdout).get('technologies', [])
return {
'url': url,
'success': True,
'technologies': technologies,
'error': None
}
except json.JSONDecodeError:
return {
'url': url,
'success': False,
'technologies': [],
'error': 'Invalid JSON response'
}
else:
return {
'url': url,
'success': False,
'technologies': [],
'error': result.stderr
}
except subprocess.TimeoutExpired:
return {
'url': url,
'success': False,
'technologies': [],
'error': 'CLI timeout'
}
except Exception as e:
return {
'url': url,
'success': False,
'technologies': [],
'error': str(e)
}
def analyze_url_api(self, url):
"""Analyze URL using Wappalyzer API"""
if not self.api_key:
return {
'url': url,
'success': False,
'technologies': [],
'error': 'API key not provided'
}
try:
headers = {
'x-api-key': self.api_key,
'Content-Type': 'application/json'
}
response = requests.get(
f"{self.api_url}lookup/",
params={'urls': url},
headers=headers,
timeout=30
)
if response.status_code == 200:
# The v2 lookup API returns a list of {url, technologies} objects
data = response.json()
technologies = []
if isinstance(data, list) and data:
technologies = data[0].get('technologies', [])
return {
'url': url,
'success': True,
'technologies': technologies,
'error': None
}
else:
return {
'url': url,
'success': False,
'technologies': [],
'error': f'API error: {response.status_code}'
}
except Exception as e:
return {
'url': url,
'success': False,
'technologies': [],
'error': str(e)
}
def custom_technology_detection(self, url):
"""Perform custom technology detection"""
custom_detections = []
try:
# Fetch page content
response = requests.get(url, timeout=30, verify=False)
content = response.text
headers = response.headers
# Custom detection rules
detections = {
'Custom Framework': {
'patterns': [
r'<meta name="generator" content="Custom Framework',
r'X-Powered-By.*Custom Framework'
],
'category': 'Web frameworks'
},
'Internal Tool': {
'patterns': [
r'<!-- Internal Tool v\d+\.\d+ -->',
r'internal-tool\.js',
r'data-internal-version'
],
'category': 'Development tools'
},
'Security Headers': {
'patterns': [
r'Content-Security-Policy',
r'X-Frame-Options',
r'X-XSS-Protection'
],
'category': 'Security'
},
'Analytics Platform': {
'patterns': [
r'analytics\.custom\.com',
r'customAnalytics\(',
r'data-analytics-id'
],
'category': 'Analytics'
},
'CDN Detection': {
'patterns': [
r'cdn\.custom\.com',
r'X-Cache.*HIT',
r'X-CDN-Provider'
],
'category': 'CDN'
}
}
# Check patterns in content and headers
for tech_name, tech_info in detections.items():
detected = False
for pattern in tech_info['patterns']:
# Check in content
if re.search(pattern, content, re.IGNORECASE):
detected = True
break
# Check in headers
for header_name, header_value in headers.items():
if re.search(pattern, f"{header_name}: {header_value}", re.IGNORECASE):
detected = True
break
# Stop checking remaining patterns once detected
if detected:
break
if detected:
custom_detections.append({
'name': tech_name,
'category': tech_info['category'],
'confidence': 'high',
'version': None
})
return {
'url': url,
'success': True,
'custom_technologies': custom_detections,
'error': None
}
except Exception as e:
return {
'url': url,
'success': False,
'custom_technologies': [],
'error': str(e)
}
def comprehensive_analysis(self, url, use_api=False, custom_detection=True):
"""Perform comprehensive technology analysis"""
print(f"Analyzing: {url}")
results = {
'url': url,
'timestamp': time.time(),
'wappalyzer_cli': None,
'wappalyzer_api': None,
'custom_detection': None,
'combined_technologies': [],
'technology_categories': {},
'security_technologies': [],
'risk_assessment': {}
}
# CLI analysis
cli_result = self.analyze_url_cli(url, {
'timeout': 30000,
'follow_redirect': True,
'max_pages': 5
})
results['wappalyzer_cli'] = cli_result
# API analysis (if enabled)
if use_api and self.api_key:
api_result = self.analyze_url_api(url)
results['wappalyzer_api'] = api_result
# Custom detection (if enabled)
if custom_detection:
custom_result = self.custom_technology_detection(url)
results['custom_detection'] = custom_result
# Combine and analyze results
all_technologies = []
# Add CLI technologies (categories are objects; keep just the names)
if cli_result['success'] and cli_result['technologies']:
for tech in cli_result['technologies']:
cats = [c.get('name', '') if isinstance(c, dict) else c for c in tech.get('categories', [])]
all_technologies.append({
'name': tech.get('name', 'Unknown'),
'category': cats,
'version': tech.get('version'),
'confidence': tech.get('confidence', 100),
'source': 'wappalyzer_cli'
})
# Add API technologies
if use_api and results['wappalyzer_api'] and results['wappalyzer_api']['success']:
for tech in results['wappalyzer_api']['technologies']:
cats = [c.get('name', '') if isinstance(c, dict) else c for c in tech.get('categories', [])]
all_technologies.append({
'name': tech.get('name', 'Unknown'),
'category': cats,
'version': tech.get('version'),
'confidence': tech.get('confidence', 100),
'source': 'wappalyzer_api'
})
# Add custom technologies
if custom_detection and results['custom_detection'] and results['custom_detection']['success']:
for tech in results['custom_detection']['custom_technologies']:
all_technologies.append({
'name': tech['name'],
'category': [tech['category']],
'version': tech.get('version'),
'confidence': 90, # High confidence for custom detection
'source': 'custom_detection'
})
# Remove duplicates and categorize
unique_technologies = {}
for tech in all_technologies:
tech_name = tech['name']
if tech_name not in unique_technologies:
unique_technologies[tech_name] = tech
else:
# Merge information from multiple sources
existing = unique_technologies[tech_name]
if tech['confidence'] > existing['confidence']:
unique_technologies[tech_name] = tech
results['combined_technologies'] = list(unique_technologies.values())
# Categorize technologies
categories = {}
security_techs = []
for tech in results['combined_technologies']:
tech_categories = tech.get('category', [])
if isinstance(tech_categories, str):
tech_categories = [tech_categories]
for category in tech_categories:
if category not in categories:
categories[category] = []
categories[category].append(tech['name'])
# Identify security-related technologies
if any(sec_keyword in category.lower() for sec_keyword in ['security', 'firewall', 'protection', 'ssl', 'certificate']):
security_techs.append(tech)
results['technology_categories'] = categories
results['security_technologies'] = security_techs
# Risk assessment
risk_factors = []
# Check for outdated technologies
for tech in results['combined_technologies']:
if tech.get('version'):
# This would require a database of known vulnerabilities;
# for now, flag old-looking major versions (crude heuristic)
version = tech['version']
if any(version.startswith(p) for p in ('1.', '2.', '3.', '4.', '5.')):
risk_factors.append(f"Potentially outdated {tech['name']} version {version}")
# Check for missing security headers
if not security_techs:
risk_factors.append("No security technologies detected")
# Check for development/debug technologies in production
dev_categories = ['Development tools', 'Debugging', 'Testing']
for category in dev_categories:
if category in categories:
risk_factors.append(f"Development tools detected in production: {', '.join(categories[category])}")
results['risk_assessment'] = {
'risk_level': 'low' if len(risk_factors) == 0 else 'medium' if len(risk_factors) <= 2 else 'high',
'risk_factors': risk_factors,
'recommendations': self.generate_recommendations(results)
}
with self.lock:
self.results.append(results)
return results
def generate_recommendations(self, analysis_result):
"""Generate security and optimization recommendations"""
recommendations = []
technologies = analysis_result['combined_technologies']
categories = analysis_result['technology_categories']
# Security recommendations
if 'Security' not in categories:
recommendations.append("Consider implementing security headers (CSP, HSTS, X-Frame-Options)")
if 'SSL/TLS' not in categories:
recommendations.append("Ensure HTTPS is properly configured with valid SSL/TLS certificates")
if 'Web application firewall' not in categories:
recommendations.append("Consider implementing a Web Application Firewall (WAF)")
# Performance recommendations
if 'CDN' not in categories:
recommendations.append("Consider using a Content Delivery Network (CDN) for better performance")
if 'Caching' not in categories:
recommendations.append("Implement caching mechanisms to improve performance")
# Technology-specific recommendations
cms_technologies = categories.get('CMS', [])
if cms_technologies:
recommendations.append(f"Keep {', '.join(cms_technologies)} updated to the latest version")
framework_technologies = categories.get('Web frameworks', [])
if framework_technologies:
recommendations.append(f"Ensure {', '.join(framework_technologies)} are updated and properly configured")
# Analytics and privacy
analytics_technologies = categories.get('Analytics', [])
if analytics_technologies:
recommendations.append("Ensure analytics tools comply with privacy regulations (GDPR, CCPA)")
return recommendations
def bulk_analysis(self, urls, use_api=False, custom_detection=True):
"""Perform bulk technology analysis"""
print(f"Starting bulk analysis of {len(urls)} URLs")
print(f"Max workers: {self.max_workers}")
with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
# Submit all tasks
future_to_url = {
executor.submit(self.comprehensive_analysis, url, use_api, custom_detection): url
for url in urls
}
# Process completed tasks
for future in as_completed(future_to_url):
url = future_to_url[future]
try:
result = future.result()
tech_count = len(result['combined_technologies'])
risk_level = result['risk_assessment']['risk_level']
print(f"✓ {url}: {tech_count} technologies, risk: {risk_level}")
except Exception as e:
print(f"✗ Error analyzing {url}: {e}")
return self.results
def generate_report(self, output_file='wappalyzer_analysis_report.json'):
"""Generate comprehensive analysis report"""
# Calculate statistics
total_urls = len(self.results)
successful_analyses = sum(1 for r in self.results if r['wappalyzer_cli']['success'])
# Technology statistics
all_technologies = {}
all_categories = {}
risk_levels = {'low': 0, 'medium': 0, 'high': 0}
for result in self.results:
# Count technologies
for tech in result['combined_technologies']:
tech_name = tech['name']
if tech_name not in all_technologies:
all_technologies[tech_name] = 0
all_technologies[tech_name] += 1
# Count categories
for category, techs in result['technology_categories'].items():
if category not in all_categories:
all_categories[category] = 0
all_categories[category] += len(techs)
# Count risk levels
risk_level = result['risk_assessment']['risk_level']
if risk_level in risk_levels:
risk_levels[risk_level] += 1
# Sort by popularity
popular_technologies = sorted(all_technologies.items(), key=lambda x: x[1], reverse=True)[:20]
popular_categories = sorted(all_categories.items(), key=lambda x: x[1], reverse=True)[:10]
report = {
'scan_summary': {
'total_urls': total_urls,
'successful_analyses': successful_analyses,
'success_rate': (successful_analyses / total_urls * 100) if total_urls > 0 else 0,
'scan_date': time.strftime('%Y-%m-%d %H:%M:%S')
},
'technology_statistics': {
'total_unique_technologies': len(all_technologies),
'popular_technologies': popular_technologies,
'popular_categories': popular_categories
},
'risk_assessment': {
'risk_distribution': risk_levels,
'high_risk_urls': [
r['url'] for r in self.results
if r['risk_assessment']['risk_level'] == 'high'
]
},
'detailed_results': self.results
}
# Save report
with open(output_file, 'w') as f:
json.dump(report, f, indent=2)
print(f"\nWappalyzer Analysis Report:")
print(f"Total URLs analyzed: {total_urls}")
print(f"Successful analyses: {successful_analyses}")
print(f"Success rate: {report['scan_summary']['success_rate']:.1f}%")
print(f"Unique technologies found: {len(all_technologies)}")
print(f"High risk URLs: {risk_levels['high']}")
print(f"Report saved to: {output_file}")
return report
# Usage example
if __name__ == "__main__":
# Create analyzer instance
analyzer = WappalyzerAnalyzer(
api_key=os.getenv('WAPPALYZER_API_KEY'),
max_workers=5
)
# URLs to analyze
urls = [
'https://example.com',
'https://test.com',
'https://demo.com'
]
# Perform bulk analysis
results = analyzer.bulk_analysis(
urls,
use_api=False, # Set to True if you have API key
custom_detection=True
)
# Generate report
report = analyzer.generate_report('comprehensive_wappalyzer_report.json')
Automation and Integration
CI/CD Integration
#!/bin/bash
# CI/CD script for technology stack analysis
set -e
TARGET_URL="$1"
OUTPUT_DIR="$2"
BASELINE_FILE="$3"
if [ -z "$TARGET_URL" ] || [ -z "$OUTPUT_DIR" ]; then
echo "Usage: $0 <target_url> <output_dir> [baseline_file]"
exit 1
fi
echo "Starting technology stack analysis..."
echo "Target: $TARGET_URL"
echo "Output directory: $OUTPUT_DIR"
mkdir -p "$OUTPUT_DIR"
# Run Wappalyzer analysis
echo "Running Wappalyzer analysis..."
wappalyzer "$TARGET_URL" \
--max-wait 30000 \
--recursive \
--max-urls 10 \
--pretty > "$OUTPUT_DIR/wappalyzer_results.json"
# Parse results (CLI output has a top-level "technologies" array)
TECH_COUNT=$(jq '.technologies | length' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null || echo "0")
echo "Found $TECH_COUNT technologies"
# Extract technology categories
jq -r '.technologies[].categories[].name' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null | sort -u > "$OUTPUT_DIR/categories.txt" || touch "$OUTPUT_DIR/categories.txt"
CATEGORY_COUNT=$(wc -l < "$OUTPUT_DIR/categories.txt")
# Extract security-related technologies
jq -r '.technologies[] | select(any(.categories[]; .name | test("Security|SSL|Certificate"))) | .name' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null > "$OUTPUT_DIR/security_technologies.txt" || touch "$OUTPUT_DIR/security_technologies.txt"
SECURITY_COUNT=$(wc -l < "$OUTPUT_DIR/security_technologies.txt")
# Check for development/debug technologies
jq -r '.technologies[] | select(any(.categories[]; .name | test("Development|Debug|Testing"))) | .name' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null > "$OUTPUT_DIR/dev_technologies.txt" || touch "$OUTPUT_DIR/dev_technologies.txt"
DEV_COUNT=$(wc -l < "$OUTPUT_DIR/dev_technologies.txt")
# Generate summary report
cat > "$OUTPUT_DIR/technology-summary.txt" << EOF
Technology Stack Analysis Summary
================================
Date: $(date)
Target: $TARGET_URL
Total Technologies: $TECH_COUNT
Categories: $CATEGORY_COUNT
Security Technologies: $SECURITY_COUNT
Development Technologies: $DEV_COUNT
Status: $(if [ "$DEV_COUNT" -gt "0" ]; then echo "WARNING - Development tools detected"; else echo "OK"; fi)
EOF
# Compare with baseline if provided
if [ -n "$BASELINE_FILE" ] && [ -f "$BASELINE_FILE" ]; then
echo "Comparing with baseline..."
# Extract current technology names
jq -r '.technologies[].name' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null | sort > "$OUTPUT_DIR/current_technologies.txt" || touch "$OUTPUT_DIR/current_technologies.txt"
# Extract baseline technology names
jq -r '.technologies[].name' "$BASELINE_FILE" 2>/dev/null | sort > "$OUTPUT_DIR/baseline_technologies.txt" || touch "$OUTPUT_DIR/baseline_technologies.txt"
# Find differences
comm -23 "$OUTPUT_DIR/current_technologies.txt" "$OUTPUT_DIR/baseline_technologies.txt" > "$OUTPUT_DIR/new_technologies.txt"
comm -13 "$OUTPUT_DIR/current_technologies.txt" "$OUTPUT_DIR/baseline_technologies.txt" > "$OUTPUT_DIR/removed_technologies.txt"
NEW_COUNT=$(wc -l < "$OUTPUT_DIR/new_technologies.txt")
REMOVED_COUNT=$(wc -l < "$OUTPUT_DIR/removed_technologies.txt")
echo "New technologies: $NEW_COUNT"
echo "Removed technologies: $REMOVED_COUNT"
# Add to summary
cat >> "$OUTPUT_DIR/technology-summary.txt" << EOF
Baseline Comparison:
New Technologies: $NEW_COUNT
Removed Technologies: $REMOVED_COUNT
EOF
if [ "$NEW_COUNT" -gt "0" ]; then
echo "New technologies detected:"
cat "$OUTPUT_DIR/new_technologies.txt"
fi
fi
# Generate detailed report (pass the output dir and target URL as argv)
python3 - "$OUTPUT_DIR" "$TARGET_URL" << 'PYTHON_EOF'
import sys
import json
from datetime import datetime
output_dir = sys.argv[1]
target_url = sys.argv[2]
# Read Wappalyzer results (top-level object with a "technologies" array)
try:
with open(f"{output_dir}/wappalyzer_results.json", 'r') as f:
technologies = json.load(f).get('technologies', [])
except (FileNotFoundError, json.JSONDecodeError):
technologies = []
# Categorize technologies
categories = {}
security_techs = []
dev_techs = []
outdated_techs = []
for tech in technologies:
tech_name = tech.get('name', 'Unknown')
# CLI categories are objects; keep just the names
tech_categories = [c.get('name', '') if isinstance(c, dict) else c for c in tech.get('categories', [])]
tech_version = tech.get('version') or ''
# Categorize
for category in tech_categories:
if category not in categories:
categories[category] = []
categories[category].append(tech_name)
# Security technologies
if any(sec_keyword in category.lower() for sec_keyword in ['security', 'ssl', 'certificate', 'firewall']):
security_techs.append(tech)
# Development technologies
if any(dev_keyword in category.lower() for dev_keyword in ['development', 'debug', 'testing']):
dev_techs.append(tech)
# Check for potentially outdated versions (crude major-version heuristic)
if tech_version and any(tech_version.startswith(p) for p in ('1.', '2.', '3.', '4.', '5.')):
outdated_techs.append(tech)
# Risk assessment
risk_factors = []
if len(dev_techs) > 0:
risk_factors.append(f"Development tools detected: {', '.join([t['name'] for t in dev_techs])}")
if len(security_techs) == 0:
risk_factors.append("No security technologies detected")
if len(outdated_techs) > 0:
risk_factors.append(f"Potentially outdated technologies: {', '.join([t['name'] for t in outdated_techs])}")
risk_level = 'low' if len(risk_factors) == 0 else 'medium' if len(risk_factors) <= 2 else 'high'
# Create detailed report
report = {
'scan_info': {
'target': target_url,
'scan_date': datetime.now().isoformat(),
'technology_count': len(technologies),
'category_count': len(categories)
},
'technologies': technologies,
'categories': categories,
'security_assessment': {
'security_technologies': security_techs,
'development_technologies': dev_techs,
'outdated_technologies': outdated_techs,
'risk_level': risk_level,
'risk_factors': risk_factors
}
}
# Save detailed report
with open(f"{output_dir}/wappalyzer-detailed-report.json", 'w') as f:
json.dump(report, f, indent=2)
# Generate HTML report
html_content = f"""
<!DOCTYPE html>
<html>
<head>
<title>Technology Stack Analysis Report</title>
<style>
body {{ font-family: Arial, sans-serif; margin: 20px; }}
.header {{ background-color: #f0f0f0; padding: 20px; border-radius: 5px; }}
.category {{ margin: 10px 0; padding: 15px; border-left: 4px solid #007bff; background-color: #f8f9fa; }}
.risk-high {{ border-left-color: #dc3545; }}
.risk-medium {{ border-left-color: #ffc107; }}
.risk-low {{ border-left-color: #28a745; }}
table {{ border-collapse: collapse; width: 100%; margin: 20px 0; }}
th, td {{ border: 1px solid #ddd; padding: 8px; text-align: left; }}
th {{ background-color: #f2f2f2; }}
</style>
</head>
<body>
<div class="header">
<h1>Technology Stack Analysis Report</h1>
<p><strong>Target:</strong> {target_url}</p>
<p><strong>Scan Date:</strong> {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
<p><strong>Technologies Found:</strong> {len(technologies)}</p>
<p><strong>Risk Level:</strong> <span class="risk-{risk_level}">{risk_level.upper()}</span></p>
</div>
<h2>Technology Categories</h2>
"""
for category, techs in categories.items():
html_content += f"""
<div class="category">
<h3>{category}</h3>
<p>{', '.join(techs)}</p>
</div>
"""
html_content += """
<h2>Detailed Technologies</h2>
<table>
<tr><th>Name</th><th>Version</th><th>Categories</th><th>Confidence</th></tr>
"""
for tech in technologies:
cat_names = [c.get('name', '') if isinstance(c, dict) else c for c in tech.get('categories', [])]
html_content += f"""
<tr>
<td>{tech.get('name', 'Unknown')}</td>
<td>{tech.get('version') or 'N/A'}</td>
<td>{', '.join(cat_names)}</td>
<td>{tech.get('confidence', 'N/A')}%</td>
</tr>
"""
html_content += """
</table>
<h2>Risk Assessment</h2>
"""
if risk_factors:
html_content += "<ul>"
for factor in risk_factors:
html_content += f"<li>{factor}</li>"
html_content += "</ul>"
else:
html_content += "<p>No significant risk factors identified.</p>"
html_content += """
</body>
</html>
"""
with open(f"{output_dir}/wappalyzer-report.html", 'w') as f:
f.write(html_content)
print(f"Detailed reports generated:")
print(f"- JSON: {output_dir}/wappalyzer-detailed-report.json")
print(f"- HTML: {output_dir}/wappalyzer-report.html")
PYTHON_EOF
# Check for development technologies and exit
if [ "$DEV_COUNT" -gt "0" ]; then
echo "WARNING: Development technologies detected in production environment"
echo "Development technologies found:"
cat "$OUTPUT_DIR/dev_technologies.txt"
exit 1
else
echo "SUCCESS: No development technologies detected"
exit 0
fi
GitHub Actions Integration
# .github/workflows/wappalyzer-tech-analysis.yml
name: Technology Stack Analysis
on:
push:
branches: [ main, develop ]
pull_request:
branches: [ main ]
schedule:
- cron: '0 6 * * 1' # Weekly scan on Mondays at 6 AM
jobs:
technology-analysis:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Node.js
uses: actions/setup-node@v3
with:
node-version: '16'
- name: Install Wappalyzer
run: |
npm install -g wappalyzer
wappalyzer --version
- name: Analyze production environment
run: |
mkdir -p analysis-results
# Analyze main application
wappalyzer ${{ vars.PRODUCTION_URL }} \
--max-wait 30000 \
--recursive \
--max-urls 10 \
--pretty > analysis-results/production-tech.json
# Analyze staging environment
wappalyzer ${{ vars.STAGING_URL }} \
--max-wait 30000 \
--recursive \
--max-urls 5 \
--pretty > analysis-results/staging-tech.json
# Count technologies
PROD_TECH_COUNT=$(jq '.technologies | length' analysis-results/production-tech.json 2>/dev/null || echo "0")
STAGING_TECH_COUNT=$(jq '.technologies | length' analysis-results/staging-tech.json 2>/dev/null || echo "0")
echo "PROD_TECH_COUNT=$PROD_TECH_COUNT" >> $GITHUB_ENV
echo "STAGING_TECH_COUNT=$STAGING_TECH_COUNT" >> $GITHUB_ENV
- name: Check for development technologies
run: |
# Check for development/debug technologies in production
DEV_TECHS=$(jq -r '.technologies[] | select(any(.categories[]; .name | test("Development|Debug|Testing"))) | .name' analysis-results/production-tech.json 2>/dev/null || echo "")
if [ -n "$DEV_TECHS" ]; then
echo "DEV_TECHS_FOUND=true" >> $GITHUB_ENV
echo "Development technologies found in production:"
echo "$DEV_TECHS"
echo "$DEV_TECHS" > analysis-results/dev-technologies.txt
else
echo "DEV_TECHS_FOUND=false" >> $GITHUB_ENV
touch analysis-results/dev-technologies.txt
fi
- name: Security technology assessment
run: |
# Check for security technologies
SECURITY_TECHS=$(jq -r '.technologies[] | select(any(.categories[]; .name | test("Security|SSL|Certificate"))) | .name' analysis-results/production-tech.json 2>/dev/null || echo "")
if [ -n "$SECURITY_TECHS" ]; then
echo "Security technologies found:"
echo "$SECURITY_TECHS"
echo "$SECURITY_TECHS" > analysis-results/security-technologies.txt
else
echo "No security technologies detected"
touch analysis-results/security-technologies.txt
fi
# Count non-empty lines (echo | wc -l reports 1 even when the variable is empty)
SECURITY_COUNT=$(grep -c . analysis-results/security-technologies.txt || true)
echo "SECURITY_COUNT=$SECURITY_COUNT" >> $GITHUB_ENV
- name: Generate comparison report
run: |
# Compare production and staging
jq -r '.technologies[].name' analysis-results/production-tech.json 2>/dev/null | sort > analysis-results/prod-techs.txt || touch analysis-results/prod-techs.txt
jq -r '.technologies[].name' analysis-results/staging-tech.json 2>/dev/null | sort > analysis-results/staging-techs.txt || touch analysis-results/staging-techs.txt
# Find differences
comm -23 analysis-results/prod-techs.txt analysis-results/staging-techs.txt > analysis-results/prod-only.txt
comm -13 analysis-results/prod-techs.txt analysis-results/staging-techs.txt > analysis-results/staging-only.txt
# Generate summary
cat > analysis-results/summary.txt << EOF
Technology Stack Analysis Summary
================================
Production Technologies: $PROD_TECH_COUNT
Staging Technologies: $STAGING_TECH_COUNT
Security Technologies: $SECURITY_COUNT
Development Technologies in Production: $(if [ "$DEV_TECHS_FOUND" = "true" ]; then echo "YES (CRITICAL)"; else echo "NO"; fi)
Production-only Technologies: $(wc -l < analysis-results/prod-only.txt)
Staging-only Technologies: $(wc -l < analysis-results/staging-only.txt)
EOF
- name: Upload analysis results
uses: actions/upload-artifact@v3
with:
name: technology-analysis-results
path: analysis-results/
- name: Comment PR with results
if: github.event_name == 'pull_request'
uses: actions/github-script@v6
with:
script: |
const fs = require('fs');
const summary = fs.readFileSync('analysis-results/summary.txt', 'utf8');
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `## Technology Stack Analysis\n\n\`\`\`\n${summary}\n\`\`\``
});
- name: Fail if development technologies found
run: |
if [ "$DEV_TECHS_FOUND" = "true" ]; then
echo "CRITICAL: Development technologies detected in production!"
cat analysis-results/dev-technologies.txt
exit 1
fi
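The same development-technology gate can be reproduced outside CI. Below is a minimal sketch; it assumes a Wappalyzer JSON report shaped as an array of objects with a `name` field, and the technology patterns are illustrative, not exhaustive:

```shell
#!/bin/bash
# Flag development-only technologies in a Wappalyzer JSON report.
# Returns 1 (failure) if any pattern matches, mirroring the CI gate above.
check_dev_techs() {
    local report="$1"
    if grep -qiE 'webpack[- ]dev[- ]server|browsersync|livereload' "$report"; then
        echo "CRITICAL: Development technologies detected in production!"
        grep -ioE 'webpack[- ]dev[- ]server|browsersync|livereload' "$report"
        return 1
    fi
    echo "No development technologies found"
    return 0
}
```

Run it as `check_dev_techs analysis-results/production-tech.json`; a plain grep keeps the check free of a hard jq dependency.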
Performance Optimization and Troubleshooting
Performance Tuning
# Optimize Wappalyzer for different scenarios
# Fast analysis with minimal pages
wappalyzer https://example.com --max-pages 1 --timeout 10000
# Thorough analysis with deep crawling
wappalyzer https://example.com --max-pages 20 --max-depth 5 --timeout 60000
# Bulk analysis with rate limiting
while IFS= read -r url; do
    safe_name=$(echo "$url" | sed 's/[^a-zA-Z0-9]/_/g')
    wappalyzer "$url" --output "results_${safe_name}.json"
    sleep 2
done < urls.txt
# Memory-efficient analysis for large sites
wappalyzer https://example.com --max-pages 5 --no-scripts --timeout 30000
#!/bin/bash
# Performance monitoring script
monitor_wappalyzer_performance() {
    local url="$1"
    local output_file="wappalyzer-performance-$(date +%s).log"
    echo "Monitoring Wappalyzer performance for: $url"
    # Run Wappalyzer in the background and capture its PID directly;
    # matching with 'pgrep -f wappalyzer' is unreliable (it can return
    # multiple PIDs, including this script's own)
    wappalyzer "$url" --pretty --output "performance_test_results.json" &
    local wappalyzer_pid=$!
    # Sample CPU and memory every 2 seconds while the process is alive
    {
        echo "Timestamp,CPU%,Memory(MB),Status"
        while kill -0 "$wappalyzer_pid" 2>/dev/null; do
            cpu=$(ps -p "$wappalyzer_pid" -o %cpu --no-headers)
            mem=$(ps -p "$wappalyzer_pid" -o rss --no-headers | awk '{print $1/1024}')
            echo "$(date +%s),$cpu,$mem,running"
            sleep 2
        done
    } > "$output_file" &
    local monitor_pid=$!
    # Wait for Wappalyzer to finish, then stop the sampler
    wait "$wappalyzer_pid"
    kill "$monitor_pid" 2>/dev/null
    echo "Performance monitoring completed: $output_file"
}
# Usage
monitor_wappalyzer_performance "https://example.com"
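The CSV that monitor_wappalyzer_performance writes can be reduced to a quick summary. The helper below is an addition for illustration, not part of the script above; it assumes the column layout shown (Timestamp,CPU%,Memory(MB),Status) and reports the peaks with awk:

```shell
# Print peak CPU and memory observed in a monitoring log.
# Skips the CSV header line; +0 forces numeric comparison.
summarize_performance_log() {
    local log="$1"
    awk -F, 'NR > 1 && $2 + 0 > cpu + 0 { cpu = $2 }
             NR > 1 && $3 + 0 > mem + 0 { mem = $3 }
             END { printf "Peak CPU: %s%%, Peak memory: %s MB\n", cpu, mem }' "$log"
}
```

Usage: `summarize_performance_log wappalyzer-performance-1700000000.log`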
Troubleshooting Common Issues
# Troubleshooting script for Wappalyzer
troubleshoot_wappalyzer() {
    echo "Wappalyzer Troubleshooting Guide"
    echo "================================"
    # Check if Wappalyzer is installed
    if ! command -v wappalyzer &> /dev/null; then
        echo "❌ Wappalyzer not found in PATH"
        echo "Solution: Install Wappalyzer using 'npm install -g wappalyzer'"
        return 1
    fi
    echo "✅ Wappalyzer found: $(which wappalyzer)"
    echo "Version: $(wappalyzer --version 2>&1)"
    # Check Node.js version
    if ! command -v node &> /dev/null; then
        echo "❌ Node.js not found"
        echo "Solution: Install Node.js from https://nodejs.org/"
        return 1
    fi
    local node_version=$(node --version)
    echo "✅ Node.js version: $node_version"
    # Check network connectivity
    if ! curl -s --connect-timeout 5 https://httpbin.org/get > /dev/null; then
        echo "❌ Network connectivity issues"
        echo "Solution: Check internet connection and proxy settings"
        return 1
    fi
    echo "✅ Network connectivity OK"
    # Test basic functionality
    echo "Testing basic Wappalyzer functionality..."
    if timeout 60 wappalyzer https://httpbin.org/get --timeout 30000 > /dev/null 2>&1; then
        echo "✅ Basic functionality test passed"
    else
        echo "❌ Basic functionality test failed"
        echo "Solution: Check Wappalyzer installation and network settings"
        return 1
    fi
    # Check for common configuration issues
    echo "Checking for common configuration issues..."
    # Check npm permissions (suppress errors if the npm prefix cannot be resolved)
    if [ ! -w "$(npm config get prefix 2>/dev/null)/lib/node_modules" ]; then
        echo "⚠️ npm permission issues detected"
        echo "Solution: Fix npm permissions or use nvm"
    fi
    # Check for proxy issues
    if [ -n "$HTTP_PROXY" ] || [ -n "$HTTPS_PROXY" ]; then
        echo "⚠️ Proxy environment variables detected"
        echo "Note: Wappalyzer should respect proxy settings automatically"
    fi
    echo "Troubleshooting completed"
}
# Common error solutions
fix_common_wappalyzer_errors() {
    echo "Common Wappalyzer Error Solutions"
    echo "================================="
    echo "1. 'command not found: wappalyzer'"
    echo "   Solution: npm install -g wappalyzer"
    echo "   Alternative: npx wappalyzer <url>"
    echo ""
    echo "2. 'EACCES: permission denied'"
    echo "   Solution: Fix npm permissions or use sudo"
    echo "   Better: Use nvm to manage Node.js versions"
    echo ""
    echo "3. 'timeout' or 'ETIMEDOUT'"
    echo "   Solution: Increase timeout with --timeout option"
    echo "   Example: wappalyzer <url> --timeout 60000"
    echo ""
    echo "4. 'SSL certificate error'"
    echo "   Solution: Use --no-ssl-verify (not recommended for production)"
    echo ""
    echo "5. 'Too many redirects'"
    echo "   Solution: Use --follow-redirect or check URL manually"
    echo ""
    echo "6. 'No technologies detected' (false negatives)"
    echo "   Solution: Increase --max-pages and --max-depth"
    echo "   Example: wappalyzer <url> --max-pages 10 --max-depth 3"
    echo ""
    echo "7. 'Out of memory' for large sites"
    echo "   Solution: Reduce --max-pages or use --no-scripts"
    echo "   Example: wappalyzer <url> --max-pages 5 --no-scripts"
}
# Run troubleshooting
troubleshoot_wappalyzer
fix_common_wappalyzer_errors
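Transient 'timeout' and 'ETIMEDOUT' failures (error 3 above) can be handled with a retry wrapper that backs off by doubling the timeout on each attempt. This is a sketch; the flag names follow the examples in this cheat sheet and may differ between Wappalyzer versions:

```shell
# Retry a Wappalyzer scan, doubling the timeout after each failed attempt.
wappalyzer_with_retry() {
    local url="$1"
    local max_attempts="${2:-3}"
    local timeout=30000
    local attempt
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        if wappalyzer "$url" --timeout "$timeout"; then
            return 0
        fi
        echo "Attempt $attempt failed; doubling timeout to $((timeout * 2))ms" >&2
        timeout=$((timeout * 2))
    done
    echo "All $max_attempts attempts failed for $url" >&2
    return 1
}
```

Usage: `wappalyzer_with_retry https://example.com 4` allows up to four attempts, ending at a 240-second timeout.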
Resources and Documentation
Official Resources
- Wappalyzer Website - Main website and browser extensions
- Wappalyzer CLI GitHub - CLI tool repository
- API Documentation - API reference and pricing
- Technology Database - Technology definitions
Community Resources
- Browser Extension Store - Chrome Web Store
- Firefox Add-ons - Firefox extension
- Technology Discussions - Community Q&A
- Bug Reports - Issue tracking
Integration Examples
- Security Automation - Integration guides
- Competitive Analysis - Market research tools
- Technology Trends - Technology statistics
- Custom Integrations - API integration examples