Skip to content

Wappalyzer Cheat Sheet

Overview

Wappalyzer is a technology profiler that identifies the technologies used on websites. It detects content management systems, ecommerce platforms, web frameworks, server software, analytics tools, and many other technologies. Available as a browser extension, CLI tool, and API, Wappalyzer is essential for reconnaissance, competitive analysis, and security assessments.

💡 Key Features: Technology detection, browser extension, CLI tool, API access, bulk analysis, detailed reporting, and integration with security workflows.

Installation and Setup

Browser Extension Installation

# Chrome/Chromium
# Visit: https://chrome.google.com/webstore/detail/wappalyzer/gppongmhjkpfnbhagpmjfkannfbllamg
# Click "Add to Chrome"

# Firefox
# Visit: https://addons.mozilla.org/en-US/firefox/addon/wappalyzer/
# Click "Add to Firefox"

# Edge
# Visit: https://microsoftedge.microsoft.com/addons/detail/wappalyzer/mnbndgmknlpdjdnjfmfcdjoegcckoikn
# Click "Get"

# Safari
# Visit: https://apps.apple.com/app/wappalyzer/id1520333300
# Install from App Store

# Manual installation for development (requires Node.js/npm on PATH)
git clone https://github.com/wappalyzer/wappalyzer.git
cd wappalyzer
npm install
npm run build
# Load unpacked extension from src/drivers/webextension/
# (chrome://extensions -> Developer mode -> "Load unpacked")

CLI Tool Installation

# Install via npm (Node.js required)
npm install -g wappalyzer

# Verify installation
wappalyzer --version

# Install specific version (pin when you need reproducible scan results)
npm install -g wappalyzer@6.10.66

# Install locally in project (kept in node_modules; run via npx)
npm install wappalyzer
npx wappalyzer --version

# Update to latest version
npm update -g wappalyzer

# Uninstall
npm uninstall -g wappalyzer

Docker Installation

# Pull official Docker image
docker pull wappalyzer/cli

# Run Wappalyzer in Docker
docker run --rm wappalyzer/cli https://example.com

# Run with volume mount for output (quote $(pwd) in case the path has spaces)
docker run --rm -v "$(pwd)":/output wappalyzer/cli https://example.com --output /output/results.json

# Create alias for easier usage.
# Fixed: the alias body must be single-quoted inside ~/.bashrc; the previous
# double-quoted form expanded $(pwd) once when .bashrc was sourced, pinning
# the mount to that directory instead of the directory at invocation time.
echo "alias wappalyzer='docker run --rm -v \"\$(pwd)\":/output wappalyzer/cli'" >> ~/.bashrc
source ~/.bashrc

# Build custom Docker image
cat > Dockerfile << 'EOF'
FROM node:16-alpine
RUN npm install -g wappalyzer
WORKDIR /app
ENTRYPOINT ["wappalyzer"]
EOF

docker build -t custom-wappalyzer .

API Setup

# Sign up for API access at https://www.wappalyzer.com/api/
# Get API key from dashboard

# Set environment variable (replace the placeholder with your real key;
# avoid committing it to version control)
export WAPPALYZER_API_KEY="your_api_key_here"

# Test API access
curl -H "x-api-key: $WAPPALYZER_API_KEY" \
     "https://api.wappalyzer.com/v2/lookup/?urls=https://example.com"

# Create configuration file
# (quoted 'EOF' keeps the JSON literal — no shell expansion inside)
# NOTE(review): these config keys are illustrative — confirm which ones the
# tooling you use actually reads.
cat > ~/.wappalyzer-config.json << 'EOF'
{
  "api_key": "your_api_key_here",
  "api_url": "https://api.wappalyzer.com/v2/",
  "timeout": 30,
  "max_retries": 3,
  "rate_limit": 100
}
EOF

# Set configuration path
export WAPPALYZER_CONFIG=~/.wappalyzer-config.json

Development Setup

# Clone repository for development
git clone https://github.com/wappalyzer/wappalyzer.git
cd wappalyzer

# Install dependencies
npm install

# Build the project
npm run build

# Run tests
npm test

# Start development server
npm run dev

# Create custom technology definitions
mkdir -p custom-technologies

# Fingerprint format: detection evidence can come from response headers,
# HTML content, or global JS objects; "implies" pulls in dependent techs.
# NOTE(review): 'cats' are Wappalyzer numeric category IDs (18 is assumed
# to be "Web frameworks" — confirm against the upstream categories list).
cat > custom-technologies/custom.json << 'EOF'
{
  "Custom Framework": {
    "cats": [18],
    "description": "Custom web framework",
    "icon": "custom.png",
    "website": "https://custom-framework.com",
    "headers": {
      "X-Powered-By": "Custom Framework"
    },
    "html": "<meta name=\"generator\" content=\"Custom Framework",
    "js": {
      "CustomFramework": ""
    },
    "implies": "PHP"
  }
}
EOF

# Validate custom technology definitions
npm run validate -- custom-technologies/custom.json

Basic Usage and Commands

CLI Basic Commands

# Analyze single website
wappalyzer https://example.com

# Analyze with detailed output
wappalyzer https://example.com --pretty

# Save results to file
wappalyzer https://example.com --output results.json

# Analyze multiple URLs
wappalyzer https://example.com https://test.com

# Analyze from file
# (printf instead of non-portable `echo -e`; one URL per line)
printf '%s\n' "https://example.com" "https://test.com" > urls.txt
wappalyzer --urls-file urls.txt

# Set custom user agent
wappalyzer https://example.com --user-agent "Custom Agent 1.0"

# Set timeout (milliseconds)
wappalyzer https://example.com --timeout 30000

# Follow redirects
wappalyzer https://example.com --follow-redirect

# Disable SSL verification
wappalyzer https://example.com --no-ssl-verify

Advanced CLI Options

# Analyze with custom headers
wappalyzer https://example.com --header "Authorization: Bearer token123"

# Set maximum pages to analyze
wappalyzer https://example.com --max-pages 10

# Set crawl depth
wappalyzer https://example.com --max-depth 3

# Analyze with proxy (e.g. route through Burp/ZAP on 8080)
wappalyzer https://example.com --proxy http://127.0.0.1:8080

# Set custom delay between requests (milliseconds)
wappalyzer https://example.com --delay 1000

# Analyze with authentication (pass session cookies for logged-in views)
wappalyzer https://example.com --cookie "session=abc123; auth=xyz789"

# Output in different formats
wappalyzer https://example.com --output results.csv --format csv
wappalyzer https://example.com --output results.xml --format xml

# Verbose output for debugging
wappalyzer https://example.com --verbose

# Analyze specific categories only
wappalyzer https://example.com --categories "CMS,Web frameworks"

Bulk Analysis

# Analyze multiple domains from file
cat > domains.txt << 'EOF'
example.com
test.com
demo.com
sample.com
EOF

# Basic bulk analysis
wappalyzer --urls-file domains.txt --output bulk_results.json

# Bulk analysis with threading
wappalyzer --urls-file domains.txt --concurrent 10 --output threaded_results.json

# Bulk analysis with rate limiting (delay is in milliseconds)
wappalyzer --urls-file domains.txt --delay 2000 --output rate_limited_results.json

# Analyze subdomains (subfinder/httpx are separate ProjectDiscovery tools)
subfinder -d example.com -silent | head -100 > subdomains.txt
wappalyzer --urls-file subdomains.txt --output subdomain_analysis.json

# Combine with other tools: enumerate, keep only live hosts, prefix scheme
echo "example.com" | subfinder -silent | httpx -silent | head -50 | while read url; do
    echo "https://$url"
done > live_urls.txt
wappalyzer --urls-file live_urls.txt --output comprehensive_analysis.json

Advanced Technology Detection

Custom Technology Detection

#!/usr/bin/env python3
# Advanced Wappalyzer automation and custom detection

import json
import subprocess
import requests
import threading
import time
import re
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlparse, urljoin
import os

class WappalyzerAnalyzer:
    """Coordinates Wappalyzer CLI/API lookups plus custom regex detection.

    Worker threads append finished analyses to ``self.results`` while
    holding ``self.lock``.
    """

    def __init__(self, api_key=None, max_workers=10):
        # Hosted API endpoint and credentials (api_key may be None for
        # CLI-only operation).
        self.api_url = "https://api.wappalyzer.com/v2/"
        self.api_key = api_key
        # Thread-pool sizing used by bulk_analysis().
        self.max_workers = max_workers
        # Shared, lock-protected result accumulator.
        self.results = []
        self.lock = threading.Lock()

    def analyze_url_cli(self, url, options=None):
        """Analyze a URL with the locally installed Wappalyzer CLI.

        Args:
            url: Target URL, passed verbatim to the ``wappalyzer`` binary.
            options: Optional dict of CLI settings. Recognized keys:
                ``timeout`` (milliseconds, forwarded to ``--timeout``),
                ``user_agent``, ``headers`` (list of "Name: value" strings),
                ``proxy``, ``delay`` (ms), ``max_pages``,
                ``follow_redirect`` (bool), ``no_ssl_verify`` (bool).

        Returns:
            Dict with ``url``, ``success``, ``technologies`` (JSON parsed
            from the CLI's stdout on success, ``[]`` otherwise) and
            ``error`` (None on success, message string on failure).
        """
        if options is None:
            options = {}

        try:
            cmd = ['wappalyzer', url]

            # CLI --timeout and --delay are in milliseconds.
            timeout_ms = options.get('timeout')
            if timeout_ms:
                cmd.extend(['--timeout', str(timeout_ms)])

            if options.get('user_agent'):
                cmd.extend(['--user-agent', options['user_agent']])

            for header in options.get('headers') or []:
                cmd.extend(['--header', header])

            if options.get('proxy'):
                cmd.extend(['--proxy', options['proxy']])

            if options.get('delay'):
                cmd.extend(['--delay', str(options['delay'])])

            if options.get('max_pages'):
                cmd.extend(['--max-pages', str(options['max_pages'])])

            if options.get('follow_redirect'):
                cmd.append('--follow-redirect')

            if options.get('no_ssl_verify'):
                cmd.append('--no-ssl-verify')

            # Fixed: the CLI timeout is in *milliseconds* but
            # subprocess.run() expects *seconds*; the original passed the
            # millisecond value straight through (e.g. 30000 -> ~8 hours).
            # Convert, and add slack so the CLI can time out on its own
            # before the process is killed.
            if timeout_ms:
                proc_timeout = timeout_ms / 1000 + 30
            else:
                proc_timeout = 30

            result = subprocess.run(
                cmd,
                capture_output=True, text=True,
                timeout=proc_timeout
            )

            if result.returncode == 0:
                try:
                    technologies = json.loads(result.stdout)
                    return {
                        'url': url,
                        'success': True,
                        'technologies': technologies,
                        'error': None
                    }
                except json.JSONDecodeError:
                    return {
                        'url': url,
                        'success': False,
                        'technologies': [],
                        'error': 'Invalid JSON response'
                    }

            return {
                'url': url,
                'success': False,
                'technologies': [],
                'error': result.stderr
            }

        except subprocess.TimeoutExpired:
            return {
                'url': url,
                'success': False,
                'technologies': [],
                'error': 'CLI timeout'
            }
        except Exception as e:
            # Covers a missing binary (FileNotFoundError) and anything else.
            return {
                'url': url,
                'success': False,
                'technologies': [],
                'error': str(e)
            }

    def analyze_url_api(self, url):
        """Analyze URL using Wappalyzer API"""

        if not self.api_key:
            return {
                'url': url,
                'success': False,
                'technologies': [],
                'error': 'API key not provided'
            }

        try:
            headers = {
                'x-api-key': self.api_key,
                'Content-Type': 'application/json'
            }

            response = requests.get(
                f"{self.api_url}lookup/",
                params={'urls': url},
                headers=headers,
                timeout=30
            )

            if response.status_code == 200:
                data = response.json()
                return {
                    'url': url,
                    'success': True,
                    'technologies': data.get(url, []),
                    'error': None
                }
            else:
                return {
                    'url': url,
                    'success': False,
                    'technologies': [],
                    'error': f'API error: {response.status_code}'
                }

        except Exception as e:
            return {
                'url': url,
                'success': False,
                'technologies': [],
                'error': str(e)
            }

    def custom_technology_detection(self, url):
        """Run supplemental regex-based detection against a URL.

        Fetches the page once, then checks each rule's patterns against the
        HTML body and the response headers. A technology is reported as soon
        as any one of its patterns matches either place.

        Returns:
            Dict with ``url``, ``success``, ``custom_technologies`` (list of
            ``{'name', 'category', 'confidence', 'version'}``) and ``error``.
        """
        custom_detections = []

        try:
            # Fetch page content.
            # NOTE(review): verify=False skips TLS validation — deliberate
            # for recon against self-signed targets, but it will emit
            # InsecureRequestWarning noise.
            response = requests.get(url, timeout=30, verify=False)
            content = response.text
            # Flatten headers into one "Name: value" blob so each pattern is
            # tested with a single search instead of a per-header loop.
            header_blob = "\n".join(
                f"{name}: {value}" for name, value in response.headers.items()
            )

            # Custom detection rules: name -> patterns + reporting category.
            detections = {
                'Custom Framework': {
                    'patterns': [
                        r'<meta name="generator" content="Custom Framework',
                        r'X-Powered-By.*Custom Framework'
                    ],
                    'category': 'Web frameworks'
                },
                'Internal Tool': {
                    'patterns': [
                        r'<!-- Internal Tool v\d+\.\d+ -->',
                        r'internal-tool\.js',
                        r'data-internal-version'
                    ],
                    'category': 'Development tools'
                },
                'Security Headers': {
                    'patterns': [
                        r'Content-Security-Policy',
                        r'X-Frame-Options',
                        r'X-XSS-Protection'
                    ],
                    'category': 'Security'
                },
                'Analytics Platform': {
                    'patterns': [
                        r'analytics\.custom\.com',
                        r'customAnalytics\(',
                        r'data-analytics-id'
                    ],
                    'category': 'Analytics'
                },
                'CDN Detection': {
                    'patterns': [
                        r'cdn\.custom\.com',
                        r'X-Cache.*HIT',
                        r'X-CDN-Provider'
                    ],
                    'category': 'CDN'
                }
            }

            for tech_name, tech_info in detections.items():
                # Fixed: the original inner `break` only exited the header
                # loop, so a header-only match kept scanning the remaining
                # patterns instead of stopping at first evidence. any() now
                # short-circuits over both content and headers.
                detected = any(
                    re.search(pattern, content, re.IGNORECASE)
                    or re.search(pattern, header_blob, re.IGNORECASE)
                    for pattern in tech_info['patterns']
                )

                if detected:
                    custom_detections.append({
                        'name': tech_name,
                        'category': tech_info['category'],
                        'confidence': 'high',
                        'version': None
                    })

            return {
                'url': url,
                'success': True,
                'custom_technologies': custom_detections,
                'error': None
            }

        except Exception as e:
            return {
                'url': url,
                'success': False,
                'custom_technologies': [],
                'error': str(e)
            }

    def comprehensive_analysis(self, url, use_api=False, custom_detection=True):
        """Run CLI, optional API, and custom detection against one URL.

        Merges all sources into 'combined_technologies' (deduplicated by
        name, keeping the highest-confidence entry), groups them by
        category, flags security-related entries, and attaches a coarse
        risk assessment. The finished dict is appended to self.results
        under self.lock and also returned.
        """

        print(f"Analyzing: {url}")

        # Result skeleton; per-source fields stay None when a source is
        # skipped or disabled.
        results = {
            'url': url,
            'timestamp': time.time(),
            'wappalyzer_cli': None,
            'wappalyzer_api': None,
            'custom_detection': None,
            'combined_technologies': [],
            'technology_categories': {},
            'security_technologies': [],
            'risk_assessment': {}
        }

        # CLI analysis
        # NOTE(review): 'timeout' here is in milliseconds (CLI --timeout).
        cli_result = self.analyze_url_cli(url, {
            'timeout': 30000,
            'follow_redirect': True,
            'max_pages': 5
        })
        results['wappalyzer_cli'] = cli_result

        # API analysis (if enabled)
        if use_api and self.api_key:
            api_result = self.analyze_url_api(url)
            results['wappalyzer_api'] = api_result

        # Custom detection (if enabled)
        if custom_detection:
            custom_result = self.custom_technology_detection(url)
            results['custom_detection'] = custom_result

        # Combine and analyze results
        all_technologies = []

        # Add CLI technologies
        # NOTE(review): assumes the CLI emits a list of objects with
        # name/categories/version/confidence keys — confirm against the
        # installed wappalyzer version's JSON output schema.
        if cli_result['success'] and cli_result['technologies']:
            for tech in cli_result['technologies']:
                all_technologies.append({
                    'name': tech.get('name', 'Unknown'),
                    'category': tech.get('categories', []),
                    'version': tech.get('version'),
                    'confidence': tech.get('confidence', 100),
                    'source': 'wappalyzer_cli'
                })

        # Add API technologies
        if use_api and results['wappalyzer_api'] and results['wappalyzer_api']['success']:
            for tech in results['wappalyzer_api']['technologies']:
                all_technologies.append({
                    'name': tech.get('name', 'Unknown'),
                    'category': tech.get('categories', []),
                    'version': tech.get('version'),
                    'confidence': tech.get('confidence', 100),
                    'source': 'wappalyzer_api'
                })

        # Add custom technologies
        if custom_detection and results['custom_detection'] and results['custom_detection']['success']:
            for tech in results['custom_detection']['custom_technologies']:
                all_technologies.append({
                    'name': tech['name'],
                    'category': [tech['category']],
                    'version': tech.get('version'),
                    'confidence': 90,  # High confidence for custom detection
                    'source': 'custom_detection'
                })

        # Remove duplicates and categorize
        # Deduplicate by name; when several sources report the same
        # technology, keep the entry with the highest confidence.
        unique_technologies = {}
        for tech in all_technologies:
            tech_name = tech['name']
            if tech_name not in unique_technologies:
                unique_technologies[tech_name] = tech
            else:
                # Merge information from multiple sources
                existing = unique_technologies[tech_name]
                if tech['confidence'] > existing['confidence']:
                    unique_technologies[tech_name] = tech

        results['combined_technologies'] = list(unique_technologies.values())

        # Categorize technologies
        categories = {}
        security_techs = []

        for tech in results['combined_technologies']:
            # Custom-detection entries store a single category string;
            # normalize to a list before iterating.
            tech_categories = tech.get('category', [])
            if isinstance(tech_categories, str):
                tech_categories = [tech_categories]

            for category in tech_categories:
                if category not in categories:
                    categories[category] = []
                categories[category].append(tech['name'])

                # Identify security-related technologies
                if any(sec_keyword in category.lower() for sec_keyword in ['security', 'firewall', 'protection', 'ssl', 'certificate']):
                    security_techs.append(tech)

        results['technology_categories'] = categories
        results['security_technologies'] = security_techs

        # Risk assessment
        risk_factors = []

        # Check for outdated technologies
        for tech in results['combined_technologies']:
            if tech.get('version'):
                # This would require a database of known vulnerabilities
                # For now, just flag old versions
                # NOTE(review): matching '1.'..'5.' anywhere in the version
                # string is a very crude heuristic (it also matches e.g.
                # "10.1") — consider a real version comparison.
                version = tech['version']
                if any(old_indicator in version.lower() for old_indicator in ['1.', '2.', '3.', '4.', '5.']):
                    risk_factors.append(f"Potentially outdated {tech['name']} version {version}")

        # Check for missing security headers
        if not security_techs:
            risk_factors.append("No security technologies detected")

        # Check for development/debug technologies in production
        dev_categories = ['Development tools', 'Debugging', 'Testing']
        for category in dev_categories:
            if category in categories:
                risk_factors.append(f"Development tools detected in production: {', '.join(categories[category])}")

        # 0 factors -> low, 1-2 -> medium, 3+ -> high.
        results['risk_assessment'] = {
            'risk_level': 'low' if len(risk_factors) == 0 else 'medium' if len(risk_factors) <= 2 else 'high',
            'risk_factors': risk_factors,
            'recommendations': self.generate_recommendations(results)
        }

        with self.lock:
            self.results.append(results)

        return results

    def generate_recommendations(self, analysis_result):
        """Build security/performance recommendations for one analysis.

        Reads 'combined_technologies' and 'technology_categories' from
        analysis_result; the recommendation order is stable.
        """
        technologies = analysis_result['combined_technologies']
        categories = analysis_result['technology_categories']

        recommendations = []

        # Baseline hardening/performance gaps: recommend anything whose
        # category is entirely absent from the detected stack.
        gap_messages = (
            ('Security', "Consider implementing security headers (CSP, HSTS, X-Frame-Options)"),
            ('SSL/TLS', "Ensure HTTPS is properly configured with valid SSL/TLS certificates"),
            ('Web application firewall', "Consider implementing a Web Application Firewall (WAF)"),
            ('CDN', "Consider using a Content Delivery Network (CDN) for better performance"),
            ('Caching', "Implement caching mechanisms to improve performance"),
        )
        for category_name, message in gap_messages:
            if category_name not in categories:
                recommendations.append(message)

        # Keep detected CMS/framework stacks patched.
        cms_list = categories.get('CMS', [])
        if cms_list:
            recommendations.append(f"Keep {', '.join(cms_list)} updated to the latest version")

        framework_list = categories.get('Web frameworks', [])
        if framework_list:
            recommendations.append(f"Ensure {', '.join(framework_list)} are updated and properly configured")

        # Privacy compliance when analytics tooling is present.
        if categories.get('Analytics', []):
            recommendations.append("Ensure analytics tools comply with privacy regulations (GDPR, CCPA)")

        return recommendations

    def bulk_analysis(self, urls, use_api=False, custom_detection=True):
        """Analyze many URLs concurrently with a thread pool.

        Each URL goes through comprehensive_analysis(); workers append
        their results to self.results under self.lock, and the accumulated
        list is returned. Per-URL failures are printed, not raised.
        """

        print(f"Starting bulk analysis of {len(urls)} URLs")
        print(f"Max workers: {self.max_workers}")

        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            # Submit all tasks
            future_to_url = {
                executor.submit(self.comprehensive_analysis, url, use_api, custom_detection): url 
                for url in urls
            }

            # Process completed tasks (in completion order, not input order)
            for future in as_completed(future_to_url):
                url = future_to_url[future]
                try:
                    result = future.result()
                    tech_count = len(result['combined_technologies'])
                    risk_level = result['risk_assessment']['risk_level']
                    print(f"✓ {url}: {tech_count} technologies, risk: {risk_level}")
                except Exception as e:
                    print(f"✗ Error analyzing {url}: {e}")

        return self.results

    def generate_report(self, output_file='wappalyzer_analysis_report.json'):
        """Generate comprehensive analysis report"""

        # Calculate statistics
        total_urls = len(self.results)
        successful_analyses = sum(1 for r in self.results if r['wappalyzer_cli']['success'])

        # Technology statistics
        all_technologies = {}
        all_categories = {}
        risk_levels = {'low': 0, 'medium': 0, 'high': 0}

        for result in self.results:
            # Count technologies
            for tech in result['combined_technologies']:
                tech_name = tech['name']
                if tech_name not in all_technologies:
                    all_technologies[tech_name] = 0
                all_technologies[tech_name] += 1

            # Count categories
            for category, techs in result['technology_categories'].items():
                if category not in all_categories:
                    all_categories[category] = 0
                all_categories[category] += len(techs)

            # Count risk levels
            risk_level = result['risk_assessment']['risk_level']
            if risk_level in risk_levels:
                risk_levels[risk_level] += 1

        # Sort by popularity
        popular_technologies = sorted(all_technologies.items(), key=lambda x: x[1], reverse=True)[:20]
        popular_categories = sorted(all_categories.items(), key=lambda x: x[1], reverse=True)[:10]

        report = {
            'scan_summary': {
                'total_urls': total_urls,
                'successful_analyses': successful_analyses,
                'success_rate': (successful_analyses / total_urls * 100) if total_urls > 0 else 0,
                'scan_date': time.strftime('%Y-%m-%d %H:%M:%S')
            },
            'technology_statistics': {
                'total_unique_technologies': len(all_technologies),
                'popular_technologies': popular_technologies,
                'popular_categories': popular_categories
            },
            'risk_assessment': {
                'risk_distribution': risk_levels,
                'high_risk_urls': [
                    r['url'] for r in self.results 
                    if r['risk_assessment']['risk_level'] == 'high'
                ]
            },
            'detailed_results': self.results
        }

        # Save report
        with open(output_file, 'w') as f:
            json.dump(report, f, indent=2)

        print(f"\nWappalyzer Analysis Report:")
        print(f"Total URLs analyzed: {total_urls}")
        print(f"Successful analyses: {successful_analyses}")
        print(f"Success rate: {report['scan_summary']['success_rate']:.1f}%")
        print(f"Unique technologies found: {len(all_technologies)}")
        print(f"High risk URLs: {risk_levels['high']}")
        print(f"Report saved to: {output_file}")

        return report

# Usage example
if __name__ == "__main__":
    # Create analyzer instance.
    # The API key is optional; it is read from the environment when present
    # and left as None otherwise (CLI-only mode).
    analyzer = WappalyzerAnalyzer(
        api_key=os.getenv('WAPPALYZER_API_KEY'),
        max_workers=5
    )

    # URLs to analyze
    urls = [
        'https://example.com',
        'https://test.com',
        'https://demo.com'
    ]

    # Perform bulk analysis (network-bound; one thread per URL up to max_workers)
    results = analyzer.bulk_analysis(
        urls,
        use_api=False,  # Set to True if you have API key
        custom_detection=True
    )

    # Generate report
    report = analyzer.generate_report('comprehensive_wappalyzer_report.json')

Automation and Integration

CI/CD Integration

#!/bin/bash
# CI/CD script for technology stack analysis
#
# Usage: <script> <target_url> <output_dir> [baseline_file]
# Runs Wappalyzer against a target, writes per-aspect result files into
# <output_dir>, and optionally diffs the detected technology list against
# a baseline JSON file from a previous run.

set -e

TARGET_URL="$1"
OUTPUT_DIR="$2"
BASELINE_FILE="$3"

if [ -z "$TARGET_URL" ] || [ -z "$OUTPUT_DIR" ]; then
    echo "Usage: $0 <target_url> <output_dir> [baseline_file]"
    exit 1
fi

echo "Starting technology stack analysis..."
echo "Target: $TARGET_URL"
echo "Output directory: $OUTPUT_DIR"

mkdir -p "$OUTPUT_DIR"

# Run Wappalyzer analysis
echo "Running Wappalyzer analysis..."
wappalyzer "$TARGET_URL" \
    --timeout 30000 \
    --follow-redirect \
    --max-pages 10 \
    --output "$OUTPUT_DIR/wappalyzer_results.json" \
    --pretty

# Parse results
# jq failures (missing/invalid JSON) fall back to 0 / empty files so the
# `set -e` shell does not abort mid-report.
TECH_COUNT=$(jq '. | length' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null || echo "0")
echo "Found $TECH_COUNT technologies"

# Extract technology categories
jq -r '.[].categories[]' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null | sort | uniq > "$OUTPUT_DIR/categories.txt" || touch "$OUTPUT_DIR/categories.txt"
CATEGORY_COUNT=$(wc -l < "$OUTPUT_DIR/categories.txt")

# Extract security-related technologies
jq -r '.[] | select(.categories[] | contains("Security") or contains("SSL") or contains("Certificate")) | .name' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null > "$OUTPUT_DIR/security_technologies.txt" || touch "$OUTPUT_DIR/security_technologies.txt"
SECURITY_COUNT=$(wc -l < "$OUTPUT_DIR/security_technologies.txt")

# Check for development/debug technologies
jq -r '.[] | select(.categories[] | contains("Development") or contains("Debug") or contains("Testing")) | .name' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null > "$OUTPUT_DIR/dev_technologies.txt" || touch "$OUTPUT_DIR/dev_technologies.txt"
DEV_COUNT=$(wc -l < "$OUTPUT_DIR/dev_technologies.txt")

# Generate summary report
# (unquoted EOF: $VARs and $( ) inside the heredoc expand now, on purpose)
cat > "$OUTPUT_DIR/technology-summary.txt" << EOF
Technology Stack Analysis Summary
================================
Date: $(date)
Target: $TARGET_URL
Total Technologies: $TECH_COUNT
Categories: $CATEGORY_COUNT
Security Technologies: $SECURITY_COUNT
Development Technologies: $DEV_COUNT

Status: $(if [ "$DEV_COUNT" -gt "0" ]; then echo "WARNING - Development tools detected"; else echo "OK"; fi)
EOF

# Compare with baseline if provided
if [ -n "$BASELINE_FILE" ] && [ -f "$BASELINE_FILE" ]; then
    echo "Comparing with baseline..."

    # Extract current technology names
    jq -r '.[].name' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null | sort > "$OUTPUT_DIR/current_technologies.txt" || touch "$OUTPUT_DIR/current_technologies.txt"

    # Extract baseline technology names
    jq -r '.[].name' "$BASELINE_FILE" 2>/dev/null | sort > "$OUTPUT_DIR/baseline_technologies.txt" || touch "$OUTPUT_DIR/baseline_technologies.txt"

    # Find differences (comm needs sorted input — provided by the sorts above)
    comm -23 "$OUTPUT_DIR/current_technologies.txt" "$OUTPUT_DIR/baseline_technologies.txt" > "$OUTPUT_DIR/new_technologies.txt"
    comm -13 "$OUTPUT_DIR/current_technologies.txt" "$OUTPUT_DIR/baseline_technologies.txt" > "$OUTPUT_DIR/removed_technologies.txt"

    NEW_COUNT=$(wc -l < "$OUTPUT_DIR/new_technologies.txt")
    REMOVED_COUNT=$(wc -l < "$OUTPUT_DIR/removed_technologies.txt")

    echo "New technologies: $NEW_COUNT"
    echo "Removed technologies: $REMOVED_COUNT"

    # Add to summary
    cat >> "$OUTPUT_DIR/technology-summary.txt" << EOF

Baseline Comparison:
New Technologies: $NEW_COUNT
Removed Technologies: $REMOVED_COUNT
EOF

    if [ "$NEW_COUNT" -gt "0" ]; then
        echo "New technologies detected:"
        cat "$OUTPUT_DIR/new_technologies.txt"
    fi
fi

# Generate detailed JSON + HTML reports from the raw Wappalyzer results.
# The output directory and target URL are passed as argv to the embedded
# Python script: a bare `python3 << EOF` leaves sys.argv[1]/[2] unset and
# the script would die with IndexError.
python3 - "$OUTPUT_DIR" "$TARGET_URL" << 'PYTHON_EOF'
"""Build detailed JSON and HTML technology-stack reports.

argv[1]: directory containing wappalyzer_results.json (reports written here)
argv[2]: URL that was scanned (embedded in the reports)
"""
import sys
import json
import html
from datetime import datetime

output_dir = sys.argv[1]
target_url = sys.argv[2]


def esc(value):
    """HTML-escape scan-derived data before embedding it in the report.

    Technology names/versions come from the scanned site and are untrusted;
    escaping prevents a malicious site from injecting markup into the report.
    """
    return html.escape(str(value))


# Read Wappalyzer results; a missing or malformed file means "no findings".
try:
    with open(f"{output_dir}/wappalyzer_results.json", 'r') as f:
        technologies = json.load(f)
except (FileNotFoundError, json.JSONDecodeError):
    technologies = []

# Categorize technologies.
# NOTE(review): categories are assumed to be plain strings, matching the jq
# filters used elsewhere in this pipeline — confirm against the CLI version.
categories = {}       # category name -> list of technology names
security_techs = []   # techs in security-related categories (deduplicated)
dev_techs = []        # techs in development/debug/testing categories
outdated_techs = []   # techs whose major version looks old

for tech in technologies:
    tech_name = tech.get('name', 'Unknown')
    tech_categories = tech.get('categories', [])
    tech_version = tech.get('version', '')

    for category in tech_categories:
        categories.setdefault(category, []).append(tech_name)

        # Security technologies (membership check avoids double-appending a
        # tech that matches several security-flavoured categories).
        if any(kw in category.lower() for kw in ['security', 'ssl', 'certificate', 'firewall']):
            if tech not in security_techs:
                security_techs.append(tech)

        # Development technologies.
        if any(kw in category.lower() for kw in ['development', 'debug', 'testing']):
            if tech not in dev_techs:
                dev_techs.append(tech)

    # Flag major versions 1-5 as potentially outdated. Comparing only the
    # major component avoids substring false positives (e.g. "15.4"
    # contains "5.").
    if tech_version and tech_version.split('.', 1)[0] in ('1', '2', '3', '4', '5'):
        outdated_techs.append(tech)

# Risk assessment: each factor nudges the overall level upward.
risk_factors = []
if len(dev_techs) > 0:
    risk_factors.append(f"Development tools detected: {', '.join([t.get('name', 'Unknown') for t in dev_techs])}")

if len(security_techs) == 0:
    risk_factors.append("No security technologies detected")

if len(outdated_techs) > 0:
    risk_factors.append(f"Potentially outdated technologies: {', '.join([t.get('name', 'Unknown') for t in outdated_techs])}")

# 0 factors -> low, 1-2 -> medium, 3+ -> high
risk_level = 'low' if len(risk_factors) == 0 else 'medium' if len(risk_factors) <= 2 else 'high'

# Machine-readable report.
report = {
    'scan_info': {
        'target': target_url,
        'scan_date': datetime.now().isoformat(),
        'technology_count': len(technologies),
        'category_count': len(categories)
    },
    'technologies': technologies,
    'categories': categories,
    'security_assessment': {
        'security_technologies': security_techs,
        'development_technologies': dev_techs,
        'outdated_technologies': outdated_techs,
        'risk_level': risk_level,
        'risk_factors': risk_factors
    }
}

with open(f"{output_dir}/wappalyzer-detailed-report.json", 'w') as f:
    json.dump(report, f, indent=2)

# Human-readable HTML report; all scan-derived values go through esc().
html_content = f"""
<!DOCTYPE html>
<html>
<head>
    <title>Technology Stack Analysis Report</title>
    <style>
        body {{ font-family: Arial, sans-serif; margin: 20px; }}
        .header {{ background-color: #f0f0f0; padding: 20px; border-radius: 5px; }}
        .category {{ margin: 10px 0; padding: 15px; border-left: 4px solid #007bff; background-color: #f8f9fa; }}
        .risk-high {{ border-left-color: #dc3545; }}
        .risk-medium {{ border-left-color: #ffc107; }}
        .risk-low {{ border-left-color: #28a745; }}
        table {{ border-collapse: collapse; width: 100%; margin: 20px 0; }}
        th, td {{ border: 1px solid #ddd; padding: 8px; text-align: left; }}
        th {{ background-color: #f2f2f2; }}
    </style>
</head>
<body>
    <div class="header">
        <h1>Technology Stack Analysis Report</h1>
        <p><strong>Target:</strong> {esc(target_url)}</p>
        <p><strong>Scan Date:</strong> {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
        <p><strong>Technologies Found:</strong> {len(technologies)}</p>
        <p><strong>Risk Level:</strong> <span class="risk-{risk_level}">{risk_level.upper()}</span></p>
    </div>

    <h2>Technology Categories</h2>
"""

for category, techs in categories.items():
    html_content += f"""
    <div class="category">
        <h3>{esc(category)}</h3>
        <p>{esc(', '.join(techs))}</p>
    </div>
    """

html_content += """
    <h2>Detailed Technologies</h2>
    <table>
        <tr><th>Name</th><th>Version</th><th>Categories</th><th>Confidence</th></tr>
"""

for tech in technologies:
    html_content += f"""
    <tr>
        <td>{esc(tech.get('name', 'Unknown'))}</td>
        <td>{esc(tech.get('version', 'N/A'))}</td>
        <td>{esc(', '.join(tech.get('categories', [])))}</td>
        <td>{esc(tech.get('confidence', 'N/A'))}%</td>
    </tr>
    """

html_content += """
    </table>

    <h2>Risk Assessment</h2>
"""

if risk_factors:
    html_content += "<ul>"
    for factor in risk_factors:
        html_content += f"<li>{esc(factor)}</li>"
    html_content += "</ul>"
else:
    html_content += "<p>No significant risk factors identified.</p>"

html_content += """
</body>
</html>
"""

with open(f"{output_dir}/wappalyzer-report.html", 'w') as f:
    f.write(html_content)

print("Detailed reports generated:")
print(f"- JSON: {output_dir}/wappalyzer-detailed-report.json")
print(f"- HTML: {output_dir}/wappalyzer-report.html")
PYTHON_EOF

# Final gate: fail the run when development tooling is visible in production.
if [ "$DEV_COUNT" -gt "0" ]; then
    echo "WARNING: Development technologies detected in production environment"
    echo "Development technologies found:"
    cat "$OUTPUT_DIR/dev_technologies.txt"
    exit 1
fi

echo "SUCCESS: No development technologies detected"
exit 0

GitHub Actions Integration

# .github/workflows/wappalyzer-tech-analysis.yml
# Scheduled + per-push technology-stack analysis of production and staging.
# Fails the build when development/debug tooling is visible in production.
name: Technology Stack Analysis

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
  schedule:
    - cron: '0 6 * * 1'  # Weekly scan on Mondays at 6 AM

jobs:
  technology-analysis:
    runs-on: ubuntu-latest

    steps:
    # v4 releases: the v3 checkout/upload-artifact actions are deprecated.
    - uses: actions/checkout@v4

    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: '16'

    - name: Install Wappalyzer
      run: |
        npm install -g wappalyzer
        wappalyzer --version

    - name: Analyze production environment
      # Pass the URLs through env instead of interpolating ${{ }} directly
      # into the script body, so a crafted variable value cannot inject
      # shell commands (GitHub Actions script-injection hardening).
      env:
        PRODUCTION_URL: ${{ vars.PRODUCTION_URL }}
        STAGING_URL: ${{ vars.STAGING_URL }}
      run: |
        mkdir -p analysis-results

        # Analyze main application
        wappalyzer "$PRODUCTION_URL" \
          --timeout 30000 \
          --follow-redirect \
          --max-pages 10 \
          --output analysis-results/production-tech.json \
          --pretty

        # Analyze staging environment
        wappalyzer "$STAGING_URL" \
          --timeout 30000 \
          --follow-redirect \
          --max-pages 5 \
          --output analysis-results/staging-tech.json \
          --pretty

        # Count technologies (default to 0 when a scan produced no JSON)
        PROD_TECH_COUNT=$(jq '. | length' analysis-results/production-tech.json 2>/dev/null || echo "0")
        STAGING_TECH_COUNT=$(jq '. | length' analysis-results/staging-tech.json 2>/dev/null || echo "0")

        echo "PROD_TECH_COUNT=$PROD_TECH_COUNT" >> "$GITHUB_ENV"
        echo "STAGING_TECH_COUNT=$STAGING_TECH_COUNT" >> "$GITHUB_ENV"

    - name: Check for development technologies
      run: |
        # Check for development/debug technologies in production
        DEV_TECHS=$(jq -r '.[] | select(.categories[] | contains("Development") or contains("Debug") or contains("Testing")) | .name' analysis-results/production-tech.json 2>/dev/null || echo "")

        if [ -n "$DEV_TECHS" ]; then
          echo "DEV_TECHS_FOUND=true" >> "$GITHUB_ENV"
          echo "Development technologies found in production:"
          echo "$DEV_TECHS"
          echo "$DEV_TECHS" > analysis-results/dev-technologies.txt
        else
          echo "DEV_TECHS_FOUND=false" >> "$GITHUB_ENV"
          touch analysis-results/dev-technologies.txt
        fi

    - name: Security technology assessment
      run: |
        # Check for security technologies
        SECURITY_TECHS=$(jq -r '.[] | select(.categories[] | contains("Security") or contains("SSL") or contains("Certificate")) | .name' analysis-results/production-tech.json 2>/dev/null || echo "")

        if [ -n "$SECURITY_TECHS" ]; then
          echo "Security technologies found:"
          echo "$SECURITY_TECHS"
          echo "$SECURITY_TECHS" > analysis-results/security-technologies.txt
        else
          echo "No security technologies detected"
          touch analysis-results/security-technologies.txt
        fi

        # Count non-empty lines: 'echo "" | wc -l' reports 1 for an empty
        # result, inflating the count. grep -c exits 1 on zero matches, so
        # '|| true' keeps the step alive under the default 'bash -e' shell.
        SECURITY_COUNT=$(printf '%s\n' "$SECURITY_TECHS" | grep -c .) || true
        echo "SECURITY_COUNT=$SECURITY_COUNT" >> "$GITHUB_ENV"

    - name: Generate comparison report
      run: |
        # Extract sorted technology names (comm(1) requires sorted input)
        jq -r '.[].name' analysis-results/production-tech.json 2>/dev/null | sort > analysis-results/prod-techs.txt || touch analysis-results/prod-techs.txt
        jq -r '.[].name' analysis-results/staging-tech.json 2>/dev/null | sort > analysis-results/staging-techs.txt || touch analysis-results/staging-techs.txt

        # Find differences: -23 = prod-only lines, -13 = staging-only lines
        comm -23 analysis-results/prod-techs.txt analysis-results/staging-techs.txt > analysis-results/prod-only.txt
        comm -13 analysis-results/prod-techs.txt analysis-results/staging-techs.txt > analysis-results/staging-only.txt

        # Generate summary
        cat > analysis-results/summary.txt << EOF
        Technology Stack Analysis Summary
        ================================
        Production Technologies: $PROD_TECH_COUNT
        Staging Technologies: $STAGING_TECH_COUNT
        Security Technologies: $SECURITY_COUNT
        Development Technologies in Production: $(if [ "$DEV_TECHS_FOUND" = "true" ]; then echo "YES (CRITICAL)"; else echo "NO"; fi)

        Production-only Technologies: $(wc -l < analysis-results/prod-only.txt)
        Staging-only Technologies: $(wc -l < analysis-results/staging-only.txt)
        EOF

    - name: Upload analysis results
      uses: actions/upload-artifact@v4
      with:
        name: technology-analysis-results
        path: analysis-results/

    - name: Comment PR with results
      if: github.event_name == 'pull_request'
      uses: actions/github-script@v7
      with:
        script: |
          const fs = require('fs');
          const summary = fs.readFileSync('analysis-results/summary.txt', 'utf8');

          github.rest.issues.createComment({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            body: `## Technology Stack Analysis\n\n\`\`\`\n${summary}\n\`\`\``
          });

    - name: Fail if development technologies found
      run: |
        if [ "$DEV_TECHS_FOUND" = "true" ]; then
          echo "CRITICAL: Development technologies detected in production!"
          cat analysis-results/dev-technologies.txt
          exit 1
        fi

Performance Optimization and Troubleshooting

Performance Tuning

# Optimize Wappalyzer for different scenarios

# Fast analysis with minimal pages
wappalyzer https://example.com --max-pages 1 --timeout 10000

# Thorough analysis with deep crawling
wappalyzer https://example.com --max-pages 20 --max-depth 5 --timeout 60000

# Bulk analysis with rate limiting.
# Read the URL list line by line: 'for url in $(cat urls.txt)' would
# word-split and glob-expand each line, breaking URLs that contain
# spaces, '?' or '*'.
while IFS= read -r url; do
    safe_name=$(printf '%s' "$url" | sed 's/[^a-zA-Z0-9]/_/g')
    wappalyzer "$url" --output "results_${safe_name}.json"
    sleep 2  # simple rate limit between targets
done < urls.txt

# Memory-efficient analysis for large sites
wappalyzer https://example.com --max-pages 5 --no-scripts --timeout 30000

# Performance monitoring script
#!/bin/bash
# Sample CPU/RSS of the wappalyzer process every 2s into a CSV log while a
# timed scan of the given URL runs in the foreground.
# Arguments: $1 - URL to scan
# Outputs:   CSV log file (path echoed on completion) and
#            performance_test_results.json
monitor_wappalyzer_performance() {
    local url="$1"
    local output_file="wappalyzer-performance-$(date +%s).log"

    echo "Monitoring Wappalyzer performance for: $url"

    # Start the sampler in the background, writing CSV to the log file.
    {
        echo "Timestamp,CPU%,Memory(MB),Status"
        while true; do
            # pgrep can return several PIDs; sample only the first so the
            # (previously unquoted) multi-line expansion cannot corrupt the
            # ps invocations.
            local pid
            pid=$(pgrep -f "wappalyzer" | head -n 1)
            if [ -n "$pid" ]; then
                local cpu mem
                cpu=$(ps -p "$pid" -o %cpu --no-headers)
                mem=$(ps -p "$pid" -o rss --no-headers | awk '{print $1/1024}')
                echo "$(date +%s),$cpu,$mem,running"
            fi
            sleep 2
        done
    } > "$output_file" &

    local monitor_pid=$!

    # Run Wappalyzer; time(1) prints wall/CPU timing to stderr
    time wappalyzer "$url" --pretty --output "performance_test_results.json"

    # Stop monitoring
    kill "$monitor_pid" 2>/dev/null

    echo "Performance monitoring completed: $output_file"
}

# Usage
monitor_wappalyzer_performance "https://example.com"

Troubleshooting Common Issues

# Troubleshooting script for Wappalyzer
# Runs environment checks in order (install, Node.js, network, live scan,
# configuration) and returns non-zero on the first hard failure so callers
# can bail out early. All findings are printed to stdout.
troubleshoot_wappalyzer() {
    echo "Wappalyzer Troubleshooting Guide"
    echo "==============================="

    # Check if Wappalyzer is installed
    if ! command -v wappalyzer &> /dev/null; then
        echo "❌ Wappalyzer not found in PATH"
        echo "Solution: Install Wappalyzer using 'npm install -g wappalyzer'"
        return 1
    fi

    echo "✅ Wappalyzer found: $(command -v wappalyzer)"
    echo "Version: $(wappalyzer --version 2>&1)"

    # Check Node.js version
    if ! command -v node &> /dev/null; then
        echo "❌ Node.js not found"
        echo "Solution: Install Node.js from https://nodejs.org/"
        return 1
    fi

    # Declaration is split from the command substitution so a node failure
    # is not masked by 'local' always returning 0.
    local node_version
    node_version=$(node --version)
    echo "✅ Node.js version: $node_version"

    # Check network connectivity
    if ! curl -s --connect-timeout 5 https://httpbin.org/get > /dev/null; then
        echo "❌ Network connectivity issues"
        echo "Solution: Check internet connection and proxy settings"
        return 1
    fi

    echo "✅ Network connectivity OK"

    # Test basic functionality (outer timeout guards against a hung process)
    echo "Testing basic Wappalyzer functionality..."

    if timeout 60 wappalyzer https://httpbin.org/get --timeout 30000 > /dev/null 2>&1; then
        echo "✅ Basic functionality test passed"
    else
        echo "❌ Basic functionality test failed"
        echo "Solution: Check Wappalyzer installation and network settings"
        return 1
    fi

    # Check for common configuration issues
    echo "Checking for common configuration issues..."

    # Check npm global-module directory permissions. Guard on npm resolving
    # a prefix: testing '-w "$(npm config get prefix)/lib/node_modules"'
    # directly would probe '/lib/node_modules' when npm is absent and
    # falsely report a permission problem (and the trailing 2>/dev/null did
    # not silence the command-substitution's stderr anyway).
    local npm_prefix
    if npm_prefix=$(npm config get prefix 2>/dev/null) && [ -n "$npm_prefix" ]; then
        if [ ! -w "$npm_prefix/lib/node_modules" ]; then
            echo "⚠️  npm permission issues detected"
            echo "Solution: Fix npm permissions or use nvm"
        fi
    fi

    # Check for proxy issues
    if [ -n "$HTTP_PROXY" ] || [ -n "$HTTPS_PROXY" ]; then
        echo "⚠️  Proxy environment variables detected"
        echo "Note: Wappalyzer should respect proxy settings automatically"
    fi

    echo "Troubleshooting completed"
}

# Common error solutions
# Prints a static reference list of frequent Wappalyzer errors and their
# fixes. Output-only; always returns 0.
fix_common_wappalyzer_errors() {
    # A single quoted here-doc replaces the long run of echo calls; the
    # text emitted is identical.
    cat << 'SOLUTIONS'
Common Wappalyzer Error Solutions
================================
1. 'command not found: wappalyzer'
   Solution: npm install -g wappalyzer
   Alternative: npx wappalyzer <url>

2. 'EACCES: permission denied'
   Solution: Fix npm permissions or use sudo
   Better: Use nvm to manage Node.js versions

3. 'timeout' or 'ETIMEDOUT'
   Solution: Increase timeout with --timeout option
   Example: wappalyzer <url> --timeout 60000

4. 'SSL certificate error'
   Solution: Use --no-ssl-verify (not recommended for production)

5. 'Too many redirects'
   Solution: Use --follow-redirect or check URL manually

6. 'No technologies detected' (false negatives)
   Solution: Increase --max-pages and --max-depth
   Example: wappalyzer <url> --max-pages 10 --max-depth 3

7. 'Out of memory' for large sites
   Solution: Reduce --max-pages or use --no-scripts
   Example: wappalyzer <url> --max-pages 5 --no-scripts
SOLUTIONS
}

# Run troubleshooting
# Environment checks run first; the static error-solution guide prints after.
troubleshoot_wappalyzer
fix_common_wappalyzer_errors

Resources and Documentation

Official Resources

Community Resources

Integration Examples