Wappalyzer Cheat Sheet

Overview

Wappalyzer is a technology profiler that identifies the technologies used on websites. It detects content management systems, e-commerce platforms, web frameworks, server software, analytics tools, and many other technologies. Available as a browser extension, a CLI tool, and an API, Wappalyzer is essential for reconnaissance, competitive analysis, and security assessments.

Key features: technology detection, browser extension, CLI tool, API access, bulk analysis, detailed reporting, and integration with security workflows.

Installation and Setup

Browser Extension Installation

# Chrome/Chromium
# Visit: https://chrome.google.com/webstore/detail/wappalyzer/gppongmhjkpfnbhagpmjfkannfbllamg
# Click "Add to Chrome"

# Firefox
# Visit: https://addons.mozilla.org/en-US/firefox/addon/wappalyzer/
# Click "Add to Firefox"

# Edge
# Visit: https://microsoftedge.microsoft.com/addons/detail/wappalyzer/mnbndgmknlpdjdnjfmfcdjoegcckoikn
# Click "Get"

# Safari
# Visit: https://apps.apple.com/app/wappalyzer/id1520333300
# Install from App Store

# Manual installation for development
git clone https://github.com/wappalyzer/wappalyzer.git
cd wappalyzer
npm install
npm run build
# Load unpacked extension from src/drivers/webextension/

CLI Tool Installation

# Install via npm (Node.js required)
npm install -g wappalyzer

# Verify installation
wappalyzer --version

# Install specific version
npm install -g wappalyzer@6.10.66

# Install locally in project
npm install wappalyzer
npx wappalyzer --version

# Update to latest version
npm update -g wappalyzer

# Uninstall
npm uninstall -g wappalyzer
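
Before scripting against the CLI, it can help to confirm that the binary is actually on PATH. A minimal Python sketch, assuming the executable is named `wappalyzer` as installed above; `find_cli` is a hypothetical helper name, not part of Wappalyzer.

```python
import shutil


def find_cli(name="wappalyzer"):
    """Return the absolute path of the CLI if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_cli()
    if path:
        print(f"wappalyzer found at {path}")
    else:
        print("wappalyzer not found; install it with: npm install -g wappalyzer")
```

Checking up front gives a clearer error than a failed subprocess call later in a batch run.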

Docker Installation

# Pull official Docker image
docker pull wappalyzer/cli

# Run Wappalyzer in Docker
docker run --rm wappalyzer/cli https://example.com

# Run with volume mount for output
docker run --rm -v $(pwd):/output wappalyzer/cli https://example.com --output /output/results.json

# Create alias for easier usage
echo 'alias wappalyzer="docker run --rm -v $(pwd):/output wappalyzer/cli"' >> ~/.bashrc
source ~/.bashrc

# Build custom Docker image
cat > Dockerfile << 'EOF'
FROM node:16-alpine
RUN npm install -g wappalyzer
WORKDIR /app
ENTRYPOINT ["wappalyzer"]
EOF

docker build -t custom-wappalyzer .

API Configuration

# Sign up for API access at https://www.wappalyzer.com/api/
# Get API key from dashboard

# Set environment variable
export WAPPALYZER_API_KEY="your_api_key_here"

# Test API access
curl -H "x-api-key: $WAPPALYZER_API_KEY" \
     "https://api.wappalyzer.com/v2/lookup/?urls=https://example.com"

# Create configuration file
cat > ~/.wappalyzer-config.json << 'EOF'
{
  "api_key": "your_api_key_here",
  "api_url": "https://api.wappalyzer.com/v2/",
  "timeout": 30,
  "max_retries": 3,
  "rate_limit": 100
}
EOF

# Set configuration path
export WAPPALYZER_CONFIG=~/.wappalyzer-config.json
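
The configuration file and environment variables above can be read from your own scripts. The sketch below assumes the file is a convention for those scripts only (it does not assume the CLI reads it); `load_config`, `build_headers`, and `DEFAULTS` are hypothetical names, and the default values mirror the example file.

```python
import json
import os
from pathlib import Path

# Defaults mirror the example configuration file (assumed values, adjust as needed)
DEFAULTS = {
    "api_url": "https://api.wappalyzer.com/v2/",
    "timeout": 30,
    "max_retries": 3,
    "rate_limit": 100,
}


def load_config(path=None, env=None):
    """Merge defaults, the JSON config file, and environment overrides."""
    env = os.environ if env is None else env
    path = path or env.get("WAPPALYZER_CONFIG", "~/.wappalyzer-config.json")
    config = dict(DEFAULTS)
    config_path = Path(path).expanduser()
    if config_path.is_file():
        config.update(json.loads(config_path.read_text()))
    # The environment variable takes precedence over the key stored in the file
    if env.get("WAPPALYZER_API_KEY"):
        config["api_key"] = env["WAPPALYZER_API_KEY"]
    return config


def build_headers(config):
    """Build the x-api-key header used by the curl example."""
    return {"x-api-key": config.get("api_key", "")}
```

Letting the environment variable win over the file keeps real keys out of files that might be committed by accident.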

Development Setup

# Clone repository for development
git clone https://github.com/wappalyzer/wappalyzer.git
cd wappalyzer

# Install dependencies
npm install

# Build the project
npm run build

# Run tests
npm test

# Start development server
npm run dev

# Create custom technology definitions
mkdir -p custom-technologies

cat > custom-technologies/custom.json << 'EOF'
{
  "Custom Framework": {
    "cats": [18],
    "description": "Custom web framework",
    "icon": "custom.png",
    "website": "https://custom-framework.com",
    "headers": {
      "X-Powered-By": "Custom Framework"
    },
    "html": "<meta name=\"generator\" content=\"Custom Framework",
    "js": {
      "CustomFramework": ""
    },
    "implies": "PHP"
  }
}
EOF

# Validate custom technology definitions
npm run validate -- custom-technologies/custom.json
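
A definition file like the one above can also be sanity-checked before it is handed to any validator. This is a rough sketch, not the project's own validation: `validate_definitions` is a hypothetical helper, the checks cover only the fields used in the example, and patterns are compiled with Python's `re`, which is close to but not identical to JavaScript's regex dialect. It assumes pattern tags are appended after `\;` (as in `\;version:\1`) and strips them before compiling.

```python
import re


def validate_definitions(definitions):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name, tech in definitions.items():
        cats = tech.get("cats")
        if not isinstance(cats, list) or not all(isinstance(c, int) for c in cats):
            problems.append(f"{name}: 'cats' must be a list of integers")
        if not str(tech.get("website", "")).startswith(("http://", "https://")):
            problems.append(f"{name}: 'website' must be an http(s) URL")

        # Collect patterns from string, list, and dict shaped fields
        patterns = []
        for field in ("html", "headers", "js"):
            value = tech.get(field, [])
            if isinstance(value, str):
                patterns.append((field, value))
            elif isinstance(value, list):
                patterns.extend((field, v) for v in value)
            elif isinstance(value, dict):
                patterns.extend((f"{field}.{k}", v) for k, v in value.items())

        for where, pattern in patterns:
            regex = pattern.split("\\;")[0]  # drop tags such as \;version:\1
            try:
                re.compile(regex)
            except re.error as exc:
                problems.append(f"{name}: bad pattern in {where}: {exc}")
    return problems
```

Load the file with `json.load` and pass the resulting dictionary in; each returned string names the technology and the field at fault.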

Basic Usage and Commands

Basic CLI Commands

# Analyze single website
wappalyzer https://example.com

# Analyze with detailed output
wappalyzer https://example.com --pretty

# Save results to file
wappalyzer https://example.com --output results.json

# Analyze multiple URLs
wappalyzer https://example.com https://test.com

# Analyze from file
echo -e "https://example.com\nhttps://test.com" > urls.txt
wappalyzer --urls-file urls.txt

# Set custom user agent
wappalyzer https://example.com --user-agent "Custom Agent 1.0"

# Set timeout
wappalyzer https://example.com --timeout 30000

# Follow redirects
wappalyzer https://example.com --follow-redirect

# Disable SSL verification
wappalyzer https://example.com --no-ssl-verify

Advanced CLI Options

# Analyze with custom headers
wappalyzer https://example.com --header "Authorization: Bearer token123"

# Set maximum pages to analyze
wappalyzer https://example.com --max-pages 10

# Set crawl depth
wappalyzer https://example.com --max-depth 3

# Analyze with proxy
wappalyzer https://example.com --proxy http://127.0.0.1:8080

# Set custom delay between requests
wappalyzer https://example.com --delay 1000

# Analyze with authentication
wappalyzer https://example.com --cookie "session=abc123; auth=xyz789"

# Output in different formats
wappalyzer https://example.com --output results.csv --format csv
wappalyzer https://example.com --output results.xml --format xml

# Verbose output for debugging
wappalyzer https://example.com --verbose

# Analyze specific categories only
wappalyzer https://example.com --categories "CMS,Web frameworks"

Bulk Analysis

# Analyze multiple domains from file
cat > domains.txt << 'EOF'
example.com
test.com
demo.com
sample.com
EOF

# Basic bulk analysis
wappalyzer --urls-file domains.txt --output bulk_results.json

# Bulk analysis with threading
wappalyzer --urls-file domains.txt --concurrent 10 --output threaded_results.json

# Bulk analysis with rate limiting
wappalyzer --urls-file domains.txt --delay 2000 --output rate_limited_results.json

# Analyze subdomains
subfinder -d example.com -silent | head -100 > subdomains.txt
wappalyzer --urls-file subdomains.txt --output subdomain_analysis.json

# Combine with other tools
# httpx already prints full URLs (scheme included), so no prefix is needed
echo "example.com" | subfinder -silent | httpx -silent | head -50 > live_urls.txt
wappalyzer --urls-file live_urls.txt --output comprehensive_analysis.json
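
The domain lists above hold bare hostnames, while the single-target examples pass full URLs. A small normalization step makes a mixed list uniform before analysis. This is a sketch under the assumption that HTTPS is the right default scheme; `normalize_targets` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def normalize_targets(lines, default_scheme="https"):
    """Turn raw lines (domains or URLs) into a de-duplicated list of URLs."""
    seen = set()
    urls = []
    for line in lines:
        target = line.strip()
        if not target or target.startswith("#"):
            continue  # skip blank lines and comments
        if "://" not in target:
            target = f"{default_scheme}://{target}"
        parsed = urlparse(target)
        if not parsed.netloc:
            continue  # no usable host
        # Lower-case scheme and host so duplicates compare equal
        url = parsed._replace(
            scheme=parsed.scheme.lower(),
            netloc=parsed.netloc.lower(),
        ).geturl()
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    with open("domains.txt") as handle:
        for url in normalize_targets(handle):
            print(url)
```

Order is preserved, so the output can be redirected to a file and diffed against the input.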

Advanced Technology Detection

Custom Technology Detector

#!/usr/bin/env python3
# Advanced Wappalyzer automation and custom detection

import json
import subprocess
import requests
import threading
import time
import re
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlparse, urljoin
import os

class WappalyzerAnalyzer:
    def __init__(self, api_key=None, max_workers=10):
        self.api_key = api_key
        self.max_workers = max_workers
        self.results = []
        self.lock = threading.Lock()
        self.api_url = "https://api.wappalyzer.com/v2/"

    def analyze_url_cli(self, url, options=None):
        """Analyze URL using Wappalyzer CLI"""

        if options is None:
            options = {}

        try:
            # Build command
            cmd = ['wappalyzer', url]

            if options.get('timeout'):
                cmd.extend(['--timeout', str(options['timeout'])])

            if options.get('user_agent'):
                cmd.extend(['--user-agent', options['user_agent']])

            if options.get('headers'):
                for header in options['headers']:
                    cmd.extend(['--header', header])

            if options.get('proxy'):
                cmd.extend(['--proxy', options['proxy']])

            if options.get('delay'):
                cmd.extend(['--delay', str(options['delay'])])

            if options.get('max_pages'):
                cmd.extend(['--max-pages', str(options['max_pages'])])

            if options.get('follow_redirect'):
                cmd.append('--follow-redirect')

            if options.get('no_ssl_verify'):
                cmd.append('--no-ssl-verify')

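            # Run Wappalyzer. The CLI timeout is in milliseconds, while
            # subprocess.run expects seconds, so convert and add a grace period.
            cli_timeout_ms = options.get('timeout', 30000)
            result = subprocess.run(
                cmd,
                capture_output=True, text=True,
                timeout=cli_timeout_ms / 1000 + 30
            )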

            if result.returncode == 0:
                try:
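                    data = json.loads(result.stdout)
                    # Some CLI versions wrap the list in an object with a 'technologies' key
                    technologies = data.get('technologies', []) if isinstance(data, dict) else data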
                    return {
                        'url': url,
                        'success': True,
                        'technologies': technologies,
                        'error': None
                    }
                except json.JSONDecodeError:
                    return {
                        'url': url,
                        'success': False,
                        'technologies': [],
                        'error': 'Invalid JSON response'
                    }
            else:
                return {
                    'url': url,
                    'success': False,
                    'technologies': [],
                    'error': result.stderr
                }

        except subprocess.TimeoutExpired:
            return {
                'url': url,
                'success': False,
                'technologies': [],
                'error': 'CLI timeout'
            }
        except Exception as e:
            return {
                'url': url,
                'success': False,
                'technologies': [],
                'error': str(e)
            }

    def analyze_url_api(self, url):
        """Analyze URL using Wappalyzer API"""

        if not self.api_key:
            return {
                'url': url,
                'success': False,
                'technologies': [],
                'error': 'API key not provided'
            }

        try:
            headers = {
                'x-api-key': self.api_key,
                'Content-Type': 'application/json'
            }

            response = requests.get(
                f"{self.api_url}lookup/",
                params={'urls': url},
                headers=headers,
                timeout=30
            )

            if response.status_code == 200:
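                data = response.json()
                # The response may be a list of per-URL objects or a mapping keyed by URL
                if isinstance(data, list):
                    first = data[0] if data and isinstance(data[0], dict) else {}
                    technologies = first.get('technologies', [])
                else:
                    technologies = data.get(url, [])
                return {
                    'url': url,
                    'success': True,
                    'technologies': technologies,
                    'error': None
                }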
            else:
                return {
                    'url': url,
                    'success': False,
                    'technologies': [],
                    'error': f'API error: {response.status_code}'
                }

        except Exception as e:
            return {
                'url': url,
                'success': False,
                'technologies': [],
                'error': str(e)
            }

    def custom_technology_detection(self, url):
        """Perform custom technology detection"""

        custom_detections = []

        try:
            # Fetch page content
            response = requests.get(url, timeout=30, verify=False)
            content = response.text
            headers = response.headers

            # Custom detection rules
            detections = {
                'Custom Framework': {
                    'patterns': [
                        r'<meta name="generator" content="Custom Framework',
                        r'X-Powered-By.*Custom Framework'
                    ],
                    'category': 'Web frameworks'
                },
                'Internal Tool': {
                    'patterns': [
                        r'<!-- Internal Tool v\d+\.\d+ -->',
                        r'internal-tool\.js',
                        r'data-internal-version'
                    ],
                    'category': 'Development tools'
                },
                'Security Headers': {
                    'patterns': [
                        r'Content-Security-Policy',
                        r'X-Frame-Options',
                        r'X-XSS-Protection'
                    ],
                    'category': 'Security'
                },
                'Analytics Platform': {
                    'patterns': [
                        r'analytics\.custom\.com',
                        r'customAnalytics\(',
                        r'data-analytics-id'
                    ],
                    'category': 'Analytics'
                },
                'CDN Detection': {
                    'patterns': [
                        r'cdn\.custom\.com',
                        r'X-Cache.*HIT',
                        r'X-CDN-Provider'
                    ],
                    'category': 'CDN'
                }
            }

            # Check patterns in content and headers
            for tech_name, tech_info in detections.items():
                detected = False

                for pattern in tech_info['patterns']:
                    # Check in content
                    if re.search(pattern, content, re.IGNORECASE):
                        detected = True
                        break

                    # Check in headers; stop at the first pattern that matches any header
                    if any(
                        re.search(pattern, f"{header_name}: {header_value}", re.IGNORECASE)
                        for header_name, header_value in headers.items()
                    ):
                        detected = True
                        break

                if detected:
                    custom_detections.append({
                        'name': tech_name,
                        'category': tech_info['category'],
                        'confidence': 'high',
                        'version': None
                    })

            return {
                'url': url,
                'success': True,
                'custom_technologies': custom_detections,
                'error': None
            }

        except Exception as e:
            return {
                'url': url,
                'success': False,
                'custom_technologies': [],
                'error': str(e)
            }

    def comprehensive_analysis(self, url, use_api=False, custom_detection=True):
        """Perform comprehensive technology analysis"""

        print(f"Analyzing: {url}")

        results = {
            'url': url,
            'timestamp': time.time(),
            'wappalyzer_cli': None,
            'wappalyzer_api': None,
            'custom_detection': None,
            'combined_technologies': [],
            'technology_categories': {},
            'security_technologies': [],
            'risk_assessment': {}
        }

        # CLI analysis
        cli_result = self.analyze_url_cli(url, {
            'timeout': 30000,
            'follow_redirect': True,
            'max_pages': 5
        })
        results['wappalyzer_cli'] = cli_result

        # API analysis (if enabled)
        if use_api and self.api_key:
            api_result = self.analyze_url_api(url)
            results['wappalyzer_api'] = api_result

        # Custom detection (if enabled)
        if custom_detection:
            custom_result = self.custom_technology_detection(url)
            results['custom_detection'] = custom_result

        # Combine and analyze results
        all_technologies = []

        # Add CLI technologies
        if cli_result['success'] and cli_result['technologies']:
            for tech in cli_result['technologies']:
                all_technologies.append({
                    'name': tech.get('name', 'Unknown'),
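                    # Categories may arrive as plain names or as objects with a 'name' field
                    'category': [
                        c.get('name', 'Unknown') if isinstance(c, dict) else c
                        for c in tech.get('categories', [])
                    ],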
                    'version': tech.get('version'),
                    'confidence': tech.get('confidence', 100),
                    'source': 'wappalyzer_cli'
                })

        # Add API technologies
        if use_api and results['wappalyzer_api'] and results['wappalyzer_api']['success']:
            for tech in results['wappalyzer_api']['technologies']:
                all_technologies.append({
                    'name': tech.get('name', 'Unknown'),
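                    # Categories may arrive as plain names or as objects with a 'name' field
                    'category': [
                        c.get('name', 'Unknown') if isinstance(c, dict) else c
                        for c in tech.get('categories', [])
                    ],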
                    'version': tech.get('version'),
                    'confidence': tech.get('confidence', 100),
                    'source': 'wappalyzer_api'
                })

        # Add custom technologies
        if custom_detection and results['custom_detection'] and results['custom_detection']['success']:
            for tech in results['custom_detection']['custom_technologies']:
                all_technologies.append({
                    'name': tech['name'],
                    'category': [tech['category']],
                    'version': tech.get('version'),
                    'confidence': 90,  # High confidence for custom detection
                    'source': 'custom_detection'
                })

        # Remove duplicates and categorize
        unique_technologies = {}
        for tech in all_technologies:
            tech_name = tech['name']
            if tech_name not in unique_technologies:
                unique_technologies[tech_name] = tech
            else:
                # Merge information from multiple sources
                existing = unique_technologies[tech_name]
                if tech['confidence'] > existing['confidence']:
                    unique_technologies[tech_name] = tech

        results['combined_technologies'] = list(unique_technologies.values())

        # Categorize technologies
        categories = {}
        security_techs = []

        for tech in results['combined_technologies']:
            tech_categories = tech.get('category', [])
            if isinstance(tech_categories, str):
                tech_categories = [tech_categories]

            for category in tech_categories:
                if category not in categories:
                    categories[category] = []
                categories[category].append(tech['name'])

                # Identify security-related technologies
                if any(sec_keyword in category.lower() for sec_keyword in ['security', 'firewall', 'protection', 'ssl', 'certificate']):
                    security_techs.append(tech)

        results['technology_categories'] = categories
        results['security_technologies'] = security_techs

        # Risk assessment
        risk_factors = []

        # Check for outdated technologies
        for tech in results['combined_technologies']:
            if tech.get('version'):
                # This would require a database of known vulnerabilities
                # For now, flag low major versions. Match the start of the string
                # so that versions such as "10.1" or "6.1.2" are not caught.
                version = tech['version']
                if version.startswith(('1.', '2.', '3.', '4.', '5.')):
                    risk_factors.append(f"Potentially outdated {tech['name']} version {version}")

        # Check for missing security headers
        if not security_techs:
            risk_factors.append("No security technologies detected")

        # Check for development/debug technologies in production
        dev_categories = ['Development tools', 'Debugging', 'Testing']
        for category in dev_categories:
            if category in categories:
                risk_factors.append(f"Development tools detected in production: {', '.join(categories[category])}")

        results['risk_assessment'] = {
            'risk_level': 'low' if len(risk_factors) == 0 else 'medium' if len(risk_factors) <= 2 else 'high',
            'risk_factors': risk_factors,
            'recommendations': self.generate_recommendations(results)
        }

        with self.lock:
            self.results.append(results)

        return results

    def generate_recommendations(self, analysis_result):
        """Generate security and optimization recommendations"""

        recommendations = []
        technologies = analysis_result['combined_technologies']
        categories = analysis_result['technology_categories']

        # Security recommendations
        if 'Security' not in categories:
            recommendations.append("Consider implementing security headers (CSP, HSTS, X-Frame-Options)")

        if 'SSL/TLS' not in categories:
            recommendations.append("Ensure HTTPS is properly configured with valid SSL/TLS certificates")

        if 'Web application firewall' not in categories:
            recommendations.append("Consider implementing a Web Application Firewall (WAF)")

        # Performance recommendations
        if 'CDN' not in categories:
            recommendations.append("Consider using a Content Delivery Network (CDN) for better performance")

        if 'Caching' not in categories:
            recommendations.append("Implement caching mechanisms to improve performance")

        # Technology-specific recommendations
        cms_technologies = categories.get('CMS', [])
        if cms_technologies:
            recommendations.append(f"Keep {', '.join(cms_technologies)} updated to the latest version")

        framework_technologies = categories.get('Web frameworks', [])
        if framework_technologies:
            recommendations.append(f"Ensure {', '.join(framework_technologies)} are updated and properly configured")

        # Analytics and privacy
        analytics_technologies = categories.get('Analytics', [])
        if analytics_technologies:
            recommendations.append("Ensure analytics tools comply with privacy regulations (GDPR, CCPA)")

        return recommendations

    def bulk_analysis(self, urls, use_api=False, custom_detection=True):
        """Perform bulk technology analysis"""

        print(f"Starting bulk analysis of {len(urls)} URLs")
        print(f"Max workers: {self.max_workers}")

        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            # Submit all tasks
            future_to_url = {
                executor.submit(self.comprehensive_analysis, url, use_api, custom_detection): url 
                for url in urls
            }

            # Process completed tasks
            for future in as_completed(future_to_url):
                url = future_to_url[future]
                try:
                    result = future.result()
                    tech_count = len(result['combined_technologies'])
                    risk_level = result['risk_assessment']['risk_level']
                    print(f"✓ {url}: {tech_count} technologies, risk: {risk_level}")
                except Exception as e:
                    print(f"✗ Error analyzing {url}: {e}")

        return self.results

    def generate_report(self, output_file='wappalyzer_analysis_report.json'):
        """Generate comprehensive analysis report"""

        # Calculate statistics
        total_urls = len(self.results)
        successful_analyses = sum(1 for r in self.results if r['wappalyzer_cli']['success'])

        # Technology statistics
        all_technologies = {}
        all_categories = {}
        risk_levels = {'low': 0, 'medium': 0, 'high': 0}

        for result in self.results:
            # Count technologies
            for tech in result['combined_technologies']:
                tech_name = tech['name']
                if tech_name not in all_technologies:
                    all_technologies[tech_name] = 0
                all_technologies[tech_name] += 1

            # Count categories
            for category, techs in result['technology_categories'].items():
                if category not in all_categories:
                    all_categories[category] = 0
                all_categories[category] += len(techs)

            # Count risk levels
            risk_level = result['risk_assessment']['risk_level']
            if risk_level in risk_levels:
                risk_levels[risk_level] += 1

        # Sort by popularity
        popular_technologies = sorted(all_technologies.items(), key=lambda x: x[1], reverse=True)[:20]
        popular_categories = sorted(all_categories.items(), key=lambda x: x[1], reverse=True)[:10]

        report = {
            'scan_summary': {
                'total_urls': total_urls,
                'successful_analyses': successful_analyses,
                'success_rate': (successful_analyses / total_urls * 100) if total_urls > 0 else 0,
                'scan_date': time.strftime('%Y-%m-%d %H:%M:%S')
            },
            'technology_statistics': {
                'total_unique_technologies': len(all_technologies),
                'popular_technologies': popular_technologies,
                'popular_categories': popular_categories
            },
            'risk_assessment': {
                'risk_distribution': risk_levels,
                'high_risk_urls': [
                    r['url'] for r in self.results 
                    if r['risk_assessment']['risk_level'] == 'high'
                ]
            },
            'detailed_results': self.results
        }

        # Save report
        with open(output_file, 'w') as f:
            json.dump(report, f, indent=2)

        print(f"\nWappalyzer Analysis Report:")
        print(f"Total URLs analyzed: {total_urls}")
        print(f"Successful analyses: {successful_analyses}")
        print(f"Success rate: {report['scan_summary']['success_rate']:.1f}%")
        print(f"Unique technologies found: {len(all_technologies)}")
        print(f"High risk URLs: {risk_levels['high']}")
        print(f"Report saved to: {output_file}")

        return report

# Usage example
if __name__ == "__main__":
    # Create analyzer instance
    analyzer = WappalyzerAnalyzer(
        api_key=os.getenv('WAPPALYZER_API_KEY'),
        max_workers=5
    )

    # URLs to analyze
    urls = [
        'https://example.com',
        'https://test.com',
        'https://demo.com'
    ]

    # Perform bulk analysis
    results = analyzer.bulk_analysis(
        urls,
        use_api=False,  # Set to True if you have API key
        custom_detection=True
    )

    # Generate report
    report = analyzer.generate_report('comprehensive_wappalyzer_report.json')
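
The version heuristic in `comprehensive_analysis` only inspects the text of the version string. A numeric comparison against per-technology minimums is more precise. The sketch below assumes such a table exists: `MIN_SUPPORTED` and its thresholds are hypothetical placeholders, not real advisory data, and `parse_version` and `is_outdated` are illustrative helper names.

```python
import re

# Hypothetical minimum versions for illustration only; replace with data
# from a real advisory source before relying on the result.
MIN_SUPPORTED = {
    "jQuery": (3, 5, 0),
    "WordPress": (6, 0),
}


def parse_version(version):
    """Extract leading numeric components, e.g. '3.4.1-beta' -> (3, 4, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if not match:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def is_outdated(name, version, minimums=MIN_SUPPORTED):
    """Return True if below the minimum, False if not, None if unknown."""
    minimum = minimums.get(name)
    parsed = parse_version(version) if version else None
    if minimum is None or parsed is None:
        return None
    # Pad to equal length so (3, 5) compares equal to (3, 5, 0)
    width = max(len(parsed), len(minimum))
    padded = parsed + (0,) * (width - len(parsed))
    floor = minimum + (0,) * (width - len(minimum))
    return padded < floor
```

Returning `None` for unknown technologies or unparseable versions keeps "not checked" distinct from "up to date" in the risk factors.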

Automation and Integration

CI/CD Integration

#!/bin/bash
# CI/CD script for technology stack analysis

set -e

TARGET_URL="$1"
OUTPUT_DIR="$2"
BASELINE_FILE="$3"

if [ -z "$TARGET_URL" ] || [ -z "$OUTPUT_DIR" ]; then
    echo "Usage: $0 <target_url> <output_dir> [baseline_file]"
    exit 1
fi

echo "Starting technology stack analysis..."
echo "Target: $TARGET_URL"
echo "Output directory: $OUTPUT_DIR"

mkdir -p "$OUTPUT_DIR"

# Run Wappalyzer analysis
echo "Running Wappalyzer analysis..."
wappalyzer "$TARGET_URL" \
    --timeout 30000 \
    --follow-redirect \
    --max-pages 10 \
    --output "$OUTPUT_DIR/wappalyzer_results.json" \
    --pretty

# Parse results
TECH_COUNT=$(jq '. | length' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null || echo "0")
echo "Found $TECH_COUNT technologies"

# Extract technology categories
jq -r '.[].categories[]' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null | sort | uniq > "$OUTPUT_DIR/categories.txt" || touch "$OUTPUT_DIR/categories.txt"
CATEGORY_COUNT=$(wc -l < "$OUTPUT_DIR/categories.txt")

# Extract security-related technologies
jq -r '.[] | select(.categories[] | contains("Security") or contains("SSL") or contains("Certificate")) | .name' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null > "$OUTPUT_DIR/security_technologies.txt" || touch "$OUTPUT_DIR/security_technologies.txt"
SECURITY_COUNT=$(wc -l < "$OUTPUT_DIR/security_technologies.txt")

# Check for development/debug technologies
jq -r '.[] | select(.categories[] | contains("Development") or contains("Debug") or contains("Testing")) | .name' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null > "$OUTPUT_DIR/dev_technologies.txt" || touch "$OUTPUT_DIR/dev_technologies.txt"
DEV_COUNT=$(wc -l < "$OUTPUT_DIR/dev_technologies.txt")

# Generate summary report
cat > "$OUTPUT_DIR/technology-summary.txt" << EOF
Technology Stack Analysis Summary
================================
Date: $(date)
Target: $TARGET_URL
Total Technologies: $TECH_COUNT
Categories: $CATEGORY_COUNT
Security Technologies: $SECURITY_COUNT
Development Technologies: $DEV_COUNT

Status: $(if [ "$DEV_COUNT" -gt "0" ]; then echo "WARNING - Development tools detected"; else echo "OK"; fi)
EOF

# Compare with baseline if provided
if [ -n "$BASELINE_FILE" ] && [ -f "$BASELINE_FILE" ]; then
    echo "Comparing with baseline..."

    # Extract current technology names
    jq -r '.[].name' "$OUTPUT_DIR/wappalyzer_results.json" 2>/dev/null | sort > "$OUTPUT_DIR/current_technologies.txt" || touch "$OUTPUT_DIR/current_technologies.txt"

    # Extract baseline technology names
    jq -r '.[].name' "$BASELINE_FILE" 2>/dev/null | sort > "$OUTPUT_DIR/baseline_technologies.txt" || touch "$OUTPUT_DIR/baseline_technologies.txt"

    # Find differences
    comm -23 "$OUTPUT_DIR/current_technologies.txt" "$OUTPUT_DIR/baseline_technologies.txt" > "$OUTPUT_DIR/new_technologies.txt"
    comm -13 "$OUTPUT_DIR/current_technologies.txt" "$OUTPUT_DIR/baseline_technologies.txt" > "$OUTPUT_DIR/removed_technologies.txt"

    NEW_COUNT=$(wc -l < "$OUTPUT_DIR/new_technologies.txt")
    REMOVED_COUNT=$(wc -l < "$OUTPUT_DIR/removed_technologies.txt")

    echo "New technologies: $NEW_COUNT"
    echo "Removed technologies: $REMOVED_COUNT"

    # Add to summary
    cat >> "$OUTPUT_DIR/technology-summary.txt" << EOF

Baseline Comparison:
New Technologies: $NEW_COUNT
Removed Technologies: $REMOVED_COUNT
EOF

    if [ "$NEW_COUNT" -gt "0" ]; then
        echo "New technologies detected:"
        cat "$OUTPUT_DIR/new_technologies.txt"
    fi
fi

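# Generate detailed report
# Pass the arguments explicitly: "-" reads the script from stdin and makes
# the following values available as sys.argv[1] and sys.argv[2]
python3 - "$OUTPUT_DIR" "$TARGET_URL" << 'PYTHON_EOF'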
import sys
import json
from datetime import datetime

output_dir = sys.argv[1]
target_url = sys.argv[2]

# Read Wappalyzer results
try:
    with open(f"{output_dir}/wappalyzer_results.json", 'r') as f:
        technologies = json.load(f)
except (FileNotFoundError, json.JSONDecodeError):
    technologies = []

# Categorize technologies
categories = {}
security_techs = []
dev_techs = []
outdated_techs = []

for tech in technologies:
    tech_name = tech.get('name', 'Unknown')
    tech_categories = tech.get('categories', [])
    tech_version = tech.get('version', '')

    # Categorize
    for category in tech_categories:
        if category not in categories:
            categories[category] = []
        categories[category].append(tech_name)

        # Security technologies
        if any(sec_keyword in category.lower() for sec_keyword in ['security', 'ssl', 'certificate', 'firewall']):
            security_techs.append(tech)

        # Development technologies
        if any(dev_keyword in category.lower() for dev_keyword in ['development', 'debug', 'testing']):
            dev_techs.append(tech)

    # Check for potentially outdated versions (match the start of the string,
    # so that versions such as "10.1" or "6.1.2" are not flagged)
    if tech_version and tech_version.startswith(('1.', '2.', '3.', '4.', '5.')):
        outdated_techs.append(tech)

# Risk assessment
risk_factors = []
if len(dev_techs) > 0:
    risk_factors.append(f"Development tools detected: {', '.join([t['name'] for t in dev_techs])}")

if len(security_techs) == 0:
    risk_factors.append("No security technologies detected")

if len(outdated_techs) > 0:
    risk_factors.append(f"Potentially outdated technologies: {', '.join([t['name'] for t in outdated_techs])}")

risk_level = 'low' if len(risk_factors) == 0 else 'medium' if len(risk_factors) <= 2 else 'high'

# Create detailed report
report = {
    'scan_info': {
        'target': target_url,
        'scan_date': datetime.now().isoformat(),
        'technology_count': len(technologies),
        'category_count': len(categories)
    },
    'technologies': technologies,
    'categories': categories,
    'security_assessment': {
        'security_technologies': security_techs,
        'development_technologies': dev_techs,
        'outdated_technologies': outdated_techs,
        'risk_level': risk_level,
        'risk_factors': risk_factors
    }
}

# Save detailed report
with open(f"{output_dir}/wappalyzer-detailed-report.json", 'w') as f:
    json.dump(report, f, indent=2)

# Generate HTML report
html_content = f"""
<!DOCTYPE html>
<html>
<head>
    <title>Technology Stack Analysis Report</title>
    <style>
        body {{ font-family: Arial, sans-serif; margin: 20px; }}
        .header {{ background-color: #f0f0f0; padding: 20px; border-radius: 5px; }}
        .category {{ margin: 10px 0; padding: 15px; border-left: 4px solid #007bff; background-color: #f8f9fa; }}
        .risk-high {{ border-left-color: #dc3545; }}
        .risk-medium {{ border-left-color: #ffc107; }}
        .risk-low {{ border-left-color: #28a745; }}
        table {{ border-collapse: collapse; width: 100%; margin: 20px 0; }}
        th, td {{ border: 1px solid #ddd; padding: 8px; text-align: left; }}
        th {{ background-color: #f2f2f2; }}
    </style>
</head>
<body>
    <div class="header">
        <h1>Technology Stack Analysis Report</h1>
        <p><strong>Target:</strong> {target_url}</p>
        <p><strong>Scan Date:</strong> {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
        <p><strong>Technologies Found:</strong> {len(technologies)}</p>
        <p><strong>Risk Level:</strong> <span class="risk-{risk_level}">{risk_level.upper()}</span></p>
    </div>

    <h2>Technology Categories</h2>
"""

import html  # escape scanned values before embedding them in the HTML report

for category, techs in categories.items():
    html_content += f"""
    <div class="category">
        <h3>{html.escape(str(category))}</h3>
        <p>{html.escape(', '.join(techs))}</p>
    </div>
    """

html_content += """
    <h2>Detailed Technologies</h2>
    <table>
        <tr><th>Name</th><th>Version</th><th>Categories</th><th>Confidence</th></tr>
"""

for tech in technologies:
    # Show "N/A" rather than "N/A%" when no confidence value is present
    confidence = tech.get('confidence')
    confidence_text = f"{confidence}%" if confidence is not None else 'N/A'
    html_content += f"""
    <tr>
        <td>{html.escape(str(tech.get('name', 'Unknown')))}</td>
        <td>{html.escape(str(tech.get('version') or 'N/A'))}</td>
        <td>{html.escape(', '.join(tech.get('categories', [])))}</td>
        <td>{html.escape(confidence_text)}</td>
    </tr>
    """

html_content += """
    </table>

    <h2>Risk Assessment</h2>
"""

if risk_factors:
    html_content += "<ul>"
    for factor in risk_factors:
        html_content += f"<li>{html.escape(factor)}</li>"
    html_content += "</ul>"
else:
    html_content += "<p>No significant risk factors identified.</p>"

html_content += """
</body>
</html>
"""

with open(f"{output_dir}/wappalyzer-report.html", 'w') as f:
    f.write(html_content)

print("Detailed reports generated:")
print(f"- JSON: {output_dir}/wappalyzer-detailed-report.json")
print(f"- HTML: {output_dir}/wappalyzer-report.html")
PYTHON_EOF

# Check for development technologies and exit
if [ "$DEV_COUNT" -gt "0" ]; then
    echo "WARNING: Development technologies detected in production environment"
    echo "Development technologies found:"
    cat "$OUTPUT_DIR/dev_technologies.txt"
    exit 1
else
    echo "SUCCESS: No development technologies detected"
    exit 0
fi
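
The report script above treats a technology as potentially outdated based only on the leading digits of its version string. As a rough alternative, the sketch below compares the detected major version against a per-technology minimum. The `MIN_MAJOR` table and the `major_version` and `is_outdated` helpers are hypothetical names introduced for illustration, not part of Wappalyzer, and the numbers in the table are placeholders rather than real support thresholds.

```python
import re

# Hypothetical minimum supported major versions; placeholder values only.
MIN_MAJOR = {"jQuery": 3, "Bootstrap": 5}


def major_version(version):
    """Return the leading integer of a version string, or None if absent."""
    match = re.match(r"\s*v?(\d+)", version or "")
    return int(match.group(1)) if match else None


def is_outdated(tech):
    """True when the detected major version is below the assumed minimum."""
    minimum = MIN_MAJOR.get(tech.get("name"))
    major = major_version(tech.get("version", ""))
    if minimum is None or major is None:
        return False  # unknown technology or no version: do not guess
    return major < minimum


# Illustrative detections in the same shape the report script reads.
detected = [
    {"name": "jQuery", "version": "1.12.4"},
    {"name": "Bootstrap", "version": "5.3.0"},
    {"name": "Nginx", "version": ""},
]
outdated = [t["name"] for t in detected if is_outdated(t)]
print(outdated)
```

Technologies without a table entry or without a version are left unflagged, which trades some recall for fewer false alarms.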

GitHub Actions Integration

# .github/workflows/wappalyzer-tech-analysis.yml
name: Technology Stack Analysis

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
  schedule:
    - cron: '0 6 * * 1'  # Weekly scan on Mondays at 06:00 UTC

jobs:
  technology-analysis:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3

    - name: Setup Node.js
      uses: actions/setup-node@v3
      with:
        node-version: '16'

    - name: Install Wappalyzer
      run: |
        npm install -g wappalyzer
        wappalyzer --version

    - name: Analyze production environment
      run: |
        mkdir -p analysis-results

        # Analyze main application
        wappalyzer "${{ vars.PRODUCTION_URL }}" \
          --timeout 30000 \
          --follow-redirect \
          --max-pages 10 \
          --output analysis-results/production-tech.json \
          --pretty

        # Analyze staging environment
        wappalyzer "${{ vars.STAGING_URL }}" \
          --timeout 30000 \
          --follow-redirect \
          --max-pages 5 \
          --output analysis-results/staging-tech.json \
          --pretty

        # Count technologies
        PROD_TECH_COUNT=$(jq '. | length' analysis-results/production-tech.json 2>/dev/null || echo "0")
        STAGING_TECH_COUNT=$(jq '. | length' analysis-results/staging-tech.json 2>/dev/null || echo "0")

        echo "PROD_TECH_COUNT=$PROD_TECH_COUNT" >> $GITHUB_ENV
        echo "STAGING_TECH_COUNT=$STAGING_TECH_COUNT" >> $GITHUB_ENV

    - name: Check for development technologies
      run: |
        # Check for development/debug technologies in production
        DEV_TECHS=$(jq -r '.[] | select(.categories[] | contains("Development") or contains("Debug") or contains("Testing")) | .name' analysis-results/production-tech.json 2>/dev/null || echo "")

        if [ -n "$DEV_TECHS" ]; then
          echo "DEV_TECHS_FOUND=true" >> $GITHUB_ENV
          echo "Development technologies found in production: "
          echo "$DEV_TECHS"
          echo "$DEV_TECHS" > analysis-results/dev-technologies.txt
        else
          echo "DEV_TECHS_FOUND=false" >> $GITHUB_ENV
          touch analysis-results/dev-technologies.txt
        fi

    - name: Security technology assessment
      run: |
        # Check for security technologies
        SECURITY_TECHS=$(jq -r '.[] | select(.categories[] | contains("Security") or contains("SSL") or contains("Certificate")) | .name' analysis-results/production-tech.json 2>/dev/null || echo "")

        if [ -n "$SECURITY_TECHS" ]; then
          echo "Security technologies found: "
          echo "$SECURITY_TECHS"
          echo "$SECURITY_TECHS" > analysis-results/security-technologies.txt
        else
          echo "No security technologies detected"
          touch analysis-results/security-technologies.txt
        fi

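        # Count non-empty lines so an empty result yields 0 (echo "" | wc -l would report 1)
        SECURITY_COUNT=$(printf '%s' "$SECURITY_TECHS" | grep -c . || true)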
        echo "SECURITY_COUNT=$SECURITY_COUNT" >> $GITHUB_ENV

    - name: Generate comparison report
      run: |
        # Compare production and staging
        jq -r '.[].name' analysis-results/production-tech.json 2>/dev/null | sort > analysis-results/prod-techs.txt || touch analysis-results/prod-techs.txt
        jq -r '.[].name' analysis-results/staging-tech.json 2>/dev/null | sort > analysis-results/staging-techs.txt || touch analysis-results/staging-techs.txt

        # Find differences
        comm -23 analysis-results/prod-techs.txt analysis-results/staging-techs.txt > analysis-results/prod-only.txt
        comm -13 analysis-results/prod-techs.txt analysis-results/staging-techs.txt > analysis-results/staging-only.txt

        # Generate summary
        cat > analysis-results/summary.txt << EOF
        Technology Stack Analysis Summary
        ================================
        Production Technologies: $PROD_TECH_COUNT
        Staging Technologies: $STAGING_TECH_COUNT
        Security Technologies: $SECURITY_COUNT
        Development Technologies in Production: $(if [ "$DEV_TECHS_FOUND" = "true" ]; then echo "YES (CRITICAL)"; else echo "NO"; fi)

        Production-only Technologies: $(wc -l < analysis-results/prod-only.txt)
        Staging-only Technologies: $(wc -l < analysis-results/staging-only.txt)
        EOF

    - name: Upload analysis results
      uses: actions/upload-artifact@v4
      with:
        name: technology-analysis-results
        path: analysis-results/

    - name: Comment PR with results
      if: github.event_name == 'pull_request'
      uses: actions/github-script@v6
      with:
        script: |
          const fs = require('fs');
          const summary = fs.readFileSync('analysis-results/summary.txt', 'utf8');

          github.rest.issues.createComment({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            body: `## Technology Stack Analysis\n\n\`\`\`\n${summary}\n\`\`\``
          });

    - name: Fail if development technologies found
      run: |
        if [ "$DEV_TECHS_FOUND" = "true" ]; then
          echo "CRITICAL: Development technologies detected in production!"
          cat analysis-results/dev-technologies.txt
          exit 1
        fi
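
The comparison step in the workflow above relies on `sort` and `comm` to find technologies present in only one environment. The same comparison can be sketched with Python sets, assuming each result file holds a JSON array of objects with a `name` field, as the workflow's `jq` filters imply. The function names and the inline sample data are hypothetical and used only for illustration.

```python
import json


def tech_names(wappalyzer_json):
    """Extract the set of technology names from a JSON array of detections."""
    return {entry["name"] for entry in json.loads(wappalyzer_json) if "name" in entry}


def compare_stacks(prod_json, staging_json):
    """Return names unique to each environment plus the shared ones."""
    prod, staging = tech_names(prod_json), tech_names(staging_json)
    return {
        "prod_only": sorted(prod - staging),      # like: comm -23 prod staging
        "staging_only": sorted(staging - prod),   # like: comm -13 prod staging
        "shared": sorted(prod & staging),
    }


# Inline sample data standing in for the two result files.
prod_sample = '[{"name": "Nginx"}, {"name": "React"}]'
staging_sample = '[{"name": "Nginx"}, {"name": "React"}, {"name": "Webpack Dev Server"}]'
diff = compare_stacks(prod_sample, staging_sample)
print(diff)
```

Using sets also removes duplicate names, which `comm` would otherwise treat as separate lines.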

Performance Optimization and Troubleshooting

Performance Tuning

# Optimize Wappalyzer for different scenarios

# Fast analysis with minimal pages
wappalyzer https://example.com --max-pages 1 --timeout 10000

# Thorough analysis with deep crawling
wappalyzer https://example.com --max-pages 20 --max-depth 5 --timeout 60000

# Bulk analysis with rate limiting (expects one URL per line in urls.txt)
while IFS= read -r url; do
    [ -z "$url" ] && continue  # skip blank lines
    wappalyzer "$url" --output "results_$(echo "$url" | sed 's/[^a-zA-Z0-9]/_/g').json"
    sleep 2
done < urls.txt

# Memory-efficient analysis for large sites
wappalyzer https://example.com --max-pages 5 --no-scripts --timeout 30000

# Performance monitoring script
#!/bin/bash
monitor_wappalyzer_performance() {
    local url="$1"
    local output_file="wappalyzer-performance-$(date +%s).log"

    echo "Monitoring Wappalyzer performance for: $url"

    # Start monitoring
    {
        echo "Timestamp,CPU%,Memory(MB),Status"
        while true; do
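            # pgrep -d, joins all matching PIDs so ps can query them in one call
            local pids=$(pgrep -d, -f "wappalyzer")
            if [ -n "$pids" ]; then
                # Sum CPU and memory across every matching process
                local cpu=$(ps -p "$pids" -o %cpu= | awk '{s += $1} END {print s}')
                local mem=$(ps -p "$pids" -o rss= | awk '{s += $1} END {print s / 1024}')
                echo "$(date +%s),$cpu,$mem,running"
            fi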
            sleep 2
        done
    } > "$output_file" &

    local monitor_pid=$!

    # Run Wappalyzer
    time wappalyzer "$url" --pretty --output "performance_test_results.json"

    # Stop monitoring
    kill $monitor_pid 2>/dev/null

    echo "Performance monitoring completed: $output_file"
}

# Usage
monitor_wappalyzer_performance "https://example.com"
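
The monitoring function above writes a CSV log with the header `Timestamp,CPU%,Memory(MB),Status`. A minimal sketch for summarizing such a log follows, assuming each row carries one numeric CPU value and one numeric memory value. The `summarize_log` helper is a hypothetical name and the sample rows are made-up values for illustration, not measured results.

```python
import csv
import io


def summarize_log(csv_text):
    """Return sample count, average CPU and peak memory from the monitor log."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    cpu = [float(r["CPU%"]) for r in rows if r["CPU%"]]
    mem = [float(r["Memory(MB)"]) for r in rows if r["Memory(MB)"]]
    return {
        "samples": len(rows),
        "avg_cpu": sum(cpu) / len(cpu) if cpu else 0.0,
        "peak_mem_mb": max(mem) if mem else 0.0,
    }


# Made-up sample rows in the format the monitor writes.
sample_log = """Timestamp,CPU%,Memory(MB),Status
1700000000,10.0,120.5,running
1700000002,30.0,180.25,running
"""
summary = summarize_log(sample_log)
print(summary)
```

To summarize a real run, read the generated `wappalyzer-performance-*.log` file and pass its contents to `summarize_log`.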

Common Issues

# Troubleshooting script for Wappalyzer
troubleshoot_wappalyzer() {
    echo "Wappalyzer Troubleshooting Guide"
    echo "==============================="

    # Check if Wappalyzer is installed
    if ! command -v wappalyzer &> /dev/null; then
        echo "❌ Wappalyzer not found in PATH"
        echo "Solution: Install Wappalyzer using 'npm install -g wappalyzer'"
        return 1
    fi

    echo "✅ Wappalyzer found: $(which wappalyzer)"
    echo "Version: $(wappalyzer --version 2>&1)"

    # Check Node.js version
    if ! command -v node &> /dev/null; then
        echo "❌ Node.js not found"
        echo "Solution: Install Node.js from https://nodejs.org/"
        return 1
    fi

    local node_version=$(node --version)
    echo "✅ Node.js version: $node_version"

    # Check network connectivity
    if ! curl -s --connect-timeout 5 https://httpbin.org/get > /dev/null; then
        echo "❌ Network connectivity issues"
        echo "Solution: Check internet connection and proxy settings"
        return 1
    fi

    echo "✅ Network connectivity OK"

    # Test basic functionality
    echo "Testing basic Wappalyzer functionality..."

    if timeout 60 wappalyzer https://httpbin.org/get --timeout 30000 > /dev/null 2>&1; then
        echo "✅ Basic functionality test passed"
    else
        echo "❌ Basic functionality test failed"
        echo "Solution: Check Wappalyzer installation and network settings"
        return 1
    fi

    # Check for common configuration issues
    echo "Checking for common configuration issues..."

    # Check npm permissions
    if [ ! -w "$(npm config get prefix)/lib/node_modules" ] 2>/dev/null; then
        echo "⚠️  npm permission issues detected"
        echo "Solution: Fix npm permissions or use nvm"
    fi

    # Check for proxy issues
    if [ -n "$HTTP_PROXY" ] || [ -n "$HTTPS_PROXY" ]; then
        echo "⚠️  Proxy environment variables detected"
        echo "Note: Wappalyzer should respect proxy settings automatically"
    fi

    echo "Troubleshooting completed"
}

# Common error solutions
fix_common_wappalyzer_errors() {
    echo "Common Wappalyzer Error Solutions"
    echo "================================"

    echo "1. 'command not found: wappalyzer'"
    echo "   Solution: npm install -g wappalyzer"
    echo "   Alternative: npx wappalyzer <url>"
    echo ""

    echo "2. 'EACCES: permission denied'"
    echo "   Solution: Fix npm permissions or use sudo"
    echo "   Better: Use nvm to manage Node.js versions"
    echo ""

    echo "3. 'timeout' or 'ETIMEDOUT'"
    echo "   Solution: Increase timeout with --timeout option"
    echo "   Example: wappalyzer <url> --timeout 60000"
    echo ""

    echo "4. 'SSL certificate error'"
    echo "   Solution: Use --no-ssl-verify (not recommended for production)"
    echo ""

    echo "5. 'Too many redirects'"
    echo "   Solution: Use --follow-redirect or check URL manually"
    echo ""

    echo "6. 'No technologies detected' (false negatives)"
    echo "   Solution: Increase --max-pages and --max-depth"
    echo "   Example: wappalyzer <url> --max-pages 10 --max-depth 3"
    echo ""

    echo "7. 'Out of memory' for large sites"
    echo "   Solution: Reduce --max-pages or use --no-scripts"
    echo "   Example: wappalyzer <url> --max-pages 5 --no-scripts"
}

# Run troubleshooting
troubleshoot_wappalyzer
fix_common_wappalyzer_errors
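
The error list above can also be applied automatically to captured error output. The sketch below matches a few of the listed error strings against a message and returns the corresponding hint. The `HINTS` table and `suggest_fix` function are hypothetical helpers, and the hints only restate the solutions already given in the list.

```python
# Error substrings and hints taken from the list above; matching is case-insensitive.
HINTS = [
    ("command not found", "Install with: npm install -g wappalyzer (or run via npx)"),
    ("eacces", "Fix npm permissions, or manage Node.js with nvm"),
    ("etimedout", "Increase the timeout, e.g. --timeout 60000"),
    ("out of memory", "Reduce --max-pages or use --no-scripts"),
]


def suggest_fix(error_output):
    """Return the first matching hint for a captured error message, else None."""
    lowered = error_output.lower()
    for needle, hint in HINTS:
        if needle in lowered:
            return hint
    return None


print(suggest_fix("npm ERR! Error: EACCES: permission denied, mkdir '/usr/lib/node_modules'"))
```

Returning `None` for unrecognized messages keeps the helper from guessing at a cause it does not know.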

Resources and Documentation

Official Resources

Community Resources

Integration Examples