Parsero
Parsero is a specialized tool for parsing and analyzing robots.txt files from web applications. It extracts information about hidden paths, disallowed directories, and restricted endpoints that website administrators intended to hide from search engines, revealing potential attack surface during security assessments.
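The extraction step that Parsero automates can be sketched in a few lines of Python. This is an illustrative reimplementation of the idea only (pull the Disallow entries out of a robots.txt body so each path can then be probed for an HTTP response), not the tool's actual source; the sample robots.txt content is made up.

```python
def extract_disallows(robots_txt):
    """Return the paths named in Disallow directives, in order."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments and whitespace
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:                              # a bare 'Disallow:' allows everything
                paths.append(path)
    return paths

sample = """User-agent: *
Disallow: /admin
Disallow: /backup  # old database dumps
Disallow:
"""
print(extract_disallows(sample))  # ['/admin', '/backup']
```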
Installation
Linux Installation
# Clone the repository
git clone https://github.com/behindthefirewalls/Parsero.git
cd Parsero
# Install dependencies
pip3 install -r requirements.txt
# Verify installation
python3 parsero.py --help
macOS Installation
# Using Homebrew
brew tap behindthefirewalls/parsero
brew install parsero
# Or via pip3
pip3 install parsero
# Verify installation
parsero --help
Installation from PyPI
# Install via pip3
pip3 install parsero
# Update to latest version
pip3 install --upgrade parsero
# Verify installation
python3 -m parsero --help
Manual Installation
# Confirm Python 3.6+ is installed
python3 --version
# Install required packages
pip3 install requests
pip3 install urllib3
# Download and setup
git clone https://github.com/behindthefirewalls/Parsero.git
cd Parsero
chmod +x parsero.py
Core Concepts
robots.txt Structure
User-agent: Googlebot
Disallow: /admin
Disallow: /private
Allow: /public
User-agent: *
Disallow: /tmp
Crawl-delay: 5
User-agent: BadBot
Disallow: /
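Python's standard library can interpret the same rules, which is a handy way to sanity-check how a given bot is treated. The snippet below feeds the example above to `urllib.robotparser`; it illustrates the directives, it is not part of Parsero.

```python
import urllib.robotparser

rules = """\
User-agent: Googlebot
Disallow: /admin
Disallow: /private
Allow: /public

User-agent: *
Disallow: /tmp
Crawl-delay: 5

User-agent: BadBot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/admin"))   # False: explicitly disallowed
print(rp.can_fetch("Googlebot", "/public"))  # True: explicitly allowed
print(rp.can_fetch("SomeBot", "/tmp"))       # False: falls under the * entry
print(rp.crawl_delay("SomeBot"))             # 5, from the * entry
```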
Information Types
- Disallowed paths: Directories forbidden to search engines
- Allowed paths: Explicitly allowed despite parent restrictions
- User-agent rules: Specific bot directives
- Crawl delays: Rate-limiting hints
- Sitemaps: Reference to site structure files
Security Implications
- Reveals structure of sensitive areas
- Indicates hidden admin panels
- Shows private directories
- May expose test/staging environments
- Hints at API endpoints
- Reveals backup locations
Basic Usage
Parse Single URL
# Parse robots.txt from website
python3 parsero.py -u http://example.com
# Verbose output
python3 parsero.py -u http://example.com -v
# Save results to file
python3 parsero.py -u http://example.com -o results.txt
Specify Different Port
# Non-standard HTTP port
python3 parsero.py -u http://example.com:8080
# HTTPS with custom port
python3 parsero.py -u https://example.com:8443
# Localhost testing
python3 parsero.py -u http://localhost:5000
Batch URL Processing
# Parse multiple URLs on the command line
python3 parsero.py -u http://example1.com http://example2.com http://example3.com
# Read URLs from file
python3 parsero.py -f urls.txt
# Output to directory
python3 parsero.py -f urls.txt -o results_dir/
Output Formats
Standard Output
# Display results in terminal
python3 parsero.py -u http://example.com
# Verbose mode (detailed output)
python3 parsero.py -u http://example.com -v
# Very verbose
python3 parsero.py -u http://example.com -vv
File Output
# Save to text file
python3 parsero.py -u http://example.com -o robots_output.txt
# Append to existing file
python3 parsero.py -u http://example.com -o results.txt -a
# Output to specific directory
python3 parsero.py -u http://example.com -o /tmp/parsero_results/
CSV Export
# Export as CSV
python3 parsero.py -u http://example.com --format csv -o results.csv
# Multiple URLs to CSV
python3 parsero.py -f urls.txt --format csv -o all_results.csv
JSON Output
# Export as JSON
python3 parsero.py -u http://example.com --format json -o results.json
# Pretty-printed JSON
python3 parsero.py -u http://example.com --format json -p
Advanced Scanning Options
Bypass robots.txt Restrictions
# Download actual restricted files (for authorized testing)
python3 parsero.py -u http://example.com -b
# Aggressive scanning
python3 parsero.py -u http://example.com -a
# Deep crawl with discovered paths
python3 parsero.py -u http://example.com -d
Timeout and Retry Configuration
# Set connection timeout
python3 parsero.py -u http://example.com -t 30
# Retry failed connections
python3 parsero.py -u http://example.com -r 3
# Adjust crawl delay
python3 parsero.py -u http://example.com --delay 2
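The same timeout-and-retry behavior can be sketched in plain Python for scripts that fetch robots.txt themselves. The helper below is a generic pattern, not Parsero's internal code; the fetcher is injected so the retry logic can be shown (and tested) without real network I/O.

```python
import time

def fetch_with_retry(fetch, retries=3, backoff=1.0):
    """Call fetch() up to `retries` times, sleeping `backoff` seconds
    between attempts; re-raise the last error if all attempts fail."""
    last_exc = None
    for attempt in range(retries):
        try:
            return fetch()
        except OSError as exc:          # connection-level failures
            last_exc = exc
            if attempt < retries - 1:
                time.sleep(backoff)
    raise last_exc

# Stand-in fetcher that fails twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("connection reset")
    return "robots.txt body"

print(fetch_with_retry(flaky, retries=3, backoff=0))  # 'robots.txt body'
```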
Proxy Configuration
# Use HTTP proxy
python3 parsero.py -u http://example.com --proxy http://proxy.example.com:8080
# Use HTTPS proxy
python3 parsero.py -u http://example.com --proxy https://proxy.example.com:8443
# Proxy with authentication
python3 parsero.py -u http://example.com --proxy http://user:pass@proxy.com:8080
Custom User-Agent
# Specify custom user-agent
python3 parsero.py -u http://example.com --user-agent "CustomBot/1.0"
# Impersonate specific bot
python3 parsero.py -u http://example.com --user-agent "Googlebot/2.1"
# Use custom headers file
python3 parsero.py -u http://example.com -H headers.txt
Path Discovery
Extract and List Paths
# Extract all disallowed paths
python3 parsero.py -u http://example.com -o paths.txt
# List unique paths only
python3 parsero.py -u http://example.com | sort | uniq
# Count total paths found
python3 parsero.py -u http://example.com | wc -l
Analyze Path Patterns
# Find paths containing keyword
python3 parsero.py -u http://example.com | grep admin
# Find all API endpoints
python3 parsero.py -u http://example.com | grep /api
# Identify sensitive paths
python3 parsero.py -u http://example.com | grep -E "(admin|private|backup|tmp)"
Filter Results
# Show only directories
python3 parsero.py -u http://example.com | grep '/$'
# Show only files
python3 parsero.py -u http://example.com | grep '\.'
# Exclude certain paths
python3 parsero.py -u http://example.com | grep -v '/search'
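The same directory/file split can be done in Python when the results are post-processed in a script; the entries below are illustrative. Like the greps above, the heuristics are approximate: a trailing slash marks a directory, a dot in the last segment marks a file, and bare names fall in neither bucket.

```python
entries = ["/admin/", "/backup.zip", "/private/", "/readme.txt", "/tmp"]

dirs  = [e for e in entries if e.endswith("/")]
files = [e for e in entries if "." in e.rsplit("/", 1)[-1]]

print(dirs)   # ['/admin/', '/private/']
print(files)  # ['/backup.zip', '/readme.txt']
```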
Reconnaissance Workflow
Multi-Target Reconnaissance
# Parse robots.txt from multiple sites
cat targets.txt | while read target; do
python3 parsero.py -u "$target" -o "results_${target##*/}.txt"
done
# Combine all results
cat results_*.txt > all_discovered_paths.txt
Competitive Intelligence
# Analyze competitor sites
python3 parsero.py -u http://competitor1.com -o competitor1.txt
python3 parsero.py -u http://competitor2.com -o competitor2.txt
# Compare discovered structures
diff competitor1.txt competitor2.txt
API Endpoint Discovery
# Parse robots.txt looking for APIs
python3 parsero.py -u http://api.example.com
# Filter for API paths
python3 parsero.py -u http://example.com | grep -E "/api|/v1|/v2|/rest"
# Extract endpoint patterns
python3 parsero.py -u http://example.com | grep -oP '/api/[^?]*' | sort | uniq
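The `grep -oP` pipeline above can be reproduced in Python, which is easier to extend when the pattern grows; the sample lines are illustrative, not real scan output.

```python
import re

lines = [
    "Disallow: /api/v1/users?id=1",
    "Disallow: /api/v2/orders",
    "Disallow: /static/css",
    "Disallow: /api/v1/users?id=2",
]

# Match /api/... up to a query string or whitespace; a set dedupes repeats.
endpoints = sorted({m.group(0)
                    for line in lines
                    for m in re.finditer(r"/api/[^?\s]*", line)})
print(endpoints)  # ['/api/v1/users', '/api/v2/orders']
```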
Subdomain Reconnaissance
# Check robots.txt on multiple subdomains
for sub in www api staging dev admin; do
echo "=== $sub.example.com ==="
python3 parsero.py -u http://$sub.example.com 2>/dev/null
done
# Save results
for sub in www api staging dev; do
python3 parsero.py -u http://$sub.example.com -o "$sub.txt"
done
Vulnerability Mapping
Identify Information Disclosure
# Find sensitive directories
python3 parsero.py -u http://example.com | grep -E "(backup|logs|config|private)"
# Identify admin panels
python3 parsero.py -u http://example.com | grep -i admin
# Find test/debug endpoints
python3 parsero.py -u http://example.com | grep -E "(test|debug|dev|staging)"
API Surface Mapping
# Discover API structure
python3 parsero.py -u http://api.example.com -v
# Map API versions
python3 parsero.py -u http://example.com | grep -oP '/api/v[0-9]+'
# Identify deprecated APIs
python3 parsero.py -u http://example.com | grep -E "(legacy|deprecated|v1)"
Authentication Points
# Find login/auth paths
python3 parsero.py -u http://example.com | grep -E "(login|auth|signin|register)"
# Identify account management
python3 parsero.py -u http://example.com | grep -E "(profile|account|user)"
# Find admin interfaces
python3 parsero.py -u http://example.com | grep -E "(admin|panel|dashboard)"
Integration with Other Tools
섹션 제목: “Integration with Other Tools”Combine with Directory Bruting
섹션 제목: “Combine with Directory Bruting”# Use Parsero output for targeted bruting
python3 parsero.py -u http://example.com -o discovered.txt
# Verify discoveries with dirbuster
dirbuster -u http://example.com -l discovered.txt -r report.html
Feed into Web Crawling
# Parse robots.txt and feed to crawler
python3 parsero.py -u http://example.com -o urls.txt
# Crawl discovered URLs
wget -i urls.txt --no-parent
# Or with curl
cat urls.txt | xargs -I {} curl -I {}
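robots.txt paths are site-relative, so before feeding them to `wget` or `curl` they usually need to be joined to the base URL. A stdlib sketch (example.com and the paths are placeholders):

```python
from urllib.parse import urljoin

base = "http://example.com"
paths = ["/admin", "/private/", "backup.zip"]

# Trailing slash on the base so relative entries resolve from the site root.
urls = [urljoin(base + "/", p) for p in paths]
print(urls)
# ['http://example.com/admin', 'http://example.com/private/', 'http://example.com/backup.zip']
```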
Correlate with Scan Results
# Compare robots.txt with actual structure
python3 parsero.py -u http://example.com -o from_robots.txt
# Use nmap/nikto to verify access
nikto -h http://example.com -o nikto_results.txt
# Cross-reference findings
comm -12 <(sort from_robots.txt) <(sort nikto_results.txt)
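The `comm -12` step is a plain set intersection; the same cross-reference in Python, with illustrative path lists standing in for the two result files:

```python
from_robots = ["/admin", "/backup", "/public"]
from_nikto  = ["/admin", "/cgi-bin", "/backup"]

# Paths reported by both sources, like comm -12 on sorted input.
overlap = sorted(set(from_robots) & set(from_nikto))
print(overlap)  # ['/admin', '/backup']
```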
Batch Operations
Process Multiple Sites
#!/bin/bash
# Batch processing script
for url in $(cat sites.txt); do
echo "Processing: $url"
python3 parsero.py -u "$url" \
-o "results/$(echo $url | cut -d'/' -f3).txt" \
-v
done
Aggregate Results
# Combine results from multiple sites
python3 parsero.py -f urls.txt -o combined.txt
# Create summary statistics
echo "Total unique paths found:"
cat combined.txt | sort | uniq | wc -l
# Find most common path patterns
cat combined.txt | grep -oP '^/[^/]+' | sort | uniq -c | sort -rn
Automated Reporting
# Create comprehensive report
python3 parsero.py -u http://example.com \
  --format json \
  -o report.txt \
  -v
# Generate formatted output
echo "=== robots.txt Analysis ===" > full_report.txt
echo "Target: example.com" >> full_report.txt
echo "Date: $(date)" >> full_report.txt
cat report.txt >> full_report.txt
Evasion and Stealth
Rate Limiting
# Reduce detection likelihood
python3 parsero.py -u http://example.com --delay 5
# Multiple requests with delays
for url in $(cat urls.txt); do
python3 parsero.py -u "$url" --delay 10
sleep 30
done
User-Agent Rotation
# Use different user-agents
python3 parsero.py -u http://example.com --user-agent "Googlebot/2.1"
python3 parsero.py -u http://example.com --user-agent "Mozilla/5.0"
python3 parsero.py -u http://example.com --user-agent "bingbot/2.0"
Obfuscation
# Use proxy to mask origin
python3 parsero.py -u http://example.com \
--proxy http://proxy.example.com:8080 \
--delay 5 \
--user-agent "Googlebot"
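For scripts that fetch robots.txt directly, the spoofed User-Agent is just a request header. A stdlib sketch that builds the request without sending it (example.com is a placeholder):

```python
import urllib.request

req = urllib.request.Request(
    "http://example.com/robots.txt",
    headers={"User-Agent": "Googlebot/2.1"},
)
# urllib normalizes header names to capitalized form internally.
print(req.get_header("User-agent"))  # Googlebot/2.1
```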
Data Analysis
Path Frequency Analysis
# Find most restricted paths
python3 parsero.py -f urls.txt -o results.txt
# Analyze frequency
cat results.txt | sort | uniq -c | sort -rn
# Export statistics
python3 << 'EOF'
from collections import Counter

with open('results.txt', 'r') as f:
    paths = f.readlines()

# Extract path components
components = []
for path in paths:
    parts = path.strip().split('/')
    components.extend([p for p in parts if p])

counter = Counter(components)
for comp, count in counter.most_common(20):
    print(f"{comp}: {count}")
EOF
Structural Mapping
# Build hierarchy of paths
python3 parsero.py -u http://example.com -v | \
sort | \
sed 's|/[^/]*$||' | \
sort | uniq -c | sort -rn
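The sed/uniq pipeline above counts entries per parent directory; the same tally in Python (the sample paths are illustrative):

```python
from collections import Counter

paths = ["/admin/users", "/admin/logs", "/api/v1/a", "/api/v1/b", "/tmp"]

# Strip the last path segment to get each entry's parent directory;
# top-level entries fall under "/".
parents = Counter(p.rsplit("/", 1)[0] or "/" for p in paths)
for parent, count in parents.most_common():
    print(parent, count)
```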
Troubleshooting
Connection Issues
# Test basic connectivity
python3 parsero.py -u http://example.com -t 60
# Verify robots.txt exists
curl -I http://example.com/robots.txt
# Check with specific user-agent
python3 parsero.py -u http://example.com --user-agent "Mozilla/5.0" -v
Empty Results
# Verify site has robots.txt
wget -q -O - http://example.com/robots.txt
# Check if site blocks parsing
python3 parsero.py -u http://example.com -vv
# Verify URL format
python3 parsero.py -u http://example.com:80 # Explicit port
Proxy Issues
# Test proxy connectivity
python3 parsero.py -u http://example.com --proxy http://127.0.0.1:8080 -vv
# Verify credentials
python3 parsero.py -u http://example.com \
--proxy http://user:password@proxy:8080
Best Practices
Authorized Testing
- Obtain written authorization before analysis
- Respect robots.txt directives in production
- Document all discovered information
- Follow responsible disclosure practices
- Maintain ethical standards
Effective Analysis
# Comprehensive reconnaissance workflow (one run per output format)
python3 parsero.py -u http://target.example.com -o robots_analysis.txt -v
python3 parsero.py -u http://target.example.com --format json -o robots_analysis.json
python3 parsero.py -u http://target.example.com --format csv -o robots_analysis.csv
# Create summary
echo "=== Robots.txt Analysis Summary ===" > summary.txt
echo "Total entries: $(wc -l < robots_analysis.txt)" >> summary.txt
echo "Unique paths: $(sort robots_analysis.txt | uniq | wc -l)" >> summary.txt
Legal and Ethical Considerations
Parsero should be used:
- For authorized security assessments
- With written permission from site owners
- To improve understanding of web security
- In compliance with applicable laws
- Respecting privacy and confidentiality
Never:
- Access restricted paths without authorization
- Download sensitive files discovered via robots.txt
- Share discovered information publicly
- Violate applicable laws or regulations
Resources
- GitHub: https://github.com/behindthefirewalls/Parsero
- robots.txt specification: https://www.robotstxt.org/
- Web reconnaissance guides
- OWASP reconnaissance methodology
- Responsible disclosure practices