hakrawler
hakrawler is a fast, lightweight web crawler written in Go that discovers endpoints, URLs, assets, and JavaScript files within web applications. Designed for bug bounty hunters and penetration testers, it excels at quick reconnaissance and endpoint discovery during the initial phases of security assessments.
Installation
Prerequisites
- Go 1.13 or later installed on your system
Install via Go
# Install hakrawler directly
go install github.com/hakluke/hakrawler@latest
# Verify installation (-h is reserved for custom headers, so use -help)
hakrawler -help
# Check the installed module version (hakrawler has no version flag)
go version -m "$(command -v hakrawler)"
Manual Build from Source
# Clone the repository
git clone https://github.com/hakluke/hakrawler.git
cd hakrawler
# Build the binary
go build -o hakrawler main.go
# Run locally
./hakrawler -help
Basic Usage
Simple Crawl
# hakrawler expects full URLs (including http:// or https://) on stdin
echo "https://example.com" | hakrawler
# Crawl multiple targets
printf "https://example.com\nhttps://test.com\n" | hakrawler
# Crawl from a file (one full URL per line)
cat domains.txt | hakrawler
Output Format
By default, hakrawler prints one discovered item per line, including:
- Full URLs discovered
- JavaScript file paths
- Form endpoints
- API paths
# Pipe to sort -u to see unique results
echo "https://example.com" | hakrawler | sort -u
# Or let hakrawler deduplicate itself with -u
echo "https://example.com" | hakrawler -u
# Count discovered endpoints
echo "https://example.com" | hakrawler | wc -l
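The dedupe-and-count pattern above can be exercised offline against canned crawler-style output; the sample URLs and file names below are hypothetical, standing in for a live hakrawler run.

```shell
#!/bin/sh
# Offline demo: deduplicate and count crawler-style output.
# sample_urls.txt stands in for "echo ... | hakrawler".
printf '%s\n' \
  'https://example.com/login' \
  'https://example.com/api/v1/users' \
  'https://example.com/login' > sample_urls.txt

# Deduplicate, then count what remains
sort -u sample_urls.txt > unique_urls.txt
echo "unique endpoints: $(wc -l < unique_urls.txt)"
```

The same `sort -u | wc -l` tail works unchanged on real hakrawler output.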
Depth and Scope Control
Control Crawl Depth
# Default depth is 2
echo "https://example.com" | hakrawler
# Crawl only one level deep
echo "https://example.com" | hakrawler -d 1
# Crawl deeper (3 levels)
echo "https://example.com" | hakrawler -d 3
# Aggressive deep crawl (5 levels)
echo "https://example.com" | hakrawler -d 5
Limit to Same Domain
# By default, hakrawler stays on the input host (no scope creep)
echo "https://example.com" | hakrawler
# Include subdomains of example.com in the crawl
echo "https://example.com" | hakrawler -subs
Filter by Scope
# hakrawler has no scope flag; filter the output instead
echo "https://example.com" | hakrawler -subs | grep -E "://(cdn\.)?example\.com"
JavaScript and Asset Discovery
Extract from JavaScript Files
# hakrawler parses each response for URLs and endpoints as it crawls
echo "https://example.com" | hakrawler
# It automatically extracts links from:
# - Inline <script> tags
# - src attributes in <script> tags
# - .js file references
# Get only JavaScript endpoints
echo "https://example.com" | hakrawler | grep "\.js$"
JavaScript-Heavy Applications
# Note: hakrawler does not execute JavaScript; it statically parses responses
# Allow more time per target on slow, JS-heavy sites
echo "https://example.com" | hakrawler -timeout 30
# For sites with heavy AJAX/API calls
echo "https://example.com" | hakrawler -d 2 -timeout 30
# Extract API-looking endpoints from the crawl output
echo "https://example.com" | hakrawler | grep -E "(api|v[0-9]+)" | sort -u
Output Filtering and Plain Output
Plain Text Output
# Default output is already plain text (one URL per line)
echo "https://example.com" | hakrawler
# Remove duplicates
echo "https://example.com" | hakrawler | sort -u
# Filter to specific patterns
echo "https://example.com" | hakrawler | grep -E "\.js$|api|admin"
# Drop static assets to keep only HTTP endpoints
echo "https://example.com" | hakrawler | grep -v -E "\.(css|jpg|png|gif|ico)$"
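The asset filter can be sanity-checked offline; the file crawl_output.txt below is a hypothetical stand-in for live hakrawler output.

```shell
#!/bin/sh
# Offline demo: strip static assets from crawler-style output.
cat > crawl_output.txt <<'EOF'
https://example.com/style.css
https://example.com/api/v1/users
https://example.com/logo.png
https://example.com/admin/login
EOF

# Same filter as above: drop common static-asset extensions
grep -v -E '\.(css|jpg|png|gif|ico)$' crawl_output.txt
```

Only the API and admin endpoints survive the filter; extend the extension list (woff, svg, mp4, ...) to taste.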
Combine and Deduplicate
# Crawl multiple targets and keep unique URLs
cat domains.txt | hakrawler | sort -u > all_urls.txt
# Count unique endpoints per target
while read -r url; do
  echo "$url: $(echo "$url" | hakrawler | sort -u | wc -l)"
done < domains.txt
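A per-host breakdown of combined results can also be computed after the fact with awk; the all_urls.txt contents below are hypothetical sample data rather than a real crawl.

```shell
#!/bin/sh
# Offline demo: count unique endpoints per host from combined output.
cat > all_urls.txt <<'EOF'
https://a.example.com/login
https://a.example.com/api
https://b.example.com/
https://a.example.com/login
EOF

# Field 3 of a "/"-split URL is the hostname; tally unique URLs per host
sort -u all_urls.txt \
  | awk -F/ '{count[$3]++} END {for (h in count) print h, count[h]}' \
  | sort
```

This avoids re-crawling each target just to get counts.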
Proxy and Network Configuration
Proxy Support
# Route traffic through an HTTP proxy (e.g. Burp Suite)
echo "https://example.com" | hakrawler -proxy http://127.0.0.1:8080
# Use an HTTPS proxy
echo "https://example.com" | hakrawler -proxy https://proxy.example.com:8080
# Proxy with authentication
echo "https://example.com" | hakrawler -proxy http://user:pass@proxy.example.com:8080
# Skip TLS verification when intercepting traffic
echo "https://example.com" | hakrawler -proxy http://127.0.0.1:8080 -insecure
Custom Headers
# Add custom headers with -h, separated by two semicolons (";;")
echo "https://example.com" | hakrawler -h "Cookie: session=abc123"
# Set a custom User-Agent plus a Referer
echo "https://example.com" | hakrawler -h "User-Agent: recon-bot;;Referer: https://example.com/"
Timeout and Rate Control
Request Timeouts
# By default, hakrawler applies no per-target time limit
echo "https://example.com" | hakrawler
# Allow up to 30 seconds per target for slow servers
echo "https://example.com" | hakrawler -timeout 30
# Cap quick scans at 5 seconds per target
echo "https://example.com" | hakrawler -timeout 5
# -timeout is the maximum time (in seconds) spent crawling each input URL
echo "https://example.com" | hakrawler -timeout 15
Performance Optimization
# Reduce threads for a lighter load (default is 8)
echo "https://example.com" | hakrawler -t 4
# Crawl multiple targets in parallel with GNU parallel
cat domains.txt | parallel 'echo {} | hakrawler'
# Or sequentially with xargs, printing progress
cat domains.txt | xargs -I {} sh -c 'echo "Crawling: {}" && echo "{}" | hakrawler'
Integration with Recon Tools
With subfinder (Subdomain Discovery)
# subfinder emits bare hostnames; probe with httpx to add schemes, then crawl
subfinder -d example.com -silent | httpx -silent | hakrawler
# Or in stages
subfinder -d example.com -silent | httpx -silent > live_subdomains.txt
cat live_subdomains.txt | hakrawler > all_urls.txt
With httpx (Probe Live Hosts)
# Keep only live URLs from the crawl
echo "https://example.com" | hakrawler | httpx -silent
# Check status codes on discovered endpoints
echo "https://example.com" | hakrawler | httpx -status-code
# Get richer details per endpoint
echo "https://example.com" | hakrawler | httpx -title -status-code -content-length
With nuclei (Template Scanning)
# Crawl, probe, then scan; nuclei reads targets directly from stdin
echo "https://example.com" | hakrawler | httpx -silent | nuclei -t nuclei-templates/
# Scan discovered endpoints for high-severity issues
echo "https://example.com" | hakrawler | nuclei -severity high
# Use a custom template directory
echo "https://example.com" | hakrawler | nuclei -t /path/to/templates
Complete Recon Pipeline
# Full recon workflow
TARGET="example.com"
# 1. Discover subdomains
subfinder -d "$TARGET" -silent > subdomains.txt
# 2. Probe for live hosts (adds http/https schemes for the crawler)
httpx -l subdomains.txt -silent > live_hosts.txt
# 3. Crawl each live host for endpoints
cat live_hosts.txt | hakrawler > all_endpoints.txt
# 4. Scan with nuclei templates
nuclei -l all_endpoints.txt -t nuclei-templates/ -o vulns.txt
# 5. Pull out unique high-value targets
grep -E "admin|api|login" all_endpoints.txt | sort -u
Common Recon Workflows
Quick Endpoint Discovery
# Fast scan to identify key endpoints
echo "https://example.com" | hakrawler -d 1 | sort -u
# Focus on common paths
echo "https://example.com" | hakrawler | grep -E "(admin|api|user|login|dashboard)"
Deep Asset Mapping
# Comprehensive asset inventory
echo "https://example.com" | hakrawler -d 3 -timeout 20 | sort -u > assets.txt
# Extract by type
grep "\.js$" assets.txt > javascript.txt
grep "\.html$" assets.txt > pages.txt
grep -E "(api|v[0-9])" assets.txt > api_endpoints.txt
Subdomain Crawling
# Get all subdomains, then probe them so each has a scheme
subfinder -d example.com -silent -all | httpx -silent > all_subs.txt
# Crawl each subdomain (progress goes to stderr so it stays out of the results)
while read -r subdomain; do
  echo "Crawling: $subdomain" >&2
  echo "$subdomain" | hakrawler -d 2
done < all_subs.txt > all_discovered.txt
# Unique results
sort -u all_discovered.txt > unique_urls.txt
Parameter Discovery
# Find endpoints with query parameters
echo "https://example.com" | hakrawler | grep "?"
# Extract unique query strings
echo "https://example.com" | hakrawler | grep "?" | sed 's/.*?//' | sort -u
# Count how often each parameter name appears
echo "https://example.com" | hakrawler | grep "?" | grep -o "[a-zA-Z_]*=" | sort | uniq -c
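The parameter-extraction pipeline can be tried offline; param_urls.txt below holds hypothetical URLs standing in for crawl output, and the pipeline additionally splits multi-parameter query strings on "&".

```shell
#!/bin/sh
# Offline demo: extract and rank query-parameter names.
cat > param_urls.txt <<'EOF'
https://example.com/search?q=test&page=2
https://example.com/user?id=7
https://example.com/search?q=other
EOF

# Strip everything up to "?", split on "&", keep names, count frequency
grep '?' param_urls.txt \
  | sed 's/.*?//' \
  | tr '&' '\n' \
  | cut -d= -f1 \
  | sort | uniq -c | sort -rn
```

The most frequently seen parameters are often the best fuzzing candidates.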
JavaScript Endpoint Analysis
# Extract all JavaScript references
echo "https://example.com" | hakrawler | grep "\.js"
# Get unique JS files, saved for further analysis
echo "https://example.com" | hakrawler | grep "\.js" | sort -u > js_files.txt
# Feed them to dedicated JS analyzers such as LinkFinder
while read -r js; do
  echo "Analyzing: $js"
  # Process with other tools here
done < js_files.txt
Saving Results
# Save raw output
echo "https://example.com" | hakrawler > results.txt
# Save with a timestamped filename
echo "https://example.com" | hakrawler > "results_$(date +%s).txt"
# Organize by type in one pass (bash process substitution)
echo "https://example.com" | hakrawler | tee >(grep "\.js" > js_endpoints.txt) \
  >(grep "api" > api_endpoints.txt) \
  >(grep "admin" > admin_endpoints.txt) > all.txt
Tips and Best Practices
Performance
- Use -d 1 or -d 2 for quick scans
- Increase -timeout for slow or protected targets
- Crawl during off-peak hours to reduce target load
- Use proxies to avoid IP blocking
Results Quality
- Deduplicate results with sort -u (or hakrawler's -u flag)
- Filter noise (images, CSS, media files)
- Focus on interesting patterns (API, admin, user, etc.)
- Cross-validate with other tools
Safety and Ethics
- Only crawl domains you have permission to test
- Respect robots.txt if the target has restrictions
- Use appropriate rate limiting to avoid denial of service
- Document all discovered endpoints for reporting
Integration Tips
- Pipe output to other tools for chaining recon steps
- Use in automated reconnaissance pipelines
- Combine with subfinder for comprehensive domain coverage
- Feed results to security scanners for vulnerability detection