Overview
HTTrack Website Copier is a free, portable utility that downloads entire websites to your computer, creating a complete offline mirror. It’s invaluable during security assessments for analyzing web applications, discovering hidden directories and files, identifying server configurations, and understanding application architecture. HTTrack is available in Kali Linux and supports multiple platforms.
The tool handles cookies and basic authentication, parses (but does not execute) JavaScript to find links, and can filter content by MIME type, URL pattern, and file extension. It’s particularly useful for analyzing large web applications and spotting security misconfigurations in statically reachable content.
Installation
# Kali Linux (pre-installed)
httrack --version
# Debian/Ubuntu
sudo apt-get install httrack webhttrack
# macOS
brew install httrack
# From source
git clone --recurse-submodules https://github.com/xroche/httrack
cd httrack
./configure && make
sudo make install
Basic Usage
Command Syntax
httrack <options> <url> [<url2> ...] [-O <folder>]
Simple Website Mirroring
| Command | Description |
|---|---|
| httrack http://example.com | Mirror entire website locally |
| httrack https://example.com -O ./mirror | Save to custom directory |
| httrack http://example.com/path | Mirror specific path only |
| httrack --help | Display help information |
| httrack --version | Show version information |
Basic Examples
# Mirror a simple website
httrack http://example.com
# Mirror with custom output directory
httrack http://example.com -O ./website_mirror
# Mirror HTTPS website
httrack https://example.com -O ./secure_mirror
# Mirror multiple URLs
httrack http://example.com http://subdomain.example.com -O ./multi_mirror
Common Options
Mirror Scope and Depth
| Option | Description | Example |
|---|---|---|
| -r<n> | Set mirror (recursion) depth to n | httrack -r5 http://example.com |
| -m<n> | Maximum size of a non-HTML file, in bytes | httrack -m50000000 http://example.com |
| -c<n> | Number of simultaneous connections | httrack -c8 http://example.com |
| -e | Follow links anywhere on the web (no same-site restriction); combine with a small depth | httrack -e -r2 http://example.com |
File and Content Filtering
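| Option | Description | Example |
|---|---|---|
| "+pattern" / "-pattern" | Accept or reject URLs matching a pattern (quote to stop shell globbing) | httrack http://example.com "-*.exe" "-*.zip" "-*.iso" |
| "+mime:type" / "-mime:type" | Accept or reject by MIME type | httrack http://example.com "-mime:application/*" |
| -%F | Footer string written into saved HTML pages (not an FTP option) | httrack http://example.com -%F "<!-- Mirrored copy -->" |
| --spider | Spider mode: test links and report errors without keeping a mirror | httrack --spider http://example.com |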
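| Option | Description | Example |
|---|---|---|
| -N<n> | Local file structure type (0 keeps the original site structure) | httrack -N0 http://example.com |
| -n | Also fetch non-HTML files "near" a page, such as images hosted elsewhere | httrack -n http://example.com |
| -T<n> | Timeout in seconds for a non-responding link | httrack -T60 http://example.com |
| -I<n> | Build an index page (-I0 disables it) | httrack -I0 http://example.com |
| -F | User-Agent string sent in HTTP headers | httrack -F "Mozilla/5.0" http://example.com |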
Advanced Options
Authentication and Cookies
# Provide authentication
httrack http://user:password@example.com -O ./authenticated
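# Cookie handling (-b1 accepts cookies and stores them in cookies.txt; this is the default)
httrack http://example.com -b1 -O ./with_cookies
# Custom User-Agent (-F sets the header; -u controls document type checking instead)
httrack http://example.com -F "Mozilla/5.0" -O ./custom_ua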
URL Filtering
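# Include only specific paths (start URLs plus explicit filters)
httrack http://example.com/app/ http://example.com/api/ "-*" "+example.com/app/*" "+example.com/api/*" -O ./filtered
# Download only images (reject everything, then re-include image extensions)
httrack http://example.com "-*" "+*.jpg" "+*.png" -O ./images_only
# Follow links to external sites one level deep
httrack http://example.com -%e1 -O ./external
# Mirror subdomains (a filter, not a start URL, carries the wildcard)
httrack http://example.com "+*.example.com/*" -O ./subdomains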
Advanced Mirroring Options
# Deep recursive mirror (level 10)
httrack -r10 http://example.com -O ./deep_mirror
# Large file limit (-m takes bytes; this allows non-HTML files up to about 500 MB)
httrack -m500000000 http://example.com -O ./large_files
# Multiple connections (faster, but heavier on the server)
httrack -c16 http://example.com -O ./fast
# Skip Java applets, Flash, and other content by whitelisting file types
httrack http://example.com "-*" "+*.html" "+*.htm" "+*.css" "+*.js" "+*.jpg" "+*.gif" "+*.png"
Reconnaissance Workflows
Web Application Architecture Discovery
# Mirror target application
httrack https://target-app.com -r8 -O ./target_mirror
# Analyze directory structure
find ./target_mirror -type d | head -20
# Identify file types
find ./target_mirror -type f | sed 's/.*\.//' | sort | uniq -c
# Extract all URLs (mirrored hosts sit directly under the output directory)
grep -rhoP --exclude-dir=hts-cache 'href="[^"]*"' ./target_mirror | cut -d'"' -f2 | sort -u
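The inventory and link-extraction steps above can be wrapped into reusable functions. This is a minimal sketch, not part of HTTrack: the function names `ext_histogram` and `extract_links` are hypothetical, and it assumes the mirror keeps HTTrack's cache in an `hts-cache` directory that should be skipped.

```shell
# ext_histogram DIR: count mirrored files per extension, most common first.
# Files without an extension (or dotfiles) are grouped under "(none)".
ext_histogram() {
  local dir="$1"
  find "$dir" -type f -not -path '*/hts-cache/*' -print |
    awk -F/ '{
      name = $NF
      n = split(name, parts, ".")
      ext = (n > 1 && parts[1] != "") ? tolower(parts[n]) : "(none)"
      count[ext]++
    }
    END { for (e in count) printf "%d %s\n", count[e], e }' |
    sort -k1,1nr -k2,2
}

# extract_links DIR: unique href/src/action targets found in HTML files.
extract_links() {
  local dir="$1"
  grep -rhoE --include='*.html' --include='*.htm' \
    '(href|src|action)="[^"]*"' "$dir" |
    cut -d'"' -f2 | sort -u
}
```

For example, `ext_histogram ./target_mirror` gives a quick picture of the technology mix, and `extract_links ./target_mirror > links.txt` saves the link set for later review.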
API Endpoint Discovery
# Mirror API documentation
httrack https://api.example.com -r6 -O ./api_mirror
# Extract API endpoints
grep -rhoP --exclude-dir=hts-cache '/(api|v[0-9]+)/?[a-zA-Z0-9/_-]*' ./api_mirror | sort -u
# Find parameter patterns
grep -rhoP --exclude-dir=hts-cache '\?[a-zA-Z0-9_&=]*' ./api_mirror | sort -u
Configuration and Secrets Discovery
# Mirror entire site
httrack http://example.com -r10 -O ./full_mirror
# Search for config files
find ./full_mirror -type f \( -name "*.conf" -o -name "*.config" -o -name "*.json" -o -name "*.xml" \)
# Look for hardcoded credentials (case-insensitive, skipping HTTrack's own cache)
grep -ri --exclude-dir=hts-cache "password\|apikey\|token\|secret" ./full_mirror
# Extract JavaScript for analysis
find ./full_mirror -name "*.js" -type f | head -20
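A plain keyword grep tends to match ordinary prose such as "enter your password". The sketch below narrows the search to keywords that are followed by an assignment (`=` or `:`), which is closer to how credentials appear in code and config. The function name `scan_secrets` and the keyword list are illustrative assumptions, not an exhaustive detector.

```shell
# scan_secrets DIR: print file:line:text for likely hardcoded credentials.
# -I skips binary files, -i ignores case, -n adds line numbers.
scan_secrets() {
  local dir="$1"
  grep -rIinE --exclude-dir=hts-cache \
    "(password|passwd|api[_-]?key|secret|token)[\"']?[[:space:]]*[:=]" "$dir"
}
```

Running `scan_secrets ./full_mirror` prints each hit with its file and line number, so findings can be checked by hand before being reported.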
Practical Examples
Example 1: Basic Website Mirror
# Mirror simple website with default settings
OUT=./mirror_$(date +%Y%m%d)
httrack http://example.com -O "$OUT"
# Navigate to results
cd "$OUT"
ls -la
# Open in browser
firefox index.html
Example 2: Deep Application Analysis
# Mirror with aggressive settings for app discovery
# (-m is in bytes: roughly 100 MB per non-HTML file)
httrack \
-r10 \
-m100000000 \
-c16 \
-T60 \
http://target.local:8080 \
-O ./deep_analysis
# Search for interesting files
find ./deep_analysis -type f \( \
-name "*.js" -o \
-name "*.json" -o \
-name "*.xml" -o \
-name "*.config" \
\) | wc -l
Example 3: API-Focused Mirror
# Mirror API with specific patterns (wildcards go in filters, not in start URLs)
httrack \
-r6 \
https://api.example.com/ \
"+api.example.com/v1/*" \
"+api.example.com/v2/*" \
-O ./api_analysis
# Extract endpoints
grep -rhoP --exclude-dir=hts-cache '/(api|v[0-9]+)/[a-zA-Z0-9/_-]*' ./api_analysis | sort -u > endpoints.txt
# Count discovered endpoints
wc -l endpoints.txt
Example 4: Selective Content Mirror
# Mirror only JavaScript and HTML files (filters are quoted so the shell does not expand them)
httrack \
-r8 \
"-*" \
"+*.html" \
"+*.htm" \
"+*.js" \
http://example.com \
-O ./js_analysis
# Total size of the mirror and number of JavaScript files
du -sh ./js_analysis
find ./js_analysis -name "*.js" | wc -l
Output and Analysis
Directory Structure
mirror/
├── index.html          # Top-level index of mirrored sites
├── hts-log.txt         # Log of errors and warnings
├── hts-cache/          # HTTrack cache, used for updates and resumes
│   ├── new.zip         # Cached responses
│   ├── new.txt         # Status of each fetched URL
│   └── doit.log        # Command line used for the mirror
├── backblue.gif
├── cookies.txt         # Saved cookies
└── example.com/        # One directory per mirrored host
    ├── index.html
    ├── about/
    ├── contact/
    └── ...
Useful Analysis Commands
# Count total files downloaded
find ./mirror -type f | wc -l
# Find all JavaScript files
find ./mirror -name "*.js" | wc -l
# List largest files
find ./mirror -type f -exec du -h {} + | sort -rh | head -20
# Extract all URLs from HTML
grep -roP '(href|src|action)="[^"]*"' ./mirror -h | cut -d'"' -f2 | sort -u
# Find HTML comments (may include commented-out code)
grep -r --exclude-dir=hts-cache "<!--" ./mirror | head -20
# Search for API endpoints in JS
grep -r "fetch\|XMLHttpRequest\|axios\|jQuery.ajax" ./mirror -h | head -20
HTTrack GUI (WebHTTrack)
Graphical Interface
# Launch graphical interface
webhttrack
# Or via command line
webhttrack http://example.com
GUI Features
- Point-and-click URL configuration
- Visual progress monitoring
- Pause/resume capability
- Browser-based interface
- Project management and history
GUI Usage
# Start web interface (it normally listens on port 8080 and opens your browser)
webhttrack
# If the browser does not open, try http://localhost:8080
# Configure URLs, options, and monitor progress
Multi-Connection Mirroring
# Faster download with more connections
httrack -c16 http://example.com -O ./fast_mirror
# For very large sites with big files (-m is in bytes: roughly 200 MB)
httrack -c32 -r10 -m200000000 http://large-site.com -O ./large_mirror
Bandwidth Control
# Limit bandwidth to roughly 1 MB/s (the value is in bytes per second; 1000 would be 1 KB/s)
httrack --max-rate=1000000 http://example.com
# Shorter timeout for unresponsive servers
httrack -T30 http://slow-server.com
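The connection and bandwidth options above can be combined into one conservative profile. The wrapper below is a sketch under stated assumptions: the name `polite_mirror`, the `DRY_RUN` variable, and the default values (4 connections, 2 new connections per second, 50000 bytes/s) are illustrative choices, not recommendations from the HTTrack authors. The flags themselves (`-c`, `-%c`, `-A`, `-T`, `-R`, `-s`) are standard HTTrack options.

```shell
# polite_mirror URL OUTDIR [SOCKETS] [RATE_BYTES_PER_SEC]
# Runs httrack with conservative limits. Set DRY_RUN=1 to print the
# command instead of executing it.
polite_mirror() {
  local url="$1" out="$2"
  local sockets="${3:-4}"     # -c<n>: simultaneous connections
  local rate="${4:-50000}"    # -A<n>: max transfer rate in bytes/second
  # -%c2: at most 2 new connections per second
  # -T30: 30 s timeout, -R2: 2 retries, -s2: always follow robots.txt
  set -- httrack "$url" -O "$out" "-c$sockets" "-%c2" "-A$rate" -T30 -R2 -s2
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Reviewing the command first with `DRY_RUN=1 polite_mirror http://example.com ./polite` is a cheap way to confirm the limits before any traffic is sent.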
Troubleshooting
| Issue | Solution |
|---|---|
| Connection refused | Check URL, firewall, or proxy settings |
| Incomplete mirror | Increase recursion depth, e.g. -r10, and check hts-log.txt for errors |
| Large downloads | Set a size limit in bytes with -m<n> or use file filters |
| Authentication failed | Provide credentials in URL: http://user:pass@host |
| Links generated by JavaScript are missing | HTTrack does not execute JavaScript; add the missing URLs as extra start URLs |
| Timeout errors | Increase timeout: -T120 |
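When a mirror looks incomplete, the log file in the output directory is usually the first place to check. The sketch below assumes that HTTrack writes `hts-log.txt` into the mirror directory and that problem lines contain the literal markers `Error:` and `Warning:`; verify this against a real log from your version before relying on it. The function names are hypothetical.

```shell
# summarize_log MIRROR_DIR: count error and warning lines in hts-log.txt.
summarize_log() {
  local log="$1/hts-log.txt"
  [ -f "$log" ] || { echo "no log at $log" >&2; return 1; }
  printf 'errors=%s warnings=%s\n' \
    "$(grep -c 'Error:' "$log")" \
    "$(grep -c 'Warning:' "$log")"
}

# failed_links MIRROR_DIR: print the error lines for manual follow-up.
failed_links() {
  grep 'Error:' "$1/hts-log.txt"
}
```

`summarize_log ./mirror` gives a one-line health check, and `failed_links ./mirror` lists the entries worth retrying or investigating.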
Advanced Reconnaissance
Full Application Security Testing
# Comprehensive mirror for security analysis
httrack \
-r10 \
-m100000000 \
-c16 \
-T60 \
-F "Mozilla/5.0 (Windows)" \
https://target.com \
-O ./security_assessment
# Archive the mirror
tar -czf target_mirror_$(date +%Y%m%d_%H%M%S).tar.gz ./security_assessment
# Create inventory
find ./security_assessment -type f > inventory.txt
wc -l inventory.txt
Comparing Two Website Versions
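# Mirror current version
httrack https://target.com -O ./version_current
# Later, compare with a mirror saved earlier (assumes ./version_previous exists from a previous run)
diff -r -x hts-cache -x hts-log.txt ./version_previous ./version_current > changes.diff
# Or list files modified more recently than the previous mirror directory
find ./version_current -newer ./version_previous -type f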
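A full `diff -r` mixes content changes with added and removed files. To see only which paths appeared or disappeared between two mirrors, the file lists can be compared directly. This is a minimal sketch: `list_files` and `mirror_changes` are hypothetical helper names, and the `hts-cache` directory is excluded on the assumption that it holds only HTTrack's own bookkeeping.

```shell
# list_files DIR: sorted relative paths of mirrored files, cache excluded.
list_files() {
  (cd "$1" && find . -type f -not -path './hts-cache/*' | LC_ALL=C sort)
}

# mirror_changes OLD NEW: print "+ path" for files only in NEW and
# "- path" for files only in OLD.
mirror_changes() {
  local old_list new_list
  old_list=$(mktemp) && new_list=$(mktemp) || return 1
  list_files "$1" > "$old_list"
  list_files "$2" > "$new_list"
  LC_ALL=C comm -13 "$old_list" "$new_list" | sed 's/^/+ /'
  LC_ALL=C comm -23 "$old_list" "$new_list" | sed 's/^/- /'
  rm -f "$old_list" "$new_list"
}
```

For instance, `mirror_changes ./version_previous ./version_current` highlights newly exposed pages, which are often the most interesting part of a change review.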
Security and Legal Considerations
- Authorization: Only mirror websites you own or have explicit written permission to test
- Robots.txt Compliance: HTTrack respects robots.txt by default; override with care
- Rate Limiting: Use appropriate concurrency settings to avoid DoS-like behavior
- Copyright: Respect copyright laws; use mirrors for authorized security testing only
- Confidentiality: Protect downloaded content containing sensitive information
Related Tools
- wget: Command-line download utility
- curl: HTTP client for single-file downloads and detailed request analysis
- Burp Suite: Professional web application security testing
- OWASP ZAP: Free automated web security scanning
- grep/find: Content analysis and file discovery