HTTrack Website Copier is a free, portable utility that downloads entire websites to your computer, creating a complete offline mirror. It’s invaluable during security assessments for analyzing web applications, discovering hidden directories and files, identifying server configurations, and understanding application architecture. HTTrack is available in Kali Linux and supports multiple platforms.
The tool handles cookies and basic authentication, parses HTML and JavaScript for links (it does not execute JavaScript), and can filter content by MIME type, URL pattern, and file extension. It’s particularly useful for analyzing large web applications and discovering security misconfigurations.
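The filtering just described uses `+`/`-` glob patterns evaluated left to right, so a later rule overrides an earlier one. A minimal sketch (example.com, the patterns, and the output directory are placeholders):

```shell
# Sketch of HTTrack's +/- filter syntax; "-*" rejects everything first,
# then the "+" rules re-accept selectively. Quote the patterns so the
# shell does not expand the wildcards itself.
httrack http://example.com \
    "-*" \
    "+*.html" "+*.css" \
    "+*example.com/app/*" \
    -O ./filtered_mirror
```

The same mechanism powers the include/exclude recipes further down this page.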
# Kali Linux (pre-installed)
httrack --version
# Debian/Ubuntu
sudo apt-get install httrack webhttrack
# macOS
brew install httrack
# From source
git clone https://github.com/xroche/httrack
cd httrack
./configure && make
sudo make install
httrack <options> <url> [<url2> ...] [-O <folder>]
| Command | Description |
|---|---|
| `httrack http://example.com` | Mirror entire website locally |
| `httrack https://example.com -O ./mirror` | Save to custom directory |
| `httrack http://example.com/path` | Mirror specific path only |
| `httrack --help` | Display help information |
| `httrack --version` | Show version information |
# Mirror a simple website
httrack http://example.com
# Mirror with custom output directory
httrack http://example.com -O ./website_mirror
# Mirror HTTPS website
httrack https://example.com -O ./secure_mirror
# Mirror multiple URLs
httrack http://example.com http://subdomain.example.com -O ./multi_mirror
| Option | Description | Example |
|---|---|---|
| `-rN` | Set recursion (mirror) depth | `httrack -r5 http://example.com` |
| `-mN` | Maximum size of a non-HTML file (bytes) | `httrack -m50000 http://example.com` |
| `-cN` | Number of simultaneous connections | `httrack -c8 http://example.com` |
| `-sN` | robots.txt policy (0=ignore, 1=partial, 2=always follow) | `httrack -s0 http://example.com` |
| Option | Description | Example |
|---|---|---|
| `+<pat>` / `-<pat>` | Accept / reject URLs matching a glob filter | `httrack http://example.com "-*" "+*.html"` |
| `--assume` | Assume (force) a MIME type for an extension | `httrack http://example.com --assume php=text/html` |
| `-AN` | Maximum transfer rate (bytes/second) | `httrack -A25000 http://example.com` |
| `--spider` | Spider mode (scan links, store nothing) | `httrack --spider http://example.com` |
| Option | Description | Example |
|---|---|---|
| `-NN` | Local build structure type (0 = original site structure) | `httrack -N0 http://example.com` |
| `-n` | Also get non-HTML files "near" an HTML file | `httrack -n http://example.com` |
| `-TN` | Connection timeout (seconds) | `httrack -T60 http://example.com` |
| `-F` | Identify with a custom User-Agent string | `httrack -F "Mozilla/5.0" http://example.com` |
# Provide authentication
httrack http://user:password@example.com -O ./authenticated
# Cookie handling (store/re-use cookies in cookies.txt)
httrack http://example.com -b1 -O ./with_cookies
# Custom User-Agent
httrack http://example.com -F "Mozilla/5.0" -O ./custom_ua
# Include only specific paths (reject everything, then re-accept)
httrack http://example.com "-*" "+*example.com/app/*" "+*example.com/api/*" -O ./filtered
# Keep only images (reject everything else)
httrack http://example.com "-*" "+*.jpg" "+*.png" -O ./images_only
# Follow external links to depth 1
httrack http://example.com -%e1 -O ./external
# Mirror subdomains (filter, not a wildcard start URL)
httrack http://example.com "+*.example.com/*" -O ./subdomains
# Deep recursive mirror (depth 10)
httrack -r10 http://example.com -O ./deep_mirror
# Larger per-file size limit (bytes)
httrack -m500000 http://example.com -O ./large_files
# Multiple connections (faster)
httrack -c16 http://example.com -O ./fast
# Keep only common web content (skip applets, archives, executables, etc.)
httrack http://example.com "-*" "+*.html" "+*.htm" "+*.css" "+*.js" "+*.jpg" "+*.gif" "+*.png"
# Mirror target application
httrack https://target-app.com -r8 -O ./target_mirror
# Analyze directory structure
find ./target_mirror -type d | head -20
# Identify file types
find ./target_mirror -type f | sed 's/.*\.//' | sort | uniq -c
# Extract all URLs
grep -roP 'href="[^"]*"' ./target_mirror | cut -d'"' -f2 | sort -u
# Mirror API documentation
httrack https://api.example.com -r6 -O ./api_mirror
# Extract API endpoints
grep -roP '/(api|v[0-9]+)/?[a-zA-Z0-9/_-]*' ./api_mirror | sort -u
# Find parameter patterns
grep -roP '\?[a-zA-Z0-9_&=]*' ./api_mirror | sort -u
# Mirror entire site
httrack http://example.com -r10 -O ./full_mirror
# Search for config files
find ./full_mirror -name "*.conf" -o -name "*.config" -o -name "*.json" -o -name "*.xml"
# Look for hardcoded credentials (case-insensitive)
grep -ri "password\|apikey\|token\|secret" ./full_mirror
# Extract JavaScript for analysis
find ./full_mirror -name "*.js" -type f | head -20
# Mirror simple website with default settings
httrack http://example.com -O ./mirror_$(date +%Y%m%d)
# Navigate to results (directory name from -O above)
cd mirror_$(date +%Y%m%d)/
ls -la
# Open in browser
firefox index.html
# Mirror with aggressive settings for app discovery
httrack \
-r10 \
-m100000 \
-c16 \
-T60 \
http://target.local:8080 \
-O ./deep_analysis
# Search for interesting files
find ./deep_analysis -type f \( \
-name "*.js" -o \
-name "*.json" -o \
-name "*.xml" -o \
-name "*.config" \
\) | wc -l
# Mirror API with specific path filters (wildcards go in filters, not start URLs)
httrack \
-r6 \
https://api.example.com/ \
"+*api.example.com/v1/*" \
"+*api.example.com/v2/*" \
-O ./api_analysis
# Extract endpoints
grep -roP '/(api|v[0-9]+)/[a-zA-Z0-9/_-]*' ./api_analysis | sort -u > endpoints.txt
# Count discovered endpoints
wc -l endpoints.txt
# Mirror only JavaScript and HTML files
httrack \
-r8 \
"-*" \
"+*.html" \
"+*.htm" \
"+*.js" \
http://example.com \
-O ./js_analysis
# Analyze total mirror size
du -sh ./js_analysis
find ./js_analysis -name "*.js" | wc -l
mirror_example.com/
├── index.html            # Top-level index pointing into the mirror
├── hts-log.txt           # Mirror log (errors, warnings)
├── hts-cache/            # HTTrack cache files
│   ├── new.txt           # Index of mirrored URLs
│   ├── new.zip           # Cached page data
│   └── doit.log          # Options used for the mirror
├── backblue.gif          # HTTrack template images
├── fade.gif
├── cookies.txt           # Saved cookies
└── example.com/          # The mirrored site itself
    ├── index.html
    ├── about/
    ├── contact/
    └── ...
# Count total files downloaded
find ./mirror -type f | wc -l
# Find all JavaScript files
find ./mirror -name "*.js" | wc -l
# List largest files/directories
du -Sh ./mirror | sort -rh | head -20
# Extract all URLs from HTML
grep -rhoP '(href|src|action)="[^"]*"' ./mirror | cut -d'"' -f2 | sort -u
# Find commented-out code
grep -r "<!--" ./mirror | head -20
# Search for API calls in JavaScript files
grep -rh --include="*.js" "fetch\|XMLHttpRequest\|axios\|jQuery.ajax" ./mirror | head -20
# Launch graphical interface
webhttrack
# Or via command line
webhttrack http://example.com
- Point-and-click URL configuration
- Visual progress monitoring
- Pause/resume capability
- Browser-based interface
- Project management and history
# Start web interface (port 8080)
webhttrack
# Access at http://localhost:8080
# Configure URLs, options, and monitor progress
# Faster download with more connections
httrack -c16 http://example.com -O ./fast_mirror
# For very large sites
httrack -c32 -r10 -m200000 http://large-site.com -O ./large_mirror
# Limit bandwidth to ~1 MB/s (rate is in bytes per second)
httrack --max-rate 1000000 http://example.com
# Shorter timeout for unresponsive servers
httrack -T30 http://slow-server.com
| Issue | Solution |
|---|---|
| Connection refused | Check URL, firewall, or proxy settings |
| Incomplete mirror | Increase recursion depth with `-rN` |
| Large downloads | Set a size limit with `-mN` or use file filters |
| Authentication failed | Provide credentials in the URL: `http://user:pass@host` |
| Links hidden in JavaScript missed | HTTrack parses but does not execute JS; add the URLs as extra start points or filters |
| Timeout errors | Increase the timeout: `-T120` |
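A related recovery scenario: a long mirror dies partway through (timeouts, dropped VPN). It need not be restarted from scratch; HTTrack's shortcut options reuse the `hts-cache`. A sketch, assuming the mirror was created in `./security_assessment`:

```shell
# Re-run from the project directory that contains hts-cache/;
# the cache lets HTTrack skip files it already downloaded.
cd ./security_assessment
httrack --continue   # resume an interrupted mirror

# Later, re-check pages against the live site and fetch changes
httrack --update
```

Both shortcuts read the saved options from the cache, so no URL needs to be repeated on the command line.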
# Comprehensive mirror for security analysis
httrack \
-r10 \
-m100000 \
-c16 \
-T60 \
-F "Mozilla/5.0 (Windows)" \
https://target.com \
-O ./security_assessment
# Archive the mirror
tar -czf target_mirror_$(date +%Y%m%d_%H%M%S).tar.gz ./security_assessment
# Create inventory
find ./security_assessment -type f > inventory.txt
wc -l inventory.txt
# Mirror current version
httrack https://target.com -O ./version_current
# Later, compare with previous
diff -r ./version_previous ./version_current > changes.diff
# Or use find to identify new files
find ./version_current -newer ./version_previous -type f
- Authorization: Only mirror websites you own or have explicit written permission to test
- Robots.txt Compliance: HTTrack respects robots.txt by default; override with care
- Rate Limiting: Use appropriate concurrency settings to avoid DoS-like behavior
- Copyright: Respect copyright laws; use mirrors for authorized security testing only
- Confidentiality: Protect downloaded content containing sensitive information
- wget: Command-line download utility
- curl: HTTP client for single-file downloads
- Burp Suite: Professional web application security testing
- OWASP ZAP: Free automated web security scanning
- grep/find: Content analysis and file discovery
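For comparison, a roughly equivalent offline mirror with wget, the first alternative above (example.com is a placeholder):

```shell
# wget counterpart to a basic HTTrack mirror:
#   --mirror            recursion + timestamping
#   --convert-links     rewrite links for offline browsing
#   --adjust-extension  add .html extensions where needed
#   --page-requisites   fetch CSS/images needed to render pages
#   --no-parent         stay below the start directory
wget --mirror --convert-links --adjust-extension \
     --page-requisites --no-parent http://example.com/
```

wget lacks HTTrack's project cache and update mode, but is ubiquitous and scriptable, which makes it handy for one-off grabs.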