# Katana Web Crawler Cheat Sheet
## Overview
Katana is a fast and customizable web crawling framework from Project Discovery. It is designed to crawl websites efficiently in order to gather information and discover endpoints, and it stands out for its speed, flexibility, and focus on security testing.
What makes Katana distinctive is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, making it an excellent tool for reconnaissance during security assessments and bug bounty hunting.
Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be customized to focus on specific types of resources or to follow particular patterns, which makes it adaptable to different security testing scenarios. The tool integrates easily into security testing workflows and can be combined with other Project Discovery tools for comprehensive reconnaissance.
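Before the detailed sections below, here is a minimal quick-start sketch that simply combines flags covered later in this cheat sheet (the target URL is a placeholder):

```bash
# Crawl a target with JavaScript parsing enabled, keep only discovered
# .js endpoints, and write the results to a file
katana -u https://example.com -js-crawl -extension js -o js-endpoints.txt
```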
## Installation
### Using Go
# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest
# Verify installation
katana -version
### Using Docker
# Pull the latest Docker image
docker pull projectdiscovery/katana:latest
# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h
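To crawl a target through the container, the same flags documented below can be passed after the image name; a minimal sketch:

```bash
# Crawl a single URL using the Docker image
docker run -it projectdiscovery/katana:latest -u https://example.com
```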
### Using Homebrew (macOS)
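A minimal sketch, assuming Katana is packaged as a Homebrew formula named `katana`:

```bash
# Install using Homebrew (assumes a katana formula is available)
brew install katana

# Verify installation
katana -version
```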
### Using PDTM (Project Discovery Tools Manager)
# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest
# Install Katana using PDTM
pdtm -i katana
# Verify installation
katana -version
### On Kali Linux
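A minimal sketch, assuming a `katana` package is available in the Kali repositories:

```bash
# Install from the Kali repositories (assumes a katana package is provided)
sudo apt update && sudo apt install -y katana

# Verify installation
katana -version
```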
## Basic Usage
### Crawling a Single URL
# Crawl a single URL
katana -u https://example.com
# Crawl with increased verbosity
katana -u https://example.com -v
# Crawl with debug information
katana -u https://example.com -debug
### Crawling Multiple URLs
# Crawl multiple URLs
katana -u https://example.com,https://test.com
# Crawl from a list of URLs
katana -list urls.txt
# Crawl from STDIN
cat urls.txt|katana
### Output Options
# Save results to a file
katana -u https://example.com -o results.txt
# Output in JSON format
katana -u https://example.com -json -o results.json
# Silent mode (only URLs)
katana -u https://example.com -silent
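The JSON output can be post-processed with standard tools; a small sketch using jq, assuming each output line is a JSON object that exposes the requested URL under `.request.endpoint` (the exact schema may differ between Katana versions):

```bash
# Extract unique endpoints from JSON output (field path is an assumption)
katana -u https://example.com -json -silent | jq -r '.request.endpoint' | sort -u
```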
## Crawling Options
### Crawling Depth and Scope
# Set crawling depth (default: 2)
katana -u https://example.com -depth 3
# Crawl subdomains (default: false)
katana -u https://example.com -crawl-scope subs
# Crawl out of scope (default: false)
katana -u https://example.com -crawl-scope out-of-scope
# Crawl only in scope
katana -u https://example.com -crawl-scope strict
### Crawling Strategies
# Use standard crawler
katana -u https://example.com -crawler standard
# Use JavaScript parser
katana -u https://example.com -crawler js
# Use sitemap-based crawler
katana -u https://example.com -crawler sitemap
# Use robots.txt-based crawler
katana -u https://example.com -crawler robots
# Use all crawlers
katana -u https://example.com -crawler standard,js,sitemap,robots
### Field Selection
# Display specific fields
katana -u https://example.com -field url,path,method
# Available fields: url, path, method, host, fqdn, scheme, port, query, fragment, endpoint
## Advanced Usage
### URL Filtering
# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"
# Filter URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"
# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"
### Resource Filtering
# Include specific file extensions
katana -u https://example.com -extension js,php,aspx
# Exclude specific file extensions
katana -u https://example.com -exclude-extension png,jpg,gif
# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html
### Form Filling
# Enable automatic form filling
katana -u https://example.com -form-fill
# Use custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
### JavaScript Parsing
# Enable JavaScript parsing
katana -u https://example.com -js-crawl
# Set headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20
# Set browser path
katana -u https://example.com -js-crawl -chrome-path /path/to/chrome
## Performance Optimization
### Concurrency and Rate Limiting
# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20
# Set delay between requests (milliseconds)
katana -u https://example.com -delay 100
# Set rate limit (requests per second)
katana -u https://example.com -rate-limit 50
### Timeout Options
# Set timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10
# Set timeout for headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30
### Optimization for Large Scans
# Disable automatic form filling for faster crawling
katana -u https://example.com -no-form-fill
# Disable JavaScript parsing for faster crawling
katana -u https://example.com -no-js-crawl
# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 1000
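For larger engagements, the options above can be combined into a single tuned invocation; a sketch built only from flags shown elsewhere in this sheet:

```bash
# Bounded, throttled large scan without the headless browser overhead
katana -u https://example.com -depth 3 -max-urls 1000 -concurrency 20 -rate-limit 50 -no-js-crawl -o large-scan.txt
```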
## Integration with Other Tools
### Pipeline with Subfinder
# Find subdomains and crawl them
subfinder -d example.com -silent|katana -silent
# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent|katana -silent -extension js
### Pipeline with HTTPX
# Probe URLs and crawl active ones
httpx -l urls.txt -silent|katana -silent
# Crawl and then probe discovered endpoints
katana -u https://example.com -silent|httpx -silent
### Pipeline with Nuclei
# Crawl and scan for vulnerabilities
katana -u https://example.com -silent|nuclei -t cves/
# Crawl, extract JavaScript files, and scan for vulnerabilities
katana -u https://example.com -silent -extension js|nuclei -t exposures/
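Putting the pipelines together, a sketch of an end-to-end workflow built from the same commands (the output file name is illustrative):

```bash
# Enumerate subdomains, probe for live hosts, crawl them,
# keep a copy of the discovered URLs, then scan them with nuclei
subfinder -d example.com -silent \
  | httpx -silent \
  | katana -silent \
  | tee crawled-urls.txt \
  | nuclei -t exposures/
```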
## Output Customization
### Custom Output Format
# Output only URLs
katana -u https://example.com -silent
# Output URLs with specific fields
katana -u https://example.com -field url,path,method -o results.txt
# Count discovered URLs
katana -u https://example.com -silent|wc -l
# Sort output alphabetically
katana -u https://example.com -silent|sort
### Filtering Output
# Filter by file extension
katana -u https://example.com -silent|grep "\.js$"
# Filter by endpoint pattern
katana -u https://example.com -silent|grep "/api/"
# Find unique domains
katana -u https://example.com -silent|awk -F/ '{print $3}'|sort -u
## Advanced Filtering
### URL Pattern Matching
# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"
# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"
# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"
### Content Filtering
# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"
# Filter responses by status code
katana -u https://example.com -match-condition "status == 200"
# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"
## Proxy and Network Options
# Use HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080
# Use SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080
# Set custom headers
katana -u https://example.com -header "User-Agent: Mozilla/5.0" -header "Cookie: session=123456"
# Set custom cookies
katana -u https://example.com -cookie "session=123456; user=admin"
## Additional Features
### Automatic Form Filling
# Enable automatic form filling
katana -u https://example.com -form-fill
# Set custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
### Crawling Specific Paths
# Crawl specific paths
katana -u https://example.com -paths /admin,/login,/dashboard
# Crawl from a file containing paths
katana -u https://example.com -paths-file paths.txt
### Storing Responses
# Store all responses
katana -u https://example.com -store-response
# Specify response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
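Once responses are stored on disk, ordinary shell tools can mine them; an illustrative sketch (the responses/ directory matches the example above, and the search patterns are arbitrary):

```bash
# List stored response files that mention potentially sensitive strings
grep -ril "api_key\|secret\|password" responses/ | sort -u
```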
## Troubleshooting
### Common Issues
1. **JavaScript Parsing Issues**
```bash
# Increase headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 30
# Specify Chrome path manually
katana -u https://example.com -js-crawl -chrome-path /usr/bin/google-chrome
```
2. **Rate Limiting by the Target**
```bash
# Reduce concurrency
katana -u https://example.com -concurrency 5
# Add delay between requests
katana -u https://example.com -delay 500
```
3. **Memory Issues**
```bash
# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 500
# Disable JavaScript parsing
katana -u https://example.com -no-js-crawl
```
4. **Crawling Scope Issues**
```bash
# Restrict crawling to specific domain
katana -u https://example.com -crawl-scope strict
# Allow crawling subdomains
katana -u https://example.com -crawl-scope subs
```
### Debugging
```bash
# Enable verbose mode
katana -u https://example.com -v
# Show debug information
katana -u https://example.com -debug
# Show request and response details
katana -u https://example.com -debug -show-request -show-response
```

## Configuration
### Configuration File
Katana uses a configuration file located at `$HOME/.config/katana/config.yaml`. You can adjust various settings in this file:
# Example configuration file
concurrency: 10
delay: 100
timeout: 10
max-depth: 3
crawl-scope: strict
crawl-duration: 0
field: url,path,method
extensions: js,php,aspx
### Environment Variables
# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3
## Reference
### Command-Line Options
| Flag | Description |
|---|---|
| `-u` | Target URL to crawl |
| `-list` | File containing list of URLs to crawl |
| `-o` | File to write output to |
| `-json` | Write output in JSON format |
| `-silent` | Show only URLs in output |
| `-v` | Show verbose output |
| `-depth` | Maximum depth to crawl (default: 2) |
| `-crawl-scope` | Crawling scope (strict, subs, out-of-scope) |
| `-crawler` | Crawler types to use (standard, js, sitemap, robots) |
| `-field` | Fields to display in output |
| `-extension` | File extensions to include |
| `-exclude-extension` | File extensions to exclude |
| `-match-regex` | Regex pattern to match URLs |
| `-filter-regex` | Regex pattern to filter URLs |
| `-match-condition` | Condition to match URLs |
| `-form-fill` | Enable automatic form filling |
| `-js-crawl` | Enable JavaScript parsing |
| `-headless-timeout` | Timeout for headless browser (seconds) |
| `-chrome-path` | Path to Chrome browser |
| `-concurrency` | Number of concurrent requests |
| `-delay` | Delay between requests (milliseconds) |
| `-rate-limit` | Maximum number of requests per second |
| `-timeout` | Timeout for HTTP requests (seconds) |
| `-max-urls` | Maximum number of URLs to crawl |
| `-proxy` | HTTP/SOCKS5 proxy to use |
| `-header` | Custom header to add to all requests |
| `-cookie` | Custom cookies to add to all requests |
| `-paths` | Specific paths to crawl |
| `-paths-file` | File containing paths to crawl |
| `-store-response` | Store all responses |
| `-store-response-dir` | Directory to store responses |
| `-version` | Show Katana version |
### Crawling Scopes
| Scope | Description |
|---|---|
| `strict` | Crawl only the exact domain provided |
| `subs` | Crawl the domain and its subdomains |
| `out-of-scope` | Crawl any domain, regardless of the initial domain |
### Crawler Types
| Type | Description |
|---|---|
| `standard` | Standard HTTP crawler |
| `js` | JavaScript parser using headless browser |
| `sitemap` | Sitemap-based crawler |
| `robots` | Robots.txt-based crawler |
### Field Options
| Field | Description |
|---|---|
| `url` | Full URL |
| `path` | URL path |
| `method` | HTTP method |
| `host` | Host part of URL |
| `fqdn` | Fully qualified domain name |
| `scheme` | URL scheme (http/https) |
| `port` | URL port |
| `query` | Query parameters |
| `fragment` | URL fragment |
| `endpoint` | URL endpoint |
## Resources
- Official Documentation
- [GitHub Repository](https://github.com/projectdiscovery/katana)
- [Project Discovery Discord](URL_88_)
---
*This cheat sheet provides a comprehensive reference for using Katana, from basic crawling to advanced filtering and integration with other tools. For the most up-to-date information, always refer to the official documentation.*