# Katana Web Crawler Cheat Sheet

## Overview

Katana is a fast and customizable web crawling framework from Project Discovery. It is designed to crawl websites efficiently, gather information, and discover endpoints. Katana stands out for its speed, flexibility, and focus on security testing.

What makes Katana distinctive is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, making it an excellent tool for reconnaissance during security assessments and bug bounty hunting.

Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be tuned to focus on specific types of resources or to follow particular patterns, which makes it adaptable to different security testing scenarios. The tool integrates easily into security testing workflows and can be combined with other Project Discovery tools for comprehensive reconnaissance.
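As a quick illustration of that workflow, here is a minimal sketch that chains Katana with other Project Discovery tools using flags covered later in this sheet; the target domain and template directory are placeholders.

```bash
# Illustrative recon pipeline (example.com and cves/ are placeholders):
# enumerate subdomains, probe for live hosts, crawl them, and feed
# discovered URLs into nuclei for vulnerability scanning.
subfinder -d example.com -silent | httpx -silent | katana -silent -depth 2 | nuclei -t cves/
```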
## Installation

### Using Go

```bash
# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest

# Verify installation
katana -version
```
### Using Docker

```bash
# Pull the latest Docker image
docker pull projectdiscovery/katana:latest

# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h
```
### Using Homebrew (macOS)

```bash
# Install using Homebrew
brew install katana

# Verify installation
katana -version
```
### Using PDTM (Project Discovery Tools Manager)

```bash
# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest

# Install Katana using PDTM
pdtm -i katana

# Verify installation
katana -version
```
### On Kali Linux

```bash
# Install using apt
sudo apt install katana

# Verify installation
katana -version
```
## Basic Usage

### Crawling a Single URL

```bash
# Crawl a single URL
katana -u https://example.com

# Crawl with increased verbosity
katana -u https://example.com -v

# Crawl with debug information
katana -u https://example.com -debug
```
### Crawling Multiple URLs

```bash
# Crawl multiple URLs
katana -u https://example.com,https://test.com

# Crawl from a list of URLs
katana -list urls.txt

# Crawl from STDIN
cat urls.txt | katana
```
### Output Options

```bash
# Save results to a file
katana -u https://example.com -o results.txt

# Output in JSON format
katana -u https://example.com -json -o results.json

# Silent mode (only URLs)
katana -u https://example.com -silent
```
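The JSON output can be post-processed with standard tools. The following is a minimal sketch assuming `jq` is installed and that each JSON record exposes the crawled URL under `.request.endpoint`; the exact schema can vary between Katana versions, so verify it against your own output first.

```bash
# Extract discovered endpoints from JSON output (field name assumed; check your Katana version's schema)
katana -u https://example.com -json -o results.json
jq -r '.request.endpoint' results.json | sort -u
```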
## Crawling Options

### Crawling Depth and Scope

```bash
# Set crawling depth (default: 2)
katana -u https://example.com -depth 3

# Crawl subdomains
katana -u https://example.com -crawl-scope subs

# Crawl out of scope
katana -u https://example.com -crawl-scope out-of-scope

# Crawl only in scope
katana -u https://example.com -crawl-scope strict
```
### Crawling Strategies

```bash
# Use the standard crawler
katana -u https://example.com -crawler standard

# Use the JavaScript parser
katana -u https://example.com -crawler js

# Use the sitemap-based crawler
katana -u https://example.com -crawler sitemap

# Use the robots.txt-based crawler
katana -u https://example.com -crawler robots

# Use all crawlers
katana -u https://example.com -crawler standard,js,sitemap,robots
```
### Field Selection

```bash
# Display specific fields
katana -u https://example.com -field url,path,method

# Available fields: url, path, method, host, fqdn, scheme, port, query, fragment, endpoint
```
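Field selection combines well with shell post-processing. A small sketch that collects the unique paths discovered on a target for later testing:

```bash
# Deduplicate discovered paths into a file (shell post-processing, not a Katana feature)
katana -u https://example.com -silent -field path | sort -u > paths.txt
```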
## Advanced Usage

### URL Filtering

```bash
# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"

# Filter out URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"

# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"
```
### Resource Filtering

```bash
# Include specific file extensions
katana -u https://example.com -extension js,php,aspx

# Exclude specific file extensions
katana -u https://example.com -exclude-extension png,jpg,gif

# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html
```
### Form Filling

```bash
# Enable automatic form filling
katana -u https://example.com -form-fill

# Use custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
```
### JavaScript Parsing

```bash
# Enable JavaScript parsing
katana -u https://example.com -js-crawl

# Set the headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20

# Set the browser path
katana -u https://example.com -js-crawl -chrome-path /path/to/chrome
```
## Performance Optimization

### Concurrency and Rate Limiting

```bash
# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20

# Set delay between requests (milliseconds)
katana -u https://example.com -delay 100

# Set rate limit (requests per second)
katana -u https://example.com -rate-limit 50
```
### Timeout Options

```bash
# Set timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10

# Set timeout for the headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30
```
### Optimization for Large Scans

```bash
# Disable automatic form filling for faster crawling
katana -u https://example.com -no-form-fill

# Disable JavaScript parsing for faster crawling
katana -u https://example.com -no-js-crawl

# Limit the maximum number of URLs to crawl
katana -u https://example.com -max-urls 1000
```
## Integration with Other Tools

### Pipeline with Subfinder

```bash
# Find subdomains and crawl them
subfinder -d example.com -silent | katana -silent

# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent | katana -silent -extension js
```
### Pipeline with HTTPX

```bash
# Probe URLs and crawl the active ones
httpx -l urls.txt -silent | katana -silent

# Crawl and then probe discovered endpoints
katana -u https://example.com -silent | httpx -silent
```
### Pipeline with Nuclei

```bash
# Crawl and scan for vulnerabilities
katana -u https://example.com -silent | nuclei -t cves/

# Crawl, extract JavaScript files, and scan for exposures
katana -u https://example.com -silent -extension js | nuclei -t exposures/
```
## Output Customization

### Custom Output Format

```bash
# Output only URLs
katana -u https://example.com -silent

# Output URLs with specific fields
katana -u https://example.com -field url,path,method -o results.txt

# Count discovered URLs
katana -u https://example.com -silent | wc -l

# Sort output alphabetically
katana -u https://example.com -silent | sort
```
### Filtering Output

```bash
# Filter by file extension
katana -u https://example.com -silent | grep "\.js$"

# Filter by endpoint pattern
katana -u https://example.com -silent | grep "/api/"

# Find unique domains
katana -u https://example.com -silent | awk -F/ '{print $3}' | sort -u
```
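Standard shell tools can also summarize a crawl. Below is a small sketch (plain shell post-processing, not a Katana feature) that pulls the unique query-parameter names out of discovered URLs:

```bash
# List unique query parameter names found in crawled URLs
katana -u https://example.com -silent \
  | grep -oE '[?&][^=&#]+=' \
  | tr -d '?&=' \
  | sort -u
```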
## Advanced Filtering

### URL Pattern Matching

```bash
# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"

# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"

# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"
```
### Content Filtering

```bash
# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"

# Filter responses by status code
katana -u https://example.com -match-condition "status == 200"

# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"
```
## Proxy and Network Options

```bash
# Use an HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080

# Use a SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080

# Set custom headers
katana -u https://example.com -header "User-Agent: Mozilla/5.0" -header "Cookie: session=123456"

# Set custom cookies
katana -u https://example.com -cookie "session=123456; user=admin"
```
## Miscellaneous Features

### Automatic Form Filling

```bash
# Enable automatic form filling
katana -u https://example.com -form-fill

# Set custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
```
### Crawling Specific Paths

```bash
# Crawl specific paths
katana -u https://example.com -paths /admin,/login,/dashboard

# Crawl paths from a file
katana -u https://example.com -paths-file paths.txt
```
### Storing Responses

```bash
# Store all responses
katana -u https://example.com -store-response

# Specify the response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
```
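Once responses are stored on disk, they can be searched offline. A minimal sketch (plain grep, directory name taken from the command above) for spotting likely secrets in the saved responses:

```bash
# Search stored responses for interesting keywords
grep -riE "api[_-]?key|secret|token" responses/ | cut -d: -f1 | sort -u
```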
## Troubleshooting

### Common Issues

- **JavaScript Parsing Issues**

  ```bash
  # Increase the headless browser timeout
  katana -u https://example.com -js-crawl -headless-timeout 30

  # Specify the Chrome path manually
  katana -u https://example.com -js-crawl -chrome-path /usr/bin/google-chrome
  ```

- **Rate Limiting by the Target**

  ```bash
  # Reduce concurrency
  katana -u https://example.com -concurrency 5

  # Add a delay between requests
  katana -u https://example.com -delay 500
  ```

- **Memory Issues**

  ```bash
  # Limit the maximum number of URLs to crawl
  katana -u https://example.com -max-urls 500

  # Disable JavaScript parsing
  katana -u https://example.com -no-js-crawl
  ```

- **Crawling Scope Issues**

  ```bash
  # Restrict crawling to the exact domain
  katana -u https://example.com -crawl-scope strict

  # Allow crawling subdomains
  katana -u https://example.com -crawl-scope subs
  ```
### Debugging

```bash
# Enable verbose mode
katana -u https://example.com -v

# Show debug information
katana -u https://example.com -debug

# Show request and response details
katana -u https://example.com -debug -show-request -show-response
```
## Configuration

### Configuration File

Katana uses a configuration file located at `$HOME/.config/katana/config.yaml`. You can adjust various settings in this file:

```yaml
# Example configuration file
concurrency: 10
delay: 100
timeout: 10
max-depth: 3
crawl-scope: strict
crawl-duration: 0
field: url,path,method
extensions: js,php,aspx
```
### Environment Variables

```bash
# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3
```
## Reference

### Command-Line Options
| Flag | Description |
|---|---|
| -u, -url | Target URL to crawl |
| -list, -l | File containing a list of URLs to crawl |
| -o, -output | File to write output to |
| -json | Write output in JSON format |
| -silent | Show only URLs in output |
| -v, -verbose | Show verbose output |
| -depth | Maximum depth to crawl (default: 2) |
| -crawl-scope | Crawling scope (strict, subs, out-of-scope) |
| -crawler | Crawler types to use (standard, js, sitemap, robots) |
| -field | Fields to display in output |
| -extension | File extensions to include |
| -exclude-extension | File extensions to exclude |
| -match-regex | Regex pattern to match URLs |
| -filter-regex | Regex pattern to filter URLs |
| -match-condition | Condition to match URLs |
| -form-fill | Enable automatic form filling |
| -js-crawl | Enable JavaScript parsing |
| -headless-timeout | Timeout for the headless browser (seconds) |
| -chrome-path | Path to the Chrome browser |
| -concurrency | Number of concurrent requests |
| -delay | Delay between requests (milliseconds) |
| -rate-limit | Maximum number of requests per second |
| -timeout | Timeout for HTTP requests (seconds) |
| -max-urls | Maximum number of URLs to crawl |
| -proxy | HTTP/SOCKS5 proxy to use |
| -header | Custom header to add to all requests |
| -cookie | Custom cookies to add to all requests |
| -paths | Specific paths to crawl |
| -paths-file | File containing paths to crawl |
| -store-response | Store all responses |
| -store-response-dir | Directory to store responses |
| -version | Show Katana version |
### Crawling Scopes

| Scope | Description |
|---|---|
| strict | Crawl only the exact domain provided |
| subs | Crawl the domain and its subdomains |
| out-of-scope | Crawl any domain, regardless of the initial domain |
### Crawler Types

| Type | Description |
|---|---|
| standard | Standard HTTP crawler |
| js | JavaScript parser using a headless browser |
| sitemap | Sitemap-based crawler |
| robots | Robots.txt-based crawler |
### Field Options

| Field | Description |
|---|---|
| url | Full URL |
| path | URL path |
| method | HTTP method |
| host | Host part of the URL |
| fqdn | Fully qualified domain name |
| scheme | URL scheme (http/https) |
| port | URL port |
| query | Query parameters |
| fragment | URL fragment |
| endpoint | URL endpoint |
## Resources

- [Official Documentation](https://docs.projectdiscovery.io/tools/katana)
- [GitHub Repository](https://github.com/projectdiscovery/katana)
- [Project Discovery Discord](https://discord.gg/projectdiscovery)
---

*This cheat sheet provides a comprehensive reference for using Katana, from basic crawling to advanced filtering and integration with other tools. Always consult the official documentation for the most up-to-date information.*