Katana Web Crawler Cheat Sheet
Overview
Katana is a fast and customizable web crawling framework from ProjectDiscovery. It is designed to crawl websites efficiently, gathering information and discovering endpoints, and it stands out for its speed, flexibility, and focus on security testing.
What makes Katana distinctive is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It handles complex web technologies and extracts valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, making it an excellent tool for reconnaissance during security assessments and bug bounty hunting.
Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be tuned to focus on specific resource types or to follow particular patterns, so it adapts to a range of security testing scenarios. The tool integrates easily into security testing workflows and combines well with other ProjectDiscovery tools for comprehensive reconnaissance.
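As a quick illustration of that tool chaining, the sketch below feeds subfinder and httpx output into Katana (both pipelines are covered in detail later in this sheet); the output filename is an arbitrary placeholder:
```bash
# Hypothetical recon sketch: enumerate subdomains, keep live hosts,
# crawl them, and collect the deduplicated URLs.
subfinder -d example.com -silent \
  | httpx -silent \
  | katana -silent \
  | sort -u > recon-urls.txt
```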
Installation
Using Go
```bash
# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest

# Verify installation
katana -version
```
Using Docker
```bash
# Pull the latest Docker image
docker pull projectdiscovery/katana:latest

# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h
```
Using Homebrew (macOS)
```bash
# Install using Homebrew
brew install katana

# Verify installation
katana -version
```
Using PDTM (ProjectDiscovery Tools Manager)
```bash
# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest

# Install Katana using PDTM
pdtm -i katana

# Verify installation
katana -version
```
On Kali Linux
```bash
# Install using apt
sudo apt install katana

# Verify installation
katana -version
```
Basic Usage
Crawling a Single URL
```bash
# Crawl a single URL
katana -u https://example.com

# Crawl with increased verbosity
katana -u https://example.com -v

# Crawl with debug information
katana -u https://example.com -debug
```
Crawling Multiple URLs
```bash
# Crawl multiple URLs
katana -u https://example.com,https://test.com

# Crawl from a list of URLs
katana -list urls.txt

# Crawl from STDIN
cat urls.txt | katana
```
Output Options
```bash
# Save results to a file
katana -u https://example.com -o results.txt

# Output in JSON format
katana -u https://example.com -json -o results.json

# Silent mode (only URLs)
katana -u https://example.com -silent
```
Crawling Options
Crawl Depth and Scope
```bash
# Set crawling depth (default: 2)
katana -u https://example.com -depth 3

# Crawl the domain and its subdomains
katana -u https://example.com -crawl-scope subs

# Crawl without scope restrictions
katana -u https://example.com -crawl-scope out-of-scope

# Crawl only the exact domain provided
katana -u https://example.com -crawl-scope strict
```
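Depth and scope are commonly tuned together. A plausible combination for a subdomain-wide assessment, using only the flags shown above (the output filename is illustrative):
```bash
# Crawl example.com and its subdomains three levels deep,
# saving every discovered URL.
katana -u https://example.com -depth 3 -crawl-scope subs -o scoped-crawl.txt
```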
Crawling Strategies
```bash
# Use standard crawler
katana -u https://example.com -crawler standard

# Use JavaScript parser
katana -u https://example.com -crawler js

# Use sitemap-based crawler
katana -u https://example.com -crawler sitemap

# Use robots.txt-based crawler
katana -u https://example.com -crawler robots

# Use all crawlers
katana -u https://example.com -crawler standard,js,sitemap,robots
```
Field Selection
```bash
# Display specific fields
katana -u https://example.com -field url,path,method

# Available fields: url, path, method, host, fqdn, scheme, port, query, fragment, endpoint
```
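Field selection pairs well with standard Unix tooling. A small sketch, assuming the `-field` behavior shown above, that reduces a crawl to a deduplicated list of paths:
```bash
# Emit only the path component of each URL, then deduplicate.
katana -u https://example.com -field path -silent | sort -u > paths.txt
```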
Advanced Usage
URL Filtering
```bash
# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"

# Filter URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"

# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"
```
Resource Filtering
```bash
# Include specific file extensions
katana -u https://example.com -extension js,php,aspx

# Exclude specific file extensions
katana -u https://example.com -exclude-extension png,jpg,gif

# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html
```
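A common recon pattern combines extension filtering with silent output to build an artifact list for later review; a sketch with a placeholder filename:
```bash
# Collect all JavaScript files for offline analysis
# (hardcoded secrets, hidden endpoints, source maps).
katana -u https://example.com -extension js -silent | sort -u > js-files.txt
```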
Form Filling
```bash
# Enable automatic form filling
katana -u https://example.com -form-fill

# Use custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
```
JavaScript Parsing
```bash
# Enable JavaScript parsing
katana -u https://example.com -js-crawl

# Set headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20

# Set browser path
katana -u https://example.com -js-crawl -chrome-path /path/to/chrome
```
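For a JavaScript-heavy single-page application these options are typically combined. A sketch using only the flags introduced above (the hostname and Chrome path are placeholders):
```bash
# Headless crawl of a SPA: parse JavaScript, allow extra time for
# rendering, and point Katana at a specific Chrome binary.
katana -u https://spa.example.com -js-crawl -headless-timeout 30 \
  -chrome-path /usr/bin/google-chrome
```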
Performance Tuning
Concurrency and Rate Limiting
```bash
# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20

# Set delay between requests (milliseconds)
katana -u https://example.com -delay 100

# Set rate limit (requests per second)
katana -u https://example.com -rate-limit 50
```
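The three knobs interact: concurrency bounds parallel requests, while delay and rate limit bound request frequency. A conservative "low and slow" sketch for fragile targets, assuming the flags above:
```bash
# Polite crawl: few parallel workers, a pause between requests,
# and a hard cap on requests per second.
katana -u https://example.com -concurrency 5 -delay 200 -rate-limit 10
```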
Timeout Options
```bash
# Set timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10

# Set timeout for headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30
```
Optimizing Large Scans
```bash
# Disable automatic form filling for faster crawling
katana -u https://example.com -no-form-fill

# Disable JavaScript parsing for faster crawling
katana -u https://example.com -no-js-crawl

# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 1000
```
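For broad sweeps these switches are usually applied together with a URL list; one plausible combination (the filenames are illustrative):
```bash
# Fast, bounded sweep over many targets: skip form filling and
# JavaScript parsing, and cap the number of URLs per run.
katana -list targets.txt -no-form-fill -no-js-crawl -max-urls 1000 \
  -silent > sweep.txt
```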
Integration with Other Tools
Pipeline with Subfinder
```bash
# Find subdomains and crawl them
subfinder -d example.com -silent | katana -silent

# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent | katana -silent -extension js
```
Pipeline with HTTPX
```bash
# Probe URLs and crawl active ones
httpx -l urls.txt -silent | katana -silent

# Crawl and then probe discovered endpoints
katana -u https://example.com -silent | httpx -silent
```
Pipeline with Nuclei
```bash
# Crawl and scan for vulnerabilities
katana -u https://example.com -silent | nuclei -t cves/

# Crawl, extract JavaScript files, and scan for vulnerabilities
katana -u https://example.com -silent -extension js | nuclei -t exposures/
```
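Putting the three pipelines together yields a full subdomain-to-scan chain; a sketch reusing the template directories from the examples above:
```bash
# Full chain: subdomains -> live hosts -> crawled URLs -> vulnerability scan.
subfinder -d example.com -silent \
  | httpx -silent \
  | katana -silent \
  | nuclei -t cves/ -t exposures/
```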
Output Customization
Custom Output Format
```bash
# Output only URLs
katana -u https://example.com -silent

# Output URLs with specific fields
katana -u https://example.com -field url,path,method -o results.txt

# Count discovered URLs
katana -u https://example.com -silent | wc -l

# Sort output alphabetically
katana -u https://example.com -silent | sort
```
Filtering Output
```bash
# Filter by file extension
katana -u https://example.com -silent | grep ".js$"

# Filter by endpoint pattern
katana -u https://example.com -silent | grep "/api/"

# Find unique domains
katana -u https://example.com -silent | awk -F/ '{print $3}' | sort -u
```
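The same grep-style post-processing can isolate URLs carrying query parameters, which are prime candidates for injection testing; a small sketch (the output filename is a placeholder):
```bash
# Keep only URLs that contain query parameters.
katana -u https://example.com -silent | grep "?" | sort -u > parameterized.txt
```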
Advanced Filtering
URL Pattern Matching
```bash
# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"

# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"

# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"
```
Content Filtering
```bash
# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"

# Filter responses by status code
katana -u https://example.com -match-condition "status == 200"

# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"
```
Proxy and Network Options
```bash
# Use HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080

# Use SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080

# Set custom headers
katana -u https://example.com -header "User-Agent: Mozilla/5.0" -header "Cookie: session=123456"

# Set custom cookies
katana -u https://example.com -cookie "session=123456; user=admin"
```
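These options combine naturally for authenticated crawling through an intercepting proxy such as Burp Suite; in this sketch the bearer token is obviously a placeholder:
```bash
# Route the crawl through a local intercepting proxy and attach an
# authenticated session header to every request.
katana -u https://example.com -proxy http://127.0.0.1:8080 \
  -header "Authorization: Bearer <token>"
```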
Miscellaneous Features
Crawling Specific Paths
```bash
# Crawl specific paths
katana -u https://example.com -paths /admin,/login,/dashboard

# Crawl from a file containing paths
katana -u https://example.com -paths-file paths.txt
```
Storing Responses
```bash
# Store all responses
katana -u https://example.com -store-response

# Specify response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
```
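Stored responses can then be mined offline; a quick grep-based sketch over the directory used above (the search terms are examples, not an exhaustive list):
```bash
# Search saved responses for potentially sensitive strings.
grep -rEi "api[_-]?key|secret|token" responses/
```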
Troubleshooting
Common Issues
- **JavaScript Parsing Issues**

  ```bash
  # Increase headless browser timeout
  katana -u https://example.com -js-crawl -headless-timeout 30

  # Specify Chrome path manually
  katana -u https://example.com -js-crawl -chrome-path /usr/bin/google-chrome
  ```

- **Rate Limiting by the Target**

  ```bash
  # Reduce concurrency
  katana -u https://example.com -concurrency 5

  # Add delay between requests
  katana -u https://example.com -delay 500
  ```

- **Memory Issues**

  ```bash
  # Limit maximum URLs to crawl
  katana -u https://example.com -max-urls 500

  # Disable JavaScript parsing
  katana -u https://example.com -no-js-crawl
  ```

- **Crawling Scope Issues**

  ```bash
  # Restrict crawling to the exact domain
  katana -u https://example.com -crawl-scope strict

  # Allow crawling subdomains
  katana -u https://example.com -crawl-scope subs
  ```
Debugging
```bash
# Enable verbose mode
katana -u https://example.com -v

# Show debug information
katana -u https://example.com -debug

# Show request and response details
katana -u https://example.com -debug -show-request -show-response
```
Configuration
Configuration File
Katana uses a configuration file at `$HOME/.config/katana/config.yaml`. You can adjust various settings in this file:
```yaml
# Example configuration file
concurrency: 10
delay: 100
timeout: 10
max-depth: 3
crawl-scope: strict
crawl-duration: 0
field: url,path,method
extensions: js,php,aspx
```
Environment Variables
```bash
# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3
```
Reference
Command-Line Options
| Flag | Description |
| --- | --- |
| -u, -url | Target URL to crawl |
| -list, -l | File containing list of URLs to crawl |
| -o, -output | File to write output to |
| -json | Write output in JSON format |
| -silent | Show only URLs in output |
| -v, -verbose | Show verbose output |
| -depth | Maximum depth to crawl (default: 2) |
| -crawl-scope | Crawling scope (strict, subs, out-of-scope) |
| -crawler | Crawler types to use (standard, js, sitemap, robots) |
| -field | Fields to display in output |
| -extension | File extensions to include |
| -exclude-extension | File extensions to exclude |
| -match-regex | Regex pattern to match URLs |
| -filter-regex | Regex pattern to filter URLs |
| -match-condition | Condition to match URLs |
| -form-fill | Enable automatic form filling |
| -js-crawl | Enable JavaScript parsing |
| -headless-timeout | Timeout for headless browser (seconds) |
| -chrome-path | Path to Chrome browser |
| -concurrency | Number of concurrent requests |
| -delay | Delay between requests (milliseconds) |
| -rate-limit | Maximum number of requests per second |
| -timeout | Timeout for HTTP requests (seconds) |
| -max-urls | Maximum number of URLs to crawl |
| -proxy | HTTP/SOCKS5 proxy to use |
| -header | Custom header to add to all requests |
| -cookie | Custom cookies to add to all requests |
| -paths | Specific paths to crawl |
| -paths-file | File containing paths to crawl |
| -store-response | Store all responses |
| -store-response-dir | Directory to store responses |
| -version | Show Katana version |
Crawling Scopes
| Scope | Description |
| --- | --- |
| strict | Crawl only the exact domain provided |
| subs | Crawl the domain and its subdomains |
| out-of-scope | Crawl any domain, regardless of the initial domain |
Crawler Types
| Type | Description |
| --- | --- |
| standard | Standard HTTP crawler |
| js | JavaScript parser using headless browser |
| sitemap | Sitemap-based crawler |
| robots | Robots.txt-based crawler |
Field Options
| Field | Description |
| --- | --- |
| url | Full URL |
| path | URL path |
| method | HTTP method |
| host | Host part of URL |
| fqdn | Fully qualified domain name |
| scheme | URL scheme (http/https) |
| port | URL port |
| query | Query parameters |
| fragment | URL fragment |
| endpoint | URL endpoint |
Resources
- [Official Documentation](https://docs.projectdiscovery.io/tools/katana)
- [GitHub Repository](https://github.com/projectdiscovery/katana)
- [ProjectDiscovery Discord](https://discord.gg/projectdiscovery)
---
*This cheat sheet provides a comprehensive reference for using Katana, from basic crawling to advanced filtering and integration with other tools. Always consult the official documentation for the most up-to-date information.*