# Katana Web Crawler Cheat Sheet

## Overview

Katana is a fast and customizable web crawling framework from Project Discovery. It is designed to crawl websites efficiently, gather information, and discover endpoints. Katana stands out for its speed, flexibility, and focus on security testing.

What makes Katana distinctive is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, making it an excellent tool for reconnaissance during security assessments and bug bounty hunting.

Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be tuned to focus on specific types of resources or to follow particular patterns, which makes it adaptable to different security testing scenarios. The tool integrates easily into security testing workflows and can be combined with other Project Discovery tools for comprehensive reconnaissance.
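As a quick illustration of that workflow, here is a minimal sketch that chains Katana with other Project Discovery tools using flags covered later in this sheet; the target domain and template directory are placeholders.

```bash
# Illustrative recon pipeline (example.com and cves/ are placeholders):
# enumerate subdomains, probe for live hosts, crawl them, and feed
# discovered URLs into nuclei for vulnerability scanning.
subfinder -d example.com -silent | httpx -silent | katana -silent -depth 2 | nuclei -t cves/
```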
## Installation

### Using Go

```bash
# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest

# Verify installation
katana -version
```
### Using Docker

```bash
# Pull the latest Docker image
docker pull projectdiscovery/katana:latest

# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h
```
### Using Homebrew (macOS)

```bash
# Install using Homebrew
brew install katana

# Verify installation
katana -version
```
### Using PDTM (Project Discovery Tools Manager)

```bash
# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest

# Install Katana using PDTM
pdtm -i katana

# Verify installation
katana -version
```
### On Kali Linux

```bash
# Install using apt
sudo apt install katana

# Verify installation
katana -version
```
## Basic Usage

### Crawling a Single URL

```bash
# Crawl a single URL
katana -u https://example.com

# Crawl with increased verbosity
katana -u https://example.com -v

# Crawl with debug information
katana -u https://example.com -debug
```
### Crawling Multiple URLs

```bash
# Crawl multiple URLs
katana -u https://example.com,https://test.com

# Crawl from a list of URLs
katana -list urls.txt

# Crawl from STDIN
cat urls.txt | katana
```
### Output Options

```bash
# Save results to a file
katana -u https://example.com -o results.txt

# Output in JSON format
katana -u https://example.com -json -o results.json

# Silent mode (only URLs)
katana -u https://example.com -silent
```
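The JSON output can be post-processed with standard tools. The following is a minimal sketch assuming `jq` is installed and that each JSON record exposes the crawled URL under `.request.endpoint`; the exact schema can vary between Katana versions, so verify it against your own output first.

```bash
# Extract discovered endpoints from JSON output (field name assumed; check your Katana version's schema)
katana -u https://example.com -json -o results.json
jq -r '.request.endpoint' results.json | sort -u
```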
## Crawling Options

### Crawling Depth and Scope

```bash
# Set crawling depth (default: 2)
katana -u https://example.com -depth 3

# Crawl subdomains
katana -u https://example.com -crawl-scope subs

# Crawl out of scope
katana -u https://example.com -crawl-scope out-of-scope

# Crawl only in scope
katana -u https://example.com -crawl-scope strict
```
### Crawling Strategies

```bash
# Use the standard crawler
katana -u https://example.com -crawler standard

# Use the JavaScript parser
katana -u https://example.com -crawler js

# Use the sitemap-based crawler
katana -u https://example.com -crawler sitemap

# Use the robots.txt-based crawler
katana -u https://example.com -crawler robots

# Use all crawlers
katana -u https://example.com -crawler standard,js,sitemap,robots
```
### Field Selection

```bash
# Display specific fields
katana -u https://example.com -field url,path,method

# Available fields: url, path, method, host, fqdn, scheme, port, query, fragment, endpoint
```
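Field selection combines well with shell post-processing. A small sketch that collects the unique paths discovered on a target for later testing:

```bash
# Deduplicate discovered paths into a file (shell post-processing, not a Katana feature)
katana -u https://example.com -silent -field path | sort -u > paths.txt
```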
## Advanced Usage

### URL Filtering

```bash
# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"

# Filter out URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"

# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"
```
### Resource Filtering

```bash
# Include specific file extensions
katana -u https://example.com -extension js,php,aspx

# Exclude specific file extensions
katana -u https://example.com -exclude-extension png,jpg,gif

# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html
```
### Form Filling

```bash
# Enable automatic form filling
katana -u https://example.com -form-fill

# Use custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
```
### JavaScript Parsing

```bash
# Enable JavaScript parsing
katana -u https://example.com -js-crawl

# Set the headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20

# Set the browser path
katana -u https://example.com -js-crawl -chrome-path /path/to/chrome
```
## Performance Optimization

### Concurrency and Rate Limiting

```bash
# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20

# Set delay between requests (milliseconds)
katana -u https://example.com -delay 100

# Set rate limit (requests per second)
katana -u https://example.com -rate-limit 50
```
### Timeout Options

```bash
# Set timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10

# Set timeout for the headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30
```
### Optimization for Large Scans

```bash
# Disable automatic form filling for faster crawling
katana -u https://example.com -no-form-fill

# Disable JavaScript parsing for faster crawling
katana -u https://example.com -no-js-crawl

# Limit the maximum number of URLs to crawl
katana -u https://example.com -max-urls 1000
```
## Integration with Other Tools

### Pipeline with Subfinder

```bash
# Find subdomains and crawl them
subfinder -d example.com -silent | katana -silent

# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent | katana -silent -extension js
```
### Pipeline with HTTPX

```bash
# Probe URLs and crawl the active ones
httpx -l urls.txt -silent | katana -silent

# Crawl and then probe discovered endpoints
katana -u https://example.com -silent | httpx -silent
```
### Pipeline with Nuclei

```bash
# Crawl and scan for vulnerabilities
katana -u https://example.com -silent | nuclei -t cves/

# Crawl, extract JavaScript files, and scan for exposures
katana -u https://example.com -silent -extension js | nuclei -t exposures/
```
## Output Customization

### Custom Output Format

```bash
# Output only URLs
katana -u https://example.com -silent

# Output URLs with specific fields
katana -u https://example.com -field url,path,method -o results.txt

# Count discovered URLs
katana -u https://example.com -silent | wc -l

# Sort output alphabetically
katana -u https://example.com -silent | sort
```
### Filtering Output

```bash
# Filter by file extension
katana -u https://example.com -silent | grep "\.js$"

# Filter by endpoint pattern
katana -u https://example.com -silent | grep "/api/"

# Find unique domains
katana -u https://example.com -silent | awk -F/ '{print $3}' | sort -u
```
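Standard shell tools can also summarize a crawl. Below is a small sketch (plain shell post-processing, not a Katana feature) that pulls the unique query-parameter names out of discovered URLs:

```bash
# List unique query parameter names found in crawled URLs
katana -u https://example.com -silent \
  | grep -oE '[?&][^=&#]+=' \
  | tr -d '?&=' \
  | sort -u
```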
## Advanced Filtering

### URL Pattern Matching

```bash
# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"

# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"

# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"
```
### Content Filtering

```bash
# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"

# Filter responses by status code
katana -u https://example.com -match-condition "status == 200"

# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"
```
## Proxy and Network Options

```bash
# Use an HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080

# Use a SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080

# Set custom headers
katana -u https://example.com -header "User-Agent: Mozilla/5.0" -header "Cookie: session=123456"

# Set custom cookies
katana -u https://example.com -cookie "session=123456; user=admin"
```
## Miscellaneous Features

### Automatic Form Filling

```bash
# Enable automatic form filling
katana -u https://example.com -form-fill

# Set custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
```
### Crawling Specific Paths

```bash
# Crawl specific paths
katana -u https://example.com -paths /admin,/login,/dashboard

# Crawl paths from a file
katana -u https://example.com -paths-file paths.txt
```
### Storing Responses

```bash
# Store all responses
katana -u https://example.com -store-response

# Specify the response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
```
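Once responses are stored on disk, they can be searched offline. A minimal sketch (plain grep, directory name taken from the command above) for spotting likely secrets in the saved responses:

```bash
# Search stored responses for interesting keywords
grep -riE "api[_-]?key|secret|token" responses/ | cut -d: -f1 | sort -u
```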
## Troubleshooting

### Common Issues

- **JavaScript Parsing Issues**

  ```bash
  # Increase the headless browser timeout
  katana -u https://example.com -js-crawl -headless-timeout 30

  # Specify the Chrome path manually
  katana -u https://example.com -js-crawl -chrome-path /usr/bin/google-chrome
  ```

- **Rate Limiting by the Target**

  ```bash
  # Reduce concurrency
  katana -u https://example.com -concurrency 5

  # Add a delay between requests
  katana -u https://example.com -delay 500
  ```

- **Memory Issues**

  ```bash
  # Limit the maximum number of URLs to crawl
  katana -u https://example.com -max-urls 500

  # Disable JavaScript parsing
  katana -u https://example.com -no-js-crawl
  ```

- **Crawling Scope Issues**

  ```bash
  # Restrict crawling to the exact domain
  katana -u https://example.com -crawl-scope strict

  # Allow crawling subdomains
  katana -u https://example.com -crawl-scope subs
  ```
### Debugging

```bash
# Enable verbose mode
katana -u https://example.com -v

# Show debug information
katana -u https://example.com -debug

# Show request and response details
katana -u https://example.com -debug -show-request -show-response
```
## Configuration

### Configuration File

Katana uses a configuration file located at `$HOME/.config/katana/config.yaml`. You can adjust various settings in this file:

```yaml
# Example configuration file
concurrency: 10
delay: 100
timeout: 10
max-depth: 3
crawl-scope: strict
crawl-duration: 0
field: url,path,method
extensions: js,php,aspx
```
### Environment Variables

```bash
# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3
```
## Reference

### Command-Line Options
| Flag | Description |
|---|---|
| -u, -url | Target URL to crawl |
| -list, -l | File containing a list of URLs to crawl |
| -o, -output | File to write output to |
| -json | Write output in JSON format |
| -silent | Show only URLs in output |
| -v, -verbose | Show verbose output |
| -depth | Maximum depth to crawl (default: 2) |
| -crawl-scope | Crawling scope (strict, subs, out-of-scope) |
| -crawler | Crawler types to use (standard, js, sitemap, robots) |
| -field | Fields to display in output |
| -extension | File extensions to include |
| -exclude-extension | File extensions to exclude |
| -match-regex | Regex pattern to match URLs |
| -filter-regex | Regex pattern to filter URLs |
| -match-condition | Condition to match URLs |
| -form-fill | Enable automatic form filling |
| -js-crawl | Enable JavaScript parsing |
| -headless-timeout | Timeout for the headless browser (seconds) |
| -chrome-path | Path to the Chrome browser |
| -concurrency | Number of concurrent requests |
| -delay | Delay between requests (milliseconds) |
| -rate-limit | Maximum number of requests per second |
| -timeout | Timeout for HTTP requests (seconds) |
| -max-urls | Maximum number of URLs to crawl |
| -proxy | HTTP/SOCKS5 proxy to use |
| -header | Custom header to add to all requests |
| -cookie | Custom cookies to add to all requests |
| -paths | Specific paths to crawl |
| -paths-file | File containing paths to crawl |
| -store-response | Store all responses |
| -store-response-dir | Directory to store responses |
| -version | Show Katana version |
### Crawling Scopes

| Scope | Description |
|---|---|
| strict | Crawl only the exact domain provided |
| subs | Crawl the domain and its subdomains |
| out-of-scope | Crawl any domain, regardless of the initial domain |
### Crawler Types

| Type | Description |
|---|---|
| standard | Standard HTTP crawler |
| js | JavaScript parser using a headless browser |
| sitemap | Sitemap-based crawler |
| robots | Robots.txt-based crawler |
### Field Options

| Field | Description |
|---|---|
| url | Full URL |
| path | URL path |
| method | HTTP method |
| host | Host part of the URL |
| fqdn | Fully qualified domain name |
| scheme | URL scheme (http/https) |
| port | URL port |
| query | Query parameters |
| fragment | URL fragment |
| endpoint | URL endpoint |
## Resources

- [Official Documentation](https://docs.projectdiscovery.io/tools/katana)
- [GitHub Repository](https://github.com/projectdiscovery/katana)
- [Project Discovery Discord](https://discord.gg/projectdiscovery)
---

*This cheat sheet provides a comprehensive reference for using Katana, from basic crawling to advanced filtering and integration with other tools. Always consult the official documentation for the most up-to-date information.*