Katana Web Crawler Cheat Sheet

Overview

Katana is a fast and customizable web crawling framework from Project Discovery. It is designed to crawl websites efficiently in order to gather information and discover endpoints, and it stands out for its speed, flexibility, and focus on security testing.

What makes Katana distinctive is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, which makes it an excellent tool for reconnaissance during security assessments and bug bounty hunting.

Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be configured to focus on specific types of resources or to follow particular patterns, so it adapts to a range of security testing scenarios. The tool integrates easily into security testing workflows and can be combined with other Project Discovery tools for comprehensive reconnaissance.
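
As a quick orientation before the detailed sections below, here is a minimal first-run sketch that combines a few of the flags covered later in this sheet; adjust the target and depth to your engagement scope.

```bash
# Minimal first run: crawl two levels deep, enable JavaScript parsing,
# and save the discovered URLs for later review
katana -u https://example.com -depth 2 -js-crawl -o katana-urls.txt
```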

Installation

Using Go

```bash
# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest

# Verify installation
katana -version
```

Using Docker

```bash
# Pull the latest Docker image
docker pull projectdiscovery/katana:latest

# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h
```

Using Homebrew (macOS)

```bash
# Install using Homebrew
brew install katana

# Verify installation
katana -version
```

Using PDTM (Project Discovery Tools Manager)

```bash
# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest

# Install Katana using PDTM
pdtm -i katana

# Verify installation
katana -version
```

On Kali Linux

```bash
# Install using apt
sudo apt install katana

# Verify installation
katana -version
```

Basic Usage

Crawling a Single URL

```bash
# Crawl a single URL
katana -u https://example.com

# Crawl with increased verbosity
katana -u https://example.com -v

# Crawl with debug information
katana -u https://example.com -debug
```

Crawling Multiple URLs

```bash
# Crawl multiple URLs
katana -u https://example.com,https://test.com

# Crawl from a list of URLs
katana -list urls.txt

# Crawl from STDIN
cat urls.txt | katana
```

Output Options

```bash
# Save results to a file
katana -u https://example.com -o results.txt

# Output in JSON format
katana -u https://example.com -json -o results.json

# Silent mode (only URLs)
katana -u https://example.com -silent
```
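
If you post-process the JSON output, a line-oriented tool such as jq works well. The exact field layout depends on the Katana version; the sketch below assumes the URL is exposed either as a top-level `endpoint` field or nested under `request.endpoint`, as in recent releases.

```bash
# Extract just the discovered URLs from JSON output
# (field names vary by version; newer releases nest the URL under request.endpoint)
katana -u https://example.com -json | jq -r '.request.endpoint // .endpoint' | sort -u
```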

Crawling Options

Crawling Depth and Scope

```bash
# Set crawling depth (default: 2)
katana -u https://example.com -depth 3

# Crawl subdomains (default: false)
katana -u https://example.com -crawl-scope subs

# Crawl out of scope (default: false)
katana -u https://example.com -crawl-scope out-of-scope

# Crawl only in scope
katana -u https://example.com -crawl-scope strict
```

Crawling Strategies

```bash
# Use standard crawler
katana -u https://example.com -crawler standard

# Use JavaScript parser
katana -u https://example.com -crawler js

# Use sitemap-based crawler
katana -u https://example.com -crawler sitemap

# Use robots.txt-based crawler
katana -u https://example.com -crawler robots

# Use all crawlers
katana -u https://example.com -crawler standard,js,sitemap,robots
```

Field Selection

```bash
# Display specific fields
katana -u https://example.com -field url,path,method

# Available fields: url, path, method, host, fqdn, scheme, port, query, fragment, endpoint
```
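
The field output combines naturally with standard shell tools; for example, the following sketch uses the `path` field from above to build a deduplicated wordlist of observed paths.

```bash
# Build a deduplicated list of paths for later fuzzing or review
katana -u https://example.com -field path -silent | sort -u > paths.txt
```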

Advanced Usage

URL Filtering

```bash
# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"

# Filter URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"

# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"
```
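
Match and filter expressions can also be combined in a single run; the sketch below keeps API-looking endpoints while dropping common static assets, assuming both regex flags shown above may be supplied together.

```bash
# Keep API-looking endpoints, drop common static assets
katana -u https://example.com -match-regex "/api/|/v[0-9]+/" -filter-regex "\.(png|jpg|css|woff2?)$" -silent
```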

Resource Filtering

```bash
# Include specific file extensions
katana -u https://example.com -extension js,php,aspx

# Exclude specific file extensions
katana -u https://example.com -exclude-extension png,jpg,gif

# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html
```
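
A common recon pattern is to collect all JavaScript files first and mine them for endpoints afterwards. A rough sketch based on the extension filter above, assuming `wget` is available and using a simple heuristic grep for path-like strings:

```bash
# Collect JavaScript file URLs, download them, then grep for path-like strings
katana -u https://example.com -extension js -silent | sort -u > js-files.txt
wget -q -i js-files.txt -P js/
grep -rhoE '"/[a-zA-Z0-9_/.-]+"' js/ | sort -u
```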

Form Filling

```bash
# Enable automatic form filling
katana -u https://example.com -form-fill

# Use custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
```

JavaScript Parsing

```bash
# Enable JavaScript parsing
katana -u https://example.com -js-crawl

# Set headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20

# Set browser path
katana -u https://example.com -js-crawl -chrome-path /path/to/chrome
```

Performance Optimization

Concurrency and Rate Limiting

```bash
# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20

# Set delay between requests (milliseconds)
katana -u https://example.com -delay 100

# Set rate limit (requests per second)
katana -u https://example.com -rate-limit 50
```
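
These flags are typically combined when a target is fragile or aggressively rate limited. The following is one reasonable "low and slow" combination built only from the flags above, not an official recommendation.

```bash
# Low-impact crawl: few workers, capped request rate, small delay, short timeout
katana -u https://example.com -concurrency 5 -rate-limit 10 -delay 200 -timeout 10
```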

Timeout Options

```bash
# Set timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10

# Set timeout for headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30
```

Optimizing Large Scans

```bash
# Disable automatic form filling for faster crawling
katana -u https://example.com -no-form-fill

# Disable JavaScript parsing for faster crawling
katana -u https://example.com -no-js-crawl

# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 1000
```

Integration with Other Tools

Pipeline with Subfinder

```bash
# Find subdomains and crawl them
subfinder -d example.com -silent | katana -silent

# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent | katana -silent -extension js
```

Pipeline with HTTPX

```bash
# Probe URLs and crawl active ones
httpx -l urls.txt -silent | katana -silent

# Crawl and then probe discovered endpoints
katana -u https://example.com -silent | httpx -silent
```

Pipeline with Nuclei

```bash
# Crawl and scan for vulnerabilities
katana -u https://example.com -silent | nuclei -t cves/

# Crawl, extract JavaScript files, and scan for vulnerabilities
katana -u https://example.com -silent -extension js | nuclei -t exposures/
```
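
The three pipelines above can be chained into a single pass. The sketch below assumes subfinder, httpx, and nuclei are installed and uses only flags already shown in this sheet.

```bash
# Subdomains -> live hosts -> crawled endpoints -> vulnerability templates
subfinder -d example.com -silent \
  | httpx -silent \
  | katana -silent -depth 2 \
  | nuclei -t exposures/
```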

Output Customization

Custom Output Format

```bash
# Output only URLs
katana -u https://example.com -silent

# Output URLs with specific fields
katana -u https://example.com -field url,path,method -o results.txt

# Count discovered URLs
katana -u https://example.com -silent | wc -l

# Sort output alphabetically
katana -u https://example.com -silent | sort
```

Filtering Output

```bash
# Filter by file extension
katana -u https://example.com -silent | grep "\.js$"

# Filter by endpoint pattern
katana -u https://example.com -silent | grep "/api/"

# Find unique domains
katana -u https://example.com -silent | awk -F/ '{print $3}' | sort -u
```
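
Another useful post-filter is to keep only URLs that carry query parameters, since those are the usual starting points for injection testing; a minimal grep-based sketch:

```bash
# Keep only URLs with query parameters and deduplicate them
katana -u https://example.com -silent | grep "?" | sort -u > parameterized-urls.txt
```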

Advanced Filtering

URL Pattern Matching

```bash
# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"

# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"

# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"
```

Content Filtering

```bash
# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"

# Filter responses by status code
katana -u https://example.com -match-condition "status == 200"

# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"
```

Proxy and Network Options

```bash
# Use HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080

# Use SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080

# Set custom headers
katana -u https://example.com -header "User-Agent: Mozilla/5.0" -header "Cookie: session=123456"

# Set custom cookies
katana -u https://example.com -cookie "session=123456; user=admin"
```
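
Routing the crawl through an intercepting proxy such as Burp Suite (listening on its default 127.0.0.1:8080) is a common way to capture every request for manual follow-up; a sketch combining the proxy and cookie flags above, with the example session cookie standing in for a real authenticated value:

```bash
# Crawl an authenticated session through a local intercepting proxy
katana -u https://example.com -proxy http://127.0.0.1:8080 -cookie "session=123456" -silent
```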

Miscellaneous Features

Automatic Form Filling

```bash
# Enable automatic form filling
katana -u https://example.com -form-fill

# Set custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
```

Crawling Specific Paths

```bash
# Crawl specific paths
katana -u https://example.com -paths /admin,/login,/dashboard

# Crawl from a file containing paths
katana -u https://example.com -paths-file paths.txt
```

Storing Responses

```bash
# Store all responses
katana -u https://example.com -store-response

# Specify response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
```
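
Stored responses can then be searched offline. The sketch below assumes the files live under the `responses/` directory from the previous command and simply greps them for potentially interesting strings.

```bash
# Search saved responses for potentially sensitive keywords
grep -rniE "api[_-]?key|secret|password" responses/ | head -n 50
```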

Troubleshooting

Common Issues

1. **JavaScript Parsing Issues**

```bash
# Increase headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 30

# Specify Chrome path manually
katana -u https://example.com -js-crawl -chrome-path /usr/bin/google-chrome
```

2. **Rate Limiting by the Target**

```bash
# Reduce concurrency
katana -u https://example.com -concurrency 5

# Add delay between requests
katana -u https://example.com -delay 500
```

3. **Memory Issues**

```bash
# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 500

# Disable JavaScript parsing
katana -u https://example.com -no-js-crawl
```

4. **Crawling Scope Issues**

```bash
# Restrict crawling to specific domain
katana -u https://example.com -crawl-scope strict

# Allow crawling subdomains
katana -u https://example.com -crawl-scope subs
```

Debugging

```bash
# Enable verbose mode
katana -u https://example.com -v

# Show debug information
katana -u https://example.com -debug

# Show request and response details
katana -u https://example.com -debug -show-request -show-response
```

Configuration

Configuration File

Katana uses a configuration file located at $HOME/.config/katana/config.yaml. You can adjust various settings in this file:

```yaml
# Example configuration file
concurrency: 10
delay: 100
timeout: 10
max-depth: 3
crawl-scope: strict
crawl-duration: 0
field: url,path,method
extensions: js,php,aspx
```

Environment Variables

```bash
# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3
```

Reference

Command-Line Options

| Flag | Description |
| --- | --- |
| -u, -url | Target URL to crawl |
| -list, -l | File containing list of URLs to crawl |
| -o, -output | File to write output to |
| -json | Write output in JSON format |
| -silent | Show only URLs in output |
| -v, -verbose | Show verbose output |
| -depth | Maximum depth to crawl (default: 2) |
| -crawl-scope | Crawling scope (strict, subs, out-of-scope) |
| -crawler | Crawler types to use (standard, js, sitemap, robots) |
| -field | Fields to display in output |
| -extension | File extensions to include |
| -exclude-extension | File extensions to exclude |
| -match-regex | Regex pattern to match URLs |
| -filter-regex | Regex pattern to filter URLs |
| -match-condition | Condition to match URLs |
| -form-fill | Enable automatic form filling |
| -js-crawl | Enable JavaScript parsing |
| -headless-timeout | Timeout for headless browser (seconds) |
| -chrome-path | Path to Chrome browser |
| -concurrency | Number of concurrent requests |
| -delay | Delay between requests (milliseconds) |
| -rate-limit | Maximum number of requests per second |
| -timeout | Timeout for HTTP requests (seconds) |
| -max-urls | Maximum number of URLs to crawl |
| -proxy | HTTP/SOCKS5 proxy to use |
| -header | Custom header to add to all requests |
| -cookie | Custom cookies to add to all requests |
| -paths | Specific paths to crawl |
| -paths-file | File containing paths to crawl |
| -store-response | Store all responses |
| -store-response-dir | Directory to store responses |
| -version | Show Katana version |

Crawling Scopes

| Scope | Description |
| --- | --- |
| strict | Crawl only the exact domain provided |
| subs | Crawl the domain and its subdomains |
| out-of-scope | Crawl any domain, regardless of the initial domain |

Crawler Types

| Type | Description |
| --- | --- |
| standard | Standard HTTP crawler |
| js | JavaScript parser using headless browser |
| sitemap | Sitemap-based crawler |
| robots | Robots.txt-based crawler |

Field Options

| Field | Description |
| --- | --- |
| url | Full URL |
| path | URL path |
| method | HTTP method |
| host | Host part of URL |
| fqdn | Fully qualified domain name |
| scheme | URL scheme (http/https) |
| port | URL port |
| query | Query parameters |
| fragment | URL fragment |
| endpoint | URL endpoint |

Resources

  • [Official documentation](https://docs.projectdiscovery.io/tools/katana)
  • [GitHub repository](https://github.com/projectdiscovery/katana)
  • [Project Discovery Discord](https://discord.gg/projectdiscovery)

---

*This cheat sheet provides a comprehensive reference for using Katana, from basic crawling to advanced filtering and integration with other tools. Always consult the official documentation for the most up-to-date information.*