
Katana Web Crawler Cheat Sheet

Overview

Katana is a fast and customizable web crawling framework from Project Discovery. It is designed to crawl websites efficiently, gathering information and discovering endpoints. Katana stands out for its speed, flexibility, and focus on security testing.

What makes Katana unique is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, making it an excellent tool for reconnaissance during security assessments and bug bounty hunting.

Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be tuned to focus on specific types of resources or follow particular patterns, so it adapts to a range of security testing scenarios. The tool integrates easily into security testing workflows and can be combined with other Project Discovery tools for comprehensive reconnaissance.
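
As a quick start, the pieces covered below chain together into a single reconnaissance pipeline. A minimal sketch, assuming subfinder and nuclei (both covered in the integration section below) are installed and example.com is in scope:

# Enumerate subdomains, crawl them, and scan the discovered URLs
subfinder -d example.com -silent | katana -silent | nuclei -t exposures/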

Installation

Using Go

# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest

# Verify installation
katana -version

Using Docker

# Pull the latest Docker image
docker pull projectdiscovery/katana:latest

# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h

Using Homebrew (macOS)

# Install using Homebrew
brew install katana

# Verify installation
katana -version

Using PDTM (Project Discovery Tools Manager)

# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest

# Install Katana using PDTM
pdtm -i katana

# Verify installation
katana -version

On Kali Linux

# Install using apt
sudo apt install katana

# Verify installation
katana -version

Basic Usage

Crawling a Single URL

# Crawl a single URL
katana -u https://example.com

# Crawl with increased verbosity
katana -u https://example.com -v

# Crawl with debug information
katana -u https://example.com -debug

Crawling Multiple URLs

# Crawl multiple URLs
katana -u https://example.com,https://test.com

# Crawl from a list of URLs
katana -list urls.txt

# Crawl from STDIN
cat urls.txt | katana

Output Options

# Save results to a file
katana -u https://example.com -o results.txt

# Output in JSON format
katana -u https://example.com -json -o results.json

# Silent mode (only URLs)
katana -u https://example.com -silent

Crawling Options

Crawl Depth and Scope

# Set crawling depth (default: 2)
katana -u https://example.com -depth 3

# Crawl subdomains (default: false)
katana -u https://example.com -crawl-scope subs

# Crawl out of scope (default: false)
katana -u https://example.com -crawl-scope out-of-scope

# Crawl only in scope
katana -u https://example.com -crawl-scope strict

Crawling Strategies

# Use standard crawler
katana -u https://example.com -crawler standard

# Use JavaScript parser
katana -u https://example.com -crawler js

# Use sitemap-based crawler
katana -u https://example.com -crawler sitemap

# Use robots.txt-based crawler
katana -u https://example.com -crawler robots

# Use all crawlers
katana -u https://example.com -crawler standard,js,sitemap,robots

Field Selection

# Display specific fields
katana -u https://example.com -field url,path,method

# Available fields: url, path, method, host, fqdn, scheme, port, query, fragment, endpoint
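
Field selection is useful for trimming output to exactly what a downstream step needs. A small sketch, assuming the endpoint field listed above is what you want for wordlist building:

# Collect unique endpoints as a wordlist (sketch)
katana -u https://example.com -field endpoint -silent | sort -u > endpoints.txt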

Advanced Usage

URL Filtering

# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"

# Filter URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"

# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"

Resource Filtering

# Include specific file extensions
katana -u https://example.com -extension js,php,aspx

# Exclude specific file extensions
katana -u https://example.com -exclude-extension png,jpg,gif

# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html

Form Filling

# Enable automatic form filling
katana -u https://example.com -form-fill

# Use custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"

JavaScript Parsing

# Enable JavaScript parsing
katana -u https://example.com -js-crawl

# Set headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20

# Set browser path
katana -u https://example.com -js-crawl -chrome-path /path/to/chrome

Performance Optimization

Concurrency and Rate Limiting

# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20

# Set delay between requests (milliseconds)
katana -u https://example.com -delay 100

# Set rate limit (requests per second)
katana -u https://example.com -rate-limit 50

Timeout Options

# Set timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10

# Set timeout for headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30

Optimizing Large Scans

# Disable automatic form filling for faster crawling
katana -u https://example.com -no-form-fill

# Disable JavaScript parsing for faster crawling
katana -u https://example.com -no-js-crawl

# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 1000
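
Combining these options yields a much faster pass over a large scope. A sketch that trades depth and JavaScript coverage for speed, assuming targets.txt holds the in-scope hosts:

# Fast, bounded crawl across many targets (all flags documented above)
katana -list targets.txt -no-js-crawl -max-urls 1000 -concurrency 20 -rate-limit 50 -o fast-crawl.txt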

Integration with Other Tools

Pipeline with Subfinder

# Find subdomains and crawl them
subfinder -d example.com -silent | katana -silent

# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent | katana -silent -extension js

Pipeline with HTTPX

# Probe URLs and crawl active ones
httpx -l urls.txt -silent | katana -silent

# Crawl and then probe discovered endpoints
katana -u https://example.com -silent | httpx -silent

Pipeline with Nuclei

# Crawl and scan for vulnerabilities
katana -u https://example.com -silent | nuclei -t cves/

# Crawl, extract JavaScript files, and scan for vulnerabilities
katana -u https://example.com -silent -extension js | nuclei -t exposures/
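
These pipelines also compose end to end. A sketch of a fuller chain, assuming all four Project Discovery tools are on the PATH:

# Subdomains -> live hosts -> crawled URLs -> vulnerability scan
subfinder -d example.com -silent | httpx -silent | katana -silent | nuclei -t cves/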

Output Customization

Custom Output Formats

# Output only URLs
katana -u https://example.com -silent

# Output URLs with specific fields
katana -u https://example.com -field url,path,method -o results.txt

# Count discovered URLs
katana -u https://example.com -silent | wc -l

# Sort output alphabetically
katana -u https://example.com -silent | sort

Filtering Output

# Filter by file extension
katana -u https://example.com -silent | grep "\.js$"

# Filter by endpoint pattern
katana -u https://example.com -silent | grep "/api/"

# Find unique domains
katana -u https://example.com -silent | awk -F/ '{print $3}' | sort -u

Advanced Filtering

URL Pattern Matching

# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"

# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"

# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"

Content Filtering

# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"

# Filter responses by status code
katana -u https://example.com -match-condition "status == 200"

# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"

Proxy and Network Options

# Use HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080

# Use SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080

# Set custom headers
katana -u https://example.com -header "User-Agent: Mozilla/5.0" -header "Cookie: session=123456"

# Set custom cookies
katana -u https://example.com -cookie "session=123456; user=admin"
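
A common setup is to push the whole crawl through a local intercepting proxy so every request also lands in Burp or ZAP for manual inspection. A sketch, assuming a listener on 127.0.0.1:8080:

# Crawl via a local intercepting proxy and keep a copy of the discovered URLs
katana -u https://example.com -proxy http://127.0.0.1:8080 -o proxied-urls.txt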

Additional Features

Automatic Form Filling

# Enable automatic form filling
katana -u https://example.com -form-fill

# Set custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"

Crawling Specific Paths

# Crawl specific paths
katana -u https://example.com -paths /admin,/login,/dashboard

# Crawl from a file containing paths
katana -u https://example.com -paths-file paths.txt

Storing Responses

# Store all responses
katana -u https://example.com -store-response

# Specify response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
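
Stored responses can then be mined offline with standard tools. A sketch, assuming the responses/ directory from above (the exact file layout may vary between Katana versions):

# Grep stored response bodies for likely secrets
grep -rniE "api[_-]?key|secret|token" responses/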

Troubleshooting

Common Issues

1. **JavaScript Parsing Issues**
```bash
   # Increase headless browser timeout
   katana -u https://example.com -js-crawl -headless-timeout 30

   # Specify Chrome path manually
   katana -u https://example.com -js-crawl -chrome-path /usr/bin/google-chrome
   ```

2. **Rate Limiting by the Target**
```bash
   # Reduce concurrency
   katana -u https://example.com -concurrency 5

   # Add delay between requests
   katana -u https://example.com -delay 500
   ```

3. **Memory Issues**
```bash
   # Limit maximum URLs to crawl
   katana -u https://example.com -max-urls 500

   # Disable JavaScript parsing
   katana -u https://example.com -no-js-crawl
   ```

4. **Crawling Scope Issues**
```bash
   # Restrict crawling to specific domain
   katana -u https://example.com -crawl-scope strict

   # Allow crawling subdomains
   katana -u https://example.com -crawl-scope subs
   ```

Debugging

# Enable verbose mode
katana -u https://example.com -v

# Show debug information
katana -u https://example.com -debug

# Show request and response details
katana -u https://example.com -debug -show-request -show-response

Configuration

Configuration File

Katana uses a configuration file at $HOME/.config/katana/config.yaml. You can adjust various settings in this file:

# Example configuration file
concurrency: 10
delay: 100
timeout: 10
max-depth: 3
crawl-scope: strict
crawl-duration: 0
field: url,path,method
extensions: js,php,aspx

Environment Variables

# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3

Reference

Command-Line Options

| Flag | Description |
|------|-------------|
| `-u` | Target URL to crawl |
| `-list` | File containing list of URLs to crawl |
| `-o` | File to write output to |
| `-json` | Write output in JSON format |
| `-silent` | Show only URLs in output |
| `-v` | Show verbose output |
| `-depth` | Maximum depth to crawl (default: 2) |
| `-crawl-scope` | Crawling scope (strict, subs, out-of-scope) |
| `-crawler` | Crawler types to use (standard, js, sitemap, robots) |
| `-field` | Fields to display in output |
| `-extension` | File extensions to include |
| `-exclude-extension` | File extensions to exclude |
| `-match-regex` | Regex pattern to match URLs |
| `-filter-regex` | Regex pattern to filter URLs |
| `-match-condition` | Condition to match URLs |
| `-form-fill` | Enable automatic form filling |
| `-js-crawl` | Enable JavaScript parsing |
| `-headless-timeout` | Timeout for headless browser (seconds) |
| `-chrome-path` | Path to Chrome browser |
| `-concurrency` | Number of concurrent requests |
| `-delay` | Delay between requests (milliseconds) |
| `-rate-limit` | Maximum number of requests per second |
| `-timeout` | Timeout for HTTP requests (seconds) |
| `-max-urls` | Maximum number of URLs to crawl |
| `-proxy` | HTTP/SOCKS5 proxy to use |
| `-header` | Custom header to add to all requests |
| `-cookie` | Custom cookies to add to all requests |
| `-paths` | Specific paths to crawl |
| `-paths-file` | File containing paths to crawl |
| `-store-response` | Store all responses |
| `-store-response-dir` | Directory to store responses |
| `-version` | Show Katana version |

Crawling Scopes

| Scope | Description |
|-------|-------------|
| `strict` | Crawl only the exact domain provided |
| `subs` | Crawl the domain and its subdomains |
| `out-of-scope` | Crawl any domain, regardless of the initial domain |

Crawler Types

| Type | Description |
|------|-------------|
| `standard` | Standard HTTP crawler |
| `js` | JavaScript parser using headless browser |
| `sitemap` | Sitemap-based crawler |
| `robots` | Robots.txt-based crawler |

Field Options

| Field | Description |
|-------|-------------|
| `url` | Full URL |
| `path` | URL path |
| `method` | HTTP method |
| `host` | Host part of URL |
| `fqdn` | Fully qualified domain name |
| `scheme` | URL scheme (http/https) |
| `port` | URL port |
| `query` | Query parameters |
| `fragment` | URL fragment |
| `endpoint` | URL endpoint |


---

*This cheat sheet provides a comprehensive reference for using Katana, from basic crawling to advanced filtering and integration with other tools. For the most up-to-date information, always consult the official documentation.*