# Katana Web Crawler Cheat Sheet

## Overview

Katana is a fast and customizable web crawling framework developed by ProjectDiscovery. It is designed to crawl websites efficiently to gather information and discover endpoints. Katana stands out for its speed, flexibility, and focus on security-testing use cases.

What makes Katana unique is its ability to intelligently crawl modern web applications, including single-page applications (SPAs) that rely heavily on JavaScript. It can handle complex web technologies and extract valuable information such as URLs, JavaScript files, API endpoints, and other web assets. Katana is built with security professionals in mind, making it an excellent reconnaissance tool during security assessments and bug bounty hunting.

Katana supports several crawling strategies, including standard crawling, JavaScript parsing, and sitemap-based crawling. It can be tuned to focus on specific resource types or to follow particular patterns, making it adaptable to different security-testing scenarios. The tool is designed to integrate easily into security-testing workflows and can be combined with other ProjectDiscovery tools for comprehensive reconnaissance.
## Installation

### Using Go

```bash
# Install using Go (requires Go 1.20 or later)
go install -v github.com/projectdiscovery/katana/cmd/katana@latest
# Verify installation
katana -version
```
### Using Docker

```bash
# Pull the latest Docker image
docker pull projectdiscovery/katana:latest
# Run Katana using Docker
docker run -it projectdiscovery/katana:latest -h
```
### Using Homebrew (macOS)
```bash
# Install using Homebrew
brew install katana
# Verify installation
katana -version
```
### Using PDTM (ProjectDiscovery Tools Manager)
```bash
# Install PDTM first if not already installed
go install -v github.com/projectdiscovery/pdtm/cmd/pdtm@latest
# Install Katana using PDTM
pdtm -i katana
# Verify installation
katana -version
```

### On Kali Linux

```bash
# Install using apt
sudo apt install katana
# Verify installation
katana -version
```
## Basic Usage

### Crawling a Single URL

```bash
# Crawl a single URL
katana -u https://example.com
# Crawl with increased verbosity
katana -u https://example.com -v
# Crawl with debug information
katana -u https://example.com -debug
```
### Crawling Multiple URLs

```bash
# Crawl multiple URLs
katana -u https://example.com,https://test.com
# Crawl from a list of URLs
katana -list urls.txt
# Crawl from STDIN
cat urls.txt | katana
```
### Output Options

```bash
# Save results to a file
katana -u https://example.com -o results.txt
# Output in JSON format
katana -u https://example.com -json -o results.json
# Silent mode (only URLs)
katana -u https://example.com -silent
```
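To post-process the JSON output, each line can be piped through `jq`. A minimal sketch, assuming a Katana build whose JSONL records expose the crawled URL under `.request.endpoint` (the exact schema varies between versions, so inspect one record first):

```bash
# Extract just the crawled URLs from JSONL output and deduplicate them.
# The jq path below is an assumption; adjust it to match your version's schema.
katana -u https://example.com -json | jq -r '.request.endpoint' | sort -u
```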
## Crawling Options

### Depth and Scope

```bash
# Set crawling depth (default: 2)
katana -u https://example.com -depth 3
# Crawl subdomains
katana -u https://example.com -crawl-scope subs
# Crawl out-of-scope URLs
katana -u https://example.com -crawl-scope out-of-scope
# Crawl only the exact domain provided
katana -u https://example.com -crawl-scope strict
```
### Crawling Strategies

```bash
# Use the standard crawler
katana -u https://example.com -crawler standard
# Use the JavaScript parser
katana -u https://example.com -crawler js
# Use the sitemap-based crawler
katana -u https://example.com -crawler sitemap
# Use the robots.txt-based crawler
katana -u https://example.com -crawler robots
# Use all crawlers
katana -u https://example.com -crawler standard,js,sitemap,robots
```
### Field Selection

```bash
# Display specific fields
katana -u https://example.com -field url,path,method
# Available fields: url, path, method, host, fqdn, scheme, port, query, fragment, endpoint
```
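Field selection pairs well with ordinary shell post-processing. As a small sketch using only the flags shown above, the following builds a deduplicated inventory of paths on a target:

```bash
# List every unique path discovered on the target, one per line.
katana -u https://example.com -field path -silent | sort -u > paths.txt
```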
## Advanced Usage

### URL Filtering

```bash
# Match URLs by regex
katana -u https://example.com -match-regex "admin|login|dashboard"
# Filter out URLs by regex
katana -u https://example.com -filter-regex "logout|static|images"
# Match URLs by condition
katana -u https://example.com -field url -match-condition "contains('admin')"
```
### Resource Filtering

```bash
# Include specific file extensions
katana -u https://example.com -extension js,php,aspx
# Exclude specific file extensions
katana -u https://example.com -exclude-extension png,jpg,gif
# Include specific MIME types
katana -u https://example.com -mime-type application/json,text/html
```
### Form Filling

```bash
# Enable automatic form filling
katana -u https://example.com -form-fill
# Use custom form values
katana -u https://example.com -form-fill -field-name "username=admin&password=admin"
```
### JavaScript Parsing

```bash
# Enable JavaScript parsing
katana -u https://example.com -js-crawl
# Set the headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 20
# Set the browser path
katana -u https://example.com -js-crawl -chrome-path /path/to/chrome
```
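A common follow-up is to collect the JavaScript files that headless crawling uncovers and grep them for API routes. A rough sketch, assuming `curl` is installed and that interesting routes appear as quoted strings starting with `/api/`:

```bash
# Crawl with the headless browser, keep only .js URLs, then fetch each
# file and extract candidate API paths (the regex is a rough heuristic).
katana -u https://example.com -js-crawl -extension js -silent \
  | while read -r jsurl; do
      curl -s "$jsurl" | grep -oE '"/api/[^"]*"'
    done | sort -u
```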
## Performance Optimization

### Concurrency and Rate Limiting

```bash
# Set concurrency (default: 10)
katana -u https://example.com -concurrency 20
# Set the delay between requests (milliseconds)
katana -u https://example.com -delay 100
# Set the rate limit (requests per second)
katana -u https://example.com -rate-limit 50
```
### Timeout Options

```bash
# Set the timeout for HTTP requests (seconds)
katana -u https://example.com -timeout 10
# Set the timeout for the headless browser (seconds)
katana -u https://example.com -js-crawl -headless-timeout 30
```
### Optimizing Large Scans

```bash
# Disable automatic form filling for faster crawling
katana -u https://example.com -no-form-fill
# Disable JavaScript parsing for faster crawling
katana -u https://example.com -no-js-crawl
# Limit the maximum number of URLs to crawl
katana -u https://example.com -max-urls 1000
```
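These flags combine into a single large-scan profile. A sketch of a conservative configuration built only from options documented in this sheet (`targets.txt` is a hypothetical input list):

```bash
# Broad but bounded crawl: shallow depth, capped URL count,
# and a polite request rate across many targets.
katana -list targets.txt -depth 2 -max-urls 1000 \
  -rate-limit 20 -concurrency 10 -silent -o large-scan.txt
```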
## Integration with Other Tools

### Pipeline with Subfinder

```bash
# Find subdomains and crawl them
subfinder -d example.com -silent | katana -silent
# Find subdomains, crawl them, and extract JavaScript files
subfinder -d example.com -silent | katana -silent -extension js
```
### Pipeline with HTTPX

```bash
# Probe URLs and crawl the live ones
httpx -l urls.txt -silent | katana -silent
# Crawl, then probe the discovered endpoints
katana -u https://example.com -silent | httpx -silent
```
### Pipeline with Nuclei

```bash
# Crawl and scan for vulnerabilities
katana -u https://example.com -silent | nuclei -t cves/
# Crawl, extract JavaScript files, and scan for exposures
katana -u https://example.com -silent -extension js | nuclei -t exposures/
```
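The three integrations above compose into one end-to-end chain. A sketch of a full workflow, assuming all four ProjectDiscovery tools are installed:

```bash
# Subdomain discovery -> liveness probing -> crawling -> template scanning.
subfinder -d example.com -silent \
  | httpx -silent \
  | katana -silent -depth 2 \
  | nuclei -t exposures/
```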
## Output Customization

### Custom Output Format

```bash
# Output only URLs
katana -u https://example.com -silent
# Output URLs with specific fields
katana -u https://example.com -field url,path,method -o results.txt
# Count discovered URLs
katana -u https://example.com -silent | wc -l
# Sort output alphabetically
katana -u https://example.com -silent | sort
```
### Output Filtering

```bash
# Filter by file extension
katana -u https://example.com -silent | grep "\.js$"
# Filter by endpoint pattern
katana -u https://example.com -silent | grep "/api/"
# Find unique domains
katana -u https://example.com -silent | awk -F/ '{print $3}' | sort -u
```
## Advanced Filtering

### URL Pattern Matching

```bash
# Match specific URL patterns
katana -u https://example.com -match-regex "^https://example.com/admin"
# Filter out specific URL patterns
katana -u https://example.com -filter-regex "^https://example.com/static"
# Match URLs containing specific query parameters
katana -u https://example.com -match-regex "id=[0-9]+"
```
### Content Filtering

```bash
# Match responses containing specific content
katana -u https://example.com -match-condition "contains(body, 'admin')"
# Filter responses by status code
katana -u https://example.com -match-condition "status == 200"
# Match responses by content type
katana -u https://example.com -match-condition "contains(content_type, 'application/json')"
```
## Proxy and Network Options

```bash
# Use an HTTP proxy
katana -u https://example.com -proxy http://127.0.0.1:8080
# Use a SOCKS5 proxy
katana -u https://example.com -proxy socks5://127.0.0.1:1080
# Set custom headers
katana -u https://example.com -header "User-Agent: Mozilla/5.0" -header "Cookie: session=123456"
# Set custom cookies
katana -u https://example.com -cookie "session=123456; user=admin"
```
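These options also combine for authenticated crawling observed through an intercepting proxy. A minimal sketch, assuming Burp Suite is listening on its default port and the session cookie is already known:

```bash
# Route an authenticated crawl through Burp so every request
# lands in the proxy history for manual review.
katana -u https://example.com \
  -proxy http://127.0.0.1:8080 \
  -cookie "session=123456"
```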
## Miscellaneous Features
### Crawling Specific Paths

```bash
# Crawl specific paths
katana -u https://example.com -paths /admin,/login,/dashboard
# Crawl paths from a file
katana -u https://example.com -paths-file paths.txt
```
### Storing Responses

```bash
# Store all responses
katana -u https://example.com -store-response
# Specify the response storage directory
katana -u https://example.com -store-response -store-response-dir responses/
```
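Stored responses are plain files on disk, so standard tools can mine them afterwards. A minimal sketch that greps the storage directory for credential-like strings (the pattern is a rough heuristic):

```bash
# Crawl, store raw responses, then search them for suspicious tokens.
katana -u https://example.com -store-response -store-response-dir responses/
grep -rniE "api[_-]?key|secret|token" responses/ | head -n 20
```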
## Troubleshooting

### Common Issues

1. **JavaScript parsing issues**
```bash
# Increase the headless browser timeout
katana -u https://example.com -js-crawl -headless-timeout 30
# Specify the Chrome path manually
katana -u https://example.com -js-crawl -chrome-path /usr/bin/google-chrome
```
2. **Rate limiting by the target**
```bash
# Reduce concurrency
katana -u https://example.com -concurrency 5
# Add delay between requests
katana -u https://example.com -delay 500
```
3. **Memory issues**
```bash
# Limit maximum URLs to crawl
katana -u https://example.com -max-urls 500
# Disable JavaScript parsing
katana -u https://example.com -no-js-crawl
```
4. **Scope issues**
```bash
# Restrict crawling to specific domain
katana -u https://example.com -crawl-scope strict
# Allow crawling subdomains
katana -u https://example.com -crawl-scope subs
```
### Debugging
```bash
# Enable verbose mode
katana -u https://example.com -v
# Show debug information
katana -u https://example.com -debug
# Show request and response details
katana -u https://example.com -debug -show-request -show-response
```

## Configuration

### Configuration File

Katana uses a configuration file located at `$HOME/.config/katana/config.yaml`. You can customize various settings in this file:
```yaml
# Example configuration file
concurrency: 10
delay: 100
timeout: 10
max-depth: 3
crawl-scope: strict
crawl-duration: 0
field: url,path,method
extensions: js,php,aspx
```
### Environment Variables

```bash
# Set Katana configuration via environment variables
export KATANA_CONCURRENCY=10
export KATANA_DELAY=100
export KATANA_TIMEOUT=10
export KATANA_MAX_DEPTH=3
```
## Reference

### Command-Line Options

| Flag | Description |
|---|---|
| `-u, -url` | Target URL to crawl |
| `-list, -l` | File containing a list of URLs to crawl |
| `-o, -output` | File to write output to |
| `-json` | Write output in JSON format |
| `-silent` | Show only URLs in output |
| `-v, -verbose` | Show verbose output |
| `-depth` | Maximum depth to crawl (default: 2) |
| `-crawl-scope` | Crawling scope (strict, subs, out-of-scope) |
| `-crawler` | Crawler types to use (standard, js, sitemap, robots) |
| `-field` | Fields to display in output |
| `-extension` | File extensions to include |
| `-exclude-extension` | File extensions to exclude |
| `-match-regex` | Regex pattern to match URLs |
| `-filter-regex` | Regex pattern to filter URLs |
| `-match-condition` | Condition to match URLs |
| `-form-fill` | Enable automatic form filling |
| `-js-crawl` | Enable JavaScript parsing |
| `-headless-timeout` | Timeout for the headless browser (seconds) |
| `-chrome-path` | Path to the Chrome browser |
| `-concurrency` | Number of concurrent requests |
| `-delay` | Delay between requests (milliseconds) |
| `-rate-limit` | Maximum number of requests per second |
| `-timeout` | Timeout for HTTP requests (seconds) |
| `-max-urls` | Maximum number of URLs to crawl |
| `-proxy` | HTTP/SOCKS5 proxy to use |
| `-header` | Custom header to add to all requests |
| `-cookie` | Custom cookies to add to all requests |
| `-paths` | Specific paths to crawl |
| `-paths-file` | File containing paths to crawl |
| `-store-response` | Store all responses |
| `-store-response-dir` | Directory to store responses |
| `-version` | Show Katana version |
### Crawl Scope Options

| Scope | Description |
|---|---|
| `strict` | Crawl only the exact domain provided |
| `subs` | Crawl the domain and its subdomains |
| `out-of-scope` | Crawl any domain, regardless of the initial domain |
### Crawler Types

| Type | Description |
|---|---|
| `standard` | Standard HTTP crawler |
| `js` | JavaScript parser using a headless browser |
| `sitemap` | Sitemap-based crawler |
| `robots` | robots.txt-based crawler |
### Field Options

| Field | Description |
|---|---|
| `url` | Full URL |
| `path` | URL path |
| `method` | HTTP method |
| `host` | Host part of the URL |
| `fqdn` | Fully qualified domain name |
| `scheme` | URL scheme (http/https) |
| `port` | URL port |
| `query` | Query parameters |
| `fragment` | URL fragment |
| `endpoint` | URL endpoint |
## Resources

- [Official documentation](LINK_3)
- [GitHub repository](LINK_3)
- ProjectDiscovery Discord

*This cheat sheet provides a comprehensive reference for using Katana, from basic crawling to advanced filtering and integration with other tools. For the most up-to-date information, always consult the official documentation.*