DumpsterDiver

DumpsterDiver is a tool designed to search through large volumes of data to identify sensitive information including API keys, passwords, hardcoded credentials, and other secrets. It’s useful for security audits, compliance scanning, and identifying exposed credentials in code repositories and data dumps.

Installation

Install from GitHub

git clone https://github.com/securing/DumpsterDiver.git
cd DumpsterDiver
python3 -m pip install -r requirements.txt

Using pip

pip3 install dumpster-diver

Docker Installation

docker build -t dumpsterdiver .
docker run --rm -v /path/to/scan:/data dumpsterdiver -p /data

System Requirements

# Python 3.6 or higher
python3 --version

# Install dependencies
pip3 install pyyaml requests

Basic Usage

Scan a Directory

python3 DumpsterDiver.py -p /path/to/directory

Scan a Single File

python3 DumpsterDiver.py -p /path/to/file.txt

Scan Git Repository

python3 DumpsterDiver.py -p /path/to/repo

Use Custom Rules File

python3 DumpsterDiver.py -p /path/to/scan -c custom_rules.yaml

Command-Line Options

Option           Description
-p, --path       Path to the file or directory to scan
-r, --remove     Delete scanned files in which no secret was found (destructive; directories are always scanned recursively)
-c, --config     Use a custom configuration/rules file
-o, --output     Write results to the given file
-j, --json       Output results in JSON format
-s, --sensitive  Show sensitive content in results
--verbose        Enable verbose output
--ignore         Ignore specific patterns
-e, --entropy    Use entropy calculation for detection

Practical Examples

Scan Project Directory for Secrets

python3 DumpsterDiver.py -p /home/user/projects

Scan and Save Results to File

python3 DumpsterDiver.py -p /var/www/html -o findings.txt

Scan with JSON Output for Processing

python3 DumpsterDiver.py -p /app/source -j -o results.json

Scan Git History for Exposed Secrets

git clone https://github.com/user/repo.git
python3 DumpsterDiver.py -p repo --git-history

Verbose Scanning with Details

python3 DumpsterDiver.py -p /code --verbose

Scan with Custom Rules

python3 DumpsterDiver.py -p /project -c my_rules.yaml

Detection Patterns

DumpsterDiver detects common secret patterns:

Secret Type     Pattern                     Example
AWS Keys        AKIA[0-9A-Z]{16}            AKIAIOSFODNN7EXAMPLE
API Keys        api[_-]?key                 api_key=abc123xyz
Passwords       password\s*=                password = "secret123"
Tokens          token|auth                  auth_token: xyz789
SSH Keys        BEGIN RSA                   -----BEGIN RSA PRIVATE KEY-----
Slack Tokens    xox[baprs]                  xoxb-1234567890-abcdefghij
GitHub Tokens   ghp_[A-Za-z0-9_]{36,255}    ghp_aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789
Database URLs   (mysql|postgres):\/\/       mysql://user:pass@host
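
The patterns in the table can be tried out directly with Python's re module. This is only a sketch of how such patterns behave, not DumpsterDiver's own matching code; the classify helper and the sample strings are made up for illustration. Note that the patterns are case-sensitive as written, and that broad ones such as token|auth also match harmless text.

```python
import re

# Patterns copied from the table above. The helper below is hypothetical
# and only illustrates how the regexes behave on sample lines.
PATTERNS = {
    "AWS Keys": r"AKIA[0-9A-Z]{16}",
    "API Keys": r"api[_-]?key",
    "Passwords": r"password\s*=",
    "Tokens": r"token|auth",
    "SSH Keys": r"BEGIN RSA",
    "Slack Tokens": r"xox[baprs]",
    "GitHub Tokens": r"ghp_[A-Za-z0-9_]{36,255}",
    "Database URLs": r"(mysql|postgres):\/\/",
}


def classify(line):
    """Return the names of all patterns that match somewhere in the line."""
    return [name for name, rx in PATTERNS.items() if re.search(rx, line)]


print(classify("aws_access_key_id = AKIAIOSFODNN7EXAMPLE"))
print(classify('password = "secret123"'))
print(classify("mysql://user:pass@host"))
# A false positive: "author" contains "auth", so the Tokens pattern fires.
print(classify("author = Jane"))
```

Running short samples like these before a large scan helps spot patterns that are too loose or too strict.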

Custom Rules Configuration

Create Custom Rules File

# custom_rules.yaml
rules:
  - name: "Custom API Key Pattern"
    pattern: "custom_api_[a-zA-Z0-9]{32}"
    entropy: 4.0
    type: "credentials"
    
  - name: "Internal Secret"
    pattern: "INTERNAL_SECRET_[A-Z0-9]{16}"
    entropy: 3.5
    type: "secret"
    
  - name: "Database Connection"
    pattern: "DB_PASSWORD=.*"
    entropy: 3.0
    type: "database"

Run with Custom Rules

python3 DumpsterDiver.py -p /app -c custom_rules.yaml
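
Before a long scan it can be worth checking that every rule is complete and that its regex compiles. The sketch below mirrors custom_rules.yaml as Python dictionaries so it needs no YAML parser; the validate_rules helper and the list of required fields are assumptions for illustration, not part of DumpsterDiver.

```python
import re

# The rules from custom_rules.yaml above, written as Python dicts.
# Which fields are mandatory is an assumption made for this sketch.
RULES = [
    {"name": "Custom API Key Pattern",
     "pattern": r"custom_api_[a-zA-Z0-9]{32}", "entropy": 4.0, "type": "credentials"},
    {"name": "Internal Secret",
     "pattern": r"INTERNAL_SECRET_[A-Z0-9]{16}", "entropy": 3.5, "type": "secret"},
    {"name": "Database Connection",
     "pattern": r"DB_PASSWORD=.*", "entropy": 3.0, "type": "database"},
]

REQUIRED_FIELDS = ("name", "pattern", "entropy", "type")


def validate_rules(rules):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for index, rule in enumerate(rules):
        label = rule.get("name", f"rule #{index}")
        for field in REQUIRED_FIELDS:
            if field not in rule:
                problems.append(f"{label}: missing '{field}'")
        if "pattern" in rule:
            try:
                re.compile(rule["pattern"])
            except re.error as exc:
                problems.append(f"{label}: invalid regex ({exc})")
        if "entropy" in rule and not isinstance(rule["entropy"], (int, float)):
            problems.append(f"{label}: entropy must be a number")
    return problems


print(validate_rules(RULES))
```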

Advanced Techniques

Entropy-Based Detection

# Detect suspicious strings with high entropy
python3 DumpsterDiver.py -p /code -e --entropy-threshold 4.5
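
Entropy-based detection usually means Shannon entropy over the characters of a candidate string: random keys score high, repetitive text scores low. The following is a minimal sketch of that calculation, not DumpsterDiver's implementation; the token regex, the 20-character minimum, and the helper names are assumptions. Because entropy cannot exceed log2 of the number of distinct characters, a string needs at least 23 distinct characters to reach a threshold of 4.5.

```python
import math
import re
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(text).values())


# Hypothetical tokenizer: runs of 20+ characters typical of keys and tokens.
TOKEN = re.compile(r"[A-Za-z0-9+/_-]{20,}")


def high_entropy_tokens(line, threshold=4.5):
    """Return the tokens in the line whose entropy meets the threshold."""
    return [t for t in TOKEN.findall(line) if shannon_entropy(t) >= threshold]


print(shannon_entropy("aaaaaaaa"))  # 0.0: a single repeated character
print(shannon_entropy("abcd"))      # 2.0: four equally likely characters
print(high_entropy_tokens("key = abcdefghijklmnopqrstuvwxyz012345"))
```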

Scan Multiple Directories

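#!/bin/bash
# Scan several directories, writing one result file per directory
for dir in /app /config /home/user; do
  # Replace slashes so the output is a valid file name, e.g. result_home_user.txt
  python3 DumpsterDiver.py -p "$dir" -o "result${dir//\//_}.txt"
done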
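
The same loop can be driven from Python, which makes it easy to derive a flat output file name from each directory path. This is a sketch under the assumption that the script is invoked as DumpsterDiver.py from the current directory with the -p and -o options used throughout this page; result_name, build_command, and scan_all are hypothetical helpers.

```python
import subprocess
from pathlib import PurePosixPath


def result_name(directory):
    """Turn '/home/user' into 'result_home_user.txt' (no slashes in the name)."""
    parts = [p for p in PurePosixPath(directory).parts if p != "/"]
    return "result_" + "_".join(parts) + ".txt"


def build_command(directory, script="DumpsterDiver.py"):
    """Build the argument list for one scan; no shell quoting is needed."""
    return ["python3", script, "-p", directory, "-o", result_name(directory)]


def scan_all(directories):
    """Run the scans one after another."""
    for directory in directories:
        # check=False: keep going even if one scan exits with an error
        subprocess.run(build_command(directory), check=False)


# Show the commands that scan_all would run, without executing them.
for d in ("/app", "/config", "/home/user"):
    print(" ".join(build_command(d)))
```

Calling scan_all(["/app", "/config", "/home/user"]) runs the scans for real.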

Git Repository Secret Hunting

# Clone and scan entire git history
git clone --mirror https://github.com/user/repo.git
python3 DumpsterDiver.py -p repo.git --git-history

Filter Results by Confidence

python3 DumpsterDiver.py -p /source -j | jq '.[] | select(.confidence > 0.8)'

Parallel Scanning

# Use GNU Parallel for faster scanning
parallel python3 DumpsterDiver.py -p {} ::: /path1 /path2 /path3
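
Where GNU Parallel is not available, a thread pool gives a similar effect, since each scan runs in its own child process. A minimal sketch, assuming the script is started as python3 DumpsterDiver.py -p PATH; the scan and scan_all helpers are hypothetical, and the worker is injectable so the pool logic can be tried without running real scans.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def scan(path, script="DumpsterDiver.py"):
    """Run one scan in a child process and return (path, exit code)."""
    proc = subprocess.run(["python3", script, "-p", path],
                          capture_output=True, text=True)
    return path, proc.returncode


def scan_all(paths, worker=scan, max_workers=3):
    """Run the worker for every path concurrently; map each path to its result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(worker, paths))


# Dry run with a stand-in worker instead of real scans.
print(scan_all(["/path1", "/path2", "/path3"], worker=lambda p: (p, 0)))
```

Threads are sufficient here because the work happens in the child processes; the Python side only waits for them.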

Output Analysis

Parse JSON Results

# Save the findings as JSON
python3 DumpsterDiver.py -p /app -j -o findings.json
# Extract only high-confidence findings
jq '.[] | select(.confidence >= 0.9)' findings.json
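
The same filter can be written in Python when jq is not installed. The report layout is an assumption: this sketch accepts either a top-level JSON array of findings or an object with a "results" array, and expects each finding to carry a numeric "confidence" field. The helper names are hypothetical.

```python
import json


def load_findings(raw):
    """Parse report text into a list of findings (array or {"results": [...]})."""
    data = json.loads(raw)
    if isinstance(data, dict):
        return data.get("results", [])
    return data


def high_confidence(findings, threshold=0.9):
    """Keep findings whose confidence meets the threshold; missing counts as 0."""
    return [f for f in findings if f.get("confidence", 0.0) >= threshold]


# Made-up sample data in the assumed layout.
sample = ('[{"type": "aws_key", "confidence": 0.95},'
          ' {"type": "password", "confidence": 0.4}]')
print(high_confidence(load_findings(sample)))
```

For a real report, pass the file contents instead of the sample string, for example open("findings.json").read().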

Generate Report

python3 DumpsterDiver.py -p /app -o results.txt
grep -E "^(File|Match|Pattern)" results.txt > report.txt

Count Findings by Type

python3 DumpsterDiver.py -p /code -j -o findings.json
jq '.[] | .type' findings.json | sort | uniq -c
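
A Python equivalent of the jq pipeline above, sketched under the assumption that the report is a JSON array whose entries each have a "type" field; count_by_type is a hypothetical helper and the sample data is made up.

```python
import json
from collections import Counter


def count_by_type(findings):
    """Count findings per type; entries without a type are grouped as 'unknown'."""
    return Counter(f.get("type", "unknown") for f in findings)


# Made-up sample data in the assumed layout.
sample = json.loads('[{"type": "password"}, {"type": "aws_key"}, {"type": "password"}]')
for kind, count in count_by_type(sample).most_common():
    print(f"{count:5d} {kind}")
```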

Integration with CI/CD

GitHub Actions Integration

name: Secret Detection
on: [push, pull_request]
jobs:
  secrets:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run DumpsterDiver
        run: |
          git clone https://github.com/securing/DumpsterDiver.git
          cd DumpsterDiver
          python3 -m pip install -r requirements.txt
          # Write the report to the workspace root, where the next step looks for it
          python3 DumpsterDiver.py -p .. -j -o ../findings.json
      - name: Check findings
        run: |
          if [ -s findings.json ]; then
            cat findings.json
            exit 1
          fi
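
The check step above fails the job whenever findings.json is non-empty, which would also fail on a report that contains only an empty JSON array. A stricter gate can parse the report and fail only when it actually lists findings. This is a sketch; the report layout (a JSON array, or an object with a "results" array) is an assumption, and gate is a hypothetical helper.

```python
import json


def gate(raw):
    """Return the exit code for a CI step: 0 if no findings, 1 otherwise."""
    if not raw.strip():
        return 0  # an empty file means nothing was reported
    data = json.loads(raw)
    findings = data.get("results", []) if isinstance(data, dict) else data
    return 1 if findings else 0


print(gate("[]"))
print(gate('[{"type": "password"}]'))
```

In a pipeline, the step would read the report and pass the result to sys.exit, for example sys.exit(gate(open("findings.json").read())).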

GitLab CI Integration

secret_scan:
  image: python:3.9
  script:
    - git clone https://github.com/securing/DumpsterDiver.git
    - cd DumpsterDiver
    - pip install -r requirements.txt
    - python3 DumpsterDiver.py -p .. -j -o findings.json
    - "[ ! -s findings.json ] || (cat findings.json && exit 1)"

Troubleshooting

Module Not Found

# Install missing dependencies
pip3 install pyyaml requests regex

# Verify installation
python3 -c "import DumpsterDiver"

Permission Denied on Files

# Run with appropriate permissions
sudo python3 DumpsterDiver.py -p /restricted/path

Out of Memory on Large Directories

# Scan specific subdirectories instead
python3 DumpsterDiver.py -p /large/path/subdir1
python3 DumpsterDiver.py -p /large/path/subdir2

No Results Found

# Verify patterns are correct
python3 DumpsterDiver.py -p /path --verbose
# Check if directory contains actual secrets
grep -r "password\|api_key\|token" /path | head

Security Best Practices

Handle Findings Responsibly

# Store results securely
python3 DumpsterDiver.py -p /app -o findings.txt
chmod 600 findings.txt
# Encrypt sensitive report
gpg -c findings.txt
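
Running chmod after the scan leaves a short window in which the report exists with default permissions. When results are post-processed in Python, the file can instead be created with owner-only permissions from the start. A minimal sketch for POSIX systems; write_private is a hypothetical helper.

```python
import os


def write_private(path, text):
    """Write text to path so that only the owner can read or write it (POSIX)."""
    # 0o600 applies when the file is created here.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        # Also tighten the mode if the file already existed or the umask changed it.
        os.fchmod(fd, 0o600)
        handle.write(text)
```

A call such as write_private("findings.txt", report_text) replaces the separate write and chmod steps.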

Remediate Exposed Secrets

# After finding exposed credentials:
# 1. Rotate all exposed secrets immediately
# 2. Scan git history for exposure timeline
# 3. Update secrets management practices
# 4. Re-scan to verify remediation
python3 DumpsterDiver.py -p /app

Regular Scanning Schedule

# Add to crontab for regular scanning
# Percent signs must be escaped in a crontab, otherwise cron treats them as newlines
0 2 * * * /usr/bin/python3 /opt/DumpsterDiver/DumpsterDiver.py -p /app -o /var/log/dumpster_$(date +\%Y\%m\%d).txt

Comparison with Similar Tools

Tool            Focus                Method
DumpsterDiver   Large data volumes   Pattern + entropy
TruffleHog      Git history          Entropy + regex
GitGuardian     Git monitoring       API patterns
SAST Tools      Code analysis        Static analysis
git-secrets     Git hooks            Pattern matching

Common Secret Patterns to Monitor

Environment Variables

# Scan for unprotected env vars
python3 DumpsterDiver.py -p /app -c patterns/env_vars.yaml

Configuration Files

# Focus on config file patterns
python3 DumpsterDiver.py -p /etc --include="*.conf" --include="*.yaml"

Backup Files

# Check backup directories
python3 DumpsterDiver.py -p /backups

Log Files

# Scan logs for leaked credentials
python3 DumpsterDiver.py -p /var/log --include="*.log"

Summary

DumpsterDiver is an essential tool for identifying exposed secrets and sensitive data in code repositories, configuration files, and data dumps. Its flexible pattern matching and entropy-based detection help organizations find credentials that may have been accidentally committed or exposed. Regular scanning as part of security audits and CI/CD pipelines helps maintain strong credential hygiene.