
ssdeep

ssdeep is a program for computing and comparing context-triggered piecewise hashing (CTPH) fuzzy hashes. Cryptographic hashes like MD5 or SHA-1 produce completely different outputs when a file changes even slightly. ssdeep instead produces hashes that can be compared with each other, reporting a similarity score from 0 to 100 even when two files are not identical.

Key features:

  • Compute fuzzy hashes using the CTPH algorithm
  • Compare files for similarity matching
  • Detect malware variants with minor modifications
  • Batch processing of multiple files
  • Generate fuzzy hash databases
  • Cross-platform support (Linux, Windows, macOS)

Common use cases:

  • Malware analysis and variant detection
  • Digital forensics and file comparison
  • Identifying near-duplicate files
  • Detecting code variants in security research
  • File integrity monitoring with tolerance

# Debian/Ubuntu
sudo apt-get update
sudo apt-get install ssdeep

# macOS (Homebrew)
brew install ssdeep

On other platforms, download a release from the ssdeep SourceForge project page, or build from source:

wget https://sourceforge.net/projects/ssdeep/files/ssdeep-2.14.1/ssdeep-2.14.1.tar.gz
tar xzf ssdeep-2.14.1.tar.gz
cd ssdeep-2.14.1
./configure
make
sudo make install
| Command | Purpose |
|---|---|
| ssdeep file.bin | Calculate the fuzzy hash of a single file |
| ssdeep -r directory/ | Recursively hash all files in a directory |
| ssdeep -m hashfile.txt file.bin | Match a file against the known hashes in hashfile.txt |
| ssdeep -s -r directory/ | Silent mode: suppress error messages (for example, unreadable files) |
| ssdeep -d file1 file2 ... | Directory mode: compare the given files against each other |
| ssdeep -p file1 file2 ... | Pretty matching mode: like -d, but shows all matches |
| ssdeep -h | Display help message |
| ssdeep -V | Show version information |

ssdeep /path/to/file.bin

Output example (hash values are placeholders):

ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdefGHIJKLmnopqrstuvwx:ABCDEFghijklMNOP,"/path/to/file.bin"

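ssdeep writes each hash as one line of the form blocksize:hash1:hash2,"filename", after a header line that starts with ssdeep,1.1--. Scripts that reuse these lines usually need to split them into fields first. The following is a minimal sketch under that assumption. parse_ssdeep_line is a hypothetical helper, not part of ssdeep.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ssdeep): split one hash line of the form
#   blocksize:hash1:hash2,"filename"
# into tab-separated fields: blocksize, hash1, hash2, filename.
parse_ssdeep_line() {
    local line="$1"
    local sig="${line%%,*}"   # text before the first comma (the hashes contain no commas)
    local name="${line#*,}"   # text after the first comma (file names may contain commas)
    name=${name#\"}           # drop the opening quote
    name=${name%\"}           # drop the closing quote
    local blocksize="${sig%%:*}"
    local rest="${sig#*:}"
    printf '%s\t%s\t%s\t%s\n' "$blocksize" "${rest%%:*}" "${rest#*:}" "$name"
}

# Example: list the block size and file name of each entry in a hash file,
# skipping the header line:
#   tail -n +2 hashes.txt | while IFS= read -r l; do parse_ssdeep_line "$l" | cut -f1,4; done
```

Splitting on the first comma keeps file names that contain commas intact. This works because the base64-style hash characters never include a comma.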
ssdeep -r /path/to/directory/ > hashes.txt

This writes the fuzzy hash of every file under the directory tree to hashes.txt, preceded by the ssdeep header line.

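ssdeep -p -r /malware/samples/ > matches.txt

Pretty matching mode (-p) compares every file under the directory with every other file and writes all matches, with their scores, to matches.txt. It does not produce a hash list; use -r on its own for that.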

ssdeep -m known_hashes.txt suspicious_file.bin

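Prints one line for each known hash that matches the file, with the similarity score (0 to 100) in parentheses. Nothing is printed when no match is found.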

ssdeep -s -m known_hashes.txt file.bin

The -s flag suppresses error messages only; matches are still printed. In scripts, do not rely on the exit status to signal a match. Treat non-empty output as a match instead.
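Because matches are reported as text, a script can both detect a match (non-empty output) and read the score from it. The following is a minimal sketch. It assumes each match line ends with the score in parentheses, for example (98), as in ssdeep's -m output. best_score is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ssdeep match output on stdin and print the
# highest score found, or nothing at all if there were no match lines.
best_score() {
    # Keep only lines ending in "(NN)", extract NN, then take the largest value
    sed -n 's/.*(\([0-9][0-9]*\))$/\1/p' | sort -n | tail -n 1
}
```

For example, score=$(ssdeep -s -m known_hashes.txt file.bin | best_score) leaves score empty when there is no match. The test [ -n "$score" ] then doubles as the match check.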

ssdeep -a -m known_hashes.txt file.bin

The -a flag displays all comparison results regardless of score, so weak matches can be reviewed too.

for file in /path/to/files/*; do
    echo "=== $file ==="
    ssdeep -m hashes.txt "$file"
done
ssdeep -r -s /path/to/samples/ > malware_db.txt

Recursive mode writes a hash database suitable for later comparisons with -m. The -s flag only suppresses error messages, such as those for unreadable files.

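ssdeep -r /new/samples/ | tail -n +2 >> malware_db.txt

Adds new hashes to an existing database file. tail -n +2 drops the header line that every ssdeep run prints, so the database keeps a single header at the top.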
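Each ssdeep run starts its output with the header line ssdeep,1.1--blocksize:hash:hash,filename. Appending a second run with a plain >> therefore leaves an extra header in the middle of the file. The following is a minimal sketch of a helper that writes the header only when the database is new. It assumes that header text, and append_hashes is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append ssdeep output from stdin to a database file,
# keeping exactly one header line at the top of the file.
append_hashes() {
    local db="$1"
    if [ -s "$db" ]; then
        # Database already exists and is non-empty: drop incoming header lines
        grep -v '^ssdeep,1\.1--' >> "$db" || true
    else
        # New or empty database: copy everything, header included
        cat > "$db"
    fi
}
```

Usage: ssdeep -r /new/samples/ | append_hashes malware_db.txt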

ssdeep -r -s /samples/ > database.txt
ssdeep -r -m database.txt /path/to/compare/

Store baseline hashes, then perform batch comparisons against them.

# Create hash database of known malware samples
ssdeep -r -s /known/malware/collection/ > malware_baseline.txt
wc -l malware_baseline.txt  # Verify hash count
# Calculate hash of suspicious file
ssdeep /path/to/suspicious_sample.bin > analysis.txt

# Compare against known malware
ssdeep -m malware_baseline.txt /path/to/suspicious_sample.bin
# Check if file is variant of known malware
ssdeep -m malware_baseline.txt /suspect.exe

# Each match line names the known file that matched and ends with the
# similarity score (0-100) in parentheses, e.g. (98)
# Hash all files except .sys and .dll files
# (-exec ... + passes many files per ssdeep call, so fewer repeated header lines)
find /samples -type f ! -name "*.sys" ! -name "*.dll" -exec ssdeep {} + > custom_hashes.txt
# Hash only executable files
find /samples -type f -name "*.exe" -exec ssdeep {} + > exes.txt
# Find similar hashes between two databases:
# match the signatures in database2.txt against those in database1.txt
ssdeep -k database1.txt database2.txt
# CSV output (-c) is easier to load into spreadsheets or third-party visualization tools
ssdeep -r -c /samples/ > hashes.csv

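Fuzzy hash database files have a simple format (hash values below are placeholders):

ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdefGHIJKLmnopqr:stuvwxYZ,"/samples/file1.bin"
2048:qwertyASDFGH:zxcvBN,"/samples/file2.bin"
4096:zxcvbnPOIUYT:lkjhGF,"/samples/file3.bin"

Structure:

  • The first line is the header ssdeep,1.1--blocksize:hash:hash,filename, which is needed when the file is loaded with -m
  • blocksize:hash1:hash2 gives the block size, a hash computed with that block size, and a hash computed with twice that block size
  • After a comma comes the file name in double quotes, one entry per line
  • Similarity scores are not stored; they are computed when hashes are compared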
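Before relying on a file as a known-hash database, a quick sanity check can confirm two things. The first line should be the ssdeep,1.1--blocksize:hash:hash,filename header, and every other line should look like blocksize:hash1:hash2,"filename". The following is a minimal sketch. check_ssdeep_db is hypothetical, and its regular expression only approximates the format; it is not ssdeep's own parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a file looks like an ssdeep hash database.
# On success prints the number of hash entries and returns 0; otherwise
# prints a message to stderr and returns 1.
check_ssdeep_db() {
    local db="$1" bad n
    if [ "$(head -n 1 "$db")" != 'ssdeep,1.1--blocksize:hash:hash,filename' ]; then
        echo "missing ssdeep header: $db" >&2
        return 1
    fi
    # Count non-header lines that do not look like blocksize:hash:hash,"name"
    bad=$(tail -n +2 "$db" | grep -cvE '^[0-9]+:[A-Za-z0-9+/]*:[A-Za-z0-9+/]*,".*"$' || true)
    if [ "$bad" -gt 0 ]; then
        echo "$bad malformed line(s) in $db" >&2
        return 1
    fi
    n=$(tail -n +2 "$db" | wc -l | tr -d ' ')
    echo "$n"
}
```

Usage: check_ssdeep_db malware_db.txt prints the entry count, or explains what is wrong.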
| Score | Meaning |
|---|---|
| 90-100% | Very similar, likely the same file or a minor variation |
| 75-89% | Similar structure, possible variant or derivative |
| 50-74% | Moderate similarity, may share code blocks |
| 25-49% | Weak similarity, possibly common libraries |
| < 25% | Not similar, likely a coincidental match |

# Only show matches that score above a threshold (here 75)
ssdeep -t 75 -m baseline.txt /path/to/file.bin
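The score bands in the table above can be applied automatically when triaging many results. The following is a minimal sketch. classify_score is a hypothetical helper, and the cut-offs are this guide's rule of thumb, not values defined by ssdeep.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an ssdeep score (0-100) to this guide's rough bands.
# The thresholds are a rule of thumb, not part of ssdeep.
classify_score() {
    local s="$1"
    if   [ "$s" -ge 90 ]; then echo "very similar"
    elif [ "$s" -ge 75 ]; then echo "similar"
    elif [ "$s" -ge 50 ]; then echo "moderate"
    elif [ "$s" -ge 25 ]; then echo "weak"
    else                       echo "not similar"
    fi
}
```

Usage: classify_score 82 prints "similar".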
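#!/bin/bash
BASELINE="/opt/malware_baseline.txt"
MONITOR_DIR="/quarantine"
PROCESSED_DIR="/quarantine/processed"   # example path: keep scanned files instead of deleting them

mkdir -p "$PROCESSED_DIR"
while true; do
    for file in "$MONITOR_DIR"/*.bin; do
        if [ -f "$file" ]; then
            # ssdeep prints one line per match; empty output means no match
            if [ -n "$(ssdeep -s -m "$BASELINE" "$file")" ]; then
                echo "$(date): VARIANT DETECTED - $file"
                # Alert or quarantine
            fi
            mv "$file" "$PROCESSED_DIR"/
        fi
    done
    sleep 300  # Check every 5 minutes
done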
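#!/bin/bash
# Usage: script.sh SAMPLE_DIR HASH_DB
SAMPLE_DIR="$1"
HASH_DB="$2"

for file in "$SAMPLE_DIR"/*; do
    [ -f "$file" ] || continue
    result=$(ssdeep -s -m "$HASH_DB" "$file")
    # A non-empty result means at least one known hash matched
    if [ -n "$result" ]; then
        echo "MATCH: $file"
    fi
done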
# Count hashes and create analysis
ssdeep -r /samples/ > all_hashes.txt
total=$(tail -n +2 all_hashes.txt | wc -l)   # skip the header line
echo "Total files hashed: $total"

# Find similar pairs: directory mode compares every file with every other file
ssdeep -r -d /samples/ > similar_pairs.txt
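ssdeep's directory mode (-d) reports each similar pair on its own line, ending with the score in parentheses. Assuming lines of the form "A matches B (NN)", the pairs can be ranked by score to surface the most similar files first. The following is a minimal sketch, and rank_matches is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read match lines ("A matches B (NN)") on stdin and
# print the top N (default 10), highest score first.
rank_matches() {
    local n="${1:-10}"
    # Prefix each line with its score and a tab, sort numerically (descending),
    # keep the top N, then strip the prefix again
    awk 'match($0, /\([0-9]+\)$/) {
             print substr($0, RSTART + 1, RLENGTH - 2) "\t" $0
         }' \
        | sort -t "$(printf '\t')" -k1,1nr \
        | head -n "$n" \
        | cut -f2-
}
```

Usage: ssdeep -r -d /samples/ | rank_matches 20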
# Generate fuzzy hashes as part of forensic investigation
ssdeep -r /evidence/drives/ > drive_hashes.txt

# Cross-reference with YARA rules for additional context
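# yara expects rule files rather than a directory; the file name below is an example
yara -r /rules/malware_rules.yar /evidence/ > yara_results.txt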
# After carving files from disk image
foremost -i disk.img -o carving_results/

# Hash carved files for variant detection
ssdeep -r carving_results/ > carved_hashes.txt
ssdeep -r -m known_malware.txt carving_results/ > carved_analysis.txt
# Create hash database for uploading to analysis platform
ssdeep -r -s /samples/ > upload_hashes.txt

# Inspect the header and first entries before uploading
head -10 upload_hashes.txt
# Hash thousands of files and time the run
time ssdeep -r -s /massive/directory/ > output.txt

# Benchmark hashing speed alone by discarding the output
# (-s only suppresses error messages; it is not a speed option)
time ssdeep -r -s /path > /dev/null
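  • Memory use while hashing stays small, because files are read as a stream rather than loaded whole
  • Hashing time grows roughly with the total amount of data read; disk I/O is often the bottleneck
  • Matching with -m slows down as the known-hash database grows, since each input is compared against the loaded signatures
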
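| Tip | Benefit |
|---|---|
| Use the -s flag | Keeps output clean by suppressing error messages (no speed gain) |
| Hash to a file | Avoids the console output bottleneck |
| Pre-filter files | Avoids hashing files you don't need |
| Use SSD storage | Faster disk I/O for large batches |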

Issue: Same file produces different hash each time.

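Solution: ssdeep is deterministic, so identical input always produces an identical hash. If the hash changes, the file itself is probably changing between runs. Verify:

# Check file integrity
md5sum file.bin
ssdeep file.bin
# Run both again: if the MD5 is unchanged, the ssdeep hash should be too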

Issue: Hashing very large directories is slow.

Solution:

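# Hash to a file (-s keeps error messages out of the terminal)
ssdeep -r -s /path > output.txt

# Process in parallel using GNU Parallel (-X passes many files per ssdeep call);
# awk keeps only the first of the repeated header lines
parallel -X ssdeep ::: /samples/*.bin | awk 'NR == 1 || !/^ssdeep,1\.1--/' > parallel_output.txt

# Or use xargs (-0 handles spaces in file names; output of concurrent
# processes may interleave, while GNU Parallel groups it per job)
find /samples -type f -print0 | xargs -0 -n 100 -P 4 ssdeep | awk 'NR == 1 || !/^ssdeep,1\.1--/' > parallel_results.txt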

Issue: Hash database becomes too large.

Solution:

# Segment databases by type
ssdeep -r -s /exe_samples/ > exes.txt
ssdeep -r -s /dll_samples/ > dlls.txt

# Compare against specific database
ssdeep -m exes.txt /suspicious.exe
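If an existing database already mixes file types, it can be split after the fact instead of re-hashing. The following is a minimal sketch that copies the header into each output file. split_db_by_ext and the db_EXTENSION.txt naming are hypothetical choices, and entries without an extension go to db_noext.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split an ssdeep database into per-extension databases
# (db_exe.txt, db_dll.txt, ...) in OUTDIR, repeating the header in each file.
split_db_by_ext() {
    local db="$1" outdir="${2:-.}"
    mkdir -p "$outdir"
    awk -v outdir="$outdir" '
        NR == 1 { header = $0; next }              # remember the header line
        {
            name = $0
            sub(/^[^,]*,/, "", name)               # drop blocksize:hash:hash,
            gsub(/"/, "", name)                    # drop the surrounding quotes
            ext = "noext"
            if (match(name, /\.[A-Za-z0-9]+$/)) ext = tolower(substr(name, RSTART + 1))
            out = outdir "/db_" ext ".txt"
            if (!(out in seen)) { print header > out; seen[out] = 1 }
            print > out                            # file stays open, so this appends
        }' "$db"
}
```

Usage: split_db_by_ext malware_db.txt dbs/ and then ssdeep -m dbs/db_exe.txt /suspicious.exe.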
Use ssdeep for:

  • Malware variant detection in sandboxed environments
  • Forensic investigation of compromised systems
  • Integrity monitoring with fuzzy matching tolerance
  • Research and analysis within controlled labs

NEVER use ssdeep to:

  • Analyze files without authorization
  • Bypass security measures or protections
  • Violate intellectual property rights
  • Conduct unauthorized security testing

Always obtain proper authorization before analyzing files or systems.

# Fast hash generation
ssdeep -r /path > hashes.txt

# Compare file against database
ssdeep -m hashes.txt /file.bin

# Silent comparison: print MATCH if any known hash matched
[ -n "$(ssdeep -s -m hashes.txt /file.bin)" ] && echo "MATCH"

# Compare files against each other and show all matches (pretty matching mode)
ssdeep -p /samples/*

# Combine find with ssdeep (-exec ... + batches files into fewer ssdeep calls)
find /samples -type f -exec ssdeep {} + > all.txt