ssdeep
Overview
ssdeep is a program for computing and comparing context-triggered piecewise hashes (CTPH), also called fuzzy hashes. Cryptographic hashes such as MD5 or SHA-1 produce completely different outputs when a file changes by even one byte; ssdeep instead produces signatures that can be compared to estimate how similar two files are.
Key Features
- Compute fuzzy hashes using the CTPH algorithm
- Compare files for similarity matching
- Detect malware variants with minor modifications
- Batch processing of multiple files
- Generate fuzzy hash databases
- Cross-platform support (Linux, Windows, macOS)
Use Cases
- Malware analysis and variant detection
- Digital forensics and file comparison
- Identifying near-duplicate files
- Detecting code variants in security research
- File integrity monitoring with tolerance
Installation
Linux/Debian-based
sudo apt-get update
sudo apt-get install ssdeep
macOS
brew install ssdeep
Windows
Download a Windows build from the ssdeep project page on SourceForge, or install via a package manager.
Build from Source
wget https://sourceforge.net/projects/ssdeep/files/ssdeep-2.14.1/ssdeep-2.14.1.tar.gz
tar xzf ssdeep-2.14.1.tar.gz
cd ssdeep-2.14.1
./configure
make
sudo make install
Basic Commands
| Command | Purpose |
|---|---|
| `ssdeep file.bin` | Calculate the fuzzy hash of a single file |
| `ssdeep -r directory/` | Recursively hash all files in a directory |
| `ssdeep -m hashfile.txt file.bin` | Match a file against known hashes |
| `ssdeep -s -m hashfile.txt file.bin` | Same match, with error messages suppressed (`-s`) |
| `ssdeep -p -r directory/` | Pretty matching mode: compare all files against each other and list every match |
| `ssdeep -d -r directory/` | Directory mode: compare all files in a directory against each other |
| `ssdeep -h` | Display help message |
| `ssdeep -V` | Show version information |
Computing Fuzzy Hashes
Single File Hash
ssdeep /path/to/file.bin
Output example (a header line, then one signature line per file):
ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdef+GHIJKL+mnopqr+stuvwx:1234,"/path/to/file.bin"
Recursive Directory Hash
ssdeep -r /path/to/directory/ > hashes.txt
This creates a file with all fuzzy hashes in the directory structure.
Pretty-Print Matches Within a Directory
ssdeep -p -r /malware/samples/ > all_matches.txt
Pretty matching mode (-p) compares every file in the tree against every other file and prints the matches grouped per file. It reports matches, not a list of hashes.
Comparing Files and Hashes
Compare Single File Against Hash Database
ssdeep -m known_hashes.txt suspicious_file.bin
Prints one line per match, ending with the similarity score (0-100) in parentheses. Nothing is printed when no known hash matches.
Silent Mode Comparison (for scripting)
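ssdeep -s -m known_hashes.txt file.bin
The -s flag suppresses error messages only; matches are still printed. Place -s before -m, because -m takes the next argument as its hash file. The exit status reports success or error, not match or no match, so scripts should test whether any output was produced.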
Detailed Match Information
ssdeep -a -m known_hashes.txt file.bin
The -a flag displays all matches regardless of score. To show only stronger matches, use -t with a threshold instead. Note that -d is directory mode and does not add detail to -m output.
Batch Compare Multiple Files
for file in /path/to/files/*; do
    echo "=== $file ==="
    ssdeep -m hashes.txt "$file"
done
Creating Hash Databases
Generate Hash Database from Directory
ssdeep -r -s /path/to/samples/ > malware_db.txt
Recursive mode writes a header line plus one signature per file, which is the format -m expects. The -s flag only suppresses error messages, for example for unreadable files.
Append Hashes to Existing Database
ssdeep -r /new/samples/ | tail -n +2 >> malware_db.txt
Adds new hashes to an existing database file. The tail -n +2 step drops the header line that ssdeep prints, so the header is not repeated in the middle of the database.
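When databases have already been concatenated, or the same samples were hashed twice, the result can be tidied after the fact. The sketch below assumes the header line starts with `ssdeep,` and uses a hypothetical helper name, `merge_dbs`, which is not part of ssdeep. It keeps the first header and drops duplicate signature lines.

```shell
# Merge ssdeep hash databases: keep one header, drop duplicate signature lines.
# merge_dbs is a hypothetical helper, not an ssdeep command.
merge_dbs() {
    awk '
        /^ssdeep,/ { if (!seen_header++) print; next }  # first header only
        NF && !seen[$0]++                                # unique, non-empty lines
    ' "$@"
}

# Demo with two made-up databases fed on standard input.
printf '%s\n' \
    'ssdeep,1.1--blocksize:hash:hash,filename' \
    '3072:abcdef+GHIJKL:1234,"a.bin"' \
    '1536:qwerty+asdfgh:5678,"b.bin"' \
    'ssdeep,1.1--blocksize:hash:hash,filename' \
    '1536:qwerty+asdfgh:5678,"b.bin"' \
    '6144:zxcvbn+poiuyt:9012,"c.bin"' | merge_dbs
```

With real files, `merge_dbs old_db.txt new_db.txt > merged_db.txt` would be the equivalent call. Write to a new file rather than one of the inputs.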
Create Indexed Hash Database
ssdeep -r -s /samples/ > database.txt
ssdeep -r -m database.txt /path/to/compare/
Store baseline hashes, then perform batch comparisons against them. The second command needs -r because its target is a directory.
Malware Analysis Workflow
Establish Known Malware Baseline
# Create hash database of known malware samples
ssdeep -r -s /known/malware/collection/ > malware_baseline.txt
wc -l malware_baseline.txt # Line count = number of hashes + 1 header line
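A raw line count includes the header, so an exact signature count needs a small filter. This is a minimal sketch, assuming the header begins with `ssdeep,`; `count_signatures` is a hypothetical helper name.

```shell
# Count signature lines, ignoring the header and any blank lines.
# Reads the named files, or standard input when no file is given.
count_signatures() {
    grep -v -e '^ssdeep,' -e '^$' "$@" | wc -l | tr -d ' '
}

# Demo on made-up data: one header, two signatures.
printf '%s\n' \
    'ssdeep,1.1--blocksize:hash:hash,filename' \
    '3072:abcdef+GHIJKL:1234,"a.bin"' \
    '1536:qwerty+asdfgh:5678,"b.bin"' | count_signatures   # prints 2
```

On a real database the call would be `count_signatures malware_baseline.txt`.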
Analyze Suspicious File
# Calculate hash of suspicious file
ssdeep /path/to/suspicious_sample.bin > analysis.txt
# Compare against known malware
ssdeep -m malware_baseline.txt /path/to/suspicious_sample.bin
Detect Variants
# Check if file is variant of known malware
ssdeep -s -m malware_baseline.txt /suspect.exe
# Each match line ends with the score (0-100) in parentheses
# Example: /suspect.exe matches malware_baseline.txt:/known/malware/trojan.exe (98)
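Saved match output can be filtered by score after the fact. The sketch below assumes each match line contains the word `matches` and ends with the score in parentheses, as in `/suspect.exe matches db.txt:/known/file.exe (98)`; check this against the output of your ssdeep version. `filter_matches` is a hypothetical helper name.

```shell
# Keep only match lines whose trailing "(score)" is at least the given minimum.
filter_matches() {
    awk -v min="$1" '
        / matches / {
            score = $NF                 # last field, e.g. "(98)"
            gsub(/[()]/, "", score)     # strip the parentheses
            if (score + 0 >= min + 0) print
        }'
}

# Demo on made-up match lines: only the first one scores 75 or more.
printf '%s\n' \
    '/suspect.exe matches malware_baseline.txt:/known/malware/trojan.exe (98)' \
    '/suspect.exe matches malware_baseline.txt:/known/malware/dropper.exe (41)' |
    filter_matches 75
```

Because the score is read from the last whitespace-separated field, file names containing spaces do not affect the result.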
Advanced Usage
Exclude Files by Pattern
# Hash all files except system files
# -exec ... + passes many files per ssdeep call, so the header is not repeated for every file
find /samples -type f ! -name "*.sys" ! -name "*.dll" -exec ssdeep -s {} + > custom_hashes.txt
Hash Only Specific File Types
# Hash only executable files
find /samples -type f -name "*.exe" -exec ssdeep {} + > exes.txt
Compare Two Hash Databases
# Find similar hashes between two databases
# -k matches the signatures in database2.txt against those in database1.txt
# (-m would hash database2.txt itself as an ordinary file)
ssdeep -k database1.txt database2.txt
Generate HTML Report
ssdeep -r /samples/ | grep -v "^ssdeep," > hashes.txt
# Use third-party tools to convert to HTML visualization
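If a dedicated reporting tool is not available, a basic table can be produced with awk. This is only a sketch: it assumes each line is a signature, a comma, then a quoted file name, and `hashes_to_html` is a hypothetical helper name.

```shell
# Turn ssdeep signature lines (stdin) into a minimal HTML table (stdout).
hashes_to_html() {
    awk -F, '
        BEGIN { print "<table>"; print "<tr><th>File</th><th>Fuzzy hash</th></tr>" }
        /^ssdeep,/ || !NF { next }               # skip header and blank lines
        {
            sig = $1                              # text before the first comma
            name = substr($0, length(sig) + 2)    # everything after that comma
            gsub(/^"|"$/, "", name)               # drop surrounding quotes
            gsub(/&/, "\\&amp;", name)            # escape HTML special characters
            gsub(/</, "\\&lt;", name)
            gsub(/>/, "\\&gt;", name)
            printf "<tr><td>%s</td><td><code>%s</code></td></tr>\n", name, sig
        }
        END { print "</table>" }'
}

# Demo on one made-up signature line.
printf '%s\n' '3072:abcdef+GHIJKL+mnopqr:1234,"file1.bin"' | hashes_to_html
```

A typical call would be `hashes_to_html < hashes.txt > report.html`.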
Hash File Format
Fuzzy hash database files have a simple format:
ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdef+GHIJKL+mnopqr:1234,"file1.bin"
1536:qwerty+asdfgh:5678,"file2.bin"
6144:zxcvbn+poiuyt:9012,"file3.bin"
Structure:
- The first line is a header identifying the file format
- blocksize:hash1:hash2 are the fuzzy hash components; hash2 is computed at twice the block size
- One hash per line, followed by a comma and the quoted filename
- Similarity scores are not stored; they are computed at comparison time
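The fields of a signature line can be separated with plain shell parameter expansion. This sketch assumes the layout `blocksize:hash1:hash2,"filename"`; `parse_signature` is a hypothetical helper name.

```shell
# Split one signature line into its fields.
# The body runs in a subshell, so its variables do not leak into the caller.
parse_signature() (
    line=$1
    sig=${line%%,*}        # text before the first comma
    name=${line#*,}        # text after the first comma
    name=${name%\"}        # drop trailing quote, if present
    name=${name#\"}        # drop leading quote, if present
    blocksize=${sig%%:*}
    rest=${sig#*:}
    hash1=${rest%%:*}
    hash2=${rest#*:}
    printf 'blocksize=%s\nhash1=%s\nhash2=%s\nfile=%s\n' \
        "$blocksize" "$hash1" "$hash2" "$name"
)

parse_signature '3072:abcdef+GHIJKL+mnopqr:1234,"file1.bin"'
```

Splitting at the first comma works because the hash alphabet contains no commas, so any later commas belong to the file name.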
Similarity Matching Thresholds
Interpreting Match Scores
| Score | Meaning |
|---|---|
| 90-100% | Very similar, likely same file or minor variation |
| 75-89% | Similar structure, possible variant or derivative |
| 50-74% | Moderate similarity, may share code blocks |
| 25-49% | Weak similarity, possible common libraries |
| < 25% | Not similar, coincidental match |
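The bands in the table can be applied mechanically when triaging many results. The function below is a sketch that encodes the table's thresholds as written; they are rules of thumb, and `classify_score` is a hypothetical helper name.

```shell
# Map a similarity score (0-100) to the bands from the table above.
classify_score() {
    if   [ "$1" -ge 90 ]; then echo "very similar"
    elif [ "$1" -ge 75 ]; then echo "similar structure"
    elif [ "$1" -ge 50 ]; then echo "moderate similarity"
    elif [ "$1" -ge 25 ]; then echo "weak similarity"
    else                       echo "not similar"
    fi
}

for s in 98 80 60 30 10; do
    printf '%s -> %s\n' "$s" "$(classify_score "$s")"
done
```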
Setting Match Confidence
# Show only matches scoring above a chosen threshold (here 75)
ssdeep -t 75 -m baseline.txt /path/to/file.bin
Automation and Scripting
Monitor Directory for New Malware Variants
#!/bin/bash
BASELINE="/opt/malware_baseline.txt"
MONITOR_DIR="/quarantine"
while true; do
    for file in "$MONITOR_DIR"/*.bin; do
        if [ -f "$file" ]; then
            # ssdeep's exit status signals errors, not matches, so test its output
            matches=$(ssdeep -s -m "$BASELINE" "$file")
            if [ -n "$matches" ]; then
                echo "$(date): VARIANT DETECTED - $file"
                # Alert or quarantine
            fi
            # Delete the processed file so it is not rescanned on the next pass
            rm "$file"
        fi
    done
    sleep 300 # Check every 5 minutes
done
Bulk Hash and Compare
#!/bin/bash
SAMPLE_DIR="$1"
HASH_DB="$2"
for file in "$SAMPLE_DIR"/*; do
    # A match is indicated by output, not by the exit status
    result=$(ssdeep -s -m "$HASH_DB" "$file")
    if [ -n "$result" ]; then
        echo "MATCH: $file"
    fi
done
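When a batch run produces several match lines per sample, it can help to reduce them to the best score for each file. This sketch assumes match lines of the form `FILE matches DB:KNOWN (score)`; `best_matches` is a hypothetical helper name.

```shell
# For each scanned file, report the highest score among its match lines.
# Reads match lines on stdin; prints "file<TAB>score", sorted by file name.
best_matches() {
    awk '
        / matches / {
            score = $NF
            gsub(/[()]/, "", score)                  # "(91)" becomes "91"
            file = $0
            sub(/ matches .*/, "", file)             # keep text before " matches "
            if (!(file in best) || score + 0 > best[file]) best[file] = score + 0
        }
        END { for (f in best) printf "%s\t%d\n", f, best[f] }
    ' | sort
}

# Demo on made-up match lines.
printf '%s\n' \
    '/q/a.bin matches db.txt:/k/x.bin (40)' \
    '/q/a.bin matches db.txt:/k/y.bin (91)' \
    '/q/b.bin matches db.txt:/k/x.bin (77)' | best_matches
```

The collected output of a loop like the one above can be piped straight into the function.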
Generate Statistical Analysis
# Count hashes and create analysis
ssdeep -r /samples/ | grep -v "^ssdeep," > all_hashes.txt
total=$(wc -l < all_hashes.txt)
echo "Total files hashed: $total"
# Find similar pairs: directory mode compares every file against every other
ssdeep -r -d /samples/ > pairs.txt
Integration with Other Tools
Use with YARA for Enhanced Detection
# Generate fuzzy hashes as part of forensic investigation
ssdeep -r /evidence/drives/ > drive_hashes.txt
# Cross-reference with YARA rules for additional context
# (yara expects a rules file, not a directory; index.yar is a placeholder name)
yara -r /rules/index.yar /evidence/ > yara_results.txt
Combine with File Carving
# After carving files from disk image
foremost -i disk.img -o carving_results/
# Hash carved files for variant detection
ssdeep -r carving_results/ > carved_hashes.txt
ssdeep -r -m known_malware.txt carving_results/ > carved_analysis.txt
Export for Analysis Platforms
# Create hash database for uploading to analysis platform
ssdeep -r -s /samples/ > upload_hashes.txt
# Document format for external tools
head -10 upload_hashes.txt
Performance Considerations
Large-Scale Hashing
# Hash thousands of files and measure how long it takes
time ssdeep -r -s /massive/directory/ > output.txt
# -s suppresses error messages (e.g. unreadable files); it does not change hashing speed
ssdeep -r -s /path > /dev/null # Benchmark speed
Memory Usage
- ssdeep uses minimal memory
- Suitable for embedded systems and resource-constrained environments
- No significant slowdown even with thousands of files
Optimization Tips
| Tip | Benefit |
|---|---|
| Use `-s` flag | Suppresses error messages that would otherwise clutter the output |
| Hash to file | Avoids console bottleneck |
| Pre-filter files | Reduce unnecessary hashing |
| Use SSD storage | Faster disk I/O for large batches |
Troubleshooting
Hash Mismatch Despite Similarity
Issue: Same file produces different hash each time.
Solution: This shouldn't happen, because ssdeep is deterministic: identical content always yields the same hash. Verify that the file itself is not changing between runs:
# Check file integrity
md5sum file.bin
ssdeep file.bin
# Run again - should be identical
Performance Issues with Large Directories
Issue: Hashing very large directories is slow.
Solution:
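# Write results to a file rather than the terminal
ssdeep -r -s /path > output.txt
# Process in parallel using GNU Parallel
parallel ssdeep ::: /samples/*.bin > parallel_output.txt
# Or use xargs (NUL-delimited names; -n batches files so -P can run 4 jobs)
find /samples -type f -print0 | xargs -0 -n 100 -P 4 ssdeep > parallel_results.txt
# Each ssdeep invocation prints its own header line; remove the repeats before using the file with -m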
Database Management
Issue: Hash database becomes too large.
Solution:
# Segment databases by type
ssdeep -r -s /exe_samples/ > exes.txt
ssdeep -r -s /dll_samples/ > dlls.txt
# Compare against specific database
ssdeep -m exes.txt /suspicious.exe
Security Considerations
Legitimate Uses (Authorized Testing)
- Malware variant detection in sandboxed environments
- Forensic investigation of compromised systems
- Integrity monitoring with fuzzy matching tolerance
- Research and analysis within controlled labs
Warning
NEVER use ssdeep to:
- Analyze files without authorization
- Bypass security measures or protections
- Violate intellectual property rights
- Conduct unauthorized security testing
Always obtain proper authorization before analyzing files or systems.
References
- Official Project: ssdeep SourceForge
- CTPH Algorithm: Jesse Kornblum’s Research
- Academic Paper: Context Triggered Piecewise Hashing
- Documentation: `man ssdeep` on Linux/Unix systems
Quick Reference
# Fast hash generation
ssdeep -r /path > hashes.txt
# Compare file against database
ssdeep -m hashes.txt /file.bin
# Silent batch comparison (a match is signaled by output, not the exit status)
[ -n "$(ssdeep -s -m hashes.txt /file.bin)" ] && echo "MATCH"
# Pretty-print all pairwise matches within a directory
ssdeep -r -p /samples/
# Combine find with ssdeep
find /samples -type f -exec ssdeep {} + > all.txt