ssdeep
Overview
ssdeep is a program for computing and comparing context-triggered piecewise hashing (CTPH) fuzzy hashes. A cryptographic hash such as MD5 or SHA-1 changes completely when even one byte of a file changes. CTPH hashes of files that share most of their content stay similar, so ssdeep can score how alike two files are even when they are not identical.
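The toy sketch below makes "context-triggered" concrete by splitting a string into content-defined chunks. It rests on simplifying assumptions. ctph_chunks is a hypothetical helper, not part of ssdeep. Instead of ssdeep's rolling hash, it uses a plain sum of byte values over a 7-character sliding window. Instead of ssdeep's block size, it takes a fixed trigger value. A chunk ends wherever the window sum modulo the trigger value equals the trigger value minus one.

```shell
# ctph_chunks TEXT TRIGGER  (hypothetical toy, not ssdeep's real algorithm)
# Prints TEXT split into content-defined chunks, one chunk per line.
ctph_chunks() {
  printf '%s\n' "$1" | awk -v m="$2" -v w=7 '
    BEGIN {
      # Lookup table from printable ASCII character to its byte value
      for (i = 32; i < 127; i++) ord[sprintf("%c", i)] = i
    }
    {
      n = length($0); chunk = ""; sum = 0
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        val[i] = ord[c]
        sum += val[i]                  # add the newest character
        if (i > w) sum -= val[i - w]   # drop the character leaving the window
        chunk = chunk c
        # Context trigger: the window contents decide where a chunk ends
        if (sum % m == m - 1) { print chunk; chunk = "" }
      }
      if (chunk != "") print chunk     # trailing partial chunk
    }'
}
```

Example: ctph_chunks 'the quick brown fox jumps over the lazy dog' 8. Each boundary decision looks only at the last 7 characters, so an edit moves only the boundaries near it. ssdeep reduces each chunk to a single base64 character, so a small edit changes only a few characters of the signature.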
Key Features
- Compute fuzzy hashes using the CTPH algorithm
- Compare files for similarity matching
- Detect malware variants with minor modifications
- Batch processing of multiple files
- Generate fuzzy hash databases
- Cross-platform support (Linux, Windows, macOS)
Use Cases
- Malware analysis and variant detection
- Digital forensics and file comparison
- Identifying near-duplicate files
- Detecting code variants in security research
- File integrity monitoring with tolerance
Installation
Linux/Debian-based
sudo apt-get update
sudo apt-get install ssdeep
macOS
brew install ssdeep
Windows
Download the Windows binaries from the ssdeep SourceForge project, or install ssdeep via a package manager.
Build from Source
wget https://sourceforge.net/projects/ssdeep/files/ssdeep-2.14.1/ssdeep-2.14.1.tar.gz
tar xzf ssdeep-2.14.1.tar.gz
cd ssdeep-2.14.1
./configure
make
sudo make install
Basic Commands
| Command | Purpose |
|---|---|
| ssdeep file.bin | Compute the fuzzy hash of a single file |
| ssdeep -r directory/ | Recursively hash all files in a directory |
| ssdeep -m hashfile.txt file.bin | Match a file against the known hashes in hashfile.txt |
| ssdeep -k hashfile.txt other_hashes.txt | Match the signatures in other_hashes.txt against the known hashes |
| ssdeep -s -r directory/ | Silent mode: suppress error messages |
| ssdeep -r -d directory/ | Directory mode: compare all files with each other |
| ssdeep -r -p directory/ | Pretty matching mode: like -d, but includes all matches |
| ssdeep -t 75 -m hashfile.txt file.bin | Only display matches above the given threshold |
| ssdeep -h | Display help message |
| ssdeep -V | Show version information |
Computing Fuzzy Hashes
Single File Hash
ssdeep /path/to/file.bin
Output example (hash values shortened for illustration):
ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdef+GHIJKL+mnopqr:stuvwx,"/path/to/file.bin"
Recursive Directory Hash
ssdeep -r /path/to/directory/ > hashes.txt
This writes a header line followed by one signature line for every file in the directory tree.
Compare All Files in a Directory
ssdeep -r -p /malware/samples/ > all_matches.txt
Pretty matching mode (-p) compares every file under the directory with every other file and reports each match with its score. It is similar to directory mode (-d), but includes all matches.
Comparing Files and Hashes
Compare Single File Against Hash Database
ssdeep -m known_hashes.txt suspicious_file.bin
For each known file whose hash matches, ssdeep prints one line naming the known file, with the match score (0 to 100) in parentheses. Nothing is printed when there is no match.
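For scripts that need the numeric score, the hypothetical helper match_scores below extracts it from match lines read on standard input. It assumes each match line contains " matches " and ends with the score in parentheses, for example /tmp/a.bin matches db.txt:/known/b.bin (87).

```shell
# match_scores (hypothetical helper)
# Reads ssdeep match lines on standard input and prints the score from
# the trailing "(NN)" of each line that contains " matches ".
match_scores() {
  sed -n 's/.* matches .*(\([0-9][0-9]*\))$/\1/p'
}
```

Usage: ssdeep -m known_hashes.txt suspicious_file.bin | match_scores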
Silent Mode Comparison (for scripting)
ssdeep -m -s known_hashes.txt file.bin
The -s option (silent mode) suppresses error messages, for example about unreadable files. Matches are still printed. In scripts, test whether the output contains a match line instead of relying on the exit status.
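A script-level check that reads ssdeep's output works regardless of how the exit status is set. The hypothetical has_match function below assumes every match line contains the word "matches", as in ssdeep's normal match output.

```shell
# has_match HASHFILE FILE  (hypothetical helper)
# Succeeds (exit status 0) when ssdeep prints at least one match line
# for FILE against the known hashes in HASHFILE.
has_match() {
  # -s keeps error messages quiet; grep -q only tests for a match line
  ssdeep -s -m "$1" "$2" | grep -q ' matches '
}
```

Usage: if has_match known_hashes.txt file.bin; then echo "MATCH"; fi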
Show All Comparison Scores
ssdeep -m -a known_hashes.txt file.bin
The -a option displays all matches regardless of score, so low-scoring comparisons are listed too. The -d option is not a detail option: it is directory mode, which compares the given files with each other.
Batch Compare Multiple Files
for file in /path/to/files/*; do
  echo "=== $file ==="
  ssdeep -m hashes.txt "$file"
done
Creating Hash Databases
Generate Hash Database from Directory
ssdeep -r -s /path/to/samples/ > malware_db.txt
Recursive mode hashes every file under the directory. The -s option suppresses error messages about files that cannot be read. You can use the resulting file later as the hash list for -m.
Append Hashes to Existing Database
ssdeep -r /new/samples/ | tail -n +2 >> malware_db.txt
Adds new hashes to an existing database file. ssdeep prints a header line at the start of its output, and tail -n +2 skips it, so the database keeps a single header.
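A hashing run starts its output with the header line ssdeep,1.1--blocksize:hash:hash,filename. As a result, concatenating runs or existing hash files can leave extra headers in the middle of a database. The hypothetical merge_hash_files helper below concatenates hash files with awk and keeps only the first header it sees.

```shell
# merge_hash_files FILE...  (hypothetical helper)
# Concatenates ssdeep hash files to standard output, printing only the
# first "ssdeep,..." header line and dropping any later ones.
merge_hash_files() {
  awk '/^ssdeep,/ { if (seen++) next } { print }' "$@"
}
```

Usage: merge_hash_files malware_db.txt new_hashes.txt > merged_db.txt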
Baseline Database and Batch Comparison
ssdeep -r -s /samples/ > database.txt
ssdeep -r -m database.txt /path/to/compare/
Store baseline hashes once, then match whole directories against them. The -r option is required when the target is a directory.
Malware Analysis Workflow
Establish Known Malware Baseline
# Create hash database of known malware samples
ssdeep -r -s /known/malware/collection/ > malware_baseline.txt
wc -l malware_baseline.txt # Line count: one header line plus one line per hashed file
Analyze Suspicious File
# Calculate hash of suspicious file
ssdeep /path/to/suspicious_sample.bin > analysis.txt
# Compare against known malware
ssdeep -m malware_baseline.txt /path/to/suspicious_sample.bin
Detect Variants
# Check if file is a variant of known malware
ssdeep -m malware_baseline.txt /suspect.exe
# Each match is reported with its score (0 to 100), for example:
# /suspect.exe matches malware_baseline.txt:/known/malware/trojan.exe (98)
Advanced Usage
Exclude Files by Pattern
# Hash all files except drivers (.sys) and libraries (.dll);
# -exec ... + passes many files to each ssdeep run instead of one run per file
find /samples -type f ! -name "*.sys" ! -name "*.dll" -exec ssdeep {} + > custom_hashes.txt
Hash Only Specific File Types
# Hash only executable files (one ssdeep run per batch of files, not per file)
find /samples -type f -name "*.exe" -exec ssdeep {} + > exes.txt
Compare Two Hash Databases
# Match the signatures stored in database2.txt against those in database1.txt
ssdeep -k database1.txt database2.txt
Generate HTML Report
# Drop the ssdeep header line, leaving one signature per line
ssdeep -r /samples/ | grep -v '^ssdeep,' > hashes.txt
# Use third-party tools to convert to an HTML visualization
Hash File Format
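Fuzzy hash database files have a simple format: a header line, then one signature per file.
ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdef+GHIJKL:mnopqr,"file1.bin"
2048:qwerty:asdfgh,"file2.bin"
4096:zxcvbn:poiuyt,"file3.bin"
Structure:
- blocksize:hash1:hash2: the block size, the piecewise hash computed at that block size, and a second hash computed at twice the block size
- Each signature is followed by a comma and the quoted filename, one file per line
- Similarity scores are not stored; they are computed when signatures are compared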
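The signature format makes field extraction straightforward. sig_field is a hypothetical POSIX shell helper that assumes a line of the form blocksize:hash1:hash2,"filename". It splits the line on the first comma, which is safe because the base64 hash characters never include commas or colons. It then splits the signature on its colons and strips the quotes around the filename.

```shell
# sig_field FIELD LINE  (hypothetical helper)
# FIELD is one of: blocksize, hash1, hash2, filename.
# LINE is a signature line such as: 3072:abc:de,"/samples/file1.bin"
sig_field() {
  q='"'
  line=$2
  sig=${line%%,*}        # everything before the first comma
  name=${line#*,}        # everything after the first comma
  name=${name#"$q"}      # strip the opening quote, if present
  name=${name%"$q"}      # strip the closing quote, if present
  blocksize=${sig%%:*}
  rest=${sig#*:}
  hash1=${rest%%:*}
  hash2=${rest#*:}
  case $1 in
    blocksize) printf '%s\n' "$blocksize" ;;
    hash1)     printf '%s\n' "$hash1" ;;
    hash2)     printf '%s\n' "$hash2" ;;
    filename)  printf '%s\n' "$name" ;;
    *)         return 1 ;;
  esac
}
```

To stay POSIX-compatible, the helper sets plain (global) variables. Calling it through $(...), as in blocksize=$(sig_field blocksize "$line"), keeps those variables out of the calling shell.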
Similarity Matching Thresholds
Interpreting Match Scores
ssdeep reports match scores from 0 to 100. The bands below are rough guidelines, not thresholds defined by ssdeep.
| Score | Meaning |
|---|---|
| 90-100 | Very similar; likely the same file or a minor variation |
| 75-89 | Similar structure; possible variant or derivative |
| 50-74 | Moderate similarity; may share code blocks |
| 25-49 | Weak similarity; possibly common libraries |
| < 25 | Little similarity; a match may be coincidental |
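A score of 0 does not always mean two files are unrelated. ssdeep compares two signatures only when their block sizes are equal or one is exactly twice the other. Any other pair scores 0, which is why files of very different sizes rarely match. The hypothetical helper below checks that condition for two block sizes (the first field of each signature).

```shell
# comparable_blocksizes BS1 BS2  (hypothetical helper)
# Succeeds if ssdeep could score two signatures with these block sizes:
# equal, or one exactly twice the other.
comparable_blocksizes() {
  [ "$1" -eq "$2" ] || [ "$1" -eq $(( $2 * 2 )) ] || [ "$2" -eq $(( $1 * 2 )) ]
}
```

Usage: comparable_blocksizes 3072 768 || echo "score will be 0"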
Setting Match Confidence
# Report only matches that score above 75 (-t sets the threshold)
ssdeep -t 75 -m baseline.txt /path/to/file.bin
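ssdeep's -t option filters matches by score but does not label them. As a post-processing sketch, the hypothetical label_matches function reads match lines on standard input and keeps those scoring at least a minimum. The default minimum of 75 is an arbitrary choice. Each kept line is prefixed with its band from the score table. The function assumes each match line ends with the score in parentheses.

```shell
# label_matches [MIN]  (hypothetical helper)
# Prints match lines scoring at least MIN (default 75), each prefixed
# with a band label and a tab.
label_matches() {
  awk -v min="${1:-75}" '
    / matches .*\([0-9]+\)$/ {
      s = $NF                 # last field, e.g. "(87)"
      gsub(/[()]/, "", s)
      s += 0
      if (s < min) next
      if (s >= 90)      label = "very-similar"
      else if (s >= 75) label = "similar"
      else if (s >= 50) label = "moderate"
      else if (s >= 25) label = "weak"
      else              label = "not-similar"
      print label "\t" $0
    }'
}
```

Usage: ssdeep -m baseline.txt /path/to/file.bin | label_matches 50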
Automation and Scripting
Monitor Directory for New Malware Variants
#!/bin/bash
BASELINE="/opt/malware_baseline.txt"
MONITOR_DIR="/quarantine"
PROCESSED_DIR="/quarantine/processed"  # hypothetical location for checked files
mkdir -p "$PROCESSED_DIR"
while true; do
  for file in "$MONITOR_DIR"/*.bin; do
    [ -f "$file" ] || continue
    # ssdeep prints a "matches" line for each hit; the exit status is not used
    if ssdeep -s -m "$BASELINE" "$file" | grep -q ' matches '; then
      echo "$(date): VARIANT DETECTED - $file"
      # Alert or escalate here
    fi
    # Move (rather than delete) checked files so detected samples are kept
    mv "$file" "$PROCESSED_DIR"/
  done
  sleep 300  # Check every 5 minutes
done
Bulk Hash and Compare
#!/bin/bash
SAMPLE_DIR="$1"
HASH_DB="$2"
for file in "$SAMPLE_DIR"/*; do
  [ -f "$file" ] || continue
  # A non-empty result means at least one known hash matched;
  # the exit status is not used as a match signal
  result=$(ssdeep -s -m "$HASH_DB" "$file")
  if [ -n "$result" ]; then
    echo "MATCH: $file"
  fi
done
Generate Statistical Analysis
# Count hashed files (grep drops the ssdeep header line)
ssdeep -r /samples/ | grep -v '^ssdeep,' > all_hashes.txt
total=$(wc -l < all_hashes.txt)
echo "Total files hashed: $total"
# Find similar pairs: directory mode compares every file with every other
ssdeep -r -d /samples/ > similar_pairs.txt
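To find the most similar pairs in pairwise matching output, such as that of ssdeep -r -d /samples/, sort on the score. The hypothetical top_pairs helper assumes lines of the form FILE1 matches FILE2 (SCORE) and prints the N highest-scoring ones, 10 by default.

```shell
# top_pairs [N]  (hypothetical helper)
# Reads match lines on standard input and prints the N highest-scoring
# ones (default 10), highest score first.
top_pairs() {
  awk '/ matches .*\([0-9]+\)$/ {
         s = $NF; gsub(/[()]/, "", s)   # score from the trailing "(NN)"
         print s "\t" $0                # prefix the score as a sort key
       }' |
    sort -t "$(printf '\t')" -k1,1nr |
    head -n "${1:-10}" |
    cut -f2-                            # drop the sort key again
}
```

Usage: ssdeep -r -d /samples/ | top_pairs 20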
Integration with Other Tools
Use with YARA for Enhanced Detection
# Generate fuzzy hashes as part of a forensic investigation
ssdeep -r /evidence/drives/ > drive_hashes.txt
# Cross-reference with YARA rules for additional context; yara takes a
# rules file rather than a directory (rules.yar is a placeholder name)
yara -r /rules/rules.yar /evidence/ > yara_results.txt
Combine with File Carving
# After carving files from a disk image
foremost -i disk.img -o carving_results/
# Hash carved files for variant detection (-r recurses into the output directory)
ssdeep -r carving_results/ > carved_hashes.txt
ssdeep -r -m known_malware.txt carving_results/ > carved_analysis.txt
Export for Analysis Platforms
# Create hash database for uploading to an analysis platform
ssdeep -r -s /samples/ > upload_hashes.txt
# Inspect the first lines to confirm the format
head -10 upload_hashes.txt
Performance Considerations
Large-Scale Hashing
# Time hashing of a large directory tree, saving results to a file
time ssdeep -r -s /massive/directory/ > output.txt
# Benchmark raw hashing speed by discarding output
# (-s only suppresses error messages; it does not make hashing faster)
time ssdeep -r -s /path > /dev/null
Memory Usage
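- Hashing processes each file as a stream with a small, fixed-size state, so memory use stays low regardless of file size
- Matching (-m, -k) compares each file against every signature in the hash list, so run time grows with database size
- Comparing all files with each other (-d, -p) grows quadratically with the number of files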
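The cost of comparing every file with every other file follows from simple counting: N files form N * (N - 1) / 2 distinct pairs. The hypothetical helper below computes that count. It shows that 10,000 files already mean about 50 million comparisons.

```shell
# pairwise_comparisons N  (hypothetical helper)
# Prints the number of distinct file pairs among N files: N * (N - 1) / 2.
pairwise_comparisons() {
  echo $(( $1 * ($1 - 1) / 2 ))
}
```

Usage: pairwise_comparisons 10000 prints 49995000.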
Optimization Tips
| Tip | Benefit |
|---|---|
| Segment hash databases by file type | Fewer signatures to compare for each file |
| Hash to file | Avoids console bottleneck |
| Pre-filter files | Reduce unnecessary hashing |
| Use SSD storage | Faster disk I/O for large batches |
Troubleshooting
Different Hashes for the Same File
Issue: The same file produces a different hash each time.
Solution: ssdeep is deterministic, so identical content always produces the same signature. If the signature changes, the file itself changed between runs (for example, it was still being written). Verify:
# Check file integrity
md5sum file.bin
ssdeep file.bin
# Run again - should be identical
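The manual check above can be wrapped in a small script. hash_is_stable is a hypothetical helper that hashes a file twice. It succeeds only if both runs print the same, non-empty output.

```shell
# hash_is_stable FILE  (hypothetical helper)
# Hashes FILE twice; succeeds only if both runs give identical output.
hash_is_stable() {
  first=$(ssdeep -s "$1")
  second=$(ssdeep -s "$1")
  [ -n "$first" ] && [ "$first" = "$second" ]
}
```

Usage: hash_is_stable file.bin || echo "file.bin changed between runs"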
Performance Issues with Large Directories
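Issue: Hashing very large directories is slow.
Solution:
# Write results to a file rather than the terminal
ssdeep -r -s /path > output.txt
# Process in parallel using GNU Parallel. Each ssdeep run prints its own
# header line, so drop those and write a single header first.
{ echo 'ssdeep,1.1--blocksize:hash:hash,filename'
  parallel ssdeep ::: /samples/*.bin | grep -v '^ssdeep,'; } > parallel_output.txt
# Or use xargs: null-delimited names, 100 files per run, 4 runs at a time
# (unlike GNU Parallel, xargs does not group each run's output, so output can interleave)
{ echo 'ssdeep,1.1--blocksize:hash:hash,filename'
  find /samples -type f -print0 | xargs -0 -n 100 -P 4 ssdeep | grep -v '^ssdeep,'; } > parallel_results.txt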
Database Management
Issue: Hash database becomes too large.
Solution:
# Segment databases by type
ssdeep -r -s /exe_samples/ > exes.txt
ssdeep -r -s /dll_samples/ > dlls.txt
# Compare against specific database
ssdeep -m exes.txt /suspicious.exe
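With segmented databases, a script has to pick the right database for each file. db_for_file is a hypothetical helper that chooses by file extension. The names exes.txt and dlls.txt follow the example above; all_hashes.txt is an assumed catch-all database.

```shell
# db_for_file FILE  (hypothetical helper)
# Prints the segmented hash database to match FILE against.
db_for_file() {
  case $1 in
    *.exe|*.EXE) echo exes.txt ;;
    *.dll|*.DLL) echo dlls.txt ;;
    *)           echo all_hashes.txt ;;  # assumed catch-all database
  esac
}
```

Usage: ssdeep -m "$(db_for_file /suspicious.exe)" /suspicious.exe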
Security Considerations
Legitimate Uses (Authorized Testing)
- Malware variant detection in sandboxed environments
- Forensic investigation of compromised systems
- Integrity monitoring with fuzzy matching tolerance
- Research and analysis within controlled labs
Warning
NEVER use ssdeep to:
- Analyze files without authorization
- Bypass security measures or protections
- Violate intellectual property rights
- Conduct unauthorized security testing
Always obtain proper authorization before analyzing files or systems.
References
- Official Project: ssdeep SourceForge
- CTPH Algorithm: Jesse Kornblum’s Research
- Academic Paper: Context Triggered Piecewise Hashing
- Documentation: man ssdeep on Linux/Unix systems
Quick Reference
# Fast hash generation
ssdeep -r /path > hashes.txt
# Compare file against database
ssdeep -m hashes.txt /file.bin
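# Silent batch comparison (match lines contain " matches ")
ssdeep -s -m hashes.txt /file.bin | grep -q ' matches ' && echo "MATCH"
# Compare all files under a directory with each other (pretty matching mode)
ssdeep -r -p /samples/
# Combine find with ssdeep (+ passes many files to each ssdeep run)
find /samples -type f -exec ssdeep {} + > all.txt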