ssdeep
Overview
ssdeep is a program for computing and comparing context triggered piecewise hashes (CTPH), also known as fuzzy hashes. Cryptographic hashes such as MD5 or SHA-1 produce completely different outputs when a file changes by even one byte; ssdeep instead produces signatures that can be compared to detect similarity between files that are not identical.
Key Features
- Compute fuzzy hashes using CTPH algorithm
- Compare files for similarity matching
- Detect malware variants with minor modifications
- Batch processing of multiple files
- Generate fuzzy hash databases
- Cross-platform support (Linux, Windows, macOS)
Use Cases
- Malware analysis and variant detection
- Digital forensics and file comparison
- Identifying near-duplicate files
- Detecting code variants in security research
- File integrity monitoring with tolerance
Installation
Linux/Debian-based
sudo apt-get update
sudo apt-get install ssdeep
macOS (Homebrew)
brew install ssdeep
Windows
Download a Windows build from the ssdeep SourceForge project page, or install via a package manager.
Build from Source
wget https://sourceforge.net/projects/ssdeep/files/ssdeep-2.14.1/ssdeep-2.14.1.tar.gz
tar xzf ssdeep-2.14.1.tar.gz
cd ssdeep-2.14.1
./configure
make
sudo make install
Basic Commands
| Command | Purpose |
|---|---|
| `ssdeep file.bin` | Compute the fuzzy hash of a single file |
| `ssdeep -r directory/` | Recursively hash all files in a directory |
| `ssdeep -m hashfile.txt file.bin` | Match a file against known hashes |
| `ssdeep -s -m hashfile.txt file.bin` | Same match, with error messages suppressed (silent mode) |
| `ssdeep -r -d directory/` | Directory mode: compare the files in a directory against each other |
| `ssdeep -r -p directory/` | Pretty matching mode: like `-d`, but lists all matches for each file |
| `ssdeep -h` | Display help message |
| `ssdeep -V` | Show version information |
Computing Fuzzy Hashes
Single File Hash
ssdeep /path/to/file.bin
Output example (the hash characters are placeholders):
ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdef+GHIJKL+mnopqr+stuvwx:1234,"/path/to/file.bin"
Recursive Directory Hash
ssdeep -r /path/to/directory/ > hashes.txt
This creates a file with the fuzzy hash of every file in the directory tree.
Pretty Matching Mode (Recursive)
ssdeep -r -p /malware/samples/ > all_matches.txt
The -p flag does not format hashes. It compares every file against the others and lists the matches found for each file.
Comparing Files and Hashes
Compare Single File Against Hash Database
ssdeep -m known_hashes.txt suspicious_file.bin
Prints one line per matching known hash, with the match score in parentheses. Prints nothing if there is no match.
Silent Mode Comparison (for scripting)
ssdeep -s -m known_hashes.txt file.bin
The -s flag suppresses error messages only; matches are still printed. Place -s before -m, because -m takes the hash file as its argument. The exit status indicates success or error, not whether a match was found, so scripts should test for non-empty output.
Detailed Match Information
ssdeep -a -m known_hashes.txt file.bin
With -a, ssdeep reports the score against every known hash, including scores of 0. To list only stronger matches, use -t N to show matches above a threshold instead. The -d flag is directory mode and does not take a hash file.
Batch Compare Multiple Files
for file in /path/to/files/*; do
    echo "=== $file ==="
    ssdeep -m hashes.txt "$file"
done
Creating Hash Databases
Generate Hash Database from Directory
ssdeep -r -s /path/to/samples/ > malware_db.txt
Recursive mode writes one signature per file; -s suppresses error messages for files that cannot be read.
Append Hashes to Existing Database
ssdeep -r /new/samples/ | tail -n +2 >> malware_db.txt
Adds new hashes to an existing database file. tail -n +2 drops the header line so the database keeps a single header.
Create Indexed Hash Database
ssdeep -r -s /samples/ > database.txt
ssdeep -r -m database.txt /path/to/compare/
Store baseline hashes, then match every file under a directory against them. The -r flag is needed for ssdeep to descend into the directory.
Malware Analysis Workflow
Establish Known Malware Baseline
# Create hash database of known malware samples
ssdeep -r -s /known/malware/collection/ > malware_baseline.txt
wc -l malware_baseline.txt # Hash count plus one header line
Analyze Suspicious File
# Calculate hash of suspicious file
ssdeep /path/to/suspicious_sample.bin > analysis.txt
# Compare against known malware
ssdeep -m malware_baseline.txt /path/to/suspicious_sample.bin
Detect Variants
# Check if file is variant of known malware
ssdeep -m malware_baseline.txt /suspect.exe
# Output shows the match score in parentheses
# Example: /suspect.exe matches malware_baseline.txt:/known/malware/trojan.exe (98)
Advanced Usage
Exclude Files by Pattern
# Hash all files except system files
# "{} +" passes many files per ssdeep call, so the header is not repeated for every file
find /samples -type f ! -name "*.sys" ! -name "*.dll" -exec ssdeep {} + > custom_hashes.txt
Hash Only Specific File Types
# Hash only executable files
find /samples -type f -name "*.exe" -exec ssdeep {} + > exes.txt
Compare Two Hash Databases
# Find similar hashes between two databases (-k matches signatures against signatures)
ssdeep -k database1.txt database2.txt
Generate HTML Report
# Drop the header line, leaving one signature per line
ssdeep -r /samples/ | tail -n +2 > hashes.txt
# Use third-party tools to convert to HTML visualization
Hash File Format
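Fuzzy hash database files have a simple format:
ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdef+GHIJKL+mnopqr:1234,"file1.bin"
2048:qwerty+asdfgh:5678,"file2.bin"
4096:zxcvbn+poiuyt:9012,"file3.bin"
Structure:
- The first line is a header identifying the format
- blocksize:hash1:hash2 are the fuzzy hash components: the block size, the hash at that block size, and the hash at twice that block size
- One signature per line, followed by a comma and the quoted filename
- The similarity score is not stored; it is computed at comparison time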
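The structure described above can be split into fields with a few lines of awk. This is a minimal sketch, not part of ssdeep: the function name parse_sig_lines is hypothetical, and it assumes the header line starts with "ssdeep," and that filenames contain no escaped quotes.

```shell
# parse_sig_lines (hypothetical helper): read ssdeep signature lines on stdin
# and print blocksize<TAB>hash1<TAB>hash2<TAB>filename for each one.
parse_sig_lines() {
    awk '
        /^ssdeep,/ { next }                  # skip the format header
        {
            comma = index($0, ",")           # the signature part contains no comma
            if (comma == 0) next
            sig  = substr($0, 1, comma - 1)
            name = substr($0, comma + 1)
            gsub(/^"|"$/, "", name)          # strip the surrounding quotes
            n = split(sig, part, ":")
            if (n != 3) next                 # ignore lines that are not signatures
            printf "%s\t%s\t%s\t%s\n", part[1], part[2], part[3], name
        }'
}
```

Usage, assuming a database file named hashes.txt: `parse_sig_lines < hashes.txt`. Splitting at the first comma keeps filenames that themselves contain commas intact.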
Similarity Matching Thresholds
Interpreting Match Scores
ssdeep reports a match score from 0 to 100 rather than a true percentage. The bands below are rules of thumb, not thresholds defined by ssdeep.
| Score | Meaning |
|---|---|
| 90-100 | Very similar, likely same file or minor variation |
| 75-89 | Similar structure, possible variant or derivative |
| 50-74 | Moderate similarity, may share code blocks |
| 25-49 | Weak similarity, possible common libraries |
| < 25 | Not similar, possibly a coincidental match; a score of 0 means no match |
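For scripted triage, the bands in the table can be expressed as a small function. This is a sketch only: classify_score is a hypothetical name, and the labels simply restate the rule-of-thumb bands above.

```shell
# classify_score (hypothetical helper): map an ssdeep match score (0-100)
# to the rule-of-thumb label from the table above.
classify_score() {
    score=$1
    case $score in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;   # reject empty or non-numeric input
    esac
    if   [ "$score" -gt 100 ]; then echo "invalid"; return 1
    elif [ "$score" -ge 90 ];  then echo "very similar"
    elif [ "$score" -ge 75 ];  then echo "similar"
    elif [ "$score" -ge 50 ];  then echo "moderate"
    elif [ "$score" -ge 25 ];  then echo "weak"
    else echo "not similar"
    fi
}
```

For example, `classify_score 98` prints "very similar" and `classify_score 10` prints "not similar".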
Setting Match Confidence
# Show only matches scoring above 75
ssdeep -t 75 -m baseline.txt /path/to/file.bin
Automation and Scripting
Monitor Directory for New Malware Variants
#!/bin/bash
BASELINE="/opt/malware_baseline.txt"
MONITOR_DIR="/quarantine"
DONE_DIR="$MONITOR_DIR/scanned"  # hypothetical folder: scanned files are moved here, not deleted
mkdir -p "$DONE_DIR"
while true; do
    for file in "$MONITOR_DIR"/*.bin; do
        [ -f "$file" ] || continue
        # ssdeep prints a line for each match; its exit status does not signal a match
        matches=$(ssdeep -s -m "$BASELINE" "$file")
        if [ -n "$matches" ]; then
            echo "$(date): VARIANT DETECTED - $file"
            # Alert or quarantine
        fi
        mv "$file" "$DONE_DIR/"  # keep the sample but avoid rescanning it
    done
    sleep 300  # Check every 5 minutes
done
Bulk Hash and Compare
#!/bin/bash
SAMPLE_DIR="$1"
HASH_DB="$2"
for file in "$SAMPLE_DIR"/*; do
    # ssdeep prints a line for each match; empty output means no match
    result=$(ssdeep -s -m "$HASH_DB" "$file")
    if [ -n "$result" ]; then
        echo "MATCH: $file"
    fi
done
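Because ssdeep signals a match through its output rather than its exit status, scripts often need a small adapter. The sketch below assumes match lines have the form "FILE matches DB:KNOWN (SCORE)"; has_match is a hypothetical helper, not an ssdeep feature.

```shell
# has_match (hypothetical helper): exit 0 if stdin contains at least one
# ssdeep match line of the form "FILE matches DB:KNOWN (SCORE)", else exit 1.
has_match() {
    grep -Eq ' matches .*\([0-9]+\)$'
}
```

It can then be used in a condition, for example `if ssdeep -s -m "$HASH_DB" "$file" | has_match; then echo "MATCH: $file"; fi`.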
Generate Statistical Analysis
# Count hashes and create analysis
ssdeep -r /samples/ | tail -n +2 > all_hashes.txt # tail drops the header line
total=$(wc -l < all_hashes.txt)
echo "Total files hashed: $total"
# Find most similar pairs: -p compares every file against the others
ssdeep -r -p /samples/ > similar_pairs.txt
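Finding the most similar pairs mostly comes down to sorting match lines by score. The following sketch assumes each match line ends with the score in parentheses, as in ssdeep's matching modes; top_matches is a hypothetical name.

```shell
# top_matches (hypothetical helper): read ssdeep match lines on stdin and
# print the N highest-scoring ones as score<TAB>original line (default N=10).
top_matches() {
    limit=${1:-10}
    awk '
        match($0, /\([0-9]+\)$/) {
            score = substr($0, RSTART + 1, RLENGTH - 2)   # digits between the parentheses
            printf "%d\t%s\n", score, $0
        }' | sort -rn | head -n "$limit"
}
```

For example, `ssdeep -r -p /samples/ | top_matches 20` would list the twenty strongest matches. Lines without a trailing score, such as blank separators, are ignored.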
Integration with Other Tools
Use with YARA for Enhanced Detection
# Generate fuzzy hashes as part of forensic investigation
ssdeep -r /evidence/drives/ > drive_hashes.txt
# Cross-reference with YARA rules for additional context
# yara expects a rules file, not a directory; index.yar is a placeholder name
yara -r /rules/index.yar /evidence/ > yara_results.txt
Combine with File Carving
# After carving files from disk image
foremost -i disk.img -o carving_results/
# Hash carved files for variant detection
ssdeep -r carving_results/ > carved_hashes.txt
# -r is needed to match every file under the directory
ssdeep -r -m known_malware.txt carving_results/ > carved_analysis.txt
Export for Analysis Platforms
# Create hash database for uploading to analysis platform
ssdeep -r -s /samples/ > upload_hashes.txt
# Inspect the header and first entries to confirm the format external tools will receive
head -10 upload_hashes.txt
Performance Considerations
Large-Scale Hashing
# Hash thousands of files efficiently
time ssdeep -r -s /massive/directory/ > output.txt
# -s only suppresses error messages; it does not change hashing speed
ssdeep -r -s /path > /dev/null # Benchmark speed without terminal output
Memory Usage
- ssdeep uses little memory when hashing
- Suitable for embedded systems and resource-constrained environments
- Hashing time grows with the amount of data read; matching time also grows with the number of known hashes
Optimization Tips
| Tip | Benefit |
|---|---|
| Use `-s` flag | Keeps error messages for unreadable files out of batch runs |
| Hash to file | Avoids console bottleneck |
| Pre-filter files | Reduce unnecessary hashing |
| Use SSD storage | Faster disk I/O for large batches |
Troubleshooting
Hash Mismatch Despite Similarity
Issue: Same file produces different hash each time.
Solution: ssdeep is deterministic, so unchanged content always yields the same hash. If the hash changes between runs, the file itself is changing. Verify:
# Check file integrity
md5sum file.bin
ssdeep file.bin
# Run both again - output should be identical if the file is unchanged
Performance Issues with Large Directories
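Issue: Hashing very large directories is slow.
Solution:
# Write to a file rather than the terminal
ssdeep -r -s /path > output.txt
# Process in parallel using GNU Parallel
parallel ssdeep ::: /samples/*.bin > parallel_output.txt
# Or use xargs: -n batches the files so that -P can run several jobs at once
# (output from concurrent xargs jobs may interleave)
find /samples -type f -print0 | xargs -0 -n 100 -P 4 ssdeep > parallel_results.txt
# Each ssdeep invocation prints its own header line, so the combined output repeats it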
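Running several ssdeep processes leaves one header line per invocation in the combined output. A minimal sketch for tidying that up follows; merge_sigs is a hypothetical helper and assumes header lines start with "ssdeep,". It reads all input into memory, which is fine for databases of modest size.

```shell
# merge_sigs (hypothetical helper): read concatenated ssdeep outputs on stdin,
# print a single header followed by the unique signature lines in sorted order.
merge_sigs() {
    input=$(cat)
    # first header line only
    printf '%s\n' "$input" | awk '/^ssdeep,/ && !seen++'
    # every non-empty, non-header line, de-duplicated
    printf '%s\n' "$input" | awk '!/^ssdeep,/ && NF' | sort -u
}
```

Usage: `merge_sigs < parallel_output.txt > merged.txt`.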
Database Management
Issue: Hash database becomes too large.
Solution:
# Segment databases by type
ssdeep -r -s /exe_samples/ > exes.txt
ssdeep -r -s /dll_samples/ > dlls.txt
# Compare against specific database
ssdeep -m exes.txt /suspicious.exe
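Another way to shrink a database is to drop entries whose signature already appears, since they add no new matching information. This is a sketch under that assumption; dedupe_sigs is a hypothetical helper, and it keeps only the first filename seen for each signature.

```shell
# dedupe_sigs (hypothetical helper): keep one header line and the first
# entry for each distinct signature (the text before the first comma).
dedupe_sigs() {
    awk '
        /^ssdeep,/ { if (!seen_header++) print; next }   # single header
        {
            comma = index($0, ",")
            sig = (comma > 0) ? substr($0, 1, comma - 1) : $0
            if (!(sig in seen)) { seen[sig] = 1; print }
        }'
}
```

Usage: `dedupe_sigs < malware_db.txt > malware_db_unique.txt`. The trade-off is that filenames of later duplicates are lost, so keep the original file if those names matter.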
Security Considerations
Legitimate Uses (Authorized Testing)
- Malware variant detection in sandboxed environments
- Forensic investigation of compromised systems
- Integrity monitoring with fuzzy matching tolerance
- Research and analysis within controlled labs
Warning
NEVER use ssdeep to:
- Analyze files without authorization
- Bypass security measures or protections
- Violate intellectual property rights
- Conduct unauthorized security testing
Always obtain proper authorization before analyzing files or systems.
References
- Official Project: ssdeep SourceForge
- CTPH Algorithm: Jesse Kornblum’s Research
- Academic Paper: Context Triggered Piecewise Hashing
- Documentation: man ssdeep on Linux/Unix systems
Quick Reference
# Fast hash generation
ssdeep -r /path > hashes.txt
# Compare file against database
ssdeep -m hashes.txt /file.bin
# Silent batch comparison (a match is signalled by output, not by exit status)
[ -n "$(ssdeep -s -m hashes.txt /file.bin)" ] && echo "MATCH"
# Pretty matching: compare all files under a path with each other
ssdeep -r -p /path
# Combine find with ssdeep
find /samples -type f -exec ssdeep {} + > all.txt