ssdeep
Overview
ssdeep is a program for computing and comparing context-triggered piecewise hashes (CTPH), also called fuzzy hashes. Unlike cryptographic hashes such as MD5 or SHA-1, which produce completely different outputs for slightly modified files, ssdeep can detect similarity between files that are not identical.
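The cryptographic side of that contrast can be seen without ssdeep installed. The sketch below hashes two strings that differ by one character with `md5sum` (assumed to be available, as on most Linux systems); the two sample strings are arbitrary illustrations.

```bash
# Two inputs that differ by a single character ("dog" vs "cog").
a='The quick brown fox jumps over the lazy dog'
b='The quick brown fox jumps over the lazy cog'

# md5sum prints "<digest>  -" for stdin; keep only the digest field.
digest_a=$(printf '%s' "$a" | md5sum | cut -d' ' -f1)
digest_b=$(printf '%s' "$b" | md5sum | cut -d' ' -f1)

echo "a: $digest_a"
echo "b: $digest_b"
```

The two digests share no useful structure, so they only answer "identical or not". A fuzzy hash is designed so that most of the signature survives a small edit.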
Key Features
- Compute fuzzy hashes using the CTPH algorithm
- Compare files for similarity matching
- Detect malware variants with minor modifications
- Batch processing of multiple files
- Generate fuzzy hash databases
- Cross-platform support (Linux, Windows, macOS)
Use Cases
- Malware analysis and variant detection
- Digital forensics and file comparison
- Identifying near-duplicate files
- Detecting code variants in security research
- File integrity monitoring with tolerance
Installation
Linux/Debian-based
sudo apt-get update
sudo apt-get install ssdeep
macOS
brew install ssdeep
Windows
Download from the ssdeep SourceForge page or install via a package manager.
Build from Source
wget https://sourceforge.net/projects/ssdeep/files/ssdeep-2.14.1/ssdeep-2.14.1.tar.gz
tar xzf ssdeep-2.14.1.tar.gz
cd ssdeep-2.14.1
./configure
make
sudo make install
Basic Commands
| Command | Purpose |
|---|---|
| `ssdeep file.bin` | Calculate fuzzy hash of a single file |
| `ssdeep -r directory/` | Recursively hash all files in directory |
| `ssdeep -m hashfile.txt file.bin` | Compare file against known hashes |
| `ssdeep -s -m hashfile.txt file.bin` | Same comparison with error messages suppressed (silent mode) |
| `ssdeep -p -r directory/` | Pretty matching mode: compare the files with each other and list all matches |
| `ssdeep -d -r directory/` | Directory mode: compare all files in a directory with each other |
| `ssdeep -h` | Display help message |
| `ssdeep -V` | Show version information |
Computing Fuzzy Hashes
Single File Hash
ssdeep /path/to/file.bin
Output example (a header line, then one signature per file):
ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdef+GHIJKL+mnopqr+stuvwx:1234,"/path/to/file.bin"
Recursive Directory Hash
ssdeep -r /path/to/directory/ > hashes.txt
This creates a file with all fuzzy hashes in the directory structure.
Pretty Matching Mode
ssdeep -p -r /malware/samples/ > all_matches.txt
The `-p` flag does not print hashes. It compares the hashed files with one another and reports all matches, similar to directory mode (`-d`).
Comparing Files and Hashes
Compare Single File Against Hash Database
ssdeep -m known_hashes.txt suspicious_file.bin
Prints one match line, with a similarity score from 0 to 100, for each known hash the file resembles. No output means no match was found.
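Silent Mode Comparison (for scripting)
ssdeep -m -s known_hashes.txt file.bin
The `-s` flag suppresses error messages only; match lines are still printed. The man page documents the exit status as 0 on success and 1 on error, so the exit code does not say whether a match was found. Scripts should test for match output instead.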
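Because a script usually has to decide from ssdeep's output whether anything matched, a small helper can make that test explicit. This is a minimal sketch: `has_match` is a hypothetical helper name, and it assumes match lines have the form `FILE matches DB:KNOWN (SCORE)` as printed by ssdeep 2.x. The sample line and its paths are made up so the snippet runs without ssdeep.

```bash
# has_match: read ssdeep -m output on stdin and succeed (exit 0)
# if at least one line looks like "... matches ... (SCORE)".
has_match() {
    grep -Eq ' matches .* \([0-9]+\)$'
}

# Demo with canned output (hypothetical paths), so no ssdeep call is needed here.
sample='/tmp/suspect.bin matches known_hashes.txt:/known/trojan.exe (98)'
if printf '%s\n' "$sample" | has_match; then
    echo "MATCH"
fi
```

In real use the helper would sit at the end of a pipeline, for example `ssdeep -s -m known_hashes.txt file.bin | has_match && echo "MATCH"`.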
Detailed Match Information
ssdeep -a -m known_hashes.txt file.bin
Each match line names the input file, the matching database entry, and the score in parentheses. The `-a` flag displays all matches regardless of score.
Batch Compare Multiple Files
for file in /path/to/files/*; do
    echo "=== $file ==="
    ssdeep -m hashes.txt "$file"
done
Creating Hash Databases
Generate Hash Database from Directory
ssdeep -r -s /path/to/samples/ > malware_db.txt
Recursive mode with error messages suppressed writes a header line followed by one hash per file, which is the layout `-m` reads for later comparisons.
Append Hashes to Existing Database
ssdeep -r /new/samples/ | tail -n +2 >> malware_db.txt
Adds new hashes to an existing database file. Every ssdeep run begins its output with a header line, so `tail -n +2` drops that line before appending.
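When merging two database files that each already start with a header line, the second header has to be skipped so the result has only one. The helper below is a sketch under that assumption; `merge_db` is a hypothetical name, and the header text `ssdeep,1.1--blocksize:hash:hash,filename` is what ssdeep 2.x is assumed to write. The demo entries are fabricated placeholders.

```bash
# merge_db BASE NEW: print BASE unchanged, then the entries of NEW
# with its header line removed, so the output has a single header.
merge_db() {
    cat "$1"
    sed '/^ssdeep,/d' "$2"
}

# Demo with two tiny throwaway databases.
tmp=$(mktemp -d)
printf '%s\n' 'ssdeep,1.1--blocksize:hash:hash,filename' \
    '3:abc:def,"/old/a.bin"' > "$tmp/base.txt"
printf '%s\n' 'ssdeep,1.1--blocksize:hash:hash,filename' \
    '6:ghi:jkl,"/new/b.bin"' > "$tmp/new.txt"

merge_db "$tmp/base.txt" "$tmp/new.txt" > "$tmp/merged.txt"
cat "$tmp/merged.txt"
```

Writing to a new file and renaming it afterwards avoids reading and writing the same database at once.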
Create Indexed Hash Database
ssdeep -r -s /samples/ > database.txt
ssdeep -r -m database.txt /path/to/compare/
Store baseline hashes, then perform batch comparisons against them.
Malware Analysis Workflow
Establish Known Malware Baseline
# Create hash database of known malware samples
ssdeep -r -s /known/malware/collection/ > malware_baseline.txt
wc -l malware_baseline.txt # Hash count plus one header line
Analyze Suspicious File
# Calculate hash of suspicious file
ssdeep /path/to/suspicious_sample.bin > analysis.txt
# Compare against known malware
ssdeep -m malware_baseline.txt /path/to/suspicious_sample.bin
Detect Variants
# Check if file is variant of known malware
ssdeep -m malware_baseline.txt /suspect.exe
# Output shows the match score (0-100) in parentheses
# Example: /suspect.exe matches malware_baseline.txt:/known/malware/trojan.exe (98)
Advanced Usage
Exclude Files by Pattern
# Hash all files except system files
# "-exec ... +" passes many files to each ssdeep run, so the header line is not repeated per file
find /samples -type f ! -name "*.sys" ! -name "*.dll" -exec ssdeep {} + > custom_hashes.txt
Hash Only Specific File Types
# Hash only executable files
find /samples -type f -name "*.exe" -exec ssdeep {} + > exes.txt
Compare Two Hash Databases
# Find similar hashes between two databases
# (-k matches the signatures in database2.txt against those in database1.txt)
ssdeep -k database1.txt database2.txt
Generate HTML Report
ssdeep -r /samples/ | grep -v "^ssdeep," > hashes.txt # Drop the header line
# Use third-party tools to convert to HTML visualization
Hash File Format
Fuzzy hash database files have a simple format:
ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdef+GHIJKL+mnopqr:1234,"file1.bin"
1536:qwerty+asdfgh:5678,"file2.bin"
6144:zxcvbn+poiuyt:9012,"file3.bin"
Structure:
- `blocksize:hash1:hash2`: fuzzy hash components (the block size is 3 times a power of two; the second hash is computed at twice that block size)
- Similarity score shown on comparison
- One hash per line with comma-separated filename
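The fields of a signature line can be pulled apart with plain parameter expansion. The sketch below assumes lines of the form `blocksize:hash1:hash2,"filename"`; the helper names (`sig_blocksize`, `sig_filename`, `comparable`) are hypothetical. The `comparable` check reflects the fact that ssdeep only scores two signatures whose block sizes are equal or differ by a factor of two.

```bash
# sig_blocksize LINE: print the block size (the text before the first ":").
sig_blocksize() {
    printf '%s\n' "${1%%:*}"
}

# sig_filename LINE: print the filename field without its surrounding quotes.
# The hash alphabet is base64 (no commas), so the first comma ends the signature.
sig_filename() {
    name=${1#*,}
    name=${name#\"}
    name=${name%\"}
    printf '%s\n' "$name"
}

# comparable BS1 BS2: succeed if the two block sizes are equal
# or one is exactly twice the other.
comparable() {
    [ "$1" -eq "$2" ] || [ "$1" -eq $(( $2 * 2 )) ] || [ "$2" -eq $(( $1 * 2 )) ]
}

line='3072:abcdef+GHIJKL+mnopqr:1234,"/samples/file1.bin"'
echo "block size: $(sig_blocksize "$line")"
echo "file:       $(sig_filename "$line")"
comparable 3072 6144 && echo "3072 and 6144 can be compared"
```

Grouping a large database by block size first is one way to avoid comparisons that can never produce a score.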
Similarity Matching Thresholds
Interpreting Match Scores
| Score | Meaning |
|---|---|
| 90-100% | Very similar, likely same file or minor variation |
| 75-89% | Similar structure, possible variant or derivative |
| 50-74% | Moderate similarity, may share code blocks |
| 25-49% | Weak similarity, possible common libraries |
| < 25% | Not similar, coincidental match |
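The bands in the table above can be encoded as a small lookup for use in reports. This is only a sketch: `classify_score` is a hypothetical helper, and the cut-offs are the rule-of-thumb values from the table, not thresholds defined by ssdeep itself.

```bash
# classify_score SCORE: map a 0-100 ssdeep score to the table's bands.
classify_score() {
    score=$1
    if   [ "$score" -ge 90 ]; then echo "very similar"
    elif [ "$score" -ge 75 ]; then echo "similar structure"
    elif [ "$score" -ge 50 ]; then echo "moderate similarity"
    elif [ "$score" -ge 25 ]; then echo "weak similarity"
    else                           echo "not similar"
    fi
}

classify_score 98
classify_score 62
```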
Setting Match Confidence
# Only report matches scoring above a chosen threshold (75 here)
ssdeep -t 75 -m baseline.txt /path/to/file.bin
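A score cut-off can also be applied to match output that has already been saved, which is handy when trying several thresholds against one run. The filter below is a sketch that assumes each match line ends with the score in parentheses, as in `FILE matches DB:KNOWN (SCORE)`; `filter_matches` is a hypothetical name and the sample lines are invented.

```bash
# filter_matches MIN: print the match lines from stdin whose
# trailing "(SCORE)" is greater than or equal to MIN.
filter_matches() {
    awk -v min="$1" '
        match($0, /\([0-9]+\)$/) {
            # RSTART points at "(", so skip it and drop the closing ")".
            score = substr($0, RSTART + 1, RLENGTH - 2)
            if (score + 0 >= min + 0) print
        }'
}

printf '%s\n' \
    'a.bin matches db.txt:x.bin (98)' \
    'a.bin matches db.txt:y.bin (41)' | filter_matches 75
```

Only the line scoring 98 passes the 75 cut-off in this demo.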
Automation and Scripting
Monitor Directory for New Malware Variants
#!/bin/bash
BASELINE="/opt/malware_baseline.txt"
MONITOR_DIR="/quarantine"
while true; do
    for file in "$MONITOR_DIR"/*.bin; do
        if [ -f "$file" ]; then
            # ssdeep's exit code does not signal a match, so test its output
            if [ -n "$(ssdeep -s -m "$BASELINE" "$file")" ]; then
                echo "$(date): VARIANT DETECTED - $file"
                # Alert or quarantine
            fi
            # Delete the file so it is not rescanned; move it instead if it must be kept
            rm "$file"
        fi
    done
    sleep 300 # Check every 5 minutes
done
Bulk Hash and Compare
#!/bin/bash
SAMPLE_DIR="$1"
HASH_DB="$2"
for file in "$SAMPLE_DIR"/*; do
    # A non-empty result means ssdeep printed at least one match line
    result=$(ssdeep -s -m "$HASH_DB" "$file")
    if [ -n "$result" ]; then
        echo "MATCH: $file"
    fi
done
Generate Statistical Analysis
# Count hashes and create analysis
ssdeep -r /samples/ | grep -v "^ssdeep," > all_hashes.txt # Drop the header so it is not counted
total=$(wc -l < all_hashes.txt)
echo "Total files hashed: $total"
# Find most similar pairs (requires parsing)
echo "Use SSDEEP library or custom tools for pair-wise comparison"
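One lightweight form of that parsing is to rank pairwise match lines by score. The sketch below assumes the lines come from ssdeep's directory mode (`-d`) or pretty matching mode (`-p`) and end with the score in parentheses, as in `A matches B (SCORE)`; `rank_matches` is a hypothetical helper and the sample pairs are invented.

```bash
# rank_matches: read "A matches B (SCORE)" lines on stdin and print
# "SCORE<TAB>A matches B", sorted with the highest score first.
rank_matches() {
    awk 'match($0, / \([0-9]+\)$/) {
        # The matched text is " (NN)": skip the space and "(", drop the ")".
        score = substr($0, RSTART + 2, RLENGTH - 3)
        printf "%s\t%s\n", score, substr($0, 1, RSTART - 1)
    }' | sort -k1,1nr
}

printf '%s\n' \
    '/s/a.bin matches /s/b.bin (41)' \
    '/s/a.bin matches /s/c.bin (98)' \
    '/s/b.bin matches /s/c.bin (63)' | rank_matches
```

Piping the result through `head` gives the most similar pairs first.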
Integration with Other Tools
Use with YARA for Enhanced Detection
# Generate fuzzy hashes as part of forensic investigation
ssdeep -r /evidence/drives/ > drive_hashes.txt
# Cross-reference with YARA rules for additional context
# (yara expects a rules file; /rules/rules.yar is a placeholder path)
yara -r /rules/rules.yar /evidence/ > yara_results.txt
Combine with File Carving
# After carving files from disk image
foremost -i disk.img -o carving_results/
# Hash carved files for variant detection
ssdeep -r carving_results/ > carved_hashes.txt
ssdeep -r -m known_malware.txt carving_results/ > carved_analysis.txt
Export for Analysis Platforms
# Create hash database for uploading to analysis platform
ssdeep -r -s /samples/ > upload_hashes.txt
# Document format for external tools
head -10 upload_hashes.txt
Performance Considerations
Large-Scale Hashing
# Hash thousands of files efficiently
time ssdeep -r -s /massive/directory/ > output.txt
# -s (silent) suppresses error messages, e.g. for unreadable files, so they do not clutter the run
ssdeep -r -s /path > /dev/null # Benchmark speed
Memory Usage
- ssdeep uses minimal memory
- Suitable for embedded systems and resource-constrained environments
- No significant slowdown even with thousands of files
Optimization Tips
| Tip | Benefit |
|---|---|
| Use `-s` flag | Suppresses error messages for cleaner batch output |
| Hash to file | Avoids console bottleneck |
| Pre-filter files | Reduce unnecessary hashing |
| Use SSD storage | Faster disk I/O for large batches |
Troubleshooting
Hash Mismatch Despite Similarity
Issue: Same file produces different hash each time.
Solution: This shouldn’t happen with ssdeep, because the hash is a deterministic function of the file contents. If the hash changes, the file itself is changing between runs. Verify:
# Check file integrity
md5sum file.bin
ssdeep file.bin
# Run again - should be identical
Performance Issues with Large Directories
Issue: Hashing very large directories is slow.
Solution:
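# Suppress error messages and write to a file
ssdeep -r -s /path > output.txt
# Process in parallel using GNU Parallel
parallel ssdeep ::: /samples/*.bin > parallel_output.txt
# Or use xargs (null-delimited so paths with spaces survive)
find /samples -type f -print0 | xargs -0 -P 4 ssdeep > parallel_results.txt
# Each ssdeep process prints its own header line; remove the repeats before using the file with -m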
Database Management
Issue: Hash database becomes too large.
Solution:
# Segment databases by type
ssdeep -r -s /exe_samples/ > exes.txt
ssdeep -r -s /dll_samples/ > dlls.txt
# Compare against specific database
ssdeep -m exes.txt /suspicious.exe
Security Considerations
Legitimate Uses (Authorized Testing)
- Malware variant detection in sandboxed environments
- Forensic investigation of compromised systems
- Integrity monitoring with fuzzy matching tolerance
- Research and analysis within controlled labs
NEVER use ssdeep to:
- Analyze files without authorization
- Bypass security measures or protections
- Violate intellectual property rights
- Conduct unauthorized security testing
Always obtain proper authorization before analyzing files or systems.
References
- Official Project: ssdeep SourceForge
- CTPH Algorithm: Jesse Kornblum’s Research
- Academic Paper: Context Triggered Piecewise Hashing
- Documentation: `man ssdeep` on Linux/Unix systems
Quick Reference
# Fast hash generation
ssdeep -r /path > hashes.txt
# Compare file against database
ssdeep -m hashes.txt /file.bin
# Batch comparison with errors suppressed; test the output, not the exit code
ssdeep -m -s hashes.txt /file.bin | grep -q . && echo "MATCH"
# Compare the files in a directory with each other (pretty matching)
ssdeep -p -r /samples/
# Combine find with ssdeep
find /samples -type f -exec ssdeep {} + > all.txt