ssdeep

Overview

ssdeep is a program for computing and comparing context-triggered piecewise hashes (CTPH), also called fuzzy hashes. Cryptographic hashes such as MD5 or SHA-1 produce completely different outputs when a single byte changes. ssdeep signatures change only locally, so two files that share sequences of identical bytes in the same order can be scored for similarity on a scale from 0 to 100.
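As a small illustration of the cryptographic-hash behavior described above, the sketch below hashes two strings that differ by one character. It assumes `sha1sum` from GNU coreutils is available; the two sample strings are arbitrary.

```shell
# Two inputs that differ by a single character.
a='The quick brown fox jumps over the lazy dog'
b='The quick brown fox jumps over the lazy cog'

# sha1sum prints "<digest>  -" for stdin; keep only the digest field.
digest_a=$(printf '%s' "$a" | sha1sum | cut -d' ' -f1)
digest_b=$(printf '%s' "$b" | sha1sum | cut -d' ' -f1)

printf '%s\n%s\n' "$digest_a" "$digest_b"

# The digests share no useful structure, so equality is the only test available.
if [ "$digest_a" = "$digest_b" ]; then
    echo "identical"
else
    echo "different (no similarity information)"
fi
```

A fuzzy hash is meant to fill exactly this gap: instead of a yes/no answer, comparing two signatures yields a score.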

Key Features

Compute fuzzy hashes using CTPH algorithm
Compare files for similarity matching
Detect malware variants with minor modifications
Batch processing of multiple files
Generate fuzzy hash databases
Cross-platform support (Linux, Windows, macOS)

Use Cases

Malware analysis and variant detection
Digital forensics and file comparison
Identifying near-duplicate files
Detecting code variants in security research
File integrity monitoring with tolerance

Installation

Linux/Debian-based

sudo apt-get update
sudo apt-get install ssdeep

macOS

brew install ssdeep

Windows

Download the Windows binary from the ssdeep SourceForge page, or install it through a package manager.

Build from Source

wget https://sourceforge.net/projects/ssdeep/files/ssdeep-2.14.1/ssdeep-2.14.1.tar.gz
tar xzf ssdeep-2.14.1.tar.gz
cd ssdeep-2.14.1
./configure
make
sudo make install

Basic Commands

Command	Purpose
`ssdeep file.bin`	Calculate fuzzy hash of a single file
`ssdeep -r directory/`	Recursively hash all files in directory
`ssdeep -m hashfile.txt file.bin`	Compare file against known hashes
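`ssdeep -s -m hashfile.txt file.bin`	Match against known hashes with error messages suppressed (`-s` is silent mode)
`ssdeep -p -r directory/`	Pretty matching mode: compare every file against every other and list all matches
`ssdeep -d -r directory/`	Directory mode: compare all files in a directory against each other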
`ssdeep -h`	Display help message
`ssdeep -V`	Show version information

Computing Fuzzy Hashes

Single File Hash

ssdeep /path/to/file.bin

Output example (a header line, then one signature per file):

ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdef+GHIJKL+mnopqr+stuvwx:1234,"/path/to/file.bin"

Recursive Directory Hash

ssdeep -r /path/to/directory/ > hashes.txt

This creates a file with all fuzzy hashes in the directory structure.

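Pretty Matching Mode (Pairwise Comparison)

ssdeep -p -r /malware/samples/ > matches.txt

The -p flag does not reformat hashes. It compares every file under the directory against every other file and lists each matching pair with its score.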

Comparing Files and Hashes

Compare Single File Against Hash Database

ssdeep -m known_hashes.txt suspicious_file.bin

Prints one line per match, ending with the similarity score (0-100) in parentheses. Nothing is printed if no known hash matches.

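Silent Mode Comparison (for scripting)

ssdeep -s -m known_hashes.txt file.bin

The -s flag suppresses error messages (for example, about unreadable files); matches are still printed. The exit status is not a reliable match indicator, so scripts should test whether the output is non-empty.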

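Detailed Match Information

ssdeep -a -m known_hashes.txt file.bin

The -a flag displays all matches regardless of score, and -t N limits output to matches scoring above N. The -d flag is directory mode (it compares the files in a directory against each other) and does not add detail to -m output.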

Batch Compare Multiple Files

for file in /path/to/files/*; do
    echo "=== $file ==="
    ssdeep -m hashes.txt "$file"
done

Creating Hash Databases

Generate Hash Database from Directory

ssdeep -r -s /path/to/samples/ > malware_db.txt

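Recursive mode hashes every file under the directory, and -s suppresses error messages for files that cannot be read. The output, including its header line, can be passed to -m later.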

Append Hashes to Existing Database

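ssdeep -r /new/samples/ | tail -n +2 >> malware_db.txt

Adds new hashes to an existing database file. tail -n +2 drops the header line that ssdeep prints first, so the header appears only once in the database.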
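When two complete database files already exist, each with its own header, they can be merged while keeping a single header. The following is a minimal sketch; `merge_dbs` is a hypothetical helper name, and it assumes the first file starts with the standard header line. The sample signatures are placeholders, not real hashes.

```shell
# Print the header from the first file once, then every signature line from all files.
merge_dbs() {
    awk 'FNR == 1 && /^ssdeep,/ { if (!seen++) print; next } { print }' "$@"
}

# Demo with two throwaway database files.
tmp=$(mktemp -d)
printf '%s\n' 'ssdeep,1.1--blocksize:hash:hash,filename' '3072:aaa:bbb,"a.bin"' > "$tmp/one.txt"
printf '%s\n' 'ssdeep,1.1--blocksize:hash:hash,filename' '1536:ccc:ddd,"b.bin"' > "$tmp/two.txt"

merge_dbs "$tmp/one.txt" "$tmp/two.txt"
```

Typical use would be `merge_dbs malware_db.txt new_hashes.txt > merged.txt`, writing to a new file rather than overwriting either input.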

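Create a Baseline and Compare Against It

ssdeep -r -s /samples/ > database.txt
ssdeep -r -m database.txt /path/to/compare/

Store baseline hashes, then match every file under another directory against them. The -r flag is required for ssdeep to descend into the directory being compared.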

Malware Analysis Workflow

Establish Known Malware Baseline

# Create hash database of known malware samples
ssdeep -r -s /known/malware/collection/ > malware_baseline.txt
tail -n +2 malware_baseline.txt | wc -l  # Count signatures, excluding the header line

Analyze Suspicious File

# Calculate hash of suspicious file
ssdeep /path/to/suspicious_sample.bin > analysis.txt

# Compare against known malware
ssdeep -m malware_baseline.txt /path/to/suspicious_sample.bin

Detect Variants

# Check if file is a variant of known malware
ssdeep -m malware_baseline.txt /suspect.exe

# Each match line ends with the score (0-100) in parentheses
# Example: /suspect.exe matches malware_baseline.txt:/known/malware/trojan.exe (98)
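When a sample matches several known files, it can help to rank the matches by score. The sketch below assumes each match line has the form `FILE matches DATABASE:KNOWN_FILE (SCORE)`; check this against the output of your ssdeep version. `rank_matches` is a hypothetical helper, and the sample lines are invented for illustration.

```shell
# Read match lines on stdin, keep those scoring at least $1, highest score first.
rank_matches() {
    local min="$1"
    awk -v min="$min" '
        # The score is the trailing "(NN)" on each line.
        match($0, /\([0-9]+\)$/) {
            score = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (score >= min) print score "\t" $0
        }' | sort -rn
}

# Demo on invented match output.
sample_output='/suspect.exe matches malware_baseline.txt:/known/malware/other.exe (33)
/suspect.exe matches malware_baseline.txt:/known/malware/trojan.exe (98)
/suspect.exe matches malware_baseline.txt:/known/malware/dropper.exe (61)'

printf '%s\n' "$sample_output" | rank_matches 50
```

In practice the input would come from a pipeline such as `ssdeep -m malware_baseline.txt /suspect.exe | rank_matches 50`.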

Advanced Usage

Exclude Files by Pattern

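# Hash all files except system files
# -exec ... + passes many files to each ssdeep run, so the header is not repeated per file
# (a very large file list may still be split across runs, each printing its own header)
find /samples -type f ! -name "*.sys" ! -name "*.dll" -exec ssdeep {} + > custom_hashes.txt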

Hash Only Specific File Types

# Hash only executable files (batched with + to avoid one header line per file)
find /samples -type f -name "*.exe" -exec ssdeep {} + > exes.txt

Compare Two Hash Databases

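# Match the signatures in database2.txt against those in database1.txt
# (-m would hash database2.txt itself as an ordinary file; -k compares signature files)
ssdeep -k database1.txt database2.txt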

Generate HTML Report

ssdeep -r /samples/ | grep -v '^ssdeep,' > hashes.txt
# ssdeep has no HTML output; feed this header-free list to third-party tools for visualization

Hash File Format

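Fuzzy hash database files have a simple format: a header line, then one signature per line.

ssdeep,1.1--blocksize:hash:hash,filename
3072:abcdef+GHIJKL+mnopqr:1234,"file1.bin"
1536:qwerty+asdfgh:5678,"file2.bin"
6144:zxcvbn+poiuyt:9012,"file3.bin"

Structure:

blocksize:hash1:hash2, where hash1 is computed at the given block size and hash2 at twice that size
Block sizes are 3 multiplied by a power of two (3, 6, 12, ..., 1536, 3072, 6144, ...)
Two signatures can be compared only if their block sizes are equal or differ by a factor of two
The similarity score (0-100) is not stored; it is computed when two signatures are compared
One signature per line, followed by a comma and the quoted filename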
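The line layout described above can be taken apart with plain shell parameter expansion. This is a sketch under the assumption that each line looks like `blocksize:hash1:hash2,"filename"`; `parse_sig` and `comparable` are hypothetical helper names, and the signature used is a placeholder rather than a real hash.

```shell
# Split one signature line into tab-separated fields: blocksize, hash1, hash2, filename.
parse_sig() {
    local line="$1"
    local sig="${line%%,*}"     # text before the first comma (hash characters never include commas)
    local name="${line#*,}"     # text after the first comma
    name="${name#\"}"           # strip a leading quote, if present
    name="${name%\"}"           # strip a trailing quote, if present
    local blocksize="${sig%%:*}"
    local rest="${sig#*:}"
    printf '%s\t%s\t%s\t%s\n' "$blocksize" "${rest%%:*}" "${rest#*:}" "$name"
}

# Succeed if two block sizes are equal or one is exactly double the other.
comparable() {
    local a="$1" b="$2"
    [ "$a" -eq "$b" ] || [ "$a" -eq $((b * 2)) ] || [ "$b" -eq $((a * 2)) ]
}

parse_sig '3072:abcdef+GHIJKL+mnopqr:1234,"file1.bin"'
comparable 3072 6144 && echo "3072 and 6144 can be compared"
comparable 3072 12288 || echo "3072 and 12288 cannot be compared"
```

Checking block sizes first is a cheap way to skip pairs that could never produce a score.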

Similarity Matching Thresholds

Interpreting Match Scores

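ssdeep reports a score from 0 to 100, where 0 means no detected similarity. It does not define interpretation bands; the ranges below are rough rules of thumb.

Score	Meaning
90-100	Very similar; likely the same file or a minor variation
75-89	Similar structure; possible variant or derivative
50-74	Moderate similarity; may share code blocks
25-49	Weak similarity; possibly common libraries
< 25	Little shared content; may be coincidental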

Setting Match Confidence

# Report only matches scoring above a chosen threshold (here 75)
ssdeep -t 75 -m baseline.txt /path/to/file.bin
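The score bands from the table above can be applied in a script once a score has been extracted. A minimal sketch follows; `score_band` is a hypothetical helper, and the band boundaries are the rule-of-thumb values from this document, not anything defined by ssdeep.

```shell
# Map a numeric score (0-100) to the rule-of-thumb label used in this document.
score_band() {
    local score="$1"
    if   [ "$score" -ge 90 ]; then echo "very similar"
    elif [ "$score" -ge 75 ]; then echo "similar"
    elif [ "$score" -ge 50 ]; then echo "moderate"
    elif [ "$score" -ge 25 ]; then echo "weak"
    else                           echo "not similar"
    fi
}

for s in 98 80 61 33 7; do
    printf '%s\t%s\n' "$s" "$(score_band "$s")"
done
```

Keeping the thresholds in one function makes it easy to tune them for a particular sample set.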

Automation and Scripting

Monitor Directory for New Malware Variants

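#!/bin/bash
BASELINE="/opt/malware_baseline.txt"
MONITOR_DIR="/quarantine"

while true; do
    for file in "$MONITOR_DIR"/*.bin; do
        if [ -f "$file" ]; then
            # The exit status does not signal a match, so test for output instead
            matches=$(ssdeep -s -m "$BASELINE" "$file")
            if [ -n "$matches" ]; then
                echo "$(date): VARIANT DETECTED - $file"
                echo "$matches"
                # Alert or quarantine
            fi
            # Delete the file so it is not re-scanned on the next pass
            rm -- "$file"
        fi
    done
    sleep 300  # Check every 5 minutes
done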

Bulk Hash and Compare

#!/bin/bash
SAMPLE_DIR="$1"
HASH_DB="$2"

for file in "$SAMPLE_DIR"/*; do
    # A non-empty result means at least one known hash matched
    result=$(ssdeep -s -m "$HASH_DB" "$file")
    if [ -n "$result" ]; then
        echo "MATCH: $file"
    fi
done

Generate Statistical Analysis

# Hash everything, then count signatures (excluding the header line)
ssdeep -r /samples/ > all_hashes.txt
total=$(tail -n +2 all_hashes.txt | wc -l)
echo "Total files hashed: $total"

# List similar pairs: -p compares every file against every other
ssdeep -r -p /samples/

Integration with Other Tools

Use with YARA for Enhanced Detection

# Generate fuzzy hashes as part of forensic investigation
ssdeep -r /evidence/drives/ > drive_hashes.txt

# Cross-reference with YARA rules for additional context
# yara expects a rules file rather than a directory; rules.yar is a placeholder name
yara -r /rules/rules.yar /evidence/ > yara_results.txt

Combine with File Carving

# After carving files from disk image
foremost -i disk.img -o carving_results/

# Hash carved files for variant detection
ssdeep -r carving_results/ > carved_hashes.txt
ssdeep -r -m known_malware.txt carving_results/ > carved_analysis.txt

Export for Analysis Platforms

# Create hash database for uploading to analysis platform
ssdeep -r -s /samples/ > upload_hashes.txt

# Document format for external tools
head -10 upload_hashes.txt

Performance Considerations

Large-Scale Hashing

# Hash thousands of files and measure how long it takes
time ssdeep -r -s /massive/directory/ > output.txt

# -s suppresses error messages; it does not change hashing speed
ssdeep -r -s /path > /dev/null  # Benchmark speed

Memory Usage

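Hashing time grows roughly with the total amount of data read, so large batches are usually limited by disk I/O
In match mode the known hashes are held in memory, so memory use grows with the size of the database
Matching cost grows with the number of files multiplied by the number of known hashes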

Optimization Tips

Tip	Benefit
Use `-s` flag	Suppresses error messages so they do not clutter the terminal
Hash to file	Avoids console bottleneck
Pre-filter files	Reduce unnecessary hashing
Use SSD storage	Faster disk I/O for large batches

Troubleshooting

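Hash Changes Between Runs

Issue: The same file appears to produce a different hash on each run.

Solution: ssdeep is deterministic, so identical bytes always give an identical signature. If the hash changes, the file contents are changing between runs. Verify: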

# Check file integrity
md5sum file.bin
ssdeep file.bin
# Run again - should be identical

Performance Issues with Large Directories

Issue: Hashing very large directories is slow.

Solution:

# Use silent mode
ssdeep -r -s /path > output.txt

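# Process in parallel using GNU Parallel; each job prints its own header, so filter them out
parallel ssdeep ::: /samples/*.bin | grep -v '^ssdeep,' > parallel_output.txt

# Or use xargs; -print0/-0 handles unusual filenames, -n 64 gives each job a batch of files
find /samples -type f -print0 | xargs -0 -P 4 -n 64 ssdeep | grep -v '^ssdeep,' > parallel_results.txt
# Add the header line back as the first line before using either file with -m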

Database Management

Issue: Hash database becomes too large.

Solution:

# Segment databases by type
ssdeep -r -s /exe_samples/ > exes.txt
ssdeep -r -s /dll_samples/ > dlls.txt

# Compare against specific database
ssdeep -m exes.txt /suspicious.exe
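Another way to keep a database small is to drop lines whose signature already appears earlier in the file, which typically happens when the same content was hashed under several names. This is a sketch rather than an ssdeep feature; `dedupe_db` is a hypothetical helper, and it keeps only the first filename recorded for each signature. The sample signatures are placeholders.

```shell
# Keep the header plus the first line seen for each distinct signature.
# The signature is everything before the first comma.
dedupe_db() {
    awk -F, 'NR == 1 && /^ssdeep,/ { print; next } !seen[$1]++' "$1"
}

# Demo: the second and third entries share a signature.
db=$(mktemp)
printf '%s\n' \
    'ssdeep,1.1--blocksize:hash:hash,filename' \
    '3072:aaa:bbb,"one.bin"' \
    '1536:ccc:ddd,"two.bin"' \
    '1536:ccc:ddd,"copy_of_two.bin"' > "$db"

dedupe_db "$db"
```

Because the discarded lines carry filenames, write the result to a new file and keep the original if the name mapping matters.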

Security Considerations

Legitimate Uses (Authorized Testing)

Malware variant detection in sandboxed environments
Forensic investigation of compromised systems
Integrity monitoring with fuzzy matching tolerance
Research and analysis within controlled labs

Warning

NEVER use ssdeep to:

Analyze files without authorization
Bypass security measures or protections
Violate intellectual property rights
Conduct unauthorized security testing

Always obtain proper authorization before analyzing files or systems.

References

Official Project: ssdeep SourceForge
CTPH Algorithm: Jesse Kornblum’s Research
Academic Paper: Context Triggered Piecewise Hashing
Documentation: man ssdeep on Linux/Unix systems

Quick Reference

# Fast hash generation
ssdeep -r /path > hashes.txt

# Compare file against database
ssdeep -m hashes.txt /file.bin

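# Batch comparison for scripts: a match produces output, so test for any output
ssdeep -s -m hashes.txt /file.bin | grep -q . && echo "MATCH"

# Pairwise comparison of all files in a directory tree
ssdeep -r -p /path

# Combine find with ssdeep (+ batches files so the header is not printed per file)
find /samples -type f -exec ssdeep {} + > all.txt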