# Semgrep Static Analysis Tool Cheat Sheet

## Overview
Semgrep is a fast, open-source static analysis tool for finding bugs and security vulnerabilities and for enforcing code standards across many programming languages. It matches code against patterns written in a syntax that resembles the target language, which makes custom rules straightforward to write. It fits well in DevSecOps pipelines because it is fast and ships with a large registry of rules covering security, correctness, and performance issues.

Note: Semgrep performs pattern-based static analysis, so organization-specific security requirements may call for custom rules. Integrate it into CI/CD pipelines for continuous security checks.
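To make "pattern-based" concrete, the sketch below uses Python's standard `ast` module to find direct calls to `eval` or `exec` in a source string while ignoring the same text inside strings and comments. It only illustrates syntax-aware matching and is not how Semgrep is implemented; the helper name `find_dangerous_calls` and the sample source are hypothetical.

```python
import ast

# Sample code to inspect. It is parsed, never executed.
SAMPLE_SOURCE = '''
note = "calling eval(x) is risky"   # string literal, not a call
# eval(user_input) in a comment is not a call either
result = eval(
    user_input
)
exec(compile(code, "<string>", "exec"))
'''


def find_dangerous_calls(source, names=("eval", "exec")):
    """Return (function_name, line_number) for each direct call to one of `names`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only real call expressions whose callee is a bare name are considered.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in names:
                hits.append((node.func.id, node.lineno))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    for name, line in find_dangerous_calls(SAMPLE_SOURCE):
        print(f"line {line}: call to {name}()")
```

A plain text search for `eval(` would also report the string literal and the comment; matching on the syntax tree reports only the two real calls, including the one spread over several lines.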
## Installation

### Using pip (recommended)

```bash
# Install Semgrep
pip install semgrep
# Install with specific version
pip install semgrep==1.45.0
# Install from source
pip install git+https://github.com/returntocorp/semgrep.git
# Verify installation
semgrep --version
```
### Using Homebrew (macOS)

```bash
# Install Semgrep
brew install semgrep
# Update Semgrep
brew upgrade semgrep
```
### Using Docker

```bash
# Pull Semgrep image
docker pull returntocorp/semgrep
# Run Semgrep in container (recent images do not set semgrep as the entrypoint,
# so the command is named explicitly)
docker run --rm -v "$(pwd):/src" returntocorp/semgrep semgrep --config=auto /src
# Create alias for convenience
alias semgrep='docker run --rm -v "$(pwd):/src" returntocorp/semgrep semgrep'
# Build custom image
cat > Dockerfile << 'EOF'
FROM returntocorp/semgrep
WORKDIR /src
ENTRYPOINT ["semgrep"]
EOF
docker build -t custom-semgrep .
```
### Package managers

```bash
# Ubuntu/Debian (via pip)
sudo apt update
sudo apt install python3-pip
pip3 install semgrep
# CentOS/RHEL/Fedora
sudo dnf install python3-pip
pip3 install semgrep
# Arch Linux
sudo pacman -S python-pip
pip install semgrep
```

### Binary installation

```bash
# Download binary (Linux)
curl -L https://github.com/returntocorp/semgrep/releases/latest/download/semgrep-linux-x86_64 -o semgrep
chmod +x semgrep
sudo mv semgrep /usr/local/bin/
# Download binary (macOS)
curl -L https://github.com/returntocorp/semgrep/releases/latest/download/semgrep-macos-x86_64 -o semgrep
chmod +x semgrep
sudo mv semgrep /usr/local/bin/
```
## Basic usage

### Quick start

```bash
# Scan with auto-configuration (recommended for beginners)
semgrep --config=auto .
# Scan specific directory
semgrep --config=auto /path/to/project
# Scan single file
semgrep --config=auto file.py
# Scan with specific ruleset
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/cwe-top-25 .
# Scan with multiple rulesets
semgrep --config=p/security-audit --config=p/owasp-top-ten .
```
### Output formats

```bash
# Default text output
semgrep --config=auto .
# JSON output
semgrep --config=auto --json .
# SARIF output (for GitHub integration)
semgrep --config=auto --sarif .
# JUnit XML output
semgrep --config=auto --junit-xml .
# Emacs output format
semgrep --config=auto --emacs .
# Vim output format
semgrep --config=auto --vim .
# Save output to file
semgrep --config=auto --json --output=results.json .
semgrep --config=auto --sarif --output=results.sarif .
```
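A saved SARIF report is plain JSON, so it can be summarized with a few lines of Python. The sketch below counts results per SARIF `level` and lists each finding as `file:line rule`. The sample document is hand-written for illustration and is not real Semgrep output; the function name `summarize_sarif` is hypothetical, and only fields defined by SARIF 2.1.0 are read.

```python
import json
from collections import Counter

# Tiny hand-written SARIF 2.1.0 document standing in for real scanner output.
SAMPLE_SARIF = json.loads("""
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "example-scanner"}},
    "results": [
      {"ruleId": "hardcoded-password", "level": "error",
       "message": {"text": "Hardcoded password detected"},
       "locations": [{"physicalLocation": {
         "artifactLocation": {"uri": "app/settings.py"},
         "region": {"startLine": 12}}}]},
      {"ruleId": "unsafe-yaml-load", "level": "warning",
       "message": {"text": "Use yaml.safe_load()"},
       "locations": [{"physicalLocation": {
         "artifactLocation": {"uri": "app/loader.py"},
         "region": {"startLine": 40}}}]}
    ]
  }]
}
""")


def summarize_sarif(sarif):
    """Return (counts per level, list of 'uri:line ruleId' strings)."""
    levels = Counter()
    lines = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # SARIF treats a missing level as "warning".
            levels[result.get("level", "warning")] += 1
            for loc in result.get("locations", []):
                phys = loc.get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri", "?")
                line = phys.get("region", {}).get("startLine", 0)
                lines.append(f"{uri}:{line} {result.get('ruleId', '?')}")
    return dict(levels), lines


if __name__ == "__main__":
    counts, findings = summarize_sarif(SAMPLE_SARIF)
    print(counts)
    print("\n".join(findings))
```

To use it on a real report, load the file written by `--output=results.sarif` with `json.load()` and pass the resulting dictionary to `summarize_sarif`.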
### Filtering and targeting

```bash
# Include specific file patterns
semgrep --config=auto --include="*.py" .
semgrep --config=auto --include="*.js" --include="*.ts" .
# Exclude specific file patterns
semgrep --config=auto --exclude="*test*" .
semgrep --config=auto --exclude="node_modules" --exclude="vendor" .
# Scan specific languages (--lang is used together with an ad-hoc pattern given via -e/--pattern)
semgrep -e 'eval(...)' --lang=python .
semgrep -e 'eval(...)' --lang=javascript .
semgrep -e 'System.exit(...)' --lang=java .
# Severity filtering
semgrep --config=auto --severity=ERROR .
semgrep --config=auto --severity=WARNING .
semgrep --config=auto --severity=INFO .
```
## Rule configuration

### Using built-in rules

```bash
# Security-focused rulesets
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/cwe-top-25 .
semgrep --config=p/secrets .
# Language-specific rulesets
semgrep --config=p/python .
semgrep --config=p/javascript .
semgrep --config=p/java .
semgrep --config=p/go .
# Framework-specific rulesets
semgrep --config=p/django .
semgrep --config=p/flask .
semgrep --config=p/react .
semgrep --config=p/express .
# Code quality rulesets
semgrep --config=p/code-quality .
semgrep --config=p/performance .
semgrep --config=p/correctness .
# Ruleset names can change over time; browse the Semgrep Registry
# for the current list before relying on one of the names above
```
### Custom rules

```yaml
# custom-rules.yml
rules:
  - id: hardcoded-password
    pattern: password = "..."
    message: Hardcoded password detected
    languages: [python]
    severity: ERROR
  - id: sql-injection
    pattern-either:
      - pattern: cursor.execute("..." + $VAR)
      - pattern: cursor.execute(f"...{$VAR}...")
    message: Potential SQL injection vulnerability
    languages: [python]
    severity: ERROR
  - id: unsafe-yaml-load
    pattern: yaml.load($DATA)
    message: Use yaml.safe_load() instead of yaml.load()
    languages: [python]
    severity: WARNING
    fix: yaml.safe_load($DATA)
  # pattern and pattern-not must be combined under a `patterns` list.
  # Illustrative only: flags post() handlers that lack an explicit
  # @csrf_protect decorator.
  - id: missing-csrf-protection
    patterns:
      - pattern: |
          class $CLASS(...):
            ...
            def post(self, ...):
              ...
      - pattern-not: |
          class $CLASS(...):
            ...
            @csrf_protect
            def post(self, ...):
              ...
    message: POST method missing CSRF protection
    languages: [python]
    severity: ERROR
```
### Rule syntax examples

```yaml
# Pattern matching
rules:
  - id: basic-pattern
    pattern: eval($X)
    message: Avoid using eval()
    languages: [python]
    severity: ERROR
  - id: pattern-either
    pattern-either:
      - pattern: exec($X)
      - pattern: eval($X)
    message: Avoid using exec() or eval()
    languages: [python]
    severity: ERROR
  # Operators that refine a match are combined under `patterns`
  - id: pattern-inside
    patterns:
      - pattern-inside: |
          def $FUNC(...):
            ...
      - pattern: return $X
    message: Function returns value
    languages: [python]
    severity: INFO
  - id: pattern-not
    patterns:
      - pattern: requests.get(...)
      - pattern-not: requests.get(..., verify=True, ...)
    message: requests.get() call without an explicit verify=True
    languages: [python]
    severity: WARNING
  - id: metavariable-regex
    patterns:
      - pattern: $FUNC($ARG)
      - metavariable-regex:
          metavariable: $FUNC
          regex: ^(exec|eval)$
    message: Dangerous function call
    languages: [python]
    severity: ERROR
```
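## Advanced usage

### Configuration files

```yaml
# .semgrep.yml
# Illustrative project layout. On the command line, rulesets are selected
# with --config, paths with --include/--exclude or a .semgrepignore file,
# and severities with --severity.
rules:
  - rules/security
  - rules/performance
exclude:
  - "*/tests/*"
  - "*/node_modules/*"
  - "*/vendor/*"
  - "*.min.js"
include:
  - "*.py"
  - "*.js"
  - "*.java"
  - "*.go"
severity:
  - ERROR
  - WARNING
```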
### Custom rule development

```yaml
# advanced-rules.yml
rules:
  - id: jwt-hardcoded-secret
    pattern-either:
      - pattern: jwt.encode($PAYLOAD, "...", ...)
      - pattern: jwt.decode($TOKEN, "...", ...)
    message: JWT secret should not be hardcoded
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-798: Use of Hard-coded Credentials"
      owasp: "A02:2021 – Cryptographic Failures"
  - id: unsafe-deserialization
    pattern-either:
      - pattern: pickle.loads($DATA)
      - pattern: pickle.load($FILE)
      - pattern: cPickle.loads($DATA)
    message: Unsafe deserialization with pickle
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-502: Deserialization of Untrusted Data"
  - id: command-injection
    patterns:
      - pattern-either:
          - pattern: os.system($CMD)
          - pattern: subprocess.call($CMD, shell=True)
          - pattern: subprocess.run($CMD, shell=True)
      # Skip calls whose command was assigned from a string literal
      - pattern-not-inside: |
          $CMD = "..."
          ...
    message: Potential command injection vulnerability
    languages: [python]
    severity: ERROR
    fix-regex:
      regex: 'shell=True'
      replacement: 'shell=False'
```
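The `unsafe-deserialization` rule exists because unpickling can run code chosen by whoever produced the bytes. The sketch below shows this with a harmless expression and contrasts it with JSON, which only rebuilds plain data. The `Payload` class is a hypothetical stand-in for attacker-controlled input.

```python
import json
import pickle


class Payload:
    """Hypothetical stand-in for data crafted by an attacker."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" and performs that call on load.
        return (eval, ("6 * 7",))


malicious_bytes = pickle.dumps(Payload())

# Loading executes the callable embedded in the byte stream.
loaded = pickle.loads(malicious_bytes)

# JSON decoding only produces dicts, lists, strings, numbers, booleans and None.
profile = json.loads('{"user": "alice", "roles": ["admin"]}')

if __name__ == "__main__":
    print("pickle.loads returned:", loaded)
    print("json.loads returned:", profile)
```

The value returned by `pickle.loads` is the result of the embedded `eval` call rather than a `Payload` instance, which is why data from untrusted sources is better exchanged in a data-only format.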
### Taint analysis

```yaml
# taint-rules.yml
rules:
  - id: user-input-to-sql
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
      - pattern: request.json.get(...)
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
      - pattern: db.execute($QUERY)
    message: User input flows to SQL query
    languages: [python]
    severity: ERROR
  - id: user-input-to-eval
    mode: taint
    pattern-sources:
      - pattern: input(...)
      - pattern: sys.argv[...]
    pattern-sinks:
      - pattern: eval($CODE)
      - pattern: exec($CODE)
    message: User input flows to code execution
    languages: [python]
    severity: ERROR
```
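The source-to-sink flow that `user-input-to-sql` describes can be reproduced with the standard `sqlite3` module. In this sketch the `name` argument stands in for a value such as `request.args.get("name")`; the function names and the table are hypothetical, and the second function shows the parameterized form that is the usual remediation.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two users for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "admin"), ("bob", "user")])
    return conn


def find_user_unsafe(conn, name):
    cursor = conn.cursor()
    # Tainted flow: user input is concatenated into the SQL text itself.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    cursor.execute(query)
    return sorted(row[0] for row in cursor.fetchall())


def find_user_safe(conn, name):
    cursor = conn.cursor()
    # Constant query text; the value travels as a bound parameter.
    cursor.execute("SELECT name FROM users WHERE name = ?", (name,))
    return sorted(row[0] for row in cursor.fetchall())


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print("unsafe:", find_user_unsafe(conn, payload))
    print("safe:  ", find_user_safe(conn, payload))
```

With the injection payload, the concatenated query returns every user, while the parameterized query treats the payload as an ordinary name and returns nothing.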
## CI/CD integration

### GitHub Actions

```yaml
# .github/workflows/semgrep.yml
name: Semgrep Security Scan
on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
jobs:
  semgrep:
    name: Scan
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep
        run: |
          semgrep \
            --config=auto \
            --sarif \
            --output=semgrep-results.sarif \
            .
      - name: Upload SARIF file
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep-results.sarif
        if: always()
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: semgrep-report
          path: semgrep-results.sarif
```
### GitLab CI

```yaml
# .gitlab-ci.yml
stages:
  - security
semgrep:
  stage: security
  image: returntocorp/semgrep
  script:
    # --gitlab-sast writes the report in the format GitLab expects for SAST artifacts
    - semgrep --config=auto --gitlab-sast --output=semgrep-report.json .
  artifacts:
    reports:
      sast: semgrep-report.json
    paths:
      - semgrep-report.json
    expire_in: 1 week
  allow_failure: true
```
### Jenkins Pipeline

```groovy
// Jenkinsfile
pipeline {
    agent any
    stages {
        stage('Security Scan') {
            steps {
                script {
                    docker.image('returntocorp/semgrep').inside {
                        sh 'semgrep --config=auto --json --output=semgrep-results.json .'
                        sh 'semgrep --config=auto --sarif --output=semgrep-results.sarif .'
                    }
                }
            }
            post {
                always {
                    archiveArtifacts artifacts: 'semgrep-results.*', fingerprint: true
                    // Parse results and fail build if high severity issues found
                    script {
                        def results = readJSON file: 'semgrep-results.json'
                        def errors = results.results.findAll { it.extra.severity == 'ERROR' }
                        if (errors.size() > 0) {
                            currentBuild.result = 'FAILURE'
                            error("Found ${errors.size()} high severity security issues")
                        }
                    }
                }
            }
        }
    }
}
```
### Azure DevOps

```yaml
# azure-pipelines.yml
trigger:
  - main
pool:
  vmImage: 'ubuntu-latest'
container: returntocorp/semgrep
steps:
  - checkout: self
  - script: |
      semgrep --config=auto --json --output=$(Agent.TempDirectory)/semgrep-results.json .
      semgrep --config=auto --junit-xml --output=$(Agent.TempDirectory)/semgrep-results.xml .
    displayName: 'Run Semgrep Security Scan'
  - task: PublishTestResults@2
    inputs:
      testResultsFormat: 'JUnit'
      # Publish the JUnit XML report; a SARIF file is not a JUnit test result
      testResultsFiles: '$(Agent.TempDirectory)/semgrep-results.xml'
      testRunTitle: 'Semgrep Security Scan'
    condition: always()
```
### Pre-commit hook

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/returntocorp/semgrep
    rev: 'v1.45.0'
    hooks:
      - id: semgrep
        args: ['--config=auto', '--error']
```
## Language-specific usage

### Python projects

```bash
# Python security scan
semgrep --config=p/python --config=p/flask --config=p/django .
# Python-specific rules
semgrep --config=p/bandit .
semgrep --config=p/secrets .
# Custom Python rules
cat > python-rules.yml << 'EOF'
rules:
  - id: flask-debug-mode
    pattern: app.run(debug=True)
    message: Flask debug mode should not be enabled in production
    languages: [python]
    severity: ERROR
  - id: django-debug-setting
    pattern: DEBUG = True
    message: Django DEBUG should be False in production
    languages: [python]
    severity: ERROR
EOF
semgrep --config=python-rules.yml .
```
### JavaScript/TypeScript projects

```bash
# JavaScript security scan
semgrep --config=p/javascript --config=p/typescript .
# Framework-specific scans
semgrep --config=p/react .
semgrep --config=p/express .
semgrep --config=p/nodejs .
# Custom JavaScript rules
cat > js-rules.yml << 'EOF'
rules:
  - id: eval-usage
    pattern-either:
      - pattern: eval($X)
      - pattern: Function($X)
    message: Avoid using eval() or Function() constructor
    languages: [javascript, typescript]
    severity: ERROR
  - id: innerHTML-xss
    pattern: $EL.innerHTML = $VAR
    message: Potential XSS vulnerability with innerHTML
    languages: [javascript, typescript]
    severity: WARNING
EOF
semgrep --config=js-rules.yml .
```
### Java projects

```bash
# Java security scan
semgrep --config=p/java .
semgrep --config=p/spring .
# Custom Java rules
cat > java-rules.yml << 'EOF'
rules:
  - id: sql-injection-java
    pattern: |
      Statement $STMT = ...;
      ...
      $STMT.executeQuery($QUERY + ...)
    message: Potential SQL injection vulnerability
    languages: [java]
    severity: ERROR
  - id: hardcoded-password-java
    # metavariable-regex refines a pattern, so both go under `patterns`
    patterns:
      - pattern: String $VAR = "...";
      - metavariable-regex:
          metavariable: $VAR
          regex: (?i)(password|passwd|pwd)
    message: Hardcoded password detected
    languages: [java]
    severity: ERROR
EOF
semgrep --config=java-rules.yml .
```
## Automation and scripting

### Automated security scanner

```python
#!/usr/bin/env python3
# semgrep_scanner.py
import subprocess
import json
import sys
import argparse
from pathlib import Path


class SemgrepScanner:
    def __init__(self, project_path, config='auto'):
        self.project_path = Path(project_path)
        self.config = config
        self.results = {}

    def run_scan(self, output_format='json', severity_filter=None):
        """Run Semgrep scan with specified parameters"""
        cmd = [
            'semgrep',
            '--config', self.config,
            f'--{output_format}',
            str(self.project_path)
        ]
        if severity_filter:
            cmd.extend(['--severity', severity_filter])
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)
            if output_format == 'json':
                self.results = json.loads(result.stdout) if result.stdout else {}
            else:
                self.results = result.stdout
            return result.returncode == 0
        except OSError as e:
            # Raised when the semgrep executable cannot be started
            print(f"Error running Semgrep: {e}")
            return False
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON output: {e}")
            return False

    def get_summary(self):
        """Get scan summary"""
        if not isinstance(self.results, dict):
            return "No results available"
        findings = self.results.get('results', [])
        summary = {
            'total_findings': len(findings),
            'error_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'ERROR']),
            'warning_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'WARNING']),
            'info_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'INFO'])
        }
        return summary

    def get_findings_by_severity(self, severity='ERROR'):
        """Get findings filtered by severity"""
        if not isinstance(self.results, dict):
            return []
        findings = self.results.get('results', [])
        return [f for f in findings if f.get('extra', {}).get('severity') == severity]

    def get_findings_by_rule(self):
        """Group findings by rule ID"""
        if not isinstance(self.results, dict):
            return {}
        findings = self.results.get('results', [])
        by_rule = {}
        for finding in findings:
            rule_id = finding.get('check_id', 'unknown')
            if rule_id not in by_rule:
                by_rule[rule_id] = []
            by_rule[rule_id].append(finding)
        return by_rule

    def save_results(self, output_file='semgrep_results.json'):
        """Save results to file"""
        if isinstance(self.results, dict):
            with open(output_file, 'w') as f:
                json.dump(self.results, f, indent=2)
        else:
            with open(output_file, 'w') as f:
                f.write(str(self.results))

    def generate_report(self, output_file='semgrep_report.txt'):
        """Write Semgrep's default text report to a file"""
        cmd = [
            'semgrep',
            '--config', self.config,
            '--output', output_file,
            str(self.project_path)
        ]
        try:
            subprocess.run(cmd, check=True)
            return True
        except (subprocess.CalledProcessError, OSError):
            return False


def main():
    parser = argparse.ArgumentParser(description='Automated Semgrep Scanner')
    parser.add_argument('project_path', help='Path to project to scan')
    parser.add_argument('--config', default='auto', help='Semgrep configuration')
    parser.add_argument('--severity', choices=['ERROR', 'WARNING', 'INFO'],
                        help='Filter by severity level')
    parser.add_argument('--output', help='Output file for results')
    parser.add_argument('--format', default='json',
                        choices=['json', 'sarif', 'text'],
                        help='Output format')
    args = parser.parse_args()
    scanner = SemgrepScanner(args.project_path, args.config)
    print(f"Scanning {args.project_path} with config {args.config}...")
    success = scanner.run_scan(output_format=args.format, severity_filter=args.severity)
    if success:
        if args.format == 'json':
            summary = scanner.get_summary()
            print("Scan completed successfully!")
            print(f"Total findings: {summary['total_findings']}")
            print(f"Errors: {summary['error_count']}")
            print(f"Warnings: {summary['warning_count']}")
            print(f"Info: {summary['info_count']}")
            # Show top issues by rule
            by_rule = scanner.get_findings_by_rule()
            if by_rule:
                print("\nTop issues by rule:")
                sorted_rules = sorted(by_rule.items(), key=lambda x: len(x[1]), reverse=True)
                for rule_id, findings in sorted_rules[:5]:
                    print(f"  {rule_id}: {len(findings)} findings")
        if args.output:
            scanner.save_results(args.output)
            print(f"Results saved to {args.output}")
        # Exit with error code if high severity issues found
        if args.format == 'json':
            summary = scanner.get_summary()
            if summary['error_count'] > 0:
                print(f"Found {summary['error_count']} high severity issues!")
                sys.exit(1)
    else:
        print("Scan failed!")
        sys.exit(1)


if __name__ == '__main__':
    main()
```
### Batch processing script

```bash
#!/bin/bash
# batch_semgrep_scan.sh
# Configuration
PROJECTS_DIR="/path/to/projects"
REPORTS_DIR="/path/to/reports"
CONFIG="auto"
DATE=$(date +%Y%m%d_%H%M%S)
# Create reports directory
mkdir -p "$REPORTS_DIR"
# Function to scan project
scan_project() {
    local project_path="$1"
    local project_name=$(basename "$project_path")
    local report_file="$REPORTS_DIR/${project_name}_${DATE}.json"
    local sarif_report="$REPORTS_DIR/${project_name}_${DATE}.sarif"
    echo "Scanning $project_name..."
    # Run Semgrep scan
    semgrep --config="$CONFIG" --json --output="$report_file" "$project_path"
    semgrep --config="$CONFIG" --sarif --output="$sarif_report" "$project_path"
    # Check for high severity issues
    if [ -f "$report_file" ]; then
        error_count=$(jq '[.results[] | select(.extra.severity == "ERROR")] | length' "$report_file" 2>/dev/null || echo "0")
        if [ "$error_count" -gt 0 ]; then
            echo "WARNING: $project_name has $error_count high severity issues!"
            echo "$project_name" >> "$REPORTS_DIR/high_severity_projects.txt"
        fi
    fi
    echo "Scan completed for $project_name"
}
# Find and scan all projects
find "$PROJECTS_DIR" -maxdepth 1 -type d | while read -r project_dir; do
    if [ "$project_dir" != "$PROJECTS_DIR" ]; then
        scan_project "$project_dir"
    fi
done
echo "Batch scanning completed. Reports saved to $REPORTS_DIR"
# Generate summary report
echo "=== Batch Scan Summary ===" > "$REPORTS_DIR/summary_${DATE}.txt"
echo "Scan Date: $(date)" >> "$REPORTS_DIR/summary_${DATE}.txt"
echo "Configuration: $CONFIG" >> "$REPORTS_DIR/summary_${DATE}.txt"
echo "Total projects scanned: $(find "$REPORTS_DIR" -name "*_${DATE}.json" | wc -l)" >> "$REPORTS_DIR/summary_${DATE}.txt"
if [ -f "$REPORTS_DIR/high_severity_projects.txt" ]; then
    echo "High severity projects: $(wc -l < "$REPORTS_DIR/high_severity_projects.txt")" >> "$REPORTS_DIR/summary_${DATE}.txt"
fi
```
## Best practices

### Rule management

```yaml
# .semgrep.yml - Project configuration
# Illustrative layout: the same choices map to --config, --exclude
# (or a .semgrepignore file), and --severity on the command line.
rules:
  # Security rules
  - p/security-audit
  - p/owasp-top-ten
  - p/secrets
  # Language-specific rules
  - p/python
  - p/javascript
  # Custom rules
  - rules/custom-security.yml
  - rules/custom-performance.yml
exclude:
  - "*/tests/*"
  - "*/test/*"
  - "*/.venv/*"
  - "*/venv/*"
  - "*/node_modules/*"
  - "*/vendor/*"
  - "*.min.js"
  - "*.min.css"
severity:
  - ERROR
  - WARNING
```
### Custom rule development

```yaml
# rules/custom-security.yml
rules:
  - id: custom-jwt-secret
    pattern-either:
      - pattern: jwt.encode($PAYLOAD, "...", ...)
      - pattern: jwt.decode($TOKEN, "...", ...)
    message: |
      JWT secret should not be hardcoded. Use environment variables or secure configuration.
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-798"
      owasp: "A02:2021"
      confidence: HIGH
    fix-regex:
      regex: '"[^"]*"'
      replacement: 'os.environ.get("JWT_SECRET")'
```
### Performance optimization

```bash
# Optimize for large codebases
semgrep --config=auto --max-target-bytes=1000000 .
# Use specific rules instead of auto
semgrep --config=p/security-audit --config=p/owasp-top-ten .
# Exclude unnecessary files
semgrep --config=auto --exclude="*/node_modules/*" --exclude="*/vendor/*" .
# Parallel processing
semgrep --config=auto --jobs=4 .
```
## Troubleshooting

### Common issues

```bash
# Issue: Semgrep running slowly
# Solution: Exclude large directories and use specific rules
semgrep --config=p/security-audit --exclude="*/node_modules/*" .
# Issue: Too many false positives
# Solution: Use higher confidence rules and custom exclusions
semgrep --config=p/security-audit --exclude="*/tests/*" .
# Issue: Missing language support
# Solution: Check supported languages and update Semgrep
semgrep --version
pip install --upgrade semgrep
# Issue: Custom rules not working
# Solution: Validate rule syntax (the rule file is passed with --config)
semgrep --validate --config=rules/custom.yml
```
### Debug mode

```bash
# Verbose output
semgrep --config=auto --verbose .
# Debug mode
semgrep --config=auto --debug .
# Dry run: show what --autofix would change without writing to files
semgrep --config=auto --autofix --dryrun .
# Run rule tests (expects annotated test files stored next to the rules)
semgrep --test rules/
```
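Rule testing with `--test` works on target files annotated with comments that say where a rule should and should not fire. The sketch below is a hypothetical test target, assumed to be saved as `rules/custom.py` next to a `rules/custom.yml` that contains the `basic-pattern` rule (`eval($X)`) from the rule syntax examples. The `# ruleid:` and `# ok:` comments are Semgrep's test annotations; the function names are made up for illustration.

```python
# rules/custom.py  (hypothetical test target for rules/custom.yml)
import ast


def parse_literal(text):
    # ok: basic-pattern
    return ast.literal_eval(text)


def parse_unsafely(text):
    # ruleid: basic-pattern
    return eval(text)


if __name__ == "__main__":
    print(parse_literal("[1, 2, 3]"))
    print(parse_unsafely("1 + 1"))
```

A `# ruleid:` comment marks the next line as an expected finding, and `# ok:` marks a line that must stay clean, so a test run reports both missed matches and unexpected ones.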
## Resources

- [Official Semgrep documentation](LINK_5)
- [Semgrep GitHub repository](LINK_5)
- [Semgrep rule registry](LINK_5)
- Semgrep community
- [Writing custom rules](LINK_5)

*This cheat sheet provides guidance on using Semgrep to find security vulnerabilities and enforce code standards. Regular rule updates and custom rule development improve security coverage.*