Semgrep Static Analysis Tool Cheat Sheet
Overview
Semgrep is a fast, open-source static analysis tool for finding bugs and security vulnerabilities and for enforcing code standards across many programming languages. It uses pattern-based analysis with a simple, intuitive syntax that makes it easy for developers to write custom rules. Semgrep is especially valuable in DevSecOps pipelines thanks to its speed, accuracy, and extensive rule library covering security, correctness, and performance issues.
Note: Semgrep is designed for pattern-based static analysis and may need custom rules to cover organization-specific security requirements. It should be integrated into CI/CD pipelines for continuous security monitoring.
Installation
Using pip (recommended)
```bash
# Install Semgrep
pip install semgrep

# Install a specific version
pip install semgrep==1.45.0

# Install from source
pip install git+https://github.com/returntocorp/semgrep.git

# Verify installation
semgrep --version
```
Using Homebrew (macOS)
```bash
# Install Semgrep
brew install semgrep

# Update Semgrep
brew upgrade semgrep
```
Using Docker
```bash
# Pull the Semgrep image
docker pull returntocorp/semgrep

# Run Semgrep in a container
docker run --rm -v "$(pwd):/src" returntocorp/semgrep --config=auto /src

# Create an alias for convenience
alias semgrep='docker run --rm -v "$(pwd):/src" returntocorp/semgrep'

# Build a custom image
cat > Dockerfile << 'EOF'
FROM returntocorp/semgrep
WORKDIR /src
ENTRYPOINT ["semgrep"]
EOF

docker build -t custom-semgrep .
```
Package Managers
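```bash
# Ubuntu/Debian (via pip)
sudo apt update
sudo apt install python3-pip
pip3 install semgrep

# CentOS/RHEL/Fedora
sudo dnf install python3-pip
pip3 install semgrep

# Arch Linux
sudo pacman -S python-pip
pip install semgrep

# Distributions that mark the system Python as externally managed (PEP 668)
# reject system-wide pip installs; use pipx or a virtual environment there.
```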
Binary Installation
```bash
# Download binary (Linux)
curl -L https://github.com/returntocorp/semgrep/releases/latest/download/semgrep-linux-x86_64 -o semgrep
chmod +x semgrep
sudo mv semgrep /usr/local/bin/

# Download binary (macOS)
curl -L https://github.com/returntocorp/semgrep/releases/latest/download/semgrep-macos-x86_64 -o semgrep
chmod +x semgrep
sudo mv semgrep /usr/local/bin/
```
Basic Usage
Quick Start
```bash
# Scan with auto-configuration (recommended for beginners)
semgrep --config=auto .

# Scan a specific directory
semgrep --config=auto /path/to/project

# Scan a single file
semgrep --config=auto file.py

# Scan with a specific ruleset
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/cwe-top-25 .

# Scan with multiple rulesets
semgrep --config=p/security-audit --config=p/owasp-top-ten .
```
Output Formats
```bash
# Default text output
semgrep --config=auto .

# JSON output
semgrep --config=auto --json .

# SARIF output (for GitHub integration)
semgrep --config=auto --sarif .

# JUnit XML output
semgrep --config=auto --junit-xml .

# Emacs output format
semgrep --config=auto --emacs .

# Vim output format
semgrep --config=auto --vim .

# Save output to a file
semgrep --config=auto --json --output=results.json .
semgrep --config=auto --sarif --output=results.sarif .
```
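The JSON format is the easiest one to post-process. Below is a minimal sketch. It assumes each entry in `results` carries the `check_id`, `path`, `start.line`, `extra.severity`, and `extra.message` fields used by Semgrep's JSON output. The sample report is invented for illustration.

```python
import json


def summarize_findings(report: dict) -> list:
    """Return one 'path:line [SEVERITY] rule: message' string per finding."""
    lines = []
    for finding in report.get("results", []):
        extra = finding.get("extra", {})
        line_no = finding.get("start", {}).get("line")
        severity = extra.get("severity", "UNKNOWN")
        message = extra.get("message", "").strip()
        lines.append(f"{finding.get('path')}:{line_no} [{severity}] {finding.get('check_id')}: {message}")
    return lines


if __name__ == "__main__":
    # Invented sample shaped like `semgrep --json` output (not real scan data)
    sample = json.loads("""
    {"results": [{"check_id": "unsafe-yaml-load", "path": "app/config.py",
                  "start": {"line": 12, "col": 5},
                  "extra": {"severity": "WARNING",
                            "message": "Use yaml.safe_load() instead of yaml.load()"}}],
     "errors": []}
    """)
    for line in summarize_findings(sample):
        print(line)
```

To run it against a real scan, load the file written by `--output=results.json` with `json.load` in place of the inline sample.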
Filtering and Targeting
```bash
# Include specific file patterns
semgrep --config=auto --include="*.py" .
semgrep --config=auto --include="*.js" --include="*.ts" .

# Exclude specific file patterns
semgrep --config=auto --exclude="test" .
semgrep --config=auto --exclude="node_modules" --exclude="vendor" .

# Scan specific languages (--lang must be combined with -e/--pattern)
semgrep --lang=python -e 'eval($X)' .
semgrep --lang=javascript -e 'eval($X)' .
semgrep --lang=java -e 'Runtime.getRuntime().exec($CMD)' .

# Severity filtering
semgrep --config=auto --severity=ERROR .
semgrep --config=auto --severity=WARNING .
semgrep --config=auto --severity=INFO .
```
Rule Configuration
Using Built-in Rules
```bash
# Security-focused rulesets
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/cwe-top-25 .
semgrep --config=p/secrets .

# Language-specific rulesets
semgrep --config=p/python .
semgrep --config=p/javascript .
semgrep --config=p/java .
semgrep --config=p/go .

# Framework-specific rulesets
semgrep --config=p/django .
semgrep --config=p/flask .
semgrep --config=p/react .
semgrep --config=p/express .

# Code quality rulesets
semgrep --config=p/code-quality .
semgrep --config=p/performance .
semgrep --config=p/correctness .

# List available rulesets
semgrep --config=p/
```
Custom Rules
```yaml
# custom-rules.yml
rules:
  - id: hardcoded-password
    pattern: password = "..."
    message: Hardcoded password detected
    languages: [python]
    severity: ERROR

  - id: sql-injection
    pattern-either:
      - pattern: cursor.execute("..." + $VAR)
      - pattern: cursor.execute(f"...{$VAR}...")
    message: Potential SQL injection vulnerability
    languages: [python]
    severity: ERROR

  - id: unsafe-yaml-load
    pattern: yaml.load($DATA)
    message: Use yaml.safe_load() instead of yaml.load()
    languages: [python]
    severity: WARNING
    fix: yaml.safe_load($DATA)

  # pattern and pattern-not must be combined under `patterns`
  - id: missing-csrf-protection
    patterns:
      - pattern: |
          class $CLASS(...):
            ...
            def post(self, ...):
              ...
      - pattern-not: |
          class $CLASS(...):
            ...
            @csrf_exempt
            def post(self, ...):
              ...
    message: POST method missing CSRF protection
    languages: [python]
    severity: ERROR
```
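The `sql-injection` rule above targets queries assembled with `+` or f-strings. Below is a minimal remediation sketch using Python's built-in `sqlite3` module; the table, the data, and the `find_user` helper are hypothetical. The query text stays constant and the value is bound as a parameter, so neither of the rule's patterns matches the call.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str) -> list:
    # Parameterized query: the driver binds `username` as data, never as SQL text
    cursor = conn.cursor()
    cursor.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('alice')")
    print(find_user(conn, "alice"))        # [(1, 'alice')]
    print(find_user(conn, "' OR '1'='1"))  # [] because the payload is matched as a literal name
```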
Rule Syntax Examples
```yaml
# Pattern matching
rules:
  - id: basic-pattern
    pattern: eval($X)
    message: Avoid using eval()
    languages: [python]
    severity: ERROR

  - id: pattern-either
    pattern-either:
      - pattern: exec($X)
      - pattern: eval($X)
    message: Avoid using exec() or eval()
    languages: [python]
    severity: ERROR

  # Operators such as pattern-inside, pattern-not, and metavariable-regex
  # are combined with pattern under `patterns`
  - id: pattern-inside
    patterns:
      - pattern-inside: |
          def $FUNC(...):
            ...
      - pattern: return $X
    message: Function returns value
    languages: [python]
    severity: INFO

  - id: pattern-not
    patterns:
      - pattern: requests.get(...)
      - pattern-not: requests.get(..., verify=True, ...)
    message: HTTPS request without certificate verification
    languages: [python]
    severity: WARNING

  - id: metavariable-regex
    patterns:
      - pattern: $FUNC($ARG)
      - metavariable-regex:
          metavariable: $FUNC
          regex: ^(exec|eval)$
    message: Dangerous function call
    languages: [python]
    severity: ERROR
```
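To see what the `metavariable-regex` rule accepts, you can exercise the same regular expression with Python's `re` module. This only illustrates the regex itself; it does not show how Semgrep binds `$FUNC`. The `is_flagged` helper is hypothetical. The `^` and `$` anchors are what keep names such as `execute` from matching.

```python
import re

# Same expression as the metavariable-regex rule above
DANGEROUS_CALL = re.compile(r"^(exec|eval)$")


def is_flagged(function_name: str) -> bool:
    """Return True if the name satisfies ^(exec|eval)$ exactly."""
    return DANGEROUS_CALL.search(function_name) is not None


if __name__ == "__main__":
    for name in ["eval", "exec", "execute", "evaluate", "my_eval"]:
        print(name, is_flagged(name))
```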
Advanced Usage
Configuration Files
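```yaml
# .semgrep.yml
# Semgrep rule files only accept a top-level `rules:` list of rule definitions.
# The keys below describe intended project settings; Semgrep applies them
# through --config, --include, --exclude, and --severity flags or a
# .semgrepignore file rather than reading them from this file.
rules:
  - rules/security
  - rules/performance

exclude:
  - "/tests/"
  - "/node_modules/"
  - "/vendor/"
  - "*.min.js"

include:
  - "*.py"
  - "*.js"
  - "*.java"
  - "*.go"

severity:
  - ERROR
  - WARNING
```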
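Semgrep also exposes these filters as the `--config`, `--include`, `--exclude`, and `--severity` command-line flags. A small helper can turn project settings like those above into an argument list for `subprocess.run`; it is hypothetical and not part of Semgrep.

```python
def build_semgrep_command(configs, includes=(), excludes=(), severities=(), target="."):
    """Assemble a semgrep argument list from project settings (hypothetical helper)."""
    cmd = ["semgrep"]
    cmd += [f"--config={c}" for c in configs]
    cmd += [f"--include={p}" for p in includes]
    cmd += [f"--exclude={p}" for p in excludes]
    cmd += [f"--severity={s}" for s in severities]
    cmd.append(target)
    return cmd


if __name__ == "__main__":
    cmd = build_semgrep_command(
        configs=["rules/security", "rules/performance"],
        includes=["*.py", "*.js"],
        excludes=["node_modules", "vendor"],
        severities=["ERROR", "WARNING"],
    )
    print(" ".join(cmd))
    # To actually scan: subprocess.run(cmd, check=False)
```

Passing a list to `subprocess.run` bypasses the shell. Glob patterns such as `*.py` therefore reach Semgrep unexpanded, with no extra quoting.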
Advanced Rules
```yaml
# advanced-rules.yml
rules:
  - id: jwt-hardcoded-secret
    pattern-either:
      - pattern: jwt.encode($PAYLOAD, "...", ...)
      - pattern: jwt.decode($TOKEN, "...", ...)
    message: JWT secret should not be hardcoded
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-798: Use of Hard-coded Credentials"
      owasp: "A02:2021 - Cryptographic Failures"

  - id: unsafe-deserialization
    pattern-either:
      - pattern: pickle.loads($DATA)
      - pattern: pickle.load($FILE)
      - pattern: cPickle.loads($DATA)
    message: Unsafe deserialization with pickle
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-502: Deserialization of Untrusted Data"

  # pattern-either and pattern-not-inside are combined under `patterns`;
  # the trailing `...` lets the exclusion cover code after the assignment
  - id: command-injection
    patterns:
      - pattern-either:
          - pattern: os.system($CMD)
          - pattern: subprocess.call($CMD, shell=True)
          - pattern: subprocess.run($CMD, shell=True)
      - pattern-not-inside: |
          $CMD = "..."
          ...
    message: Potential command injection vulnerability
    languages: [python]
    severity: ERROR
    fix-regex:
      regex: 'shell=True'
      replacement: 'shell=False'
```
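Because each rule carries `metadata`, you can group findings by CWE after a `--json` scan. This sketch assumes the metadata appears under `extra.metadata` in each result. It accepts `cwe` as either a string or a list, since rulesets differ. Rules without a CWE go under a hypothetical `unclassified` key.

```python
from collections import defaultdict


def findings_by_cwe(report: dict) -> dict:
    """Map each CWE label to the list of rule IDs that reported it."""
    grouped = defaultdict(list)
    for finding in report.get("results", []):
        metadata = finding.get("extra", {}).get("metadata", {})
        cwe = metadata.get("cwe", "unclassified")
        # Normalize a single string and a list of strings to one shape
        for entry in (cwe if isinstance(cwe, list) else [cwe]):
            grouped[entry].append(finding.get("check_id", "unknown"))
    return dict(grouped)
```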
Taint Analysis
```yaml
# taint-rules.yml
rules:
  - id: user-input-to-sql
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
      - pattern: request.json.get(...)
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
      - pattern: db.execute($QUERY)
    message: User input flows to SQL query
    languages: [python]
    severity: ERROR

  - id: user-input-to-eval
    mode: taint
    pattern-sources:
      - pattern: input(...)
      - pattern: sys.argv[...]
    pattern-sinks:
      - pattern: eval($CODE)
      - pattern: exec($CODE)
    message: User input flows to code execution
    languages: [python]
    severity: ERROR
```
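Taint mode reports a finding only when data from a source reaches a sink. The toy model below sketches that idea for straight-line code, using hypothetical `Assign`/`Call` classes with no branches or sanitizers. Taint spreads through assignments and is cleared when a variable is overwritten with clean data. It is a conceptual illustration, not Semgrep's engine.

```python
from dataclasses import dataclass, field


@dataclass
class Assign:
    """target = <expression reading the names in `reads`> (toy IR)."""
    target: str
    reads: list = field(default_factory=list)


@dataclass
class Call:
    """A call to `func` with argument names `args` (toy IR)."""
    func: str
    args: list = field(default_factory=list)


# Source and sink names loosely mirror the taint rules above
SOURCES = {"request.args.get()", "input()"}
SINKS = {"cursor.execute", "eval"}


def find_taint_flows(statements):
    """Return (sink, tainted_args) pairs found in straight-line code."""
    tainted = set()
    flows = []
    for stmt in statements:
        if isinstance(stmt, Assign):
            if any(r in SOURCES or r in tainted for r in stmt.reads):
                tainted.add(stmt.target)      # taint propagates through assignment
            else:
                tainted.discard(stmt.target)  # overwritten with clean data
        elif isinstance(stmt, Call) and stmt.func in SINKS:
            hits = [a for a in stmt.args if a in SOURCES or a in tainted]
            if hits:
                flows.append((stmt.func, hits))
    return flows


if __name__ == "__main__":
    program = [
        Assign("name", ["request.args.get()"]),
        Assign("query", ["name"]),
        Call("cursor.execute", ["query"]),  # reported
        Assign("query", ["SELECT 1"]),
        Call("cursor.execute", ["query"]),  # not reported
    ]
    print(find_taint_flows(program))  # [('cursor.execute', ['query'])]
```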
CI/CD Integration
GitHub Actions
```yaml
# .github/workflows/semgrep.yml
name: Semgrep Security Scan

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  semgrep:
    name: Scan
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    # upload-sarif needs permission to write code scanning alerts
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4

      - name: Run Semgrep
        run: |
          semgrep \
            --config=auto \
            --sarif \
            --output=semgrep-results.sarif \
            .

      - name: Upload SARIF file
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: semgrep-results.sarif

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: semgrep-report
          path: semgrep-results.sarif
```
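You can inspect the SARIF file locally after the workflow runs. Below is a minimal sketch that uses only the standard library. It assumes the standard SARIF layout (`runs[].results[].level`). As a simplification, it counts a missing `level` as `warning` and does not consult rule-level defaults.

```python
import json
import sys
from collections import Counter


def sarif_level_counts(sarif: dict) -> Counter:
    """Count results per SARIF level across all runs."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: a missing level counts as "warning";
            # the rule's defaultConfiguration is not consulted here
            counts[result.get("level", "warning")] += 1
    return counts


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "semgrep-results.sarif"
    with open(path, encoding="utf-8") as fh:
        print(dict(sarif_level_counts(json.load(fh))))
```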
GitLab CI
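```yaml
# .gitlab-ci.yml
stages:
  - security

semgrep:
  stage: security
  image: returntocorp/semgrep
  script:
    - semgrep --config=auto --json --output=semgrep-report.json .
    # GitLab's SAST report expects its own schema, which --gitlab-sast emits
    - semgrep --config=auto --gitlab-sast --output=gl-sast-report.json .
  artifacts:
    reports:
      sast: gl-sast-report.json
    paths:
      - semgrep-report.json
    expire_in: 1 week
  allow_failure: true
```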
Jenkins Pipeline
```groovy
// Jenkinsfile
pipeline {
    agent any

    stages {
        stage('Security Scan') {
            steps {
                script {
                    docker.image('returntocorp/semgrep').inside {
                        sh 'semgrep --config=auto --json --output=semgrep-results.json .'
                        sh 'semgrep --config=auto --sarif --output=semgrep-results.sarif .'
                    }
                }
            }
            post {
                always {
                    archiveArtifacts artifacts: 'semgrep-results.*', fingerprint: true

                    // Parse results and fail the build if high severity issues are found
                    // (readJSON comes from the Pipeline Utility Steps plugin)
                    script {
                        def results = readJSON file: 'semgrep-results.json'
                        def errors = results.results.findAll { it.extra.severity == 'ERROR' }
                        if (errors.size() > 0) {
                            currentBuild.result = 'FAILURE'
                            error("Found ${errors.size()} high severity security issues")
                        }
                    }
                }
            }
        }
    }
}
```
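The same gate can run in any CI system as a plain script. This sketch mirrors the Jenkins `post` step. It assumes the report was produced with `--json` and defaults to the path `semgrep-results.json`. Semgrep's own `--error` flag is a coarser alternative: it exits with 1 on any finding, regardless of severity.

```python
import json
import sys


def count_errors(report: dict) -> int:
    """Count findings whose severity is ERROR."""
    return sum(1 for r in report.get("results", []) if r.get("extra", {}).get("severity") == "ERROR")


def gate(report: dict) -> int:
    """Return a process exit code: 1 if any ERROR finding exists, else 0."""
    errors = count_errors(report)
    if errors:
        print(f"Found {errors} high severity security issues")
        return 1
    return 0


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "semgrep-results.json"
    with open(path, encoding="utf-8") as fh:
        sys.exit(gate(json.load(fh)))
```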
Azure DevOps
```yaml
# azure-pipelines.yml
trigger:
  - main

pool:
  vmImage: 'ubuntu-latest'

container: returntocorp/semgrep

steps:
  - checkout: self

  - script: |
      semgrep --config=auto --json --output=$(Agent.TempDirectory)/semgrep-results.json .
      semgrep --config=auto --sarif --output=$(Agent.TempDirectory)/semgrep-results.sarif .
      semgrep --config=auto --junit-xml --output=$(Agent.TempDirectory)/semgrep-results.xml .
    displayName: 'Run Semgrep Security Scan'

  # PublishTestResults reads JUnit XML, not SARIF
  - task: PublishTestResults@2
    inputs:
      testResultsFormat: 'JUnit'
      testResultsFiles: '$(Agent.TempDirectory)/semgrep-results.xml'
      testRunTitle: 'Semgrep Security Scan'
    condition: always()
```
Pre-commit Hook
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/returntocorp/semgrep
    rev: 'v1.45.0'
    hooks:
      - id: semgrep
        args: ['--config=auto', '--error']
```
Language-Specific Usage
Python Projects
```bash
# Python security scan
semgrep --config=p/python --config=p/flask --config=p/django .

# Python-specific rules
semgrep --config=p/bandit .
semgrep --config=p/secrets .

# Custom Python rules
cat > python-rules.yml << 'EOF'
rules:
  - id: flask-debug-mode
    pattern: app.run(..., debug=True, ...)
    message: Flask debug mode should not be enabled in production
    languages: [python]
    severity: ERROR

  - id: django-debug-setting
    pattern: DEBUG = True
    message: Django DEBUG should be False in production
    languages: [python]
    severity: ERROR
EOF

semgrep --config=python-rules.yml .
```
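A typical fix for both custom rules is to read the debug flag from the environment instead of hardcoding `True`. Once the assigned value is no longer the literal `True`, neither rule matches. Below is a minimal sketch; the `APP_DEBUG` variable name and the accepted values are assumptions.

```python
import os


def debug_enabled(env=None) -> bool:
    """Return True only if the (hypothetical) APP_DEBUG variable opts in."""
    env = os.environ if env is None else env
    return env.get("APP_DEBUG", "false").strip().lower() in {"1", "true", "yes"}


# Django settings module:
#     DEBUG = debug_enabled()
# Flask entry point:
#     app.run(debug=debug_enabled())
```

Defaulting to `false` means a missing variable in production leaves debug mode off.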
JavaScript/TypeScript Projects
```bash
# JavaScript security scan
semgrep --config=p/javascript --config=p/typescript .

# Framework-specific scans
semgrep --config=p/react .
semgrep --config=p/express .
semgrep --config=p/nodejs .

# Custom JavaScript rules
cat > js-rules.yml << 'EOF'
rules:
  - id: eval-usage
    pattern-either:
      - pattern: eval($X)
      - pattern: Function($X)
    message: Avoid using eval() or Function() constructor
    languages: [javascript, typescript]
    severity: ERROR

  - id: innerHTML-xss
    pattern: $EL.innerHTML = $VAR
    message: Potential XSS vulnerability with innerHTML
    languages: [javascript, typescript]
    severity: WARNING
EOF

semgrep --config=js-rules.yml .
```
Java Projects
```bash
# Java security scan
semgrep --config=p/java .
semgrep --config=p/spring .

# Custom Java rules
cat > java-rules.yml << 'EOF'
rules:
  - id: sql-injection-java
    pattern: |
      Statement $STMT = ...;
      ...
      $STMT.executeQuery($QUERY + ...)
    message: Potential SQL injection vulnerability
    languages: [java]
    severity: ERROR

  - id: hardcoded-password-java
    patterns:
      - pattern: String $VAR = "...";
      - metavariable-regex:
          metavariable: $VAR
          regex: '(?i)(password|passwd|pwd)'
    message: Hardcoded password detected
    languages: [java]
    severity: ERROR
EOF

semgrep --config=java-rules.yml .
```
Automation and Scripting
Automated Security Scanner
```python
#!/usr/bin/env python3
# semgrep_scanner.py
import argparse
import json
import subprocess
import sys
from pathlib import Path


class SemgrepScanner:
    def __init__(self, project_path, config='auto'):
        self.project_path = Path(project_path)
        self.config = config
        self.results = {}

    def run_scan(self, output_format='json', severity_filter=None):
        """Run Semgrep scan with specified parameters"""
        cmd = [
            'semgrep',
            '--config', self.config,
            f'--{output_format}',
            str(self.project_path)
        ]

        if severity_filter:
            cmd.extend(['--severity', severity_filter])

        try:
            # check=False: the exit code is inspected below instead of raising
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)

            if output_format == 'json':
                self.results = json.loads(result.stdout) if result.stdout else {}
            else:
                self.results = result.stdout

            return result.returncode == 0

        except FileNotFoundError as e:
            # Raised when the semgrep executable is not on PATH
            print(f"Error running Semgrep: {e}")
            return False
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON output: {e}")
            return False

    def get_summary(self):
        """Get scan summary"""
        if not isinstance(self.results, dict):
            return "No results available"

        findings = self.results.get('results', [])

        summary = {
            'total_findings': len(findings),
            'error_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'ERROR']),
            'warning_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'WARNING']),
            'info_count': len([f for f in findings if f.get('extra', {}).get('severity') == 'INFO'])
        }

        return summary

    def get_findings_by_severity(self, severity='ERROR'):
        """Get findings filtered by severity"""
        if not isinstance(self.results, dict):
            return []

        findings = self.results.get('results', [])
        return [f for f in findings if f.get('extra', {}).get('severity') == severity]

    def get_findings_by_rule(self):
        """Group findings by rule ID"""
        if not isinstance(self.results, dict):
            return {}

        findings = self.results.get('results', [])
        by_rule = {}

        for finding in findings:
            rule_id = finding.get('check_id', 'unknown')
            by_rule.setdefault(rule_id, []).append(finding)

        return by_rule

    def save_results(self, output_file='semgrep_results.json'):
        """Save results to file"""
        with open(output_file, 'w') as f:
            if isinstance(self.results, dict):
                json.dump(self.results, f, indent=2)
            else:
                f.write(str(self.results))

    def generate_report(self, output_file='semgrep_report.txt'):
        """Write Semgrep's default text output to a file (Semgrep has no HTML output format)"""
        cmd = [
            'semgrep',
            '--config', self.config,
            '--output', output_file,
            str(self.project_path)
        ]

        try:
            subprocess.run(cmd, check=True)
            return True
        except (subprocess.CalledProcessError, FileNotFoundError):
            return False


def main():
    parser = argparse.ArgumentParser(description='Automated Semgrep Scanner')
    parser.add_argument('project_path', help='Path to project to scan')
    parser.add_argument('--config', default='auto', help='Semgrep configuration')
    parser.add_argument('--severity', choices=['ERROR', 'WARNING', 'INFO'],
                        help='Filter by severity level')
    parser.add_argument('--output', help='Output file for results')
    parser.add_argument('--format', default='json', choices=['json', 'sarif', 'text'],
                        help='Output format')

    args = parser.parse_args()

    scanner = SemgrepScanner(args.project_path, args.config)

    print(f"Scanning {args.project_path} with config {args.config}...")
    success = scanner.run_scan(output_format=args.format, severity_filter=args.severity)

    if success:
        if args.format == 'json':
            summary = scanner.get_summary()
            print("Scan completed successfully!")
            print(f"Total findings: {summary['total_findings']}")
            print(f"Errors: {summary['error_count']}")
            print(f"Warnings: {summary['warning_count']}")
            print(f"Info: {summary['info_count']}")

            # Show top issues by rule
            by_rule = scanner.get_findings_by_rule()
            if by_rule:
                print("\nTop issues by rule:")
                sorted_rules = sorted(by_rule.items(), key=lambda x: len(x[1]), reverse=True)
                for rule_id, findings in sorted_rules[:5]:
                    print(f"  {rule_id}: {len(findings)} findings")

        if args.output:
            scanner.save_results(args.output)
            print(f"Results saved to {args.output}")

        # Exit with error code if high severity issues found
        if args.format == 'json':
            summary = scanner.get_summary()
            if summary['error_count'] > 0:
                print(f"Found {summary['error_count']} high severity issues!")
                sys.exit(1)
    else:
        print("Scan failed!")
        sys.exit(1)


if __name__ == '__main__':
    main()
```
Batch Processing Script
```bash
#!/bin/bash
# batch_semgrep_scan.sh

# Configuration
PROJECTS_DIR="/path/to/projects"
REPORTS_DIR="/path/to/reports"
CONFIG="auto"
DATE=$(date +%Y%m%d_%H%M%S)

# Create reports directory
mkdir -p "$REPORTS_DIR"

# Function to scan a project
scan_project() {
    local project_path="$1"
    local project_name=$(basename "$project_path")
    local report_file="$REPORTS_DIR/${project_name}_${DATE}.json"
    local sarif_report="$REPORTS_DIR/${project_name}_${DATE}.sarif"

    echo "Scanning $project_name..."

    # Run Semgrep scan
    semgrep --config="$CONFIG" --json --output="$report_file" "$project_path"
    semgrep --config="$CONFIG" --sarif --output="$sarif_report" "$project_path"

    # Check for high severity issues
    if [ -f "$report_file" ]; then
        error_count=$(jq '[.results[] | select(.extra.severity == "ERROR")] | length' "$report_file" 2>/dev/null || echo "0")

        if [ "$error_count" -gt 0 ]; then
            echo "WARNING: $project_name has $error_count high severity issues!"
            echo "$project_name" >> "$REPORTS_DIR/high_severity_projects.txt"
        fi
    fi

    echo "Scan completed for $project_name"
}

# Find and scan all projects (immediate subdirectories only)
find "$PROJECTS_DIR" -mindepth 1 -maxdepth 1 -type d | while read -r project_dir; do
    scan_project "$project_dir"
done

echo "Batch scanning completed. Reports saved to $REPORTS_DIR"

# Generate summary report
echo "=== Batch Scan Summary ===" > "$REPORTS_DIR/summary_${DATE}.txt"
echo "Scan Date: $(date)" >> "$REPORTS_DIR/summary_${DATE}.txt"
echo "Configuration: $CONFIG" >> "$REPORTS_DIR/summary_${DATE}.txt"
echo "Total projects scanned: $(find "$REPORTS_DIR" -name "*_${DATE}.json" | wc -l)" >> "$REPORTS_DIR/summary_${DATE}.txt"

if [ -f "$REPORTS_DIR/high_severity_projects.txt" ]; then
    echo "High severity projects: $(wc -l < "$REPORTS_DIR/high_severity_projects.txt")" >> "$REPORTS_DIR/summary_${DATE}.txt"
fi
```
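If `jq` is unavailable, you can aggregate the per-project JSON reports in Python instead. This sketch assumes the reports sit in one directory and are named `<project>_<date>.json`, where `<date>` uses the script's `%Y%m%d_%H%M%S` format. The `high_severity_projects` helper is hypothetical.

```python
import json
from pathlib import Path


def high_severity_projects(reports_dir: str, date_tag: str) -> dict:
    """Map project name -> ERROR count for reports named <project>_<date_tag>.json."""
    suffix = f"_{date_tag}.json"
    counts = {}
    for report in sorted(Path(reports_dir).glob(f"*{suffix}")):
        project = report.name[: -len(suffix)]
        data = json.loads(report.read_text(encoding="utf-8"))
        errors = sum(1 for r in data.get("results", []) if r.get("extra", {}).get("severity") == "ERROR")
        if errors:
            counts[project] = errors
    return counts
```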
Best Practices
Rule Management
```yaml
# .semgrep.yml - Project configuration
# Semgrep reads only rule definitions from config files; pass registry rulesets
# with --config and apply exclude/severity via CLI flags or .semgrepignore.
rules:
  # Security rules
  - p/security-audit
  - p/owasp-top-ten
  - p/secrets

  # Language-specific rules
  - p/python
  - p/javascript

  # Custom rules
  - rules/custom-security.yml
  - rules/custom-performance.yml

exclude:
  - "/tests/"
  - "/test/"
  - "/.venv/"
  - "/venv/"
  - "/node_modules/"
  - "/vendor/"
  - "*.min.js"
  - "*.min.css"

severity:
  - ERROR
  - WARNING
```
Custom Security Rules
```yaml
# rules/custom-security.yml
rules:
  - id: custom-jwt-secret
    pattern-either:
      - pattern: jwt.encode($PAYLOAD, "...", ...)
      - pattern: jwt.decode($TOKEN, "...", ...)
    message: |
      JWT secret should not be hardcoded.
      Use environment variables or secure configuration.
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-798"
      owasp: "A02:2021"
      confidence: HIGH
    fix-regex:
      regex: '"[^"]*"'
      replacement: 'os.environ.get("JWT_SECRET")'
```
Performance Optimization
```bash
# Optimize for large codebases (skip files larger than about 1 MB)
semgrep --config=auto --max-target-bytes=1000000 .

# Use specific rules instead of auto
semgrep --config=p/security-audit --config=p/owasp-top-ten .

# Exclude unnecessary files
semgrep --config=auto --exclude="/node_modules/" --exclude="/vendor/" .

# Parallel processing
semgrep --config=auto --jobs=4 .
```
Troubleshooting
Common Issues
```bash
# Issue: Semgrep running slowly
# Solution: Exclude large directories and use specific rules
semgrep --config=p/security-audit --exclude="/node_modules/" .

# Issue: Too many false positives
# Solution: Use higher-confidence rules and custom exclusions
semgrep --config=p/security-audit --exclude="/tests/" .

# Issue: Missing language support
# Solution: Check supported languages and update Semgrep
semgrep --version
pip install --upgrade semgrep

# Issue: Custom rules not working
# Solution: Validate rule syntax (--validate checks the files passed via --config)
semgrep --validate --config=rules/custom.yml
```
Debug Mode
```bash
# Verbose output
semgrep --config=auto --verbose .

# Debug mode
semgrep --config=auto --debug .

# Dry run: print autofix changes instead of writing them (only works with --autofix)
semgrep --config=auto --autofix --dryrun .

# Test a specific rule
semgrep --config=rules/custom.yml --test .
```
Resources
- Official Documentation
- Semgrep GitHub Repository
- Semgrep Rule Registry
- Semgrep Community
- Writing Custom Rules
--
*This cheat sheet provides comprehensive guidance on using Semgrep to find security vulnerabilities and enforce code standards. Regular rule updates and custom rule development improve security coverage.*