Semgrep Static Analysis Tool Cheat Sheet¶
Overview¶
Semgrep is a fast, open-source static analysis tool for finding bugs, security vulnerabilities, and coding-standard violations across many programming languages. It uses pattern-based analysis with a simple, intuitive syntax that lets developers write custom rules easily. Semgrep is particularly valuable in DevSecOps pipelines for its speed, precision, and extensive rule library covering security, correctness, and performance issues.
Note: Semgrep is designed for pattern-based static analysis and may require custom rules for organization-specific security requirements. It should be integrated into CI/CD pipelines for continuous security monitoring.
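As a rough analogy for what "pattern-based" means here, the sketch below uses Python's standard `ast` module to find calls to a named function in parsed code rather than in raw text. It is not how Semgrep is implemented; the helper `find_calls` and the `SAMPLE` snippet are made up for illustration.

```python
import ast

# Hypothetical sample source; the string on line 3 mentions eval but never calls it.
SAMPLE = '''
x = eval(user_input)
note = "eval(this is only a string)"
y = int("42")
'''


def find_calls(source: str, func_name: str) -> list[int]:
    """Return the line numbers where func_name(...) is called by name."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
        ):
            lines.append(node.lineno)
    return sorted(lines)


print(find_calls(SAMPLE, "eval"))
```

A plain text search for `eval(` would also hit the string literal on line 3, whereas matching on the syntax tree reports only the real call.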
Installation¶
Using pip (Recommended)¶
# Install Semgrep
pip install semgrep
# Install with specific version
pip install semgrep==1.45.0
# Install from source
pip install git+https://github.com/returntocorp/semgrep.git
# Verify installation
semgrep --version
Using Homebrew (macOS)¶
# Install Semgrep with Homebrew
brew install semgrep
Using Docker¶
# Pull Semgrep image
docker pull returntocorp/semgrep
# Run Semgrep in container (the image has no default entrypoint, so name the command)
docker run --rm -v "$(pwd):/src" returntocorp/semgrep semgrep --config=auto /src
# Create alias for convenience
alias semgrep='docker run --rm -v "$(pwd):/src" returntocorp/semgrep semgrep'
# Build custom image
cat > Dockerfile << 'EOF'
FROM returntocorp/semgrep
WORKDIR /src
ENTRYPOINT ["semgrep"]
EOF
docker build -t custom-semgrep .
Package Managers¶
# Ubuntu/Debian (via pip)
sudo apt update
sudo apt install python3-pip
pip3 install semgrep
# CentOS/RHEL/Fedora
sudo dnf install python3-pip
pip3 install semgrep
# Arch Linux
sudo pacman -S python-pip
pip install semgrep
Binary Installation¶
# Download binary (Linux)
curl -L https://github.com/returntocorp/semgrep/releases/latest/download/semgrep-linux-x86_64 -o semgrep
chmod +x semgrep
sudo mv semgrep /usr/local/bin/
# Download binary (macOS)
curl -L https://github.com/returntocorp/semgrep/releases/latest/download/semgrep-macos-x86_64 -o semgrep
chmod +x semgrep
sudo mv semgrep /usr/local/bin/
Basic Usage¶
Quick Start¶
# Scan with auto-configuration (recommended for beginners)
semgrep --config=auto .
# Scan specific directory
semgrep --config=auto /path/to/project
# Scan single file
semgrep --config=auto file.py
# Scan with specific ruleset
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/cwe-top-25 .
# Scan with multiple rulesets
semgrep --config=p/security-audit --config=p/owasp-top-ten .
Output Formats¶
# Default text output
semgrep --config=auto .
# JSON output
semgrep --config=auto --json .
# SARIF output (for GitHub integration)
semgrep --config=auto --sarif .
# JUnit XML output
semgrep --config=auto --junit-xml .
# Emacs output format
semgrep --config=auto --emacs .
# Vim output format
semgrep --config=auto --vim .
# Save output to file
semgrep --config=auto --json --output=results.json .
semgrep --config=auto --sarif --output=results.sarif .
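Once results are saved as JSON, they can be post-processed with a few lines of Python. The sketch below counts findings per severity. The `SAMPLE_OUTPUT` document is fabricated for illustration and contains only the fields the code reads (`results`, `check_id`, `path`, `start.line`, `extra.severity`); real output carries many more.

```python
import json
from collections import Counter

# Fabricated stand-in for the contents of results.json.
SAMPLE_OUTPUT = json.dumps({
    "results": [
        {"check_id": "eval-usage", "path": "app.py", "start": {"line": 10},
         "extra": {"severity": "ERROR", "message": "Avoid using eval()"}},
        {"check_id": "unsafe-yaml-load", "path": "cfg.py", "start": {"line": 3},
         "extra": {"severity": "WARNING", "message": "Use yaml.safe_load()"}},
        {"check_id": "eval-usage", "path": "util.py", "start": {"line": 42},
         "extra": {"severity": "ERROR", "message": "Avoid using eval()"}},
    ],
    "errors": [],
})


def summarize(raw_json: str) -> Counter:
    """Count findings per severity in a Semgrep JSON document."""
    data = json.loads(raw_json)
    return Counter(
        finding.get("extra", {}).get("severity", "UNKNOWN")
        for finding in data.get("results", [])
    )


summary = summarize(SAMPLE_OUTPUT)
for severity, count in sorted(summary.items()):
    print(f"{severity}: {count}")
```

To use it on a real scan, read the file written by `--output=results.json` and pass its text to `summarize`.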
Filtering and Targeting¶
# Include specific file patterns
semgrep --config=auto --include="*.py" .
semgrep --config=auto --include="*.js" --include="*.ts" .
# Exclude specific file patterns
semgrep --config=auto --exclude="*test*" .
semgrep --config=auto --exclude="node_modules" --exclude="vendor" .
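# Scan specific languages (--lang is paired with an inline pattern via -e/--pattern)
semgrep --lang=python --pattern='eval(...)' .
semgrep --lang=javascript --pattern='eval(...)' .
semgrep --lang=java --pattern='Runtime.getRuntime().exec(...)' .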
# Severity filtering
semgrep --config=auto --severity=ERROR .
semgrep --config=auto --severity=WARNING .
semgrep --config=auto --severity=INFO .
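When the same filters are reused across projects, it can help to assemble the command line programmatically. The sketch below builds an argument list from the flags shown above. `build_semgrep_args` is a hypothetical helper, and it only constructs the command; it does not run Semgrep.

```python
import shlex


def build_semgrep_args(config, target=".", include=(), exclude=(), severity=()):
    """Build a semgrep argument list from filter settings (does not execute it)."""
    args = ["semgrep", f"--config={config}"]
    args += [f"--include={pattern}" for pattern in include]
    args += [f"--exclude={pattern}" for pattern in exclude]
    args += [f"--severity={level}" for level in severity]
    args.append(target)
    return args


args = build_semgrep_args(
    "p/security-audit",
    include=["*.py"],
    exclude=["*test*", "vendor"],
    severity=["ERROR"],
)
# shlex.join quotes each argument so the printed line is safe to paste into a shell.
print(shlex.join(args))
```

Passing the list directly to `subprocess.run(args)` avoids shell quoting issues with glob patterns such as `*.py`.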
Rule Configuration¶
Using Built-in Rulesets¶
# Security-focused rulesets
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/cwe-top-25 .
semgrep --config=p/secrets .
# Language-specific rulesets
semgrep --config=p/python .
semgrep --config=p/javascript .
semgrep --config=p/java .
semgrep --config=p/go .
# Framework-specific rulesets
semgrep --config=p/django .
semgrep --config=p/flask .
semgrep --config=p/react .
semgrep --config=p/express .
# Code quality rulesets
semgrep --config=p/code-quality .
semgrep --config=p/performance .
semgrep --config=p/correctness .
# Browse the available rulesets in the Semgrep Rule Registry (see Resources)
Custom Rules¶
# custom-rules.yml
rules:
  - id: hardcoded-password
    pattern: password = "..."
    message: Hardcoded password detected
    languages: [python]
    severity: ERROR
  - id: sql-injection
    pattern-either:
      - pattern: cursor.execute("..." + $VAR)
      - pattern: cursor.execute(f"...{$VAR}...")
    message: Potential SQL injection vulnerability
    languages: [python]
    severity: ERROR
  - id: unsafe-yaml-load
    pattern: yaml.load($DATA)
    message: Use yaml.safe_load() instead of yaml.load()
    languages: [python]
    severity: WARNING
    fix: yaml.safe_load($DATA)
  - id: missing-csrf-protection
    # pattern and pattern-not are combined under `patterns`
    patterns:
      - pattern: |
          class $CLASS(...):
              ...
              def post(self, ...):
                  ...
      - pattern-not: |
          class $CLASS(...):
              ...
              @csrf_protect
              def post(self, ...):
                  ...
    message: POST method missing CSRF protection
    languages: [python]
    severity: ERROR
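Before handing a rule file to Semgrep, a quick structural pre-check can catch missing keys. The sketch below works on rules already loaded into Python dictionaries (for example with a YAML parser, not shown). `rule_problems` is a hypothetical helper and checks only a small subset of the schema, so `semgrep --validate` remains the authoritative check.

```python
# Keys every rule in this cheat sheet carries.
REQUIRED_KEYS = {"id", "message", "languages", "severity"}
# A search-mode rule needs at least one of these pattern operators.
PATTERN_KEYS = {"pattern", "patterns", "pattern-either", "pattern-regex"}


def rule_problems(rule: dict) -> list[str]:
    """Return human-readable problems found in one rule dictionary."""
    present = set(rule)
    problems = [f"missing required key: {key}" for key in sorted(REQUIRED_KEYS - present)]
    if rule.get("mode") == "taint":
        for key in ("pattern-sources", "pattern-sinks"):
            if key not in present:
                problems.append(f"taint rule missing {key}")
    elif not PATTERN_KEYS & present:
        problems.append("missing a pattern key (pattern, patterns, pattern-either, or pattern-regex)")
    return problems


good_rule = {
    "id": "unsafe-yaml-load",
    "pattern": "yaml.load($DATA)",
    "message": "Use yaml.safe_load() instead of yaml.load()",
    "languages": ["python"],
    "severity": "WARNING",
}
bad_rule = {"id": "incomplete-rule", "pattern": "eval($X)"}

print(rule_problems(good_rule))
print(rule_problems(bad_rule))
```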
Rule Syntax Examples¶
# Pattern matching
rules:
  - id: basic-pattern
    pattern: eval($X)
    message: Avoid using eval()
    languages: [python]
    severity: ERROR
  - id: pattern-either
    pattern-either:
      - pattern: exec($X)
      - pattern: eval($X)
    message: Avoid using exec() or eval()
    languages: [python]
    severity: ERROR
  - id: pattern-inside
    # Operators that must all hold are listed under `patterns`
    patterns:
      - pattern-inside: |
          def $FUNC(...):
              ...
      - pattern: return $X
    message: Function returns value
    languages: [python]
    severity: INFO
  - id: pattern-not
    patterns:
      - pattern: requests.get($URL, ...)
      - pattern-not: requests.get($URL, ..., verify=True, ...)
    message: HTTPS request without explicit certificate verification
    languages: [python]
    severity: WARNING
  - id: metavariable-regex
    patterns:
      - pattern: $FUNC($ARG)
      - metavariable-regex:
          metavariable: $FUNC
          regex: ^(exec|eval)$
    message: Dangerous function call
    languages: [python]
    severity: ERROR
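The `metavariable-regex` example constrains which function names `$FUNC` may bind to. The sketch below reproduces that filtering step with Python's `re` module on a hand-written list of candidate names. Semgrep's regex engine is not Python's `re`, so treat this as an approximation that holds for this simple, fully anchored pattern.

```python
import re

# Same expression as the metavariable-regex rule above.
FUNC_REGEX = re.compile(r"^(exec|eval)$")


def is_dangerous(function_name: str) -> bool:
    """Return True when the name bound to $FUNC satisfies the regex."""
    return FUNC_REGEX.search(function_name) is not None


# Hypothetical names that a pattern like $FUNC($ARG) might bind to.
candidates = ["eval", "exec", "evaluate", "my_exec", "print"]
flagged = [name for name in candidates if is_dangerous(name)]
print(flagged)
```

The `^` and `$` anchors keep names such as `evaluate` and `my_exec` from being reported, which is why the rule spells them out.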
Advanced Usage¶
Configuration Files¶
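# .semgrep.yml
# Illustrative layout: the Semgrep CLI takes these settings as --config, --include, --exclude, and --severity flags
rules:
  - rules/security
  - rules/performance
exclude:
  - "*/tests/*"
  - "*/node_modules/*"
  - "*/vendor/*"
  - "*.min.js"
include:
  - "*.py"
  - "*.js"
  - "*.java"
  - "*.go"
severity:
  - ERROR
  - WARNING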
Custom Rule Development¶
# advanced-rules.yml
rules:
  - id: jwt-hardcoded-secret
    pattern-either:
      - pattern: jwt.encode($PAYLOAD, "...", ...)
      - pattern: jwt.decode($TOKEN, "...", ...)
    message: JWT secret should not be hardcoded
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-798: Use of Hard-coded Credentials"
      owasp: "A02:2021 – Cryptographic Failures"
  - id: unsafe-deserialization
    pattern-either:
      - pattern: pickle.loads($DATA)
      - pattern: pickle.load($FILE)
      - pattern: cPickle.loads($DATA)
    message: Unsafe deserialization with pickle
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-502: Deserialization of Untrusted Data"
  - id: command-injection
    # pattern-either and pattern-not-inside are combined under `patterns`
    patterns:
      - pattern-either:
          - pattern: os.system($CMD)
          - pattern: subprocess.call($CMD, shell=True)
          - pattern: subprocess.run($CMD, shell=True)
      - pattern-not-inside: |
          $CMD = "..."
          ...
    message: Potential command injection vulnerability
    languages: [python]
    severity: ERROR
    fix-regex:
      regex: 'shell=True'
      replacement: 'shell=False'
Taint Analysis¶
# taint-rules.yml
rules:
  - id: user-input-to-sql
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
      - pattern: request.json.get(...)
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
      - pattern: db.execute($QUERY)
    message: User input flows to SQL query
    languages: [python]
    severity: ERROR
  - id: user-input-to-eval
    mode: taint
    pattern-sources:
      - pattern: input(...)
      - pattern: sys.argv[...]
    pattern-sinks:
      - pattern: eval($CODE)
      - pattern: exec($CODE)
    message: User input flows to code execution
    languages: [python]
    severity: ERROR
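The idea behind taint mode is that data from a source stays "tainted" as it is copied through variables, and a finding is reported when it reaches a sink. The toy model below shows that propagation on a hand-written list of assignments. It is a deliberately simplified sketch, not Semgrep's engine, and `SOURCES`, `SINKS`, `PROGRAM`, and `find_tainted_sinks` are all made up for illustration.

```python
SOURCES = {"request.args.get"}   # expressions that introduce untrusted data
SINKS = {"cursor.execute"}       # calls that must not receive untrusted data


def find_tainted_sinks(statements):
    """Return 1-based statement numbers where tainted data reaches a sink.

    Each statement is (target, parts): `target` is the variable assigned, or
    the sink being called, and `parts` are the names used to build its value.
    """
    tainted = set()
    hits = []
    for number, (target, parts) in enumerate(statements, start=1):
        uses_taint = any(part in SOURCES or part in tainted for part in parts)
        if target in SINKS:
            if uses_taint:
                hits.append(number)
        elif uses_taint:
            tainted.add(target)
        else:
            tainted.discard(target)  # reassigned from clean data
    return hits


PROGRAM = [
    ("name", ["request.args.get"]),          # 1: name = request.args.get("name")
    ("query", ["'SELECT ... '", "name"]),    # 2: query = "SELECT ... " + name
    ("cursor.execute", ["query"]),           # 3: cursor.execute(query)  (tainted)
    ("safe", ["'SELECT 1'"]),                # 4: safe = "SELECT 1"
    ("cursor.execute", ["safe"]),            # 5: cursor.execute(safe)   (clean)
]

print(find_tainted_sinks(PROGRAM))
```

Only statement 3 is reported, because `query` was built from `name`, which came from a source; the constant query in statement 5 is not.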
CI/CD Integration¶
GitHub Actions¶
# .github/workflows/semgrep.yml
name: Semgrep Security Scan
on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
jobs:
  semgrep:
    name: Scan
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep
        run: |
          semgrep \
            --config=auto \
            --sarif \
            --output=semgrep-results.sarif \
            .
      - name: Upload SARIF file
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep-results.sarif
        if: always()
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: semgrep-report
          path: semgrep-results.sarif
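A SARIF file can be inspected locally before it is uploaded. The sketch below counts results per `level` across all runs. `SAMPLE_SARIF` is a fabricated, heavily trimmed document that keeps only the fields the code reads (`runs`, `results`, `ruleId`, `level`); files produced by `--sarif` contain far more detail.

```python
from collections import Counter

# Fabricated, minimal SARIF-shaped document for illustration.
SAMPLE_SARIF = {
    "version": "2.1.0",
    "runs": [
        {
            "tool": {"driver": {"name": "Semgrep"}},
            "results": [
                {"ruleId": "eval-usage", "level": "error",
                 "message": {"text": "Avoid using eval()"}},
                {"ruleId": "innerHTML-xss", "level": "warning",
                 "message": {"text": "Potential XSS vulnerability"}},
                {"ruleId": "eval-usage", "level": "error",
                 "message": {"text": "Avoid using eval()"}},
            ],
        }
    ],
}


def count_levels(sarif: dict) -> Counter:
    """Count SARIF results per level across every run in the document."""
    return Counter(
        result.get("level", "unspecified")
        for run in sarif.get("runs", [])
        for result in run.get("results", [])
    )


levels = count_levels(SAMPLE_SARIF)
print(dict(levels))
```

For a real file, load it with `json.load` and pass the resulting dictionary to `count_levels`.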
GitLab CI¶
# .gitlab-ci.yml
stages:
  - security
semgrep:
  stage: security
  image: returntocorp/semgrep
  script:
    # --gitlab-sast emits the report format that GitLab's SAST artifact expects
    - semgrep --config=auto --gitlab-sast --output=semgrep-report.json .
  artifacts:
    reports:
      sast: semgrep-report.json
    paths:
      - semgrep-report.json
    expire_in: 1 week
  allow_failure: true
Jenkins Pipeline¶
// Jenkinsfile
pipeline {
    agent any
    stages {
        stage('Security Scan') {
            steps {
                script {
                    docker.image('returntocorp/semgrep').inside {
                        sh 'semgrep --config=auto --json --output=semgrep-results.json .'
                        sh 'semgrep --config=auto --sarif --output=semgrep-results.sarif .'
                    }
                }
            }
            post {
                always {
                    archiveArtifacts artifacts: 'semgrep-results.*', fingerprint: true
                    // Parse results and fail build if high severity issues found
                    script {
                        def results = readJSON file: 'semgrep-results.json'
                        def errors = results.results.findAll { it.extra.severity == 'ERROR' }
                        if (errors.size() > 0) {
                            currentBuild.result = 'FAILURE'
                            error("Found ${errors.size()} high severity security issues")
                        }
                    }
                }
            }
        }
    }
}
Azure DevOps¶
# azure-pipelines.yml
trigger:
  - main
pool:
  vmImage: 'ubuntu-latest'
container: returntocorp/semgrep
steps:
  - checkout: self
  - script: |
      semgrep --config=auto --json --output=$(Agent.TempDirectory)/semgrep-results.json .
      # PublishTestResults reads JUnit XML, so emit that format for the test tab
      semgrep --config=auto --junit-xml --output=$(Agent.TempDirectory)/semgrep-results.xml .
    displayName: 'Run Semgrep Security Scan'
  - task: PublishTestResults@2
    inputs:
      testResultsFormat: 'JUnit'
      testResultsFiles: '$(Agent.TempDirectory)/semgrep-results.xml'
      testRunTitle: 'Semgrep Security Scan'
    condition: always()
Pre-commit Hook¶
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/returntocorp/semgrep
    rev: 'v1.45.0'
    hooks:
      - id: semgrep
        args: ['--config=auto', '--error']
Language-Specific Usage¶
Python Projects¶
# Python security scan
semgrep --config=p/python --config=p/flask --config=p/django .
# Python-specific rules
semgrep --config=p/bandit .
semgrep --config=p/secrets .
# Custom Python rules
cat > python-rules.yml << 'EOF'
rules:
  - id: flask-debug-mode
    pattern: app.run(debug=True)
    message: Flask debug mode should not be enabled in production
    languages: [python]
    severity: ERROR
  - id: django-debug-setting
    pattern: DEBUG = True
    message: Django DEBUG should be False in production
    languages: [python]
    severity: ERROR
EOF
semgrep --config=python-rules.yml .
JavaScript/TypeScript Projects¶
# JavaScript security scan
semgrep --config=p/javascript --config=p/typescript .
# Framework-specific scans
semgrep --config=p/react .
semgrep --config=p/express .
semgrep --config=p/nodejs .
# Custom JavaScript rules
cat > js-rules.yml << 'EOF'
rules:
  - id: eval-usage
    pattern-either:
      - pattern: eval($X)
      - pattern: Function($X)
    message: Avoid using eval() or Function() constructor
    languages: [javascript, typescript]
    severity: ERROR
  - id: innerHTML-xss
    pattern: $EL.innerHTML = $VAR
    message: Potential XSS vulnerability with innerHTML
    languages: [javascript, typescript]
    severity: WARNING
EOF
semgrep --config=js-rules.yml .
Java Projects¶
# Java security scan
semgrep --config=p/java .
semgrep --config=p/spring .
# Custom Java rules
cat > java-rules.yml << 'EOF'
rules:
  - id: sql-injection-java
    pattern: |
      Statement $STMT = ...;
      ...
      $STMT.executeQuery($QUERY + ...);
    message: Potential SQL injection vulnerability
    languages: [java]
    severity: ERROR
  - id: hardcoded-password-java
    # The pattern and its metavariable constraint are combined under `patterns`
    patterns:
      - pattern: String $VAR = "...";
      - metavariable-regex:
          metavariable: $VAR
          regex: (?i)(password|passwd|pwd)
    message: Hardcoded password detected
    languages: [java]
    severity: ERROR
EOF
semgrep --config=java-rules.yml .
Automation and Scripting¶
Automated Security Scanner¶
#!/usr/bin/env python3
# semgrep_scanner.py
import argparse
import json
import subprocess
import sys
from pathlib import Path


class SemgrepScanner:
    def __init__(self, project_path, config='auto'):
        self.project_path = Path(project_path)
        self.config = config
        self.results = {}

    def run_scan(self, output_format='json', severity_filter=None):
        """Run Semgrep scan with specified parameters"""
        cmd = [
            'semgrep',
            '--config', self.config,
            f'--{output_format}',
            str(self.project_path)
        ]
        if severity_filter:
            cmd.extend(['--severity', severity_filter])
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)
            if output_format == 'json':
                self.results = json.loads(result.stdout) if result.stdout else {}
            else:
                self.results = result.stdout
            return result.returncode == 0
        except OSError as e:
            # Raised when the semgrep executable cannot be started
            print(f"Error running Semgrep: {e}")
            return False
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON output: {e}")
            return False

    def _count_severity(self, findings, severity):
        """Count findings with the given severity"""
        return len([f for f in findings if f.get('extra', {}).get('severity') == severity])

    def get_summary(self):
        """Get scan summary"""
        if not isinstance(self.results, dict):
            return "No results available"
        findings = self.results.get('results', [])
        summary = {
            'total_findings': len(findings),
            'error_count': self._count_severity(findings, 'ERROR'),
            'warning_count': self._count_severity(findings, 'WARNING'),
            'info_count': self._count_severity(findings, 'INFO')
        }
        return summary

    def get_findings_by_severity(self, severity='ERROR'):
        """Get findings filtered by severity"""
        if not isinstance(self.results, dict):
            return []
        findings = self.results.get('results', [])
        return [f for f in findings if f.get('extra', {}).get('severity') == severity]

    def get_findings_by_rule(self):
        """Group findings by rule ID"""
        if not isinstance(self.results, dict):
            return {}
        findings = self.results.get('results', [])
        by_rule = {}
        for finding in findings:
            rule_id = finding.get('check_id', 'unknown')
            by_rule.setdefault(rule_id, []).append(finding)
        return by_rule

    def save_results(self, output_file='semgrep_results.json'):
        """Save results to file"""
        with open(output_file, 'w') as f:
            if isinstance(self.results, dict):
                json.dump(self.results, f, indent=2)
            else:
                f.write(str(self.results))

    def generate_report(self, output_file='semgrep_report.txt'):
        """Re-run the scan and write Semgrep's default text report to a file"""
        cmd = [
            'semgrep',
            '--config', self.config,
            '--output', output_file,
            str(self.project_path)
        ]
        try:
            subprocess.run(cmd, check=True)
            return True
        except (subprocess.CalledProcessError, OSError):
            return False


def main():
    parser = argparse.ArgumentParser(description='Automated Semgrep Scanner')
    parser.add_argument('project_path', help='Path to project to scan')
    parser.add_argument('--config', default='auto', help='Semgrep configuration')
    parser.add_argument('--severity', choices=['ERROR', 'WARNING', 'INFO'],
                        help='Filter by severity level')
    parser.add_argument('--output', help='Output file for results')
    parser.add_argument('--format', default='json',
                        choices=['json', 'sarif', 'text'],
                        help='Output format')
    args = parser.parse_args()
    scanner = SemgrepScanner(args.project_path, args.config)
    print(f"Scanning {args.project_path} with config {args.config}...")
    success = scanner.run_scan(output_format=args.format, severity_filter=args.severity)
    if success:
        summary = scanner.get_summary() if args.format == 'json' else None
        if summary:
            print("Scan completed successfully!")
            print(f"Total findings: {summary['total_findings']}")
            print(f"Errors: {summary['error_count']}")
            print(f"Warnings: {summary['warning_count']}")
            print(f"Info: {summary['info_count']}")
            # Show top issues by rule
            by_rule = scanner.get_findings_by_rule()
            if by_rule:
                print("\nTop issues by rule:")
                sorted_rules = sorted(by_rule.items(), key=lambda x: len(x[1]), reverse=True)
                for rule_id, findings in sorted_rules[:5]:
                    print(f"  {rule_id}: {len(findings)} findings")
        if args.output:
            scanner.save_results(args.output)
            print(f"Results saved to {args.output}")
        # Exit with error code if high severity issues found
        if summary and summary['error_count'] > 0:
            print(f"Found {summary['error_count']} high severity issues!")
            sys.exit(1)
    else:
        print("Scan failed!")
        sys.exit(1)


if __name__ == '__main__':
    main()
Batch Processing Script¶
#!/bin/bash
# batch_semgrep_scan.sh
# Configuration
PROJECTS_DIR="/path/to/projects"
REPORTS_DIR="/path/to/reports"
CONFIG="auto"
DATE=$(date +%Y%m%d_%H%M%S)
# Create reports directory
mkdir -p "$REPORTS_DIR"
# Function to scan project
scan_project() {
    local project_path="$1"
    local project_name=$(basename "$project_path")
    local report_file="$REPORTS_DIR/${project_name}_${DATE}.json"
    local sarif_report="$REPORTS_DIR/${project_name}_${DATE}.sarif"
    echo "Scanning $project_name..."
    # Run Semgrep scan
    semgrep --config="$CONFIG" --json --output="$report_file" "$project_path"
    semgrep --config="$CONFIG" --sarif --output="$sarif_report" "$project_path"
    # Check for high severity issues
    if [ -f "$report_file" ]; then
        error_count=$(jq '[.results[] | select(.extra.severity == "ERROR")] | length' "$report_file" 2>/dev/null || echo "0")
        if [ "$error_count" -gt 0 ]; then
            echo "WARNING: $project_name has $error_count high severity issues!"
            echo "$project_name" >> "$REPORTS_DIR/high_severity_projects.txt"
        fi
    fi
    echo "Scan completed for $project_name"
}
# Find and scan all projects
find "$PROJECTS_DIR" -maxdepth 1 -type d | while read -r project_dir; do
    if [ "$project_dir" != "$PROJECTS_DIR" ]; then
        scan_project "$project_dir"
    fi
done
echo "Batch scanning completed. Reports saved to $REPORTS_DIR"
# Generate summary report
echo "=== Batch Scan Summary ===" > "$REPORTS_DIR/summary_${DATE}.txt"
echo "Scan Date: $(date)" >> "$REPORTS_DIR/summary_${DATE}.txt"
echo "Configuration: $CONFIG" >> "$REPORTS_DIR/summary_${DATE}.txt"
echo "Total projects scanned: $(find "$REPORTS_DIR" -name "*_${DATE}.json" | wc -l)" >> "$REPORTS_DIR/summary_${DATE}.txt"
if [ -f "$REPORTS_DIR/high_severity_projects.txt" ]; then
    echo "High severity projects: $(wc -l < "$REPORTS_DIR/high_severity_projects.txt")" >> "$REPORTS_DIR/summary_${DATE}.txt"
fi
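The per-project JSON reports written by the batch script can also be aggregated in Python, which avoids the `jq` dependency. The sketch below counts ERROR findings in every `*.json` file in a reports directory. The file names and contents in `_demo` are fabricated so the example runs without Semgrep installed, and `summarize_reports` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def count_errors(report_path: Path) -> int:
    """Count findings with severity ERROR in one Semgrep JSON report."""
    data = json.loads(report_path.read_text(encoding="utf-8"))
    return sum(
        1
        for finding in data.get("results", [])
        if finding.get("extra", {}).get("severity") == "ERROR"
    )


def summarize_reports(reports_dir: Path) -> dict[str, int]:
    """Map each report's file stem to its ERROR count."""
    return {path.stem: count_errors(path) for path in sorted(reports_dir.glob("*.json"))}


def _demo() -> dict[str, int]:
    """Write two fabricated reports to a temporary directory and summarize them."""
    with tempfile.TemporaryDirectory() as tmp:
        reports_dir = Path(tmp)
        api_report = {"results": [
            {"check_id": "sql-injection", "extra": {"severity": "ERROR"}},
            {"check_id": "unsafe-yaml-load", "extra": {"severity": "WARNING"}},
        ]}
        web_report = {"results": []}
        (reports_dir / "api_20240101.json").write_text(json.dumps(api_report), encoding="utf-8")
        (reports_dir / "web_20240101.json").write_text(json.dumps(web_report), encoding="utf-8")
        return summarize_reports(reports_dir)


demo_summary = _demo()
for project, errors in demo_summary.items():
    print(f"{project}: {errors} high severity finding(s)")
```

In practice, call `summarize_reports(Path("/path/to/reports"))` on the directory configured as `REPORTS_DIR`.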
Best Practices¶
Rule Management¶
# .semgrep.yml - Project configuration
rules:
  # Security rules
  - p/security-audit
  - p/owasp-top-ten
  - p/secrets
  # Language-specific rules
  - p/python
  - p/javascript
  # Custom rules
  - rules/custom-security.yml
  - rules/custom-performance.yml
exclude:
  - "*/tests/*"
  - "*/test/*"
  - "*/.venv/*"
  - "*/venv/*"
  - "*/node_modules/*"
  - "*/vendor/*"
  - "*.min.js"
  - "*.min.css"
severity:
  - ERROR
  - WARNING
Custom Rule Development¶
# rules/custom-security.yml
rules:
  - id: custom-jwt-secret
    pattern-either:
      - pattern: jwt.encode($PAYLOAD, "...", ...)
      - pattern: jwt.decode($TOKEN, "...", ...)
    message: |
      JWT secret should not be hardcoded. Use environment variables or secure configuration.
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-798"
      owasp: "A02:2021"
      confidence: HIGH
    fix-regex:
      regex: '"[^"]*"'
      replacement: 'os.environ.get("JWT_SECRET")'
      # Replace only the first string literal in the matched code
      count: 1
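The effect of a regex-based fix like the one above can be previewed with Python's `re.sub` before trusting it on real code. The sketch below applies the same expression and replacement to a hypothetical matched line, once limited to the first match and once unlimited. Semgrep does not use Python's `re`, so this is an approximation that should hold for an expression this simple.

```python
import re

FIX_REGEX = r'"[^"]*"'                        # any double-quoted string literal
REPLACEMENT = 'os.environ.get("JWT_SECRET")'

# Hypothetical line that the custom-jwt-secret rule would match.
matched = 'token = jwt.encode(payload, "s3cr3t", algorithm="HS256")'

# Limiting the substitution to the first match rewrites only the secret.
first_only = re.sub(FIX_REGEX, REPLACEMENT, matched, count=1)

# Without a limit, every string literal is rewritten, including the algorithm name.
every_match = re.sub(FIX_REGEX, REPLACEMENT, matched)

print(first_only)
print(every_match)
```

The second result corrupts the `algorithm` argument, which is why a regex fix this broad is safer with a replacement count of 1, and why autofixes are worth reviewing before they are committed.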
Performance Optimization¶
# Optimize for large codebases
semgrep --config=auto --max-target-bytes=1000000 .
# Use specific rules instead of auto
semgrep --config=p/security-audit --config=p/owasp-top-ten .
# Exclude unnecessary files
semgrep --config=auto --exclude="*/node_modules/*" --exclude="*/vendor/*" .
# Parallel processing
semgrep --config=auto --jobs=4 .
Troubleshooting¶
Common Issues¶
# Issue: Semgrep running slowly
# Solution: Exclude large directories and use specific rules
semgrep --config=p/security-audit --exclude="*/node_modules/*" .
# Issue: Too many false positives
# Solution: Use higher confidence rules and custom exclusions
semgrep --config=p/security-audit --exclude="*/tests/*" .
# Issue: Missing language support
# Solution: Check supported languages and update Semgrep
semgrep --version
pip install --upgrade semgrep
# Issue: Custom rules not working
# Solution: Validate rule syntax
semgrep --validate --config=rules/custom.yml
Debug Mode¶
# Verbose output
semgrep --config=auto --verbose .
# Debug mode
semgrep --config=auto --debug .
# Dry run (preview autofixes without modifying files)
semgrep --config=auto --autofix --dryrun .
# Run rule tests (expects annotated test files alongside the rules)
semgrep --test rules/
Resources¶
- Semgrep Official Documentation
- Semgrep GitHub Repository
- Semgrep Rule Registry
- Semgrep Community
- Custom Rules
-...
*This cheat sheet provides a comprehensive guide to using Semgrep to find security vulnerabilities and enforce coding standards. Regular rule updates and custom rule development improve security coverage.*