FlameGraph Commands

FlameGraph is a collection of scripts by Brendan Gregg that generate interactive SVG flame graphs from stack trace data. A flame graph visualizes profiled software by drawing each stack frame with a width proportional to how often it appears in the samples, so the code paths that consume the most resources, and therefore the likely bottlenecks, stand out at a glance.

Installation

Linux/Ubuntu

# Clone the repository
git clone https://github.com/brendangregg/FlameGraph.git
cd FlameGraph

# Add to PATH (optional)
export PATH="$PATH:$(pwd)"

# Verify
./flamegraph.pl --help 2>&1 | head -3

# Dependencies — Perl is required (usually pre-installed)
perl --version
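
A quick sanity check after cloning can save a confusing "command not found" later. The sketch below is only an illustration: `check_flamegraph` is a hypothetical helper name, and the list of scripts it looks for is an assumption based on the commands used in this cheat sheet.

```shell
# Report whether perl and the main FlameGraph scripts are usable.
# Usage: check_flamegraph [directory]   (defaults to the current directory)
# Returns 0 when everything is present, 1 otherwise.
check_flamegraph() {
    dir="${1:-.}"
    status=0
    # The scripts are Perl programs, so perl must be on PATH
    command -v perl >/dev/null 2>&1 || { echo "missing: perl"; status=1; }
    # Scripts used throughout this cheat sheet (assumed list)
    for script in flamegraph.pl stackcollapse-perf.pl difffolded.pl; do
        [ -x "$dir/$script" ] || { echo "missing: $dir/$script"; status=1; }
    done
    return "$status"
}
```

Run it as `check_flamegraph /path/to/FlameGraph`; it prints one line per missing item and stays silent when everything is in place.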

Core Workflow

# The standard 3-step process:
# 1. Capture stacks (perf, bpftrace, dtrace, etc.)
# 2. Collapse/fold stacks into single lines
# 3. Generate the SVG flame graph

# Example with perf:
perf record -F 99 -a -g -- sleep 30
perf script > out.perf
./stackcollapse-perf.pl out.perf > out.folded
./flamegraph.pl out.folded > flamegraph.svg

Stack Collapsers

# Collapse perf script output
./stackcollapse-perf.pl out.perf > out.folded

# Collapse with PID annotations
./stackcollapse-perf.pl --pid out.perf > out.folded

# Collapse with thread IDs
./stackcollapse-perf.pl --tid out.perf > out.folded

# Collapse DTrace output
./stackcollapse.pl out.dtrace > out.folded

# Collapse bpftrace output
./stackcollapse-bpftrace.pl out.bpftrace > out.folded

# Collapse Java jstack output
./stackcollapse-jstack.pl out.jstack > out.folded

# Collapse Go pprof output (the text produced by `go tool pprof -raw`)
./stackcollapse-go.pl out.pprof > out.folded

# Collapse Python faulthandler tracebacks
./stackcollapse-faulthandler.pl out.faulthandler > out.folded

# Collapse Xcode Instruments output
./stackcollapse-instruments.pl out.instruments > out.folded

# Collapse SystemTap output
./stackcollapse-stap.pl out.stap > out.folded

# Merge directly recursive frames in already-folded stacks
./stackcollapse-recursive.pl out.folded > out.norecursion.folded

Generating Flame Graphs

# Basic flame graph
./flamegraph.pl out.folded > flamegraph.svg

# Custom title
./flamegraph.pl --title "My App CPU Profile" out.folded > flamegraph.svg

# Custom subtitle
./flamegraph.pl --subtitle "Production 2026-05-21" out.folded > flamegraph.svg

# Omit frames narrower than this width (in pixels; recent versions also accept a % suffix)
./flamegraph.pl --minwidth 0.5 out.folded > flamegraph.svg

# Custom image width and per-frame height (both in pixels)
./flamegraph.pl --width 1400 --height 24 out.folded > flamegraph.svg

# Icicle graph: same stacks drawn top-down (use --reverse instead to reverse the stack order)
./flamegraph.pl --inverted out.folded > icicle.svg

# Custom color palette
./flamegraph.pl --color hot out.folded > flamegraph.svg
./flamegraph.pl --color mem out.folded > flamegraph.svg
./flamegraph.pl --color io out.folded > flamegraph.svg
./flamegraph.pl --color java out.folded > flamegraph.svg

# Label for the count unit shown in tooltips (default: samples)
./flamegraph.pl --countname "microseconds" out.folded > flamegraph.svg

# Label shown before each frame name (default: "Function:")
./flamegraph.pl --nametype "Function:" out.folded > flamegraph.svg

CPU Flame Graphs

# On-CPU flame graph from perf
perf record -F 99 -a -g -- sleep 30
perf script > out.perf
./stackcollapse-perf.pl out.perf > out.folded
./flamegraph.pl --title "CPU Flame Graph" out.folded > cpu.svg

# On-CPU flame graph from bpftrace (kernel stacks only; press Ctrl-C to stop and print the stacks)
sudo bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' > out.bpftrace
./stackcollapse-bpftrace.pl out.bpftrace > out.folded
./flamegraph.pl out.folded > cpu_bpf.svg

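# Per-process CPU flame graph with kernel frames annotated
# --kernel adds a _[k] suffix to kernel frames (it does not remove them), and the
# java palette colors annotated kernel frames differently from user code.
# To sample user space only, add -e cpu-clock:u to perf record.
perf record -F 99 -g --call-graph dwarf -p 1234 -- sleep 30
perf script > out.perf
./stackcollapse-perf.pl --kernel out.perf > out.folded
./flamegraph.pl --color java out.folded > user_cpu.svg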

Off-CPU Flame Graphs

# Using BCC offcputime
sudo offcputime-bpfcc -f 30 > offcpu.folded
./flamegraph.pl --color=io --title="Off-CPU Time" --countname=us offcpu.folded > offcpu.svg

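# Using bpftrace for blocked-time analysis
# Note: this sums nanoseconds from switch-out to wakeup, keyed by the waker's
# kernel stack and the woken task's name. For the blocked thread's own stack,
# use offcputime as shown above.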
sudo bpftrace -e '
tracepoint:sched:sched_switch {
    @start[tid] = nsecs;
}
tracepoint:sched:sched_wakeup /@start[args.pid]/ {
    @[kstack, args.comm] = sum(nsecs - @start[args.pid]);
    delete(@start[args.pid]);
}' > offcpu.bt
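
Off-CPU counts are durations, so the unit matters: the bpftrace script above sums nanoseconds, while the offcputime example is labeled in microseconds. Once the stacks are folded, the counts can be rescaled with a small awk filter. This is a sketch only; `scale_counts` is a hypothetical helper and is not part of FlameGraph.

```shell
# Divide every count in a folded file by a factor (e.g. 1000 for ns -> us).
# Usage: scale_counts FACTOR FILE
# Stacks whose count rounds down to zero are dropped.
scale_counts() {
    awk -v f="$1" '{
        count = int($NF / f)      # last field is the count
        sub(/ [0-9]+$/, "")       # strip the count, leaving only the stack
        if (count > 0) print $0, count
    }' "$2"
}
```

For example, `scale_counts 1000 offcpu_ns.folded > offcpu_us.folded` (both file names are placeholders) converts nanosecond totals to microseconds so that `--countname=us` is accurate.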

Memory Flame Graphs

# Kernel memory allocation (kmalloc) flame graph from perf
# Each sample is one kmalloc call, so widths show call counts, not bytes
perf record -e kmem:kmalloc -a -g -- sleep 10
perf script > mem.perf
./stackcollapse-perf.pl mem.perf > mem.folded
./flamegraph.pl --color=mem --title="Memory Allocations" --countname="calls" mem.folded > mem.svg

Differential Flame Graphs

# Compare two profiles using difffolded.pl
# 1. Capture baseline profile
perf record -F 99 -a -g -- sleep 30
perf script > baseline.perf
./stackcollapse-perf.pl baseline.perf > baseline.folded

# 2. Capture comparison profile (after changes)
perf record -F 99 -a -g -- sleep 30
perf script > comparison.perf
./stackcollapse-perf.pl comparison.perf > comparison.folded

# 3. Generate differential folded stacks
./difffolded.pl baseline.folded comparison.folded > diff.folded

# 4. Generate differential flame graph
# Red = growth (regression), blue = shrinkage (improvement)
# Add --negate only when the two files are passed to difffolded.pl in the
# opposite order; it swaps the colors so that red still means growth
./flamegraph.pl diff.folded > diff_flamegraph.svg

# Normalize to same sample count
./difffolded.pl -n baseline.folded comparison.folded > diff_normalized.folded
./flamegraph.pl diff_normalized.folded > diff_norm.svg
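
Before opening the SVG, the differential file can be triaged as text. The sketch below assumes the difffolded.pl output format of one stack per line followed by two counts (first file, then second file); `top_regressions` is a hypothetical helper name.

```shell
# List stacks by growth (second count minus first), largest increase first.
# Usage: top_regressions DIFF_FILE [N]   (N defaults to 10)
top_regressions() {
    awk '{
        delta = $NF - $(NF - 1)                # comparison count minus baseline count
        stack = $0
        sub(/ [0-9]+ [0-9]+$/, "", stack)      # strip both trailing counts
        print delta, stack
    }' "$1" | sort -k1,1nr | head -n "${2:-10}"
}
```

`top_regressions diff.folded 5` prints the five stacks whose sample count grew the most, each prefixed by the change in samples.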

Filtering and Transformation

# Grep for specific functions in folded stacks
grep 'tcp_' out.folded | ./flamegraph.pl > tcp_only.svg

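# Exclude stacks that contain kernel frames
# (fold with stackcollapse-perf.pl --kernel first so kernel frames carry a _[k] suffix;
# this drops the whole stack, including the user code that entered the kernel)
grep -v '_\[k\]' out.folded | ./flamegraph.pl > user_only.svg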

# Filter to a specific process (stackcollapse-perf.pl puts the command name first)
grep '^my_app' out.folded | ./flamegraph.pl > my_app.svg

# Combine multiple folded stack files
cat profile1.folded profile2.folded | ./flamegraph.pl > combined.svg

# Sort folded stacks for diffing
sort out.folded > sorted.folded
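
Two text-level helpers pair well with the filters above. Concatenated files can list the same stack more than once, and a filtered flame graph rescales to 100% of whatever survived the filter, so it helps to know how much of the original profile that is. The sketch below uses only awk and sort; `merge_folded` and `pattern_share` are hypothetical names, not FlameGraph scripts.

```shell
# Merge folded files: identical stacks are combined and their counts summed.
# Usage: merge_folded FILE...
merge_folded() {
    awk '{
        count = $NF
        sub(/ [0-9]+$/, "")       # strip the count, leaving only the stack
        sum[$0] += count
    } END { for (s in sum) print s, sum[s] }' "$@" | sort
}

# Print the percentage of all samples whose line matches a regex.
# Usage: pattern_share REGEX FILE
pattern_share() {
    awk -v pat="$1" '{
        total += $NF
        if ($0 ~ pat) matched += $NF
    } END { if (total > 0) printf "%.1f\n", 100 * matched / total }' "$2"
}
```

For instance, `pattern_share 'tcp_' out.folded` reports what share of all samples the `tcp_only.svg` graph represents.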

Folded Stack Format

# Format: semicolon-separated stack frames followed by a space and count
# Bottom of stack is on the left, top (leaf) on the right

main;read_data;parse_json;validate 42
main;read_data;parse_json;transform 87
main;handle_request;send_response 156
main;handle_request;log_request 23
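
Because each folded line is just a stack and a count, standard text tools can summarize a profile before it is rendered. The sketch below assumes the count is the last space-separated field, as in the sample above; `total_samples` and `leaf_totals` are hypothetical helper names.

```shell
# Sum the counts of every stack in a folded file.
# Usage: total_samples FILE
total_samples() {
    awk '{ total += $NF } END { print total + 0 }' "$1"
}

# Print "count leaf" for each leaf function (the rightmost frame), largest first.
# Usage: leaf_totals FILE
leaf_totals() {
    awk '{
        count = $NF
        sub(/ [0-9]+$/, "")            # strip the count, leaving only the stack
        n = split($0, frames, ";")     # frames[n] is the leaf frame
        leaf[frames[n]] += count
    } END { for (f in leaf) print leaf[f], f }' "$1" | sort -k1,1nr -k2
}
```

For the four sample lines above, `total_samples` prints 308 and the first line from `leaf_totals` is `156 send_response`.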

Interactive SVG Features

The generated SVG files include built-in interactivity:

Feature | Description
Hover | Shows the function name, sample count, and percentage
Click | Zooms into a frame and its children
Ctrl+F / Search | Highlights matching frames in magenta
Reset Zoom | Click "Reset Zoom" to return to the full view
Right-click | Opens the browser context menu for saving

Language-Specific Workflows

Java

# Using async-profiler (recommended for Java)
./asprof -d 30 -f out.html jps_pid
# Or generate folded stacks
./asprof -d 30 -o collapsed -f out.folded jps_pid
./flamegraph.pl out.folded > java_cpu.svg

Python

# Using py-spy to generate folded stacks
py-spy record -f raw -o out.folded --pid 1234
./flamegraph.pl out.folded > python_cpu.svg

Node.js

# Using 0x (a flame graph profiler for Node.js)
npx 0x my_app.js
# Or perf with --perf-basic-prof
node --perf-basic-prof my_app.js &
perf record -F 99 -p $! -g -- sleep 30
perf script > out.perf
./stackcollapse-perf.pl out.perf > out.folded
./flamegraph.pl out.folded > node_cpu.svg

Tips

# Increase sample rate for short workloads
perf record -F 999 -g -- ./short_task

# Use DWARF unwinding for accurate user-space stacks
perf record -g --call-graph dwarf -F 99 -p 1234 -- sleep 30

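# Check for unwind tables (.eh_frame, .debug_frame) used by DWARF unwinding
# (section names do not reveal whether the binary keeps frame pointers)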
readelf -S /usr/bin/myapp | grep -i frame

# Pipe everything in one line
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flamegraph.svg