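FlameGraph Commands
FlameGraph is Brendan Gregg's collection of scripts that generate interactive SVG flame graphs from stack trace data. A flame graph visualizes profiled software by showing which code paths consume the most resources, so performance bottlenecks can be spotted at a glance.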
Installation
Linux/Ubuntu
# Clone the repository
git clone https://github.com/brendangregg/FlameGraph.git
cd FlameGraph
# Add to PATH (optional)
export PATH="$PATH:$(pwd)"
# Verify
./flamegraph.pl --help 2>&1 | head -3
# Dependencies — Perl is required (usually pre-installed)
perl --version
Core Workflow
# The standard 3-step process:
# 1. Capture stacks (perf, bpftrace, dtrace, etc.)
# 2. Collapse/fold stacks into single lines
# 3. Generate the SVG flame graph
# Example with perf:
perf record -F 99 -a -g -- sleep 30
perf script > out.perf
./stackcollapse-perf.pl out.perf > out.folded
./flamegraph.pl out.folded > flamegraph.svg
Stack Collapsers
# Collapse perf script output
./stackcollapse-perf.pl out.perf > out.folded
# Collapse with PID annotations
./stackcollapse-perf.pl --pid out.perf > out.folded
# Collapse with thread IDs
./stackcollapse-perf.pl --tid out.perf > out.folded
# Collapse DTrace output
./stackcollapse.pl out.dtrace > out.folded
# Collapse bpftrace output
./stackcollapse-bpftrace.pl out.bpftrace > out.folded
# Collapse Java jstack output
./stackcollapse-jstack.pl out.jstack > out.folded
# Collapse Go pprof output (the text produced by `go tool pprof -raw`)
./stackcollapse-go.pl out.pprof > out.folded
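# Python: the repository does not appear to ship a cProfile collapser;
# py-spy can write folded stacks directly (see the Python workflow below)
py-spy record -f raw -o out.folded --pid 1234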
# Collapse Xcode Instruments output
./stackcollapse-instruments.pl out.instruments > out.folded
# Collapse SystemTap output
./stackcollapse-stap.pl out.stap > out.folded
# Collapse direct recursion in already-folded stacks
./stackcollapse-recursive.pl out.folded > out.norecursion.folded
Generating Flame Graphs
# Basic flame graph
./flamegraph.pl out.folded > flamegraph.svg
# Custom title
./flamegraph.pl --title "My App CPU Profile" out.folded > flamegraph.svg
# Custom subtitle
./flamegraph.pl --subtitle "Production 2026-05-21" out.folded > flamegraph.svg
# Omit frames narrower than this (pixels by default; recent versions also accept a percentage such as 0.5%)
./flamegraph.pl --minwidth 0.5 out.folded > flamegraph.svg
# Custom image width and per-frame height (both in pixels)
./flamegraph.pl --width 1400 --height 24 out.folded > flamegraph.svg
# Icicle graph (inverted layout: root at the top, stacks grow downward)
./flamegraph.pl --inverted out.folded > icicle.svg
# Custom color palette
./flamegraph.pl --color hot out.folded > flamegraph.svg
./flamegraph.pl --color mem out.folded > flamegraph.svg
./flamegraph.pl --color io out.folded > flamegraph.svg
./flamegraph.pl --color java out.folded > flamegraph.svg
# Label for the count unit shown in frame details (default: "samples")
./flamegraph.pl --countname "microseconds" out.folded > flamegraph.svg
# Custom name type
./flamegraph.pl --nametype "Function:" out.folded > flamegraph.svg
CPU Flame Graphs
# On-CPU flame graph from perf
perf record -F 99 -a -g -- sleep 30
perf script > out.perf
./stackcollapse-perf.pl out.perf > out.folded
./flamegraph.pl --title "CPU Flame Graph" out.folded > cpu.svg
# On-CPU flame graph from bpftrace (kernel stacks only; runs until Ctrl-C)
sudo bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' > out.bpftrace
./stackcollapse-bpftrace.pl out.bpftrace > out.folded
./flamegraph.pl out.folded > cpu_bpf.svg
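# Single-process CPU flame graph; --kernel tags kernel frames with _[k]
# (it does not remove them) so the java palette can color them separately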
perf record -F 99 -g --call-graph dwarf -p 1234 -- sleep 30
perf script > out.perf
./stackcollapse-perf.pl --kernel out.perf > out.folded
./flamegraph.pl --color java out.folded > user_cpu.svg
Off-CPU Flame Graphs
# Using BCC offcputime
sudo offcputime-bpfcc -f 30 > offcpu.folded
./flamegraph.pl --color=io --title="Off-CPU Time" --countname=us offcpu.folded > offcpu.svg
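# Using bpftrace for a rough off-CPU approximation (runs until Ctrl-C).
# Caveat: this measures switch-out to wakeup, and kstack at sched_wakeup is the
# stack of whatever context issued the wakeup, not of the blocked thread.
# Older bpftrace releases spell the field access args->pid instead of args.pid.
sudo bpftrace -e '
tracepoint:sched:sched_switch {
@start[tid] = nsecs;
}
tracepoint:sched:sched_wakeup /@start[args.pid]/ {
@[kstack, args.comm] = sum(nsecs - @start[args.pid]);
delete(@start[args.pid]);
}' > offcpu.bt
./stackcollapse-bpftrace.pl offcpu.bt > offcpu_bt.folded
./flamegraph.pl --color=io --title="Off-CPU Time" --countname=ns offcpu_bt.folded > offcpu_bt.svg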
Memory Flame Graphs
# Kernel memory allocation (kmalloc) flame graph from perf
# Each sample is one kmalloc call, so the count unit is calls, not bytes
perf record -e kmem:kmalloc -a -g -- sleep 10
perf script > mem.perf
./stackcollapse-perf.pl mem.perf > mem.folded
./flamegraph.pl --color=mem --title="Kernel Memory Allocations" --countname="calls" mem.folded > mem.svg
Differential Flame Graphs
# Compare two profiles using difffolded.pl
# 1. Capture baseline profile
perf record -F 99 -a -g -- sleep 30
perf script > baseline.perf
./stackcollapse-perf.pl baseline.perf > baseline.folded
# 2. Capture comparison profile (after changes)
perf record -F 99 -a -g -- sleep 30
perf script > comparison.perf
./stackcollapse-perf.pl comparison.perf > comparison.folded
# 3. Generate differential folded stacks
./difffolded.pl baseline.folded comparison.folded > diff.folded
# 4. Generate differential flame graph
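# Red = growth (regression), blue = shrinkage (improvement); frame widths come from the comparison profile
# (--negate swaps the colors and is only needed when the two input files are given in reverse order)
./flamegraph.pl diff.folded > diff_flamegraph.svg
# Normalize the baseline to the total sample count of the comparison profile
./difffolded.pl -n baseline.folded comparison.folded > diff_normalized.folded
./flamegraph.pl diff_normalized.folded > diff_norm.svg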
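To see what the differential step works with, it can help to look at the intermediate format: as far as I understand it, difffolded.pl writes one line per stack followed by two counts (before, then after). The sketch below imitates that merge with awk on two tiny made-up profiles. The file names `before.folded` and `after.folded` and the helper `diff_folded` are hypothetical, and this is not the real script: it skips normalization and assumes integer counts.

```shell
# Two tiny, made-up folded profiles (hypothetical data for illustration).
cat > before.folded <<'EOF'
main;parse 40
main;send 60
EOF
cat > after.folded <<'EOF'
main;parse 70
main;send 50
main;log 5
EOF

# Merge two folded files into "stack before_count after_count" lines.
# A stack missing from one file gets a count of 0 on that side.
diff_folded() {
  awk '
    {
      count = $NF
      stack = $0
      sub(/ [0-9]+$/, "", stack)            # strip the trailing count
      if (FILENAME == ARGV[1]) before[stack] += count
      else                     after[stack]  += count
      seen[stack] = 1
    }
    END {
      for (s in seen) print s, before[s] + 0, after[s] + 0
    }' "$1" "$2" | sort
}

diff_folded before.folded after.folded
# main;log 0 5
# main;parse 40 70
# main;send 60 50
```

Reading the output, `main;parse` grew from 40 to 70 samples, `main;send` shrank from 60 to 50, and `main;log` is new in the second profile.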
Filtering and Transformation
# Grep for specific functions in folded stacks
grep 'tcp_' out.folded | ./flamegraph.pl > tcp_only.svg
# Drop stacks that contain kernel frames (fold with stackcollapse-perf.pl --kernel first, which tags them _[k])
grep -v '_\[k\]' out.folded | ./flamegraph.pl > user_only.svg
# Filter to specific process
grep 'my_app' out.folded | ./flamegraph.pl > my_app.svg
# Combine multiple folded stack files
cat profile1.folded profile2.folded | ./flamegraph.pl > combined.svg
# Sort folded stacks for diffing
sort out.folded > sorted.folded
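Because folded stacks are plain text, filters beyond grep are easy to write. One that is often handy is re-rooting: keeping only the stacks that pass through a given function and dropping every frame above it, so the resulting graph starts at that function. The sketch below is one possible way to do this with awk. The helper name `reroot` and the sample file are hypothetical, and it assumes integer counts and frame names without semicolons.

```shell
# Small made-up folded profile (hypothetical data for illustration).
cat > sample.folded <<'EOF'
main;read_data;parse_json;validate 42
main;read_data;parse_json;transform 87
main;handle_request;send_response 156
main;handle_request;log_request 23
EOF

# reroot FRAME FILE
# Print only stacks containing FRAME, trimmed so FRAME becomes the root.
reroot() {
  awk -v frame="$1" '{
    count = $NF
    stack = $0
    sub(/ [0-9]+$/, "", stack)              # strip the trailing count
    n = split(stack, f, ";")
    for (i = 1; i <= n; i++) if (f[i] == frame) break
    if (i > n) next                          # frame not in this stack
    out = f[i]
    for (j = i + 1; j <= n; j++) out = out ";" f[j]
    print out, count
  }' "$2"
}

reroot handle_request sample.folded
# handle_request;send_response 156
# handle_request;log_request 23
```

The output is still in folded format, so it can be piped straight into the renderer, for example `reroot handle_request out.folded | ./flamegraph.pl > handle_request.svg`.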
Folded Stack Format
# Format: semicolon-separated stack frames followed by a space and count
# Bottom of stack is on the left, top (leaf) on the right
main;read_data;parse_json;validate 42
main;read_data;parse_json;transform 87
main;handle_request;send_response 156
main;handle_request;log_request 23
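To make the format concrete, the sketch below writes the four sample lines above to a scratch file and uses awk to total the samples and tally them by leaf frame (the rightmost frame, which is where the time was actually spent). The file name `sample.folded` and the helper names are assumptions for illustration, and integer counts are assumed.

```shell
# Write the sample folded stacks to a scratch file (hypothetical name).
cat > sample.folded <<'EOF'
main;read_data;parse_json;validate 42
main;read_data;parse_json;transform 87
main;handle_request;send_response 156
main;handle_request;log_request 23
EOF

# Total sample count: the count is always the last space-separated field.
total_samples() {
  awk '{ total += $NF } END { print total + 0 }' "$1"
}

# Samples per leaf frame, largest first.
leaf_counts() {
  awk '{
    count = $NF
    stack = $0
    sub(/ [0-9]+$/, "", stack)              # strip the trailing count
    n = split(stack, frames, ";")
    leaf[frames[n]] += count
  } END {
    for (name in leaf) print leaf[name], name
  }' "$1" | sort -rn
}

total_samples sample.folded
# 308
leaf_counts sample.folded
# 156 send_response
# 87 transform
# 42 validate
# 23 log_request
```

In the rendered graph, `send_response` would therefore be the widest top-level box, covering 156 of 308 samples.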
Interactive SVG Features
The generated SVG files include built-in interactive features:
| Feature | Description |
|---|---|
| Hover | Shows function name, sample count, and percentage |
| Click | Zooms into a specific frame and its children |
| Ctrl+F / Search | Highlights matching frames with magenta |
| Reset Zoom | Click "Reset Zoom" in the top-left corner |
| Right-click | Opens browser context menu for saving |
Language-Specific Workflows
Java
# Using async-profiler (recommended for Java); jps_pid is a placeholder for the JVM's PID, e.g. as listed by jps
./asprof -d 30 -f out.html jps_pid
# Or generate folded stacks
./asprof -d 30 -o collapsed -f out.folded jps_pid
./flamegraph.pl out.folded > java_cpu.svg
Python
# Using py-spy to generate folded stacks
py-spy record -f raw -o out.folded --pid 1234
./flamegraph.pl out.folded > python_cpu.svg
Node.js
# Using 0x (a flame graph profiler for Node.js)
npx 0x my_app.js
# Or perf with --perf-basic-prof
node --perf-basic-prof my_app.js &
perf record -F 99 -p $! -g -- sleep 30
perf script > out.perf
./stackcollapse-perf.pl out.perf > out.folded
./flamegraph.pl out.folded > node_cpu.svg
Tips
# Increase sample rate for short workloads
perf record -F 999 -g -- ./short_task
# Use DWARF unwinding for accurate user-space stacks
perf record -g --call-graph dwarf -F 99 -p 1234 -- sleep 30
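# Check for unwind sections (.eh_frame / .debug_frame) used by --call-graph dwarf;
# this does not reveal whether the binary was built with frame pointers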
readelf -S /usr/bin/myapp | grep -i frame
# Pipe everything in one line
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flamegraph.svg