Linux Text Processing Cheat Sheet
Overview
Linux text processing tools provide powerful capabilities for manipulating, analyzing, and transforming text data. This guide covers essential tools such as grep, sed, awk, and sort, which form the foundation of command-line text processing and data analysis workflows.
⚠️ Warning: Text processing commands can modify files permanently. Always backup important files before performing bulk text operations.
File Viewing and Navigation
Basic File Display
bash
# Display entire file
cat filename
cat -n filename # With line numbers
cat -b filename # Number non-blank lines only
cat -A filename # Show all characters including non-printing
# Display multiple files
cat file1 file2 file3
# Create file with content
cat > newfile << EOF
Line 1
Line 2
EOF
Paginated Viewing
bash
# Page through file
less filename
more filename
# Less navigation:
# Space/f - next page
# b - previous page
# /pattern - search forward
# ?pattern - search backward
# n - next search result
# N - previous search result
# q - quit
# More options
less +F filename # Follow file like tail -f
less +/pattern filename # Start at first match
Partial File Display
bash
# First lines of file
head filename
head -n 20 filename # First 20 lines
head -c 100 filename # First 100 characters
# Last lines of file
tail filename
tail -n 20 filename # Last 20 lines
tail -f filename # Follow file changes
tail -F filename # Follow with retry
# Specific line ranges
sed -n '10,20p' filename # Lines 10-20
awk 'NR>=10 && NR<=20' filename
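As a quick sanity check, the three range techniques above can be compared on a generated file (the /tmp path is illustrative):

```shell
# Build a ten-line sample file containing "1" .. "10"
seq 10 > /tmp/sample.txt

# Extract lines 4-6 three equivalent ways; all print the same slice
head -n 6 /tmp/sample.txt | tail -n 3
sed -n '4,6p' /tmp/sample.txt
awk 'NR>=4 && NR<=6' /tmp/sample.txt
```

The head/tail form is handy when you already know absolute line numbers; sed and awk generalize to pattern-based ranges.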
Pattern Searching with Grep
Basic Grep Usage
bash
# Search for pattern
grep "pattern" filename
grep "pattern" file1 file2 file3
# Case-insensitive search
grep -i "pattern" filename
# Show line numbers
grep -n "pattern" filename
# Show only matching part
grep -o "pattern" filename
# Count matches
grep -c "pattern" filename
Advanced Grep Options
bash
# Recursive search
grep -r "pattern" /path/to/directory
grep -R "pattern" /path/to/directory
# Search in specific file types
grep -r --include="*.txt" "pattern" /path
grep -r --exclude="*.log" "pattern" /path
# Invert match (show non-matching lines)
grep -v "pattern" filename
# Show context around matches
grep -A 3 "pattern" filename # 3 lines after
grep -B 3 "pattern" filename # 3 lines before
grep -C 3 "pattern" filename # 3 lines before and after
# Multiple patterns
grep -E "pattern1|pattern2" filename
grep -e "pattern1" -e "pattern2" filename
Regular Expressions with Grep
bash
# Extended regular expressions
grep -E "^start.*end$" filename
grep -E "[0-9]{3}-[0-9]{3}-[0-9]{4}" filename # Phone numbers
grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" filename # Email addresses (inside brackets, | is a literal pipe, so [A-Za-z], not [A-Z|a-z])
# Perl-compatible regular expressions
grep -P "\d{3}-\d{3}-\d{4}" filename
# Word boundaries
grep -w "word" filename # Match whole word only
grep "\bword\b" filename # Same as -w
# Character classes
grep "[0-9]" filename # Any digit
grep "[a-zA-Z]" filename # Any letter
grep "[^0-9]" filename # Not a digit
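A small demonstration of character classes and -o on inline input (the sample text is made up):

```shell
# Lines containing at least one digit
printf 'order 42\nno digits here\nroom 7\n' | grep '[0-9]'

# Extract only the digit runs themselves
printf 'a1b22c333\n' | grep -oE '[0-9]+'
```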
Stream Editing with Sed
Basic Sed Operations
bash
# Substitute (replace)
sed 's/old/new/' filename # First occurrence per line
sed 's/old/new/g' filename # All occurrences
sed 's/old/new/2' filename # Second occurrence per line
# In-place editing
sed -i 's/old/new/g' filename
sed -i.bak 's/old/new/g' filename # Create backup
# Case-insensitive substitution
sed 's/old/new/gi' filename
Advanced Sed Commands
bash
# Delete lines
sed '5d' filename # Delete line 5
sed '5,10d' filename # Delete lines 5-10
sed '/pattern/d' filename # Delete lines matching pattern
# Print specific lines
sed -n '5p' filename # Print line 5 only
sed -n '5,10p' filename # Print lines 5-10
sed -n '/pattern/p' filename # Print matching lines
# Insert and append
sed '5i\New line' filename # Insert before line 5
sed '5a\New line' filename # Append after line 5
# Multiple commands
sed -e 's/old1/new1/g' -e 's/old2/new2/g' filename
sed 's/old1/new1/g; s/old2/new2/g' filename
Sed with Regular Expressions
bash
# Address ranges with patterns
sed '/start/,/end/d' filename # Delete from start to end pattern
sed '/pattern/,+5d' filename # Delete matching line and next 5 (GNU sed)
# Backreferences
sed 's/\([0-9]*\)-\([0-9]*\)/\2-\1/' filename # Swap numbers around dash
# Multiple line operations
sed 'N;s/\n/ /' filename # Join pairs of lines
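The backreference swap and line-joining idioms above behave like this on sample input (GNU sed assumed):

```shell
# Swap the two number groups around the dash
printf '123-456\n' | sed 's/\([0-9]*\)-\([0-9]*\)/\2-\1/'
# -> 456-123

# Join pairs of lines
printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /'
# -> "a b" then "c d"
```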
Text Processing with AWK
Basic AWK Usage
bash
# Print specific fields
awk '{print $1}' filename # First field
awk '{print $1, $3}' filename # First and third fields
awk '{print $NF}' filename # Last field
awk '{print $(NF-1)}' filename # Second to last field
# Field separator
awk -F: '{print $1}' /etc/passwd # Use colon as separator
awk -F',' '{print $2}' file.csv # Use comma as separator
# Print with custom formatting
awk '{printf "%-10s %s\n", $1, $2}' filename
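For example, field selection with a custom separator on passwd-style input (the user names below are invented):

```shell
printf 'alice:x:1001\nbob:x:1002\n' | awk -F: '{print $1, $3}'
# alice 1001
# bob 1002
```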
AWK Pattern Matching
bash
# Pattern matching
awk '/pattern/ {print}' filename
awk '/pattern/ {print $1}' filename
awk '$1 ~ /pattern/ {print}' filename # First field matches pattern
awk '$1 !~ /pattern/ {print}' filename # First field doesn't match
# Numeric comparisons
awk '$3 > 100 {print}' filename # Third field greater than 100
awk '$2 == "value" {print}' filename # Second field equals value
awk 'NR > 1 {print}' filename # Skip header line
AWK Programming Constructs
bash
# Variables and calculations
awk '{sum += $1} END {print sum}' filename # Sum first column
awk '{count++} END {print count}' filename # Count lines
# Conditional statements
awk '{if ($1 > 100) print "High: " $0; else print "Low: " $0}' filename
# Loops
awk '{for(i=1; i<=NF; i++) print $i}' filename # Print each field on new line
# Built-in variables
awk '{print NR, NF, $0}' filename # Line number, field count, whole line
awk 'END {print NR}' filename # Total line count
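The accumulator pattern is easy to verify on inline numbers:

```shell
# Sum and count in one pass
printf '10\n20\n30\n' | awk '{sum += $1; n++} END {print sum, n}'
# -> 60 3
```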
AWK Advanced Features
bash
# Multiple patterns
awk '/start/,/end/ {print}' filename # Print from start to end pattern
# User-defined functions
awk 'function square(x) {return x*x} {print square($1)}' filename
# Arrays
awk '{count[$1]++} END {for (word in count) print word, count[word]}' filename
# String functions
awk '{print length($0)}' filename # Line length
awk '{print substr($0, 1, 10)}' filename # First 10 characters
awk '{print toupper($0)}' filename # Convert to uppercase
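The associative-array counter above works like this (piped through sort because awk's for-in iteration order is unspecified):

```shell
printf 'apple\nbanana\napple\n' \
  | awk '{count[$1]++} END {for (w in count) print w, count[w]}' \
  | sort
# apple 2
# banana 1
```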
Sorting and Uniqueness
Basic Sorting
bash
# Sort lines alphabetically
sort filename
sort -r filename # Reverse order
sort -u filename # Remove duplicates
# Numeric sorting
sort -n filename # Numeric sort
sort -nr filename # Numeric reverse sort
sort -h filename # Human numeric sort (1K, 2M, etc.)
# Sort by specific field
sort -k2 filename # Sort by second field
sort -k2,2 filename # Sort by second field only
sort -k2n filename # Numeric sort by second field
Advanced Sorting
bash
# Multiple sort keys
sort -k1,1 -k2n filename # Sort by field 1, then numerically by field 2
# Custom field separator
sort -t: -k3n /etc/passwd # Sort passwd by UID
# Sort by specific columns
sort -k1.2,1.4 filename # Sort by characters 2-4 of first field
# Stable sort
sort -s -k2 filename # Maintain relative order of equal elements
Uniqueness Operations
bash
# Remove duplicate lines
uniq filename # Remove consecutive duplicates
sort filename | uniq # Remove all duplicates
# Count occurrences
uniq -c filename # Count consecutive duplicates
sort filename | uniq -c # Count all duplicates
# Show only duplicates or unique lines
uniq -d filename # Show only duplicate lines
uniq -u filename # Show only unique lines
# Compare fields
uniq -f1 filename # Skip first field when comparing
uniq -s5 filename # Skip first 5 characters
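Because uniq only collapses adjacent duplicates, sorting first matters; a quick illustration:

```shell
# Without sort, the repeats are not adjacent and survive
printf 'b\na\nb\n' | uniq | wc -l   # still 3 lines

# With sort, duplicates become adjacent and can be counted
printf 'b\na\nb\n' | sort | uniq -c
```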
Text Transformation
Character Translation
bash
# Character replacement
tr 'a-z' 'A-Z' < filename # Convert to uppercase
tr 'A-Z' 'a-z' < filename # Convert to lowercase
tr ' ' '_' < filename # Replace spaces with underscores
# Delete characters
tr -d '0-9' < filename # Delete all digits
tr -d '\n' < filename # Remove newlines
tr -d '[:punct:]' < filename # Remove punctuation
# Squeeze repeated characters
tr -s ' ' < filename # Squeeze multiple spaces to one
tr -s '\n' < filename # Squeeze consecutive newlines (removes empty lines)
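Two quick tr examples on inline text:

```shell
printf 'hello   world\n' | tr -s ' '     # squeeze runs of spaces
# -> hello world
printf 'MiXeD CaSe\n' | tr 'A-Z' 'a-z'   # lowercase
# -> mixed case
```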
Cut and Paste Operations
bash
# Extract columns
cut -c1-10 filename # Characters 1-10
cut -c1,5,10 filename # Characters 1, 5, and 10
cut -c10- filename # From character 10 to end
# Extract fields
cut -d: -f1 /etc/passwd # First field (colon delimiter)
cut -d, -f1,3 file.csv # Fields 1 and 3 (comma delimiter)
cut -f2- filename # From field 2 to end (tab delimiter)
# Paste files together
paste file1 file2 # Merge lines side by side
paste -d, file1 file2 # Use comma as delimiter
paste -s filename # Merge all lines into one
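cut and paste are roughly inverse operations; a round trip on two tiny files (the /tmp paths and contents are illustrative):

```shell
printf 'a\nb\n' > /tmp/left.txt
printf '1\n2\n' > /tmp/right.txt

paste -d, /tmp/left.txt /tmp/right.txt                 # a,1 then b,2
paste -d, /tmp/left.txt /tmp/right.txt | cut -d, -f2   # back to 1 then 2
```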
Join Operations
bash
# Join files on common field
join file1 file2 # Join on first field
join -1 2 -2 1 file1 file2 # Join field 2 of file1 with field 1 of file2
join -t: file1 file2 # Use colon as field separator
# Outer joins
join -a1 file1 file2 # Include unmatched lines from file1
join -a2 file1 file2 # Include unmatched lines from file2
join -a1 -a2 file1 file2 # Full outer join
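join expects both inputs sorted on the join field; a minimal inner vs. left join (the file names and contents are invented):

```shell
printf '1 alice\n2 bob\n'   > /tmp/ids.txt    # sorted on field 1
printf '1 admin\n3 guest\n' > /tmp/roles.txt  # sorted on field 1

join /tmp/ids.txt /tmp/roles.txt       # inner join: only key 1 matches
join -a1 /tmp/ids.txt /tmp/roles.txt   # also keeps unmatched "2 bob"
```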
Text Analysis and Statistics
Word and Line Counting
bash
# Count lines, words, characters
wc filename
wc -l filename # Lines only
wc -w filename # Words only
wc -c filename # Bytes only
wc -m filename # Characters (multibyte aware)
# Count specific patterns
grep -c "pattern" filename # Count matching lines
grep -o "pattern" filename | wc -l # Count pattern occurrences
Frequency Analysis
bash
# Word frequency
tr ' ' '\n' < filename | sort | uniq -c | sort -nr
# Character frequency
fold -w1 filename | sort | uniq -c | sort -nr
# Line frequency
sort filename | uniq -c | sort -nr
# Field frequency
awk '{print $1}' filename | sort | uniq -c | sort -nr
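The word-frequency pipeline on a one-line sample (tie-breaking among equal counts can vary, so only the top entry is predictable):

```shell
printf 'the cat and the dog\n' | tr ' ' '\n' | sort | uniq -c | sort -nr | head -1
# top entry is "2 the" (with leading spaces from uniq -c)
```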
Advanced Text Processing
Multi-file Operations
bash
# Process multiple files
grep "pattern" *.txt
awk '{print FILENAME, $0}' *.txt
sed 's/old/new/g' *.txt
# Combine files
cat file1 file2 > combined
sort -m sorted1 sorted2 > merged # Merge sorted files
Complex Pipelines
bash
# Log analysis pipeline
grep "404" access.log | awk '{print $1}' | sort | uniq -c | sort -nr
# CSV processing
cut -d, -f2,4 data.csv | grep -v "^$" | sort -u
# Text statistics
tr -d '[:punct:]' < document.txt | tr ' ' '\n' | grep -v "^$" | sort | uniq -c | sort -nr | head -10
Regular Expression Tools
bash
# Perl-style regex
perl -pe 's/pattern/replacement/g' filename
perl -ne 'print if /pattern/' filename
# Extended grep alternatives
egrep "pattern1|pattern2" filename # Deprecated alias for grep -E
fgrep "literal string" filename # Deprecated alias for grep -F (no regex interpretation)
Troubleshooting Text Processing
Common Issues
bash
# Handle different line endings
dos2unix filename # Convert DOS to Unix line endings
unix2dos filename # Convert Unix to DOS line endings
tr -d '\r' < filename # Remove carriage returns
# Encoding issues
iconv -f ISO-8859-1 -t UTF-8 filename # Convert encoding
file filename # Check file type and encoding
# Large file processing
split -l 1000 largefile prefix # Split into 1000-line chunks
head -n 1000000 largefile | tail -n 1000 # Extract lines 999001-1000000
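split's default naming appends aa, ab, and so on to the prefix; a small sketch of chunking (the /tmp path and prefix are illustrative):

```shell
seq 25 > /tmp/big.txt
split -l 10 /tmp/big.txt /tmp/chunk_
wc -l /tmp/chunk_*   # chunk_aa and chunk_ab hold 10 lines each, chunk_ac holds 5
```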
Performance Optimization
bash
# Faster alternatives for large files
LC_ALL=C sort filename # Use C locale for faster sorting
# mawk - faster drop-in AWK implementation for most scripts
# ripgrep (rg) - faster recursive search tool than grep
# Memory-efficient processing
sort -S 1G filename # Use 1GB memory for sorting
# For very large files, split into chunks and process each separately
Resources
- GNU Text Utilities Manual
- AWK Programming Guide
- Sed Manual
- Regular Expressions Tutorial
- Text Processing Examples
This cheat sheet provides comprehensive text processing commands for Linux systems. Practice with sample data before applying these commands to important files.