Linux Text Processing Cheat Sheet

Overview

Linux text processing tools provide powerful capabilities for manipulating, analyzing, and transforming text data. This comprehensive guide covers essential tools like grep, awk, sed, sort, and many others that form the foundation of command-line text processing and data analysis workflows.

⚠️ Warning: Text processing commands can modify files permanently. Always backup important files before performing bulk text operations.

File Viewing and Navigation

Basic File Display

bash
# Display entire file
cat filename
cat -n filename         # With line numbers
cat -b filename         # Number non-blank lines only
cat -A filename         # Show all characters including non-printing

# Display multiple files
cat file1 file2 file3

# Create file with content
cat > newfile << EOF
Line 1
Line 2
EOF
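
One subtlety worth noting: an unquoted delimiter (EOF above) lets the shell expand variables and command substitutions inside the heredoc. Quoting the delimiter keeps the text literal; a minimal sketch:

bash
# Quoted delimiter: $HOME is written literally, not expanded
cat > newfile << 'EOF'
Path is $HOME
EOF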

Paginated Viewing

bash
# Page through file
less filename
more filename

# Less navigation:
# Space/f - next page
# b - previous page
# /pattern - search forward
# ?pattern - search backward
# n - next search result
# N - previous search result
# q - quit

# More options
less +F filename        # Follow file like tail -f
less +/pattern filename # Start at first match

Partial File Display

bash
# First lines of file
head filename
head -n 20 filename     # First 20 lines
head -c 100 filename    # First 100 characters

# Last lines of file
tail filename
tail -n 20 filename     # Last 20 lines
tail -f filename        # Follow file changes
tail -F filename        # Follow with retry

# Specific line ranges
sed -n '10,20p' filename    # Lines 10-20
awk 'NR>=10 && NR<=20' filename
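
On large files, both commands above keep reading after the range ends. Adding an early exit avoids scanning the rest of the file; a small efficiency sketch:

bash
# Quit after the last wanted line instead of reading to EOF
sed -n '10,20p;20q' filename
awk 'NR > 20 {exit} NR >= 10' filename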

Pattern Searching with Grep

Basic Grep Usage

bash
# Search for pattern
grep "pattern" filename
grep "pattern" file1 file2 file3

# Case-insensitive search
grep -i "pattern" filename

# Show line numbers
grep -n "pattern" filename

# Show only matching part
grep -o "pattern" filename

# Count matches
grep -c "pattern" filename

Advanced Grep Options

bash
# Recursive search
grep -r "pattern" /path/to/directory
grep -R "pattern" /path/to/directory

# Search in specific file types
grep -r --include="*.txt" "pattern" /path
grep -r --exclude="*.log" "pattern" /path

# Invert match (show non-matching lines)
grep -v "pattern" filename

# Show context around matches
grep -A 3 "pattern" filename    # 3 lines after
grep -B 3 "pattern" filename    # 3 lines before
grep -C 3 "pattern" filename    # 3 lines before and after

# Multiple patterns
grep -E "pattern1|pattern2" filename
grep -e "pattern1" -e "pattern2" filename

Regular Expressions with Grep

bash
# Extended regular expressions
grep -E "^start.*end$" filename
grep -E "[0-9]{3}-[0-9]{3}-[0-9]{4}" filename  # Phone numbers
grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b" filename  # Email

# Perl-compatible regular expressions
grep -P "\d{3}-\d{3}-\d{4}" filename

# Word boundaries
grep -w "word" filename         # Match whole word only
grep "\bword\b" filename        # Same as -w

# Character classes
grep "[0-9]" filename           # Any digit
grep "[a-zA-Z]" filename        # Any letter
grep "[^0-9]" filename          # Not a digit

Stream Editing with Sed

Basic Sed Operations

bash
# Substitute (replace)
sed 's/old/new/' filename              # First occurrence per line
sed 's/old/new/g' filename             # All occurrences
sed 's/old/new/2' filename             # Second occurrence per line

# In-place editing
sed -i 's/old/new/g' filename
sed -i.bak 's/old/new/g' filename      # Create backup

# Case-insensitive substitution
sed 's/old/new/gi' filename
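
sed accepts nearly any character as the substitution delimiter, which avoids escaping slashes when patterns contain paths (the paths here are illustrative):

bash
# Using | as the delimiter so / needs no escaping
sed 's|/usr/local/old|/usr/local/new|g' filename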

Advanced Sed Commands

bash
# Delete lines
sed '5d' filename               # Delete line 5
sed '5,10d' filename            # Delete lines 5-10
sed '/pattern/d' filename       # Delete lines matching pattern

# Print specific lines
sed -n '5p' filename            # Print line 5 only
sed -n '5,10p' filename         # Print lines 5-10
sed -n '/pattern/p' filename    # Print matching lines

# Insert and append
sed '5i\New line' filename      # Insert before line 5
sed '5a\New line' filename      # Append after line 5

# Multiple commands
sed -e 's/old1/new1/g' -e 's/old2/new2/g' filename
sed 's/old1/new1/g; s/old2/new2/g' filename
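
Substitutions can also be restricted to an address, combining the line and pattern selections above with s///; for example, commenting out matching lines in a config file (a common sketch):

bash
# Prepend "# " only on lines matching the pattern
sed '/pattern/s/^/# /' filename

# Substitute only within lines 5-10
sed '5,10s/old/new/g' filename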

Sed with Regular Expressions

bash
# Address ranges with patterns
sed '/start/,/end/d' filename           # Delete from start to end pattern
sed '/pattern/,+5d' filename            # Delete matching line and next 5

# Backreferences
sed 's/\([0-9]*\)-\([0-9]*\)/\2-\1/' filename  # Swap numbers around dash

# Multiple line operations
sed 'N;s/\n/ /' filename               # Join pairs of lines

Text Processing with AWK

Basic AWK Usage

bash
# Print specific fields
awk '{print $1}' filename              # First field
awk '{print $1, $3}' filename          # First and third fields
awk '{print $NF}' filename             # Last field
awk '{print $(NF-1)}' filename         # Second to last field

# Field separator
awk -F: '{print $1}' /etc/passwd       # Use colon as separator
awk -F',' '{print $2}' file.csv        # Use comma as separator

# Print with custom formatting
awk '{printf "%-10s %s\n", $1, $2}' filename
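
Assigning to a field rebuilds the record using the output field separator (OFS), which makes column reordering straightforward; a sketch swapping the first two CSV columns (file.csv is a placeholder):

bash
# Swap columns 1 and 2, keeping comma-separated output
awk -F, 'BEGIN {OFS=","} {tmp = $1; $1 = $2; $2 = tmp; print}' file.csv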

AWK Pattern Matching

bash
# Pattern matching
awk '/pattern/ {print}' filename
awk '/pattern/ {print $1}' filename
awk '$1 ~ /pattern/ {print}' filename   # First field matches pattern
awk '$1 !~ /pattern/ {print}' filename  # First field doesn't match

# Numeric comparisons
awk '$3 > 100 {print}' filename         # Third field greater than 100
awk '$2 == "value" {print}' filename    # Second field equals value
awk 'NR > 1 {print}' filename          # Skip header line
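
A field separator plus a numeric test makes for compact filters; for example, listing ordinary user accounts, assuming the common convention that regular UIDs start at 1000:

bash
# Print login names whose UID (field 3) is 1000 or higher
awk -F: '$3 >= 1000 {print $1}' /etc/passwd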

AWK Programming Constructs

bash
# Variables and calculations
awk '{sum += $1} END {print sum}' filename      # Sum first column
awk '{count++} END {print count}' filename      # Count lines

# Conditional statements
awk '{if ($1 > 100) print "High: " $0; else print "Low: " $0}' filename

# Loops
awk '{for(i=1; i<=NF; i++) print $i}' filename # Print each field on new line

# Built-in variables
awk '{print NR, NF, $0}' filename      # Line number, field count, whole line
awk 'END {print NR}' filename          # Total line count
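
These pieces combine into small reports; a sketch computing the average of the first column, with a guard against dividing by zero on empty input:

bash
# Average of column 1; the NR check handles empty files
awk '{sum += $1} END {if (NR > 0) print sum / NR}' filename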

AWK Advanced Features

bash
# Multiple patterns
awk '/start/,/end/ {print}' filename    # Print from start to end pattern

# User-defined functions
awk 'function square(x) {return x*x} {print square($1)}' filename

# Arrays
awk '{count[$1]++} END {for (word in count) print word, count[word]}' filename

# String functions
awk '{print length($0)}' filename       # Line length
awk '{print substr($0, 1, 10)}' filename # First 10 characters
awk '{print toupper($0)}' filename      # Convert to uppercase
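
Arrays also support group-by aggregation, not just counting. A sketch summing the second field per key in the first field (the two-column layout is an assumption about the input):

bash
# Sum field 2 grouped by field 1 (e.g., bytes transferred per host)
awk '{total[$1] += $2} END {for (k in total) print k, total[k]}' filename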

Sorting and Uniqueness

Basic Sorting

bash
# Sort lines alphabetically
sort filename
sort -r filename                # Reverse order
sort -u filename                # Remove duplicates

# Numeric sorting
sort -n filename                # Numeric sort
sort -nr filename               # Numeric reverse sort
sort -h filename                # Human numeric sort (1K, 2M, etc.)

# Sort by specific field
sort -k2 filename               # Sort starting at second field
sort -k2,2 filename             # Sort by second field only
sort -k2,2n filename            # Numeric sort by second field
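
Human-numeric sorting is most useful on output that already carries size suffixes; a common pairing with du (the --max-depth option assumes GNU coreutils):

bash
# Largest directories first, sorting suffixed sizes like K, M, G
du -h --max-depth=1 | sort -hr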

Advanced Sorting

bash
# Multiple sort keys
sort -k1,1 -k2,2n filename      # Sort by field 1, then numerically by field 2

# Custom field separator
sort -t: -k3,3n /etc/passwd     # Sort passwd by UID

# Sort by specific columns
sort -k1.2,1.4 filename         # Sort by characters 2-4 of first field

# Stable sort
sort -s -k2 filename            # Maintain relative order of equal elements

Uniqueness Operations

bash
# Remove duplicate lines
uniq filename                   # Remove consecutive duplicates
sort filename | uniq            # Remove all duplicates

# Count occurrences
uniq -c filename                # Count consecutive duplicates
sort filename | uniq -c         # Count occurrences of each distinct line

# Show only duplicates or unique lines
uniq -d filename                # Show only duplicate lines
uniq -u filename                # Show only unique lines

# Compare fields
uniq -f1 filename               # Skip first field when comparing
uniq -s5 filename               # Skip first 5 characters

Text Transformation

Character Translation

bash
# Character replacement
tr 'a-z' 'A-Z' < filename       # Convert to uppercase
tr 'A-Z' 'a-z' < filename       # Convert to lowercase
tr ' ' '_' < filename           # Replace spaces with underscores

# Delete characters
tr -d '0-9' < filename          # Delete all digits
tr -d '\n' < filename           # Remove newlines
tr -d '[:punct:]' < filename    # Remove punctuation

# Squeeze repeated characters
tr -s ' ' < filename            # Squeeze multiple spaces to one
tr -s '\n' < filename           # Remove blank lines
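
tr also reshapes delimiter-separated values onto separate lines, which feeds nicely into the sort and uniq idioms elsewhere in this sheet; a classic example splitting $PATH:

bash
# One PATH entry per line
echo "$PATH" | tr ':' '\n'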

Cut and Paste Operations

bash
# Extract columns
cut -c1-10 filename             # Characters 1-10
cut -c1,5,10 filename           # Characters 1, 5, and 10
cut -c10- filename              # From character 10 to end

# Extract fields
cut -d: -f1 /etc/passwd         # First field (colon delimiter)
cut -d, -f1,3 file.csv          # Fields 1 and 3 (comma delimiter)
cut -f2- filename               # From field 2 to end (tab delimiter)

# Paste files together
paste file1 file2               # Merge lines side by side
paste -d, file1 file2           # Use comma as delimiter
paste -s filename               # Merge all lines into one
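
paste -s with a custom delimiter can turn a column of numbers into an arithmetic expression for bc to evaluate; a compact summing sketch:

bash
# Sum a column of numbers: join with + and evaluate
seq 1 100 | paste -sd+ | bc     # Prints 5050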

Join Operations

bash
# Join files on common field
join file1 file2                # Join on first field
join -1 2 -2 1 file1 file2      # Join field 2 of file1 with field 1 of file2
join -t: file1 file2            # Use colon as field separator

# Outer joins
join -a1 file1 file2            # Include unmatched lines from file1
join -a2 file1 file2            # Include unmatched lines from file2
join -a1 -a2 file1 file2        # Full outer join
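
join requires both inputs to be sorted on the join field; with bash process substitution, the sorting can happen inline (a sketch, assuming the default join field 1):

bash
# Sort both inputs on field 1 on the fly (bash-specific syntax)
join <(sort -k1,1 file1) <(sort -k1,1 file2)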

Text Analysis and Statistics

Word and Line Counting

bash
# Count lines, words, characters
wc filename
wc -l filename                  # Lines only
wc -w filename                  # Words only
wc -c filename                  # Characters only
wc -m filename                  # Characters (multibyte aware)

# Count specific patterns
grep -c "pattern" filename      # Count matching lines
grep -o "pattern" filename | wc -l  # Count pattern occurrences

Frequency Analysis

bash
# Word frequency
tr ' ' '\n' < filename | sort | uniq -c | sort -nr

# Character frequency
fold -w1 filename | sort | uniq -c | sort -nr

# Line frequency
sort filename | uniq -c | sort -nr

# Field frequency
awk '{print $1}' filename | sort | uniq -c | sort -nr

Advanced Text Processing

Multi-file Operations

bash
# Process multiple files
grep "pattern" *.txt
awk '{print FILENAME, $0}' *.txt
sed 's/old/new/g' *.txt

# Combine files
cat file1 file2 > combined
sort -m sorted1 sorted2 > merged  # Merge sorted files
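
When globs get unwieldy or directory trees are deep, find can stream file names into the same tools; NUL delimiters keep names with spaces intact (the glob and pattern are placeholders):

bash
# NUL-delimited file list survives spaces and newlines in names
find . -name "*.txt" -print0 | xargs -0 grep -l "pattern"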

Complex Pipelines

bash
# Log analysis pipeline: top clients with 404 responses
grep "404" access.log | awk '{print $1}' | sort | uniq -c | sort -nr

# CSV processing: unique non-empty values from fields 2 and 4
cut -d, -f2,4 data.csv | grep -v "^$" | sort -u

# Text statistics: ten most frequent words
tr -d '[:punct:]' < document.txt | tr ' ' '\n' | grep -v "^$" | sort | uniq -c | sort -nr | head -10

Regular Expression Tools

bash
# Perl-style regex
perl -pe 's/pattern/replacement/g' filename
perl -ne 'print if /pattern/' filename

# Extended grep alternatives (egrep/fgrep are deprecated; prefer grep -E / grep -F)
grep -E "pattern1|pattern2" filename
grep -F "literal string" filename       # No regex interpretation

Troubleshooting Text Processing

Common Issues

bash
# Handle different line endings
dos2unix filename               # Convert DOS to Unix line endings
unix2dos filename               # Convert Unix to DOS line endings
tr -d '\r' < filename           # Remove carriage returns

# Encoding issues
iconv -f ISO-8859-1 -t UTF-8 filename > converted  # Convert encoding (writes to stdout)
file filename                   # Check file type and encoding

# Large file processing
split -l 1000 largefile prefix  # Split into 1000-line chunks
head -n 1000000 largefile | tail -n 1000  # Extract lines 999001-1000000

Performance Optimization

bash
# Faster alternatives for large files
LC_ALL=C sort filename          # Use C locale for faster sorting
# mawk is a faster AWK implementation (largely a drop-in replacement)
# ripgrep (rg) is a faster recursive search tool than grep

# Memory-efficient processing
sort -S 1G filename             # Use up to 1GB of memory for sorting
# For very large files, split into chunks and process each separately
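
The chunking advice above can be made concrete with split plus a loop; the chunk size, prefix, and pattern are illustrative:

bash
# Split, process each chunk, then clean up
split -l 1000000 largefile chunk_
for f in chunk_*; do
    grep "pattern" "$f" >> results.txt
done
rm chunk_*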

This cheat sheet provides comprehensive text processing commands for Linux systems. Practice with sample data before applying these commands to important files.