Regular Expressions (RegEx) - Pattern Matching

*A complete guide to regular expressions for pattern matching and text processing*

Regular expressions (regex) are powerful pattern-matching tools used across programming languages, text editors, and command-line tools. This guide covers regex syntax, common patterns, and practical examples for effective text processing.

Basic Syntax

Literal Characters

```regex
# Exact character matching
hello          # Matches "hello" exactly
123            # Matches "123" exactly
Hello World    # Matches "Hello World" exactly

# Case sensitivity (depends on flags)
Hello          # Matches "Hello" but not "hello" (case sensitive)
(?i)Hello      # Matches "Hello", "hello", "HELLO" (case insensitive)
```
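As a minimal sketch of how literal matching and the inline `(?i)` flag behave, the following Python snippet uses the standard `re` module. The variable names are illustrative only.

```python
import re

text = "Hello World"

# Literal patterns are case sensitive by default.
exact = re.search(r"Hello", text)
wrong_case = re.search(r"hello", text)

# The inline (?i) flag makes the whole pattern case insensitive.
insensitive = re.search(r"(?i)hello", text)

print(exact.group())        # Hello
print(wrong_case)           # None
print(insensitive.group())  # Hello
```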

Metacharacters

```regex
# Special characters with meaning
.     # Matches any single character except newline
^     # Matches start of string/line
$     # Matches end of string/line
*     # Matches 0 or more of preceding element
+     # Matches 1 or more of preceding element
?     # Matches 0 or 1 of preceding element
|     # OR operator (alternation)
()    # Grouping
[]    # Character class
{}    # Quantifiers
\     # Escape character
```

Character Classes

```regex
# Predefined character classes
\d    # Any digit (0-9)
\D    # Any non-digit
\w    # Any word character (a-z, A-Z, 0-9, _)
\W    # Any non-word character
\s    # Any whitespace character (space, tab, newline)
\S    # Any non-whitespace character
\n    # Newline character
\t    # Tab character
\r    # Carriage return

# Custom character classes
[abc]          # Matches 'a', 'b', or 'c'
[a-z]          # Matches any lowercase letter
[A-Z]          # Matches any uppercase letter
[0-9]          # Matches any digit
[a-zA-Z]       # Matches any letter
[a-zA-Z0-9]    # Matches any alphanumeric character
[^abc]         # Matches any character except 'a', 'b', or 'c'
[^0-9]         # Matches any non-digit
```
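To make the classes above concrete, here is a small Python sketch that applies a predefined class, a custom class, and a negated class to one sample string. The sample text and variable names are made up for illustration.

```python
import re

sample = "Order 66: 3 apples, 12 pears"  # hypothetical input

digits = re.findall(r"\d+", sample)           # runs of digits
words = re.findall(r"[A-Za-z]+", sample)      # runs of ASCII letters
digits_only = re.sub(r"[^0-9]", "", sample)   # drop every non-digit

print(digits)       # ['66', '3', '12']
print(words)        # ['Order', 'apples', 'pears']
print(digits_only)  # 66312
```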

Quantifiers

Basic Quantifiers

```regex
# Exact repetition
a{3}      # Matches exactly 3 'a's: "aaa"
a{2,5}    # Matches 2 to 5 'a's: "aa", "aaa", "aaaa", "aaaaa"
a{3,}     # Matches 3 or more 'a's: "aaa", "aaaa", etc.
a{,3}     # Matches 0 to 3 'a's: "", "a", "aa", "aaa" (Python; many engines require a{0,3})

# Common quantifiers
a*        # Matches 0 or more 'a's (equivalent to a{0,})
a+        # Matches 1 or more 'a's (equivalent to a{1,})
a?        # Matches 0 or 1 'a' (equivalent to a{0,1})
```
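A quick way to check a bounded quantifier is to test strings of different lengths against it. The sketch below uses Python's `re.fullmatch` so that the whole string must fit the range; the candidate list is illustrative.

```python
import re

bounded = re.compile(r"a{2,5}")  # between 2 and 5 'a' characters

candidates = ["a", "aa", "aaaaa", "aaaaaa"]
results = {s: bounded.fullmatch(s) is not None for s in candidates}

print(results)
# {'a': False, 'aa': True, 'aaaaa': True, 'aaaaaa': False}
```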

Greedy vs Lazy Quantifiers

```regex
# Greedy (default) - matches as much as possible
.*         # Matches as many characters as possible
.+         # Matches as many characters as possible (at least 1)
.{2,5}     # Matches as many characters as possible (2-5)

# Lazy (non-greedy) - matches as little as possible
.*?        # Matches as few characters as possible
.+?        # Matches as few characters as possible (at least 1)
.{2,5}?    # Matches as few characters as possible (2-5)

# Example difference
# String: "Hello World"
H.*o       # Greedy: matches "Hello Wo" (to last 'o')
H.*?o      # Lazy: matches "Hello" (to first 'o')
```
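The difference is easy to observe in Python. This sketch repeats the "Hello World" example and adds a second, hypothetical HTML-like string, where greedy matching tends to swallow more than intended.

```python
import re

text = "Hello World"
greedy = re.search(r"H.*o", text).group()   # runs to the last 'o'
lazy = re.search(r"H.*?o", text).group()    # stops at the first 'o'

html = "<b>one</b> <b>two</b>"  # illustrative input
greedy_tags = re.findall(r"<b>.*</b>", html)
lazy_tags = re.findall(r"<b>.*?</b>", html)

print(greedy)       # Hello Wo
print(lazy)         # Hello
print(greedy_tags)  # ['<b>one</b> <b>two</b>']
print(lazy_tags)    # ['<b>one</b>', '<b>two</b>']
```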

Anchors and Boundaries

Position Anchors

```regex
# String/line boundaries
^     # Start of string (or of each line in multiline mode)
$     # End of string (or of each line in multiline mode)
\A    # Start of string (not line)
\Z    # End of string (not line); in most flavors also before a final newline
\z    # Very end of string

# Examples
^Hello            # Matches "Hello" at start of line
World$            # Matches "World" at end of line
^Hello World$     # Matches entire line containing only "Hello World"
```
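Whether `^` and `$` apply to each line or only to the whole string depends on the multiline flag. The following Python sketch shows both behaviors on a hypothetical three-line string.

```python
import re

log = "Hello World\nHello again\nGoodbye World"  # illustrative input

# Without re.MULTILINE, ^ and $ only anchor to the whole string.
start_default = re.findall(r"^Hello", log)
end_default = re.findall(r"World$", log)

# With re.MULTILINE, they also anchor at every line break.
start_multiline = re.findall(r"^Hello", log, re.MULTILINE)
end_multiline = re.findall(r"World$", log, re.MULTILINE)

print(start_default, end_default)      # ['Hello'] ['World']
print(start_multiline, end_multiline)  # ['Hello', 'Hello'] ['World', 'World']
```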

Word Boundaries

```regex
# Word boundaries
\b    # Word boundary
\B    # Non-word boundary

# Examples
\bcat\b    # Matches "cat" as whole word, not in "category"
\Bcat\B    # Matches "cat" only when not at word boundaries
\bcat      # Matches "cat" at start of word: "cat", "category"
cat\b      # Matches "cat" at end of word: "cat", "tomcat"
```
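The boundary examples can be tried against a single sentence. In this Python sketch the sample sentence is invented, and `\w*` is added around `cat` only so that the output shows which whole word each pattern hit.

```python
import re

text = "cat category tomcat concatenate"  # illustrative input

whole_word = re.findall(r"\bcat\b", text)     # standalone "cat" only
word_start = re.findall(r"\bcat\w*", text)    # words beginning with "cat"
word_end = re.findall(r"\w*cat\b", text)      # words ending with "cat"
inside_word = re.findall(r"\Bcat\B", text)    # "cat" buried inside a word

print(whole_word)   # ['cat']
print(word_start)   # ['cat', 'category']
print(word_end)     # ['cat', 'tomcat']
print(inside_word)  # ['cat']  (from "concatenate")
```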

Groups and Capturing

Basic Groups

```regex
# Grouping with parentheses
(abc)           # Groups "abc" together
(abc)+          # Matches one or more "abc" sequences
(abc|def)       # Matches either "abc" or "def"
(abc){2,4}      # Matches 2 to 4 "abc" sequences

# Non-capturing groups
(?:abc)         # Groups without capturing
(?:abc|def)+    # Matches sequences of "abc" or "def"
```
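One place where the choice between capturing and non-capturing groups becomes visible is Python's `re.findall`, which returns the group contents when a pattern contains a capturing group and the full match otherwise. A small sketch with a made-up input string:

```python
import re

text = "abcabc xyz abc"  # illustrative input

# With a capturing group, findall returns what the group captured
# (the last repetition), not the full match.
captured = re.findall(r"(abc)+", text)

# With a non-capturing group, findall returns the full match.
full = re.findall(r"(?:abc)+", text)

print(captured)  # ['abc', 'abc']
print(full)      # ['abcabc', 'abc']
```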

Capturing Groups

```regex
# Numbered captures
(abc)(def)                   # Capture group 1: "abc", group 2: "def"
(\d{4})-(\d{2})-(\d{2})      # Captures date parts: year, month, day

# Named captures
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})    # Named groups
(?P<name>\w+)                                   # Python-style named group

# Backreferences
(\w+)\s+\1         # Matches repeated words: "hello hello"
(["'])(.*?)\1      # Matches quoted strings with same quote type
```
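The sketch below shows named groups and a backreference in Python, whose `re` module uses the `(?P<name>...)` spelling. The sample sentences are invented, and the extra `\b` anchors on the repeated-word pattern are an addition here to avoid matching inside longer words.

```python
import re

# Named groups: Python spells them (?P<name>...).
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match = date_pattern.search("Released on 2024-03-15.")
parts = match.groupdict()

# Backreference: \1 must repeat whatever group 1 captured.
repeated = re.search(r"\b(\w+)\s+\1\b", "this is is a test")

print(parts)              # {'year': '2024', 'month': '03', 'day': '15'}
print(match.group("year"))  # 2024
print(repeated.group(1))  # is
```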

Lookahead and Lookbehind

Lookahead

```regex
# Positive lookahead
\d+(?=\s*dollars)    # Matches digits followed by "dollars"
\w+(?=@)             # Matches username before @ in email

# Negative lookahead
\d+(?!\s*cents)      # Matches digits NOT followed by "cents"
\w+(?!@)             # Matches words NOT followed by @
```
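Negative lookaheads combined with a greedy quantifier can behave unexpectedly: the engine may backtrack to a shorter match that satisfies the assertion. The Python sketch below shows this on a made-up sentence, and one possible way to tighten the pattern with `\b` (an addition for illustration, not the only fix).

```python
import re

text = "100 dollars and 50 cents"  # illustrative input

positive = re.findall(r"\d+(?=\s*dollars)", text)

# Naive negative lookahead: "50" is rejected, but the engine backtracks
# to "5", which is followed by "0", not by "cents".
naive_negative = re.findall(r"\d+(?!\s*cents)", text)

# Requiring a word boundary after the digits prevents the partial match.
strict_negative = re.findall(r"\b\d+\b(?!\s*cents)", text)

print(positive)         # ['100']
print(naive_negative)   # ['100', '5']
print(strict_negative)  # ['100']
```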

Lookbehind

```regex
# Positive lookbehind
(?<=\$)\d+     # Matches digits preceded by $
(?<=@)\w+      # Matches domain after @ in email

# Negative lookbehind
(?<!\$)\d+     # Matches digits NOT preceded by $
(?<!@)\w+      # Matches words NOT preceded by @
```
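Lookbehind has a similar subtlety: a negative lookbehind only inspects the single position before the match, so a match can start in the middle of a number. A Python sketch with an invented input, where the `\b` in the last pattern is an illustrative tightening:

```python
import re

text = "Price: $42, quantity 7"  # illustrative input

after_dollar = re.findall(r"(?<=\$)\d+", text)

# Naive version: "2" is preceded by "4", not "$", so it still matches.
naive = re.findall(r"(?<!\$)\d+", text)

# Anchoring the start of the number with \b avoids the partial match.
strict = re.findall(r"\b(?<!\$)\d+", text)

print(after_dollar)  # ['42']
print(naive)         # ['2', '7']
print(strict)        # ['7']
```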

Common Patterns

Email Validation

```regex
# Basic email pattern
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b

# More comprehensive email
^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$

# Simple email validation
^\S+@\S+\.\S+$
```
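As a rough sketch of how the basic pattern might be wrapped for use in Python, the helper below (the name `looks_like_email` is hypothetical) checks the whole string with `fullmatch`. It is a pragmatic shape check under that pattern's assumptions, not full address validation.

```python
import re

# Basic email pattern from above, without the \b anchors because
# fullmatch already requires the entire string to match.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def looks_like_email(value: str) -> bool:
    """Return True if value has the rough shape of an email address."""
    return EMAIL_RE.fullmatch(value) is not None

print(looks_like_email("user@example.com"))       # True
print(looks_like_email("a.b+c@sub.example.org"))  # True
print(looks_like_email("user@example"))           # False
print(looks_like_email("not an email"))           # False
```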

Phone Numbers

```regex
# US phone numbers
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}                   # (123) 456-7890 or 123-456-7890
^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$     # With optional country code

# International format
^\+?[1-9]\d{1,14}$    # E.164 format
```
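Adding capture groups to the US pattern makes it possible to normalize different spellings to one format. The following Python sketch assumes a hypothetical helper name, `normalize_us_phone`, and a dash-separated output format chosen for illustration.

```python
import re

# US pattern from above, with capture groups around the three digit runs.
US_PHONE_RE = re.compile(r"\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})")

def normalize_us_phone(value: str):
    """Return the number as 123-456-7890, or None if it does not match."""
    match = US_PHONE_RE.fullmatch(value)
    if match is None:
        return None
    return "-".join(match.groups())

print(normalize_us_phone("(123) 456-7890"))  # 123-456-7890
print(normalize_us_phone("123.456.7890"))    # 123-456-7890
print(normalize_us_phone("12-3456-7890"))    # None
```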

URLs

```regex
# Basic URL pattern
https?://[^\s]+

# More comprehensive URL
^https?://(?:[-\w.])+(?:\:[0-9]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:[\w.])*)?)?$

# Domain validation
^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$
```
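The basic URL pattern is convenient for pulling links out of running text, but because it consumes everything up to the next whitespace it can also pick up trailing punctuation. A Python sketch with an invented sentence, where stripping punctuation afterwards is one possible workaround:

```python
import re

text = "See https://example.com/docs and http://test.org."  # illustrative

raw = re.findall(r"https?://[^\s]+", text)

# The second match includes the sentence-ending period, so strip
# common trailing punctuation as a post-processing step.
cleaned = [url.rstrip(".,;:!?") for url in raw]

print(raw)      # ['https://example.com/docs', 'http://test.org.']
print(cleaned)  # ['https://example.com/docs', 'http://test.org']
```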

Dates

```regex
# MM/DD/YYYY format
^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/\d{4}$

# YYYY-MM-DD format (ISO 8601)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

# Flexible date formats
\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b
```
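These patterns check the shape of a date, not whether the date exists: `2024-02-30` passes the ISO pattern. One way to close that gap, sketched here in Python with a hypothetical helper `parse_iso_date`, is to let the regex do the format check and `datetime.date` do the calendar check.

```python
import re
from datetime import date

# ISO pattern from above, with a capture group added around the year.
ISO_DATE_RE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

def parse_iso_date(value: str):
    """Return a date for a valid YYYY-MM-DD string, otherwise None."""
    match = ISO_DATE_RE.fullmatch(value)
    if match is None:
        return None  # wrong shape
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None  # right shape, impossible date

print(parse_iso_date("2024-02-29"))  # 2024-02-29
print(parse_iso_date("2024-02-30"))  # None
print(parse_iso_date("2024-13-01"))  # None
```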

Credit Card Numbers

```regex
# Visa (starts with 4, 13 or 16 digits)
^4\d{12}(?:\d{3})?$

# MasterCard (starts with 51-55, 16 digits)
^5[1-5]\d{14}$

# American Express (starts with 34 or 37, 15 digits)
^3[47]\d{13}$

# General credit card (with optional spaces/dashes)
^\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}$
```
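The brand patterns expect bare digits, so separators need to be removed first. The Python sketch below does that and then tries each pattern in turn; the function name `card_type` and the sample numbers are illustrative, and the check covers the number's shape only.

```python
import re

# Brand patterns from above; fullmatch makes ^ and $ unnecessary.
CARD_PATTERNS = {
    "visa": r"4\d{12}(?:\d{3})?",
    "mastercard": r"5[1-5]\d{14}",
    "amex": r"3[47]\d{13}",
}

def card_type(number: str):
    """Return the matching brand name, or None if no pattern fits."""
    digits = re.sub(r"[-\s]", "", number)  # drop spaces and dashes
    for name, pattern in CARD_PATTERNS.items():
        if re.fullmatch(pattern, digits):
            return name
    return None

print(card_type("4111 1111 1111 1111"))  # visa
print(card_type("5500-0000-0000-0004"))  # mastercard
print(card_type("378282246310005"))      # amex
print(card_type("1234"))                 # None
```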

IP Addresses

```regex
# IPv4 address
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

# IPv6 address (simplified)
^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$

# IPv4 or IPv6
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$|^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$
```
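Long patterns such as the IPv4 one are easier to read when the repeated part is built once and reused. This Python sketch composes the same pattern from a single octet fragment; the names `OCTET`, `IPV4_RE`, and `is_ipv4` are chosen here for illustration.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (optionally with leading zeros).
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"

# Three "octet." groups followed by a final octet. The doubled braces
# produce a literal {3} inside the f-string.
IPV4_RE = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")

def is_ipv4(value: str) -> bool:
    return IPV4_RE.fullmatch(value) is not None

print(is_ipv4("192.168.1.1"))  # True
print(is_ipv4("256.1.1.1"))    # False
print(is_ipv4("1.2.3"))        # False
print(is_ipv4("01.2.3.4"))     # True (this pattern accepts leading zeros)
```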

Language-Specific Examples

JavaScript

```javascript
// Basic regex usage
const pattern = /hello/i; // Case insensitive
const text = "Hello World";
console.log(pattern.test(text)); // true

// String methods with regex
text.match(/\w+/g);            // ["Hello", "World"]
text.replace(/hello/i, "Hi");  // "Hi World"
text.split(/\s+/);             // ["Hello", "World"]

// Constructor syntax
const regex = new RegExp("hello", "i");
```

Python

```python
import re

# Basic usage
pattern = r'hello'
text = "Hello World"
match = re.search(pattern, text, re.IGNORECASE)

# Common methods
re.findall(r'\w+', text)                            # ['Hello', 'World']
re.sub(r'hello', 'Hi', text, flags=re.IGNORECASE)   # "Hi World"
re.split(r'\s+', text)                              # ['Hello', 'World']

# Compiled patterns (more efficient for repeated use)
pattern = re.compile(r'\d+')
pattern.findall("123 and 456")  # ['123', '456']
```

Java

```java
import java.util.regex.*;

// Basic usage: find() searches anywhere in the text
// (Pattern.matches() would require the entire string to match)
String pattern = "hello";
String text = "Hello World";
boolean found = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE)
                       .matcher(text)
                       .find(); // true

// Pattern and Matcher objects (backslashes are doubled in Java string literals)
Pattern p = Pattern.compile("\\w+");
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group()); // Prints each word
}

// String methods
text.replaceAll("(?i)hello", "Hi"); // "Hi World"
text.split("\\s+");                 // ["Hello", "World"]
```

PHP

```php
// Basic usage
$pattern = '/hello/i';
$text = "Hello World";
$found = preg_match($pattern, $text); // 1 if found

// Common functions
preg_match_all('/\w+/', $text, $matches); // Find all words
preg_replace('/hello/i', 'Hi', $text);    // "Hi World"
preg_split('/\s+/', $text);               // ["Hello", "World"]

// With capture groups
preg_match('/(\w+)\s+(\w+)/', $text, $matches);
// $matches[1] = "Hello", $matches[2] = "World"
```

Flags and Modifiers

Common Flags

```regex
# Case insensitive
/pattern/i       # JavaScript
(?i)pattern      # Inline flag
re.IGNORECASE    # Python

# Global (find all matches)
/pattern/g       # JavaScript
re.findall()     # Python (default behavior)

# Multiline (^ and $ match at line breaks)
/pattern/m       # JavaScript
re.MULTILINE     # Python

# Dot matches newline
/pattern/s       # JavaScript (ES2018+)
re.DOTALL        # Python

# Extended (ignore whitespace, allow comments)
/pattern/x       # Some languages
re.VERBOSE       # Python
```
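In Python, flags are combined with the `|` operator. The sketch below shows how the result of the same pattern changes as flags are added, and how `re.VERBOSE` allows a pattern to carry its own comments. The sample strings and the seven-digit phone pattern are invented for illustration.

```python
import re

text = "first line\nSecond line"  # illustrative input

# Case-insensitive alone is not enough: the dot stops at the newline.
ignore_case = re.search(r"first.*second", text, re.IGNORECASE)

# Adding DOTALL lets the dot cross the line break.
both = re.search(r"first.*second", text, re.IGNORECASE | re.DOTALL)

# MULTILINE makes ^ match at the start of every line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# VERBOSE ignores pattern whitespace and allows inline comments.
phone = re.compile(
    r"""
    (\d{3})    # first three digits
    [-.\s]?    # optional separator
    (\d{4})    # last four digits
    """,
    re.VERBOSE,
)
parts = phone.search("555-1234").groups()

print(ignore_case)   # None
print(both.group())  # prints "first line" and "Second" on two lines
print(line_starts)   # ['first', 'Second']
print(parts)         # ('555', '1234')
```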

Performance Tips

Optimization Strategies

```regex
# Use specific character classes instead of .
\d+          # Better than .+ for digits
[a-zA-Z]+    # Better than .+ for letters

# Anchor patterns when possible
^pattern     # Faster when pattern should be at start
pattern$     # Faster when pattern should be at end

# Use non-capturing groups when you don't need the capture
(?:abc)+     # Better than (abc)+ if you don't need the group

# Avoid catastrophic backtracking
(a+)+b       # Dangerous pattern
a+b          # Better alternative

# Use atomic groups or possessive quantifiers
(?>a+)b      # Atomic group (some languages)
a++b         # Possessive quantifier (some languages)
```
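To get a feel for catastrophic backtracking, the nested-quantifier pattern can be timed against its simpler equivalent on an input that almost matches. This Python sketch keeps the input deliberately short (18 characters, an arbitrary choice) so it finishes quickly; in a backtracking engine the risky pattern's time typically grows very fast as the input gets longer. Timings vary by machine, so none are shown.

```python
import re
import time

risky = re.compile(r"(a+)+b")  # nested quantifiers
safe = re.compile(r"a+b")      # accepts the same strings

# Many 'a's with no 'b': the match must fail, forcing backtracking.
subject = "a" * 18 + "c"

def timed_search(pattern, text):
    """Return (match_or_None, elapsed_seconds) for one search call."""
    start = time.perf_counter()
    result = pattern.search(text)
    return result, time.perf_counter() - start

risky_result, risky_seconds = timed_search(risky, subject)
safe_result, safe_seconds = timed_search(safe, subject)

print(f"risky: {risky_seconds:.6f}s, safe: {safe_seconds:.6f}s")
```

Both patterns report the same outcome; only the amount of work differs.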

Common Pitfalls

```regex
# Greedy quantifiers can be slow
.*expensive     # Can be slow on long strings
.*?expensive    # Often faster (lazy)

# Alternation order matters
cat|catch       # "cat" will match first part of "catch"
catch|cat       # Better: longer alternative first

# Escape special characters
\.              # Literal dot
\$              # Literal dollar sign
\(              # Literal parenthesis
```
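Two of these pitfalls are easy to reproduce in Python. The sketch below shows alternation order changing the result, and uses `re.escape` from the standard library to match a string containing metacharacters literally. The sample strings are invented.

```python
import re

# Alternation is tried left to right; the first alternative that works wins.
short_first = re.search(r"cat|catch", "catch").group()
long_first = re.search(r"catch|cat", "catch").group()

# An unescaped dot matches any character, not just a literal dot.
unescaped_hit = re.search(r"3.50", "3x50") is not None

# re.escape backslash-escapes every metacharacter in a literal string.
literal = "$3.50 (sale)"
escaped_pattern = re.escape(literal)
exact_hit = re.fullmatch(escaped_pattern, literal) is not None
loose_hit = re.fullmatch(escaped_pattern, "$3x50 (sale)") is not None

print(short_first)    # cat
print(long_first)     # catch
print(unescaped_hit)  # True
print(exact_hit)      # True
print(loose_hit)      # False
```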

Testing and Debugging

Online Tools

  • regex101.com - Interactive regex tester with explanations
  • regexr.com - Visual regex builder and tester
  • regexpal.com - Simple regex testing tool
  • regexper.com - Visual regex diagrams

Testing Strategies

```regex
# Start simple and build complexity
\d           # Start with basic digit matching
\d+          # Add quantifier
\d{2,4}      # Add specific range
^\d{2,4}$    # Add anchors

# Test edge cases
""              # Empty string
"a"             # Single character
"aaa...aaa"     # Very long strings
"special!@#"    # Special characters
```

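One lightweight way to apply these strategies is a table of inputs and expected outcomes that is re-run whenever the pattern changes. The Python sketch below does this for the anchored `^\d{2,4}$` pattern; the case table is illustrative, and the trailing-newline entry reflects how `$` behaves in Python's `re` module.

```python
import re

PATTERN = re.compile(r"^\d{2,4}$")

# Input -> whether the pattern is expected to match.
CASES = {
    "": False,          # empty string
    "1": False,         # too short
    "12": True,
    "1234": True,
    "12345": False,     # too long
    "12a": False,       # special/non-digit character
    "9" * 1000: False,  # very long string
    "12\n": True,       # in Python, $ also matches before a final newline
}

failures = [
    text for text, expected in CASES.items()
    if (PATTERN.search(text) is not None) != expected
]

print("failures:", failures)  # failures: []
```

If the trailing-newline case should be rejected, `PATTERN.fullmatch` is one option in Python, since it requires the entire string to match.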
This guide covers the essential patterns and techniques needed for effective text processing across programming languages and tools. Practice with real-world examples to master these pattern-matching skills.