Regular Expressions (RegEx) - Pattern Matching

**Complete guide to regular expressions for pattern matching and text processing**

Regular expressions (regex) are powerful pattern-matching tools used across programming languages, text editors, and command-line tools. This comprehensive guide covers regex syntax, common patterns, and practical examples for effective text processing.

Basic Syntax

Literal Characters

```regex
# Exact character matching
hello          # Matches "hello" exactly
123            # Matches "123" exactly
Hello World    # Matches "Hello World" exactly

# Case sensitivity (depends on flags)
Hello          # Matches "Hello" but not "hello" (case sensitive)
(?i)Hello      # Matches "Hello", "hello", "HELLO" (case insensitive)
```
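
A minimal sketch of the literal and case-sensitivity rules above, using Python's standard `re` module. The sample string is invented for illustration.

```python
import re

text = "Say Hello to hello and HELLO"

# A literal pattern is case-sensitive by default: only "Hello" matches
exact = re.findall(r"Hello", text)

# The inline (?i) flag at the start of the pattern enables case-insensitive matching
inline_flag = re.findall(r"(?i)Hello", text)

# The same effect via the re.IGNORECASE flag argument
flag_arg = re.findall(r"hello", text, re.IGNORECASE)

print(exact)        # ['Hello']
print(inline_flag)  # ['Hello', 'hello', 'HELLO']
print(flag_arg)     # ['Hello', 'hello', 'HELLO']
```

Note that a literal pattern matches anywhere in the string; to require the whole string to equal the literal, add `^`/`$` anchors or use `re.fullmatch()`.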

Metacharacters

```regex
# Special characters with meaning
.     # Matches any single character except newline
^     # Matches start of string/line
$     # Matches end of string/line
*     # Matches 0 or more of preceding element
+     # Matches 1 or more of preceding element
?     # Matches 0 or 1 of preceding element
|     # OR operator (alternation)
()    # Grouping
[]    # Character class
{}    # Quantifiers
\     # Escape character
```
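
To see the difference between a metacharacter and its escaped form, here is a short Python sketch. `re.escape()` is the standard-library helper for matching arbitrary text literally; the sample strings are made up.

```python
import re

# Unescaped "." is a metacharacter that matches any character except newline
any_char = re.findall(r"a.c", "abc a.c")       # ['abc', 'a.c']
# "\." matches only a literal dot
literal_dot = re.findall(r"a\.c", "abc a.c")   # ['a.c']

# re.escape() backslash-escapes metacharacters so text is matched literally
needle = "$5.00 (approx)"
found = re.search(re.escape(needle), "Cost: $5.00 (approx) in total") is not None

print(any_char, literal_dot, found)
```

Escaping with `re.escape()` is the safer choice whenever part of a pattern comes from user input.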

Character Classes

```regex
# Predefined character classes
\d    # Any digit (0-9)
\D    # Any non-digit
\w    # Any word character (a-z, A-Z, 0-9, _)
\W    # Any non-word character
\s    # Any whitespace character (space, tab, newline)
\S    # Any non-whitespace character
\n    # Newline character
\t    # Tab character
\r    # Carriage return

# Custom character classes
[abc]          # Matches 'a', 'b', or 'c'
[a-z]          # Matches any lowercase letter
[A-Z]          # Matches any uppercase letter
[0-9]          # Matches any digit
[a-zA-Z]       # Matches any letter
[a-zA-Z0-9]    # Matches any alphanumeric character
[^abc]         # Matches any character except 'a', 'b', or 'c'
[^0-9]         # Matches any non-digit
```
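
A small Python sketch of predefined and custom classes. One flavor detail worth knowing: for `str` patterns, Python's `\d` matches any Unicode decimal digit (for example the Arabic-Indic digit three, U+0663), and the `re.ASCII` flag restricts it to 0-9.

```python
import re

text = "Order #42: 3 apples, 10 pears_x"

digits = re.findall(r"\d+", text)                  # runs of digits
words = re.findall(r"\w+", text)                   # letters, digits and underscore
cleaned = re.sub(r"[^a-zA-Z0-9]", "", "a-b_c!1")   # negated class: drop everything else

# Unicode-aware \d versus ASCII-only \d
unicode_digits = re.findall(r"\d", "3\u0663")
ascii_digits = re.findall(r"\d", "3\u0663", re.ASCII)

print(digits)          # ['42', '3', '10']
print(words)           # ['Order', '42', '3', 'apples', '10', 'pears_x']
print(cleaned)         # abc1
print(ascii_digits)    # ['3']
```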

Quantifiers

Basic Quantifiers

```regex
# Exact repetition
a{3}      # Matches exactly 3 'a's: "aaa"
a{2,5}    # Matches 2 to 5 'a's: "aa", "aaa", "aaaa", "aaaaa"
a{3,}     # Matches 3 or more 'a's: "aaa", "aaaa", etc.
a{,3}     # Matches 0 to 3 'a's: "", "a", "aa", "aaa" (flavor-dependent; a{0,3} is portable)

# Common quantifiers
a*        # Matches 0 or more 'a's (equivalent to a{0,})
a+        # Matches 1 or more 'a's (equivalent to a{1,})
a?        # Matches 0 or 1 'a' (equivalent to a{0,1})
```
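
The repetition counts above are easiest to observe with `re.fullmatch()`, which only succeeds if the entire string matches. A minimal Python sketch with invented inputs:

```python
import re

# fullmatch() must consume the whole string, which makes repetition counts visible
results = {s: re.fullmatch(r"a{2,5}", s) is not None for s in ["a", "aa", "aaaaa", "aaaaaa"]}
print(results)   # {'a': False, 'aa': True, 'aaaaa': True, 'aaaaaa': False}

# Without anchoring, \d{3} takes consecutive groups of exactly three digits
triples = re.findall(r"\d{3}", "12345678")
print(triples)   # ['123', '456']

# ? makes the preceding element optional
colors = re.findall(r"colou?r", "color colour colouur")
print(colors)    # ['color', 'colour']
```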

Greedy vs Lazy Quantifiers

```regex
# Greedy (default) - matches as much as possible
.*         # Matches as many characters as possible
.+         # Matches as many characters as possible (at least 1)
.{2,5}     # Matches as many characters as possible (2-5)

# Lazy (non-greedy) - matches as little as possible
.*?        # Matches as few characters as possible
.+?        # Matches as few characters as possible (at least 1)
.{2,5}?    # Matches as few characters as possible (2-5)

# Example difference
# String: "Hello World"
H.*o       # Greedy: matches "Hello Wo" (to last 'o')
H.*?o      # Lazy: matches "Hello" (to first 'o')
```
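
The `Hello World` example can be checked directly in Python. The HTML-like string is an extra illustrative input (not from the original) showing the classic case where a greedy quantifier overshoots.

```python
import re

text = "Hello World"
greedy = re.search(r"H.*o", text).group()   # runs to the last 'o'
lazy = re.search(r"H.*?o", text).group()    # stops at the first 'o'

# Greedy .+ swallows everything between the first '<' and the last '>'
html = "<b>bold</b> and <i>italic</i>"
greedy_tags = re.findall(r"<.+>", html)
lazy_tags = re.findall(r"<.+?>", html)

print(greedy, "|", lazy)   # Hello Wo | Hello
print(greedy_tags)         # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)           # ['<b>', '</b>', '<i>', '</i>']
```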

Anchors and Boundaries

Position Anchors

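```regex
# String/line boundaries
^     # Start of string (or of each line in multiline mode)
$     # End of string (or of each line in multiline mode)
\A    # Start of string only (never line)
\Z    # End of string (Perl/PCRE/Java/.NET: also before a final newline; Python: absolute end)
\z    # Absolute end of string
# Note: JavaScript supports none of \A, \Z, \z

# Examples
^Hello          # Matches "Hello" at start of string/line
World$          # Matches "World" at end of string/line
^Hello World$   # Matches a string/line containing only "Hello World"
```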
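
A minimal Python sketch of how the multiline flag changes `^` and `$`, while `\A` keeps meaning "start of string". The three-line sample text is invented.

```python
import re

text = "Hello there\nHello again\nsay Hello"

start_of_string = re.findall(r"^Hello", text)                # only the very first line
start_of_lines = re.findall(r"^Hello", text, re.MULTILINE)   # after every newline too
# \A ignores MULTILINE and always means "start of string"
absolute_start = re.findall(r"\AHello", text, re.MULTILINE)
# With MULTILINE, $ matches before each newline as well as at the very end
line_endings = re.findall(r"\w+$", text, re.MULTILINE)

print(start_of_string)   # ['Hello']
print(start_of_lines)    # ['Hello', 'Hello']
print(absolute_start)    # ['Hello']
print(line_endings)      # ['there', 'again', 'Hello']
```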

Word Boundaries

```regex
# Word boundaries
\b    # Word boundary
\B    # Non-word boundary

# Examples
\bcat\b    # Matches "cat" as whole word, not in "category"
\Bcat\B    # Matches "cat" only when not at word boundaries
\bcat      # Matches "cat" at start of word: "cat", "category"
cat\b      # Matches "cat" at end of word: "cat", "tomcat"
```
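
The four boundary examples can be verified against a sample word list. `words_matching` is a hypothetical helper for this sketch; searching word by word gives the same boundary results here because the words are separated by spaces.

```python
import re

TEXT = "cat category tomcat concatenate"

def words_matching(pattern, text=TEXT):
    """Return the space-separated words of `text` in which `pattern` finds a match."""
    return [word for word in text.split() if re.search(pattern, word)]

print(words_matching(r"\bcat\b"))   # ['cat']
print(words_matching(r"\bcat"))     # ['cat', 'category']
print(words_matching(r"cat\b"))     # ['cat', 'tomcat']
print(words_matching(r"\Bcat\B"))   # ['concatenate']
```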

Groups and Captures

Basic Groups

```regex
# Grouping with parentheses
(abc)          # Groups "abc" together
(abc)+         # Matches one or more "abc" sequences
(abc|def)      # Matches either "abc" or "def"
(abc){2,4}     # Matches 2 to 4 "abc" sequences

# Non-capturing groups
(?:abc)          # Groups without capturing
(?:abc|def)+     # Matches sequences of "abc" or "def"
```
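
One practical reason to prefer non-capturing groups shows up in Python's `re.findall()`. When the pattern contains one capturing group, `findall()` returns that group's text (from the last repetition) instead of the whole match. A short sketch with made-up input:

```python
import re

# A quantifier after a group repeats the whole group
repeated = re.fullmatch(r"(abc)+", "abcabcabc") is not None

text = "abcdefabc xyz def"
# One capturing group: findall() returns the group, i.e. the LAST repetition only
captured = re.findall(r"(abc|def)+", text)
# Non-capturing group: findall() returns the full match
whole = re.findall(r"(?:abc|def)+", text)

# A repeated group keeps only its final iteration
last_repetition = re.fullmatch(r"(abc|def)+", "abcdef").group(1)

print(repeated, captured, whole, last_repetition)
# True ['abc', 'def'] ['abcdefabc', 'def'] def
```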

Capturing Groups

```regex
# Numbered captures
(abc)(def)                 # Capture group 1: "abc", group 2: "def"
(\d{4})-(\d{2})-(\d{2})    # Captures date parts: year, month, day

# Named captures
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})   # Named groups (Java, .NET, PCRE, JavaScript)
(?P<name>\w+)                                  # Python-style named group

# Backreferences
(\w+)\s+\1       # Matches repeated words: "hello hello"
(["'])(.*?)\1    # Matches quoted strings with same quote type
```
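
In Python's built-in `re` module, named groups use the `(?P<name>...)` syntax and are read back with `group(name)` or `groupdict()`. The sketch below also exercises both backreference patterns. The sample sentences are invented.

```python
import re

m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "Released 2024-03-15.")
date_parts = m.groupdict()
print(m.group("year"), date_parts)   # 2024 {'year': '2024', 'month': '03', 'day': '15'}

# \1 must repeat exactly what group 1 captured; \b keeps it to whole words
repeated_words = re.findall(r"\b(\w+)\s+\1\b", "this is is a test test")
print(repeated_words)                # ['is', 'test']

# Backreference to the opening quote enforces a matching closing quote
quoted = re.findall(r"([\"'])(.*?)\1", "say \"hi\" and 'bye'")
print(quoted)                        # [('"', 'hi'), ("'", 'bye')]
```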

Lookahead and Lookbehind

Lookahead

```regex
# Positive lookahead
\d+(?=\s*dollars)     # Matches digits followed by "dollars"
\w+(?=@)              # Matches username before @ in email

# Negative lookahead
\d+(?!\d|\s*cents)    # Matches numbers NOT followed by "cents" (\d in the lookahead stops a partial match like "7" in "75 cents")
\w+(?![\w@])          # Matches words NOT followed by @
```
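
A Python sketch of lookahead, including a common pitfall with negative lookahead: when the lookahead fails, the engine backtracks `\d+` to a shorter number and retries, so `\d+(?!\s*cents)` still matches the `7` of `75 cents`. Also forbidding a following digit closes that gap. The sample sentence is invented.

```python
import re

text = "Pay 50 dollars, not 75 cents"

dollars = re.findall(r"\d+(?=\s*dollars)", text)

# Naive negative lookahead: backtracking lets "7" (followed by "5 cents") slip through
naive = re.findall(r"\d+(?!\s*cents)", text)
# Also rejecting a following digit means only complete numbers are tested
fixed = re.findall(r"\d+(?!\d|\s*cents)", text)

print(dollars)   # ['50']
print(naive)     # ['50', '7']
print(fixed)     # ['50']
```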

Lookbehind

```regex
# Positive lookbehind
(?<=\$)\d+       # Matches digits preceded by $
(?<=@)[\w.-]+    # Matches domain after @ in email

# Negative lookbehind
(?<![$\d])\d+    # Matches numbers NOT preceded by $ (excluding \d stops a match from starting mid-number)
(?<![\w@])\w+    # Matches words NOT preceded by @
```
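
The same backtracking pitfall applies to negative lookbehind: `(?<!\$)\d+` cannot start at the `1` of `$100`, but it can start at the following `0`. A minimal Python sketch with an invented sample string. Note that Python's `re` only accepts fixed-width lookbehind patterns.

```python
import re

text = "Total: $100, tax 15, contact: ann@example.com"

after_dollar = re.findall(r"(?<=\$)\d+", text)
domain = re.findall(r"(?<=@)[\w.-]+", text)

# Naive negative lookbehind: the match can begin in the middle of "100"
naive = re.findall(r"(?<!\$)\d+", text)
# Also excluding a preceding digit forces the match to start at a number's first digit
fixed = re.findall(r"(?<![$\d])\d+", text)

print(after_dollar, domain)   # ['100'] ['example.com']
print(naive)                  # ['00', '15']
print(fixed)                  # ['15']
```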

Common Patterns

Email Validation

```regex
# Basic email pattern
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b

# More comprehensive email
^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$

# Simple email validation
^\S+@\S+\.\S+$
```
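
A sketch of the basic email pattern used as a validator in Python. It escapes the dot before the top-level domain as `\.` and uses `[A-Za-z]`, since a `|` inside a character class is a literal pipe. `fullmatch()` replaces the `\b` anchors. `looks_like_email` is a hypothetical helper name, and this is only a shape check, not RFC 5322 validation.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def looks_like_email(candidate):
    """Shape check only: it cannot confirm the address exists or is deliverable."""
    return EMAIL.fullmatch(candidate) is not None

for candidate in ["user@example.com", "first.last+tag@mail.co.uk", "no-at-sign.com", "user@host"]:
    print(candidate, looks_like_email(candidate))
# user@example.com True
# first.last+tag@mail.co.uk True
# no-at-sign.com False
# user@host False
```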

Phone Numbers

```regex
# US phone numbers
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}                 # (123) 456-7890 or 123-456-7890
^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$   # With optional country code

# International format
^\+?[1-9]\d{1,14}$    # E.164 format
```
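
A quick Python check of the US and E.164 patterns against sample numbers (invented for illustration). It also exposes a limitation worth knowing: `\(?` and `\)?` are independent, so unbalanced parentheses are accepted.

```python
import re

US_PHONE = re.compile(r"^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$")
E164 = re.compile(r"^\+?[1-9]\d{1,14}$")

for number in ["(123) 456-7890", "123-456-7890", "+1 123.456.7890", "12-3456-789"]:
    print(number, bool(US_PHONE.match(number)))   # True, True, True, False

# Caveat: the optional parentheses are not paired
print(bool(US_PHONE.match("(123 456-7890")))      # True

print(bool(E164.match("+14155552671")))           # True
print(bool(E164.match("+0123")))                  # False: cannot start with 0
```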

URLs

```regex
# Basic URL pattern
https?://[^\s]+

# More comprehensive URL
^https?://(?:[-\w.])+(?::[0-9]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:[\w.])*)?)?$

# Domain validation
^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$
```
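
When the basic URL pattern is used to extract links from prose, it also grabs trailing punctuation, because `[^\s]+` only stops at whitespace. This Python sketch shows the effect, one simple clean-up (stripping `.` and `,`, an assumption that suits this sample), and the domain pattern on two invented inputs.

```python
import re

text = "Docs at https://example.com/guide?id=7, mirror: http://mirror.example.org."

urls = re.findall(r"https?://[^\s]+", text)
cleaned = [url.rstrip(".,") for url in urls]
print(urls)      # ['https://example.com/guide?id=7,', 'http://mirror.example.org.']
print(cleaned)   # ['https://example.com/guide?id=7', 'http://mirror.example.org']

DOMAIN = re.compile(r"^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$")
valid_domain = bool(DOMAIN.match("sub.example.com"))     # True
leading_hyphen = bool(DOMAIN.match("-bad.example.com"))  # False: labels cannot start with "-"
```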

Dates

```regex
# MM/DD/YYYY format
^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/\d{4}$

# YYYY-MM-DD format (ISO 8601)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

# Flexible date formats
\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b
```
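
The date patterns check shape only: the ISO pattern happily accepts `2023-02-30`. A minimal Python sketch that pairs the regex with the standard library's `date.fromisoformat()` to reject impossible days. `is_valid_iso_date` is a hypothetical helper name.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_valid_iso_date(s):
    """Regex checks the shape; date.fromisoformat() rejects days that do not exist."""
    if not ISO_DATE.match(s):
        return False
    try:
        date.fromisoformat(s)
    except ValueError:
        return False
    return True

print(bool(ISO_DATE.match("2023-02-30")))   # True: the regex alone accepts it
print(is_valid_iso_date("2023-02-30"))      # False
print(is_valid_iso_date("2024-02-29"))      # True (leap year)
```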

Credit Card Numbers

```regex
# Visa (starts with 4, 13 or 16 digits)
^4\d{12}(?:\d{3})?$

# MasterCard (starts with 51-55, 16 digits)
^5[1-5]\d{14}$

# American Express (starts with 34 or 37, 15 digits)
^3[47]\d{13}$

# General credit card (16 digits with optional spaces/dashes)
^\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}$
```
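
Because the brand patterns expect bare digits, a common approach is to strip spaces and dashes first. This Python sketch does that and then tries each pattern in turn. `CARD_PATTERNS` and `card_type` are hypothetical names, and the check covers format only, not whether a number is real.

```python
import re

CARD_PATTERNS = {
    "Visa": r"^4\d{12}(?:\d{3})?$",
    "MasterCard": r"^5[1-5]\d{14}$",
    "American Express": r"^3[47]\d{13}$",
}

def card_type(number):
    """Strip spaces/dashes, then return the first brand whose pattern matches, or None."""
    digits = re.sub(r"[-\s]", "", number)
    for brand, pattern in CARD_PATTERNS.items():
        if re.match(pattern, digits):
            return brand
    return None

print(card_type("4111 1111 1111 1111"))  # Visa
print(card_type("3782-822463-10005"))    # American Express
print(card_type("1234 5678 9012 3456"))  # None
```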

IP Addresses

```regex
# IPv4 address
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

# IPv6 address (simplified)
^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$

# IPv4 or IPv6
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$|^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$
```
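
A Python check of the IPv4 pattern's octet ranges, plus a demonstration of why the IPv6 pattern is labeled "simplified": it requires all eight groups, so compressed `::` forms fail. Python's standard `ipaddress` module parses both families and is an alternative when exact validation matters.

```python
import ipaddress
import re

IPV4 = re.compile(r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$")
IPV6_SIMPLE = re.compile(r"^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$")

ipv4_results = {ip: bool(IPV4.match(ip)) for ip in ["192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3"]}
print(ipv4_results)
# {'192.168.0.1': True, '255.255.255.255': True, '256.1.1.1': False, '1.2.3': False}

full_form = bool(IPV6_SIMPLE.match("2001:0db8:0000:0000:0000:ff00:0042:8329"))  # True
compressed = bool(IPV6_SIMPLE.match("2001:db8::1"))                              # False
parsed_version = ipaddress.ip_address("2001:db8::1").version                     # 6
```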

Language-Specific Examples

JavaScript

```javascript
// Basic regex usage
const pattern = /hello/i; // Case insensitive
const text = "Hello World";
console.log(pattern.test(text)); // true

// String methods with regex
text.match(/\w+/g);           // ["Hello", "World"]
text.replace(/hello/i, "Hi"); // "Hi World"
text.split(/\s+/);            // ["Hello", "World"]

// Constructor syntax
const regex = new RegExp("hello", "i");
```

Python

```python
import re

# Basic usage
pattern = r'hello'
text = "Hello World"
match = re.search(pattern, text, re.IGNORECASE)

# Common methods
re.findall(r'\w+', text)                            # ['Hello', 'World']
re.sub(r'hello', 'Hi', text, flags=re.IGNORECASE)   # "Hi World"
re.split(r'\s+', text)                              # ['Hello', 'World']

# Compiled patterns (more efficient for repeated use)
pattern = re.compile(r'\d+')
pattern.findall("123 and 456")                      # ['123', '456']
```
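
Besides `search()`, Python's `re` distinguishes `match()` (tries only position 0), `fullmatch()` (the entire string must match), and `finditer()` (yields match objects with positions). A short sketch on an invented string:

```python
import re

text = "id=42; id=7"

at_start = re.match(r"\d+", text)                        # None: text does not start with a digit
first_number = re.search(r"\d+", text).group()           # '42': search() scans the whole string
whole = re.fullmatch(r"id=\d+", "id=42") is not None     # True: entire string must match

# finditer() yields match objects, so groups and positions are both available
pairs = [(m.group(1), m.span()) for m in re.finditer(r"id=(\d+)", text)]

print(at_start, first_number, whole)   # None 42 True
print(pairs)                           # [('42', (0, 5)), ('7', (7, 11))]
```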

Java

```java
import java.util.regex.*;

public class RegexDemo {
    public static void main(String[] args) {
        // Basic usage
        String pattern = "hello";
        String text = "Hello World";
        // Pattern.matches() must match the ENTIRE input, so this is false
        boolean matches = Pattern.matches("(?i)" + pattern, text);

        // Pattern and Matcher objects (backslashes are doubled in Java string literals)
        Pattern p = Pattern.compile("\\w+");
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group()); // Prints each word
        }

        // String methods
        String replaced = text.replaceAll("(?i)hello", "Hi"); // "Hi World"
        String[] words = text.split("\\s+");                   // ["Hello", "World"]
    }
}
```

PHP

```php
// Basic usage
$pattern = '/hello/i';
$text = "Hello World";
$found = preg_match($pattern, $text); // 1 if found, 0 if not

// Common functions
preg_match_all('/\w+/', $text, $matches); // Find all words
preg_replace('/hello/i', 'Hi', $text);    // "Hi World"
preg_split('/\s+/', $text);               // ["Hello", "World"]

// With capture groups
preg_match('/(\w+)\s+(\w+)/', $text, $matches);
// $matches[1] = "Hello", $matches[2] = "World"
```

Flags and Modifiers

Common Flags

```regex
# Case insensitive
/pattern/i       # JavaScript
(?i)pattern      # Inline flag
re.IGNORECASE    # Python

# Global (find all matches)
/pattern/g       # JavaScript
re.findall()     # Python (no g flag; use findall() or finditer())

# Multiline (^ and $ match at line breaks)
/pattern/m       # JavaScript
re.MULTILINE     # Python

# Dot matches newline
/pattern/s       # JavaScript
re.DOTALL        # Python

# Extended (ignore whitespace, allow comments)
/pattern/x       # Some languages
re.VERBOSE       # Python
```
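
A Python sketch of `re.DOTALL` and `re.VERBOSE`, and of combining flags with `|`. The phone-style pattern in the verbose example is only an illustration.

```python
import re

text = "first line\nsecond line"

single_line = re.findall(r"first.*line", text)          # "." stops at the newline
dotall = re.findall(r"first.*line", text, re.DOTALL)    # "." also matches the newline

# re.VERBOSE ignores whitespace in the pattern and allows # comments
phone = re.compile(r"""
    (\d{3})    # first three digits
    [-.\s]?    # optional separator
    (\d{4})    # last four digits
""", re.VERBOSE)
groups = phone.search("call 555-1234").groups()

# Flags combine with the | operator
combined = re.findall(r"^SECOND", text, re.MULTILINE | re.IGNORECASE)

print(single_line)   # ['first line']
print(dotall)        # ['first line\nsecond line']
print(groups)        # ('555', '1234')
print(combined)      # ['second']
```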

Performance Tips

Optimization Strategies

```regex
# Use specific character classes instead of .
\d+          # Better than .+ for digits
[a-zA-Z]+    # Better than .+ for letters

# Anchor patterns when possible
^pattern     # Faster when pattern should be at start
pattern$     # Faster when pattern should be at end

# Use non-capturing groups when you don't need the capture
(?:abc)+     # Better than (abc)+ if you don't need the group

# Avoid catastrophic backtracking
(a+)+b       # Dangerous pattern
a+b          # Better alternative

# Use atomic groups or possessive quantifiers
(?>a+)b      # Atomic group (some languages)
a++b         # Possessive quantifier (some languages)
```
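
Catastrophic backtracking is easy to observe with Python's `re`, a backtracking engine. On a string of `a`s with no `b`, the nested `(a+)+b` tries every way of splitting the run before failing, so its time roughly doubles with each extra character, while `a+b` fails almost immediately. Timings depend on the machine, so none are shown. `time_search` is a hypothetical helper. Python 3.11 and later also accept the atomic `(?>a+)b` and possessive `a++b` forms listed above.

```python
import re
import time

def time_search(pattern, text):
    """Return the wall-clock seconds taken by one re.search() call."""
    start = time.perf_counter()
    re.search(pattern, text)
    return time.perf_counter() - start

timings = []
for n in (14, 16, 18):
    subject = "a" * n + "c"   # no 'b' anywhere, so every attempt must fail
    slow = time_search(r"(a+)+b", subject)
    fast = time_search(r"a+b", subject)
    timings.append((n, slow, fast))
    print(f"n={n}: (a+)+b took {slow:.4f}s, a+b took {fast:.6f}s")
```

Keep `n` small when experimenting: each additional character roughly doubles the work of the nested pattern.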

Common Pitfalls

```regex
# Greedy quantifiers can be slow
.*expensive     # Can be slow on long strings
.*?expensive    # Often faster (lazy)

# Alternation order matters
cat|catch       # "cat" will match first part of "catch"
catch|cat       # Better: longer alternative first

# Escape special characters
\.              # Literal dot
\$              # Literal dollar sign
\(              # Literal parenthesis
```
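
A Python sketch of the alternation-order and escaping pitfalls. Alternatives are tried left to right and the first one that succeeds wins, so `cat|catch` never produces `catch`. Word boundaries fix the problem regardless of order. The sample strings are invented.

```python
import re

text = "catch the cat"

cat_first = re.findall(r"cat|catch", text)        # "cat" wins inside "catch"
catch_first = re.findall(r"catch|cat", text)      # longer alternative tried first
bounded = re.findall(r"\b(?:cat|catch)\b", text)  # boundaries force whole words

# Escaped metacharacters match literally
prices = re.findall(r"\$\d+\.\d{2}", "Total $12.50 (was $15.00)")

print(cat_first)     # ['cat', 'cat']
print(catch_first)   # ['catch', 'cat']
print(bounded)       # ['catch', 'cat']
print(prices)        # ['$12.50', '$15.00']
```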

Testing and Debugging

Online Tools

regex101.com - Interactive regex tester with explanations
regexr.com - Visual regex builder and tester
regexpal.com - Simple regex testing tool
regexper.com - Visual regex diagrams

Testing Strategies

```regex
# Start simple and build complexity
\d           # Start with basic digit matching
\d+          # Add quantifier
\d{2,4}      # Add specific range
^\d{2,4}$    # Add anchors

# Test edge cases
""             # Empty string
"a"            # Single character
"aaa...aaa"    # Very long strings
"special!@#"   # Special characters
```

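The edge-case checklist above can be turned into a small table-driven test. `failing_inputs` is a hypothetical helper for this sketch, and the "2 to 4 digits" rule is just the running example. The last two lines show a Python-specific edge case: `$` also matches just before a trailing newline, while `fullmatch()` does not allow it.

```python
import re

def failing_inputs(pattern, cases):
    """Return the inputs whose fullmatch() result differs from the expected boolean."""
    return [text for text, expected in cases
            if (re.fullmatch(pattern, text) is not None) != expected]

# Edge cases for a "2 to 4 digits" rule
cases = [
    ("", False),          # empty string
    ("7", False),         # single character
    ("42", True),
    ("2024", True),
    ("12345", False),     # one digit too many
    ("12!@", False),      # special characters
    ("9" * 1000, False),  # very long string
]

print(failing_inputs(r"\d{2,4}", cases))    # []
print(len(failing_inputs(r"\d+", cases)))   # 3: "7", "12345" and the long string slip through

print(re.search(r"^\d{2,4}$", "42\n") is not None)   # True
print(re.fullmatch(r"\d{2,4}", "42\n") is not None)  # False
```
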
This comprehensive regex guide covers the essential patterns and techniques needed for effective text processing across programming languages and tools. Practice with real-world examples to master these powerful pattern-matching capabilities.