An Intro to Regex: The Hacker’s Guide to Pattern Matching
University’s over. YAAY. Another semester is ending, with a VERY exciting new one starting in September (you can read more about that here). Which means I now have the time to come back to regular, educational posts and produce a lot more than in the recent weeks of.. uni struggle.
Let’s start this one off with an intro to Regular Expressions, where I’ll solve HackerRank challenges, explain how to approach the problems and, along the way, teach you some regex.
I’m completely new to regex myself: I’ve heard the fundamentals, but applying them..? ugh.
What is a Regular expression?
It is a sequence of characters that defines a search pattern, mostly used for searching in strings.
Say you want to extract wikipedia from https://en.wikipedia.org/ - of course not only from that single string, but from every subdomain you find, no matter how deeply nested it gets?
Use Regex for that!
For reference, this overview seems quite nice in explaining the fundamentals.
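To make the wikipedia idea concrete, here is a minimal Python sketch. The pattern and the helper name domain_name are my own illustration, not an official recipe, and it assumes a plain single-label TLD like .org (something like .co.uk would trip it up). It also uses a few constructs that only get explained later in this post.

```python
import re
from typing import Optional

# Assumed pattern (illustrative only): scheme, any number of "label." subdomains,
# then capture the label sitting right before the final TLD.
DOMAIN = re.compile(r"^https?://(?:[\w-]+\.)*([\w-]+)\.[a-z]{2,}(?:[/:]|$)")


def domain_name(url: str) -> Optional[str]:
    """Return the label before the TLD, or None if the URL does not fit."""
    match = DOMAIN.match(url)
    return match.group(1) if match else None


print(domain_name("https://en.wikipedia.org/"))
print(domain_name("https://a.b.c.wikipedia.org/wiki/Regex"))
```

Both calls should come back with wikipedia, no matter how many subdomains are stacked in front.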
The single dot .
. matches any character except newline - so ... means “just any three characters”.
Meanwhile, if we want to match a real dot, as in something like AC.BDI, we’d need to escape it: \. - so ..\.... would be the pattern.. WOULD IT? Not quite: unanchored, it also matches inside longer strings. ..\....$ is safer - with $, you ensure that it is really the end of the string, just as ^ marks the definitive start of the string.
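A quick way to see the difference the anchors make, sketched in Python (the sample strings are made up for illustration):

```python
import re

unanchored = r"..\...."  # two characters, a literal dot, three characters
anchored = r"^..\....$"  # same shape, pinned to the start and end of the string

print(bool(re.search(unanchored, "AC.BDI")))      # True: fits the shape
print(bool(re.search(unanchored, "xxAC.BDIxx")))  # True: found inside a longer string
print(bool(re.search(anchored, "xxAC.BDIxx")))    # False: anchors reject the extras
print(bool(re.search(anchored, "ACxBDI")))        # False: third character is not a dot
```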
Matching digits - \d and \D
\d matches any digit \(d \in \{0, 1, \dots, 9\}\), while \D is quite the opposite, matching anything but digits.
So if we have something along the lines of ndnndnn, n representing a non-digit and d a digit, we could match that using ^\D\d\D\D\d\D\D$.
Sometimes the HackerRank challenges are not that clear about whether something is allowed to come after a matched pattern.. if you have trouble, try the same pattern with and without the ending $.
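Here is a small Python sketch of that “with and without $” experiment. The test strings are my own, not taken from HackerRank:

```python
import re

strict = r"^\D\d\D\D\d\D\D$"  # nothing may follow the 7 characters
loose = r"^\D\d\D\D\d\D\D"    # only the start is pinned

for text in ["a1bc2de", "a1bc2de!!!", "11bc2de"]:
    print(text, bool(re.search(strict, text)), bool(re.search(loose, text)))
```

The first string passes both patterns, the second only passes the loose one, and the third fails both because it starts with a digit.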
Matching whitespace characters - \s and \S
Types of whitespace characters
There are multiple whitespace characters in (more or less) use:
- Space ( )
- Tab (\t)
- Newline (\n)
Also, “space-like things,” just older, from typewriters and old printers:
- Carriage return (\r): from typewriters, moving the print head to the start of the line, but not down.
- Form feed (\f): from old printers, telling them to advance to the next page (basically printer page break)
\s in regex
To match all of those, we would use [ \r\n\t\f] (see the space at the beginning? sneaky).
To shorten this, there is \s, which is equivalent to [ \r\n\t\f] (many engines also throw in a few extras, like the vertical tab \v).
\S in regex
As before, \S is just the opposite to this, matching anything but white-space characters.
Example
So, if we were searching for a string like " Hi there. " (leading space, one space in the middle, trailing space), we’d do it like ^\s\S\S\s\S\S\S\S\S\.\s$, or shorter (many non-whitespace characters next to each other means we can use a repetition count): ^\s\S{2}\s\S{5}\.\s$.
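To check that the long and the short pattern really behave the same, here is a small Python sketch. The flexible variant with \s+ is my own addition for the case where more than one space sits in the middle:

```python
import re

long_form = r"^\s\S\S\s\S\S\S\S\S\.\s$"
short_form = r"^\s\S{2}\s\S{5}\.\s$"
flexible = r"^\s\S{2}\s+\S{5}\.\s$"  # assumption: allow a run of whitespace in the middle

single = " Hi there. "     # exactly one space between the words
spaced = " Hi    there. "  # four spaces between the words

for name, pattern in [("long", long_form), ("short", short_form), ("flexible", flexible)]:
    print(name, bool(re.search(pattern, single)), bool(re.search(pattern, spaced)))
```

The long and short forms accept only the single-spaced string, while the flexible one accepts both.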
Regex Abbreviations (Repetition Quantifiers)
Ever get tired of typing out z a thousand times in a row just to match a long run of zs? Regex has your back - and it’s got some super compact ways to say “repeat this thing a few times”.
This is your cheat-sheet for all those curly-brace {} expressions that make regex feel like you’re negotiating with a robot.
#1 - Exact Matches: {n}
Syntax: z{3}
Translation: Match exactly 3 zs
Matches: zzz
Does NOT Match: zz, zzzz
💬 Think:
“Give me precisely 3. Not 2. Not 4. Just 3.”
#2 - A Range: {min,max}
Syntax: z{3,6}
Translation: Match between 3 and 6 zs
Matches: zzz, zzzz, zzzzz, zzzzzz
Does NOT Match: zz, zzzzzzz
💬 Think:
“Let’s be flexible. I want between 3 and 6 of these.”
🧪 Useful when the input isn’t fixed but has bounds - like matching username lengths, padding, or predictable fuzzing targets.
#3 - “At Least…”: {min,}
Syntax: z{3,}
Translation: Match 3 or more zs
Matches: zzz, zzzz, zzzzzzzzzzzzzzzz
Does NOT Match: zz, z
💬 Think:
“Give me 3, or go wild. No upper limit.”
🔥 CTF use-case: pattern detection in encoded blobs, overflow inputs, brutish log formats.
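The three forms so far can be compared side by side with a short Python sketch. The helper name whole is hypothetical, and it uses re.fullmatch so that the entire string has to fit the pattern:

```python
import re


def whole(pattern: str, text: str) -> bool:
    """True if the pattern matches the entire text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# Columns: number of zs, z{3}, z{3,6}, z{3,}
for n in range(1, 9):
    text = "z" * n
    print(n, whole(r"z{3}", text), whole(r"z{3,6}", text), whole(r"z{3,}", text))
```

Only the row for 3 is True everywhere; z{3,6} stops accepting after 6, and z{3,} keeps going.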
#4 - The Ghost of {,max}
Yes, {,max} technically exists in some engines (like Python’s re), meaning “0 to max times”.
But plenty of other engines don’t like it: JavaScript, for example, treats z{,3} as literal text rather than a quantifier. Instead:
- Use z{0,3} (✅)
- Avoid z{,3} (❌ not portable)
💬 Think:
“Just say what you mean. Regex hates ambiguity.”
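Python’s re module is one engine where the short form is accepted, so it makes a handy place to confirm that both spellings mean the same thing there. A minimal sketch; do not expect the short form to carry over to other engines:

```python
import re

for text in ["", "zz", "zzz", "zzzz"]:
    short = re.fullmatch(r"z{,3}", text) is not None      # Python reads this as 0 to 3
    explicit = re.fullmatch(r"z{0,3}", text) is not None  # the portable spelling
    print(repr(text), short, explicit)
```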
📌 Bonus: Greedy vs Lazy
All {} quantifiers are greedy by default - they’ll match as much as they can.
Greedy:
z{3,6}
Matches the longest sequence (up to 6).
Lazy:
z{3,6}?
Matches the shortest valid one (starts with 3).
💬 Greedy: “I want it ALL!” 💬 Lazy: “I’ll take just enough to get by.”
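A tiny Python sketch of greedy versus lazy on a run of eight zs. The last line is my own extra, showing that a lazy quantifier still grows when the rest of the pattern forces it to:

```python
import re

text = "zzzzzzzz"  # eight zs

greedy = re.search(r"z{3,6}", text).group()
lazy = re.search(r"z{3,6}?", text).group()
forced = re.search(r"z{3,6}?!", "zzzzz!").group()

print(len(greedy))  # 6: takes as many as allowed
print(len(lazy))    # 3: stops at the minimum
print(forced)       # zzzzz!: lazy, but it has to reach the !
```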
🔐 TL;DR Cheat Table
| Pattern | Meaning | Matches |
|---|---|---|
| z{3} | Exactly 3 zs | zzz |
| z{3,6} | Between 3 and 6 zs | zzz to zzzzzz |
| z{3,} | 3 or more zs | zzz, zzzz, zzzzz... |
| z{,6} | (Not portable) 0 to 6 zs | ❌ use z{0,6} instead |
| z{3,6}? | Lazy: match 3–6, prefer shortest | zzz if possible |
Hacker Use-Cases
- Input sanitization bypasses: e.g., find oversized padding with a{100,}
- Pattern obfuscation: e.g., match irregular base64 padding with =+
- Buffer overflow fuzzing: simulate payload structure like A{256}
Remember, in regex, brevity is power - but clarity wins flags.
Curly braces aren’t just stylish - they’re your repeat offenders.
🎭 Regex Optionalities & Control Tricks
Regex isn’t just about matching letters and numbers : it’s about control. Whether you’re parsing log files, reversing obfuscation, or crafting payloads, these tools give you precision over what, how much, and when to match.
This section covers:
- | (OR)
- [] (character sets)
- ?, +, * (quantifiers)
- () (groups)
- ()? (optional groups)
- *?, +? (lazy versions)
| : The OR Operator
Syntax:
foo|bar
Meaning: Match foo OR bar.
💬 Think:
“Give me one thing or the other : no middle ground.”
Example:
admin|root
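✅ Matches: admin, root
❌ Does not match as a whole: administrator (careful: an unanchored search still finds admin inside it, so anchor with ^(admin|root)$ if you need the whole string)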
🛠 Useful in:
- Matching multiple HTTP methods: GET|POST|DELETE
- Handling payload variations: alert|confirm|prompt
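The difference between “found somewhere inside” and “matches the whole string” is easy to check in Python. A minimal sketch with made-up inputs:

```python
import re

roles = r"admin|root"

inside = re.search(roles, "administrator")   # searches anywhere in the string
whole = re.fullmatch(roles, "administrator")  # the entire string has to fit

print(inside.group())  # admin
print(whole)           # None
print(bool(re.fullmatch(r"GET|POST|DELETE", "POST")))  # True
```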
[] : Character Sets (Character Classes)
Syntax:
[abc]
Meaning: Match exactly one character that is either a, b, or c.
💬 Think:
“I’ll take any one of these options.”
Examples:
- [a-z] → any lowercase letter
- [0-9] → any digit (same as \d)
- [aeiou] → any vowel
- [^x] → any character except x (note the ^)
Useful in: username checks, filters, binary patterns, input validation
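A few character sets in action, as a short Python sketch (the sample strings are arbitrary):

```python
import re

vowels = re.findall(r"[aeiou]", "regex rules")   # every single vowel
not_x = re.findall(r"[^x]", "xaxbx")             # every character that is not x
letter_digit = re.fullmatch(r"[a-z][0-9]", "q7")  # one lowercase letter, then one digit

print(vowels)              # ['e', 'e', 'u', 'e']
print(not_x)               # ['a', 'b']
print(bool(letter_digit))  # True
```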
? : Zero or One (Optional)
Syntax:
colou?r
Meaning: Match color or colour (the u is optional)
💬 Think:
“Maybe it’s there, maybe it’s not.”
Matches:
- color
- colour
Common in:
- Optional elements in logs: https? matches both http and https
- Weak password detection: [A-Z]? → maybe one capital letter
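Both optional patterns can be tried out with a few lines of Python. The example hosts are placeholders:

```python
import re

for word in ["color", "colour", "colouur"]:
    print(word, bool(re.fullmatch(r"colou?r", word)))

schemes = re.findall(r"https?", "http://a.example https://b.example")
print(schemes)  # ['http', 'https']
```

colouur fails because ? allows at most one u.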
+ : One or More (Mandatory)
Syntax:
a+
Meaning: Match one or more as.
✅ Matches: a, aa, aaaaa
❌ Does not match: (empty string)
💬 Think:
“There must be at least one.”
* : Zero or More
Syntax:
a*
Meaning: Match zero or more as.
✅ Matches: (empty string), a, aaaa
❌ Never fails on its own: with nothing to grab, a* just matches the empty string
💬 Think:
“Take as many as you’ve got. Or none.”
Example:
.*   → match anything (used in greedy matching)
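The “at least one” versus “or none” difference between + and * shows up as soon as there is nothing to match. A minimal Python sketch:

```python
import re

plus_on_empty = re.fullmatch(r"a+", "")   # needs at least one a
star_on_empty = re.fullmatch(r"a*", "")   # zero as is fine
star_in_bbb = re.search(r"a*", "bbb")     # succeeds with an empty match
plus_in_bbb = re.search(r"a+", "bbb")     # no a anywhere, so no match
everything = re.search(r".*", "anything at all").group()

print(plus_on_empty)              # None
print(star_on_empty is not None)  # True
print(repr(star_in_bbb.group()))  # ''
print(plus_in_bbb)                # None
print(everything)                 # anything at all
```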
() : Grouping
Syntax:
(foo|bar)+
Meaning: Treat foo|bar as a single unit.
💬 Think:
“Group this together, apply stuff to it.”
Use cases:
- Applying +, *, ? to full patterns
- Capturing sub-patterns for extraction
()? : Optional Group
Syntax:
(auth(user)?)?
Meaning:
- user is optional
- auth is optional
- Entire pattern optional
✅ Matches: auth, authuser, or nothing at all (auth is optional; user, however, sits inside the auth(...) group, so it can only appear after auth, if at all.)
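Here is a short Python sketch of what the groups actually capture. It uses fullmatch so that a lone user cannot sneak through as a partial match:

```python
import re

pattern = re.compile(r"(auth(user)?)?")

for text in ["auth", "authuser", "", "user"]:
    m = pattern.fullmatch(text)
    # groups() returns (outer group, inner group); None means "did not take part"
    print(repr(text), m.groups() if m else None)

repeated = re.fullmatch(r"(foo|bar)+", "foobarfoo")
print(repeated.group(1))  # foo: a repeated group keeps its last repetition
```

For auth the inner group is None, for the empty string both are None, and user alone does not match at all.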
Greedy vs Lazy (again!)
- +, *, ? → greedy by default (grab as much as possible)
- Add a trailing ? → makes it lazy
Examples:
a+     → grabs all the a's
a+?    → grabs the shortest match (just one)
.*?    → "anything, but as little as needed"
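The classic way to feel the difference is a pair of angle brackets. A minimal Python sketch, with an HTML-ish sample string of my own:

```python
import re

html = "<b>bold</b>"

all_as = re.search(r"a+", "aaaa").group()
one_a = re.search(r"a+?", "aaaa").group()
greedy_tag = re.search(r"<.*>", html).group()
lazy_tag = re.search(r"<.*?>", html).group()

print(all_as)      # aaaa
print(one_a)       # a
print(greedy_tag)  # <b>bold</b>
print(lazy_tag)    # <b>
```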
Use those for..
- Log parsing: Match optional IPs, headers, usernames
- Payload crafting: Optional XSS vectors, dynamic spacing
- Regex fuzzing: Control what parts of input mutate or repeat
Mastering optionality = mastering pattern precision.
No more, no less : just what the input demands.
Real example
Ask ChatGPT a question, maybe to produce a section for your notes. ChatGPT loves its little --- breaks. What if you didn’t want those? And the line break that comes before them EVERY TIME??
Unix version
\n--- - just match and remove!
Windows version
\r? → optionally matches carriage return for Windows, so: \r?\n---.
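In Python, the “match and remove” step could look like this. The function name strip_rules and the sample text are my own, and the sketch assumes the rule is exactly three dashes (a longer run like ---- would leave a dash behind):

```python
import re


def strip_rules(text: str) -> str:
    """Drop every --- together with the line break in front of it."""
    return re.sub(r"\r?\n---", "", text)


unix_text = "Intro\n---\nBody"
windows_text = "Intro\r\n---\r\nBody"

print(repr(strip_rules(unix_text)))     # 'Intro\nBody'
print(repr(strip_rules(windows_text)))  # 'Intro\r\nBody'
```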
Matching alphanumeric - \w and \W
\w - matching any word character
\w matches
- a-z
- A-Z
- 0-9
- _ (underscore)
\W - again, just the opposite
As usual, \W matches everything except what \w matches.
Example
Match: wwNNwww, w marking word and N non-word characters.
Solution
^\w{2}\W{2}\w{3}
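To try the solution on a few inputs, here is a quick Python sketch. The test strings are invented, and note that without a trailing $ anything may follow the seven characters:

```python
import re

pattern = r"^\w{2}\W{2}\w{3}"

for text in ["ab!!cde", "a_--b2c", "ab!cdef", "ab!!cde and more"]:
    print(text, bool(re.search(pattern, text)))
```

Only ab!cdef fails, since it has just one non-word character in the middle.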
Challenge
Match dwwwww. - nothing may come after the ., with d marking digits and w marking word characters.
Text me if you need the solution! Love you guys, have a nice one.