Niklas Heringer - Cybersecurity & Math

An Intro to Regex: The Hacker’s Guide to Pattern Matching

University’s over. YAAY. Another semester is ending, with a VERY exciting new one starting in September (you can read more about that here). Which means I now have the time to come back to regular, educational posts, producing a lot more than in the recent weeks of.. uni struggle.

Let’s start this one off with an intro to Regular Expressions, where I’ll solve HackerRank challenges, try to explain how to approach problems and, along the way, try to teach you some regex.

I’m completely new to regex myself: the fundamentals I’ve heard, but applying them..? ugh.

What is a Regular expression?

It is a series of characters defining a search pattern, mainly string search patterns.

Say you want to extract wikipedia from https://en.wikipedia.org/ - of course not only from that single string, but from every URL you find, no matter how deeply nested its subdomains become?

Use Regex for that!
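To make that a bit more concrete, here is a rough sketch in Python using the standard `re` module. The pattern is just one possible way to do it, not the definitive one: `extract_site_name` is a hypothetical helper, the pattern uses grouping and quantifiers that only get explained further down, and it assumes a single-label ending like `.org` (so something like `.co.uk` would trip it up).

```python
import re

# Assumed pattern: scheme, any number of "label." subdomains,
# then the captured site name, then a single-label ending such as "org".
SITE_NAME = re.compile(r"^https?://(?:[\w-]+\.)*([\w-]+)\.[a-z]+(?:/|$)")

def extract_site_name(url):
    """Return the label right before the ending, or None if nothing matches."""
    match = SITE_NAME.match(url)
    return match.group(1) if match else None

print(extract_site_name("https://en.wikipedia.org/"))
print(extract_site_name("https://a.b.c.wikipedia.org/wiki/Regex"))
```

Both calls should print wikipedia, no matter how many subdomains sit in front of it.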

For reference, this overview seems quite nice in explaining the fundamentals.


The single dot .

. matches any character except newline - ... means “just any three characters”.

Meanwhile, if we want to match a real dot, as in something like AC.BDI, we’d need to escape that . - ..\.... would be the pattern.. WOULD IT? Not quite: on its own it also matches inside longer strings. With $ you ensure that the match really ends at the end of the string, and with ^ that it starts at the very start, so the full pattern is ^..\....$.
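A quick way to see the difference the anchors make is to try both versions in Python’s `re` module. This is only a small sketch, and the sample strings are made up for illustration.

```python
import re

# Two arbitrary characters, a literal dot, three arbitrary characters.
unanchored = re.compile(r"..\....")
anchored = re.compile(r"^..\....$")

for text in ["AC.BDI", "ACxBDI", "xxAC.BDIxx"]:
    # search() looks for the pattern anywhere in the text
    print(text, bool(unanchored.search(text)), bool(anchored.search(text)))
```

The last sample is the interesting one: the unanchored pattern happily finds a match in the middle, while the anchored one rejects it.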


Matching digits - \d and \D

\d matches any digit \(d \in \{0, 1, \dots, 9\}\), while \D is quite the opposite, matching anything but digits.

So if we have something along the lines of ndnndnn, n representing a non-digit and d a digit, we could match that using ^\D\d\D\D\d\D\D$.

Sometimes the HackerRank challenges aren’t that clear about whether something is allowed to come after a matched pattern.. if you have trouble, try the same pattern with and without the ending $.
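Here is a small Python sketch of exactly that experiment, using the ndnndnn pattern from above with and without the trailing $. The sample strings are my own, not taken from HackerRank.

```python
import re

with_end = r"^\D\d\D\D\d\D\D$"
without_end = r"^\D\d\D\D\d\D\D"

for sample in ["a1bc2de", "a1bc2de99", "11bc2de"]:
    print(
        sample,
        bool(re.search(with_end, sample)),     # nothing may follow
        bool(re.search(without_end, sample)),  # trailing text is fine
    )
```

Only the middle sample behaves differently between the two patterns, because it has extra characters after the part that matches.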


Matching whitespace characters - \s and \S.

Types of whitespace characters

There are multiple whitespace characters in (more or less) use: the plain space, the tab \t and the newline \n.

Also, “space-like things,” just older, from typewriters and old printers: the carriage return \r and the form feed \f.

\s in regex

To match all of those, we would use [ \r\n\t\f] (see the space at the beginning? sneaky). To shorten this, \s \( = \) [ \r\n\t\f].

\S in regex

As before, \S is just the opposite to this, matching anything but white-space characters.

Example

So, if we were searching for a string like " Hi there. ", we’d do it like ^\s\S\S\s\S\S\S\S\S\.\s$, or shorter (many non-whitespace next to each other means we can shorten): ^\s\S{2}\s\S{5}\.\s$.
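To double-check that the long and the short form really do the same thing, a minimal Python sketch could look like this. The tab/newline variant is an extra sample of mine, added to show that \s is not limited to plain spaces.

```python
import re

long_form = r"^\s\S\S\s\S\S\S\S\S\.\s$"
short_form = r"^\s\S{2}\s\S{5}\.\s$"

for text in [" Hi there. ", "\tHi\tthere.\n", "Hi there."]:
    # repr() makes the whitespace visible in the output
    print(repr(text), bool(re.search(long_form, text)), bool(re.search(short_form, text)))
```

The first two samples match both patterns, the last one matches neither because the leading and trailing whitespace is missing.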


Regex Abbreviations (Repetition Quantifiers)

Ever get tired of typing out a thousand zs in a row just to match zzz? Regex has your back - and it’s got some super compact ways to say “repeat this thing a few times”.

This is your cheat-sheet for all those curly-brace {} expressions that make regex feel like you’re negotiating with a robot.

#1 - Exact Matches: {n}

Syntax: z{3}

Translation: Match exactly 3 zs

Matches: zzz

Does NOT Match: zz, zzzz

💬 Think:

“Give me precisely 3. Not 2. Not 4. Just 3.”

#2 - A Range: {min,max}

Syntax: z{3,6}

Translation: Match between 3 and 6 zs

Matches: zzz, zzzz, zzzzz, zzzzzz

Does NOT Match: zz, zzzzzzz

💬 Think:

“Let’s be flexible. I want between 3 and 6 of these.”

🧪 Useful when the input isn’t fixed but has bounds - like matching username lengths, padding, or predictable fuzzing targets.

#3 - “At Least…”: {min,}

Syntax: z{3,}

Translation: Match 3 or more zs

Matches: zzz, zzzz, zzzzzzzzzzzzzzzz

Does NOT Match: zz, z

💬 Think:

“Give me 3, or go wild. No upper limit.”

🔥 CTF use-case: pattern detection in encoded blobs, overflow inputs, brutish log formats.
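The three forms so far can be tried out with a short Python sketch. One assumption to be aware of: the “Does NOT match” lines only hold when the whole string has to match, which is why the hypothetical helper `whole` uses `re.fullmatch` instead of `re.search` (a plain search for z{3} would still find three zs inside zzzz).

```python
import re

def whole(pattern, text):
    """True if the entire text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

checks = {
    r"z{3}": ["zz", "zzz", "zzzz"],
    r"z{3,6}": ["zz", "zzz", "zzzzzz", "zzzzzzz"],
    r"z{3,}": ["z", "zz", "zzz", "z" * 16],
}

for pattern, texts in checks.items():
    for text in texts:
        print(pattern, text, whole(pattern, text))
```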

#4 - The Ghost of {,max}

Yes, {,max} technically exists in some engines (Python’s re, for example), meaning “0 to max times”.
But plenty of engines don’t like it: JavaScript, for one, treats z{,6} as literal text. Instead, write the lower bound out: {0,max}.

💬 Think:

“Just say what you mean. Regex hates ambiguity.”

📌 Bonus: Greedy vs Lazy

All {} quantifiers are greedy by default - they’ll match as much as they can.

Greedy:

z{3,6}

Matches the longest sequence (up to 6).

Lazy:

z{3,6}?

Matches the shortest valid one (starts with 3).

💬 Greedy: “I want it ALL!” 💬 Lazy: “I’ll take just enough to get by.”
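A tiny Python sketch makes the difference visible. The input strings are made up; the last line shows that a lazy quantifier still grows when whatever follows it in the pattern forces it to.

```python
import re

text = "zzzzzzzz"  # eight z's

greedy = re.search(r"z{3,6}", text).group()
lazy = re.search(r"z{3,6}?", text).group()
# Lazy starts with 3, but has to expand until the "!" can match.
forced = re.search(r"z{3,6}?!", "zzzzz!").group()

print(greedy)  # zzzzzz
print(lazy)    # zzz
print(forced)  # zzzzz!
```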

🔐 TL;DR Cheat Table

| Pattern | Meaning | Matches |
| --- | --- | --- |
| z{3} | Exactly 3 zs | zzz |
| z{3,6} | Between 3 and 6 zs | zzz to zzzzzz |
| z{3,} | 3 or more zs | zzz, zzzz, zzzz... |
| z{,6} | (Mostly unsupported) 0 to 6 zs | ❌ use z{0,6} instead |
| z{3,6}? | Lazy: match 3–6, prefer shortest | zzz if possible |

Hacker Use-Cases

Remember, in regex, brevity is power - but clarity wins flags.

Curly braces aren’t just stylish - they’re your repeat offenders.


🎭 Regex Optionalities & Control Tricks

Regex isn’t just about matching letters and numbers: it’s about control. Whether you’re parsing log files, reversing obfuscation, or crafting payloads, these tools give you precision over what, how much, and when to match.

This section covers:

| : The OR Operator

Syntax:

foo|bar

Meaning: Match foo OR bar.

💬 Think:

“Give me one thing or the other: no middle ground.”

Example:

admin|root

✅ Matches: admin, root ❌ Does not match as a whole string: administrator (an unanchored search would still find the admin inside it)
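Whether administrator counts as a match depends on whether the whole string has to match or whether a hit anywhere inside it is enough. A short Python sketch of both readings:

```python
import re

pattern = r"admin|root"

# Whole-string matching: only the exact alternatives pass.
print(bool(re.fullmatch(pattern, "admin")))          # True
print(bool(re.fullmatch(pattern, "root")))           # True
print(bool(re.fullmatch(pattern, "administrator")))  # False

# Searching anywhere: "admin" is found inside "administrator".
found = re.search(pattern, "administrator").group()
print(found)  # admin
```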

🛠 Useful in:

[] : Character Sets (Character Classes)

Syntax:

[abc]

Meaning: Match exactly one character that is either a, b, or c.

💬 Think:

“I’ll take any one of these options.”

Examples:

Useful in Username checks, filters, binary patterns, input validation
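A few character sets in action, as a small Python sketch. The samples are my own; the range syntax `0-9a-f` inside the brackets is standard character-class shorthand, though it isn’t explained above, so treat that line as a bonus.

```python
import re

# Every single character that is a, b or c.
picked = re.findall(r"[abc]", "cabbage")
print(picked)  # ['c', 'a', 'b', 'b', 'a']

# Binary pattern: exactly eight characters, each one 0 or 1.
is_byte = re.fullmatch(r"[01]{8}", "01001101") is not None
not_byte = re.fullmatch(r"[01]{8}", "01201101") is not None
print(is_byte, not_byte)  # True False

# Range shorthand: four lowercase hex digits.
is_hex = re.fullmatch(r"[0-9a-f]{4}", "beef") is not None
print(is_hex)  # True
```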

? : Zero or One (Optional)

Syntax:

colou?r

Meaning: Match color or colour (the u is optional)

💬 Think:

“Maybe it’s there, maybe it’s not.”
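In Python this looks roughly like the sketch below. The misspelled samples are only there to show that ? allows zero or one u, not two, and that it applies to the u alone. The `https?` line is an extra example of mine.

```python
import re

pattern = r"colou?r"

for word in ["color", "colour", "colouur", "colr"]:
    print(word, bool(re.fullmatch(pattern, word)))

# Same trick for an optional trailing "s".
print(bool(re.fullmatch(r"https?", "http")), bool(re.fullmatch(r"https?", "https")))
```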

Matches: color, colour

Common in:

+ : One or More (Mandatory)

Syntax:

a+

Meaning: Match one or more as.

✅ Matches: a, aa, aaaaa ❌ Does not match: (empty string)

💬 Think:

“There must be at least one.”

* : Zero or More

Syntax:

a*

Meaning: Match zero or more as.

✅ Matches: the empty string, a, aaaa ❌ Never fails on its own, since matching zero as is always possible

💬 Think:

“Take as many as you’ve got. Or none.”

Example:

.*   → match anything (used in greedy matching)
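The difference between + and * is easiest to see on input that contains no a at all. A minimal Python sketch, with made-up sample strings:

```python
import re

# "+" needs at least one a, "*" is fine with none.
plus_on_empty = re.fullmatch(r"a+", "")
star_on_empty = re.fullmatch(r"a*", "")
print(plus_on_empty is None, star_on_empty is not None)  # True True

# On "bbb", a* still "succeeds" by matching zero characters at position 0.
star_hit = re.search(r"a*", "bbb").group()
plus_hit = re.search(r"a+", "bbb")
print(repr(star_hit), plus_hit)  # '' None

# ".*" swallows the whole line (everything except a newline).
everything = re.search(r".*", "anything at all").group()
print(everything)  # anything at all
```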

() : Grouping

Syntax:

(foo|bar)+

Meaning: Treat foo|bar as a single unit.

💬 Think:

“Group this together, apply stuff to it.”

Use cases:

()? : Optional Group

Syntax:

(auth(user)?)?

Meaning:

✅ Matches: auth, authuser, or nothing at all. The whole auth group is optional; user sits inside that group, so it can only appear right after auth, if at all.
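Both grouping patterns can be checked with a short Python sketch. `whole_match` is a hypothetical helper that demands the entire string matches, and the sample strings are invented for the test.

```python
import re

repeated = r"(foo|bar)+"
optional = r"(auth(user)?)?"

def whole_match(pattern, text):
    """True if the entire text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

for text in ["foo", "foobarfoo", "foobaz", ""]:
    print("repeated", repr(text), whole_match(repeated, text))

for text in ["auth", "authuser", "", "user", "authuseruser"]:
    print("optional", repr(text), whole_match(optional, text))
```

Note how user on its own is rejected: without the surrounding auth there is nothing for the inner group to attach to.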


Greedy vs Lazy (again!)

Examples:

a+     → grabs all the a's
a+?    → grabs the shortest match (just one)
.*?    → "anything, but as little as needed"
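A classic place where this matters is anything wrapped in delimiters. The sketch below uses a made-up snippet with angle-bracket tags purely as sample input.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

greedy_tags = re.findall(r"<.*>", text)   # runs from the first < to the last >
lazy_tags = re.findall(r"<.*?>", text)    # stops at the nearest >

print(greedy_tags)  # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)    # ['<b>', '</b>', '<i>', '</i>']

all_as = re.search(r"a+", "aaaa").group()
one_a = re.search(r"a+?", "aaaa").group()
print(all_as, one_a)  # aaaa a
```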

Use those for..

Mastering optionality = mastering pattern precision.

No more, no less: just what the input demands.


Real example

Ask ChatGPT a question, maybe producing a section for your notes. ChatGPT loves its little --- breaks. What if you didn’t want those? And the line break that comes before them EVERY TIME??

Unix version

\n--- - just match and remove!

Windows version

\r? → optionally matches carriage return for Windows, so: \r?\n---.
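In Python the “match and remove” step could be sketched with `re.sub`. `strip_rules` is a hypothetical helper name and the sample texts are made up. One caveat: the pattern as written would also chew into any line that merely starts with ---, so treat it as a quick cleanup rather than a careful parser.

```python
import re

def strip_rules(text):
    """Remove every '---' together with the line break right before it."""
    return re.sub(r"\r?\n---", "", text)

unix_text = "Intro\n---\nBody"
windows_text = "Intro\r\n---\r\nBody"

print(repr(strip_rules(unix_text)))     # 'Intro\nBody'
print(repr(strip_rules(windows_text)))  # 'Intro\r\nBody'
```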


Matching alphanumeric - \w and \W

\w - matching any word character

\w matches any word character: letters, digits and the underscore, so roughly [a-zA-Z0-9_].

\W - again, just the opposite

As usual, \W matches everything except what \w matches.

Example

Match: wwNNwww, w marking word and N non-word characters.

Solution

^\w{2}\W{2}\w{3}
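To try the wwNNwww pattern yourself, here is a minimal Python sketch. The sample strings are my own; note that the underscore counts as a word character and a space counts as a non-word character.

```python
import re

pattern = r"^\w{2}\W{2}\w{3}"

for sample in ["ab--cde", "a_!?x9z", "ab  cde", "ab-cde"]:
    print(repr(sample), bool(re.search(pattern, sample)))
```

The last sample fails because it only has one non-word character in the middle where two are required.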


Challenge

Match dwwwww., after . there can’t be anything, d marking digits, w marking word characters.

Text me if you need the solution! Love you guys, have a nice one.