Regex — The (Love|Hate) relationship

2024-11-03 #tech #regex

Close up of a Regex pattern displayed in a computer screen

Photo by Jonny Fox on Medium

Back again sitting on my chair after doing a terrible form validation for one of my assignments. I'm calling it terrible not just because the code smells, but because I wanted to show off that I know regex.
(yes, it went bad)

First of all, Regular Expressions, commonly known as Regex (reh·jeks), are sequences of characters that define search patterns. Regex is widely supported in modern programming languages like Python, JavaScript and Ruby, and even many SQL dialects support it OOTB. Doesn’t sound bad, right? Wait for it. When it comes to regex there are two groups. You either absolutely love regex or you want to pull your hair out seeing regex in the codebase that your frenemies wrote. Why though?

Regex can get quite complicated because it uses a shit ton of metacharacters and flags. Have a look at the following.

^(?! )[a-zA-Z0-9]*(?:[._-](?! )[a-zA-Z0-9]+)*(?<! )$|^.{3,25}$

First time? :3
Well, that was a ridiculously complex pattern, though there are occasions where you have to deal with long-ass patterns.
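To get a feel for what that pattern actually accepts, here's a small sketch. The name complexRegex and the sample inputs are made up for illustration. Keep an eye on the | in the pattern: a string passes if either side matches, so anything that is 3 to 25 characters long gets through via the second half, spaces included, which may or may not be what you'd expect at first glance.

```javascript
// The pattern from above, unchanged. The name complexRegex is hypothetical.
const complexRegex = /^(?! )[a-zA-Z0-9]*(?:[._-](?! )[a-zA-Z0-9]+)*(?<! )$|^.{3,25}$/;

// Invented sample inputs, not taken from any real form.
const samples = ["john.doe", "john doe", " ab ", "!!", "!!!"];

for (const sample of samples) {
  // JSON.stringify makes leading and trailing spaces visible in the output.
  console.log(JSON.stringify(sample), complexRegex.test(sample));
}
// "john.doe" true   (matches the first alternative)
// "john doe" true   (matches the second alternative: 8 characters)
// " ab "     true   (matches the second alternative: 4 characters)
// "!!"       false  (neither alternative matches)
// "!!!"      true   (matches the second alternative: 3 characters)
```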

Not only can regex be unbearably complex (if you suffer from skill issues), but what if you fat-fingered one of the characters or even forgot to include one? That’s actually what happened to me.

const nameRegex = /^[A-Za-z]+(\s[a-z]+)?$/;
const emailRegex =
/^([A-Za-z])+[0-9]*(\.[A-Za-z0-9]+)*\@[A-Za-z]+\.[A-Za-z]{2,}$/;
const ageRegex = /^[0-9]{1,2}$/;

I had the above lines to validate the name, email and age that the user enters in the form. You can figure out the issue right away by comparing the two square-bracket groups in the name pattern. Yes, it should have been ^[A-Za-z]+(\s[A-Za-z]+)?$ instead of what I wrote in the last few minutes before the deadline. That means it now only accepts names where the second name is entirely lowercase.
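Here's a minimal sketch that puts the broken pattern and the fixed one side by side. The variable names buggyNameRegex and fixedNameRegex and the sample names are made up for illustration.

```javascript
// The pattern as I shipped it: the second name only allows lowercase letters.
const buggyNameRegex = /^[A-Za-z]+(\s[a-z]+)?$/;
// The pattern as it should have been.
const fixedNameRegex = /^[A-Za-z]+(\s[A-Za-z]+)?$/;

// Invented sample names.
const names = ["John", "John doe", "John Doe"];

for (const name of names) {
  console.log(name, buggyNameRegex.test(name), fixedNameRegex.test(name));
}
// John     true  true
// John doe true  true
// John Doe false true
```

A handful of sample inputs like these, run before submitting, would have caught the typo.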

form-result

Well, now you know why. You may think it’s just me. Wrong! XD
The Russian government, Cisco, Cloudflare, AWS and even Google have been victims of fat-fingered regex. More on that here.

How does Regex actually work

Regex works by matching a pattern against a string of text. The pattern is made of literal characters plus special characters called metacharacters.

While those are the basics of regular expressions, grouping, lookarounds and assertions are considered the more advanced, extended side of regular expressions. If you wanna learn Regex with zero nonsense, try regexlearn.
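Here's a rough sketch of how a basic metacharacter pattern, a grouping pattern and a lookaround look in JavaScript. The sample strings and variable names are invented for illustration.

```javascript
// Sample strings and names below are invented for illustration.

// Basic metacharacters: \d means "a digit", {4} means "exactly four times".
const yearRegex = /\d{4}/;
const year = "Posted in 2024".match(yearRegex)[0];
console.log(year); // 2024

// Grouping: parentheses capture parts of the match so you can pull them out.
const dateRegex = /^(\d{4})-(\d{2})-(\d{2})$/;
const [, y, m, d] = "2024-11-03".match(dateRegex);
console.log(y, m, d); // 2024 11 03

// Lookaround: (?=px) asserts that "px" follows without including it in the match.
const pixelRegex = /\d+(?=px)/;
const pixels = "width: 120px".match(pixelRegex)[0];
console.log(pixels); // 120
```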

Apart from being complicated and error-prone, Regex has another problematic feature, or an issue in this case, known as greedy matching.

Greedy matching illustration Photo on formulashq

As the name implies, regex tries to match as much as possible, which leads to matching more of the string than the pattern was meant to.

Greedy matching demo

It can be mitigated easily by adding a metacharacter here and there, such as a ? after the quantifier to make it lazy, but we keep forgetting to do so.
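Here's a small sketch of greedy matching and two common ways to rein it in: a lazy quantifier and a negated character class. The sample string and variable names are made up for illustration, and this is one way to handle it rather than the only one.

```javascript
// Invented sample: we only want the first tag, "<b>".
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ grabs as much as it can, then backs off to the LAST ">".
const greedyMatch = html.match(/<.+>/)[0];
console.log(greedyMatch); // <b>bold</b> and <i>italic</i>

// Lazy: adding ? after the quantifier makes it stop at the FIRST ">".
const lazyMatch = html.match(/<.+?>/)[0];
console.log(lazyMatch); // <b>

// Negated character class: "anything except >" can never run past the tag.
const negatedMatch = html.match(/<[^>]+>/)[0];
console.log(negatedMatch); // <b>
```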

Well, that’s my opinion about Regex. I encourage each and every one of you to learn regex, but don’t use it unless you have to. If you have to, make sure to test it 20 or more times before you casually push your precious code to production on Friday. :3

Now the title makes sense, right?