Regular Expression 101 for Programmers

This article is part of my web development series 'webDEV101'.

Regular Expression
Regular expressions are used to match strings.
At some point in your web dev or general coding journey, you will inevitably need to find and replace in bulk. For instance, if you realise just before deployment that you forgot to enclose all the KaTeX code in your HTML with \(\), you will want to find every KaTeX snippet and wrap it in one pass.
The problem is that KaTeX snippets are not fixed strings, so you can't find them with a literal text search. That's where regular expressions come in. They supercharge your find-and-replace so it can search not just for text but also for text patterns, taking over the otherwise tedious manual process.
In addition to finding and replacing, Regex can also help you search for information efficiently. For instance, if you have a large amount of text and want to know which names are mentioned and how many times, you can construct a Regex that searches for names and returns all matches and their counts in one go.
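As a rough sketch of the name-counting idea, the snippet below tallies matches in JavaScript. The sample text is made up, and treating "a name" as any capitalised word is a deliberately naive assumption; real text would need a stricter pattern.

```javascript
// Hypothetical sample text; a "name" is naively assumed to be a capitalised word.
const text = "Alice met Bob, then Bob called Alice, and Alice called Carol.";
const namePattern = /\b[A-Z][a-z]+\b/g;

// Tally every match in a plain object keyed by the matched name.
const counts = {};
for (const match of text.matchAll(namePattern)) {
  const name = match[0];
  counts[name] = (counts[name] || 0) + 1;
}

console.log(counts); // { Alice: 3, Bob: 2, Carol: 1 }
```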
The variations
Regular expressions, or Regex for short, are supported natively in many text editors, programming languages, and file managers, such as the built-in editor in VS Code, JavaScript, and the Total Commander file manager.
There are a handful of variations in circulation today:
Abbreviation | Full name | Used by |
---|---|---|
POSIX | Portable Operating System Interface | UNIX and associated CLI tools |
PCRE | Perl Compatible Regular Expressions | Perl, PHP, R, and many others |
RE2 | Regular Expressions 2 | Google's software and libraries |
Oniguruma | Oniguruma Regular Expressions | Ruby, PHP, and others |
ECMA | ECMAScript® Language Specification | JavaScript and many web applications |
The POSIX standard can be further divided into:
Abbreviation | Full name | Used by |
---|---|---|
ERE | Extended Regular Expressions | Some POSIX-compliant systems |
BRE | Basic Regular Expressions | All POSIX-compliant systems |
The bad news is that two of your favourite apps may be using different standards.
The good news is most of those standards are similar.
This tutorial focuses on the ECMA standard, which is widely used in web development and modern text editors.

The basics
Simple pattern
string
matches the string itself. For instance:
abc
matches the 'abc' in 'Let's sing abc!'.
[list of characters]
matches any ONE of the characters in the list. For instance:
[abc]
matches a, b, or c;
[a-c]
is equivalent to [abc];
[ac-]
matches a, c, or -;
[^abc]
matches anything except a, b, or c.
.
is the wildcard that matches any character except line terminators like \n and \r;
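The patterns above can be tried out in JavaScript with a short sketch like the one below. The sample strings are made up for illustration; the g flag (covered later) simply asks for every match rather than just the first.

```javascript
const word = "cabbage";

// Character lists and ranges match one character at a time.
const inList = word.match(/[abc]/g);     // ["c", "a", "b", "b", "a"]
const inRange = word.match(/[a-c]/g);    // same result as [abc]
const notInList = word.match(/[^abc]/g); // ["g", "e"]

// A hyphen at the end of the list is a literal hyphen, not a range.
const withHyphen = "a-c".match(/[ac-]/g); // ["a", "-", "c"]

// The wildcard matches any character except line terminators.
const wildcard = "bat bit b\nt".match(/b.t/g); // ["bat", "bit"]

console.log(inList, inRange, notInList, withHyphen, wildcard);
```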
\d
matches any digit. B\d, for instance, matches B1, B2, B3, ...;
\w
matches any word character: the letters a-z and A-Z, the digits matched by \d, and the underscore _;
\s
matches whitespace characters, including spaces, tabs, and line terminators like \r and \n;
Press the SPACE key on your keyboard to match a space literally, if you just wish to match a single space. For instance, He is handsome
matches the 'He is handsome' in 'He is handsome and young!'
For \d, \w, and \s, you can use the capitalised counterpart to match anything EXCEPT that class. \D matches any character except digits; \W matches any character except word characters; while \S matches any character except whitespace.
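A small JavaScript sketch of these character classes and their inverses follows. The sample string is invented for the example.

```javascript
// Hypothetical sample containing letters, digits, an underscore, spaces, and a tab.
const sample = "Room B12, floor_3\tEast";

const digits = sample.match(/\d/g);             // ["1", "2", "3"]
const bCodes = sample.match(/B\d/g);            // ["B1"]
const wordRuns = sample.match(/\w+/g);          // ["Room", "B12", "floor_3", "East"]
const whitespaceCount = sample.match(/\s/g).length; // 3 (two spaces and one tab)

// The capitalised classes match everything the lowercase ones do not.
const onlyDigits = sample.replace(/\D/g, "");   // "123"
const noWhitespace = sample.replace(/\s/g, ""); // "RoomB12,floor_3East"

console.log(digits, bCodes, wordRuns, whitespaceCount, onlyDigits, noWhitespace);
```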
\t
matches a tab;
\n
matches a line feed, also called a newline character, which is created when you press ENTER in most text editors. It is usually invisible.
For instance, if you wish to match:
'Hello
World!'
You would search for
Hello\nWorld!
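The same search can be checked in JavaScript. This is only a sketch with a made-up string; note that a literal space in the pattern does not match a line feed.

```javascript
const twoLines = "Hello\nWorld!";

const matchesNewline = /Hello\nWorld!/.test(twoLines); // true
const matchesSpace = /Hello World!/.test(twoLines);    // false: a space is not a line feed

// \t works the same way for tabs, here used to split tab-separated values.
const cells = "name\tage\tcity".split(/\t/); // ["name", "age", "city"]

console.log(matchesNewline, matchesSpace, cells);
```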
A|B
is called a disjunction and matches either A or B. For instance, Green|Red matches the colour in both 'Green Apples' and 'Red Apples'. You can chain as many alternatives as you like, as in A|B|C|D|E|F.
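Here is a minimal JavaScript sketch of a disjunction used as a filter. The list of fruit labels is hypothetical.

```javascript
const colour = /Green|Red/;

// Hypothetical labels; only those containing "Green" or "Red" pass the filter.
const labels = ["Green Apples", "Red Apples", "Blue Apples"];
const matched = labels.filter((label) => colour.test(label));

console.log(matched); // ["Green Apples", "Red Apples"]
```

One thing to keep in mind is that the | applies to everything on either side of it, so Green|Red Apples means "Green" or "Red Apples". Groups, covered later, let you limit its reach.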
Assertions
^String
matches the String only when it is at the beginning of the text. The official name for this ^ is input boundary beginning assertion. In many systems, the multiline (m) flag changes the condition to the beginning of any line, such as right after a \n. We will discuss flags later.
For instance, ^Hello matches only the first 'Hello' in 'Hello! Hello'.
String$
matches the String only when it is at the end of the text. The official name for this $ is input boundary end assertion. In many systems, the multiline (m) flag changes the condition to the end of any line, such as right before a \n. We will discuss flags later.
For instance, Hello$ matches only the last 'Hello' in 'Hello! Hello'.
\b
is called a word boundary assertion. \boo would match 'oops' but not '🦘kangaroo'. oo\b would match '🦘kangaroo' but not 'oops'.
\B
is the inverse of \b, just as \D, \W, and \S are the inverses of \d, \w, and \s. It matches at any position that is not a word boundary.
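The boundary assertions can be sketched in JavaScript as follows. The strings are the examples used in this section plus a made-up two-line text; the g and m flags written after the closing slash are JavaScript's way of setting the global and multiline flags.

```javascript
const text = "Hello! Hello";

// Record where each match starts: ^ only matches at index 0, $ only at the end.
const startIndexes = [...text.matchAll(/^Hello/g)].map((m) => m.index); // [0]
const endIndexes = [...text.matchAll(/Hello$/g)].map((m) => m.index);   // [7]

// With the m flag, ^ also matches right after a line feed.
const firstLineOnly = "one\ntwo".match(/^\w+/g); // ["one"]
const everyLine = "one\ntwo".match(/^\w+/gm);    // ["one", "two"]

// Word boundaries, and the inverse \B.
const boundaries = {
  startOfOops: /\boo/.test("oops"),         // true
  startOfKangaroo: /\boo/.test("kangaroo"), // false
  endOfKangaroo: /oo\b/.test("kangaroo"),   // true
  endOfOops: /oo\b/.test("oops"),           // false
  insideKangaroo: /\Boo/.test("kangaroo"),  // true
};

console.log(startIndexes, endIndexes, firstLineOnly, everyLine, boundaries);
```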
x(?=y)
is a lookahead assertion. It matches only for 'x's followed by a 'y'.
x(?!y)
is a negative lookahead assertion. It matches only for 'x's NOT followed by a 'y'.
(?<=y)x
is a lookbehind assertion. It matches only for 'x's PRECEDED by a 'y'.
(?<!y)x
is a negative lookbehind assertion. It matches only for 'x's NOT preceded by a 'y'.
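To see the four lookaround assertions side by side, the JavaScript sketch below records the index of every 'x' each one accepts. The test string and the price example are invented for illustration.

```javascript
// Indexes: x=0 y=1 _ x=3 z=4 _ y=6 x=7 _ z=9 x=10
const text = "xy xz yx zx";
const indexesOf = (pattern) => [...text.matchAll(pattern)].map((m) => m.index);

const followedByY = indexesOf(/x(?=y)/g);     // [0]
const notFollowedByY = indexesOf(/x(?!y)/g);  // [3, 7, 10]
const precededByY = indexesOf(/(?<=y)x/g);    // [7]
const notPrecededByY = indexesOf(/(?<!y)x/g); // [0, 3, 10]

// The asserted text is checked but never becomes part of the match.
const usdAmounts = "Total: 40 USD, 15 EUR".match(/\d+(?= USD)/g); // ["40"]

console.log(followedByY, notFollowedByY, precededByY, notPrecededByY, usdAmounts);
```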
Quantifiers
You will most likely wish to match for more than a few characters. Quantifiers can help.
Quantifier | Minimum | Maximum |
---|---|---|
? | 0 | 1 |
* | 0 | Infinity |
+ | 1 | Infinity |
{count} | count | count |
{min,} | min | Infinity |
{min,max} | min | max |
Quantifiers should be specified immediately after a character or character group¹. For instance, A\S* matches an 'A' together with any non-whitespace characters that follow it, such as a word starting with 'A'.
1: A character group is a set of characters enclosed in opening and closing parentheses (). (abc){3} will match any 'abcabcabc' in the target text. More on groups later.
For the {count}, {min,}, and {min,max} syntaxes, there should not be any space around the numbers, or the braces and their contents will be matched as literal characters instead.
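A few quantifiers in action, as a JavaScript sketch with made-up sample strings. The last two lines illustrate the space pitfall as it behaves in JavaScript without the u flag; other engines may treat it differently.

```javascript
const optional = "color colour".match(/colou?r/g);          // ["color", "colour"]
const wordsWithA = "An Apple a day".match(/A\S*/g);         // ["An", "Apple"]
const exactlyThree = "abcabcabc abcabc".match(/(abc){3}/g); // ["abcabcabc"]
const twoOrMore = "1 22 333 4444".match(/\d{2,}/g);         // ["22", "333", "4444"]
const twoToThree = "1 22 333 4444".match(/\b\d{2,3}\b/g);   // ["22", "333"]

// A space inside the braces turns the quantifier into literal text.
const spacedOnRun = /a{2, 3}/.test("aaa");         // false
const spacedOnLiteral = /a{2, 3}/.test("a{2, 3}"); // true

console.log(optional, wordsWithA, exactlyThree, twoOrMore, twoToThree, spacedOnRun, spacedOnLiteral);
```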
Flags
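(?i)
ignores letter case. (?i)Apple matches 'apple'. This inline form is accepted by flavours such as PCRE; in JavaScript, the flag is written after the closing slash instead, as in /Apple/i.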
(?g)
matches globally. The default in some Regex systems is to stop matching after the first match; the global flag ensures all matches are returned. In JavaScript, it is written after the closing slash, as in /Apple/g.
(?m)
enables multiline matching. As explained before, ^String matches the String only when it is at the beginning of the TEXT, while String$ matches the String only when it is at the end of the TEXT. (?m) changes the condition to the beginning or end of each LINE.
(?s)
allows . to match newline characters such as \n.
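In JavaScript, these four flags are letters placed after the closing slash of a Regex literal and can be combined. The sketch below uses invented sample strings to show the effect of each.

```javascript
const fruit = "Apple apple APPLE";

// i: ignore case. Without g, only the first match is returned.
const firstOnly = fruit.match(/apple/i)[0]; // "Apple"

// g: return every match.
const allMatches = fruit.match(/apple/gi);  // ["Apple", "apple", "APPLE"]

// m: ^ and $ apply to each line instead of the whole text.
const wholeLines = "one\ntwo".match(/^\w+$/gm); // ["one", "two"]

// s: let the wildcard match line terminators too.
const dotWithoutS = /a.b/.test("a\nb"); // false
const dotWithS = /a.b/s.test("a\nb");   // true

console.log(firstOnly, allMatches, wholeLines, dotWithoutS, dotWithS);
```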
Groups
Groups, denoted with round brackets (group), are used to isolate a snippet within the Regex, to apply local flags, or to capture it for future reference.
For instance, in a text editor find and replace:
Hello, I am John! john! Johnathan!
Find: (\bJohn\b)
Replace with: Crazy$1!
-> Hello, I am CrazyJohn!! john! Johnathan!
Hello, I am John! john! Johnathan!
Find: (?i:\bJohn\b)
Replace with: Crazy$1!
-> Hello, I am Crazy$1!! Crazy$1!! Johnathan!
Hello, I am John! john! Johnathan!
Find: (?<name>(?i:\bJohn\b))
Replace with: Crazy${name}!
-> Hello, I am CrazyJohn!! Crazyjohn!! Johnathan!
(\bJohn\b)
is an unnamed capturing group. () simply tells Regex this is a group. \bJohn\b is the string pattern. Recall that \b is a word boundary, so this pattern matches any 'John' that is a whole word. $1 refers back to whatever the group captured for the current match (with case-insensitive matching, 'John' for the 1st match and 'john' for the 2nd), for use in the replace-with field. Inside the Regex itself, the same backreference is usually written \1. If there are multiple capturing groups, the match from the first group is referred back to with $1, the second with $2, and so on.
Note: if the global flag (?g) is enabled, and there are multiple matches for a given capturing group, some Regex systems will return $1 as a list of all matches, while some will only return the last match.
For instance, if we apply (?g)(\bJohn\b) case-insensitively to Hello, I am John! john! Johnathan!, some systems will return $1=['John', 'john'], while other systems will return $1='john'.
For this reason, the Replace All function in text editors, which is essentially Replace with an invisible (?g), can cause unexpected behaviours.
(?i:\bJohn\b)
is a modified non-capturing group. As seen, the match has not been registered as $1. The purpose of this group is not to enable future reference, but to specify with the i flag that this group specifically should be matched case-insensitively, regardless of how the rest of the Regex is matched.
(?<name>(?i:\bJohn\b))
is a named capturing group that nests a modified non-capturing group. Modifiers cannot be applied directly to a capturing group, so this nesting is necessary to modify a capturing group. <name> is the group name, used to refer back to the group, so in addition to $1, we can also refer to it by ${name}.
Note: Not all text editors' Find and Replace functions support named groups.
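The same replacements can be sketched with JavaScript's String.prototype.replace. Two differences from the editor examples are assumptions of this sketch: case-insensitivity is set with the whole-pattern i flag, because support for the (?i:...) modifier depends on the engine version, and JavaScript writes a named reference in the replacement as $<name> rather than ${name}.

```javascript
const text = "Hello, I am John! john! Johnathan!";

// Unnamed capturing group: $1 is replaced by the captured text.
const caseSensitive = text.replace(/(\bJohn\b)/g, "Crazy$1!");
// "Hello, I am CrazyJohn!! john! Johnathan!"

const caseInsensitive = text.replace(/(\bJohn\b)/gi, "Crazy$1!");
// "Hello, I am CrazyJohn!! Crazyjohn!! Johnathan!"

// Non-capturing group: there is no group 1, so JavaScript leaves "$1" as literal text.
const nonCapturing = text.replace(/(?:\bJohn\b)/gi, "Crazy$1!");
// "Hello, I am Crazy$1!! Crazy$1!! Johnathan!"

// Named capturing group, referenced as $<name> in the replacement.
const named = text.replace(/(?<name>\bJohn\b)/gi, "Crazy$<name>!");

// Inside the pattern itself, a backreference is written \1.
const doubledLetter = /(\w)\1/.exec("hello")[0]; // "ll"

console.log(caseSensitive, caseInsensitive, nonCapturing, named, doubledLetter);
```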
Use in JavaScript
JavaScript follows the ECMAScript standard.
A constant Regex:
const myRe = /\bA.+\b/g;
// exec() returns the next match, or null if there is none
const myArray = myRe.exec("My Apple!");
console.log(`The value of lastIndex is ${myRe.lastIndex}`);
// "The value of lastIndex is 8"
Constant Regexes are denoted with /Regex/flags. In this example the string pattern is \bA.+\b, and g is the global flag.
Think: do you understand why the last index is 8? (Hint: what is \bA.+\b matching?)
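Because a global Regex remembers its position in lastIndex, calling exec() repeatedly walks through the matches one by one. The sketch below, using a made-up sentence and a simpler pattern, records lastIndex after each match.

```javascript
const pattern = /\bA\w*/g;
const sentence = "An Apple And a pear";

// Each exec() call resumes searching from pattern.lastIndex.
const found = [];
let match;
while ((match = pattern.exec(sentence)) !== null) {
  found.push({ text: match[0], lastIndex: pattern.lastIndex });
}

console.log(found);
// [ { text: "An", lastIndex: 2 }, { text: "Apple", lastIndex: 8 }, { text: "And", lastIndex: 12 } ]

// Once exec() returns null, lastIndex is reset to 0.
console.log(pattern.lastIndex); // 0
```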
A dynamic Regex constructor function, to be used when the Regex will change during runtime, based on internal conditions, or from another source, such as user input:
const re = new RegExp("ab+c");
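When the pattern comes from user input, any special characters in it are interpreted as Regex syntax unless they are escaped first. The sketch below uses a hypothetical helper, escapeRegExp, which is not built into the language here but follows a widely used recipe; the input strings are made up. Also note that backslashes must be doubled inside a string passed to the constructor.

```javascript
// Hypothetical helper: prefix every Regex special character with a backslash.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "1+1"; // pretend this came from a search box
const haystack = "1+1=2, 11";

// Unescaped, the + acts as a quantifier, so the pattern means "one or more 1s, then a 1".
const unescaped = haystack.match(new RegExp(userInput, "g")); // ["11"]

// Escaped, the input is matched literally.
const escaped = haystack.match(new RegExp(escapeRegExp(userInput), "g")); // ["1+1"]

// In a string, \d must be written as \\d.
const digits = new RegExp("\\d+").exec("abc 123")[0]; // "123"

console.log(unescaped, escaped, digits);
```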

Remarks
If you followed this tutorial, you should now be familiar with regular expression syntax and basic methods. To learn more, consider reading the MDN Web Docs. GitHub Copilot can also answer specific questions you may have about regular expressions!
Copyright Statement

Much of our content is freely available under the Creative Commons BY-NC-ND 4.0 licence, which allows free distribution and republishing of our content for non-commercial purposes, as long as Ronzz.org is appropriately credited and the content is not being modified materially to express a different meaning than it is originally intended for. It must be noted that some images on Ronzz.org are the intellectual property of third parties. Our permission to use those images may not cover your reproduction. This does not affect your statutory rights.