Intro to Regular Expressions: Special meaning characters

Welcome back to our series “Intro to Regular Expressions”. If you haven't done so, you may want to read the previous article in this series:

Special meaning characters

As you know, a Regular Expression (we will call them “Regex” from here on) consists of multiple characters. Some of these characters have a literal meaning - that is, they stand for exactly what they are. An “A” stands for an “A”, a “7” stands for a seven. However, some characters have a special meaning in a Regex. They do not stand for themselves, but, for example, for any character from a certain set.

An example of such a character is the dot (“.”). It will not just match an actual dot in your text; in a Regex, a dot matches any single character. It matches a “B”, the digit “8”, and even a space (yes, a space, or blank, is an actual character). It even matches an actual dot, because a dot is a character too, and the rule says “any character”. (One common exception: in most Regex flavors, a dot does not match a line break unless you switch on a special option.)
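If you would like to try this yourself, here is a minimal sketch in Python using its built-in re module. The names dot, samples and dot_results are made up for this illustration. Note that in Python, as in many other Regex flavors, the dot matches every character except a line break unless you ask for that explicitly; other tools may behave slightly differently.

```python
import re

# A pattern consisting of a single dot: "any one character".
dot = re.compile(r".")

# A letter, a digit, a space, a real dot, and a line break.
samples = ["B", "8", " ", ".", "\n"]

# fullmatch() succeeds only if the whole string is matched by the pattern.
dot_results = {s: dot.fullmatch(s) is not None for s in samples}

for s, matched in dot_results.items():
    print(repr(s), matched)
# 'B' True
# '8' True
# ' ' True
# '.' True
# '\n' False   (by default, Python's dot does not match a line break)
```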

So, how would that be used in practice? For example, the pattern f.re matches any four-character sequence that starts with an “f” and ends with “re”: an f, then any character, then r and e. It would match “fire”, but also “fore” and “fare”. Actually, it would also match “f4re” or “f-re” – however unlikely it is that your text would contain such words.
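The f.re pattern can be tried out with a few lines of Python. This is only a sketch; the word list and the variable names are invented for the example. It also shows one thing to keep in mind: when you search inside a longer text, the pattern finds those four characters wherever they occur, even in the middle of a longer word.

```python
import re

pattern = re.compile(r"f.re")

# fullmatch() checks whether the entire word fits the pattern.
words = ["fire", "fore", "fare", "f4re", "f-re", "fre", "fiire"]
word_results = {w: pattern.fullmatch(w) is not None for w in words}

for word, matched in word_results.items():
    print(word, matched)
# fire True, fore True, fare True, f4re True, f-re True
# fre False   (no character between the f and the re)
# fiire False (two characters between the f and the re)

# search() looks for the pattern anywhere in the text.
found = pattern.search("a firefly")
print(found.group())  # fire
```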

Another example is the backslash-d character (\d). Of course many will rightly point out that a backslash and a d are two characters, but in a Regex, those two characters form what is formally known as a pattern element, and for that reason we will call it a “special character” for the purpose of this blog. The \d will match any digit. (Recall that a digit is a single numerical symbol, such as “2” or “5”. “17” is not a digit, but a number, consisting of two digits.) So, to expand on our previous example, the Regex 7\d1 would match any three-digit number that starts with a 7 and ends in 1. The middle digit can be any digit. So “731”, “751”, “771” would all be examples of strings that would be matched by this Regex. The string “7e1”, however, would not be matched – because “e” is not a digit.
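Here is the 7\d1 example as a short Python sketch. The sample strings and the sentence are invented for illustration. The r in front of the pattern tells Python to leave the backslash alone, so that it reaches the Regex engine unchanged.

```python
import re

pattern = re.compile(r"7\d1")

# Check individual strings against the whole pattern.
candidates = ["731", "751", "771", "7e1", "71"]
digit_results = {c: pattern.fullmatch(c) is not None for c in candidates}

for candidate, matched in digit_results.items():
    print(candidate, matched)
# 731 True, 751 True, 771 True
# 7e1 False (e is not a digit)
# 71 False  (the middle digit is missing)

# findall() collects every match in a longer text.
hits = pattern.findall("Rooms 731 and 7e1, extension 771")
print(hits)  # ['731', '771']
```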

We used the word “string” in the previous paragraph. In case you’re not a programmer: “string” is just a fancy word for a sequence of characters.

Now that we know that a dot has a special meaning – what do you do when you want to match just an actual dot, and not “any character”? There is a way to remove the special meaning from a special-meaning character: you just prefix it with a backslash. Let’s say you are searching for the word “yesterday”, but only if it occurs at the end of a sentence. So you would search for it together with a full stop. Using yesterday. as a pattern would not work reliably, because it would also match “yesterdays”, which is not what you want. So you just use yesterday\. as your pattern. That backslash “demotes” the dot to be just a dot - and not a special character.
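The difference between the two patterns can be seen in a small Python sketch. The two sample sentences and the variable names are made up for this example. Python also offers re.escape(), which adds the needed backslashes for you when you want a piece of text to be taken literally.

```python
import re

unescaped = re.compile(r"yesterday.")   # dot means "any character"
escaped = re.compile(r"yesterday\.")    # backslash-dot means "an actual dot"

sentence = "I saw her yesterday."
plural = "All my yesterdays"

# The unescaped pattern is fooled by the plural "yesterdays".
print(unescaped.search(plural) is not None)   # True
# The escaped pattern is not.
print(escaped.search(plural) is not None)     # False
# Both find the word at the end of the sentence.
print(escaped.search(sentence).group())       # yesterday.

# re.escape() builds the escaped pattern from plain text.
auto_escaped = re.escape("yesterday.")
print(auto_escaped)                           # yesterday\.
```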

There are many such special characters. In our macOS app ReX-T, an app used to design and test Regular Expressions, we provide a large library with all the metacharacters and examples of their use. Moreover, you can create and save your own useful Regular Expressions for use at another time.

[Image: Library with Detail]

How about downloading ReX-T now from the App Store and experimenting a little with what you have just learned?

Come back next week to read the next part of our series on Regular Expressions. You will learn that anchors are not just used on boats, but in Regex patterns too!
