Intro to Regular Expressions: What are they?

In this series of blog posts we will look at various aspects of Regular Expressions, starting with an overview of what Regular Expressions (or “Regex”) are and what you can use them for.

What are Regular Expressions?

A regular expression, at its simplest, is a pattern of characters used to search text.

Let’s say you wish to search a rather long text for all occurrences of the word “color”. Now, as you know, color can be spelled two ways, depending upon your background. In “English” English, or as the British like to call it, “proper” English, it is spelled with a “u”, as in “colour”, but in American English, for whatever reason, the “u” is dropped.

Most programs allow you to search just using literal text.

So, to find the word “color” you would simply search for that word, spelled without the “u”. But, let’s add a wrinkle, and say you do not know what “flavor” of English the text is. You could simply do two searches, one with, and one without the letter “u”.
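To make the two-search approach concrete, here is a minimal sketch in Python. The sample sentence and the helper name find_all are made up for illustration; any program with a plain "find" function works along the same lines.

```python
# Hypothetical sample text containing both spellings.
text = "Her favourite colour is blue; his favorite color is red."

def find_all(haystack, needle):
    """Return the start position of every literal occurrence of needle."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions

# Two separate literal searches, one per spelling.
us_hits = find_all(text, "color")
uk_hits = find_all(text, "colour")

print(len(us_hits), "hit(s) for 'color'")
print(len(uk_hits), "hit(s) for 'colour'")
```

Note that searching for “color” alone does not find “colour”, because the letters “c o l o r” never appear in that order without the “u” in between.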

But surely there must be an easier way? What if you could find both spellings in a single search?

And, of course, this is where Regular Expressions come in.

Searching for Patterns

In most programs, when you search for a text fragment, the program will only find literal occurrences of that text.

Could we frame the search in another way? What if we could say, “find the word ‘color’ and select it, but if there is also the word ‘colour’ select it as well”. If you think about it, you could break it down to searching for the word “color” with an option to include a letter “u”, if it exists, after the second “o” and before the letter “r”. In other words, you’re describing what you are searching for in the form of a pattern.

Put simply, a regular expression allows you to do exactly this: it describes what you are looking for, instead of providing the desired text literally.

Here is what you could do. If your program supports Regular Expressions, you enter this:

colo(u?)r

As you will have guessed, the question mark makes the preceding character optional. So the pattern finds both of the character sequences “c o l o r” and “c o l o u r”; in other words, it matches whether or not there is a “u” between the second “o” and the “r”. The parentheses merely group the “u?”; the shorter pattern “colou?r” finds the same text.
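If you would like to try this pattern outside of a search dialog, here is a small sketch using Python's built-in re module. The sample sentence is invented for illustration; the pattern is the one shown above.

```python
import re

# Hypothetical sample text with both spellings.
text = "The color red and the colour blue are both colors."

# The pattern from this post: an optional "u" between the second "o" and the "r".
pattern = re.compile(r"colo(u?)r")

# finditer yields one match object per hit; group(0) is the full matched text.
matches = [m.group(0) for m in pattern.finditer(text)]

print(matches)  # prints ['color', 'colour', 'color']
```

The third hit comes from the word “colors”: a pattern search, just like a literal one, also finds matches inside longer words unless you tell it otherwise.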

In Regular Expression-enabled programs, the search function gives you an option of entering a Regular Expression (sometimes called a “Pattern”).

[Image: a regular expression search field]

Such patterns can express what you are searching for in a way that lets you do incredibly complex searches – like “I am looking for a text that starts with a vowel, has at least three consonants, and ends with a digit or the letter ‘b’.” Of course, the actual way of expressing this search – the pattern syntax – can be pretty complicated, too. With this series of blog posts, of which you are reading the first part, we want to give you a gentle introduction to how such patterns are constructed.
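To give you a taste of how complicated such a pattern can get, here is one possible way to write the search described above in Python. It is only a sketch under a few assumptions of ours: the "text" is a single line, the letter "y" counts as a consonant, upper and lower case are treated alike, and a final "b" may count as one of the three consonants. The function name matches_description is made up for this example.

```python
import re

# One possible reading of: starts with a vowel, has at least three
# consonants, and ends with a digit or the letter "b".
PATTERN = re.compile(
    r"[aeiou](?=(?:.*?[b-df-hj-np-tv-z]){3}).*[0-9b]",
    re.IGNORECASE,
)
# [aeiou]                          the first character is a vowel
# (?=(?:.*?[b-df-hj-np-tv-z]){3})  lookahead: three consonants occur somewhere after it
# .*                               any characters in between
# [0-9b]                           the last character is a digit or "b"

def matches_description(text):
    """Return True if the whole text fits the description."""
    return PATTERN.fullmatch(text) is not None

for word in ["abstract7", "absorb", "oak9", "regex1"]:
    print(word, matches_description(word))
```

Do not worry if this looks cryptic. It uses several features that go well beyond the simple question mark shown earlier.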

Next week: How to use Regex to solve real-life problems

Next week we will show you everything you need to use Regular Expressions to search for different spellings of words, and some more complex uses.

So be sure to drop by again here soon! Or why don’t you subscribe to our newsletter to get notified when the next part is online?
