Help Center Knowledge Base

Knowledge Base Document #KB-00010

RegEx Glossary


ReX-T Glossary

This glossary gives a brief overview of the terms most often used when working with regular expressions. You will encounter many of these terms in our app and in this documentation, so we give a short definition of each of them here.

This glossary is not meant to explain the concepts behind these terms in detail; if you need further information, we suggest you visit our Regular Expression Help Center at http://www.esclepiusllc.com/post/2019/04/rex-t-and-regular-expressions.

Anchor: An anchor is a zero-width assertion that matches a certain feature of the source, like the beginning of a word, or the end of a line.
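
As a minimal sketch, assuming Python's standard re module (the ReX-T engine may differ in details), the anchors ^ and $ match positions rather than characters, and the MULTILINE flag lets them match at the start and end of every line:

```python
import re

text = "one two\nthree four"

# ^ is zero-width: the match at the start of the text is empty.
assert re.search(r"^", text).span() == (0, 0)

# Without MULTILINE, $ only matches at the end of the whole text.
assert re.findall(r"\w+$", text) == ["four"]

# With MULTILINE, ^ and $ also match at the start and end of each line.
assert re.findall(r"\w+$", text, re.MULTILINE) == ["two", "four"]
assert re.findall(r"^\w+", text, re.MULTILINE) == ["one", "three"]
```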

Assertions: An assertion tests whether a certain condition is true at the position the pattern has reached so far, without consuming any characters. Examples of simple assertions are the anchors ^ (start of the source string, or of each line in multiline mode) and \b (word boundary). An example of a more complex assertion is the negative lookahead assertion (?!something), which asserts that something is not present at this location.
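
A short illustration, assuming Python's re module: both \b and the negative lookahead (?!u) are checked at the current position, and neither consumes the character it looks at, so a lone "q" still matches:

```python
import re

words = "quit qat q queen"

# \b requires a word boundary before "q"; (?!u) fails wherever a "u" follows the "q".
no_u = re.findall(r"\bq(?!u)\w*", words)
assert no_u == ["qat", "q"]
```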

Atomic Group: Also known as a Once-Only Subexpression. Text matched by an atomic group is fixed in such a way that the grouped match will not be changed by Backtracking when the engine tries to accommodate the rest of the pattern. This differs from normal pattern behavior, which is to revisit matches already made in order to find an overall pattern match. An atomic group is written as (?>pattern), where pattern is the actual subpattern. The concept is similar to Possessive Quantifiers.

Backtracking: The algorithm the regular expression engine uses internally to try alternative ways of matching, giving up earlier choices one at a time, until a valid full pattern match is found. Understanding backtracking is mainly necessary to fine-tune pattern performance or to make use of an Atomic Group or a Possessive quantifier. For example, the pattern (\d*+)(\d{3}), which uses the possessive \d*+, would not match the string ‘12345123’: the first group matches as many (read: all) digits in the source and does not let go of any of them, even though the second group would "need" 3 digits to match.

Character Classes: A character class matches any one of its class members. It is constructed by placing the class members within square brackets – either literally like [abcde], or in the form of a range like [a-z]. For example, in the pattern b[ea]d, the [ea] is a character class.
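
A quick check with Python's re module (assumed here for illustration): the class [ea] matches exactly one character, either "e" or "a", so "bead" is not matched:

```python
import re

# b[ea]d: "b", then one character that is "e" or "a", then "d".
found = re.findall(r"b[ea]d", "bed bad bid bead")
assert found == ["bed", "bad"]

# A range such as [a-c] is shorthand for [abc].
assert re.findall(r"[a-c]", "abcdef") == ["a", "b", "c"]
```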

Common Control Characters: Some common control characters are Ctrl-D (end of transmission), Ctrl-H (backspace), Ctrl-I (horizontal tab), Ctrl-K (vertical tab), Ctrl-M (carriage return), Ctrl-X (cancel) and Ctrl-[ (escape).
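
The Ctrl-key names follow caret notation: the control character's ASCII code is the key's ASCII code minus 64. A small Python sketch checks this for the characters listed above and matches each with a character class covering the ASCII control range \x00-\x1F. The helper ctrl() is hypothetical, written only for this illustration:

```python
import re

def ctrl(key: str) -> str:
    """Hypothetical helper: the ASCII control character typed as Ctrl-<key>."""
    return chr(ord(key.upper()) - 64)

# The control characters from the glossary, keyed by their Ctrl-key.
named = {
    "D": "\x04",  # end of transmission (EOT)
    "H": "\b",    # backspace (in a normal Python string, \b is 0x08)
    "I": "\t",    # horizontal tab
    "K": "\v",    # vertical tab
    "M": "\r",    # carriage return
    "X": "\x18",  # cancel (CAN)
    "[": "\x1b",  # escape (ESC)
}

for key, char in named.items():
    assert ctrl(key) == char
    # [\x00-\x1F] is a character class spanning all ASCII control characters.
    assert re.fullmatch(r"[\x00-\x1F]", char)
```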

Greedy: Greediness is a property of Quantifiers. Greedy quantifiers try to match as many characters as possible (as opposed to Lazy ones). By default, all quantifiers, including * and +, are greedy. You can make them behave lazily by appending ? to them.
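
For illustration, assuming Python's re module, where quantifiers are greedy by default: .+ runs to the end of the text and backtracks only as far as needed to match the final ">", and an interval takes its upper bound when it can:

```python
import re

# .+ is greedy, so the match spans both tags.
whole = re.search(r"<.+>", "<a><b>").group()
assert whole == "<a><b>"

# Intervals are greedy too: {2,4} takes four digits when four are available.
assert re.search(r"\d{2,4}", "123456").group() == "1234"
```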

Groups: A part of a pattern that is surrounded by a pair of parentheses (). Groups can also be nested. Text that is matched by the grouped part of the pattern is available separately in the result. Assertions are special forms of groups. Quantifiers after a group apply to the whole group, and not just an individual character.
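
A short sketch using Python's re module (an assumption for illustration): groups are numbered by their opening parenthesis from left to right, so nested groups get their own numbers, and a quantifier after a group repeats the whole group:

```python
import re

# Group 1 is the outer group; groups 2 and 3 are nested inside it.
date = re.fullmatch(r"((\d{4})-(\d{2}))-(\d{2})", "2019-08-01")
assert date.group(1) == "2019-08"
assert date.group(2) == "2019"
assert date.group(3) == "08"
assert date.group(4) == "01"

# (ab)+ repeats the whole "ab" sequence; the group keeps its last repetition.
rep = re.fullmatch(r"(ab)+", "ababab")
assert rep.group(0) == "ababab"
assert rep.group(1) == "ab"
```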

Lazy: Laziness is a property of quantifiers. It means that the quantifier will try to match as few characters as possible. Appending a ? to the quantifier (as in b+?) makes it lazy.
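
The b+? example can be tried in Python's re module (assumed here): the lazy quantifier stops after the first "b", and a lazy .+? stops at the first ">" instead of the last one:

```python
import re

# Greedy b+ takes every "b"; lazy b+? takes only as many as required (one).
assert re.search(r"ab+", "abbb").group() == "abbb"
lazy = re.search(r"ab+?", "abbb").group()
assert lazy == "ab"

# A lazy .+? ends each match at the nearest ">", yielding one match per tag.
tags = re.findall(r"<.+?>", "<a><b>")
assert tags == ["<a>", "<b>"]
```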

Literal: A character in the search pattern that represents itself. For example, the letter c matches a ‘c’.
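
A metacharacter can be turned into a literal by escaping it. As a sketch in Python's re module, "+" is a quantifier unless it is escaped, either by hand or with re.escape():

```python
import re

text = "1+1 11 111"

# Unescaped, "+" is a quantifier: "one or more 1s, followed by a 1".
assert re.findall(r"1+1", text) == ["11", "111"]

# Escaped, "+" is a literal plus sign.
literal = re.findall(re.escape("1+1"), text)
assert literal == ["1+1"]
assert re.findall(r"1\+1", text) == ["1+1"]
```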

Lookahead assertion: An assertion that checks whether its content matches immediately after the current position, without consuming any characters. It effectively works as a custom-defined anchor. An assertion like (?=abc), for example, requires that the text at the current position begin with abc, so the next pattern element must occur at the beginning of an abc sequence.
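
For illustration with Python's re module: (?=px) only allows a match where "px" follows, and because it consumes nothing, the next pattern element can still match that "px":

```python
import re

# Digits count only when "px" follows; "px" itself is not part of the match.
sizes = re.findall(r"\d+(?=px)", "10px 20em 30px")
assert sizes == ["10", "30"]

# The lookahead leaves "px" unconsumed, so the literal px after it still matches.
assert re.search(r"\d+(?=px)px", "20em 30px").group() == "30px"
```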

Lookbehind assertion: Like its companion above, an assertion that does not consume any characters, but it checks whether its content matches immediately before the current position. It also works as a custom-defined anchor. An assertion like (?<=abc), for example, requires that the current position be preceded by abc, so the next pattern element must occur at the end of an abc sequence.
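
A sketch in Python's re module: (?<=\$) requires a dollar sign right before the digits without including it in the match. Note that Python's re only accepts lookbehinds of fixed width; other engines may be more permissive:

```python
import re

# Only digits directly preceded by "$" are matched; the "$" is not included.
prices = re.findall(r"(?<=\$)\d+", "cost $15, qty 3, $7")
assert prices == ["15", "7"]

# A variable-width lookbehind such as (?<=\$+) is rejected by Python's re.
try:
    re.compile(r"(?<=\$+)\d+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
assert not variable_width_ok
```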

Metacharacter: Any character that has a special meaning in a pattern, instead of matching itself, is a metacharacter. For example, the dot (.) usually matches any character except the newline \n.
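
In Python's re module (assumed for this sketch), the dot skips newlines unless the DOTALL flag is set, and an escaped dot matches only a literal ".":

```python
import re

text = "abc a.c a\nc"

# The dot matches any character except \n.
assert re.findall(r"a.c", text) == ["abc", "a.c"]

# With DOTALL, the dot also matches \n.
dotall = re.findall(r"a.c", text, re.DOTALL)
assert dotall == ["abc", "a.c", "a\nc"]

# Escaped, the dot is a literal.
assert re.findall(r"a\.c", text) == ["a.c"]
```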

Possessive: Possessiveness is a property of quantifiers. Possessive quantifiers are Greedy (matching as many characters as possible), but in addition, text matched by a possessive quantifier is fixed in such a way that the match will not be changed when trying to accommodate the rest of the pattern, analogous to the way an Atomic Group works. A quantifier can be made possessive by appending a +, as in \d*+.

Quantifiers: A quantifier in a search pattern specifies how many times the previous pattern element needs to be matched. Quantifiers are + (one or more), * (zero or more), ? (zero or one) and intervals (like {3,5}, between 3 and 5 times). A quantifier can be Greedy, Lazy or Possessive. Unless specified otherwise in the matching options, quantifiers are greedy by default; appending ? makes them lazy, and appending + makes them possessive.
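
The basic quantifiers can be checked quickly in Python's re module (assumed for illustration), using re.fullmatch so that the whole string has to satisfy the count:

```python
import re

# ? : zero or one "u".
spellings = re.findall(r"colou?r", "color colour colouur")
assert spellings == ["color", "colour"]

# * allows zero repetitions, + requires at least one.
assert re.fullmatch(r"ab*", "a")
assert re.fullmatch(r"ab+", "a") is None

# {3,5}: between 3 and 5 digits.
for s, ok in [("12", False), ("123", True), ("12345", True), ("123456", False)]:
    assert (re.fullmatch(r"\d{3,5}", s) is not None) == ok
```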

Regular Expression: A sequence of one or more characters that forms a pattern to search for in a source. Each character can be an Anchor, a Literal, a Metacharacter, or one of the Quantifiers.

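White Space character: Any character that represents horizontal or vertical empty space, like a space, a tab or a line break character, is a whitespace character. Whether the ZERO WIDTH SPACE Unicode character (U+200B) also counts depends on the regular expression engine: Unicode itself classifies it as a format character, not as white space.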
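
As one concrete engine, Python's re module is used here for illustration: \s matches spaces, tabs and line breaks, but not U+200B, which Unicode classifies as a format character. Other engines may treat it differently:

```python
import re

# \s matches the tab, the newline and the space.
spaces = re.findall(r"\s", "a\tb\nc d")
assert spaces == ["\t", "\n", " "]

# Python's \s does not match ZERO WIDTH SPACE (U+200B).
assert re.search(r"\s", "a\u200bb") is None
```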

Word boundary: A word boundary is any location in the source text where one side of that location is a word character and the other side is either a non-word character (usually white space or punctuation) or the start or end of the text. A word boundary can be matched by the anchor \b.
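
For illustration with Python's re module: \bcat\b finds "cat" only as a whole word. In "concat" there is no boundary before "cat", and in "cat_" the underscore is a word character, so there is no boundary after it:

```python
import re

text = "cat concat cat_ cat."

# Start positions of "cat" matched as a whole word.
starts = [m.start() for m in re.finditer(r"\bcat\b", text)]
assert starts == [0, 16]

# \b also matches at the very start and end of the text.
assert re.fullmatch(r"\bcat\b", "cat")
```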

Word Character: A word character is any character which occurs in natural-language words. It includes letters (upper- and lowercase) and digits, and also the underscore character (_). Whitespace, punctuation, the hyphen and other such characters are not considered word characters.
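
In Python's re module (assumed here), \w+ collects runs of word characters, so the hyphen and the exclamation mark split the text while the underscore does not. Whether accented letters count depends on the engine and its flags; Python includes them by default and excludes them with re.ASCII:

```python
import re

tokens = re.findall(r"\w+", "well-known snake_case 42!")
assert tokens == ["well", "known", "snake_case", "42"]

# Python str patterns treat accented letters such as "\u00e9" (e-acute) as word characters...
assert re.findall(r"\w+", "caf\u00e9") == ["caf\u00e9"]
# ...but re.ASCII restricts \w to [a-zA-Z0-9_].
assert re.findall(r"\w+", "caf\u00e9", re.ASCII) == ["caf"]
```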



Document Metadata

Created
08/01/2019 5:04:07 PM by (admin) Administrator
Keywords
help-rext-glossary, ReX-T, regular expression, glossary