If you want to create filters, perform searches or set up goals in Angelfish Software, you need an understanding of regular expressions. This article is a basic introduction.
Regular expressions (also known as regex) are used to find text that matches a specific pattern. For example, you can find all pages within a subdirectory, or all keywords more than ten characters long.
Regular expressions provide a powerful and flexible way to describe what the pattern should look like, using a combination of special and alphanumeric characters. Here is a list of commonly used regex characters in Angelfish:
^ Caret: Match from the beginning of the field
$ Dollar: Match to the end of the field
. Period: Match any single character
| Pipe: OR
* Asterisk: Match zero or more of the previous item
? Question Mark: Match zero or one of the previous item (i.e. makes the item optional)
[] Brackets: Match one item in this list
() Parentheses: Match contents of parentheses as a single item
+ Plus Sign: Match one or more of the previous item
\ Backslash: Escape symbol for any of the above characters
Anchors match a specified pattern from the beginning or at the end of a field. The caret and dollar symbols are anchors.
The caret symbol "^" matches a pattern from the beginning. For example:
^car will match "car", "carpet" and "cartoon".
It won't match "scar", "red carpet" or "new cars"
The dollar symbol "$" matches a pattern to the end of the field. For example:
car$ will match "car", "scar" and "red car".
It won't match "cars", "carpet" or "cartoon"
You can also combine a caret and dollar in a single pattern:
^car$ will only match "car", not "cars" or "scar"
^$ will match only empty strings
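The anchor examples above can be checked with a short script. This is a minimal sketch that uses Python's re module as a stand-in, on the assumption that it treats "^" and "$" the same way Angelfish does for single-line fields; the field values and the matching() helper are made up for illustration.

```python
import re

# Hypothetical field values, taken from the examples above.
fields = ["car", "carpet", "cartoon", "scar", "red carpet",
          "new cars", "red car", "cars", ""]

def matching(pattern, values):
    """Return the values in which the pattern is found."""
    return [v for v in values if re.search(pattern, v)]

starts_with_car = matching(r"^car", fields)   # anchored at the beginning
ends_with_car = matching(r"car$", fields)     # anchored at the end
exactly_car = matching(r"^car$", fields)      # anchored at both ends
empty_only = matching(r"^$", fields)          # nothing between the anchors

print(starts_with_car)  # ['car', 'carpet', 'cartoon', 'cars']
print(ends_with_car)    # ['car', 'scar', 'red car']
print(exactly_car)      # ['car']
print(empty_only)       # ['']
```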
Regex can also be used to match ranges or combinations of characters.
Brackets "[]" allow you to specify individual characters that appear in the string. Brackets look at each individual character, not whole words.
[agf] will match "a", "g" or "f".
[fish] will match "f" "i" "s" or "h"
You can include a long list of characters in brackets, but it's usually easier to match a range of characters. For example:
[a-z] will match any lowercase letter
[0-9] will match any single digit
[a-z0-9] will match any lowercase letter or digit.
[a-dx-z] will match a, b, c, d, x, y, or z.
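To see that brackets work on single characters rather than whole words, here is a small sketch in Python's re module (used here only as a stand-in; Angelfish's own engine may differ in details). The first_match() helper and the sample strings are hypothetical.

```python
import re

def first_match(pattern, text):
    """Return the first piece of text the pattern matches, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# [agf] matches one character from the list, wherever it first appears.
g_in_dog = first_match(r"[agf]", "dog")        # 'g'

# [fish] is not the word "fish": it matches a single f, i, s or h.
s_in_shark = first_match(r"[fish]", "shark")   # 's'
none_in_cat = first_match(r"[fish]", "cat")    # None

# Ranges match one character at a time as well.
digits = re.findall(r"[0-9]", "page 42")            # ['4', '2']
letters = re.findall(r"[a-dx-z]", "abcxyz-mno")     # ['a', 'b', 'c', 'x', 'y', 'z']
```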
Parentheses "()" allow you to match a specific string of characters, like (cat) or (dog). To match multiple strings, enclose them in parentheses and use a pipe "|" between each string. For example:
To match "cat" or "dog", type (cat)|(dog) OR (cat|dog)
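Both ways of writing the pattern select the same values, which a quick sketch can confirm. Python's re module is used as a stand-in for illustration, and the sample values are hypothetical. Note that without anchors the pattern also matches longer strings that merely contain "cat" or "dog".

```python
import re

# Hypothetical sample values.
values = ["cat", "dog", "catalog", "hotdog", "bird"]

separate_groups = [v for v in values if re.search(r"(cat)|(dog)", v)]
single_group = [v for v in values if re.search(r"(cat|dog)", v)]

# Combining the group with anchors restricts the match to the whole field.
exact = [v for v in values if re.search(r"^(cat|dog)$", v)]

print(separate_groups)  # ['cat', 'dog', 'catalog', 'hotdog']
print(single_group)     # ['cat', 'dog', 'catalog', 'hotdog']
print(exact)            # ['cat', 'dog']
```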
The period "." will match any single character. For example:
car.s will match "carrs", "car?s", "car5s", etc.
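A period always stands for exactly one character, so car.s needs something between the "r" and the "s". The sketch below illustrates this with Python's re module as a stand-in; the candidate strings are made up.

```python
import re

# Hypothetical candidates.
candidates = ["carrs", "car?s", "car5s", "carts", "cars", "car"]

matched = [c for c in candidates if re.search(r"car.s", c)]

# "cars" and "car" are left out: there is no character for the period to consume.
print(matched)  # ['carrs', 'car?s', 'car5s', 'carts']
```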
With Regex, you can specify the number of times a pattern should occur.
A question mark "?" after a character will match zero or one occurrence of the character. For example:
crawl? matches "crawl" or "craw".
(www\.)?website\.com matches "www.website.com" or "website.com"
A plus sign "+" matches one or more occurrences. For example:
a+ will match "a", "aa", "aaaaaaaaaa", etc.
/+ will match "/", "//", "////////", etc.
.+ is a wildcard that will only match if the field is not empty
An asterisk "*" will match any number of occurrences (including zero). For example:
a* will match all of the "a+" examples above, and will also match an empty string.
.* is a wildcard that will match an empty or non-empty field.
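The difference between "?", "+" and "*" is easiest to see side by side. The following is a minimal sketch in Python's re module, used as a stand-in for Angelfish's engine; the full() helper is hypothetical and uses re.fullmatch so that the whole value must fit the pattern, as if it were wrapped in ^ and $.

```python
import re

def full(pattern, text):
    """True if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

# "?" : zero or one of the previous item
optional_l = [full(r"crawl?", s) for s in ["craw", "crawl", "crawll"]]
optional_www = [full(r"(www\.)?website\.com", s)
                for s in ["website.com", "www.website.com"]]

# "+" : one or more, so an empty value does not match
plus_results = [full(r"a+", s) for s in ["", "a", "aaa"]]

# "*" : zero or more, so an empty value does match
star_results = [full(r"a*", s) for s in ["", "a", "aaa"]]

# The two wildcards differ only on the empty field.
non_empty_wildcard = full(r".+", "")
any_wildcard = full(r".*", "")

print(optional_l)     # [True, True, False]
print(optional_www)   # [True, True]
print(plus_results)   # [False, True, True]
print(star_results)   # [True, True, True]
print(non_empty_wildcard, any_wildcard)  # False True
```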
Escaping Special Characters
Occasionally you'll want to match a character that is also a regex special character. For example:
.com will match "website.com" and will also match "marcom.net"
The backslash "\" allows you to escape the value of a regex special character. Using the above example, \.com would match "website.com" and ignore "marcom.net".
If you want to match a series of special characters in a row, you need to escape each one individually.
To match "$?", you would type \$\?
And since a backslash itself is a special character, you need to type two backslashes (\\) to match a single literal backslash.
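If you're unsure whether a punctuation character is a special character or not, you can escape it without any negative consequences. Avoid escaping letters or digits, though: in many regex dialects a backslash followed by a letter or digit has its own meaning (for example, \d often matches any digit).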
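The escaping rules above can be tried out with a short script. This sketch uses Python's re module as a stand-in and assumes it handles the backslash the same way as Angelfish; the sample strings are hypothetical. re.escape is a Python helper, not an Angelfish feature, and is shown only to confirm which characters need a backslash.

```python
import re

# Unescaped, the period matches any character, so "rcom" in "marcom.net" fits.
unescaped_hit = re.search(r".com", "marcom.net") is not None    # True

# Escaped, the period must be a literal ".".
escaped_miss = re.search(r"\.com", "marcom.net") is not None    # False
escaped_hit = re.search(r"\.com", "website.com") is not None    # True

# Each special character in a series is escaped individually.
dollar_question = re.search(r"\$\?", "price: $?") is not None   # True

# Two backslashes in the pattern match one literal backslash.
backslash_hit = re.search(r"\\", "C:\\logs") is not None        # True

# Python can generate the escaped form of a literal string.
escaped_pattern = re.escape("$?")
print(escaped_pattern)  # \$\?
```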