Regular expressions (“regexes” for short) are sets of symbols and syntactic elements used to match patterns in text, and they are very powerful. They have been around for a very long time (by computer industry standards) and reached a wide audience through the powerful UNIX search tool grep.

The regex syntax commonly used today is compliant with Extended Regular Expressions (EREs) as defined in IEEE POSIX 1003.2 (Section 2.8). EREs are supported by Apache, PHP 4+, JavaScript 1.3+, MS Visual Studio, MS FrontPage, most visual editors, vi, Emacs, the GNU family of tools (including grep, awk and sed) and many others. ERE engines also cover what Basic Regular Expressions (BREs) can express, since BREs are essentially a subset of EREs. The BRE syntax is considered obsolete and survives only to preserve backward compatibility.

I believe mastering at least the most basic elements of regex is essential for any programmer. Further, I know that having direct access to references, examples and ready-to-use patterns is essential to speed up your work.

This is a toolbox for getting started with and/or becoming more serious about regex. It provides details on commonly needed regexes that you can just pick up and use right away. Let’s get started!

Regex – Getting started

Regular expressions consist of constants and operators that denote sets of strings and operations over these sets, respectively. Regular expressions are unmatched when it comes to crunching text. The formal definition is purposely very compact and avoids redundancy by using quantifiers, for example * (0 or more) and + (1 or more). This means that you can use a+ to match “a”, “aa”, “aaaaaaa” and so on.

If you want to match a or b or c you can use the syntax [abc], where the [ ] means that any one of the characters between the brackets will be matched. Applying [abc] to tripwiremagazine.com matches the two ‘a’s and the ‘c’, but as separate results. Using [a-z] matches every lowercase letter but not the dot, still as separate results: ‘t’, ‘r’, ‘i’ and so on. Adding a +, as in [a-z]+, matches each run of letters as one result until the dot is reached, so the results are tripwiremagazine and com.

Any character used in a regex will match itself in the input text, except for the following metacharacters: .|*?+(){}[]^$\. These characters match themselves only when preceded by a \.
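If you want to try the tripwiremagazine.com examples yourself, here is a minimal sketch in Python, using its standard re module. The variable names are mine and only there for illustration; other engines should give the same results for these simple patterns, but test before relying on that.

```python
import re

text = "tripwiremagazine.com"

# A character class matches exactly one character per result.
single_abc = re.findall(r"[abc]", text)
single_letters = re.findall(r"[a-z]", text)

# Adding + turns each run of consecutive letters into one result.
words = re.findall(r"[a-z]+", text)

# A backslash makes a metacharacter (here the dot) match itself.
literal_dot = re.findall(r"\.", text)

print(single_abc)   # ['a', 'a', 'c']
print(words)        # ['tripwiremagazine', 'com']
print(literal_dot)  # ['.']
```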

Regular expressions rely on a regex engine and, as usual in the software world, different regular expression engines are not fully compatible with each other. (Find more details on regex engines here: Comparison of Regular Expression Engines.) Having said that, most regular expressions are portable if you avoid the custom extensions found in some engines. When porting a regex you should always test it carefully before putting it to serious use. This is also a ground rule if you copy an expression from the Internet: even though it looks good at first glance, don’t trust it to do exactly what you need. Testing is important, and very simple with one of the tools you will find below.

So where are regular expressions useful? Common uses are:

  • Input validation, like validating an e-mail address or a URL in an HTML form before it is submitted
  • URL matching in web servers, for example in Apache .htaccess files
  • Syntax highlighting in editors
  • Grabbing HTML tags
  • Trimming whitespace
  • Matching valid dates
  • Pulling out the code for the first image in an HTML page, used on some blogs to add an image to archive lists
  • Tracking down a killer (just joking)

Regular expressions are really about knowing what syntax to use, and there are many details to memorise. I recommend that you get yourself a cheat sheet. In my opinion a good cheat sheet is essential, and I personally recommend the one Added Bytes provides for free.

I also recommend that you take a look at these references if you’re new to regex or just need more details and examples on specific uses of regex. Regular Expression Reference, Regular Expressions – User guide and Regular Expression Tutorial. And if you’re serious about this and want to put in some more time check the articles, tutorials and screen-casts below.

Besides having a good cheat sheet, introductions and reference charts, it is a really good idea to just try it out and get a feeling for how it works. You need some text to apply your regex to; it could be an e-mail address or whatever you like. If you’re ready to see regex in action before going on, just follow the link here. It will direct you to a very useful Flash-based tool showing in “real time” what a specific regex matches in a piece of text.

Try copying in this rather simplified e-mail matching regex: ([\w.-]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})

You could use this regex in production code, but you should not; use the one provided later in the article instead. It is still a good example of matching complex patterns. Let us pull it apart to see what it does and what it consists of. There are 3 groups, each matching a different part of an e-mail address.
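To see the three groups at work, here is a small Python sketch. It assumes the simplified pattern is meant to be written with \w and an escaped dot (the backslashes are my reading of it), and the sample address is made up for the demonstration.

```python
import re

# Group 1: the part before the @ (word characters, dots and hyphens)
# Group 2: one or more "label." pieces of the domain
# Group 3: a top-level domain of 2 to 4 letters
SIMPLE_EMAIL = re.compile(r"([\w.-]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})")

match = SIMPLE_EMAIL.fullmatch("john.doe@mail.example.com")
local_part, domain, tld = match.groups()

print(local_part)  # john.doe
print(domain)      # mail.example.
print(tld)         # com
```

Note that the second group keeps its trailing dot, and that the {2,4} limit in the third group rejects longer top-level domains, which is one reason this pattern is best kept as a learning example.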

Worth knowing regular expressions

Matching a username

When building a user management system you typically want to limit usernames to a restricted format.

Without regular expressions, this would be a tedious exercise that would involve splitting the string into its component characters and examining each one individually. With regular expressions, it’s as simple as it gets. First, let’s define what is allowed:

  1. Alphanumeric characters (letters and numbers)
  2. The underscore character (_)
  3. Enforce a 3 character minimum and a 14 character maximum length.

Here’s the regular expression matching our criteria. It should be fairly simple to adapt it to your own specific needs.

/^[a-zA-Z0-9_]{3,14}$/
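As a rough sketch of how the username rule could be applied in Python: the function name is hypothetical, and I use re.fullmatch so that the whole string has to satisfy the pattern. Without that (or without ^ and $ anchors) a name that is too long or contains a forbidden character would still be accepted, because a valid piece of it matches.

```python
import re

# Letters, digits and underscore only, 3 to 14 characters in total.
USERNAME = re.compile(r"[a-zA-Z0-9_]{3,14}")

def is_valid_username(name):
    """Return True only if the entire string fits the username rules."""
    return USERNAME.fullmatch(name) is not None

print(is_valid_username("john_doe42"))  # True
print(is_valid_username("ab"))          # False, too short
print(is_valid_username("bad name!"))   # False, space and ! are not allowed
```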

Matching e-mail addresses

Matching an e-mail address is, I think, one of the most discussed regex subjects. It is hard, if possible at all, to build a regex that matches 100% of valid e-mail addresses. Still, it is possible to get very close to 100%, and there are many opinions on how to do it. A really good resource for understanding the complexity of e-mail address matching can be found here. This is where the following regex comes from (it is written in lowercase only, so apply it case-insensitively):

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
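Here is one way this pattern might be used from Python, as a sketch rather than a definitive validator. The function name is mine, and I pass re.IGNORECASE because the pattern itself only lists lowercase letters.

```python
import re

EMAIL = re.compile(
    r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
    r"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?",
    re.IGNORECASE,
)

def looks_like_email(address):
    """Return True if the whole string has the shape of an e-mail address."""
    return EMAIL.fullmatch(address) is not None

print(looks_like_email("john.doe@example.com"))  # True
print(looks_like_email("John.Doe@Example.COM"))  # True
print(looks_like_email("john.doe@example"))      # False, no dot in the domain
```

Keep in mind that a match only says the format looks right; it cannot tell you whether the mailbox exists.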

Regular Expression Matching a Valid Date

Date formats can be hard to get under control. There are so many ways a date can be formatted, and which one is used tends to depend on personal preference and where you are from. Regex can of course be used to validate dates, but it is not a magic bullet. Every date format requires its own regex. Here are a few examples that you can use and extend to fit your needs. Be aware that a regex is not a calendar component: exceptions such as leap years and the differing lengths of months will not be handled.

For yyyy-mm-dd or yyyy/mm/dd use:

(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])

For mm-dd-yyyy or mm/dd/yyyy use:

(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d

For dd-mm-yyyy or dd/mm/yyyy use:

(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)\d\d

More details on matching date with regex can be found here.
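Since the patterns above cannot know about leap years or month lengths, one option is to let the regex check the format and a calendar library check the date itself. The sketch below does this in Python for the yyyy-mm-dd pattern; the function name is hypothetical and the datetime check is my addition, not part of the original pattern.

```python
import re
from datetime import date

# Group 2 captures the separator; \2 forces the second separator to be the same.
ISO_DATE = re.compile(r"(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])")

def is_real_date(text):
    """Check the format with the regex, then the calendar with datetime."""
    match = ISO_DATE.fullmatch(text)
    if match is None:
        return False
    year, month, day = int(text[:4]), int(match.group(3)), int(match.group(4))
    try:
        date(year, month, day)  # raises ValueError for dates like 2023-02-29
    except ValueError:
        return False
    return True

print(is_real_date("2024-02-29"))  # True, 2024 is a leap year
print(is_real_date("2023-02-29"))  # False, the regex alone would accept this
print(is_real_date("2023-02/28"))  # False, mixed separators
```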

Matching Time

Just as with dates, time formats are not always the same, but matching a valid time is not as hard as matching a valid date. This regex is a bit open-ended, as it matches both 12-hour times with AM/PM and 24-hour times. If you need to restrict input, for example in a form, to 24-hour format only, this regex is not the right one.

^((0?[1-9]|1[012])(:[0-5]\d){0,2}(\ [AP]M))$|^([01]\d|2[0-3])(:[0-5]\d){0,2}$
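A quick way to see which inputs the time pattern accepts is to run a few samples through it. This is a minimal Python sketch with a function name of my own choosing.

```python
import re

# First alternative: 12-hour time followed by " AM" or " PM".
# Second alternative: 24-hour time with a two-digit hour.
TIME = re.compile(
    r"^((0?[1-9]|1[012])(:[0-5]\d){0,2}(\ [AP]M))$|^([01]\d|2[0-3])(:[0-5]\d){0,2}$"
)

def is_time(text):
    return TIME.fullmatch(text) is not None

print(is_time("9:30 PM"))   # True, 12-hour format
print(is_time("23:59:59"))  # True, 24-hour format with seconds
print(is_time("25:00"))     # False, there is no hour 25
print(is_time("13:00 PM"))  # False, 13 is not a 12-hour value
```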

Matching an IP Address (by Sean Schricker)

Checking for a valid IP-Address is useful but can be done in many ways. Here is a very compact one.

\b(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}([01]?\d?\d|2[0-4]\d|25[0-5])\b
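The IP address pattern can be used both to validate a single value and to pull addresses out of a longer text. Here is a sketch of both uses in Python; the two function names are hypothetical.

```python
import re

IPV4 = re.compile(r"\b(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}([01]?\d?\d|2[0-4]\d|25[0-5])\b")

def is_ipv4(text):
    """Validate: the whole string must be one dotted-quad address."""
    return IPV4.fullmatch(text) is not None

def find_ipv4(text):
    """Extract: return every address found inside a longer text."""
    return [match.group(0) for match in IPV4.finditer(text)]

print(is_ipv4("192.168.1.1"))  # True
print(is_ipv4("999.1.1.1"))    # False, each number must be 0 to 255
print(find_ipv4("gateway 10.0.0.1, dns 8.8.8.8"))  # ['10.0.0.1', '8.8.8.8']
```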

Matching Whole Lines of Text

When matching a set of characters on a line, only the matched pattern is returned. If you’re looking for the existence of xyz, all instances of xyz will be returned (if the global flag is on; otherwise only the first instance). If you want the whole line where a match has been found, you can get it very simply: put the wildcard “.” followed by * before and after the pattern to be matched, and anchor the expression with ^ and $, which match the beginning and the end of a string (or of each line, when the engine’s multi-line mode is on). For xyz that gives ^.*xyz.*$. If you want to dig even deeper into this subject you can find more details here.
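To illustrate, here is a short Python sketch that returns every line containing xyz. The sample text is invented, and re.MULTILINE is what makes ^ and $ work per line instead of per string in this engine.

```python
import re

text = "first line\nfoo xyz bar\nxyz\nlast line"

# ^ and $ anchor to line starts and ends because of re.MULTILINE;
# .* on both sides pulls in the rest of the line around xyz.
lines_with_xyz = re.findall(r"^.*xyz.*$", text, re.MULTILINE)

print(lines_with_xyz)  # ['foo xyz bar', 'xyz']
```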

Matching Numeric Ranges with a Regular Expression

Because regular expressions deal with text and know nothing about number systems, matching numbers in a specific range isn’t as simple as you may think. The regex [0-255] won’t match a number between 0 and 255; it is a character class that matches a single character, namely 0, 1, 2 or 5. A regular expression engine treats “0” as a single character, and “255” as three characters. To match all numbers from 0 to 255, you need a regex that matches between one and three characters. Matching 0-9 is simple and can be achieved with [0-9]. The same is true of 10-99, which is matched by the regex [1-9][0-9]. If you continue this way, defining a regex for each number range, you will soon get there.

This is the regex that does the job:

[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]

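You may want to use word boundaries “\b…\b” or anchors “^…$” to define the context of the search; wrap the alternatives in parentheses first, so that the boundary or anchor applies to the whole expression rather than only to the first or last alternative. This could be used, for example, to avoid matching 123 and 234 inside 123456.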
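The sketch below shows the difference in Python: what [0-255] really matches, and how the range regex behaves once it is grouped and wrapped in word boundaries. The names are mine, and the (?: ) non-capturing group is my choice for the grouping.

```python
import re

# [0-255] is a character class: the range 0-2 plus the digit 5.
print(re.findall(r"[0-255]", "0123456789"))  # ['0', '1', '2', '5']

# The range regex, grouped so that \b applies to every alternative.
RANGE_0_255 = r"(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"

found = re.findall(r"\b" + RANGE_0_255 + r"\b", "7 42 199 255 256 123456")
print(found)  # ['7', '42', '199', '255']

def in_range(text):
    """Return True if the whole string is a number from 0 to 255."""
    return re.fullmatch(RANGE_0_255, text) is not None
```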

Matching a phone number (by James Burton)

Validating phone numbers is often required. It is not possible to know whether a phone number with a valid format really is the right number, but getting rid of completely wrong inputs, or helping users who made a typo, is the aim here. This regex supports international country codes.

^(\(?\+?[0-9]*\)?)?[0-9_\- \(\)]*$
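Here is a rough Python sketch of the phone check. Every part of this pattern is optional, so it also accepts an empty string; the extra non-empty test in the hypothetical function below is my own addition to cover that.

```python
import re

PHONE = re.compile(r"^(\(?\+?[0-9]*\)?)?[0-9_\- \(\)]*$")

def is_plausible_phone(text):
    """Reject empty input, then check the characters against the pattern."""
    return bool(text) and PHONE.fullmatch(text) is not None

print(is_plausible_phone("+45 1234 5678"))  # True
print(is_plausible_phone("(+1) 555-0100"))  # True
print(is_plausible_phone("call me"))        # False
print(is_plausible_phone(""))               # False, thanks to the extra check
```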

Find html page title

This regex will find the text between the <title> and </title> tags of an HTML page.

<title>(.*)</title>
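In Python the title pattern could be used as sketched below. The function name is hypothetical, and the two flags are assumptions on my part: re.IGNORECASE so that <TITLE> is found too, and re.DOTALL so that a title spread over several lines is still captured.

```python
import re

TITLE = re.compile(r"<title>(.*)</title>", re.IGNORECASE | re.DOTALL)

def page_title(html):
    """Return the text inside the title tags, or None if there is none."""
    match = TITLE.search(html)
    return match.group(1).strip() if match else None

print(page_title("<html><head><TITLE>\n My page </TITLE></head></html>"))  # My page
print(page_title("<p>no title here</p>"))  # None
```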

Matching an XHTML/XML tag

Matching an XML or XHTML tag can be extremely useful if you’re scraping a website for data, or trying to quickly extract information from an XML document. This regex can do just that if you replace “tag” with the tag you’re searching for, for example input or form.

<tag[^>]*>(.*?)</tag>
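A small Python sketch of the tag pattern with the tag name filled in at run time follows. The function name is mine, and I have added a \b after the tag name, which the pattern above does not have, so that searching for p does not also pick up <pre>. It will not cope with nested tags of the same name.

```python
import re

def tag_contents(html, tag):
    """Return the inner text of every <tag ...>...</tag> pair found."""
    name = re.escape(tag)
    # \b after the name is an addition to the pattern shown above.
    pattern = re.compile(r"<%s\b[^>]*>(.*?)</%s>" % (name, name),
                         re.IGNORECASE | re.DOTALL)
    return pattern.findall(html)

html = '<p class="a">one</p><pre>code</pre><p>two</p>'
print(tag_contents(html, "p"))    # ['one', 'two']
print(tag_contents(html, "pre"))  # ['code']
```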

Matching an XHTML/XML tag with a certain attribute value

Extending the previous regex a little lets you match only tags with a specific attribute value. This is great, for example, for pulling out all instances of a <div> with a specific CSS class. Replace tag, attribute and value as illustrated in the example below.

<tag[^>]*attribute\s*=\s*(['\"])value\1[^>]*>(.*?)</tag>
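Here is how the attribute version might look in Python, as a sketch under the same assumptions as before: the function name is hypothetical, re.escape is used so that the three values are taken literally, and the \b after the tag name is my addition. The \1 backreference makes sure the closing quote is the same kind as the opening one.

```python
import re

def tags_with_attribute(html, tag, attribute, value):
    """Return the inner text of tags whose attribute equals the given value."""
    pattern = re.compile(
        r"<%s\b[^>]*%s\s*=\s*(['\"])%s\1[^>]*>(.*?)</%s>"
        % (re.escape(tag), re.escape(attribute), re.escape(value), re.escape(tag)),
        re.IGNORECASE | re.DOTALL,
    )
    # Group 1 is the quote character, group 2 is the tag content.
    return [match.group(2) for match in pattern.finditer(html)]

html = '<div class="note">keep</div><div class="other">skip</div>'
print(tags_with_attribute(html, "div", "class", "note"))  # ['keep']
```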

Get more ready to use regular expressions:

Regular Expressions That Are Often Needed in Practice
Dozens of useful regex-patterns that are often used in programming of web-applications.

RegExLib.com
The Internet’s first regular expression library. Complete with 2,511 expressions from over 1,500 contributors. You can search and find nearly any pattern matching snippet that you might need for a web project.

Tutorials and screen-casts

Screen-casts

Regular Expressions for Dummies
An introductory screencast with a quiz at the end to see what you’ve learned.

Regex for Dummies: Day 2
Build off of the first ThemeForest screencast by learning about matching.

A Crash-Course In Regular Expressions
An introductory crash-course by Jeffrey Way. A little bit outdated, but still useful tutorial that shows how to use regular expressions to check if an e-mail is valid or not. “To a novice web developer, regular expressions look like the most scary thing on the planet. Who could possibly dismantle such a block of code and decipher its meaning? Luckily, its bark is much worse than its bite. You’ll quickly find that regular expressions are rather straight-forward and easy to understand – once you learn the syntax.”

Regular expressions (the series)
A 5-part series on the basics of regular expressions.

Articles

MSDN’s Introduction to Regular Expressions (Scripting)
These sections introduce the concept of regular expressions and explain how to create and use them.

Demystifying Regular Expressions
In this article a simple usage of regular expressions is described. Its intention is to bring users to try the most powerful search and replace paradigm available and hopefully start using it.

Regular Expression Quickstart
A primer for grasping some of the basics of regex, pieced together in an easy-to-read format.

Using Regular Expressions with PHP
A brief overview of how to use regex syntax with PHP.

PHP Freaks: Regular Expressions
A detailed introduction to the basics of regular expressions; the article also describes regex concepts such as metacharacters, greediness, lazy match, pattern modifiers and others.

Introductory Guide to Regular Expressions
A quick guide to the basics of spotting patterns in regex, complete with a simple example of a JavaScript regular expression with forms.

The Joy of Regular Expressions [1]
This Sitepoint tutorial uses simple examples that don’t include incoherent demo strings like “aabbcc” to show how regex really works. The article covers all of the core concepts like exact matching, positive matching, pattern modifiers and more.

The Joy of Regular Expressions [2]
This second regex tutorial by Sitepoint provides plenty of useful examples, like how to find images with .jpg extensions, and even how to find XSS security holes in your code with regex.

Demystifying Regular Expressions
Regular expressions on the surface appear pretty complex. Not only does the language look rather odd, but it also requires logic beyond just following protocols. This article helps to take away some of the stigma some might have with regex in an easy-to-follow guide with examples.

PHP Regular Expression Examples
Many different code examples for possible uses of regular expressions with PHP. A few that might be helpful: processing credit cards, dates, email addresses, and many more.

Know Your Regular Expressions
IBM has an excellent write-up on how to use regular expressions across UNIX applications.

Regex Tools

RegexBuddy

RegexBuddy is a commercial product, but if you’re new to regex it may be worth the spend. This software is built in a way that I think makes it great for learning to use regex. Take a look at this demo and see how an e-mail regex is built from scratch and how the tool helps out.

RegExr

Great online Flash-based regex test tool. I really like it because it works on the entire test text and updates the result in “real time”. There is an Adobe AIR version as well and it is quite hot!

Flex 3 Regular Expression Explorer

This tool provides popular regular expressions submitted by the community and also lets you try out a regular expression on a test input.

Regular Expression Tester Firefox Plugin

This Firefox plugin offers developers functions for testing their regular expressions. The tool includes options like case sensitive, global and multi line search, color highlighting of found expressions and of special characters, a replacement function incl. backreferences, auto-closing of brackets, testing while writing and saving and managing of expressions.

The Regex Coach

A cross-platform downloadable tool that teaches you about regular expressions in an interactive environment, all from your desktop.

Rubular

An online regular expression tester for the Ruby language.

Notepad++

Notepad++ is much more than just a replacement for Notepad. It has a lot of features, such as syntax highlighting, syntax folding, auto-completion, a multi-document tab view, full drag and drop support, zoom in and out, bookmarks, macro recording and a powerful search feature. Users can define their own custom syntax highlighting, and this is my favorite feature of Notepad++. It supports search and replace with regular expressions.

Expresso

Expresso is a free award winning regular expression development tool. You can build complex regular expressions by selecting components from a palette and test expressions against real or sample input data. The tool can generate Visual Basic, C#, or C++ code and displays all matches in a tree structure, showing captured groups, and all captures within a group. You can also maintain and expand a library of frequently used regular expressions and use a builder and an analyzer to create and test your expressions. Registration is required.
