Ok, first off, regular expressions are a great feature of Perl, but they can also be found in other languages and environments, such as Linux shells, or PHP... I'm guessing most people would be interested in using regular expressions in PHP, but the examples in this tutorial are written in Perl.
Let's start with a simple matching regular expression:
my $var = "This is some text here that we need to search.";
if ($var =~ m/th/) {
print "I found a th\n";
}
In the example above, the regular expression will locate a 'th' in 'that', and not in 'This', because matching is case sensitive. If we wanted case insensitive matching, we would write this:
my $var = "This is some text here that we need to search.";
if ($var =~ m/th/i) {
print "I found a th\n";
}
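To check which 'th' each version actually finds, here is a small runnable sketch. It uses Perl's built-in @- array, where $-[0] is the offset at which the last successful match started; the variable names $case_sensitive_at and $case_insensitive_at are only illustrative.

```perl
use strict;
use warnings;

my $var = "This is some text here that we need to search.";

# $-[0] is the offset where the last successful match began.
my $case_sensitive_at;
if ($var =~ m/th/) {
    $case_sensitive_at = $-[0];
    print "case sensitive: 'th' found at offset $case_sensitive_at\n";
}

my $case_insensitive_at;
if ($var =~ m/th/i) {
    $case_insensitive_at = $-[0];
    print "case insensitive: 'th' found at offset $case_insensitive_at\n";
}
```

The first match starts at offset 23 (the "th" of "that"), the second at offset 0 (the "Th" of "This").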
And now, the match will be found at 'This'. Regular expressions always find the match that starts earliest in the text, and by default the repetition operators (covered below) then make that match as long as possible.
Summary of regular expression matching:
m/search_text/ - Find search_text anywhere in the text
m/^search_text/ - Match search_text, but only at the beginning of the text. The ^ anchor does this
m/search_text$/ - Match search_text, but only at the end of the text (a trailing newline is allowed). The $ anchor does this
m/^search_text$/ - Match search_text, but only if it's the entire text
m/search_text/i - Match search_text, case insensitive
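The summary can be tried out with a short script. This is only a sketch: the sample strings and the %result hash are made up for illustration, and each pattern is stored with qr// so it can be reused in a loop.

```perl
use strict;
use warnings;

# Each row: a label for printing, and the compiled pattern.
my @patterns = (
    [ 'm/world/'   => qr/world/   ],
    [ 'm/^world/'  => qr/^world/  ],
    [ 'm/world$/'  => qr/world$/  ],
    [ 'm/^world$/' => qr/^world$/ ],
    [ 'm/WORLD/i'  => qr/WORLD/i  ],
);

my @texts = ("hello world", "world peace", "world");

my %result;    # $result{label}{text} = 1 if matched, 0 otherwise
for my $p (@patterns) {
    my ($label, $re) = @$p;
    for my $text (@texts) {
        $result{$label}{$text} = ($text =~ $re) ? 1 : 0;
        printf "%-12s %-14s %s\n", $label, "\"$text\"",
            $result{$label}{$text} ? "match" : "no match";
    }
}
```

Only "world" matches all five patterns; "hello world" fails m/^world/ and "world peace" fails m/world$/.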
Of course, if regular expressions were this easy, there wouldn't be a need for a tutorial. Regular expressions are quite powerful, and can describe almost any pattern in text. For example, these wildcards (special characters and escape sequences) can be used in regular expressions:
. - Match any character (except a newline)
\w - Match a word character (alphanumeric characters and "_")
\W - Match a non-word character
\s - Match a whitespace character
\S - Match a non-whitespace character
\d - Match a digit character
\D - Match a non-digit character
\t - Match tab
\n - Match newline
\r - Match carriage return
\f - Match formfeed
\a - Match alarm character (bell)
\e - Match escape
\045 - Octal character match; in this case, it's 45 octal (the "%" character)
\x6f - Hexadecimal character match; in this case, it's 6F hex (the letter "o"). \x takes at most two hex digits, so for a larger value such as 6FA hex, use braces: \x{6fa}
Also, combined with these wildcards, you can use repetition operators:
* - Match 0 or more times
+ - Match 1 or more times
? - Match 0 or 1 times
{n} - Match exactly n times
{n,} - Match at least n times
{n,m} - Match at least n, but not more than m times
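Here is a sketch that puts several of the wildcards and repetition operators together on one made-up line of text. It uses list assignment, my ($x) = $line =~ /(...)/, to pull out what the parentheses matched; parentheses are explained further down under character groups and selections.

```perl
use strict;
use warnings;

my $line = "Order 66\tshipped to Galahad_2 on 2024-05-17";

# \d+ : one or more digits (the first run of digits in the string)
my ($first_number) = $line =~ /(\d+)/;              # "66"

# \s+ then \w+ : whitespace, then one or more word characters
my ($name) = $line =~ /to\s+(\w+)/;                 # "Galahad_2"

# {n} : exactly 4, 2 and 2 digits separated by "-"
my ($date) = $line =~ /(\d{4}-\d{2}-\d{2})/;        # "2024-05-17"

# \t matches the tab character between "66" and "shipped"
my $has_tab = ($line =~ /66\tshipped/) ? 1 : 0;

# Octal and hex escapes: \045 is "%", \x6f is "o" (so \x6f{2} is "oo")
my $percent_ok = ("100%" =~ /\d+\045/) ? 1 : 0;
my $hex_ok     = ("foo"  =~ /^f\x6f{2}$/) ? 1 : 0;

print "$first_number | $name | $date | $has_tab | $percent_ok | $hex_ok\n";
```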
Ok, I'll add a few examples of what we've covered so far:
$var =~ m/\+\d{1,3}\ \(\d{1,3}\)\ \d{3,4}-\d{3,4}/; # This example will match a telephone number in the following format +381 (21) 123-456 or +381 (21) 1234-567
$var =~ m/^Hello/; # This example will match "Hello, world", but not " Hello, world" or "hello, world", because search is case sensitive, and requires the line to begin with Hello
$var =~ m/galahad/i; # This line would match wherever Galahad, galahad, GALAHAD (or any other capitalization) is found in the text; the search is case insensitive
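To convince yourself that these three patterns behave as the comments say, a small test table helps. This is a sketch: the patterns are copied from the examples above into qr// objects, and the extra non-matching strings are made up.

```perl
use strict;
use warnings;

# The three example patterns, wrapped in qr// so they can be reused.
my $phone_re   = qr/\+\d{1,3}\ \(\d{1,3}\)\ \d{3,4}-\d{3,4}/;
my $hello_re   = qr/^Hello/;
my $galahad_re = qr/galahad/i;

# Each row: pattern, text, expected result (1 = match, 0 = no match)
my @checks = (
    [ $phone_re,   "+381 (21) 123-456",    1 ],
    [ $phone_re,   "+381 (21) 1234-567",   1 ],
    [ $phone_re,   "381 21 123-456",       0 ],   # no "+" or parentheses
    [ $hello_re,   "Hello, world",         1 ],
    [ $hello_re,   " Hello, world",        0 ],   # leading space
    [ $hello_re,   "hello, world",         0 ],   # wrong case
    [ $galahad_re, "Sir GALAHAD the Pure", 1 ],
);

my $failures = 0;
for my $c (@checks) {
    my ($re, $text, $expected) = @$c;
    my $got = ($text =~ $re) ? 1 : 0;
    $failures++ if $got != $expected;
    printf "%-24s %s\n", "\"$text\"", $got ? "match" : "no match";
}
print "failures: $failures\n";
```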
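Also, note how I escaped the +, the parentheses, and (to be safe) the spaces. The + and parentheses have special meanings in regular expressions, and escaping them makes regular expressions treat them as plain text; a plain space already matches itself, so escaping it is harmless but optional. You escape a character with a backslash (\). Since / marks the start and end of the pattern in m/.../, slashes inside the pattern have to be escaped too.
Ok, we're halfway there... Now we go on to character groups and character classes...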
What character groups do is allow alternative phrases to be used: the alternatives go inside parentheses, separated by |. In the next example, it would be a match if we had a Susan, Marie, or Jennifer in the text:
$var =~ m/(Susan|Marie|Jennifer)/;
Character groups also let you retrieve the text they matched: after a successful match, it is placed in the scalars $1, $2, ... But I will cover that a bit later.
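A small sketch of a character group in action, with made-up sample strings. Note that when more than one name appears, the match is the one that starts earliest in the text, not the one listed first in the group.

```perl
use strict;
use warnings;

my @found;
for my $var ("Tea with Marie at five", "Nobody here", "Jennifer and Susan") {
    if ($var =~ m/(Susan|Marie|Jennifer)/) {
        # $1 holds the alternative that matched at the leftmost position
        push @found, $1;
        print "\"$var\": matched $1\n";
    } else {
        push @found, undef;
        print "\"$var\": no match\n";
    }
}
```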
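Character classes allow for character ranges. For example, this short line would match if we have a name starting with a letter from A through N:
$var =~ m/^[A-N]/;
A character class always matches exactly one character, and one character only. So the following will NOT match names from "Ab" through "Ne":
$var =~ m/^[Ab-Ne]/;
Inside the brackets Perl reads A, the range b-N, and e as single characters, and since b comes after N, Perl rejects b-N as an invalid range.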
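The difference between a class and a group can be checked directly. In this sketch the names are made up; the bad [Ab-Ne] pattern is built from a string and compiled inside eval, so the error is caught at run time instead of stopping the whole script.

```perl
use strict;
use warnings;

my @names = ("Alice", "Nina", "Oscar", "Zoe", "betty");

# [A-N] matches exactly ONE character: here, the first letter.
my @a_to_n = grep { /^[A-N]/ } @names;

# To choose between multi-character prefixes, use a group with | instead.
my @ab_or_ne = grep { /^(Ab|Ne)/ } ("Abel", "Ned", "Adam");

# [Ab-Ne] contains the range b-N, where b sorts after N, so it fails to compile.
my $pattern  = '^[Ab-Ne]';
my $compiled = eval { my $re = qr/$pattern/; 1 } ? 1 : 0;

print "A-N: @a_to_n\n";                                  # Alice Nina
print "Ab or Ne: @ab_or_ne\n";                           # Abel Ned
print "[Ab-Ne] compiles: ", ($compiled ? "yes" : "no"), "\n";
```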
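Character classes can seem a bit quirky because of this, so remember the rule: a class picks one character from a set, while a character group picks one of several whole alternatives, which is almost always what you need for words or multi-character sequences. And now off to: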
Selections AKA Parsing
Ok, we established that regular expressions are a mighty thing, but so far, they don't do anything spectacular. I mentioned character groups a bit earlier, and mentioned they can be used to retrieve selections. And here's how, and where regular expressions excel and get very useful.
Say we have a phone number +381 (21) 123-456. Country code is 381, and area code is 21. And let's say we need all these in separate variables. Here's what we would do:
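my $phone = "+381 (21) 123-456";
$phone =~ /\+(.+)\ \((.+)\)\ (.+)/;
my $country = $1; # $country will contain 381
my $area = $2; # $area will contain 21
my $num = $3; # $num will contain 123-456
Pretty powerful, huh? This is probably the best thing about regular expressions... Each pair of parentheses captures the text it matched, numbered from left to right, into $1, $2, $3. Only use them after a match has succeeded, because a failed match leaves them holding the values from the previous successful match.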
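A common Perl idiom, shown here as an alternative sketch rather than a replacement for the code above, is to assign the match to a list: on success you get ($1, $2, $3) directly, and on failure an empty list. The stricter pattern (\d+ and [\d-]+ instead of .+) is an assumption about what counts as a valid number, not part of the original example.

```perl
use strict;
use warnings;

# Stricter variant of the pattern: digits for the codes, digits and "-" for the number.
my $phone_re = qr/^\+(\d+)\ \((\d+)\)\ ([\d-]+)$/;

my $phone = "+381 (21) 123-456";

# In list context a successful match returns ($1, $2, $3) directly...
my ($country, $area, $num) = $phone =~ $phone_re;
print "country=$country area=$area number=$num\n";

# ...and a failed match returns an empty list, so stale $1/$2/$3 are never used.
my @parts = "call me maybe" =~ $phone_re;
print "parts found: ", scalar(@parts), "\n";
```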
And one more thing you can do with regular expressions is...
Substitutions:
These are quite simple to master:
my $var = "Trap17 sucks";
$var =~ s/sucks/rules/; # $var now contains "Trap17 rules"
Other things to note:
- If you want to make your search case insensitive, just add an i at the end of the regular expression, e.g. m/match/i
- If you want to replace every match rather than just the first one, add a g at the end of the regular expression, e.g. s/to_replace/replacer/g
- You can combine i and g, and have s/to_replace/replacer/gi, or s/blank//gi; the last one replaces all occurrences of blank (in any capitalization) with nothing ("")
- =~ binds a string to a match or substitution; in a condition it means "matches"
- !~ means "does not match"
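The notes above can be checked in one go. This sketch uses a made-up sentence; it also relies on the fact that s/// returns the number of substitutions it made, which is handy for counting.

```perl
use strict;
use warnings;

my $text = "Blank line, blank page, BLANK stare";

# Without /g only the first match is replaced; s/// returns 1 on success.
my $once   = $text;
my $n_once = ($once =~ s/blank/empty/i);

# With /gi every match is replaced, whatever its case; the result is the count.
my $all   = $text;
my $n_all = ($all =~ s/blank/empty/gi);

# s/blank//gi deletes every occurrence (copy first, then modify the copy).
(my $deleted = $text) =~ s/blank//gi;

# !~ is true when the string does NOT match.
my $no_digits = ($text !~ /\d/) ? 1 : 0;

print "$once\n$all\n$deleted\n";
print "replaced once: $n_once, replaced all: $n_all, no digits: $no_digits\n";
```

Running it prints "empty line, blank page, BLANK stare" for the single replacement and "empty line, empty page, empty stare" for the /gi version.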
And voila, you now have sufficient knowledge to make rather powerful regular expressions, and incorporate them in your PHP scripts, or Perl scripts, or wherever. I hope you found this tutorial useful. Also, don't hesitate to experiment with regular expressions, because that's the best way to learn. And of course, don't hesitate to ask questions if any of this was unclear...