Perl Regular Expressions (Regex) - Learn how to use regular expressions

 
 Discussion by Galahad with 2 Replies.
 Last Update: July 18, 2008, 6:53 pm
 

I've searched the Tutorials section but haven't found a RegEx tutorial, so I thought I'd add one, since regular expressions are very useful to know if you're a programmer... If I overlooked an existing RegEx tutorial, my bad; just delete this one...

OK, first off, regular expressions are a great feature of Perl, but they can also be found in other languages and environments, such as Linux shells or PHP (PHP's preg_* functions use Perl-compatible syntax)... I'm guessing most people here would be interested in using regular expressions in PHP...

Let's start with a simple matching regular expression:

CODE

my $var = "This is some text here that we need to search.";
if ($var =~ m/th/) {
    print "I found a th\n";
}

In the example above, the regular expression finds the 'th' in 'that' and not the 'Th' in 'This', because matching is case sensitive. If we wanted case-insensitive matching, we would add the i modifier:

CODE

my $var = "This is some text here that we need to search.";
if ($var =~ m/th/i) {
    print "I found a th\n";
}

Now the match is found at 'This'. A regular expression always returns the leftmost match (the one that starts earliest in the string), and quantifiers such as * and + are greedy by default, so they make the match as long as possible.
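Here's a small sketch of both rules, reusing the sentence from the examples above. It relies on Perl's built-in @- array, whose first element holds the offset where the last successful match started; the (s.*e) pattern is just a made-up illustration of a greedy match:

```perl
use strict;
use warnings;

my $var = "This is some text here that we need to search.";

# Leftmost: with /i the first "th" is the "Th" at the very start of "This".
my $start = -1;
if ($var =~ m/th/i) {
    $start = $-[0];    # $-[0] is the offset where the last successful match began
}
print "Leftmost match starts at offset $start\n";    # offset 0

# Greedy: .* grabs as much as it can, so the match runs from the first "s"
# all the way to the last "e" in the string (the one in "search").
my ($greedy) = $var =~ m/(s.*e)/;
print "Greedy match: '$greedy'\n";    # 's is some text here that we need to se'
```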

Summary of regular expression matching

CODE

m/search_text/ - Find search_text anywhere in the string
m/^search_text/ - Match search_text, but only at the beginning of the string. The ^ anchor does this
m/search_text$/ - Match search_text, but only at the end of the string (a trailing newline is allowed). The $ anchor does this
m/^search_text$/ - Match search_text, but only if it is the entire string
m/search_text/i - Match search_text, case insensitive
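To see the anchors and the i modifier side by side, the sketch below runs all five patterns from the summary against a few made-up strings and records which ones match:

```perl
use strict;
use warnings;

# Made-up test strings; each one is checked against the five patterns above.
my @lines = ("search_text", "search_text here", "find search_text", "SEARCH_TEXT");

my %hits;
for my $line (@lines) {
    my @h;
    push @h, 'm/search_text/'   if $line =~ m/search_text/;
    push @h, 'm/^search_text/'  if $line =~ m/^search_text/;
    push @h, 'm/search_text$/'  if $line =~ m/search_text$/;
    push @h, 'm/^search_text$/' if $line =~ m/^search_text$/;
    push @h, 'm/search_text/i'  if $line =~ m/search_text/i;
    $hits{$line} = \@h;    # remember which patterns matched this string
    printf "%-20s %s\n", "'$line'", join(', ', @h) || 'no match';
}
```

Only "search_text" itself matches all five, while "SEARCH_TEXT" matches only the case-insensitive version.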


Of course, if regular expressions were this easy, there wouldn't be a need for a tutorial. Regular expressions are quite powerful because they can describe patterns instead of fixed text. For example, these special characters and escape sequences can be used in a regular expression to match whole classes of characters:

CODE

. - Match any character except newline
\w - Match a word character (alphanumeric or "_")
\W - Match a non-word character
\s - Match a whitespace character
\S - Match a non-whitespace character
\d - Match a digit character
\D - Match a non-digit character
\t - Match tab
\n - Match newline
\r - Match carriage return
\f - Match form feed
\a - Match alarm character (bell)
\e - Match escape
\045 - Octal character match; in this case, it's 45 octal (the % character)
\x6f - Hexadecimal character match; in this case, it's 6F hex (the letter o). \x takes at most two hex digits, so use braces for longer codes, e.g. \x{6FA}
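Here's a short sketch that tries several of these escapes on one made-up string. In it, \045 is the octal escape for % and \x6f is the hex escape for the letter o:

```perl
use strict;
use warnings;

my $text = "Order #42\tshipped 100% on time";    # made-up sample, contains a tab

my ($digits) = $text =~ m/(\d+)/;         # first run of digits: "42"
my $has_tab  = $text =~ m/\t/ ? 1 : 0;    # 1: the string contains a tab
my ($word)   = $text =~ m/(\w+)/;         # first run of word characters: "Order"
my $percent  = $text =~ m/\045/ ? 1 : 0;  # \045 is octal for "%"
my $has_o    = $text =~ m/\x6f/ ? 1 : 0;  # \x6f is hex for lowercase "o"

print "digits=$digits tab=$has_tab word=$word percent=$percent o=$has_o\n";
```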

Also, combined with these, you can use repetition operators (quantifiers), which apply to the item right before them:

CODE

* - Match 0 or more times
+ - Match 1 or more times
? - Match 0 or 1 times
{n} - Match exactly n times
{n,} - Match at least n times
{n,m} - Match at least n, but not more than m times
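A quick sketch of two of these quantifiers, using made-up strings. Anchoring with ^ and $ makes {3,4} check the whole string rather than any part of it:

```perl
use strict;
use warnings;

# Which of these strings consist of exactly 3 to 4 digits?
my @candidates = ("12", "123", "1234", "12345");
my @ok = grep { $_ =~ m/^\d{3,4}$/ } @candidates;
print "3-4 digits: @ok\n";           # 123 1234

# ? makes the preceding item optional, so "colou?r" accepts both spellings.
my @spellings = grep { $_ =~ m/^colou?r$/ } ("color", "colour", "colouur");
print "Spellings: @spellings\n";     # color colour
```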


OK, here are a few examples using what we've covered so far:

CODE

$var =~ m/\+\d{1,3}\ \(\d{1,3}\)\ \d{3,4}-\d{3,4}/; # This example will match a telephone number in the following format +381 (21) 123-456 or +381 (21) 1234-567
$var =~ m/^Hello/; # This example will match "Hello, world", but not " Hello, world" or "hello, world", because search is case sensitive, and requires the line to begin with Hello
$var =~ m/galahad/i; # This line would match wherever a Galahad or galahad is found in text; search is case insensitive

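Also, note how I escaped the + and the parentheses with a backslash (\). These characters have a special meaning in regular expressions, and escaping them makes the regular expression treat them as plain text. Escaping the space is harmless but not strictly necessary (spaces only become special with the /x modifier). A forward slash also has to be escaped when / is the delimiter, as in m/.../.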
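Escaping by hand gets tedious when the literal text comes from a variable. As a sketch (the price string and the script path are made up), Perl's built-in quotemeta function and the equivalent \Q...\E inside a pattern backslash every non-word character for you, and choosing a different delimiter such as m{...} means slashes don't need escaping at all:

```perl
use strict;
use warnings;

my $price = '+1.50 (USD)';                     # made-up literal to search for
my $text  = 'Total: +1.50 (USD) incl. tax';

# \Q...\E escapes +, ., ( and ) so they are matched literally.
my $found = $text =~ m/\Q$price\E/ ? 1 : 0;

# quotemeta does the same escaping and returns the escaped string.
my $escaped = quotemeta('a+b');                # 'a\+b'

# With m{...} as the delimiter, slashes in the pattern need no backslash.
my $is_cgi = '/cgi-bin/filter.pl' =~ m{^/cgi-bin/} ? 1 : 0;

print "found=$found escaped=$escaped is_cgi=$is_cgi\n";
```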

OK, we're halfway there... Now we go on to character groups and character classes...
A group with the | (alternation) operator lets you match one of several alternative phrases. In the next example, there is a match if the text contains Susan, Marie, or Jennifer:

CODE

$var =~ m/(Susan|Marie|Jennifer)/;

Parentheses also capture the text they matched and place it in the special variables $1, $2, ... (one per group, counting opening parentheses from the left). But I will cover that a bit later.
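As a quick preview, here's the alternation run against a few made-up sentences; after each successful match, $1 holds whichever name actually matched:

```perl
use strict;
use warnings;

# Made-up sentences to run through the Susan|Marie|Jennifer alternation.
my @texts = ("Dinner with Marie tonight", "Call Jennifer", "Meeting with Bob");

# $1 holds the alternative that matched; 'none' marks a failed match.
my @found = map { $_ =~ m/(Susan|Marie|Jennifer)/ ? $1 : 'none' } @texts;

print "$texts[$_] -> $found[$_]\n" for 0 .. $#texts;
```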

Character classes, written in square brackets, match one character out of a set or a range. For example, this short line matches if the text starts with a capital letter from A through N:

CODE

$var =~ m/^[A-N]/;
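Here's a small sketch that filters a made-up list of names with that class, once as written and once with the i modifier so lowercase initials count too:

```perl
use strict;
use warnings;

my @names = ("Alice", "Nora", "Oscar", "bob", "Zoe");    # made-up names

# [A-N] matches exactly one character: any capital letter from A through N.
my @a_to_n = grep { $_ =~ m/^[A-N]/ } @names;
print "A-N: @a_to_n\n";                  # Alice Nora

# With /i the same class also accepts lowercase a through n.
my @a_to_n_ci = grep { $_ =~ m/^[A-N]/i } @names;
print "A-N (any case): @a_to_n_ci\n";    # Alice Nora bob
```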

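A character class always matches exactly one character, so it can't describe a range of multi-character strings. The following will NOT match names from "Ab" through "Ne"; Perl actually refuses to compile it with an "Invalid [] range" error, because b-N is read as a one-character range and b comes after N: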

CODE

$var =~ m/^[Ab-Ne]/;


Character classes have a few quirks (for example, ^ right after the opening [ negates the class, and - marks a range), so when the alternatives are longer than one character, use a group with | instead. And now off to:

Selections AKA Parsing

OK, we've established that regular expressions are a mighty thing, but so far they don't do anything spectacular. I mentioned earlier that groups can be used to retrieve parts of the match. Here's how, and this is where regular expressions excel and become very useful.

Say we have a phone number +381 (21) 123-456. The country code is 381, and the area code is 21. And let's say we need all of these in separate variables. Here's what we would do:

CODE

my $phone = "+381 (21) 123-456";
if ($phone =~ /\+(.+)\ \((.+)\)\ (.+)/) { # only use $1, $2, $3 after a successful match
    my $country = $1; # $country will contain 381
    my $area    = $2; # $area will contain 21
    my $num     = $3; # $num will contain 123-456
}
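Another way to do the same thing, sketched below: in list context a match returns the captured groups directly (or an empty list if it fails), so you can assign them all in one go. The tighter pattern with \d+ and [\d-]+ is my own variation, not the one above:

```perl
use strict;
use warnings;

my $phone = "+381 (21) 123-456";

# List context: the captures come back as a list, or () if the match fails.
my ($country, $area, $num) = $phone =~ /^\+(\d+) \((\d+)\) ([\d-]+)$/;

if (defined $country) {
    print "country=$country area=$area number=$num\n";
} else {
    print "Not a phone number in the expected format\n";
}
```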

Pretty powerful, huh? This is probably the best thing about regular expressions...

And one more thing you can do with regular expressions is...

Substitutions:

These are quite simple to master:

CODE

my $var = "Trap17 sucks";
$var =~ s/sucks/rules/; # $var now contains "Trap17 rules"
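A couple more things worth knowing, shown in a small sketch: s/// returns how many replacements it made, and the replacement text can reuse captured groups such as $1. The date string is just a made-up example:

```perl
use strict;
use warnings;

my $var   = "Trap17 sucks";
my $count = ($var =~ s/sucks/rules/);    # s/// returns the number of replacements made
print "$var ($count replacement)\n";     # Trap17 rules (1 replacement)

# Captured groups can be reused in the replacement text.
my $date = "2008-07-18";
(my $us_date = $date) =~ s/(\d+)-(\d+)-(\d+)/$2\/$3\/$1/;    # copy, then rewrite the copy
print "$us_date\n";                      # 07/18/2008
```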


Other things to note:
- If you want to make your search case insensitive, just add an i at the end of the regular expression, e.g. m/match/i
- If you want to change all instances of a word, add a g at the end of the regular expression, e.g. s/to_replace/replacer/g
- You can combine i and g, as in s/to_replace/replacer/gi or s/blank//gi; the last one replaces all occurrences of blank (in any case) with nothing ("")
- =~ binds a string to a match or substitution; a match is true if the pattern is found
- !~ is the negated form; it is true if the pattern is NOT found
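Putting the g and i modifiers and !~ together in one sketch (the sample sentence is made up):

```perl
use strict;
use warnings;

my $text = "Blank line, blank page, BLANK stare";    # made-up sample text

# /g replaces every occurrence and /i ignores case; with /g,
# s/// returns the number of replacements it made.
my $cleaned = $text;
my $removed = ($cleaned =~ s/blank //gi);
print "$cleaned\n";             # line, page, stare
print "Removed: $removed\n";    # 3

# !~ is true when the string does NOT match the pattern.
my $no_digits = ($text !~ m/\d/) ? 1 : 0;
print "No digits here\n" if $no_digits;
```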

And voila, you now know enough to write rather powerful regular expressions and use them in your Perl scripts, PHP scripts, or wherever. I hope you found this tutorial useful. Don't hesitate to experiment with regular expressions, because that's the best way to learn. And of course, don't hesitate to ask questions if any of this was unclear...

Sun Jul 13, 2008

Awesome tutorial! I love regex!
Best thing to use when you fetch remote content :)

In Perl it is a little harder than in PHP, in my opinion, but this tutorial makes it so clear and easy!

Thanks!

Mon Jul 14, 2008

You're very welcome...

Perl can be scary by itself; I know I was frustrated with how it works... It's completely different from conventional programming languages, but then again, it's the same... If you catch my drift... And regular expressions can be particularly scary and frustrating... I still haven't got the full hang of them, but every day I get to know Perl a little better... It's particularly useful to know Perl because here at Trap17 we have full CGI support, so we can write Perl scripts that do complex tasks... I'm planning a series of Perl tutorials, from how to make a CGI script and connect to a MySQL database, to some other stuff... Right now, I'm working on a primitive junk mail filter... And thanks to RegEx, it's a breeze... It's not a smart filter and it can't learn, but with a few good rules, whitelists, and blacklists, it gets the job done much better and quicker than, for example, SpamAssassin... I suppose I could post it here and make a tutorial out of it... There's an idea :) I always like to help beginners get the hang of something new and avoid the stuff that made me cry with frustration and anger :D

Fri Jul 18, 2008

