Regular Expressions Class

PHP supports two flavors of regular expressions:

  • POSIX
  • Perl-compatible (PCRE)

Here we will use the Perl-compatible preg_match() function. For POSIX syntax we would use the ereg() function or its case-insensitive variant eregi(); note that these were deprecated in PHP 5.3 and removed in PHP 7.

Comparing the same expression

Perl syntax (the pattern is wrapped in delimiters, here #):

#^\/articles\/([^.\/ ]+)[\/]*$#

POSIX syntax (no delimiters):

^\/articles\/([^.\/ ]+)[\/]*$
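As a minimal sketch, the Perl-style pattern above might be passed to preg_match() like this. The variable names and sample paths are assumptions made up for illustration; preg_match() returns 1 when the pattern matches and 0 when it does not.

```php
<?php
// '#' is the delimiter, so '/' would not strictly need escaping;
// the escaped form from the text is kept unchanged.
$pattern = '#^\/articles\/([^.\/ ]+)[\/]*$#';

// Hypothetical sample paths, chosen only for illustration.
$matchesArticle = preg_match($pattern, '/articles/php-regex');  // 1
$matchesOther   = preg_match($pattern, '/news/php-regex');      // 0

var_dump($matchesArticle, $matchesOther);
```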

Special characters:

char meaning  
^ beginning of string  
$ end of string  
. any character except newline  
* match 0 or more times  
+ match 1 or more times  
? match 0 or 1 times; or: shortest match  
| alternative  
( ) grouping; “storing”  
[ ] set of characters  
{ } repetition modifier  
\ escape: quotes the next metacharacter, or introduces a special sequence  
\t tab  
\n newline  
\r return (CR)  
\xhh character with hex. code hh  
\b “word” boundary  
\B not a “word” boundary  
\w matches any single character classified as a “word” character (alphanumeric or “_”)  
\W matches any non-“word” character  
\s matches any whitespace character (space, tab, newline)  
\S matches any non-whitespace character  
\d matches any digit character, equiv. to [0-9]  
\D matches any non-digit character  
a* zero or more a’s  
a+ one or more a’s  
a? zero or one a’s (i.e., optional a)  
a{m} exactly m a’s  
a{m,} at least m a’s  
a{m,n} at least m but at most n a’s  
repetition? same as repetition, but the shortest match is taken  
[characters] matches any of the characters in the sequence  
[x-y] matches any of the characters from x to y (inclusively) in the ASCII code  
[-] matches the hyphen character “-”  
[\n] matches the newline; other single character denotations with \ apply normally, too  
[^something] matches any character except those that [something] denotes; that is, immediately after the leading “[”, the circumflex “^” means “not” applied to all of the rest
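A few rows of the table can be tried out directly. The following is only a sketch with made-up sample strings; it shows \d, \w, a counted repetition, and the difference between a greedy repetition and its shortest-match form.

```php
<?php
// \d{4} = exactly four digits; \w+ = one or more "word" characters.
$isYear = preg_match('/^\d{4}$/', '2024');        // 1
$isWord = preg_match('/^\w+$/', 'hello_world');   // 1
$hasGap = preg_match('/^\w+$/', 'hello world');   // 0: a space is not \w

// Greedy vs. shortest match: a trailing ? makes the repetition lazy.
preg_match('/a{2,4}/', 'aaaaaa', $greedy);   // $greedy[0] is 'aaaa'
preg_match('/a{2,4}?/', 'aaaaaa', $lazy);    // $lazy[0] is 'aa'
```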

Example: understanding the regex #^\/articles\/([^.\/ ]+)[\/]*$#

^ and $ = the beginning and the end, respectively, of the string we are matching against.

\ = the escape character (here \/ means a literal / character).

[\/]*$ = the string may end with zero or more / characters, so a trailing slash is optional.

If it read [\/]+$ instead, the string would have to end with one or more / characters.

This is because * matches the preceding pattern zero or more times, while + matches the preceding pattern one or more times.

There is also ?, which matches the preceding pattern zero or one time.
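The effect of the three quantifiers on the trailing slash can be compared side by side. This is a sketch only: the [\/]? variant and the sample paths are not from the original example and are added here for illustration.

```php
<?php
$zeroOrMore = '#^\/articles\/([^.\/ ]+)[\/]*$#';  // * : any number of trailing slashes
$oneOrMore  = '#^\/articles\/([^.\/ ]+)[\/]+$#';  // + : at least one trailing slash
$zeroOrOne  = '#^\/articles\/([^.\/ ]+)[\/]?$#';  // ? : at most one trailing slash

$results = [];
foreach (['/articles/intro', '/articles/intro/', '/articles/intro//'] as $path) {
    // Each entry holds the results for *, + and ? in that order.
    $results[$path] = [
        preg_match($zeroOrMore, $path),
        preg_match($oneOrMore, $path),
        preg_match($zeroOrOne, $path),
    ];
}

print_r($results);
```

With no trailing slash only * and ? match, with exactly one slash all three match, and with two slashes only * and + match.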

The string must begin with the text /articles/: ^\/articles\/

Everything inside the () parentheses is captured as a match group. [^.\/ ]+ = one or more characters, none of which is a dot, a slash, or a space.
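To see what the parentheses capture, preg_match() can be given a third argument that receives the matches. The sketch below assumes a few sample paths invented for illustration; index 0 of the array holds the whole match and index 1 the first group.

```php
<?php
$pattern = '#^\/articles\/([^.\/ ]+)[\/]*$#';

// $m[0] is the whole match, $m[1] the text captured by the parentheses.
$found = preg_match($pattern, '/articles/php-regex/', $m);
$slug  = $found ? $m[1] : null;   // 'php-regex'

// A dot, a space, or a further path segment makes the match fail.
$withDot   = preg_match($pattern, '/articles/page.html');  // 0
$withSpace = preg_match($pattern, '/articles/my page');    // 0
$nested    = preg_match($pattern, '/articles/a/b');        // 0
```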

To recap meta-characters:

char meaning  
. any character except newline  
* zero or more of the preceding  
+ one or more of the preceding  
{} minimum to maximum quantifier  
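? zero or one of the preceding; after another quantifier, the ungreedy (shortest-match) modifier  
! not a metacharacter on its own; it negates inside a lookahead such as (?!...), and some tools (for example Apache mod_rewrite) accept a leading ! to mean “negative pattern”  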
^ start of string, or “negative” if at the start of a range  
$ end of string  
[] match any of contents  
- range if used between square brackets  
() group, referenced group  
| alternative, or  
\ the escape character itself  
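Three of the recapped meta-characters (alternation, a negated set, and a range) can be sketched as follows. The patterns and sample strings are assumptions chosen for illustration, not part of the original example.

```php
<?php
// | alternative: the first path segment may be either word.
$isSection = preg_match('#^/(articles|news)/#', '/news/today', $m);  // $m[1] is 'news'

// ^ at the start of a set negates it: [^0-9] is any non-digit.
$digitsOnly = preg_replace('/[^0-9]/', '', 'tel: 555-0199');  // '5550199'

// - between two characters inside brackets forms a range.
$isHex = preg_match('/^[0-9a-f]+$/', 'c0ffee');  // 1
```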
