Regular expressions: a short class
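In PHP, regular expressions have come in two flavours:
- POSIX
- PERL (PCRE).
Here we will use the PERL syntax with the preg_match() function.
If we wanted to use the POSIX syntax, we would use the eregi() function. Note that the POSIX ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0, so they only work on older installations.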
Comparing the same expression
PERL syntax:
#^\/articles\/([^.\/ ]+)[\/]*$#
Posix syntax:
^\/articles\/([^.\/ ]+)[\/]*$
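As a rough sketch of what the # characters in the PERL form do, we can run the same expression through preg_match() with different delimiters. The sample path and the helper name matchesPath() are made up for illustration. The POSIX form is not exercised here because eregi() is not available in current PHP versions.

```php
<?php
// Hypothetical helper: true when $pattern matches $path
// (preg_match() returns 1 on a match, 0 otherwise).
function matchesPath(string $pattern, string $path): bool
{
    return preg_match($pattern, $path) === 1;
}

// The pattern from the text, delimited by #.
$hashDelimited = '#^\/articles\/([^.\/ ]+)[\/]*$#';
// The same regex delimited by /: here every literal / must be escaped.
$slashDelimited = '/^\/articles\/([^.\/ ]+)[\/]*$/';
// With # as the delimiter, the backslashes before / are optional.
$unescaped = '#^/articles/([^./ ]+)[/]*$#';

$path = '/articles/my-first-post/'; // made-up sample path

foreach ([$hashDelimited, $slashDelimited, $unescaped] as $pattern) {
    echo $pattern, ' => ', matchesPath($pattern, $path) ? 'match' : 'no match', "\n";
}
```

All three patterns should behave the same way; only the delimiter and the amount of escaping differ.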
Special characters:
char | meaning
---|---
^ | beginning of string
$ | end of string
. | any character except newline
* | match 0 or more times
+ | match 1 or more times
? | match 0 or 1 times; or: shortest match
\| | alternative
( ) | grouping; “storing”
[ ] | set of characters
{ } | repetition modifier
\ | quote or special
\t | tab
\n | newline
\r | return (CR)
\xhh | character with hex. code hh
\b | “word” boundary
\B | not a “word” boundary
\w | matches any single character classified as a “word” character (alphanumeric or “_”)
\W | matches any non-“word” character
\s | matches any whitespace character (space, tab, newline)
\S | matches any non-whitespace character
\d | matches any digit character, equiv. to [0-9]
\D | matches any non-digit character
a* | zero or more a’s
a+ | one or more a’s
a? | zero or one a’s (i.e., optional a)
a{m} | exactly m a’s
a{m,} | at least m a’s
a{m,n} | at least m but at most n a’s
repetition? | same as the repetition (any of the quantifiers above), but the shortest match is taken
[characters] | matches any of the characters in the sequence
[x-y] | matches any of the characters from x to y (inclusively) in the ASCII code
[-] | matches the hyphen character “-”
[\n] | matches the newline; other single character denotations with \ apply normally, too
[^something] | matches any character except those that [something] denotes; that is, immediately after the leading “[”, the circumflex “^” means “not” applied to all of the rest
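To make a few rows of the table concrete, here is a small sketch that tries some of the special characters with preg_match(). The helper tableMatch() and all the subject strings are made up for this illustration.

```php
<?php
// Hypothetical helper: true when $pattern matches somewhere in $subject.
function tableMatch(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(tableMatch('#^\d{2,4}$#', '2024'));            // \d with {m,n}: true
var_dump(tableMatch('#^\d{2,4}$#', '20245'));           // five digits: false
var_dump(tableMatch('#^[a-f]+$#', 'cafe'));             // [x-y] range: true
var_dump(tableMatch('#^[^0-9]+$#', 'abc1'));            // [^...] negated set: false
var_dump(tableMatch('#^(cat|dog)$#', 'dog'));           // | inside ( ): true
var_dump(tableMatch('#\bart\b#', 'state of the art'));  // \b word boundary: true
var_dump(tableMatch('#\bart\b#', 'articles'));          // "art" is not a whole word: false
var_dump(tableMatch('#^\w+\s\w+$#', 'hello world'));    // \w and \s: true
```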
Understanding the regex by example:
#^\/articles\/([^.\/ ]+)[\/]*$#
^ and $ = respectively the beginning and the end of the string that we match.
\ = the escape character (so \/ actually means the / character).
[\/]*$ = we may have / characters at the end, but we also may not.
If it read [\/]+$ instead, we would need one or more / characters at the end.
This is because:
- * matches the preceding pattern zero or more times.
- + matches the preceding pattern one or more times.
There is also:
- ? matches the preceding pattern zero or one time.
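The difference between the three quantifiers can be sketched by swapping the one after [\/] and trying paths with zero, one and two trailing slashes. The helper name trailingSlashMatch() and the sample paths are assumptions made for this example.

```php
<?php
// Hypothetical helper: builds the pattern from the text with the chosen
// quantifier after [\/] and reports whether $path matches it.
function trailingSlashMatch(string $quantifier, string $path): bool
{
    $pattern = '#^\/articles\/([^.\/ ]+)[\/]' . $quantifier . '$#';
    return preg_match($pattern, $path) === 1;
}

// Made-up paths with zero, one and two trailing slashes.
$paths = ['/articles/foo', '/articles/foo/', '/articles/foo//'];

foreach (['*', '+', '?'] as $quantifier) {
    foreach ($paths as $path) {
        printf(
            "[\\/]%s  %-16s %s\n",
            $quantifier,
            $path,
            trailingSlashMatch($quantifier, $path) ? 'match' : 'no match'
        );
    }
}
```

With *, all three paths match; with +, only the paths that end in at least one / match; with ?, the path with two trailing slashes is rejected.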
At the very beginning we should have the /articles/ text: ^\/articles\/
Everything inside the () parentheses is captured as a sub-match.
[^.\/ ]+ = one or more characters, none of which is a dot, a / or a “ “ (space).
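Putting the pieces together, here is a minimal sketch of how the captured part could be read back from preg_match(). The function name articleSlug() and the sample paths are hypothetical; the pattern is the one discussed above.

```php
<?php
// Hypothetical helper: returns the part captured after /articles/,
// or null when the path does not match the pattern.
function articleSlug(string $path): ?string
{
    if (preg_match('#^\/articles\/([^.\/ ]+)[\/]*$#', $path, $matches) === 1) {
        // Index 0 holds the whole match, index 1 the first ( ) group.
        return $matches[1];
    }
    return null;
}

// Made-up sample paths.
foreach (['/articles/hello-world/', '/articles/hello.html', '/articles/a/b', '/articles/hello world'] as $path) {
    echo $path, ' => ', articleSlug($path) ?? 'no match', "\n";
}
```

Only the first path yields a slug (hello-world); the others contain a dot, an inner / or a space, which the [^.\/ ]+ class rejects.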
To recap the meta-characters:
char | meaning
---|---
. | any character (except newline by default)
* | zero or more of the preceding
+ | one or more of the preceding
{} | minimum to maximum quantifier
? | zero or one of the preceding; after another quantifier, the ungreedy modifier
! | not a meta-character by itself in PERL syntax; a leading ! meaning “negative pattern” is an Apache mod_rewrite convention
^ | start of string, or “negative” if at the start of a range
$ | end of string
[] | match any of contents
- | range if used between square brackets
() | group, referenced group
\| | alternative, or
\ | the escape character itself
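The ungreedy use of ? is the least obvious entry in the recap, so here is a small sketch comparing a greedy and an ungreedy match. The helper firstMatch() and the sample text are made up for illustration.

```php
<?php
// Hypothetical helper: returns the first substring matched by $pattern
// in $subject, or null when nothing matches.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $matches) === 1 ? $matches[0] : null;
}

$subject = '<b>one</b> and <b>two</b>'; // made-up sample text

// Greedy: .* grabs as much as possible, up to the last </b>.
echo firstMatch('#<b>.*</b>#', $subject), "\n";  // <b>one</b> and <b>two</b>

// Ungreedy: .*? stops at the first </b> it can reach.
echo firstMatch('#<b>.*?</b>#', $subject), "\n"; // <b>one</b>
```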
…