A nice tutorial can be found here:
http://www.regular-expressions.info/tutorialcnt.html
Short rundown:
Character Classes/Sets:
- [ae] matches a or e
- [a-z] matches any character in the range a..z
- [0-9] matches any digit in the range 0..9
- [^u] matches all chars except the character u
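- \w stands for [A-Za-z0-9_] (Unicode-aware engines also include non-ASCII letters and digits)
- \s stands for whitespace: [ \t\r\n\f\v] (Unicode-aware engines also include other space characters)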
- \d stands for [0-9]
- \W stands for [^\w]
- \D stands for [^\d]
- Inside a class the only metacharacters are ], \, ^ and -; everything else (., *, +, ?, $ etc.) is literal there and doesn't need escaping
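A quick check of these classes with Python's re module. Python is my choice here, since the post doesn't tie itself to one engine. Note that for str patterns in Python 3, \w and \d are Unicode-aware by default, so they match more than the ASCII ranges above unless re.ASCII is passed.

```python
import re

text = "Order 66: ship to Lyon_2"

lower = re.findall(r"[a-z]", "Ab1c")          # ['b', 'c']  (lowercase only)
no_u = re.findall(r"[^u]", "uxu")             # ['x']       (anything but 'u')
digits = re.findall(r"\d+", text)             # ['66', '2']
words = re.findall(r"\w+", text)              # ['Order', '66', 'ship', 'to', 'Lyon_2']
spaces = re.findall(r"\s", "a\tb\fc")         # ['\t', '\f'] (\s covers \f too)

# Inside a class '.', '*' and '+' are plain characters
literals = re.findall(r"[.*+]", "a.b*c+d")    # ['.', '*', '+']
# A hyphen placed last in the class is literal as well
hyphen = re.findall(r"[a-c-]", "a-d")         # ['a', '-']
```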
The Dot:
- Matches any single character except \n by default; the SingleLine option lifts this restriction so the dot matches \n too
- USE NEGATED CHARACTER SETS INSTEAD OF THE DOT WHENEVER POSSIBLE: they state exactly what may appear, so the match can't run past where you meant it to stop
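A small Python sketch of both points. I'm assuming Python's re.DOTALL as the equivalent of the SingleLine option mentioned above (that option name is .NET's). The HTML snippet is just an illustrative input.

```python
import re

text = "line one\nline two"

# By default the dot refuses to match the newline ...
default = re.search(r"one.line", text)                # None
# ... re.DOTALL lets it match \n as well
dotall = re.search(r"one.line", text, re.DOTALL)      # 'one\nline'

# Greedy dot vs negated set: the dot overshoots to the LAST quote
html = '<a href="x.html" title="home">'
dot_attr = re.search(r'"(.*)"', html).group(1)        # 'x.html" title="home'
neg_attr = re.search(r'"([^"]*)"', html).group(1)     # 'x.html'
```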
Anchors:
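- ^ matches at the start of the string
- $ matches at the end of the string (in many engines, e.g. .NET and Python, also just before a final \n)
- By default these two only anchor to the whole string; with the MultiLine option they also match right after and right before every \n, i.e. at the start and end of each line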
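The same behaviour in Python, assuming re.MULTILINE as the counterpart of the MultiLine option:

```python
import re

text = "first line\nsecond line\nthird line"

whole = re.findall(r"^\w+", text)                    # ['first']
starts = re.findall(r"^\w+", text, re.MULTILINE)     # ['first', 'second', 'third']
ends = re.findall(r"\w+$", text, re.MULTILINE)       # ['line', 'line', 'line']

# Python's $ also matches just before a trailing newline
trailing = re.search(r"end$", "the end\n")           # matches 'end'
```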
Word Boundaries:
- \bword\b matches only the standalone "word" in "I went on wordpress to write a word", not the "word" inside "wordpress"
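Checking that example in Python by printing where each match starts:

```python
import re

text = "I went on wordpress to write a word"

bounded = [m.start() for m in re.finditer(r"\bword\b", text)]   # [31]
unbounded = [m.start() for m in re.finditer(r"word", text)]     # [10, 31]
```

Without the \b anchors the "word" at the start of "wordpress" (index 10) is matched too.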
Alternation:
- cat|dog matches cat or dog, whichever occurs first in the string
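A Python sketch of "whichever occurs first". The second part shows a related detail: when both alternatives could match at the same position, the first one listed wins. The Get/GetValue strings are just illustrative.

```python
import re

first = re.search(r"cat|dog", "my dog chased a cat").group()         # 'dog'

# Same start position: the engine settles for the first alternative that works
eager = re.search(r"Get|GetValue", "GetValue").group()               # 'Get'
# A word boundary forces it to try the longer alternative
bounded = re.search(r"\b(?:Get|GetValue)\b", "GetValue").group()     # 'GetValue'
```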
Optional Items:
- colou?r matches both colour and color
- Applying Feb 23(rd)? to the string "Today is Feb 23rd, 2003", the match will always be "Feb 23rd" and not "Feb 23", because the question mark is greedy. You can make it lazy (i.e. turn off the greediness) by putting a second question mark after the first: Feb 23(rd)?? matches "Feb 23"
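Both optional-item examples run through Python:

```python
import re

text = "Today is Feb 23rd, 2003"

greedy = re.search(r"Feb 23(rd)?", text).group()     # 'Feb 23rd'
lazy = re.search(r"Feb 23(rd)??", text).group()      # 'Feb 23'

# colou?r accepts both spellings but nothing else
spelling = [w for w in ["color", "colour", "colr"] if re.fullmatch(r"colou?r", w)]
# ['color', 'colour']
```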
Repetition:
- u+ matches one or more u
- u* matches zero or more u
- \b[1-9][0-9]{3}\b matches numbers between 1000 and 9999
- \b[1-9][0-9]{2,4}\b matches numbers between 100 and 99999
- These quantifiers are greedy; append ? to make them lazy (u+?, u*?, {2,4}?)
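Trying the repetition patterns in Python on a made-up list of numbers, plus a greedy vs lazy comparison on a tiny HTML string:

```python
import re

plus = re.findall(r"u+", "uuxu")                 # ['uu', 'u']
star = re.findall(r"xu*", "x xu xuuu")           # ['x', 'xu', 'xuuu']

numbers = "7 42 100 2003 99999 123456"
four_digit = re.findall(r"\b[1-9][0-9]{3}\b", numbers)       # ['2003']
three_to_five = re.findall(r"\b[1-9][0-9]{2,4}\b", numbers)  # ['100', '2003', '99999']

html = "<b>bold</b>"
greedy = re.search(r"<.+>", html).group()        # '<b>bold</b>'
lazy = re.search(r"<.+?>", html).group()         # '<b>'
```

The \b anchors are what reject 123456: without them the pattern would happily match part of it.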
Grouping and Backreference:
- ([a-c])x\1x\1 will match axaxa, bxbxb and cxcxc but won't match axbxc
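Verifying the backreference example in Python. fullmatch is used so the whole string has to fit the pattern.

```python
import re

pattern = re.compile(r"([a-c])x\1x\1")

candidates = ["axaxa", "bxbxb", "cxcxc", "axbxc"]
matches = [s for s in candidates if pattern.fullmatch(s)]
# ['axaxa', 'bxbxb', 'cxcxc']

captured = pattern.fullmatch("bxbxb").group(1)   # 'b' (what \1 refers back to)
```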
More to be added when I finish reading :)