I found this href="http://download.oracle.com/javase/tutorial/essential/regex/quant.html"
rel="noreferrer">excellent tutorial on regular expressions and while I
intuitively understand what "greedy", "reluctant" and "possessive" quantifiers do, there
seems to be a serious hole in my
understanding.
Specifically, in the following
example:
Enter your regex: .*foo
// greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I
found the text "xfooxxxxxxfoo" starting at index 0 and ending at index
13.
Enter your regex: .*?foo // reluctant
quantifier
Enter input string to search: xfooxxxxxxfoo
I
found the text "xfoo" starting at index 0 and ending at index 4.
I found the
text "xxxxxxfoo" starting at index 4 and ending at index 13.
Enter
your regex: .*+foo // possessive quantifier
Enter input string to search:
xfooxxxxxxfoo
No match
found.
The explanation
mentions eating the entire input string, letters been
consumed, matcher backing off,
rightmost occurrence of "foo" has been regurgitated,
etc.
Unfortunately, despite the nice
metaphors, I still don't understand what is eaten by whom... Do you know of another
tutorial that explains (concisely) how regular expressions engines
work?
Alternatively, if someone can explain in
somewhat different phrasing the following paragraph, that would be much
appreciated:
The
first example uses the greedy
quantifier .* to find "anything",
zero
or more times, followed by the letters
"f" "o" "o". Because
the quantifier is
greedy, the .* portion of the
expression first eats the entire input
string. At this point, the
overall
expression cannot succeed, because the
last three letters
("f" "o" "o") have
already been consumed (by
whom?). So the matcher
slowly backs off (from
right-to-left?) one letter at a time
until the rightmost
occurrence of
"foo" has been regurgitated (what does this
mean?), at which
point the match succeeds and
the
search ends.
The
second example, however, is
reluctant, so it starts by first
consuming (by whom?) "nothing". Because "foo"
doesn't appear at the beginning of the
string, it's forced to swallow
(who swallows?) the
first letter (an "x"), which
triggers
the first match at 0 and 4. Our test
harness continues
the process until
the input string is exhausted. It
finds another match at 4 and 13.
The third
example fails to find a
match because the quantifier is
possessive. In this case, the entire
input string is consumed by .*+,
(how?)
leaving nothing left over to
satisfy
the "foo" at the end of the
expression. Use a
possessive
quantifier for situations where you
want to
seize all of something without
ever backing off (what does back
off mean?); it will outperform
the equivalent greedy
quantifier in
cases where the match is not
immediately
found.
No comments:
Post a Comment