I have a problem matching HTML attributes (in various HTML tags)
with a regex. To do so, I use the pattern:
myAttr=\"([^']*)\"
HTML snippet:
src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />
It selects the text from myAttr all the way to the end (/>),
but I need to select only myAttr="..." ("http://example.com").
Answer
You have an apostrophe (') inside your character class, but you wanted a double quote (").
myAttr=\"([^"]*)\"
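To see why the one-character change matters, here is a minimal sketch in Python (the question does not say which language is in use, so Python is an assumption). With [^']* the class happily consumes double quotes, so the greedy match runs on to the last double quote in the input; with [^"]* it stops at the attribute's own closing quote.

```python
import re

# Sample input taken from the question's snippet (the tag name is missing there too).
html = 'src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />'

# Original pattern: [^'] only excludes apostrophes, so double quotes are
# consumed and the greedy match extends to the last double quote in the input.
broken = re.search(r'myAttr="([^\']*)"', html)

# Corrected pattern: [^"] cannot cross a double quote, so the match ends
# at the closing quote of the myAttr value.
fixed = re.search(r'myAttr="([^"]*)"', html)

print(broken.group(1))  # http://example.com" class="alignleft
print(fixed.group(1))   # http://example.com
```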
That said, you really shouldn't be parsing HTML with regexes
(https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454).
(Sorry to link to that answer again. There are other answers to that question
that are more of the "if you know what you are doing..." variety, but it is good to be
aware of.)
Note that even if you limit your regexing to just attributes, you have a lot to
consider:
- Be careful not to match inside of comments.
- Be careful not to match inside of CDATA sections.
- What if attributes are bracketed with single quotes instead of double quotes?
- What if attributes have no quotes at all?
This is why pre-built, serious parsers are generally called for.
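As an illustration of the parser route, here is a minimal sketch using Python's standard-library html.parser (one possible choice of parser, not something the answer prescribes). The names AttrCollector and find_attr_values are hypothetical helpers, and the img, a, and p tags in the sample are assumptions added for the demo, since the question's snippet omits the tag name. Note that HTMLParser lowercases attribute names, so myAttr is looked up as myattr.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: collects the values of one attribute from all start tags."""

    def __init__(self, attr_name):
        super().__init__()
        # HTMLParser reports attribute names in lowercase.
        self.attr_name = attr_name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with surrounding quotes removed.
        for name, value in attrs:
            if name == self.attr_name:
                self.values.append(value)


def find_attr_values(html, attr_name):
    parser = AttrCollector(attr_name)
    parser.feed(html)
    parser.close()
    return parser.values


# Sample covering several of the cases listed above (tag names are assumed).
sample = (
    '<!-- <img myAttr="in-comment"> -->'
    '<img src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />'
    "<a myAttr='single'>x</a>"
    "<p myAttr=bare>y</p>"
)

values = find_attr_values(sample, "myAttr")
print(values)  # ['http://example.com', 'single', 'bare']
```

The commented-out tag is skipped because the parser routes comments to a separate handler, and single-quoted and unquoted values come back without any extra pattern work.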