Similar questions have probably been asked many times, but mine is a bit more complex.
I know that when we want to parse only the text between tags in a scenario like this,
My work
This is my work.
Learning regex.
we can form a Regex like this:
>([^<]*)<
But that works only because the tag is the first one. If the tag is the second one, it won't work.
Okay, my scenario is,
JAVA1
JAVA2
JAVA3
PHP1
PHP2
PHP3
There are many similar tags in the file, and I want to retrieve only the text between one particular pair of tags.
I've used a '# (.*) #' pattern around those tags, which is working fine. But it is also including all the other tags in the output, which I don't want.
I want only the texts Java1 and PHP1, and I guess that if I were able to retrieve the text between the tags while excluding the tags themselves, I might achieve it.
Am I correct or wrong? If so, how can I achieve what I want?
Thanks in advance!!

Answer

I think your regex approach, while technically possible, is going to cause more trouble down the line. For example, if the source HTML changed so the headers attribute appeared before the class attribute, the regex would fail. Also, your code will become pretty unreadable very quickly if you're using regex to search through HTML source code.

To parse HTML you should use PHP's DOMDocument functions, which are more robust in the face of changing HTML code and are far more readable to whoever may be maintaining your code (including you). This method will also support looking at other element attributes more easily.

The sample code below should work for your use case:

// Markup reconstructed as an assumption: only the class="td1" and
// headers="searchth1" attributes are known from the queries below,
// so the other cells are left without attributes.
$doc = '<table>
<tr><td class="td1" headers="searchth1">JAVA1</td><td>JAVA2</td><td>JAVA3</td></tr>
<tr><td class="td1" headers="searchth1">PHP1</td><td>PHP2</td><td>PHP3</td></tr>
</table>';
$dom = new DOMDocument();
$dom->loadHTML($doc);
$xpath = new DOMXPath($dom);
// select every td element whose class attribute is exactly "td1"
$tds = $xpath->query("//td[@class='td1']");
// the query could also be "//td[@headers='searchth1']" or even
// "//td[@headers='searchth1'][@class='td1']" depending on what you want to target
foreach ($tds as $td) {
    var_dump($td->nodeValue);
}

If you want to learn more about building and using XPath queries, I suggest the article PHP DOM: Using XPath over at SitePoint.com.
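
To illustrate the point about attribute order, here is a minimal sketch rather than a definitive implementation. The markup, the td2 class and the regex are all hypothetical (the regex is not the asker's original pattern). The second row simply writes headers before class, as in the failure case described in the answer.

```php
<?php
// Hypothetical markup: both first cells carry the same attributes,
// but the second row lists headers before class.
$html = '<table>
<tr><td class="td1" headers="searchth1">JAVA1</td><td class="td2">JAVA2</td></tr>
<tr><td headers="searchth1" class="td1">PHP1</td><td class="td2">PHP2</td></tr>
</table>';

// Illustrative order-dependent regex: it expects class first, then headers.
preg_match_all('#<td class="td1" headers="searchth1">(.*?)</td>#', $html, $m);
$regexHits = $m[1];

// The XPath predicates test attribute values, not the order they are written in.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$xpathHits = [];
foreach ($xpath->query("//td[@headers='searchth1'][@class='td1']") as $td) {
    $xpathHits[] = trim($td->nodeValue);
}

print_r($regexHits); // only JAVA1: the reordered row is missed
print_r($xpathHits); // JAVA1 and PHP1
```

The regex only finds the row whose attributes happen to be in the expected order, while the XPath query returns both first cells.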