Monday, 22 April 2019

PHP Parse HTML code

How can I parse HTML code held in a PHP variable if it is something like:

<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG!

I only want to get the text between the headings, and I understand that it's not a good idea to use regular expressions.

Answer

Use PHP's Document Object Model (the DOMDocument class):

<?php
$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
$DOM = new DOMDocument;
$DOM->loadHTML($str);

// get all H1 elements
$items = $DOM->getElementsByTagName('h1');

// display the text of every H1, one per line
for ($i = 0; $i < $items->length; $i++) {
    echo $items->item($i)->nodeValue . "\n";
}
?>

This outputs:

T1
T2
T3

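The loop above prints only the headings. If you also want to know which text belongs to which heading, one option is to walk the siblings that follow each <h1> until the next <h1> is reached. This is a minimal sketch, assuming the flat markup from the question (headings and text as siblings inside the body); the $sections array is a name chosen for illustration, not part of the original answer.

```php
<?php
// Sketch: map each <h1> heading to the text that follows it, up to the next <h1>.
// Assumes headings and text are siblings, as in the question's markup.
$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';

$dom = new DOMDocument();
$dom->loadHTML($str);

$sections = [];
foreach ($dom->getElementsByTagName('h1') as $heading) {
    $text = '';
    // Walk forward through the following siblings until the next <h1> or the end.
    for ($node = $heading->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node instanceof DOMElement && strtolower($node->nodeName) === 'h1') {
            break;
        }
        $text .= $node->textContent;
    }
    // Use the heading text as the key and the collected text as the value.
    $sections[$heading->textContent] = trim($text);
}

print_r($sections);
```

With the string above, $sections should map T1 to "Lorem ipsum.", T2 to "The quick red fox..." and T3 to the last sentence, so each piece of text stays attached to its heading.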

[EDIT] After the OP's clarification:

If you want only the text between the headings ("Lorem ipsum." and so on), you can use this regular expression directly:
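
<?php
$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
// remove every <h1>...</h1> element, leaving only the text between the headings
echo preg_replace("#<h1.*?>.*?</h1>#", "", $str);
?>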

This outputs:

Lorem ipsum.The quick red fox...... jumps over the lazy brown FROG
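Since the question wanted to avoid regular expressions, the same text can also be collected with DOMXPath by selecting every text node that is not inside an <h1>. This is a sketch, assuming the headings are always <h1> elements as in the example; the XPath expression and the $result variable are illustrative choices, not part of the original answer.

```php
<?php
// Sketch: collect the text outside the <h1> headings with DOMXPath instead of a regex.
$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

// Select every text node in the document that does not have an <h1> ancestor.
$textNodes = $xpath->query('//text()[not(ancestor::h1)]');

// Concatenate the nodes in document order.
$result = '';
foreach ($textNodes as $node) {
    $result .= $node->nodeValue;
}

echo $result, "\n";
```

XPath returns the matching nodes in document order, so the concatenation should give the same string as the regex version. Unlike the regex, it also returns plain text if the content between headings contains other tags such as <b>, which preg_replace would leave in the output.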