php - DOM parser that allows HTML5-style </ in <script> tag

Wednesday, 5 December 2018


Update: html5lib (bottom of question) seems to get close; I just need to improve my understanding of how it's used.

I am attempting to find an HTML5-compatible DOM parser for PHP 5.3. In particular, I need to access the following HTML-like CDATA within a script tag:

Most parsers will end parsing prematurely because HTML 4.01 ends script tag parsing when it finds ETAGO (</) inside a <script>. All of the parsers I have tried so far have either failed or are so poorly documented that I haven't figured out whether they work.
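To make the difference concrete, here is a minimal sketch of the two termination rules. The function names and the sample string are hypothetical, and both rules are simplified: the HTML5 version ignores the "<!--" escape states of the real tokenizer. It only illustrates where each rule stops reading script data; it is not a parser.

```php
<?php
// HTML 4.01 rule (simplified): script data ends at the first "</"
// that is followed by a letter (the ETAGO delimiter in context).
function html4ScriptEnd($data)
{
    $pos = 0;
    while (($pos = strpos($data, '</', $pos)) !== false) {
        if (isset($data[$pos + 2]) && ctype_alpha($data[$pos + 2])) {
            return $pos;
        }
        $pos += 2;
    }
    return false;
}

// HTML5 rule (simplified): script data ends only at "</script"
// followed by whitespace, "/" or ">".
function html5ScriptEnd($data)
{
    $pos = 0;
    while (($pos = stripos($data, '</script', $pos)) !== false) {
        $next = isset($data[$pos + 8]) ? $data[$pos + 8] : '';
        if ($next !== '' && strpos(" \t\n\f\r/>", $next) !== false) {
            return $pos;
        }
        $pos += 8;
    }
    return false;
}

$sample = '<td>bar</td></script>'; // hypothetical script contents plus end tag
var_dump(html4ScriptEnd($sample)); // int(7): stops at "</td>"
var_dump(html5ScriptEnd($sample)); // int(12): stops at "</script>"
```

Under the HTML 4.01 rule the script data is cut off at the inner end tag, which is consistent with the "Unexpected end tag" warning shown further down.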



My requirements:



Real parser, not regex hacks.

Ability to load full pages or HTML fragments.

Ability to pull script contents back out, selecting by the tag's id attribute.




Input:





Example of failing output (no closing </td>):






Some parsers and their results:






DOMDocument

Source:




header('Content-type: text/plain');
$d = new DOMDocument;
$d->loadHTML('');
echo $d->saveHTML();


Output:


Warning: DOMDocument::loadHTML(): Unexpected end tag : td in Entity, line: 1 in /home/adam/public_html/2010/10/26/dom.php on line 5









FluentDOM

Source:




header('Content-type: text/plain');
require_once 'FluentDOM/src/FluentDOM.php';
$html = "";
echo FluentDOM($html, 'text/html');


Output:











phpQuery

Source:




header('Content-type: text/plain');

require_once 'phpQuery.php';

// The heredoc body (the sample HTML input) is missing from this copy of the post.
phpQuery::newDocumentHTML(<<<EOF
EOF
);


echo (string)pq('#foo');


Output:










html5lib

Possibly promising. Can I get at the contents of the script#foo tag?


Source:




header('Content-type: text/plain');

include 'HTML5/Parser.php';

$html = "";
$d = HTML5_Parser::parse($html);

echo $d->saveHTML();



Output:







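Since the result of HTML5_Parser::parse() is used above like a DOMDocument (saveHTML() is called on it), one way to pull the script contents back out might be a DOMXPath query on that document. The sketch below is an assumption, not html5lib's documented usage: scriptContentsById() is a hypothetical helper, and the document is built by hand with made-up contents so the example runs without html5lib installed. The query matches on local-name() so that it should work whether or not the parser puts elements in the XHTML namespace, and it avoids getElementById(), which depends on the id attribute being registered as an ID.

```php
<?php
// Hypothetical helper (not part of html5lib or phpQuery): returns the text
// inside the script element with the given id, or null if there is none.
function scriptContentsById(DOMDocument $doc, $id)
{
    $xpath = new DOMXPath($doc);
    // Assumes $id contains no double quotes; it is placed inside an XPath literal.
    $query = '//*[local-name()="script" and @id="' . $id . '"]';
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->textContent;
}

// Stand-in for the parsed document, built by hand with assumed contents.
$doc = new DOMDocument();
$script = $doc->createElement('script');
$script->setAttribute('id', 'foo');
$script->appendChild($doc->createTextNode('<td>bar</td>'));
$doc->appendChild($script);

echo scriptContentsById($doc, 'foo'), "\n"; // prints <td>bar</td>
```

With html5lib, the same helper would presumably be called as scriptContentsById(HTML5_Parser::parse($html), 'foo').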