Wednesday, 2 January 2019

Python XML Parsing




*Note: lxml will not run on my system. I was hoping to find a solution that does not involve lxml.



I have gone through some of the documentation around here already, but I am having difficulty getting this to work the way I would like. I would like to parse an XML file that looks like this:




1375

Key 1    1375
Key 2    Some String
Key 3    Another string
Key 4    Yet another string
Key 5    Strings anyone?




In the file I am trying to manipulate, more 'dict' elements follow this one. I would like to read through the XML and output a text/dat file that looks like this:




1375, "Some String", "Another string", "Yet another string", "Strings anyone?"



...



Eof



** Originally, I tried to use lxml, but after many failed attempts to get it working on my system, I moved on to DOM. More recently, I tried ElementTree for this task. Please, for the love of all that is good, would somebody help me along with this? I am relatively new to Python and would like to learn how this works. Thank you in advance.


Answer



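You can use xml.etree.ElementTree, which ships with Python. On Python 2.x there is also a C-implemented (and much faster) companion module, xml.etree.cElementTree; from Python 3.3 onward xml.etree.ElementTree picks up the C accelerator automatically, and cElementTree was removed in Python 3.9. lxml.etree offers a superset of the functionality, but you don't need it for this task.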
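For illustration, here is a minimal sketch of that approach using only the standard library. It is not the code from any answer in the thread. The sample in the question lost its markup, so the tag names used here (`dict`, `key`, `integer`, `string`) and the nesting are assumptions about what the file looks like; the names `XML_TEXT`, `dict_to_line` and `convert` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the tag names and nesting are assumed,
# because the question's XML markup did not survive.
XML_TEXT = """\
<dict>
    <key>1375</key>
    <dict>
        <key>Key 1</key><integer>1375</integer>
        <key>Key 2</key><string>Some String</string>
        <key>Key 3</key><string>Another string</string>
        <key>Key 4</key><string>Yet another string</string>
        <key>Key 5</key><string>Strings anyone?</string>
    </dict>
</dict>
"""


def dict_to_line(record):
    """Format one inner <dict> as: 1375, "Some String", ..."""
    children = list(record)
    # Children alternate key, value, key, value; keep only the values.
    values = children[1::2]
    parts = []
    for value in values:
        text = value.text or ""
        if value.tag == "integer":
            parts.append(text)            # numbers are written bare
        else:
            parts.append('"%s"' % text)   # everything else is quoted
    return ", ".join(parts)


def convert(xml_text):
    """Return one output line per inner <dict> under the root element."""
    root = ET.fromstring(xml_text)
    # findall("dict") matches only direct children of the root.
    return [dict_to_line(record) for record in root.findall("dict")]


lines = convert(XML_TEXT)
print("\n".join(lines))
```

If the real file has more records, each additional inner `dict` under the root should produce one more line. Writing `"\n".join(lines)` to a file instead of printing it would give the text/dat output asked for.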




The code provided by @Acorn works identically for me (Python 2.7, Windows 7) with each of the following imports:



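# Pick any ONE of these imports; the rest of the code stays the same.
import xml.etree.ElementTree as et     # standard library
import xml.etree.cElementTree as et    # C version (removed in Python 3.9)
import lxml.etree as et                # third-party lxml
...
tree = et.fromstring(xmltext)
...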
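Since the three modules expose the same `fromstring` call, one common pattern is to try lxml first and fall back to the standard library when it isn't installed. This is only a sketch: the `xmltext` value below is a hypothetical stand-in, and the tag names are assumed rather than taken from the question.

```python
try:
    import lxml.etree as et               # optional third-party package
except ImportError:
    import xml.etree.ElementTree as et    # always available with Python

# Hypothetical stand-in for the real document text.
xmltext = "<dict><key>1375</key></dict>"

tree = et.fromstring(xmltext)      # same call with either module
first_key = tree.find("key").text  # text of the first <key> child
print(first_key)
```

With this in place, a machine where lxml won't install simply uses ElementTree, and the parsing code does not need to change.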



What OS are you using and what installation problems have you had with lxml?

