Tuesday 22 October 2019

python - Regex include line breaks





I have the following xml file




A




B
C





D



Picture number 3?




and I just want to get the text between

and
.

So I've tried this code:



import os, re

html = open("2.xml", "r")
text = html.read()
lon = re.compile(r'
\n(.+)\n
', re.MULTILINE)
lon = lon.search(text).group(1)
print lon



but it doesn't seem to work.


Answer



1) Don't parse XML with regex. It just doesn't work. Use an XML parser.



2) If you do use regex for this, you don't want re.MULTILINE, which controls how ^ and $ work in a multiple-line string. You want re.DOTALL, which controls whether . matches \n or not.



3) You probably also want your pattern to return the shortest possible match, using the non-greedy +? quantifier, so the match stops at the first closing tag instead of running on to the last one.



lon = re.compile(r'
\n(.+?)\n
', re.DOTALL)
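
To make the three points concrete, here is a minimal sketch in Python 3. The sample document and the tag names (root, item, description) are assumptions made up for illustration, since the markup from the original question is not shown here. It compares re.MULTILINE with re.DOTALL, greedy with non-greedy matching, and then reads the same text with the standard library's xml.etree.ElementTree parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are assumptions,
# not the ones from the asker's 2.xml.
SAMPLE = """<root>
<item>
A
</item>
<description>
B
C
</description>
<description>
D
E
</description>
</root>"""

PATTERN = r"<description>\n(.+?)\n</description>"

# re.MULTILINE only changes how ^ and $ behave. "." still refuses to
# match a newline, so text spanning several lines is never matched.
multiline_match = re.search(PATTERN, SAMPLE, re.MULTILINE)

# re.DOTALL lets "." match newlines. The non-greedy +? stops at the
# first closing tag.
dotall_match = re.search(PATTERN, SAMPLE, re.DOTALL)

# The greedy + with re.DOTALL runs on to the last closing tag instead.
greedy_match = re.search(
    r"<description>\n(.+)\n</description>", SAMPLE, re.DOTALL
)

# Using an XML parser avoids the regex entirely.
root = ET.fromstring(SAMPLE)
parsed = [element.text.strip() for element in root.iter("description")]

print(multiline_match)
print(repr(dotall_match.group(1)))
print(repr(greedy_match.group(1)))
print(parsed)
```

With this sample, the re.MULTILINE search finds nothing, the non-greedy re.DOTALL search captures only the first block, and the greedy search swallows both blocks along with the tags between them. The parser returns each block separately without any pattern at all, which is why point 1 is usually the better choice.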

