I'm using simplexml_load_string (http://php.net/manual/en/function.simplexml-load-string.php)
to load an
XML document into an object. This seemed to be working great up until I came across this
element:
1.
Some
text.
After running that through simplexml_load_string, what came out was:
["some_string_val"]=>
string(20) "1.    Some
text"
I tried using:
html_entity_decode($string, ENT_QUOTES, "Windows-1252");
And that seemed to convert the non-breaking space entities to plain text, but when
I ran the result through simplexml_load_string I got the same result. I also tried
UTF-8 and a few other encodings, with similar or worse results.
So, what can I do to convert the non-breaking space entities to UTF-8 so they can be
parsed correctly by simplexml_load_string? Keeping the HTML entities intact is not
a concern because this is going into a CSV.
EDIT: This has been unjustly marked as a duplicate, for a couple of reasons:
- This is not language agnostic; this is dealing with a specific set of PHP functions,
  unlike the post which this was marked a duplicate of.
- This is not going to an HTML page or a PDF, it is going to a CSV, so I cannot set a
  header. The accepted solution will not work in my case.
Answer
I think it parses correctly. It's just the way that function works: it replaces
those codes with the corresponding special characters.
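To check what the parser actually produced, it can help to dump the raw bytes. This is only a minimal sketch: the original element markup is not shown above, so the sample XML (and the assumption that it used the numeric entity &#160;) is hypothetical.

```php
<?php
// Hypothetical sample input: the real element content is an assumption.
$xml = '<root><some_string_val>1.&#160;&#160;Some text</some_string_val></root>';

$doc = simplexml_load_string($xml);
$value = (string) $doc->some_string_val;

// SimpleXML hands back UTF-8, so each &#160; becomes the two bytes C2 A0.
echo bin2hex($value), PHP_EOL; // 312ec2a0c2a0536f6d652074657874
echo strlen($value), PHP_EOL;  // 15 (byte length, not character count)
```

If the hex dump contains c2a0, the entity was decoded correctly and the odd-looking output is a display or encoding issue, not a parsing failure.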
You can fix the resulting string by converting it to cp1251:
$str = iconv('utf-8', 'cp1251', $str);
Also, I would collapse the double spaces before writing it to the CSV file:
$str = str_replace(chr(160), ' ', $str);
$str = trim(preg_replace('/\s+/', ' ', $str));
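If the string is kept in UTF-8 (which is what simplexml_load_string returns), the same cleanup can probably be done without the iconv step by matching the two-byte UTF-8 form of the non-breaking space. A rough sketch; the helper name normalize_spaces is hypothetical and not part of the answer above:

```php
<?php
// Hypothetical helper: swaps UTF-8 non-breaking spaces for plain spaces,
// then collapses runs of whitespace.
function normalize_spaces(string $str): string
{
    // U+00A0 is encoded as the bytes C2 A0 in UTF-8.
    $str = str_replace("\xC2\xA0", ' ', $str);
    // /u makes PCRE treat the subject as UTF-8; it returns null on invalid input,
    // in which case we fall back to the uncollapsed string.
    $collapsed = preg_replace('/\s+/u', ' ', $str);
    return trim($collapsed ?? $str);
}

echo normalize_spaces("1.\xC2\xA0\xC2\xA0\xC2\xA0 Some\n text"), PHP_EOL; // 1. Some text
```

The explicit str_replace is kept because \s in a PCRE pattern does not reliably match U+00A0 in this mode.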