Friday, 14 December 2018

PHP str_getcsv() does not parse CSV correctly if it contains Japanese characters



I am trying to convert an Excel file to an array using the file() function. Some fields contain Japanese characters, and for those fields I am not getting the correct data.




Here is the relevant line of code:



$data = array_map('str_getcsv', file($path));

Answer



Without details such as which Japanese characters were in the input and how they were wrongly converted, I can only guess.
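
One way to collect those details is to dump the raw bytes of an affected line alongside the locale PHP is currently using. The sketch below is only an illustration: describeCsvLine() is a hypothetical helper name, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical diagnostic helper (not part of the original question or answer).
// Assumes the mbstring extension is loaded.
function describeCsvLine(string $line): array
{
    return [
        // true if the bytes form a valid UTF-8 sequence
        'is_valid_utf8' => mb_check_encoding($line, 'UTF-8'),
        // raw bytes in hex, useful for spotting a different encoding
        'hex'           => bin2hex($line),
        // passing '0' only reads the current LC_CTYPE setting, it does not change it
        'ctype_locale'  => setlocale(LC_CTYPE, '0'),
    ];
}

var_dump(describeCsvLine("日本語,テスト,ファイル"));
```

Running this on a line that comes back garbled shows whether the input is really UTF-8 and which locale str_getcsv() is working under.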



str_getcsv() takes the current locale settings into account, so setting a Japanese locale might fix the issue.



This code




setlocale(LC_ALL, 'ja_JP');
$data = array_map('str_getcsv', file('japanese.csv'));
var_dump($data);


works with the following CSV file (japanese.csv, saved in UTF-8) on my local machine.



日本語,テスト,ファイル
2行目,CSV形式,エンコードUTF-8



The results are



array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(9) "日本語"
    [1]=>
    string(9) "テスト"
    [2]=>
    string(12) "ファイル"
  }
  [1]=>
  array(3) {
    [0]=>
    string(7) "2行目"
    [1]=>
    string(9) "CSV形式"
    [2]=>
    string(20) "エンコードUTF-8"
  }
}


As you can see, str_getcsv() requires you to know which language the input CSV file uses. In this case you may be sure that the input is always Japanese, but the approach does not work for parsing CSV files whose language is unpredictable. Also be aware that the specified locale may not be installed if your code runs in a different environment.
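
To guard against a missing locale, the return value of setlocale() can be checked, since it returns false when none of the requested locales is available. The following is a minimal sketch under some assumptions: parseCsvWithLocale() is a hypothetical helper, the candidate locale names are examples that vary between systems, and it sets only LC_CTYPE rather than LC_ALL as in the code above.

```php
<?php
// Hypothetical helper (not from the original answer): parse a CSV file under
// the first available locale from $candidates, then restore the old locale.
function parseCsvWithLocale(string $path, array $candidates): array
{
    // '0' reads the current LC_CTYPE setting without changing it
    $previous = setlocale(LC_CTYPE, '0');

    // setlocale() tries each name in order and returns false if none is installed
    $applied = setlocale(LC_CTYPE, $candidates);
    if ($applied === false) {
        throw new RuntimeException(
            'None of these locales is available: ' . implode(', ', $candidates)
        );
    }

    try {
        $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ($lines === false) {
            throw new RuntimeException("Cannot read file: $path");
        }
        // Delimiter, enclosure and escape are passed explicitly (the usual defaults)
        return array_map(function (string $line): array {
            return str_getcsv($line, ',', '"', '\\');
        }, $lines);
    } finally {
        // Restore the locale that was active before the call
        setlocale(LC_CTYPE, $previous);
    }
}
```

A call might look like parseCsvWithLocale('japanese.csv', ['ja_JP.UTF-8', 'ja_JP.utf8', 'ja_JP']); if none of those locales exists on the machine, the exception makes the failure visible instead of silently returning damaged data.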

