Friday, 30 November 2018

php - Apache's default encoding is ISO-8859-1 but websites are UTF-8?



I have to deal with encoding for the first time and I'm confused by how PHP, Apache, and browsers handle encodings. PHP and Apache use ISO-8859-1 by default, but most websites are UTF-8. At what point is ISO-8859-1 converted into UTF-8? Also, since PHP uses ISO-8859-1, how come it can read UTF-8 webpages?


Answer



Apache doesn't "use" any encoding by default; its job has hardly anything to do with understanding or converting text encodings. PHP doesn't "use" ISO-8859-1 by default either; PHP strings have no associated encoding.
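A minimal sketch of what "no associated encoding" means in practice. It assumes PHP 7 or later with the mbstring extension enabled (it is not compiled into every build), and the variable names are illustrative only. The same character, "é", is written as raw bytes in two encodings. PHP simply holds whichever bytes it is given, and a conversion happens only when you explicitly ask for one.

```php
<?php
// Two byte sequences that both represent "Café", in different encodings.
// PHP records no encoding for either string; they are just bytes.
$latin1 = "Caf\xE9";      // ISO-8859-1: "é" is the single byte 0xE9
$utf8   = "Caf\xC3\xA9";  // UTF-8: "é" is the two bytes 0xC3 0xA9

echo bin2hex($latin1), "\n"; // 436166e9
echo bin2hex($utf8), "\n";   // 436166c3a9

// Nothing is converted behind your back. If you need another encoding,
// you convert explicitly (mb_convert_encoding comes from mbstring).
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($converted === $utf8); // bool(true)
```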




What is true is that many of PHP's core string functions assume ASCII or ISO-8859-1 (one byte per character) in their operations and cannot handle multibyte encodings such as UTF-8 properly. However, and it's worth stating this again, PHP strings as a data type have no encoding per se. Nothing prevents you from holding strings in any encoding you wish, and PHP even offers functions (such as the mb_* functions of the mbstring extension) that manipulate strings correctly in virtually any encoding. So, as long as you do it correctly, nothing prevents you from handling and outputting UTF-8 with PHP.
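A short illustration of that difference, again assuming the mbstring extension is available. The string is built from escaped bytes so the example does not depend on how the source file itself is saved. Byte-oriented core functions such as strlen() and substr() count bytes, while the mb_* functions are told the encoding and work on characters.

```php
<?php
// "café" encoded as UTF-8: 4 characters, but "é" takes 2 bytes (5 bytes total).
$word = "caf\xC3\xA9";

// Core string functions operate on bytes and know nothing about UTF-8.
echo strlen($word), "\n";                 // 5 (bytes, not characters)
echo bin2hex(substr($word, 0, 4)), "\n";  // 636166c3: the cut splits "é" in half

// The mbstring functions take the encoding as an argument and work on characters.
echo mb_strlen($word, 'UTF-8'), "\n";        // 4
echo mb_substr($word, 0, 4, 'UTF-8'), "\n";  // café
echo mb_strtoupper($word, 'UTF-8'), "\n";    // CAFÉ
```

Simply passing UTF-8 text through unchanged, for example reading it with file_get_contents() and printing it with echo, needs no special functions at all, because the bytes are never interpreted along the way.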



Apache, in turn, does not care in the least what exactly you send to the client; it does not stand in the way of outputting text in any encoding (or binary data, for that matter). The only thing it may do is add an HTTP header like this to the response:



Content-Type: text/html; charset=iso-8859-1


This header only tells the client what kind of content it is receiving. It is not derived in any way from the content you actually send: Apache neither inspects nor checks nor converts anything, it just sets the header. You should therefore configure Apache to declare the charset that matches the encoding you actually output from PHP (for example with the directive AddDefaultCharset UTF-8); iso-8859-1 is simply the value Apache uses when that directive is just switched On. Alternatively, you can send a Content-Type header with a charset yourself from PHP, which keeps Apache from adding its own. That's all.
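As a sketch of the PHP-side option, a script can declare the charset itself with header(), which must be called before any output is sent. This assumes the script really outputs UTF-8: the declared charset is only a label, and it has to match the bytes that follow. The markup is a placeholder.

```php
<?php
// Must run before any output; once the body has started, headers are already sent.
header('Content-Type: text/html; charset=utf-8');

// The body must really be UTF-8 so that it matches the declared charset.
$body = "<!DOCTYPE html>\n<meta charset=\"utf-8\">\n<p>caf\xC3\xA9</p>\n";
echo $body;
```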



For more information, see "What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text" and "Handling Unicode Front To Back In A Web App".


