Thursday, 1 August 2019

MySQL and PHP: UTF-8 with Cyrillic characters




I'm trying to insert a Cyrillic value in the MySQL table, but there is a problem with encoding.




Php:




$servername = "localhost";
$username = "a";
$password = "b";
$dbname = "c";

$conn = new mysqli($servername, $username, $password, $dbname);


mysql_query("SET NAMES 'utf8';");
mysql_query("SET CHARACTER SET 'utf8';");
mysql_query("SET SESSION collation_connection = 'utf8_general_ci';");

if ($conn->connect_error) {
die("Connection failed: " . $conn->connect_error);
}

$sql = "UPDATE `c`.`mainp` SET `search` = 'test тест' WHERE `mainp`.`id` =1;";


if ($conn->query($sql) === TRUE) {
}
$conn->close();

?>


MySQL:




| id |    search   |            
| 1 | test ав |


Note: PHP file is utf-8, database collation utf8_general_ci


Answer




You are mixing APIs here, mysql_* and mysqli_* doesn't mix. You should stick with mysqli_ (as it seems you are anyway), as mysql_* functions are deprecated, and removed entirely in PHP7.





Your actual issue is a charset problem somewhere. Here's a few pointers which can help you get the right charset for your application. This covers most of the general problems one can face when developing a PHP/MySQL application.




  • ALL attributes throughout your application must be set to UTF-8

  • Save the document as UTF-8 w/o BOM (If you're using Notepad++, it's Format -> Convert to UTF-8 w/o BOM)

  • The header in both PHP and HTML should be set to UTF-8




    • HTML (inside tags):






    • PHP (at the top of your file, before any output):



      header('Content-Type: text/html; charset=utf-8');


  • Upon connecting to the database, set the charset to UTF-8 for your connection-object, like this (directly after connecting)



    mysqli_set_charset($conn, "utf8"); /* Procedural approach */

    $conn->set_charset("utf8"); /* Object-oriented approach */


    This is for mysqli_*, there are similar ones for mysql_* and PDO (see bottom of this answer).


  • Also make sure your database and tables are set to UTF-8, you can do that like this:



    ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;



    (Any data already stored won't be converted to the proper charset, so you'll need to do this with a clean database, or update the data after doing this if there are broken characters).







  • If you're using json_encode(), you might need to apply the JSON_UNESCAPED_UNICODE flag, otherwise it will convert special characters to their hexadecimal equivalent.



Remember that EVERYTHING in your entire pipeline of code needs to be set to UFT-8, otherwise you might experience broken characters in your application.




In addition to this list, there may be functions that has a specific parameter for specifying a charset. The manual will tell you about this (an example is htmlspecialchars()).



There are also special functions for multibyte characters, example: strtolower() won't lower multibyte characters, for that you'll have to use mb_strtolower(), see this live demo.




Note 1: Notice that its someplace noted as utf-8 (with a dash), and someplace as utf8 (without it). It's important that you know when to use which, as they usually aren't interchangeable. For example, HTML and PHP wants utf-8, but MySQL doesn't.



Note 2: In MySQL, "charset" and "collation" is not the same thing, see Difference between Encoding and collation?. Both should be set to utf-8 though; generally collation should be either utf8_general_ci or utf8_unicode_ci, see UTF-8: General? Bin? Unicode?.




Note 3: If you're using emojis, MySQL needs to be specified with an utf8mb4 charset instead of the standard utf8, both in the database and the connection. HTML and PHP will just have UTF-8.







Setting UTF-8 with mysql_ and PDO




  • PDO: This is done in the DSN of your object. Note the charset attribute,




    $pdo = new PDO("mysql:host=localhost;dbname=database;charset=utf8", "user", "pass");

  • mysql_: This is done very similar to mysqli_*, but it doesn't take the connection-object as the first argument.



    mysql_set_charset('utf8');


No comments:

Post a Comment

php - file_get_contents shows unexpected output while reading a file

I want to output an inline jpg image as a base64 encoded string, however when I do this : $contents = file_get_contents($filename); print ...