MySQL and UTF-8

When working with UTF-8 on MySQL, it is not enough to define the CHARACTER SET and the COLLATE parameters to utf-8 when creating the database. You also have to tell MySQL that the queries you’ll be calling are utf-8. Indeed, by default the character set used by the connection and the result sets is latin-1:

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

When doing your queries yourself with mysql_query, this can be a source of confusion, as your data is stored properly in UTF-8, but still comes back funny. That’s something that recently bit me as I was fiddling with an old version of ezSQL which didn’t allow the user to change the encoding1.

You can force utf-8 by executing the following:

SET NAMES 'utf8';

Which is equivalent to:

SET character_set_client = utf8;
SET character_set_results = utf8;
SET character_set_connection = utf8;

In recent PHP (>= 5.2), you can also execute:

mysql_set_charset('utf8',$conn);

Libraries like Propel usually handle that quite well by specifying a configuration option, and relieving the developer from these worries. Typically, the runtime configuration settings for Propel would be:

<config>
 <propel>
  <datasources>
   <datasource>
    <connection>
     <!-- ... -->
     <settings>
      <setting id="charset">utf8</setting>
     </settings>

For Rails, it is also very similar. When defining your database instance in config/database.yml, you can also give the encoding parameter:

development:
  adapter: mysql
  encoding: utf8
  reconnect: false
  database: pouet_dev
  pool: 5
  username: root
  password: pouet
  host: localhost
  socket: /var/run/mysqld/mysqld.sock

For Hibernate, arbitrary connection properties can be passed by using the property name, with hibernate.connection preprended to the name.

<property name="hibernate.connection.characterEncoding">UTF-8</property>

This parameter is the MySQL Connector/J parameters used by the driver to indicate the encoding (note that the documentation indicates that SET NAMES 'utf8' would not work with Connector/J). Examples will probably follow…

1 Not sure recent versions do either?

 
---

Comment

your_ip_is_blacklisted_by sbl.spamhaus.org

---