## MySQL and UTF-8

When working with UTF-8 on MySQL, it is not enough to define the CHARACTER SET and the COLLATE parameters to utf-8 when creating the database. You also have to tell MySQL that the queries you’ll be calling are utf-8. Indeed, by default the character set used by the connection and the result sets is latin-1:

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)


When doing your queries yourself with mysql_query, this can be a source of confusion, as your data is stored properly in UTF-8, but still comes back funny. That’s something that recently bit me as I was fiddling with an old version of ezSQL which didn’t allow the user to change the encoding1.

You can force utf-8 by executing the following:

SET NAMES 'utf8';


Which is equivalent to:

SET character_set_client = utf8;
SET character_set_results = utf8;
SET character_set_connection = utf8;


In recent PHP (>= 5.2), you can also execute:

mysql_set_charset('utf8',\$conn);


Libraries like Propel usually handle that quite well by specifying a configuration option, and relieving the developer from these worries. Typically, the runtime configuration settings for Propel would be:

<config>
<propel>
<datasources>
<datasource>
<connection>
<!-- ... -->
<settings>
<setting id="charset">utf8</setting>
</settings>


For Rails, it is also very similar. When defining your database instance in config/database.yml, you can also give the encoding parameter:

development:
encoding: utf8
reconnect: false
database: pouet_dev
pool: 5
host: localhost
socket: /var/run/mysqld/mysqld.sock


For Hibernate, arbitrary connection properties can be passed by using the property name, with hibernate.connection preprended to the name.

<property name="hibernate.connection.characterEncoding">UTF-8</property>


This parameter is the MySQL Connector/J parameters used by the driver to indicate the encoding (note that the documentation indicates that SET NAMES 'utf8' would not work with Connector/J). Examples will probably follow…