Mar 3 2010

WordPress 2.9 Illegal Mix of Collations in Database

In the course of investigating a problem with runaway Apache processes, I discovered this collation issue.

In the apache error log, the first error to confront me was this:
WordPress database error Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '=' for query SELECT comment_ID FROM wp_comments WHERE comment_post_ID = '2933' AND comment_approved != 'trash' AND .... etc

Collation conflict …. I looked at the database structure, sure enough, there’s a mix of utf8_general_ci and latin1_swedish_ci. Most of the default WordPress database tables are Latin1 and it looks like the newer tables and all of my plugins are UTF 8.

Looking at WordPress installs that I haven’t upgraded yet, all of them are set to UTF 8. But these are newer installs. The Wordpress installation that has this problem has been in place since version 1.* and Wordpress used to default to Latin 1 encoding. The UTF 8 tables are newer.

So, in other words, if you’re maintaining a WordPress install that’s been around for a while, you probably also have this issue in your database.

Now, to get this particular database fixed up …
One suggestion I came across was to export the schema and data separately then recreate the database with UTF 8 charset and reimport the data. However, that will likely break the existing content because the content has Latin 1 characters which will cause the newly minted UTF tables to choke a little.

… When converting the character sets, all TEXT (and similar) fields are converted to UTF-8, but that conversion will BREAK existing TEXT because the conversion expects the data to be in latin1, but WordPress may have stored unicode characters in a latin1 database, and as a result, data could end up as garbage after a conversion!
ref

The better way is to run an ALTER tables query. The steps are:

  1. backup everything up
  2. ALTER all TEXT and related fields to their binary counterparts using the SQL statements generated below
  3. alter the character set
  4. change the binary data type fields back to TEXT
  5. Add DB_CHARSET and DB_COLLATE definitions to wp-config.php

Using MySQL’s information_schema to generate the actual ALTER tables statements needed, I wanted to pipe out the statements to a reusable text document so I could do a test run on a copy of the problem database first. Generating the ALTER tables statements using information_schema will then include whatever extra tables or rows have been added by plugins.

The following 4 queries are lifted directly from Haidong Ji via the Wordpress Codex. I added the pipe out to text file. Note that the path to mysql executable is on Ubuntu. Your path might be different, for example, another common location might be /usr/local/mysql/bin.

1. ALTER all TEXT and related string field types to binary field types counterparts using the SQL statements generated below. The list of conversions being made is as follows:

CHAR -> BINARY
VARCHAR -> VARBINARY
TINYTEXT -> TINYBLOB
TEXT -> BLOB
MEDIUMTEXT -> MEDIUMBLOB
LONGTEXT -> LONGBLOB

  • echo "SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'char', 'binary'),';') FROM columns WHERE table_schema = 'testblog' and data_type LIKE '%char%';" | /usr/bin/mysql -uuser -ppassword information_schema > /home/userdir/char2binary.txt
  • echo "SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'text', 'blob'),';') FROM columns WHERE table_schema = 'testblog' and data_type LIKE '%text%';" | /usr/bin/mysql -uuser -ppassword information_schema > /home/userdir/text2blob.txt
  • echo "SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', column_type, 'CHARACTER SET utf8;') FROM columns WHERE table_schema = 'testblog' and data_type LIKE '%char%';" | /usr/bin/mysql -uuser -ppassword information_schema > /home/userdir/char2utf.txt
  • echo "SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', column_type, 'CHARACTER SET utf8;') FROM columns WHERE table_schema = 'testblog' and data_type LIKE '%text%';" | /usr/bin/mysql -uuser -ppassword information_schema > /home/userdir/text2utf.txt

2. Change the default character set of the database from Latin 1 to UTF 8. This will ensure all new tables created are UTF 8, but doesn’t affect existing tables (which is fine since we’ve already changed those above, right?).
This can be done either using phpmyadmin (select database -> click the Operations tab -> select UTF 8 from dropdown menu at bottom) or in your shell ALTER DATABASE MyDb CHARACTER SET utf8;

3. change the binary data type fields back to TEXT
This can get sticky if your original structure had datatypes that … were supposed to be binary or blobs. I looked through my database before altering it and did not find any blobs or binaries. But this could vary depending upon plugins used, etc. Changing the binaries back to TEXT essentially involves running the above sql scripts in reverse.


  • echo "SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'binary', 'char'), ';') FROM columns WHERE table_schema = 'testblog' and data_type LIKE '%binary%';" | /usr/bin/mysql -uuser -ppassword information_schema > /home/userdir/rev-binary2char.txt
  • echo "SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'blob', 'text'), ';') FROM columns WHERE table_schema = 'testblog' and data_type LIKE '%blob%';" | /usr/bin/mysql -uuser -ppassword information_schema > /home/userdir/rev-blob2text.txt

4. Add DB_CHARSET and DB_COLLATE definitions to wp-config.php

references

wordpress article on changing collation
article by alex king

VN:F [1.9.3_1094]
Rating: 4.7/10 (3 votes cast)