You should avoid using MySQL collation utf8_unicode_ci as it can be quite slow

MySQL

2016-09-14 18:44 (8 years ago) ytyng

This isn't a completely quantitative discussion, but...

We added a full-text index to a field in a MyISAM table on a certain service, using bigrams for search indexing. This was for about 200,000 records.

Until now, the collation for the character set of that field was utf8_general_ci (the default), but since we wanted to match both Katakana and Hiragana in Japanese, we decided to change the collation to utf8_unicode_ci.

However, the performance drastically worsened, and the service stopped working altogether. When I checked with SHOW FULL PROCESSLIST;, I saw that the search queries were getting stuck.

So, we reverted from utf8_unicode_ci back to the original setting. We decided to normalize the search data upon insertion to handle the variations between Katakana and Hiragana in Japanese.

The takeaway is that it's better to avoid utf8_unicode_ci. In fact, for search-related operations, it's probably better to use Elasticsearch or Cloudsearch instead of relying on MySQL with full-text indexing.

You should avoid using MySQL collation utf8_unicode_ci as it can be quite slow

MySQL

2016-09-14 18:44 (8 years ago) ytyng

Recent Posts

Categories

Feeds

Archive

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011