When Connecting to a MySQL 5.7 Server Using a MySQL 8.0 Client, Charset Specification Cannot Be Done from the Client

2021-08-07 16:03 (3 years ago) ytyng

When connecting to a MySQL 5.7 server with a MySQL 8.0 client, the charset may not be set

↑ Detailed explanation available here. As described in the title, the character set cannot be specified from the client side.

Overview

When I upgraded Ubuntu from 18 to 20 and retrieved data from MySQL 5.7 RDS using Django running on it, all Japanese characters were displayed as ????????.

Library Version

$ cat /etc/issue
Ubuntu 20.04.2 LTS \n \l

$ apt list --installed | grep mysql

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

default-libmysqlclient-dev/focal,now 1.0.5ubuntu2 amd64 [installed]
libmysqlclient-dev/focal-updates,focal-security,now 8.0.26-0ubuntu0.20.04.2 amd64 [installed,automatic]
libmysqlclient21/focal-updates,focal-security,now 8.0.26-0ubuntu0.20.04.2 amd64 [installed,automatic]
mysql-client-core-8.0/focal-updates,focal-security,now 8.0.26-0ubuntu0.20.04.2 amd64 [installed]
mysql-common/focal,now 5.8+1.0.5ubuntu2 all [installed,automatic]

Investigation

As a test,

./manage.py dbshell

and then,

mysql> show variables like "%chara%";

it shows

mysql> show variables like "%chara%";
+--------------------------+-------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /rdsdbbin/mysql-5.7.33.R2/share/charsets/ |
+--------------------------+-------------------------------------------+
8 rows in set (0.01 sec)

This is the result.

Ideally, character_set_client and other character sets should be set to utf8mb4. In fact, doing the same thing on ubuntu18 results in utf8mb4.

Also, this phenomenon of character sets becoming latin1 does not change even if you add --default-character-set=utf8mb4 to the mysql command on ubuntu18 or write character set settings in /etc/mysql/conf.d/mysql.cnf.

How to forcefully handle it on the client side

After connecting to mysql,

SET NAMES utf8mb4

then the Japanese characters will not be garbled.

Incidentally, since it is undesirable to issue unnecessary SQL commands in a production environment, this is not done in production. However, in Django, this can be achieved by setting the DB configuration as follows:

DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql',
...
'OPTIONS': {
'charset': 'utf8mb4',
'init_command': 'set names utf8mb4',
},
},
}

Solution

In this state, it seems impossible to avoid it with client-side settings (other than the above SET NAMES utf8mb4), so set the character set on the server side. In the case of RDS, in the parameter group,

character_set_client
character_set_connection
character_set_database
character_set_results
character_set_server

set all of these to utf8mb4.

In reality, since there is usually no reason to use a character set other than utf8mb4, it feels like this should have been done from the beginning.

These parameters have a dynamic attribute, so they can be applied without restarting RDS after setting.

Currently unrated

Comments

Archive

2024
2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011