When connecting to a MySQL 5.7 server with a MySQL 8.0 client, the charset may not be set
↑ Detailed explanation available here. As described in the title, the character set cannot be specified from the client side.
When I upgraded Ubuntu from 18 to 20 and retrieved data from MySQL 5.7 RDS using Django running on it, all Japanese characters were displayed as ????????.
$ cat /etc/issue
Ubuntu 20.04.2 LTS \n \l
$ apt list --installed | grep mysql
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
default-libmysqlclient-dev/focal,now 1.0.5ubuntu2 amd64 [installed]
libmysqlclient-dev/focal-updates,focal-security,now 8.0.26-0ubuntu0.20.04.2 amd64 [installed,automatic]
libmysqlclient21/focal-updates,focal-security,now 8.0.26-0ubuntu0.20.04.2 amd64 [installed,automatic]
mysql-client-core-8.0/focal-updates,focal-security,now 8.0.26-0ubuntu0.20.04.2 amd64 [installed]
mysql-common/focal,now 5.8+1.0.5ubuntu2 all [installed,automatic]
As a test,
./manage.py dbshell
and then,
mysql> show variables like "%chara%";
it shows
mysql> show variables like "%chara%";
+--------------------------+-------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /rdsdbbin/mysql-5.7.33.R2/share/charsets/ |
+--------------------------+-------------------------------------------+
8 rows in set (0.01 sec)
This is the result.
Ideally, character_set_client and other character sets should be set to utf8mb4. In fact, doing the same thing on ubuntu18 results in utf8mb4.
Also, this phenomenon of character sets becoming latin1
does not change even if you add --default-character-set=utf8mb4
to the mysql
command on ubuntu18 or write character set settings in /etc/mysql/conf.d/mysql.cnf
.
After connecting to mysql,
SET NAMES utf8mb4
then the Japanese characters will not be garbled.
Incidentally, since it is undesirable to issue unnecessary SQL commands in a production environment, this is not done in production. However, in Django, this can be achieved by setting the DB configuration as follows:
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql',
...
'OPTIONS': {
'charset': 'utf8mb4',
'init_command': 'set names utf8mb4',
},
},
}
In this state, it seems impossible to avoid it with client-side settings (other than the above SET NAMES utf8mb4), so set the character set on the server side. In the case of RDS, in the parameter group,
character_set_client
character_set_connection
character_set_database
character_set_results
character_set_server
set all of these to utf8mb4
.
In reality, since there is usually no reason to use a character set other than utf8mb4, it feels like this should have been done from the beginning.
These parameters have a dynamic attribute, so they can be applied without restarting RDS after setting.
Comments