GrN.dk

Mysqldump Encoding: How to Avoid Broken Characters in Exports

Broken characters in a database export are rarely just a file problem. They usually mean the client connection, database defaults, or individual columns are not using the charset you think they are. If you are preparing a handover, a staging refresh, or a platform migration, the safe move is to verify the source encoding first and then dump with that exact setting. Guessing is what creates mojibake, bad imports, and long cleanup work later.

Start by checking the source, not the export command

Current MySQL documentation recommends utf8mb4 for new work, and MySQL 8.0 and later use utf8mb4 as the default server character set. But plenty of real systems still carry older choices such as latin1 or utf8mb3, often inherited from an earlier version. So do not force latin1, or any other charset, unless the database metadata or the column definitions actually point there.

These quick checks are worth doing before you dump anything:

SHOW VARIABLES LIKE 'character_set_%';
SHOW VARIABLES LIKE 'collation_%';

SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME = 'DATABASE_NAME';

SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'DATABASE_NAME'
  AND CHARACTER_SET_NAME IS NOT NULL
ORDER BY TABLE_NAME, ORDINAL_POSITION;

If the schema default says utf8mb4 but a few legacy tables or columns still say latin1, you have a mixed setup. In that case, one blanket export flag may not tell the full story, and it is better to inspect the problematic tables directly before handing the dump to a client or another team.
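To see at a glance whether a setup is mixed, the result of the column query above can be tallied per charset. This is a minimal sketch, assuming the query output was saved as tab-separated text (for example by running it through the mysql client with -N and -B). The function name summarize_charsets and the file name columns.tsv are illustrative and not part of any MySQL tool.

```shell
# Hypothetical helper: tally columns per character set.
# Expects tab-separated rows on stdin in the order
# TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME.
summarize_charsets() {
  awk -F '\t' '{ count[$3]++ } END { for (cs in count) print cs "\t" count[cs] }' |
    LC_ALL=C sort
}

# Example usage (assumes columns.tsv holds the saved query output):
# summarize_charsets < columns.tsv
```

More than one line of output means the schema mixes charsets, and the smaller groups are the tables to inspect first.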

The safer default for most exports

For most modern MySQL environments, start simple and let the dump describe itself. Current MySQL docs say mysqldump uses utf8mb4 when no charset is specified, and the tool writes a SET NAMES statement by default. That makes the dump more portable because the restore side is not forced to guess what encoding the file expects.

mysqldump -u USER -p --single-transaction DATABASE_NAME > dump.sql

If you prefer to be explicit, that is fine too:

mysqldump -u USER -p --single-transaction --default-character-set=utf8mb4 DATABASE_NAME > dump.sql

You also rarely need to add --opt manually anymore. MySQL documents it as enabled by default, and that default bundle already includes --set-charset.

--single-transaction is a good fit for InnoDB-backed systems because it dumps from a consistent snapshot without locking whole tables. It does not give the same guarantee for non-transactional engines such as MyISAM. The more important point for encoding is that you keep the dump self-describing unless you have a strong reason not to.
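One way to confirm that a dump is self-describing is to look for the SET NAMES statement in its header. The sketch below assumes the statement appears within the first 50 lines, as it does in typical mysqldump output, where it is wrapped in a versioned comment such as /*!50503 SET NAMES utf8mb4 */. The function name dump_charset is hypothetical.

```shell
# Hypothetical helper: print the charset named in the first SET NAMES
# statement found in the dump header, or nothing if there is none.
dump_charset() {
  head -n 50 "$1" | sed -n '/SET NAMES /{
    s/.*SET NAMES \([A-Za-z0-9_]*\).*/\1/p
    q
  }'
}

# Example usage:
# dump_charset dump.sql
```

Empty output suggests the dump was written with --skip-set-charset, so the import side has to supply the connection charset itself.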

When a legacy charset like latin1 is still correct

If your schema or columns genuinely use latin1, then forcing that connection charset can still be the right move. This is the narrow case that the familiar one-line latin1 fix is aimed at:

mysqldump -u USER -p --single-transaction --default-character-set=latin1 DATABASE_NAME > dump-latin1.sql

What I would not do by default is add --skip-set-charset. Current docs show that --set-charset is enabled by default and is part of --opt. So when you add --skip-set-charset, you are deliberately removing the SET NAMES line from the dump. That can be useful in a tightly controlled pipeline, but it also makes the file less safe for handoff because the import side must now supply the correct connection charset manually.

If you do choose to skip the charset statement, restore with the matching client setting so the file is read the way it was written:

mysql --default-character-set=latin1 DATABASE_NAME < dump-latin1.sql

Common traps that waste time

  • Using utf8 as a shortcut name without checking which MySQL version you are on and what the name resolves to there. In current MySQL docs, utf8 is a deprecated alias for utf8mb3, not the full utf8mb4 most teams actually want.
  • Assuming the database default tells the whole story. The real problem may sit in a handful of older columns that still use another charset or collation.
  • Treating a dump as a repair tool. If the data was already stored with the wrong bytes, exporting it cleanly does not magically correct the original corruption.
  • Handing a dump to another vendor or environment without a test restore. A quick restore into a temporary database can save hours of finger-pointing later.
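A rough check for data that was already stored with the wrong bytes is to scan the dump for the byte pattern that double encoding leaves behind. The sketch below is a heuristic only: it counts lines containing the bytes C3 83 C2, which is what the start of a UTF-8 character such as é or ü typically looks like after it has been misread as latin1 and encoded to UTF-8 a second time. The function name count_suspect_lines is hypothetical, and a count of zero does not prove the data is clean.

```shell
# Hypothetical helper: count lines that contain the byte sequence C3 83 C2,
# a common signature of double-encoded UTF-8 text.
count_suspect_lines() {
  # printf with octal escapes builds the raw three-byte pattern portably.
  LC_ALL=C grep -c -F -- "$(printf '\303\203\302')" "$1"
}

# Example usage:
# count_suspect_lines dump.sql
```

A non-zero count is a reason to inspect the affected rows in the source database before exporting again, since a cleaner dump will not repair them.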

A practical handover checklist

If this export matters to a client project, staging launch, or migration window, keep the process boring and documented:

  • Verify schema and column charsets before you dump.
  • Use utf8mb4 by default unless the source clearly requires something else.
  • Keep the dump self-describing by leaving SET NAMES in place unless your restore pipeline is explicitly managing charset itself.
  • Run one test import and spot-check records that contain accents, symbols, or multi-language content.
  • If you find mixed legacy encodings, pause and map the cleanup before promising a quick migration.
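The test import from the checklist can be scripted so that it runs the same way every time. This is a sketch under stated assumptions: the mysql client can connect without prompting (for example through credentials in an option file), the dump was made without --databases so it can be loaded into any schema, and the dump path contains no spaces. The names run, test_restore, DRY_RUN, and handover_check are all illustrative.

```shell
# Print each command; run it only when DRY_RUN is not set to 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Hypothetical helper: load a dump into a scratch database for spot checks.
test_restore() {
  dump_file="$1"
  scratch_db="$2"
  run mysql -e "CREATE DATABASE \`$scratch_db\`" &&
    run mysql "$scratch_db" -e "source $dump_file"
}

# Example usage: preview the commands without touching a server.
# DRY_RUN=1 test_restore dump.sql handover_check
```

The scratch database is left in place on purpose so that records with accents or symbols can be spot-checked. Drop it by hand once the check is done.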

That is the difference between a working handover and a weekend spent debugging broken names, product titles, and email templates.

If you are planning a migration, an agency handoff, or a cleanup of a legacy MySQL stack, Greg can help turn it into a controlled delivery plan instead of a risky export task.


Sources

  • MySQL 9.7 Reference Manual :: 6.5.4 mysqldump — A Database Backup Program
  • MySQL 9.7 Reference Manual :: Chapter 12 Character Sets, Collations, Unicode
  • MySQL 9.7 Reference Manual :: 6.5.1.1 mysql Client Options
  • MySQL 9.7 Reference Manual :: 28.3.37 The INFORMATION_SCHEMA SCHEMATA Table
  • MySQL 9.7 Reference Manual :: 28.3.8 The INFORMATION_SCHEMA COLUMNS Table

Last modified: 2026-05-31

Tags

  • mysql
  • Linux
  • database
  • migration
