Garbled Text in Bad Places
There was a time, in a land far away, when the internet and Japanese text did not live in harmony. Pull up your favorite late ’90s/early 2000s webpage, and you get greeted with 文字化け (もじばけ – mojibake – garbled text). Japanese is slightly more complicated than the English alphabet, and computers wanted nothing to do with it. So all that beautiful kanji came out as a mess of random keystrokes.
Luckily these medieval days have mostly vanished, and your chance of running into 文字化け on a site is fairly slim. If you search hard enough, I’m sure you can find some relics, or you could probably mess with your computer settings to bring them back.
So are you safe? As long as you aren’t visiting a Geocities.jp site, no problem, right?
Well, how about Amazon.jp? Yes, that Amazon. And how about 文字化け on an important package you ordered?
I think we have a winner.
Take a look at the お届け先 (destination address). See anything out of place? While your first inclination might be that Japanese addresses are crazy, so this is probably normal, 文字化け has made its way onto the most important part of a package.
And what happened?
This is Japan. What do you think happened? It was delivered safely to the proper address, 文字化け and all. Were you expecting anything less from ヤマト (Yamato), the Japanese shipping king? No, you weren’t. Neither garbled text nor ninja robots nor alien samurai stays these couriers from the swift completion of their appointed rounds. This would beat the American post office’s motto in a second.
———————–
Source Picture
Founder of Jalup. iOS Software Engineer. Former attorney, translator, and interpreter. Still watching 月曜から夜ふかし weekly since 2013.
So…
Let us say, hypothetically of course, that I know of a cousin of a cousin of mine who finds this 文字化け thing far too often… do you mean to imply that this person is living in the past, then?
Anyway, at least now that I know the name of the problem I might actually be able to google for a solution :p.
Haha, hypothetically your cousin’s cousin might just have an older computer with older settings.
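文字化け, huh...
As a software tester, I’ve had to test for this and write up bugs about it. I didn’t know it was called 文字化け.
I use this word a lot at work. Thanks.
May you never forget 「文字化け」 for the rest of your life!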
If anybody’s curious about a more technical explanation of how 文字化け came to be, they should give http://www.joelonsoftware.com/articles/Unicode.html a read. It’s not really specific to Japanese, but it does cover how different text formats were handled in the early days of computing and the trouble it caused when the Internet became a thing. It’s especially fascinating reading if you’re into both Japanese and programming.
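For anyone who wants to see the effect firsthand, here is a minimal Python sketch of one way 文字化け can arise: the text is written in one encoding and read back as another. The choice of Shift_JIS for writing and Windows-1252 for reading is only an assumption for the demo, and the function names `garble` and `repair` are made up for illustration.

```python
def garble(text, written_as="shift_jis", read_as="cp1252"):
    """Encode text one way, then decode the same bytes the wrong way."""
    raw = text.encode(written_as)
    # Bytes that mean nothing in the wrong encoding become U+FFFD.
    return raw.decode(read_as, errors="replace")


def repair(garbled, read_as="cp1252", written_as="shift_jis"):
    """Undo garble() by reversing the wrong decoding step.

    This only works if no bytes were lost to U+FFFD along the way.
    """
    raw = garbled.encode(read_as, errors="replace")
    return raw.decode(written_as, errors="replace")


original = "文字化け"
broken = garble(original)
print(broken)          # a string of Western symbols instead of kanji
print(repair(broken))  # the original text again, if nothing was lost
```

Plain ASCII text survives the round trip untouched because both encodings agree on those bytes, which is roughly why English-only pages rarely showed the problem.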
Great read (one irony of being a pure mathematician is knowing full well how this stuff ought to work in general abstract terms while neglecting the practice).
The sense of humor is pretty great too.
Anyway, the following quote from your link
“Anyway, what does the poor reader of this website, which was written in Bulgarian but appears to be Korean (and not even cohesive Korean), do? He uses the View | Encoding menu and tries a bunch of different encodings (there are at least a dozen for Eastern European languages) until the picture comes in clearer. If he knew to do that, which most people don’t.”
actually solved the problem I ha… I mean, the problem the hypothetical cousin of my cousin had, so now I’m feeling pretty good about giving that thing a read.
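The trial-and-error approach from that quote can be sketched in a few lines of Python: take the raw bytes and decode them with one candidate encoding after another until something readable comes out. The candidate list and the function name `try_encodings` are assumptions for illustration, not a complete list of what a browser offers.

```python
# Hypothetical shortlist of encodings often seen for Japanese text.
CANDIDATES = ["utf-8", "shift_jis", "euc_jp", "iso2022_jp", "cp1252"]


def try_encodings(raw, candidates=CANDIDATES):
    """Return {encoding name: decoded text} for every candidate that
    decodes the bytes without an error."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            continue
    return results


# Pretend these bytes arrived with no label saying how they were encoded.
mystery = "文字化け".encode("shift_jis")
for name, text in try_encodings(mystery).items():
    print(name, "->", text)
```

Decoding without an error does not mean the guess is right, so a human still has to look over the results and pick the one that reads as real text.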
Very interesting article on the subject of 文字化け. Thanks for adding it here!
In fact, even that information is out of date now: the current UTF-8 specification (RFC 3629) allows at most 4 bytes to encode a code point, not 6 bytes as stated in that article. That seems like a step backwards, but it’s for compatibility reasons: UTF-16 can only use up to 4 bytes, which caps it at code point U+10FFFF, so UTF-8 was restricted to the same range.
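The byte counts are easy to check in Python. This is a small sketch, with `encoded_lengths` being a made-up helper name, that reports how many bytes a single character takes in UTF-8 and in UTF-16 (little-endian, without a byte order mark).

```python
def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


# An ASCII letter, an accented letter, a kanji, an emoji, and the
# highest code point Unicode allows (U+10FFFF).
samples = ["A", "é", "文", "😀", chr(0x10FFFF)]
for ch in samples:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):06X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

Even the highest code point fits in 4 bytes in both encodings, which matches the 4-byte limit mentioned above.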