Student guest post: Unicode and you, and what to do with weird band names

Students in JOMC 457, Advanced Editing, are writing guest posts for this blog this semester. This is the second of those posts. Joe Chapman is a senior from Asheville, N.C. He is the editor of the Diversions section in The Daily Tar Heel, and he has a keen interest in both music and journalism.

GL▲SS †33†H. Gr†ll Gr†ll. ℑ⊇◊⊆ℜ. Perhaps more recognizably, Spın̈al Tap (see: metal umlaut). These are all band names that would be virtually impossible to reproduce without copy-paste. While an accented ‘e’ or umlauted ‘a’ will occasionally make its way into a story with words such as café or doppelgänger, band names like ///▲▲▲\\\ present a challenge for editors: How do you handle these bizarre foreign characters in print and online?

Now here’s a headline for you: In 2010, The Guardian wrote on its music blog about the bands whose /\/ /\ /\/\ € $ were made out of $ ¥ /\/\ ß 0 \ $. While the headline is a little bold, it serves the article well. But its use of non-standard characters has me wondering how many people could actually see the headline and the band names in the article.

I’m going to spare you the technical history of rendering fonts on computers, the problems with multibyte character lengths (and how they were fixed) and the competing standards for character encoding, though it’s a good read, I promise. Instead, let me say this: Most web pages store their text in a character encoding called UTF-8. UTF-8 is useful because it can represent every character in Unicode, which covers pretty much every character from any language.

But just because your browser uses UTF-8 doesn’t mean you can start seeing ★★★.
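To make the encoding part concrete, here is a minimal Python sketch of what UTF-8 does with a few of the characters in this post. The helper name utf8_bytes is made up for illustration. It only shows how the characters are stored as bytes; whether they display correctly is a separate question that depends on fonts.

```python
# Sketch: how many bytes UTF-8 uses for a few characters from this post.
# utf8_bytes is a hypothetical helper name, not part of any library.
def utf8_bytes(ch):
    """Return the UTF-8 encoding of a character as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))

# "e", "é" (U+00E9), "†" (U+2020), "★" (U+2605)
for ch in ["e", "\u00e9", "\u2020", "\u2605"]:
    print(f"U+{ord(ch):04X}  {utf8_bytes(ch)}")
```

A plain ‘e’ takes one byte, the accented ‘é’ takes two, and the dagger and star take three each. That variable length is the "multibyte" part mentioned above.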

Printing special characters is easy: All you need is a font that has support for the characters you need (and most Adobe fonts have support for at least the Latin alphabet). If for some reason all your computer has installed is the old version of Courier, there are plenty of open source alternatives for rendering special characters.

But rendering these characters online presents a problem. Practically no two computers have the same set of fonts installed, and no single font comes anywhere near supporting every Unicode character. The ‘look of disapproval’ emoticon and meme, for example, renders improperly on the majority of Apple computers running Safari.

ಠ_ಠ If you’re stuck seeing squares or question marks, unfortunately your browser couldn’t find a font containing the characters it needs. If you see a rather unamused, glaring emoticon, congratulations: You’re using a modern browser with an adequate font library. (And the look of disapproval is probably unwarranted. I apologize.)
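If all you can see is a square, it helps to find out what the character is supposed to be. The sketch below uses Python’s standard unicodedata module to look up each character’s code point and official name. The function name describe is an assumption made for this example.

```python
import unicodedata

# Sketch: list the code point and Unicode name of each character in a string.
# describe is a hypothetical helper name used only for this example.
def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# The look of disapproval: U+0CA0, underscore, U+0CA0
for ch, code, name in describe("\u0ca0_\u0ca0"):
    print(code, name)
```

The eyes turn out to be a letter from the Kannada script, which explains why a machine without a Kannada font shows squares instead.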

Japanese emoticons make interesting use of special characters as well. 。◕ ‿ ◕。

So how do you decide if it’s appropriate to use special characters on your website? It comes down to knowing your audience.

Browsers with red columns have known issues with rendering more commonly used special characters.

While Internet Explorer 9 is much more robust in handling UTF-8, 15 percent of all Internet users still use an older version of IE. In IE’s default settings, Times New Roman is used to render Latin characters. And that’s a shame, because Times New Roman barely has any support for special characters.

So without some fiddling on the user’s end, those readers will be seeing question marks when you try to render a hip band name. (Similarly with Safari: Safari 5 works just fine, but there are still some stragglers using older versions.)

A quick and dirty fix would be to save a small, slim image of the band name and insert the image in line with the text. Or just do what GL▲SS †33†H does with its Bandcamp URL and get as close as you can in plain English.
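The "get as close as you can" approach can be roughed out in code. This is a sketch under stated assumptions: the LOOKALIKES table is invented for this example (there is no standard mapping from ▲ to A), and plain_fallback is a hypothetical name. Accented letters are handled with Unicode normalization, which splits a letter such as ‘é’ into a base letter plus an accent that can then be dropped.

```python
import unicodedata

# Hypothetical lookalike table: extend it for the names your publication covers.
LOOKALIKES = {
    "\u25b2": "A",  # ▲ black up-pointing triangle
    "\u2020": "T",  # † dagger
}

def plain_fallback(name):
    """Approximate a stylized name using ASCII characters only."""
    out = []
    for ch in name:
        if ch in LOOKALIKES:
            out.append(LOOKALIKES[ch])
            continue
        # NFKD splits accented letters into a base letter plus combining marks.
        decomposed = unicodedata.normalize("NFKD", ch)
        # Keep only the ASCII parts; anything else is silently dropped.
        out.append("".join(c for c in decomposed if ord(c) < 128))
    return "".join(out)

print(plain_fallback("GL\u25b2SS \u202033\u2020H"))
print(plain_fallback("caf\u00e9"))
```

Note that characters with no entry in the table and no ASCII base letter simply disappear, so an editor should still check the result by eye.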

Ask your website administrator to break down your viewers’ browser usage. If your website uses any sort of analytics software, it’s probably recording what browser people use each time they visit your site. If you have a high percentage of users with modern browsers — browsers like Chrome and Firefox are suited to handle most special characters out of the box — then you’re probably safe using nontraditional characters.
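Once you have the numbers, the breakdown is simple arithmetic. The sketch below uses made-up visit counts and a made-up list of browsers treated as modern; both are assumptions for illustration, not real data or a standard classification.

```python
# Hypothetical visit counts per browser, as an analytics export might list them.
visits = {"Chrome": 4200, "Firefox": 3100, "IE 9": 900, "IE 8": 1300, "Safari 4": 500}

# Assumption: which browsers count as "modern" is a judgment call for your site.
MODERN = {"Chrome", "Firefox", "IE 9"}

def modern_share(visits, modern):
    """Return the fraction of visits that came from browsers in `modern`."""
    total = sum(visits.values())
    if total == 0:
        return 0.0
    return sum(n for browser, n in visits.items() if browser in modern) / total

print(f"{modern_share(visits, MODERN):.0%} of visits came from modern browsers")
```

Where you draw the line is an editorial decision: a site whose share is high might use the special characters directly, while one with many older browsers might prefer an image or a plain-text version.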

For me, bands like GL▲SS †33†H don’t really take it far enough. If I had to come up with a band name and I was feeling particularly sadistic, I’d probably go with something that would be a little more devastating to a website’s markup than black squares or interrobangs.

There are certain Unicode characters out there that are not visible symbols at all but instructions. Some are control characters that switch the direction of text for right-to-left languages like Arabic or Thaana. Others are combining marks, which attach themselves above or below the letter that precedes them. When you pile enough combining marks onto each letter, you can break text out of its bounded space and come up with something that looks like you’re ripping a hole in the matrix.

Good luck getting this in print:

Ķ̥̥̹̗̭̗̻̫̳̝̦̫̭͇̖̾̋̋̓̈ͪ̏͊ͧ͑̊͊ͪͮ̚͟ͅE̶̝̫̖̭̤̘͙͒̒̆͋ͯ̂̄͐ͥ̈ͫ͑̄͂ͥ͞͡͠N̛̟̣̯ͤ͌̇ͧ̿̇̉͊͋͗̒ͨ͑̄ͯ̿͋̕͝ͅN̶̨͇͓͇̠̻̗͈̪̝͓͚ͨͩ̑ͤͧ̿͑͗́͘͡Y̨̨̛͍̙̜̖͑̋̋͋̋ͣ̔ͣ’̸̷͎̳̙̬̲̞͇̖͓̘̳̘͎̭̗̬͊̌́̄̉̇̾̐̌̚ͅͅS̴̶̞̮̘̱͕̣̲̭̠͔̝̦͉̪̭͓̉ͦ͋ͩ̃ͥ͛̈́͋͝

̵̵̴̤̹͇̮̼̱͙̜̱̝̠̟̯͍̜̗̼̞̒̋͑͌̽̅̍̕͡W̵̛͍̘̲̦͓̝͒̄̽̇ͬͧ̍ͫ̿̓̋̅̈́͆ͮ͒̚̚Ő̷ͧ̄̔̀ͥ̑͌͆ͧ̂̃̌ͧͧͣ͡͏͎̼͙̗͈̦͕̝̦̣̱͔̺͉͞ͅR̵ͤͧ̐̾͂̐̐͒̋̓͘҉̬̩͔̹L̶̖͉͇̟̫͍͈̹̞̖͙̯̤͐ͣ̄̓͘ͅḐ̷̶̡̗͕̺͓͂ͬͦ́̉ͩͤ̎

͊̎̂ͥͥ̑̽̇́̌͌͐ͬ̏̾̆͆ͤ͏̶̵̶̰͖̫̖̘͍̮̻̲̱̟̫͖̗̳̖̤͇͞O̡̢̧͇̳̥̜̭̲ͣͬ̿̉͐̿ͣͫ̀͠F̷̧̛̲̼̹͔̪̘̥̥̫̥͓̭͖̭̃ͨ́̈́̅́̉ͮ̐ͣ́͐ͮͮ̚͜͝

̴̥͔̺̮̪̝̯̥̻͇̤ͣͯͦ́͂ͮ̿̆ͯ͑ͦ̏͋̾̌̀̚B̶̸̻͎͈̭̱̳͍̗̥͉̙̳̌̌ͦ̿͒́E̛̛̛͖̬̼̙̬͈͖̭͇̞̤̖̟̺̙̭͈̐ͪ͋ͬ̄̿̈́ͫ̄ͯ̍̌ͮͪ͌̽͋͛͡E͎̘̱̲ͮͥ̈ͫ̅ͯ͆͗ͧ͒́̚F̴̨̞̞̗̾ͩ̎̊̅̋ͬ͂̉ͧ̿̌̚̚̕

And yes, you can tweet that. But your friends are probably going to assume you’ve been thoroughly hacked.
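The stacked look in the name above comes from combining marks, characters that attach to whatever letter comes before them. Here is a minimal Python sketch that piles marks from the Combining Diacritical Marks block (U+0300 to U+036F) onto each letter, plus a cleanup function an editor could use to recover the plain text. The function names and the per_letter and seed parameters are assumptions for this example.

```python
import random
import unicodedata

# The Combining Diacritical Marks block, U+0300 through U+036F.
MARKS = [chr(cp) for cp in range(0x0300, 0x0370)]

def stack_marks(text, per_letter=8, seed=0):
    """Append `per_letter` random combining marks after each letter."""
    rng = random.Random(seed)  # seeded so the output is repeatable
    out = []
    for ch in text:
        out.append(ch)
        if ch.isalpha():
            out.extend(rng.choice(MARKS) for _ in range(per_letter))
    return "".join(out)

def strip_marks(text):
    """Remove every combining mark, leaving only the base characters."""
    decomposed = unicodedata.normalize("NFD", text)
    # Unicode categories starting with "M" (Mn, Mc, Me) are all marks.
    return "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    )

messy = stack_marks("KENNY")
print(len(messy), "characters for a five-letter word")
print(strip_marks(messy))
```

One caveat: strip_marks also removes legitimate accents, so "café" comes back as "cafe". It is a blunt tool for cleaning up stacked text, not a general-purpose normalizer.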

Interested in getting special characters to work on older browsers? Wikipedia has some guidelines for you.