
Re:utf isnt all that its cracked up to be.

Posted by: David Breakey on November 02, 2004 01:20 AM

Sure, if you're completely unconcerned with backwards compatibility. Unfortunately, the real world doesn't work out that nicely. For instance, compose a mail message in a 32-bit encoding scheme and watch it almost invariably get mangled by all the routers and mail processing hubs between you and the recipient; now encode the same message in UTF8…
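To make the compatibility point concrete, here is a rough sketch of what the bytes look like on the wire. The message text is a made-up example, and the code only shows the byte-level difference; whether a given relay actually mangles the data depends on the software involved.

```python
# Hypothetical message text, chosen only for illustration.
message = "Hello, world"

utf32 = message.encode("utf-32-be")  # fixed 4 bytes per character
utf8 = message.encode("utf-8")       # 1 byte per ASCII character

# UTF-32 pads every ASCII character with three zero bytes, which
# NUL-terminated or 7-bit-only text handling tends not to expect.
nul_count_utf32 = utf32.count(0)
nul_count_utf8 = utf8.count(0)

# For pure ASCII input, the UTF-8 bytes are exactly the ASCII bytes.
same_as_ascii = utf8 == message.encode("ascii")
all_seven_bit = all(b < 0x80 for b in utf8)

print(len(utf32), nul_count_utf32)   # total bytes and zero bytes in UTF-32
print(len(utf8), nul_count_utf8)     # total bytes and zero bytes in UTF-8
print(same_as_ascii, all_seven_bit)
```

Software that only understands ASCII sees the UTF8 version as ordinary text, while the 32-bit version is mostly zero bytes from its point of view.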

Incidentally, do you even know how UTF8 works? The number in the name doesn't indicate the potential encoding range at all; it refers to the 8-bit code unit. UTF8 is every bit as capable of representing the full Unicode space as any of the others. It does this by being a variable-width encoding, using from one to four bytes to encode a single character.
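A quick way to see the one-to-four-byte behaviour is to encode a few characters and count the bytes. The sample characters below are my own picks, not anything from the thread; this is a minimal sketch using Python's built-in codec.

```python
# Sample characters picked for illustration: ASCII, Latin-1 range,
# a CJK ideograph, and a character outside the Basic Multilingual Plane.
samples = ["A", "é", "漢", "😀"]

def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

lengths = [utf8_length(ch) for ch in samples]
for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} takes {n} byte(s): {ch.encode('utf-8').hex()}")

# The highest Unicode code point still fits in four bytes.
highest = chr(0x10FFFF)
highest_len = utf8_length(highest)
print(f"U+10FFFF takes {highest_len} byte(s)")
```

The byte count grows with the code point, but nothing in the Unicode range is out of reach.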

Each scheme is designed to address different requirements. UTF8 is intended for when English (or any text that is mostly ASCII) is the dominant language, in which case it is more space efficient, or when full compatibility with 7-bit ASCII is a must.
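The space trade-off is easy to measure. The sketch below compares encoded sizes for two sample strings of my own choosing; the little-endian codec variants are used so that no byte order mark is counted.

```python
# Sample strings chosen for illustration only.
english = "Plain English text"
japanese = "日本語のテキスト"

def sizes(text: str) -> dict:
    """Encoded size in bytes under each scheme (no byte order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

english_sizes = sizes(english)
japanese_sizes = sizes(japanese)

print("english: ", english_sizes)
print("japanese:", japanese_sizes)
```

For the English string UTF8 is the smallest of the three; for the Japanese string a 16-bit encoding comes out smaller, which is the sort of requirement the other schemes are aimed at.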

Incidentally, can you provide some specific examples of how UTF32 can't represent Asian languages completely? I haven't come across anything yet that isn't a result of the various standards groups arguing over the best way to encode them…the technical implementation is perfectly capable, even using UTF8.

Incidentally, don't you mean UCS-4, which is also a Unicode standard?

