
Linux.com

Re: UTF isn't all that it's cracked up to be.

Posted by: Anonymous Coward on November 02, 2004 08:37 PM
You're confused.

UTF-8, when expanded to "Unicode Transformation
Format, 8-bit", can handle all Unicode characters up
to 0x10FFFF.
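Here is a minimal Python sketch of the modern (RFC 3629) encoder, included only to show where the ceiling comes from. The function name utf8_encode is made up for this example. It refuses anything above 0x10FFFF and the UTF-16 surrogate range, and it never needs more than four bytes.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 within the RFC 3629 limits (hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:X} is a surrogate, not a character")
    if cp < 0x80:                      # 1 byte: plain ASCII
        return bytes([cp])
    if cp < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx plus three continuation bytes, enough for 21 bits
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Cross-check against Python's built-in codec.
for cp in (0x41, 0xE9, 0x20AC, 0x10FFFF):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
```

Python itself enforces the same limit: chr(0x110000) raises ValueError.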

UTF-8, when expanded to "UCS Transformation Format,
8-bit" (UCS = ISO 10646-1), can handle all ISO
characters up to 0x7FFFFFFF (UCS-4 is a 31-bit code
space), *although* ISO and Unicode have agreed never
to assign any character above 0x10FFFF.
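For comparison, here is a sketch of the older ISO / RFC 2279 form. It uses up to six bytes to cover the full 31-bit UCS-4 range, so the real top value is 0x7FFFFFFF. The name utf8_encode_rfc2279 is hypothetical. Current decoders, Python's included, reject the five- and six-byte sequences it can emit.

```python
def utf8_encode_rfc2279(cp: int) -> bytes:
    """Encode a 31-bit UCS-4 value with the original, up-to-six-byte UTF-8 scheme."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("UCS-4 values are limited to 31 bits")
    if cp < 0x80:
        return bytes([cp])
    # (exclusive upper bound, sequence length, lead-byte marker)
    forms = ((0x800, 2, 0xC0), (0x10000, 3, 0xE0), (0x200000, 4, 0xF0),
             (0x4000000, 5, 0xF8), (0x80000000, 6, 0xFC))
    for limit, n, lead in forms:
        if cp < limit:
            break
    # Peel off 6 bits per continuation byte, least significant first.
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Whatever bits remain go into the lead byte.
    return bytes([lead | cp] + tail[::-1])


print(utf8_encode_rfc2279(0x7FFFFFFF).hex())  # fdbfbfbfbfbf
```

Up to 0x10FFFF the output is byte-for-byte identical to modern UTF-8. The two schemes differ only in the extra five- and six-byte forms, which is why bytes.fromhex("fdbfbfbfbfbf").decode("utf-8") fails today.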

So UCS-4 and UTF-8 can represent the same characters.
The only differences are that UTF-8 is variable-length,
ASCII-compatible, not prone to endianness bugs, and
_can_ imply larger files for some Asian scripts (three
bytes per CJK character, versus two in UTF-16 or in
legacy double-byte encodings).

//mirabile - http://mirbsd.de/
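The last paragraph can be checked quickly in Python. This assumes UTF-32 can stand in for UCS-4, since the two agree for everything up to 0x10FFFF. The sizes() helper and the sample strings are made up for illustration. For CJK text, the size penalty shows up against 2-byte encodings such as UTF-16 or EUC-JP. Against UCS-4 itself, UTF-8 is still smaller.

```python
def sizes(text: str) -> dict:
    """Byte length of `text` in a few encodings (no byte-order mark)."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le", "euc_jp")
    return {enc: len(text.encode(enc)) for enc in encodings}


ascii_text = "Hello"
cjk_text = "日本語"

# ASCII compatibility: pure-ASCII text produces the very same bytes in UTF-8.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Endianness: UTF-32 (standing in for UCS-4) differs between byte orders,
# while UTF-8 defines a single byte sequence per character.
print(cjk_text.encode("utf-32-be") == cjk_text.encode("utf-32-le"))  # False

print(sizes(ascii_text))  # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20, 'euc_jp': 5}
print(sizes(cjk_text))    # {'utf-8': 9, 'utf-16-le': 6, 'utf-32-le': 12, 'euc_jp': 6}
```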


Return to Introduction to Unicode