Posted by: Anonymous Coward
on November 02, 2004 08:37 PM
UTF-8, when expanded to "Unicode Transformation Format, 8-bit", can handle all Unicode characters up to 0x10FFFF.
UTF-8, when expanded to "UCS Transformation Format, 8-bit" (UCS = ISO/IEC 10646), can handle all ISO characters up to 0x7FFFFFFF, *although* ISO and Unicode have agreed never to assign anything above 0x10FFFF.
So UCS-4 and UTF-8 cover the same characters; the differences are that the latter is variable-length, ASCII-compatible, not prone to endianness bugs, and _can_ mean larger files for some Asian scripts than a two-byte encoding such as UTF-16 (three bytes per character instead of two).
//mirabile - http://mirbsd.de/
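
To make the comparison above concrete, here is a minimal Python sketch, not anyone's reference implementation. The function names `utf8_encode` and `ucs4_encode` are made up for illustration. It follows the original 1 to 6 byte UTF-8 layout (RFC 2279), which reaches 0x7FFFFFFF; current UTF-8 (RFC 3629) stops at four bytes and 0x10FFFF. It does not reject surrogates or other invalid code points.

```python
import struct

def utf8_encode(cp: int) -> bytes:
    """Encode one code point with the original 1-6 byte UTF-8 layout."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point out of range")
    if cp < 0x80:
        return bytes([cp])  # ASCII is stored unchanged, one byte
    # (exclusive upper limit, total bytes, lead-byte prefix)
    for limit, length, prefix in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if cp < limit:
            break
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    return bytes([prefix | cp] + tail[::-1])

def ucs4_encode(cp: int, big_endian: bool = True) -> bytes:
    """Encode one code point as a fixed 4-byte UCS-4 unit."""
    return struct.pack(">I" if big_endian else "<I", cp)

if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x4E2D, 0x10FFFF, 0x7FFFFFFF):
        print(hex(cp), utf8_encode(cp).hex(),
              ucs4_encode(cp).hex(), ucs4_encode(cp, big_endian=False).hex())
```

The UTF-8 bytes come out the same on any machine, while the UCS-4 bytes depend on the byte order chosen, which is where the endianness bugs come from.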