From this post: http://www.thescripts.com/forum/thread214891.html
It's important to distinguish between characters (or charsets) and
character encodings. They are two different things. A charset is a map
that defines which numeric value represents a particular character. A
character encoding defines how those numeric values are serialized into a
stream of bytes. For example, Unicode can be encoded as UTF-8, which
is space efficient for mostly-ASCII text and is byte-compatible with
ASCII (though not with ISO-8859-1, whose characters above 127 take two
bytes in UTF-8). Or it could be encoded as UCS-4LE (UTF-32LE), which is
not space efficient, but its fixed four-byte width can make heavy text
processing easier.
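To make the distinction concrete, here is a minimal Python sketch. The sample string and variable names are my own assumptions, not from the original post. It takes one set of code points (the charset level) and serializes them two ways (the encoding level):

```python
# Sample text (an assumption for illustration); \u00e9 is "e with acute".
text = "h\u00e9llo"

# Charset level: each character maps to one numeric code point.
code_points = [ord(ch) for ch in text]

# Encoding level: the same code points serialized as two byte streams.
utf8_bytes = text.encode("utf-8")       # variable width, 1 to 4 bytes each
utf32_bytes = text.encode("utf-32-le")  # fixed width, 4 bytes each

print(code_points)
print(len(utf8_bytes), len(utf32_bytes))

# With a fixed width, character i always sits at bytes [4*i, 4*i + 4),
# so indexing needs no scanning.
i = 1
chunk = utf32_bytes[4 * i:4 * i + 4]
second_code_point = int.from_bytes(chunk, "little")
```

The code points are identical in both cases; only the byte layout differs. In UTF-8 the accented character takes two bytes while the ASCII letters take one each, whereas in UTF-32LE every character takes four.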
Here's a nice link about programming with extended charsets although it
is a little UTF-8/*nix centric:
http://www.cl.cam.ac.uk/~mgk25/unicode.html