2007-09-24

The difference between a character set (charset) and a character encoding

from this post: http://www.thescripts.com/forum/thread214891.html

It's important to distinguish between character sets (charsets) and
character encodings. They are two different things. A charset is a map
that defines which numeric value (code point) represents a particular
character. A character encoding defines how those numeric values are
serialized into a stream of bytes. For example, Unicode can be encoded
as UTF-8, which is space-efficient for mostly-ASCII text and is
byte-compatible with ASCII (though not with ISO-8859-1, whose
characters above 0x7F take two bytes in UTF-8). Or it could be encoded
as UCS-4LE (UTF-32LE), which is not space-efficient but is fixed-width,
which can make heavy text processing easier.
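
To make the distinction concrete, here is a minimal Python sketch. It
takes one character, looks up its Unicode code point (the charset
level), and then serializes it with three different encodings. The
variable names are only illustrative, and Python's "utf-32-le" codec
is assumed here to stand in for what the post calls UCS4-LE.

```python
# Illustrative sketch: one character, one code point, several encodings.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Charset level: the number assigned to the character.
code_point = ord(ch)

# Encoding level: how that number is serialized into bytes.
utf8_bytes = ch.encode("utf-8")          # variable width
utf32_bytes = ch.encode("utf-32-le")     # fixed width, little-endian
latin1_bytes = ch.encode("iso-8859-1")   # one byte per character

print(f"U+{code_point:04X}")   # U+00E9
print(utf8_bytes.hex())        # c3a9
print(utf32_bytes.hex())       # e9000000
print(latin1_bytes.hex())      # e9

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
print("A".encode("utf-8") == "A".encode("ascii"))  # True
```

The code point is the same in every case. Only the byte sequence
changes, and the ISO-8859-1 and UTF-8 bytes differ for this character
even though the ASCII and UTF-8 bytes for "A" match.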

Here's a nice link about programming with extended charsets, although
it is a little UTF-8/*nix-centric:

http://www.cl.cam.ac.uk/~mgk25/unicode.html
