Alan Freedman -- The Computer Language Company - Computer Desktop Encyclopedia

A CDE Definition


16-bit characters

See Unicode.



Unicode

A character code that defines the characters of most of the world's written languages. Although commonly thought to be only a two-byte coding system, a Unicode "code point" (see below) can be stored in as little as one byte or as many as four, depending on the encoding. The code point is a unique number for a character or for a symbol such as an accent mark or ligature. Unicode supports more than a million code points, each written as a "U" followed by a plus sign and the number in hex; for example, the word "Hello" is written U+0048 U+0065 U+006C U+006C U+006F (see hex chart).
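
The U+ notation can be reproduced in a few lines of Python. This is only an illustrative sketch; the function name code_points is made up for this example and is not part of any standard library.

```python
def code_points(text: str) -> list[str]:
    """Return the U+XXXX notation for each character in text (hypothetical helper)."""
    # ord() gives the code point as an integer; format it as at least 4 hex digits.
    return [f"U+{ord(ch):04X}" for ch in text]


print(" ".join(code_points("Hello")))
# prints: U+0048 U+0065 U+006C U+006C U+006F
```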

Character Encoding Schemes
There are several formats for storing Unicode code points. When combined with the byte order of the hardware (big endian or little endian), they are known officially as "character encoding schemes." They are also known by their UTF acronyms, which stand for "Unicode Transformation Format" or "Universal Character Set Transformation Format."

UTF-8, 16 and 32
The UTF-8 coding scheme is widely used because words from multiple languages and every type of symbol can be mixed together in the same message without reserving multiple bytes for every character, as UTF-16 and UTF-32 do. With UTF-8, ASCII text uses a single byte per character with the high-order bit set to 0. A non-ASCII character uses two to four bytes: the number of high-order 1 bits in the first byte gives the number of bytes in the sequence, and each following byte begins with the bits 10. See byte order, DBCS and emoji.
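
As a rough sketch of how the first byte announces the length of a UTF-8 sequence, the following Python function inspects the high-order bits. The name utf8_sequence_length is an assumption for illustration only, and the function does not validate the bytes that follow.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the length in bytes of a UTF-8 sequence, judged from its first byte."""
    if (lead_byte & 0b1000_0000) == 0b0000_0000:  # 0xxxxxxx: ASCII
        return 1
    if (lead_byte & 0b1110_0000) == 0b1100_0000:  # 110xxxxx
        return 2
    if (lead_byte & 0b1111_0000) == 0b1110_0000:  # 1110xxxx
        return 3
    if (lead_byte & 0b1111_1000) == 0b1111_0000:  # 11110xxx
        return 4
    # 10xxxxxx is a continuation byte; anything else is not a valid first byte.
    raise ValueError(f"not a valid UTF-8 lead byte: {lead_byte:#04x}")


for ch in ("A", "é", "€", "😀"):
    encoded = ch.encode("utf-8")
    # The length read from the first byte should match the actual encoded length.
    print(ch, encoded.hex(), utf8_sequence_length(encoded[0]), len(encoded))
```

The four sample characters were chosen so that each possible length, one through four bytes, appears once.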


 Unicode   ISO       Number
 Coding    10646      of
 Scheme  Equivalent  Bytes   Order**

 UTF-8               1-4     n/a

 UTF-16    (UCS-2)*  2 or 4  BE or LE
 UTF-16BE  (UCS-2)*  2 or 4  BE
 UTF-16LE  (UCS-2)*  2 or 4  LE

 UTF-32    (UCS-4)   4       BE or LE
 UTF-32BE  (UCS-4)   4       BE
 UTF-32LE  (UCS-4)   4       LE


 Pure ASCII
 (compatible with early 7-bit
   email systems)

 UTF-7               varies  n/a


 *UCS-2 is a fixed two-byte code and
   covers only U+0000 to U+FFFF

 **Byte Order (see byte order)
   BE = big endian
   LE = little endian
   n/a = a stream of single bytes;
         byte order does not apply
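
To see what the byte order means in practice, the sketch below encodes the same text with Python's built-in codecs and prints the resulting bytes in hex. The helper name encoded_hex is hypothetical; the codec names are the ones Python ships with.

```python
def encoded_hex(text: str, codec: str) -> str:
    """Encode text with the named codec and return the bytes as a hex string."""
    return text.encode(codec).hex()


# The letter "A" is code point U+0041.
for codec in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(f"{codec:10} {encoded_hex('A', codec)}")
# utf-8      41
# utf-16-be  0041
# utf-16-le  4100
# utf-32-be  00000041
# utf-32-le  41000000

# A code point above U+FFFF, such as U+1F600, takes four bytes in UTF-16.
print(encoded_hex("\U0001F600", "utf-16-be"))  # d83dde00
```

The UTF-8 output is the same on every machine because it is a sequence of single bytes, whereas the 16-bit and 32-bit forms reverse their bytes between big endian and little endian.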





Personal Use Only
