UTF-1 is a way of transforming ISO 10646/Unicode into a stream of bytes. Because of its design, a decoder cannot resynchronise if decoding starts in the middle of a character (which, among other things, makes truncation hard), and simple byte-oriented search routines cannot be used reliably with it. UTF-1 is also fairly slow, because converting code points to UTF-1 requires division and remainder operations. As a result of these issues, UTF-1 never gained wide acceptance and has been almost entirely replaced by UTF-8.
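The search problem can be seen with a single character. UTF-1 encodes U+0100 as the two octets A1 21, and 0x21 is also the ASCII code for "!". Python has no built-in UTF-1 codec, so this minimal sketch writes the UTF-1 octets out by hand and compares them with Python's own UTF-8 encoding of the same character; the variable names are illustrative.

```python
# UTF-1 octets for U+0100 (LATIN CAPITAL LETTER A WITH MACRON).
# Python has no UTF-1 codec, so the two octets are written out by hand.
utf1_a_macron = bytes.fromhex("A121")

# The same character encoded with Python's built-in UTF-8 codec (C4 80).
utf8_a_macron = "\u0100".encode("utf-8")

# A naive byte search for "!" (0x21) reports a false match inside the
# UTF-1 sequence, because 0x21 doubles as a trail byte there.
print(b"!" in utf1_a_macron)  # True
# UTF-8 trail bytes are always 0x80-0xBF, so no ASCII octet can match.
print(b"!" in utf8_a_macron)  # False
```

The same octet also shows the resynchronisation problem: a decoder that starts at the second octet sees a lone 0x21 and has no way to tell that it is reading the tail of a longer sequence.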
Design
UTF-1 is a multi-byte encoding like UTF-8; a single Unicode code point can be encoded in one, two, three, or five octets. The ASCII range is encoded as one octet, as in UTF-8, but the ASCII octets 0x21–0x7E (decimal 33–126) are also used as trail bytes in UTF-1 multi-byte sequences. UTF-1 is therefore unsuited to many Internet protocols, including MIME.
UTF-1 never uses the C0 and C1 control octets inside multi-byte sequences: any octet in the range 0x00–0x20 or 0x7F–0x9F always stands for the corresponding ISO-8859-1 code point (U+0000–U+0020 and U+007F–U+009F, respectively). This design, with 66 protected octets, was intended to be compatible with ISO 2022.
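The 66 protected octets, and the 190 octets left over for trail bytes, can be enumerated directly. This Python sketch does so and also defines the mapping between a base-190 digit and its trail octet. The piecewise formula in `trail_byte` (add 0x21 to digits below 94, otherwise add 0x42) comes from the retired UTF-1 specification rather than from the text above, and all names are illustrative.

```python
# Octets UTF-1 never uses inside a multi-byte sequence:
# C0 controls plus SPACE (0x00-0x20) and DEL plus C1 controls (0x7F-0x9F).
PROTECTED = set(range(0x00, 0x21)) | set(range(0x7F, 0xA0))

# The remaining octets, in ascending order, are the 190 values available
# for trail bytes: 0x21-0x7E (94 octets) and 0xA0-0xFF (96 octets).
TRAIL_OCTETS = [b for b in range(256) if b not in PROTECTED]


def trail_byte(z: int) -> int:
    """Map a base-190 digit (0..189) to its trail octet."""
    if not 0 <= z < 190:
        raise ValueError("digit out of range")
    return z + 0x21 if z < 0x5E else z + 0x42


def trail_digit(b: int) -> int:
    """Inverse of trail_byte; protected octets are rejected."""
    if 0x21 <= b <= 0x7E:
        return b - 0x21
    if 0xA0 <= b <= 0xFF:
        return b - 0x42
    raise ValueError(f"0x{b:02X} is protected and cannot be a trail byte")


print(len(PROTECTED), len(TRAIL_OCTETS))  # 66 190
```

Because `trail_byte(z)` equals `TRAIL_OCTETS[z]` for every digit, the formula is simply a compact way of indexing the unprotected octets in order.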
The UTF-1 encoding scheme uses "modulo 190" arithmetic (256 − 66 = 190) and was designed to encode the complete 31 bits of the original Universal Character Set (UCS-4). For comparison, UTF-8 protects all 128 ASCII octets and needs two bits in each trail byte of a multi-byte sequence for this purpose, resulting in "modulo 64" arithmetic (8 − 2 = 6, 2⁶ = 64). BOCU-1 protects only the minimal set required for MIME compatibility (0x00, 0x07–0x0F, 0x1A–0x1B, and 0x20), resulting in "modulo 243" arithmetic (256 − 13 = 243).
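The modulo 190 arithmetic amounts to writing an offset in base 190 and emitting each digit as an unprotected octet. This minimal Python sketch of an encoder illustrates it. The range boundaries (0xA0, 0x100, 0x4016, 0x38E2E) and the first lead octet for each length (0xA0, 0xA1, 0xF6, 0xFC) are not stated in the text above; they are taken from the retired specification and agree with the sample encodings listed in this article. The function names are hypothetical, and the sketch validates nothing beyond the 31-bit range.

```python
def trail_byte(z: int) -> int:
    # Map a base-190 digit (0..189) to one of the 190 unprotected octets.
    return z + 0x21 if z < 0x5E else z + 0x42


def utf1_encode_code_point(cp: int) -> bytes:
    """Encode one UCS-4 value (0..0x7FFFFFFF) as UTF-1 octets."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS-4 range")
    if cp < 0xA0:                # 1 octet: C0, ASCII, DEL, C1 as themselves
        return bytes([cp])
    if cp < 0x100:               # 2 octets: A0, then the Latin-1 octet itself
        return bytes([0xA0, cp])
    if cp < 0x4016:              # 2 octets: lead A1-F5, one trail digit
        lead, d0 = divmod(cp - 0x100, 190)
        return bytes([0xA1 + lead, trail_byte(d0)])
    if cp < 0x38E2E:             # 3 octets: lead F6-FB, two trail digits
        lead, rest = divmod(cp - 0x4016, 190 ** 2)
        d1, d0 = divmod(rest, 190)
        return bytes([0xF6 + lead, trail_byte(d1), trail_byte(d0)])
    # 5 octets: lead FC or above, four trail digits (most significant first).
    lead, rest = divmod(cp - 0x38E2E, 190 ** 4)
    digits = []
    for _ in range(4):
        rest, d = divmod(rest, 190)
        digits.append(trail_byte(d))
    return bytes([0xFC + lead] + digits[::-1])


def utf1_encode(text: str) -> bytes:
    # Encode each code point of a Python string independently.
    return b"".join(utf1_encode_code_point(ord(ch)) for ch in text)


print(utf1_encode_code_point(0x10FFFF).hex().upper())  # FC21396E6C
```

Every `divmod` call is a division by 190 or a power of it, which is the cost the article attributes to UTF-1; a UTF-8 encoder needs only shifts and masks.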
Sample encodings (all values are hexadecimal octets; UTF-16BE and UTF-16LE are big- and little-endian UTF-16):

Code point  UTF-16BE  UTF-16LE  UTF-8     UTF-1
U+007F      007F      7F00      7F        7F
U+0080      0080      8000      C280      80
U+009F      009F      9F00      C29F      9F
U+00A0      00A0      A000      C2A0      A0A0
U+00BF      00BF      BF00      C2BF      A0BF
U+00C0      00C0      C000      C380      A0C0
U+00FF      00FF      FF00      C3BF      A0FF
U+0100      0100      0001      C480      A121
U+015D      015D      5D01      C59D      A17E
U+015E      015E      5E01      C59E      A1A0
U+01BD      01BD      BD01      C6BD      A1FF
U+01BE      01BE      BE01      C6BE      A221
U+07FF      07FF      FF07      DFBF      AA72
U+0800      0800      0008      E0A080    AA73
U+0FFF      0FFF      FF0F      E0BFBF    B548
U+1000      1000      0010      E18080    B549
U+4015      4015      1540      E48095    F5FF
U+4016      4016      1640      E48096    F62121
U+D7FF      D7FF      FFD7      ED9FBF    F72FC3
U+E000      E000      00E0      EE8080    F73A79
U+F8FF      F8FF      FFF8      EFA3BF    F75C3C
U+FDD0      FDD0      D0FD      EFB790    F762BA
U+FDEF      FDEF      EFFD      EFB7AF    F762D9
U+FEFF      FEFF      FFFE      EFBBBF    F7644C
U+FFFD      FFFD      FDFF      EFBFBD    F765AD
U+FFFE      FFFE      FEFF      EFBFBE    F765AE
U+FFFF      FFFF      FFFF      EFBFBF    F765AF
U+10000     D800DC00  00D800DC  F0908080  F765B0
U+38E2D     D8A3DE2D  A3D82DDE  F0B8B8AD  FBFFFF
U+38E2E     D8A3DE2E  A3D82EDE  F0B8B8AE  FC21212121
U+FFFFF     DBBFDFFF  BFDBFFDF  F3BFBFBF  FC2137B27A
U+100000    DBC0DC00  C0DB00DC  F4808080  FC2137B27B
U+10FFFF    DBFFDFFF  FFDBFFDF  F48FBFBF  FC21396E6C
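Decoding reverses the arithmetic: the lead octet fixes the sequence length, and the trail octets are read back as base-190 digits. This minimal Python sketch of a strict decoder uses the same range boundaries as the table (taken from the retired specification, not from the prose); the function names are hypothetical. Its last lines show the resynchronisation problem: starting one octet into U+0100 (A1 21) silently yields "!" (U+0021) instead of an error.

```python
def trail_digit(b: int) -> int:
    # Inverse of the trail-byte mapping; protected octets are rejected.
    if 0x21 <= b <= 0x7E:
        return b - 0x21
    if 0xA0 <= b <= 0xFF:
        return b - 0x42
    raise ValueError(f"0x{b:02X} cannot be a trail byte")


def sequence_length(lead: int) -> int:
    # The lead octet alone determines how many octets the sequence has.
    if lead < 0xA0:
        return 1
    if lead <= 0xF5:
        return 2
    if lead <= 0xFB:
        return 3
    return 5


def utf1_decode(data: bytes) -> list[int]:
    """Decode UTF-1 octets into a list of code points (strict, no recovery)."""
    out, i = [], 0
    while i < len(data):
        lead = data[i]
        n = sequence_length(lead)
        seq = data[i:i + n]
        if len(seq) < n:
            raise ValueError("truncated sequence")
        if n == 1:
            cp = lead                    # C0, ASCII, DEL and C1 stand for themselves
        elif lead == 0xA0:
            if seq[1] < 0xA0:
                raise ValueError("A0 must be followed by an octet in A0-FF")
            cp = seq[1]                  # U+00A0-U+00FF
        else:
            value = 0
            for b in seq[1:]:            # trail octets are base-190 digits
                value = value * 190 + trail_digit(b)
            if n == 2:
                cp = 0x100 + (lead - 0xA1) * 190 + value
            elif n == 3:
                cp = 0x4016 + (lead - 0xF6) * 190 ** 2 + value
            else:
                cp = 0x38E2E + (lead - 0xFC) * 190 ** 4 + value
        out.append(cp)
        i += n
    return out


# Decoding U+0100 from its first octet, then from its second octet.
sample = bytes.fromhex("A121")
print([hex(cp) for cp in utf1_decode(sample)])      # ['0x100']
print([hex(cp) for cp in utf1_decode(sample[1:])])  # ['0x21'], a silent "!"
```

In UTF-8 the second case cannot arise, because a trail byte (0x80–0xBF) is never a valid lead byte, so a decoder can always skip forward to the next character boundary.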
See also
Comparison of Unicode encodings
Universal Character Set
References
ISO IR 178 (PDF, 256 KB, the retired UTF-1 specification)