ANSEL
APL (codepage)
ASCII
ATASCII
ArmSCII
Baudot code
Big5
Binary Ordered Compression for Unicode
C0 and C1 control codes
CCCII
CCSID
CDC display code
CJK
CNS 11643
Character encoding
Character string
Charset detection
Chinese character encoding
Code page 1133
Code page 437
Code page 720
Code page 737
Code page 775
Code page 850
Code page 852
Code page 855
Code page 857
Code page 858
Code page 860
Code page 861
Code page 862
Code page 863
Code page 865
Code page 866
Code page 869
Code page 932
Code page 936
Code page 949
Code page 950
Control character
Cork encoding
DBCS
DEC Radix-50
EBCDIC 037
EBCDIC 1047
EBCDIC 285
EBCDIC 500
EBCDIC 875
EBCDIC 930
EUC-CN
EUC-JP
EUC-KR
EUC-TW
Extended Unix Code
Fieldata
GB18030
GB2312
GBK
GB 18030
GB 2312
GOST 10859
GSM 03.38
HKSCS
HP-UX
HP roman8
HZ (character encoding)
Half-width kana
Han unification
ISCII
ISO-2022
ISO-2022-JP
ISO-2022-KR
ISO/IEC 10646
ISO/IEC 2022
ISO/IEC 6429
ISO/IEC 646
ISO/IEC 6937
ISO/IEC 8859
ISO/IEC 8859-1
ISO/IEC 8859-10
ISO/IEC 8859-11
ISO/IEC 8859-12
ISO/IEC 8859-13
ISO/IEC 8859-14
ISO/IEC 8859-15
ISO/IEC 8859-16
ISO/IEC 8859-2
ISO/IEC 8859-3
ISO/IEC 8859-4
ISO/IEC 8859-5
ISO/IEC 8859-6
ISO/IEC 8859-7
ISO/IEC 8859-8
ISO/IEC 8859-9
ISO 2022
ISO 6438
ISO 646
Iran System encoding standard
JEF codepage
JIS X 0201
Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese.
The structure of EUC is based on the ISO-2022 standard, which specifies a way to represent character sets containing a maximum of 94 characters, or 8836 (94²) characters, or 830584 (94³) characters, as sequences of 7-bit codes. Only ISO-2022 compliant character sets can have EUC forms. Up to four coded character sets (referred to as G0, G1, G2, and G3 or as code sets 0, 1, 2, and 3) can be represented with the EUC scheme. G0 is almost always an ISO-646 compliant coded character set (e.g. US-ASCII/KS X 1003/ISO 646:KR in EUC-KR and US-ASCII/the lower half of JIS X 0201 in EUC-JP) that is invoked on GL (i.e. with the most significant bit cleared).
To get the EUC form of an ISO-2022 character, the most significant bit of each 7-bit byte of the original ISO 2022 codes is set (by adding 128 to each of these original 7-bit codes); this allows software to easily distinguish whether a particular byte in a character string belongs to the ISO-646 code or the ISO-2022 (EUC) code.
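The bit-setting step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `iso2022_to_euc` and `is_g0_byte` are hypothetical, and the sample value 0x24 0x22 is the 7-bit JIS X 0208 code for the hiragana letter あ.

```python
def iso2022_to_euc(code_7bit: bytes) -> bytes:
    """Set the most significant bit of each 7-bit byte (same as adding 128)."""
    if any(b > 0x7F for b in code_7bit):
        raise ValueError("expected 7-bit bytes")
    return bytes(b | 0x80 for b in code_7bit)


def is_g0_byte(b: int) -> bool:
    """A byte with the high bit clear belongs to the ISO 646 (G0) set."""
    return b < 0x80


# 7-bit JIS X 0208 code for HIRAGANA LETTER A.
euc = iso2022_to_euc(b"\x24\x22")
print(euc.hex())             # a4a2
print(euc.decode("euc_jp"))  # あ
print(is_g0_byte(ord("A")), is_g0_byte(euc[0]))  # True False
```

Because every byte of a G1 character has its high bit set, a single comparison per byte is enough to tell it apart from an ISO 646 character.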
The most commonly used EUC codes are variable-width encodings in which a character belonging to G0 (an ISO-646 compliant coded character set) takes one byte and a character belonging to G1 (a 94×94 coded character set) is represented in two bytes. The EUC-CN form of GB2312 and EUC-KR are examples of such two-byte EUC codes. EUC-JP includes characters represented by up to three bytes, whereas a single character in EUC-TW can take up to four bytes.
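As an illustration of the one-byte/two-byte layout, the following sketch splits a byte string in a two-byte EUC code (such as EUC-CN or EUC-KR) into individual characters by looking only at the high bit and the 0xA1-0xFE range. The function name `split_two_byte_euc` is hypothetical, and the sample uses Python's `gb2312` codec, which produces the EUC-CN form.

```python
def split_two_byte_euc(data: bytes) -> list[bytes]:
    """Split a two-byte EUC byte string into per-character byte sequences."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # G0 (ISO 646) character: one byte, high bit clear.
            chars.append(data[i:i + 1])
            i += 1
        elif 0xA1 <= lead <= 0xFE:
            # G1 character: two bytes, both with the high bit set.
            pair = data[i:i + 2]
            if len(pair) != 2 or not 0xA1 <= pair[1] <= 0xFE:
                raise ValueError(f"invalid trail byte at offset {i + 1}")
            chars.append(pair)
            i += 2
        else:
            raise ValueError(f"unexpected byte 0x{lead:02X} at offset {i}")
    return chars


sample = "A中B".encode("gb2312")
print(split_two_byte_euc(sample))  # [b'A', b'\xd6\xd0', b'B']
```

The sketch deliberately rejects the single-shift bytes 0x8E and 0x8F, which only appear in the longer EUC-JP and EUC-TW forms.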
Modern applications are more likely to use UTF-8, which supports all of the characters of the EUC codes and more, and is generally more portable, with fewer vendor deviations and errors.
Contents
1 EUC-CN
1.1 Related encoding systems
2 EUC-JP
3 EUC-KR
4 EUC-TW
5 See also
6 References
7 External links
EUC-CN
EUC-CN is the usual way to use the GB2312 standard for simplified Chinese characters. Unlike the case of Japanese, the ISO-2022 form of GB2312 is not normally used, though a variant form called HZ was sometimes used on USENET.
EUC-CN can also be used to encode the Unicode-based GB18030 character encoding, which includes traditional characters, although GB18030 is more frequently used without EUC encoding, since GB18030 is already a Unicode encoding. However, GB18030 encoded in EUC-CN is a variable-width encoding, because GB18030 contains more than 8836 (94×94) characters.
Related encoding systems
An encoding related to EUC-CN is the "748" code used in the WITS typesetting system developed by Beijing's Founder Technology (now obsoleted by its newer FITS typesetting system). The 748 code contains all of GB2312, but is not ISO 2022–compliant and therefore not a true EUC code. (It uses an 8-bit lead byte but distinguishes between a second byte with its most significant bit set and one with its most significant bit cleared, and is therefore more similar in structure to Big5 and other non–ISO 2022–compliant DBCS encoding systems.) The non-GB2312 portion of the 748 code contains traditional and Hong Kong characters and other glyphs used in newspaper typesetting.
EUC-JP
EUC-JP is a variable-width encoding used to represent the elements of three Japanese character set standards, namely JIS X 0208, JIS X 0212, and JIS X 0201.
A character from the lower half of JIS-X-0201 (ASCII, code set 0) is represented by one byte, in the range 0x21 – 0x7E.
A character from the upper half of JIS-X-0201 (half-width kana, code set 2) is represented by two bytes, the first being 0x8E, the second in the range 0xA1 – 0xDF.
A character from JIS-X-0208 (code set 1) is represented by two bytes, both in the range 0xA1 – 0xFE.
A character from JIS-X-0212 (code set 3) is represented by three bytes, the first being 0x8F, the following two in the range 0xA1 – 0xFE.
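The four rules above can be turned into a small classifier that walks a byte string and reports which code set each character belongs to. This is a sketch under stated assumptions: the names `classify_euc_jp` and `in_range` are hypothetical, and every byte below 0x80 (including control characters and space, not just 0x21-0x7E) is treated as code set 0 for simplicity.

```python
SS2 = 0x8E  # first byte of a half-width kana character (code set 2)
SS3 = 0x8F  # first byte of a JIS X 0212 character (code set 3)


def in_range(data: bytes, lo: int, hi: int) -> bool:
    """True if every byte of data lies in lo..hi (vacuously true if empty)."""
    return all(lo <= b <= hi for b in data)


def classify_euc_jp(data: bytes) -> list[tuple[int, bytes]]:
    """Return (code_set, raw_bytes) for each character in an EUC-JP string."""
    result = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            code_set, length, trail_hi = 0, 1, 0xFE
        elif lead == SS2:
            code_set, length, trail_hi = 2, 2, 0xDF
        elif lead == SS3:
            code_set, length, trail_hi = 3, 3, 0xFE
        elif 0xA1 <= lead <= 0xFE:
            code_set, length, trail_hi = 1, 2, 0xFE
        else:
            raise ValueError(f"unexpected byte 0x{lead:02X} at offset {i}")
        chunk = data[i:i + length]
        # Bytes after the first must fall in 0xA1..trail_hi.
        if len(chunk) != length or not in_range(chunk[1:], 0xA1, trail_hi):
            raise ValueError(f"malformed character at offset {i}")
        result.append((code_set, chunk))
        i += length
    return result


data = b"a" + b"\x8e\xb1" + b"\xa4\xa2" + b"\x8f\xb0\xa1"
for code_set, raw in classify_euc_jp(data):
    print(code_set, raw.hex())
```

Since the first byte alone determines the length of each character, the string can be scanned in one pass without any escape-sequence state.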
This encoding scheme allows the easy mixing of 7-bit ASCII and 8-bit Japanese without the need for the escape characters employed by ISO-2022-JP, which is based on the same character set standards.
In Japan, the EUC-JP encoding is heavily used by Unix or Unix-like operating systems (except for HP-UX), while Shift_JIS or its extensions (Windows code page 932 and MacJapanese) are used on other platforms. Therefore, whether Japanese web sites use EUC-JP or Shift_JIS often depends on what OS the author uses.
EUC-JISX0213 is similar to EUC-JP, but differs in that the two planes of JIS X 0213 take the place of JIS-X-0208 and JIS-X-0212. There is a similar relationship between Shift_JIS and Shift-JISX0213.
EUC-KR
EUC-KR is a variable-width encoding to represent Korean text using two coded character sets, KS X 1001 (formerly KS C 5601)[1][2] and KS X 1003 (formerly KS C 5636)/ISO 646:KR/US-ASCII. KS X 2901 (formerly KS C 5861) stipulates the encoding and RFC 1557 dubbed it EUC-KR. A character drawn from KS X 1001 (G1, code set 1) is encoded as two bytes in GR (0xA1-0xFE) and a character from KS X 1003/US-ASCII (G0, code set 0) takes one byte in GL (0x21-0x7E).
It is the most widely used legacy character encoding in Korea on all three major platforms (Unix-like OS, Windows and Mac), but its use has been very slowly decreasing as UTF-8 gains popularity, especially on Linux and Mac OS X. It is usually referred to as Wansung (완성) in the Republic of Korea. The default Korean codepage for Windows (code page 949) is a proprietary but upward-compatible extension of EUC-KR referred to as Unified Hangeul Code (통합 완성형, Tonghab Wansunghyung). Mac Korean, used in classic Mac OS, is also compatible with EUC-KR.
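A short sketch of the GR mapping, using Python's built-in `euc_kr` codec: subtracting 0xA0 from each of the two bytes gives the character's position in the 94×94 grid of KS X 1001. The helper name `ksx1001_position` and the "row, cell" terminology are assumptions made for this illustration.

```python
def ksx1001_position(ch: str) -> tuple[int, int]:
    """Return the (row, cell) position, each 1..94, of a KS X 1001 character."""
    raw = ch.encode("euc_kr")
    if len(raw) != 2:
        raise ValueError("not a two-byte (code set 1) character")
    hi, lo = raw
    if not (0xA1 <= hi <= 0xFE and 0xA1 <= lo <= 0xFE):
        raise ValueError("bytes are not in GR (0xA1-0xFE)")
    return hi - 0xA0, lo - 0xA0


print("한".encode("euc_kr").hex())  # c7d1
print(ksx1001_position("한"))       # (39, 49)
print(len("A".encode("euc_kr")))    # 1
```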
As with most other encodings, UTF-8 is now preferred for new use, solving problems with consistency between platforms and vendors.
EUC-TW
EUC-TW is a variable-width encoding that supports US-ASCII and 16 planes of CNS 11643, each of which is 94×94. It is a rarely used encoding for traditional Chinese characters as used in Taiwan; Big5 is much more common. A character in US-ASCII (G0, code set 0) is encoded as a single byte in GL (0x21-0x7E) and a character in CNS 11643 plane 1 (code set 1) is encoded as two bytes in GR (0xA1-0xFE). A character in planes 1 through 16 of CNS 11643 (code set 2) is encoded as four bytes, with the first byte always being 0x8E (Single Shift 2) and the second byte indicating the plane (the plane number is obtained by subtracting 0xA0 from the second byte). The third and fourth bytes are in GR (0xA1-0xFE). Note that plane 1 of CNS 11643 is therefore encoded twice, once as code set 1 and once as part of code set 2. UTF-8 is becoming more common than EUC-TW, as with most code pages.
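The EUC-TW layout can be sketched as a hand-written parser. The function name `parse_euc_tw` and its return shape are hypothetical: each character is reported as a plane number (0 standing in for US-ASCII) together with its 7-bit code, so that the two encodings of a plane 1 character can be seen to yield the same result.

```python
SS2 = 0x8E  # Single Shift 2: introduces a four-byte code set 2 character


def is_gr(b: int) -> bool:
    return 0xA1 <= b <= 0xFE


def parse_euc_tw(data: bytes) -> list[tuple[int, bytes]]:
    """Return (plane, 7-bit code) per character; plane 0 means US-ASCII."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            out.append((0, data[i:i + 1]))
            i += 1
        elif lead == SS2:
            chunk = data[i:i + 4]
            if len(chunk) != 4:
                raise ValueError(f"truncated character at offset {i}")
            plane = chunk[1] - 0xA0  # plane number from the second byte
            if not 1 <= plane <= 16 or not all(is_gr(b) for b in chunk[2:]):
                raise ValueError(f"malformed character at offset {i}")
            out.append((plane, bytes(b & 0x7F for b in chunk[2:])))
            i += 4
        elif is_gr(lead):
            chunk = data[i:i + 2]
            if len(chunk) != 2 or not is_gr(chunk[1]):
                raise ValueError(f"malformed character at offset {i}")
            out.append((1, bytes(b & 0x7F for b in chunk)))  # code set 1
            i += 2
        else:
            raise ValueError(f"unexpected byte 0x{lead:02X} at offset {i}")
    return out


# The same plane 1 code in its two-byte and four-byte forms, then plane 2.
data = b"A" + b"\xa4\xa1" + b"\x8e\xa1\xa4\xa1" + b"\x8e\xa2\xa1\xa1"
print(parse_euc_tw(data))
```

The second and third entries of the result are identical, which shows the double encoding of plane 1 described above.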
See also
CJK
Japanese language and computers
Korean language and computers
Chinese character encoding
References
[1] "KS X 1001:1992". http://examples.oreilly.com/cjkvinfo/AppL/ksx1001.pdf.
[2] "KS C 5601:1987". 1988-10-01. http://www.itscj.ipsj.or.jp/ISO-IR/149.pdf.
External links
EUC-JP codeset table (minus the ASCII and halfwidth parts)
GB18030-2000 — The New Chinese National Standard
The New Generation of Pre-Press Software in China—mentions the 748 code
Description of the EUC-TW code (in Chinese)
Manual page of EUC-JISX0213 in Perl Encode module
International Register of Coded Character Sets—The coded character sets of China, Japan, South Korea, North Korea and Taiwan (ISO/IEC)
Chinese, Japanese, and Korean character set standards and encoding systems
Character encodings
Category:Character sets
Early telecommunications
ASCII · ISO/IEC 646 · ISO/IEC 6937 · T.61 · sixbit code pages · Baudot code · Morse code
ISO/IEC 8859
-1 · -2 · -3 · -4 · -5 · -6 · -7 · -8 · -9 · -10 · -11 · -12 · -13 · -14 · -15 · -16
Bibliographic use
ANSEL · ISO 5426 / 5426-2 / 5427 / 5428 / 6438 / 6861 / 6862 / 10585 / 10586 / 10754 / 11822 · MARC-8
National standards
ArmSCII · CNS 11643 · GOST 10859 · GB 2312 · HKSCS · ISCII · JIS X 0201 · JIS X 0208 · JIS X 0212 · JIS X 0213 · KPS 9566 · KS X 1001 · PASCII · TIS-620 · TSCII · VISCII · YUSCII
EUC
CN · JP · KR · TW
ISO/IEC 2022
CN · JP · KR · CCCII
MacOS codepages ("scripts")
Arabic · CentralEurRoman · ChineseSimp / EUC-CN · ChineseTrad / Big5 · Croatian · Cyrillic · Devanagari · Dingbats · Farsi · Greek · Gujarati · Gurmukhi · Hebrew · Icelandic · Japanese / ShiftJIS · Korean / EUC-KR · Roman · Romanian · Symbol · Thai / TIS-620 · Turkish · Ukrainian
DOS codepages
437 · 720 · 737 · 775 · 850 · 852 · 855 · 857 · 858 · 860 · 861 · 862 · 863 · 864 · 865 · 866 · 869 · Kamenický · Mazovia · MIK · Iran System
Windows codepages
874 / TIS-620 · 932 / ShiftJIS · 936 / GBK · 949 / EUC-KR · 950 / Big5 · 1250 · 1251 · 1252 · 1253 · 1254 · 1255 · 1256 · 1257 · 1258 · 1361 · 54936 / GB18030
EBCDIC codepages
37/1140 · 273/1141 · 277/1142 · 278/1143 · 280/1144 · 284/1145 · 285/1146 · 297/1147 · 420/16804 · 424/12712 · 500/1148 · 838/1160 · 871/1149 · 875/9067 · 930/1390 · 933/1364 · 937/1371 · 935/1388 · 939/1399 · 1025/1154 · 1026/1155 · 1047/924 · 1112/1156 · 1122/1157 · 1123/1158 · 1130/1164 · JEF · KEIS
Platform specific
ATASCII · CDC display code · DEC-MCS · DEC Radix-50 · Fieldata · GSM 03.38 · HP roman8 · PETSCII · TI calculator character sets · ZX Spectrum character set
Unicode / ISO/IEC 10646
UTF-8 · UTF-16/UCS-2 · UTF-32/UCS-4 · UTF-7 · UTF-EBCDIC · GB 18030 · SCSU · BOCU-1
Miscellaneous codepages
APL · Cork · HZ · IBM code page 1133 · KOI8 · TRON
Related topics
control character (C0 C1) · CCSID · charset detection · Han unification · ISO 6429/IEC 6429/ANSI X3.64 · mojibake

