UTF-EBCDIC is a character encoding used to represent Unicode characters. It is meant to be EBCDIC-friendly, so that legacy EBCDIC applications on mainframes may process the characters without much difficulty. Its advantages for existing EBCDIC-based systems are similar to UTF-8's advantages for existing ASCII-based systems. Details on UTF-EBCDIC are defined in Unicode Technical Report #16.
To produce the UTF-EBCDIC encoded version of a series of Unicode code points, an encoding based on UTF-8 (known in the specification as UTF-8-Mod) is applied first. The main difference between this encoding and UTF-8 is that it allows Unicode code points U+0080 through U+009F (the C1 control codes) to be represented as a single byte, so that they can later be mapped to the corresponding EBCDIC control codes. To achieve this, the later bytes of a multi-byte sequence use the format 101XXXXX instead of 10XXXXXX, which leaves the byte values 100XXXXX (hex 80 to 9F) free for the C1 controls. Because each such byte holds only 5 bits rather than 6, UTF-EBCDIC will generally produce larger output than UTF-8 for the same input data.
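The first step can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function name `encode_utf8_mod` is made up here, and the start-byte patterns (110, 1110, 11110, 111110, as in UTF-8) and the length thresholds are assumptions inferred from the 101XXXXX continuation format described above and from the code page chart in this article.

```python
def encode_utf8_mod(cp: int) -> bytes:
    """Encode one code point in the UTF-8-Mod style described above (sketch)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0xA0:
        # ASCII and the C1 controls (U+0080..U+009F) stay single bytes.
        return bytes([cp])
    if cp < 0x400:
        length, lead = 2, 0xC0   # 110yyyyy
    elif cp < 0x4000:
        length, lead = 3, 0xE0   # 1110zzzz
    elif cp < 0x40000:
        length, lead = 4, 0xF0   # 11110www
    else:
        length, lead = 5, 0xF8   # 111110vv
    trail = []
    for _ in range(length - 1):
        trail.append(0xA0 | (cp & 0x1F))  # 101XXXXX carries 5 payload bits
        cp >>= 5
    return bytes([lead | cp] + trail[::-1])


# Compare sequence lengths with UTF-8 for a few sample characters.
for ch in "A\u0085\u00e9\u4e2d":
    mod = encode_utf8_mod(ord(ch))
    utf8 = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}  UTF-8-Mod: {mod.hex(' ')}  UTF-8: {utf8.hex(' ')}")
```

With these assumptions, U+0085 takes one byte here but two in UTF-8, while U+4E2D takes four bytes here but three in UTF-8, which is the size trade-off the paragraph describes.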
This transformation leaves the data in an ASCII-based format, so a reversible byte-to-byte transform is then applied, using a lookup table, to bring the result as close to normal EBCDIC code pages as feasible. Both steps are easily reversed to recover the Unicode code points.
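The byte-to-byte step amounts to a table lookup. The sketch below uses only six entries, read off the code page chart in this article; the full 256-entry table is given in Unicode Technical Report #16. The names `UTF8_MOD_TO_EBCDIC`, `to_utf_ebcdic` and `from_utf_ebcdic` are illustrative, and the last two entries assume that start and continuation bytes are mapped in ascending order, as the chart suggests.

```python
# Partial UTF-8-Mod -> UTF-EBCDIC byte map (illustrative excerpt, not the full table).
UTF8_MOD_TO_EBCDIC = {
    0x20: 0x40,  # space
    0x41: 0xC1,  # A
    0x5B: 0xAD,  # [
    0x5D: 0xBD,  # ]
    0xA0: 0x41,  # continuation byte carrying the bits 00000 (assumed from the chart)
    0xC5: 0x80,  # start byte for 2-byte sequences from U+00A0 (assumed from the chart)
}

# The map is one-to-one, so inverting it gives the reverse transform.
EBCDIC_TO_UTF8_MOD = {e: m for m, e in UTF8_MOD_TO_EBCDIC.items()}


def to_utf_ebcdic(data: bytes) -> bytes:
    """Apply the byte-to-byte lookup to UTF-8-Mod data."""
    return bytes(UTF8_MOD_TO_EBCDIC[b] for b in data)


def from_utf_ebcdic(data: bytes) -> bytes:
    """Undo the lookup, recovering the UTF-8-Mod bytes."""
    return bytes(EBCDIC_TO_UTF8_MOD[b] for b in data)


# "[A] " followed by U+00A0, already in UTF-8-Mod form.
utf8_mod = bytes([0x5B, 0x41, 0x5D, 0x20, 0xC5, 0xA0])
ebcdic = to_utf_ebcdic(utf8_mod)
print(ebcdic.hex(" "))
assert from_utf_ebcdic(ebcdic) == utf8_mod
```

A real converter would index a 256-byte table (for example with `bytes.translate`) rather than a dictionary; the dictionary is used here only because the table is deliberately incomplete.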
In practice, this encoding form is rarely used, even on the EBCDIC-based mainframes for which it was designed. IBM's EBCDIC-based mainframe operating systems, such as z/OS, usually use UTF-16 for complete Unicode support. For example, DB2 UDB, COBOL, PL/I, Java and the IBM XML toolkit support UTF-16 on IBM mainframes.
Codepage layout
There are 160 characters with single-byte encodings in UTF-EBCDIC (compared to 128 in UTF-8). The single-byte portion resembles IBM-1047 rather than IBM-37, as the position of the square brackets shows: UTF-EBCDIC places [ and ] at hex AD and BD respectively, whereas CCSID 37 places them at hex BA and BB.
UTF-EBCDIC
              _0          _1          _2          _3          _4          _5          _6          _7          _8          _9          _A          _B          _C          _D          _E          _F
0_ (0-15)     NUL 0000    SOH 0001    STX 0002    ETX 0003    ST 009C     HT 0009     SSA 0086    DEL 007F    EPA 0097    RI 008D     SS2 008E    VT 000B     FF 000C     CR 000D     SO 000E     SI 000F
1_ (16-31)    DLE 0010    DC1 0011    DC2 0012    DC3 0013    OSC 009D    LF 000A     BS 0008     ESA 0087    CAN 0018    EM 0019     PU2 0092    SS3 008F    FS 001C     GS 001D     RS 001E     US 001F
2_ (32-47)    PAD 0080    HOP 0081    BPH 0082    NBH 0083    IND 0084    NEL 0085    ETB 0017    ESC 001B    HTS 0088    HTJ 0089    VTS 008A    PLD 008B    PLU 008C    ENQ 0005    ACK 0006    BEL 0007
3_ (48-63)    DCS 0090    PU1 0091    SYN 0016    STS 0093    CCH 0094    MW 0095     SPA 0096    EOT 0004    SOS 0098    SGCI 0099   SCI 009A    CSI 009B    DC4 0014    NAK 0015    PM 009E     SUB 001A
4_ (64-79)    SP 0020     • +00       • +01       • +02       • +03       • +04       • +05       • +06       • +07       • +08       • +09       . 002E      < 003C      ( 0028      + 002B      | 007C
5_ (80-95)    & 0026      • +0A       • +0B       • +0C       • +0D       • +0E       • +0F       • +10       • +11       • +12       ! 0021      $ 0024      * 002A      ) 0029      ; 003B      ^ 005E
6_ (96-111)   - 002D      / 002F      • +13       • +14       • +15       • +16       • +17       • +18       • +19       • +1A       • +1B       , 002C      % 0025      _ 005F      > 003E      ? 003F
7_ (112-127)  • +1C       • +1D       • +1E       • +1F       (2)         (2)         (2)         (2)         (2)         ` 0060      : 003A      # 0023      @ 0040      ' 0027      = 003D      " 0022
8_ (128-143)  (2) 00A0    a 0061      b 0062      c 0063      d 0064      e 0065      f 0066      g 0067      h 0068      i 0069      (2) 00C0    (2) 00E0    (2) 0100    (2) 0120    (2) 0140    (2) 0160
9_ (144-159)  (2) 0180    j 006A      k 006B      l 006C      m 006D      n 006E      o 006F      p 0070      q 0071      r 0072      (2) 01A0    (2) 01C0    (2) 01E0    (2) 0200    (2) 0220    (2) 0240
A_ (160-175)  (2) 0260    ~ 007E      s 0073      t 0074      u 0075      v 0076      w 0077      x 0078      y 0079      z 007A      (2) 0280    (2) 02A0    (2) 02C0    [ 005B      (2) 02E0    (2) 0300
B_ (176-191)  (2) 0320    (2) 0340    (2) 0360    (2) 0380    (2) 03A0    (2) 03C0    (2) 03E0    (3)         (3) 0400    (3) 0800    (3) 0C00    (3) 1000    (3) 1400    ] 005D      (3) 1800    (3) 1C00
C_ (192-207)  { 007B      A 0041      B 0042      C 0043      D 0044      E 0045      F 0046      G 0047      H 0048      I 0049      (3) 2000    (3) 2400    (3) 2800    (3) 2C00    (3) 3000    (3) 3400
D_ (208-223)  } 007D      J 004A      K 004B      L 004C      M 004D      N 004E      O 004F      P 0050      Q 0051      R 0052      (3) 3800    (3) 3C00    (4) 4000    (4) 8000    (4) 10000   (4) 18000
E_ (224-239)  \ 005C      (4) 20000   S 0053      T 0054      U 0055      V 0056      W 0057      X 0058      Y 0059      Z 005A      (4) 28000   (4) 30000   (4) 38000   (5) 40000   (5) 100000  (none)
F_ (240-255)  0 0030      1 0031      2 0032      3 0033      4 0034      5 0035      6 0036      7 0037      8 0038      9 0039      (none)      (none)      (none)      (none)      (none)      APC 009F
Each cell shows a character (or control-code abbreviation) followed by its hexadecimal code point. The range in parentheses after each row label gives the decimal byte values of that row. Cells marked (none) have no entry.
Cells containing a single digit in parentheses are the start bytes for a sequence of that many bytes. The hexadecimal code point shown in the cell is the lowest character value encoded using that start byte (this value can be greater than the value which would be obtained by following the start byte with continuation bytes which are all 65 (hex 0x41), if this would result in an invalid overlong form). Start bytes shown without a code point could begin only overlong forms. The reduced payload of a continuation byte (5 bits, compared to 6 bits in UTF-8) results in different ranges of code points being represented by sequences of the same length.
Cells with a dot (•) are continuation bytes. The hexadecimal number shown after the plus sign is the value of the 5 bits they add.
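The effect of the smaller continuation payload on the code point ranges can be checked with a little arithmetic. The sketch below assumes UTF-8-style start bytes (n one-bits, a zero bit, then payload bits), which is an inference from the chart rather than something stated in this article; `capacity` is a made-up helper name.

```python
def capacity(length: int, trail_bits: int) -> int:
    """Highest value a multi-byte sequence of `length` bytes can carry."""
    lead_bits = 7 - length  # start byte: `length` ones, one zero, then payload
    return (1 << (lead_bits + trail_bits * (length - 1))) - 1


# 6 payload bits per continuation byte (UTF-8) versus 5 (UTF-EBCDIC).
for n in (2, 3, 4):
    print(f"{n} bytes: UTF-8 up to {capacity(n, 6):X}, UTF-EBCDIC up to {capacity(n, 5):X}")

# Unicode ends at 10FFFF, so the 5-byte form is never filled completely.
print(f"5 bytes: UTF-EBCDIC up to {min(capacity(5, 5), 0x10FFFF):X}")
```

Under this assumption each length in UTF-EBCDIC tops out at 3FF, 3FFF and 3FFFF, so the next length begins at 0400, 4000 and 40000, matching the lowest code points listed for the 3-, 4- and 5-byte start bytes in the chart.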
See also
UTF-1
BOCU-1
External links
http://www.unicode.org/reports/tr16/ Unicode Technical Report #16: the definition of UTF-EBCDIC