When used in mathematical context (MathML) it is recommended to consistently use style markup for superscripts and subscripts […] However, when super and sub-scripts are to reflect semantic distinctions, it is easier to work with these meanings encoded in text rather than markup, for example, in phonetic or phonemic transcription.[2]
Uses
The difference between superscript/subscript and numerator/denominator glyphs. In many popular computer fonts the Unicode "superscript" and "subscript" characters are actually numerator and denominator glyphs.
The intended use[2] when these characters were added to Unicode was to produce true superscripts and subscripts so that chemical and algebraic formulas could be written without markup. Thus "H₂O" (using a subscript 2 character) is supposed to be identical to "H2O" (with subscript markup).
In reality, many fonts that include these characters ignore the Unicode definition, and instead design the digits for mathematical numerator and denominator glyphs,[3][4] which are aligned with the cap line and the baseline, respectively. When used with the solidus or the Fraction Slash, they produce an almost typographically correct diagonal fraction, such as ³/₄ for the ¾ glyph. Super and subscript markup does not produce a correct fraction (compare markup 3/4 with precomposed ¾). The change also makes the superscript letters useful for ordinal indicators, more closely matching the ª and º characters.
Unicode intended that diagonal fractions be rendered by a different mechanism: the fraction slash U+2044 is visually similar to the solidus, but when used with the ordinary digits (not the superscripts and subscripts), it instructs the layout system that a fraction such as ¾ is to be rendered using automatic glyph substitution.[5][a] User-end support was quite poor for a number of years, but fonts,[b] browsers,[c] word processors,[d] desktop publishing software[e] and others increasingly support the intended Unicode behavior. This browser and your default font render it as 3⁄4. (See Slash (punctuation)#Fractions for rendering in various other fonts.)
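The link between the precomposed fraction and the fraction-slash sequence can be checked with compatibility normalization. The following is a minimal sketch using Python's standard `unicodedata` module; the variable names are illustrative only, and whether the sequence is actually drawn as a diagonal fraction still depends on the font and layout engine.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # FRACTION SLASH, distinct from the solidus "/"

# Ordinary digits around the fraction slash: the sequence that a layout
# engine is meant to render as a diagonal fraction.
three_quarters = "3" + FRACTION_SLASH + "4"

# The precomposed character U+00BE has a compatibility decomposition
# to exactly that sequence.
decomposed = unicodedata.normalize("NFKD", "\u00be")

print(decomposed == three_quarters)             # True
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0033', 'U+2044', 'U+0034']
```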
The most common superscript digits (1, 2, and 3) were included in ISO-8859-1 and were therefore carried over into those code points in the Latin-1 range of Unicode. The remainder were placed along with basic arithmetical symbols, and later some Latin subscripts, in a dedicated block at U+2070 to U+209F. The table below shows these characters together. Each superscript or subscript character is preceded by a baseline x to show the height of subscripting/superscripting.
Unicode characters
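| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+00Bx | – | – | x² | x³ | – | – | – | – | – | x¹ | – | – | – | – | – | – |
| U+207x | x⁰ | xⁱ | | | x⁴ | x⁵ | x⁶ | x⁷ | x⁸ | x⁹ | x⁺ | x⁻ | x⁼ | x⁽ | x⁾ | xⁿ |
| U+208x | x₀ | x₁ | x₂ | x₃ | x₄ | x₅ | x₆ | x₇ | x₈ | x₉ | x₊ | x₋ | x₌ | x₍ | x₎ | |
| U+209x | xₐ | xₑ | xₒ | xₓ | xₔ | xₕ | xₖ | xₗ | xₘ | xₙ | xₚ | xₛ | xₜ | | | |

Empty cells are reserved for future use. Cells marked – hold other characters from Latin-1 not related to super- or sub-scripts.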
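Because the superscript digits are split between the Latin-1 range and the dedicated block, a lookup table is simpler than code point arithmetic. The sketch below, in Python, builds such tables from the code points in the table above; the function names `superscript_digits` and `subscript_digits` are hypothetical helpers, not part of any library.

```python
import unicodedata

DIGITS = "0123456789"
# Superscript 1, 2 and 3 sit in the Latin-1 range; the rest are at U+2070 and U+2074-U+2079.
SUPERSCRIPT_DIGITS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
# Subscript digits are contiguous at U+2080-U+2089.
SUBSCRIPT_DIGITS = "".join(chr(0x2080 + i) for i in range(10))

TO_SUPER = str.maketrans(DIGITS, SUPERSCRIPT_DIGITS)
TO_SUB = str.maketrans(DIGITS, SUBSCRIPT_DIGITS)

def superscript_digits(text: str) -> str:
    """Replace ASCII digits with their Unicode superscript characters."""
    return text.translate(TO_SUPER)

def subscript_digits(text: str) -> str:
    """Replace ASCII digits with their Unicode subscript characters."""
    return text.translate(TO_SUB)

water = "H" + subscript_digits("2") + "O"
print(water)                                  # H₂O
print("x" + superscript_digits("10"))         # x¹⁰
# Compatibility normalization maps the characters back to ordinary digits.
print(unicodedata.normalize("NFKC", water))   # H2O
```

Note that the round trip through NFKC discards the distinction, which is one reason plain-text superscripts are unsuitable where the raised position carries meaning that must survive normalization.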
Other superscript and subscript characters
Unicode also includes codepoints for subscript and superscript characters that are intended for semantic usage, in the following blocks:[1][6]
The Kanbun block has superscripted annotation characters used in Japanese copies of Classical Chinese texts: ㆒ ㆓ ㆔ ㆕ ㆖ ㆗ ㆘ ㆙ ㆚ ㆛ ㆜ ㆝ ㆞ ㆟.
The Tifinagh block has one superscript letter: ⵯ.
The Unified Canadian Aboriginal Syllabics block and its Extended block contain several mostly consonant-only letters, called Finals, that indicate a syllable coda, along with some characters, known as Medials, that indicate a syllable medial. Main block: ᐜ ᐝ ᐞ ᐟ ᐠ ᐡ ᐢ ᐣ ᐤ ᐥ ᐦ ᐧ ᐨ ᐩ ᐪ ᑉ ᑊ ᑋ ᒃ ᒄ ᒡ ᒢ ᒻ ᒼ ᒽ ᒾ ᓐ ᓑ ᓒ ᓪ ᓫ ᔅ ᔆ ᔇ ᔈ ᔉ ᔊ ᔋ ᔥ ᔾ ᔿ ᕀ ᕁ ᕐ ᕑ ᕝ ᕪ ᕻ ᕯ ᕽ ᖅ ᖕ ᖖ ᖟ ᖦ ᖮ ᗮ ᘁ ᙆ ᙇ ᙚ ᙾ ᙿ; Extended block: ᣔ ᣕ ᣖ ᣗ ᣘ ᣙ ᣚ ᣛ ᣜ ᣝ ᣞ ᣟ ᣳ ᣴ ᣵ.
Combining superscript
The Combining Diacritical Marks block contains medieval superscript letter diacritics. These letters are written directly above other letters appearing in medieval Germanic manuscripts, and so these glyphs do not include spacing, for example uͤ. They are shown here over the dotted circle placeholder ◌: ◌ͣ ◌ͤ ◌ͥ ◌ͦ ◌ͧ ◌ͨ ◌ͩ ◌ͪ ◌ͫ ◌ͬ ◌ͭ ◌ͮ ◌ͯ.
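In plain text, such a letter is simply the base character followed by the combining mark. A minimal Python sketch, assuming the thirteen marks shown above occupy U+0363 to U+036F; the variable names are illustrative:

```python
import unicodedata

# U+0363..U+036F: the medieval superscript letter diacritics.
MEDIEVAL_MARKS = [chr(cp) for cp in range(0x0363, 0x0370)]

# Base letter followed by the combining mark: two code points that are
# meant to display as a single letter with a small e above it.
u_with_e = "u" + "\u0364"

print(len(u_with_e))               # 2
print(unicodedata.name("\u0364"))  # COMBINING LATIN SMALL LETTER E
# A non-zero combining class marks a character that attaches to the
# preceding base instead of taking up space of its own.
print(all(unicodedata.combining(m) != 0 for m in MEDIEVAL_MARKS))  # True
```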
Consolidated, the Unicode standard contains superscript and subscript versions of a subset of Latin, Greek and Cyrillic letters. Here they are arranged in alphabetical order for comparison (or for copy and paste convenience). Since these characters appear in different Unicode ranges, they may not appear to be the same size or position due to font substitution by the browser. Shaded cells mark petite capitals that are not very distinct from minuscules, and Greek letters that are indistinguishable from Latin, and so would not be expected to be supported by Unicode.
Little punctuation is encoded. Parentheses are shown in the basic superscript block above, and the exclamation mark ⟨ꜝ⟩ is shown in the IPA table below. In a supporting font, a question mark may be created with a superscript gelded question mark and a combining dot below: ⟨ˀ̣⟩.
Latin superscript and subscript letters
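| | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Superscript capital | ᴬ | ᴮ | ꟲ | ᴰ | ᴱ | ꟳ | ᴳ | ᴴ | ᴵ | ᴶ | ᴷ | ᴸ | ᴹ | ᴺ | ᴼ | ᴾ | ꟴ | ᴿ | * | ᵀ | ᵁ | ⱽ | ᵂ | – | – | – |
| Superscript petite cap | * | 𐞄 | | * | * | – | 𐞒 | 𐞖 | ᶦ | – | – | ᶫ | – | ᶰ | | * | – | 𐞪 | | – | ᶸ | | | | 𐞲 | |
| Superscript minuscule | ᵃ | ᵇ | ᶜ | ᵈ | ᵉ | ᶠ | ᵍ | ʰ | ⁱ | ʲ | ᵏ | ˡ | ᵐ | ⁿ | ᵒ | ᵖ | 𐞥 | ʳ | ˢ | ᵗ | ᵘ | ᵛ | ʷ | ˣ | ʸ | ᶻ |
| Overscript small cap | | | | | | | ◌ᷛ | | | | | ◌ᷞ | ◌ᷟ | ◌ᷡ | | | | ◌ᷢ | | | | | | | | |
| Overscript minuscule | ◌ͣ | ◌ᷨ | ◌ͨ | ◌ͩ | ◌ͤ | ◌ᷫ | ◌ᷚ | ◌ͪ | ◌ͥ | – | ◌ᷜ | ◌ᷝ | ◌ͫ | ◌ᷠ | ◌ͦ | ◌ᷮ | – | ◌ͬ | ◌ᷤ | ◌ͭ | ◌ͧ | ◌ͮ | ◌ᷱ | ◌ͯ | – | ◌ᷦ |
| Subscript minuscule | ₐ | – | – | – | ₑ | – | – | ₕ | ᵢ | ⱼ | ₖ | ₗ | ₘ | ₙ | ₒ | ₚ | – | ᵣ | ₛ | ₜ | ᵤ | ᵥ | * | ₓ | * | * |
| Underscript minuscule | | | | | | | | | | | | | | | | | | ◌᷊ | | | | | ◌ᪿ | | | |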
*Superscript versions of S, of petite capital A, D, E and P, of ƀ, and subscript versions of w, y and z have been accepted for a future version of the Unicode Standard.[8][9][10][11]
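Many of the spacing superscript and subscript letters carry a `<super>` or `<sub>` compatibility decomposition in the Unicode Character Database, so part of such a table can be regenerated programmatically. The Python sketch below is one way to do that; `script_variants` is a hypothetical helper name, the result depends on the Unicode version bundled with the Python build, and characters without such a decomposition (the combining overscripts, for example) will not be found.

```python
import sys
import unicodedata
from collections import defaultdict

def script_variants(tag: str) -> dict[str, list[str]]:
    """Map each base character to the characters whose compatibility
    decomposition is '<tag> XXXX', that is, the tag plus one code point."""
    variants = defaultdict(list)
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] == tag:
            base = chr(int(parts[1], 16))
            variants[base].append(chr(cp))
    return variants

supers = script_variants("<super>")
subs = script_variants("<sub>")

# The superscript list for "a" includes both the ordinal indicator ª
# and the modifier letter ᵃ; the subscript list includes ₐ.
print(supers["a"])
print(subs["a"])
```

The scan visits every code point once per call, so in practice the result would be computed once and cached.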
Additional Latin characters
Æ
Ƀ
Ǝ
Ŋ
Superscript capital
ᴭ
ᴯ
ᴲ
ᴻ
Superscript minuscule
𐞃
*
ᵊ
ᵑ
Overscript minuscule
◌ᷔ
◌ᷪ
Subscript minuscule
ₔ
Some of these superscript capitals are small caps in the source documents in the Unicode proposals.
In some fonts, Latin alpha ᵅ and upsilon ᶹ can be used as superscript Greek alpha and upsilon. ᵋ and ᶥ are also officially Latin letters, but display the same as Greek.
*Superscript versions of Greek psi and omega have been accepted for a future version of the Unicode Standard.[10][9]
Superscript and subscript ё, ї, й, ў etc. are handled with diacritics, ⟨𞀵̈ 𞁌̈ 𞀸̆ 𞁁̆⟩ etc. Many of the Cyrillic characters were added to the Cyrillic Extended-D block, which was added to the free Gentium Plus and Andika fonts with version 6.2 in February 2023.
The Latin Extended-F block was created for the remaining superscript IPA letters. They are supported by the free Gentium Plus and Andika fonts. Additional superscript characters for historical and para-IPA letters have been accepted for future versions of the Unicode Standard.[11][9]
Consonant letters
The Unicode characters for superscript (modifier) IPA and extIPA consonant letters are as follows. The entire Latin Extended-F block is dedicated to superscript IPA. Characters for sounds with secondary articulation are set off in parentheses and placed below the base letters.
IPA and extIPA consonants, along with superscript variants and their Unicode code points
The spacing diacritic for ejective consonants, U+2BC, works with superscript letters despite not being superscript itself: ⟨ᵖʼᵗʼᶜʼᵏˣʼ⟩. If a distinction needs to be made, the combining apostrophe U+315 may be used: ⟨ᵖ̕ᵗ̕ᶜ̕ᵏˣ̕⟩. The spacing diacritic should be used for a baseline letter with a superscript release, such as [tˢʼ] or [kˣʼ], where the scope of the apostrophe includes the non-superscript letter, but the combining apostrophe U+315 might be used to indicate a weakly articulated ejective consonant like [ᵗ̕] or [ᵏ̕], where the whole consonant is written as a superscript, or together with U+2BC when separate apostrophes have scope over the base and modifier letters, as in ⟨pʼᵏˣ̕⟩.[12]
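The two apostrophes differ in their character properties, which is what determines whether they take up space or attach to the preceding letter. A small Python sketch, with illustrative names, that inspects both characters and builds two of the sequences mentioned above:

```python
import unicodedata

MODIFIER_APOSTROPHE = "\u02bc"   # spacing ejective mark
COMBINING_APOSTROPHE = "\u0315"  # combining comma above right

def describe(ch: str) -> tuple[str, str, int]:
    """Return code point, general category and combining class."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.combining(ch))

print(describe(MODIFIER_APOSTROPHE))   # ('U+02BC', 'Lm', 0)
print(describe(COMBINING_APOSTROPHE))  # ('U+0315', 'Mn', 232)

# Baseline t with a superscript s release; the spacing apostrophe
# follows the whole sequence.
ejective_affricate = "t" + "\u02e2" + MODIFIER_APOSTROPHE
# Fully superscript t with the combining apostrophe attached to it.
weak_ejective = "\u1d57" + COMBINING_APOSTROPHE

print(ejective_affricate, weak_ejective)
```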
Spacing diacritics, as in ⟨tʲ⟩, cannot be secondarily superscripted in plain text: ⟨ᵗʲ⟩. (In this instance, the old IPA letter for [tʲ], ⟨ƫ⟩, has a superscript variant in Unicode, U+1DB5 ⟨ᶵ⟩, but that is not generally the case.)
Among older letters, ⟨ꜧ⟩ (U+A727) was a graphic variant of ⟨ɮ⟩. Its superscript is supported at ⟨ꭜ⟩ (U+AB5C). The most common letters with palatal hook are also supported; they are displayed in the table above. IPA once had an idiosyncratic curl on some of the palatalized letters: these are the fricative letters ⟨ʆʓ⟩. Their superscript forms have been accepted for a future version of the Unicode Standard.[11][9] Superscript forms of the retired letters ⟨ƞ⟩ and ⟨ɼ⟩ have also been accepted for a future version of the Unicode Standard.[11][9]
Among para-IPA letters, superscript Sinological ⟨ȡȴȵȶ⟩ have been accepted for a future version of the Unicode Standard.[10][9]
Superscripts of the Bantuist labio-dental plosives ⟨ȹ⟩ and ⟨ȸ⟩ have been accepted for a future version of the Unicode Standard.[10][9]
The central semivowels ⟨ɉ⟩, ⟨ɥ̶⟩, and ⟨w̶⟩ have also been accepted for a future version of the Unicode Standard.[10][9][13]
Old-style click letters have been accepted for a future version of the Unicode Standard.[14][9]
Vowel letters
The Unicode characters for superscript (modifier) IPA vowel letters, plus a pair of extended letters ⟨ᵻᵿ⟩ found in English dictionaries, are as follows. Recently retired alternative letters such as ⟨ɩɷ⟩ are also supported; they are set off in parentheses and placed below the standard IPA letters:
IPA vowels and superscript variants
Front
Central
Back
Close
i ⁱ 2071
y ʸ 02B8
ɨ ᶤ 1DA4
ʉ ᶶ 1DB6
ɯ ᵚ 1D5A
u ᵘ 1D58
Near-close
ɪ ᶦ 1DA6 (ɩ ᶥ) 1DA5
ʏ 𐞲 107B2
(ᵻ ᶧ) 1DA7
(ᵿ)
(ω)
ʊ ᶷ 1DB7 (ɷ 𐞤) 107A4
Close-mid
e ᵉ 1D49
ø 𐞢 107A2
ɘ 𐞎 1078E
ɵ ᶱ 1DB1
ɤ 𐞑 10791
o ᵒ 1D52
Mid
ə ᵊ 1D4A
Open-mid
ɛ ᵋ 1D4B
œ ꟹ A7F9
ɜ ᶟ 1D9F (ᴈ ᵌ) 1D4C
ɞ 𐞏 1078F
ʌ ᶺ 1DBA
ɔ ᵓ 1D53
Near-open
æ 𐞃 10783
ɶ 𐞣 107A3
ɐ ᵄ 1D44
ɑ ᵅ 1D45
ɒ ᶛ 1D9B
Open
a ᵃ 1D43
The precomposed Unicode rhotic vowel letters ⟨ɚɝ⟩ are not directly supported. The rhotic diacritic U+02DE ◌˞ should be used instead: ⟨ᵊ˞ ᶟ˞⟩.[15]
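A text-processing routine that superscripts IPA vowels therefore needs a special case for the two precomposed rhotic letters. The following Python sketch illustrates one way to handle it; the mapping covers only the two vowels involved, and `superscript_vowel` is a hypothetical helper, not an established API.

```python
RHOTIC_HOOK = "\u02de"  # MODIFIER LETTER RHOTIC HOOK

# Superscript forms taken from the vowel table above (subset).
SUPERSCRIPT_VOWELS = {
    "\u0259": "\u1d4a",  # ə -> ᵊ
    "\u025c": "\u1d9f",  # ɜ -> ᶟ
}

# The precomposed rhotic vowels, split by hand into plain vowel + hook.
RHOTIC_BASE = {
    "\u025a": "\u0259",  # ɚ -> ə
    "\u025d": "\u025c",  # ɝ -> ɜ
}

def superscript_vowel(ch: str) -> str:
    """Return the superscript form of a vowel, adding the rhotic hook
    as a separate character where the input is a precomposed rhotic."""
    if ch in RHOTIC_BASE:
        return SUPERSCRIPT_VOWELS[RHOTIC_BASE[ch]] + RHOTIC_HOOK
    return SUPERSCRIPT_VOWELS[ch]

print(superscript_vowel("\u025a"))  # ᵊ˞
print(superscript_vowel("\u025d"))  # ᶟ˞
```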
⟨ɜ⟩ and ⟨ᶟ⟩ are reversed ɛ. The older IPA turned ɛ, ⟨ᴈ⟩, is also supported, at U+1D4C ⟨ᵌ⟩. However, the briefly resurrected vowel letter ⟨ʚ⟩ (U+029A) is not supported; only its reversed replacement ⟨ɞ⟩ is.
Among older letters, ⟨ᴜ⟩ (U+1D1C), a graphic variant of ⟨ʊ⟩, is supported at ⟨ᶸ⟩ (U+1DB8).[16]
Among para-IPA letters, Sinological superscript ⟨ɿʅʮʯ⟩ have been accepted for a future version of the Unicode Standard.[10][9][13]
Length marks
The two length marks are also supported:
Length marks
| Long | Half-long |
|---|---|
| ː 𐞁 10781 | ˑ 𐞂 10782 |
These are used to add length to another superscript, such as ⟨Cʰ𐞁⟩ or ⟨Cʰ𐞂⟩ for long aspiration.
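Both superscript length marks lie outside the Basic Multilingual Plane, so each one occupies two code units in UTF-16 and four bytes in UTF-8. This matters when string lengths or offsets are computed in environments that count UTF-16 code units. A brief Python sketch, with illustrative names:

```python
SUPERSCRIPT_LONG = "\U00010781"       # superscript form of ː
SUPERSCRIPT_HALF_LONG = "\U00010782"  # superscript form of ˑ

# C + superscript h + superscript length mark, for long aspiration.
long_aspiration = "C" + "\u02b0" + SUPERSCRIPT_LONG

print(len(long_aspiration))                           # 3 code points
print(len(long_aspiration.encode("utf-16-le")) // 2)  # 4 UTF-16 code units
print(len(long_aspiration.encode("utf-8")))           # 7 bytes
```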
Wildcards
Superscript wildcards (full caps) are largely supported: e.g. ᴺC (prenasalized consonant), ꟲN (prestopped nasal), Pꟳ (fricative release), NᴾF (epenthetic plosive), CVNᵀ (tone-bearing syllable), Cᴸ (liquid or lateral release), Cᴿ (rhotic or resonant release), Vᴳ (off-glide/diphthong), Cⱽ (fleeting vowel). Superscript S for sibilant release has been accepted for a future version of the Unicode Standard;[13][9] superscript Ʞ for fleeting/epenthetic click has not. Other basic Latin superscript wildcards for tone and weak indeterminate sounds, as described in the article on the International Phonetic Alphabet, are mostly supported. (See table in previous section.)
Combining marks and subscripts
In addition, a very few IPA letters beyond the basic Latin alphabet have combining forms or are supported as subscripts:
Primarily for compatibility with earlier character sets, Unicode contains a number of characters that compose super- and subscripts with other symbols.[1] In most fonts these render much better than attempts to construct these symbols from the above characters or by using markup.
The Number Forms block contains several precomposed fractions: ⅐ ⅑ ⅒ ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞ ⅟ ↉.
The Letterlike Symbols block contains a few symbols composed of subscript and superscript characters: ℀ ℁ ℅ ℆ № ℠ ™ ⅍.
The Enclosed Alphanumeric Supplement block contains three superscript abbreviations 🅪 🅫 🅬: MC for marque de commerce (trademark), MD for marque déposée (registered trademark), both used in Canada; MR for marca registrada (registered trademark) in Spanish and Portuguese speaking countries.[17]
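These composite characters carry numeric values or compatibility decompositions that software can use to recover the underlying parts. A minimal Python sketch using the standard `unicodedata` and `fractions` modules; the choice of sample characters is arbitrary:

```python
import unicodedata
from fractions import Fraction

# Precomposed fractions expose a numeric value and decompose to
# digit + fraction slash + digit under compatibility normalization.
for ch in "\u2150\u2153\u215e":  # ⅐ ⅓ ⅞
    value = Fraction(unicodedata.numeric(ch)).limit_denominator(100)
    print(ch, value, unicodedata.normalize("NFKC", ch))

# Letterlike symbols decompose to their plain letters.
print(unicodedata.normalize("NFKC", "\u2122"))  # TM
print(unicodedata.normalize("NFKC", "\u2116"))  # No
```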
Superscript ⟨ç⟩ is composed of superscript c and a combining cedilla, which should display properly in a good font. Superscript c was specifically requested for this purpose in Unicode proposal L2/03-180.
U+02E4 ˤ MODIFIER LETTER SMALL REVERSED GLOTTAL STOP is the superscript variant of U+0295 ʕ LATIN LETTER PHARYNGEAL VOICED FRICATIVE and is defined for IPA use. The similar character U+02C1 ˁ MODIFIER LETTER REVERSED GLOTTAL STOP is a reversed U+02C0 ˀ MODIFIER LETTER GLOTTAL STOP, perhaps a gelded reversed question mark. Fonts are inconsistent in whether they look different and what the difference is.
In Microsoft fonts, superscript ⟨ɫ⟩ was erroneously designed as a superscript ⟨ꬸ⟩.
U+A71D ⟨ꜝ⟩ and U+A71E ⟨ꜞ⟩ were adopted as the Africanist equivalents of the IPA characters ⟨ꜜ⟩ downstep and ⟨ꜛ⟩ upstep. The correspondence of U+A71D ⟨ꜝ⟩ to the IPA click letter ⟨ǃ⟩ is thus accidental. Coincidentally, U+A71E ⟨ꜞ⟩ serves as the superscript variant of the extIPA percussive consonant ⟨¡⟩; the other percussive letters, ⟨ʬ⟩ and ⟨ʭ⟩, do not have superscript support in Unicode.
This is actually the Vietnamese diacritic dấu hỏi, not specifically IPA, but graphically both are gelded question marks.
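The distinction drawn in the note on the two reversed glottal stop modifiers is visible in the character data even where fonts draw them alike: only U+02E4 is recorded as a superscript of the pharyngeal fricative letter. A short Python sketch that prints the relevant properties:

```python
import unicodedata

for ch in ("\u02e4", "\u02c1", "\u02c0"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        repr(unicodedata.decomposition(ch)),
    )

# U+02E4 decomposes to '<super> 0295'; the other two have no decomposition.
```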