The ``Atomic Theory´´ of Unicode

Jonathan Coxhead

Introduction

I wonder how many ``primitive characters´´ are encoded in Unicode?

This question has diverted me from several perspectives, and here I attempt to answer it.

I posted ``Some thoughts on character decomposition´´ on 4th June 1999 to the Unicode mailing list. Since then I have made a more thorough examination of the ideas I considered there. The main motivation is to simplify Unicode for engineers by providing more structure within the standard: this allows a lot of characters to be implemented by following a few clearly-stated rules and, at the same time, makes the character set more extensible, thereby making it more universal.

It has the side effect of giving more control to the users of the standard by ``opening it up´´ so that people in special fields (e g, mathematics, phonetics), or those who just want novel effects in text, can have them without needing to petition the standardising body. This, I think, is what makes it more than just an exercise in classification.

Since this has been an exercise in trying to understand the internal structure of the U C S, I have called it the ``Atomic Theory´´ of Unicode. Maybe the analogy with chemistry would be closer, as single characters are like atoms, able to interact in their own right, or to join in various ways to make molecules.

My aim has been to identify the largest possible set of semantic decompositions, by using (or abusing) the ``markup´´ tags present in the decomposition fields of UNIDATA.TXT as explicit modifier characters, and by making explicit some of the information that is only represented in the name or visual appearance of the character. This is done with a mixture of existing combining characters, some new combining characters, and a new type of character called a SEMANTIC.

This resolves another question, as well: there is a script alphabet in the U C S, consisting of the characters a B Ee F g H I Ll M o P R Vv. There is also a ``turned´´ alphabet of Aa Cc Ee Ff h k m r t v w y (the longest word you can write upside-down in Unicode is `aftereffect´). Similar remarks apply to other alphabets. On one hand, it doesn´t make sense to have such an arbitrary set of characters; on the other hand, there is no obvious requirement for the others. The resolution is to give them all decompositions, putting them all on an equal footing.

The character START GROUP is needed to make this work. It is an open bracket, like LEFT-TO-RIGHT OVERRIDE but without any directional implication, terminated in the same way: by POP DIRECTIONAL FORMATTING.

I see the value of a decomposition as lying in 2 places: first, it provides new structure to existing characters, which can let rendering software make substitutions in an intelligent way, and thereby increase the readability of text to everyone (in other words, `R´ is better than `?´ as a rendering of DOUBLE-STRUCK CAPITAL R); and second, it may be productive as a means for characters to be generated without having to get new characters encoded (in other words, it gives access to (*)DOUBLE-STRUCK CAPITAL F, should anyone need it).

The second point is important, because it allows us to recapitulate the way many characters entered common use in the first place. The character LATIN SMALL LETTER TURNED Y did not just appear: it was adopted because a new symbol was needed, and the typographic technology made it convenient. So it seems sensible to acknowledge that LATIN SMALL LETTER TURNED Y is a LATIN SMALL LETTER Y that has had some sort of process applied to it.

I have also listed the characters whose definition would be affected. The list is intended to be complete. An accompanying Tcl programme, vunicode, reads in files UNIDATA.TXT and PUDATA.TXT that define the available characters, and also reads this file and locates the new decompositions. Then it emits 2 files: prim.txt, a list of primitive characters, and comp.txt, a list of composite characters with their decompositions. The programme checks, as far as possible, that there are no errors.
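The classification step that vunicode performs can be sketched roughly as follows. This is a minimal Python illustration, not the Tcl programme itself: the function name split_primitives, its arguments, and the assumption that every decomposition sits on its own line in the NAME=PART+PART form used in this file are all hypothetical, and the parsing of UNIDATA.TXT and PUDATA.TXT is not shown.

```python
def split_primitives(known_names, decomposition_lines):
    """Classify character names as primitive or composite (hypothetical sketch).

    known_names: names of every available character, e.g. as gathered from
        UNIDATA.TXT and PUDATA.TXT (reading those files is not shown here).
    decomposition_lines: lines of the form NAME=PART+PART+...
    Returns (prim, comp, unknown).
    """
    comp = {}
    for line in decomposition_lines:
        name, sep, rhs = line.strip().partition("=")
        if not sep or not rhs:
            continue  # not a decomposition line
        comp[name] = rhs.split("+")

    # A primitive is any known character that was given no decomposition.
    prim = sorted(n for n in known_names if n not in comp)

    # Basic error check: every name that appears must be a known character.
    used = set(comp) | {p for parts in comp.values() for p in parts}
    unknown = sorted(used - set(known_names))
    return prim, comp, unknown
```

Here prim and comp play the roles of prim.txt and comp.txt, and a non-empty unknown list would indicate a misspelt or undefined character name.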

New ``semantic decompositions´´

SEMANTIC BLACK-LETTER

This requests that a black-letter, or fraktur, font be used. Certain mathematical symbols are conventionally written this way, and German mathematical publishing sometimes uses fraktur rather than heavy (or bold) for vectors.

There are 5 black-letter characters in the U C S.

BLACK-LETTER CAPITAL C=LATIN CAPITAL LETTER C+SEMANTIC BLACK-LETTER
BLACK-LETTER CAPITAL H=LATIN CAPITAL LETTER H+SEMANTIC BLACK-LETTER
BLACK-LETTER CAPITAL I=LATIN CAPITAL LETTER I+SEMANTIC BLACK-LETTER
BLACK-LETTER CAPITAL R=LATIN CAPITAL LETTER R+SEMANTIC BLACK-LETTER
BLACK-LETTER CAPITAL Z=LATIN CAPITAL LETTER Z+SEMANTIC BLACK-LETTER

The lower-case alphabet is available as LATIN SMALL LETTER whatever+SEMANTIC BLACK-LETTER, and because these are canonical decompositions, the resulting output would be completely compatible, visually and for all processing purposes, with the 5 precomposed forms already encoded.

In handwriting, these letters are written in a form called ``Sütterlin´´. This is regarded as a glyph variation of the fraktur alphabet.

Cannot be done algorithmically: either you have the right font, or you don´t. Falling back to the base glyph is likely to give good results though.
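The fallback to the base glyph might look something like the sketch below. It assumes a decomposition is held as a list of character names; the function name fallback_text and the supported parameter are hypothetical. Because the proposed SEMANTIC characters do not exist in the Unicode database, they are filtered by name and never looked up.

```python
import unicodedata

def fallback_text(components, supported=frozenset()):
    """Render a decomposition as plain text, dropping unsupported SEMANTICs.

    components: list of character names, e.g.
        ["LATIN CAPITAL LETTER R", "SEMANTIC BLACK-LETTER"]
    supported: names of SEMANTIC characters the renderer can honour.
    Returns (text, pending): the base text, plus the SEMANTICs that a
    capable renderer would still have to apply to it.
    """
    kept = [c for c in components
            if not c.startswith("SEMANTIC ") or c in supported]
    # Only real, already-encoded characters can be looked up by name.
    text = "".join(unicodedata.lookup(c) for c in kept
                   if not c.startswith("SEMANTIC "))
    pending = [c for c in kept if c.startswith("SEMANTIC ")]
    return text, pending
```

With no fraktur font available, the decomposition of BLACK-LETTER CAPITAL R would come out as a plain `R´ with nothing pending, which is the readable substitute argued for in the introduction.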

SEMANTIC CAPITAL LETTER TONE

This suggests, of a digit, that a variant glyph be used with a style suitable for marking Zhuang tone. It is for the following:

LATIN CAPITAL LETTER TONE FIVE=DIGIT FIVE+SEMANTIC CAPITAL LETTER TONE
LATIN CAPITAL LETTER TONE SIX=DIGIT SIX+SEMANTIC CAPITAL LETTER TONE
LATIN CAPITAL LETTER TONE TWO=DIGIT TWO+SEMANTIC CAPITAL LETTER TONE

There is a relationship between CYRILLIC CAPITAL LETTER CHE and DIGIT FOUR+SEMANTIC CAPITAL LETTER TONE, and also between CYRILLIC CAPITAL LETTER ZE and DIGIT THREE+SEMANTIC CAPITAL LETTER TONE, in that they are likely to be the same glyph; but it would be odd to give decompositions like (*)CYRILLIC CAPITAL LETTER CHE=DIGIT FOUR+SEMANTIC CAPITAL LETTER TONE, as this would imply the wrong historical relationship. Instead, uses of CYRILLIC CAPITAL LETTER CHE as a tone mark should simply be superseded by DIGIT FOUR+SEMANTIC CAPITAL LETTER TONE.

By encoding this character, it becomes possible for sophisticated software to render suitable glyphs for all the tone letters, without needing separate encodings for tones 3, 4.

SEMANTIC CONTROL SYMBOL

Requests that a sequence of characters be rendered as if they were the name of an ISO control character. There are 35 of these encoded already, but the C1 range contains another 32 which at the moment are second-class citizens. This SEMANTIC puts them all on an even footing.

SYMBOL FOR ACKNOWLEDGE=START GROUP+LATIN CAPITAL LETTER A+LATIN CAPITAL LETTER C+LATIN CAPITAL LETTER K+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR BACKSPACE=START GROUP+LATIN CAPITAL LETTER B+LATIN CAPITAL LETTER S+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR BELL=START GROUP+LATIN CAPITAL LETTER B+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER L+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR CANCEL=START GROUP+LATIN CAPITAL LETTER C+LATIN CAPITAL LETTER A+LATIN CAPITAL LETTER N+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR CARRIAGE RETURN=START GROUP+LATIN CAPITAL LETTER C+LATIN CAPITAL LETTER R+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR DATA LINK ESCAPE=START GROUP+LATIN CAPITAL LETTER D+LATIN CAPITAL LETTER L+LATIN CAPITAL LETTER E+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR DELETE=START GROUP+LATIN CAPITAL LETTER D+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER L+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR DEVICE CONTROL FOUR=START GROUP+LATIN CAPITAL LETTER D+LATIN CAPITAL LETTER C+DIGIT FOUR+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR DEVICE CONTROL ONE=START GROUP+LATIN CAPITAL LETTER D+LATIN CAPITAL LETTER C+DIGIT ONE+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR DEVICE CONTROL THREE=START GROUP+LATIN CAPITAL LETTER D+LATIN CAPITAL LETTER C+DIGIT THREE+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR DEVICE CONTROL TWO=START GROUP+LATIN CAPITAL LETTER D+LATIN CAPITAL LETTER C+DIGIT TWO+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR END OF MEDIUM=START GROUP+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER M+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR END OF TEXT=START GROUP+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER T+LATIN CAPITAL LETTER X+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR END OF TRANSMISSION=START GROUP+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER O+LATIN CAPITAL LETTER T+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR END OF TRANSMISSION BLOCK=START GROUP+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER T+LATIN CAPITAL LETTER B+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR ENQUIRY=START GROUP+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER N+LATIN CAPITAL LETTER Q+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR ESCAPE=START GROUP+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER C+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR FILE SEPARATOR=START GROUP+LATIN CAPITAL LETTER F+LATIN CAPITAL LETTER S+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR FORM FEED=START GROUP+LATIN CAPITAL LETTER F+LATIN CAPITAL LETTER F+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR GROUP SEPARATOR=START GROUP+LATIN CAPITAL LETTER G+LATIN CAPITAL LETTER S+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR HORIZONTAL TABULATION=START GROUP+LATIN CAPITAL LETTER H+LATIN CAPITAL LETTER T+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR LINE FEED=START GROUP+LATIN CAPITAL LETTER L+LATIN CAPITAL LETTER F+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR NEGATIVE ACKNOWLEDGE=START GROUP+LATIN CAPITAL LETTER N+LATIN CAPITAL LETTER A+LATIN CAPITAL LETTER K+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR NEWLINE=START GROUP+LATIN CAPITAL LETTER N+LATIN CAPITAL LETTER L+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR NULL=START GROUP+LATIN CAPITAL LETTER N+LATIN CAPITAL LETTER U+LATIN CAPITAL LETTER L+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR RECORD SEPARATOR=START GROUP+LATIN CAPITAL LETTER R+LATIN CAPITAL LETTER S+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR SHIFT IN=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER I+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR SHIFT OUT=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER O+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR SPACE=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER P+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR START OF HEADING=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER O+LATIN CAPITAL LETTER H+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR START OF TEXT=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER T+LATIN CAPITAL LETTER X+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR SUBSTITUTE=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER U+LATIN CAPITAL LETTER B+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR SYNCHRONOUS IDLE=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER Y+LATIN CAPITAL LETTER N+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR UNIT SEPARATOR=START GROUP+LATIN CAPITAL LETTER U+LATIN CAPITAL LETTER S+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR VERTICAL TABULATION=START GROUP+LATIN CAPITAL LETTER V+LATIN CAPITAL LETTER T+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
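Every entry in this list follows the same pattern, so the decomposition can be generated from the abbreviation alone. The sketch below shows one way to do that in Python; the function name control_symbol_decomposition is hypothetical, and it relies only on the standard unicodedata.name lookup for the letters and digits of the abbreviation.

```python
import unicodedata

def control_symbol_decomposition(abbrev):
    """Build the proposed decomposition for a control-code abbreviation.

    abbrev: the abbreviation, e.g. "ACK" or "DC4".
    The bracketing characters and the SEMANTIC are the names proposed in
    this document; they are emitted as plain strings.
    """
    parts = ["START GROUP"]
    parts += [unicodedata.name(ch) for ch in abbrev]
    parts += ["POP DIRECTIONAL FORMATTING", "SEMANTIC CONTROL SYMBOL"]
    return "+".join(parts)
```

For example, control_symbol_decomposition("DC4") reproduces the right-hand side given above for SYMBOL FOR DEVICE CONTROL FOUR, and the same call would serve for an abbreviation from the C1 range, such as "CSI".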

The glyphs could be rendered smaller, or as sequences with a raised first character and a lowered second, or in inverse video, for example. The standard abbreviations for the C1 range of control codes are (in order 0080 to 00A0) PAD, HOP, BPH, NBH, IND, NEL, SSA, ESA, HTS, HTJ, VTS, PLD, PLU, RI, SS2, SS3, DCS, PU1, PU2, STS, CCH, MW, SPA, EPA, SOS, SGCI, SCI, CSI, ST, OSC, PM, APC, NBSP, and we also have

BREAK PERMITTED HERE=ZERO WIDTH SPACE
CARRIAGE RETURN=ZERO WIDTH NO-BREAK SPACE
CHARACTER TABULATION SET=COLUMN SEPARATOR
CHARACTER TABULATION WITH JUSTIFICATION=HAIR SPACE+COLUMN SEPARATOR
FORM FEED=LINE SEPARATOR
HORIZONTAL TABULATION=COLUMN SEPARATOR
LINE FEED=LINE SEPARATOR
LINE TABULATION SET=LINE SEPARATOR
NEXT LINE=LINE SEPARATOR
NO BREAK HERE=ZERO WIDTH NO-BREAK SPACE
SUBSTITUTE=REPLACEMENT CHARACTER
VERTICAL TABULATION=LINE SEPARATOR

(Formerly, something like a printer would have expected any of CARRIAGE RETURN+LINE FEED, CARRIAGE RETURN+VERTICAL TABULATION or CARRIAGE RETURN+FORM FEED to start a new line. This is why we discard CARRIAGE RETURN but treat LINE FEED, VERTICAL TABULATION, FORM FEED or NEXT LINE as LINE SEPARATOR.)
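As an illustration, the equivalences above whose targets are already encoded can be applied with a simple translation table. This is only a sketch: the names CONTROL_MAP and replace_controls are hypothetical, the entries that map to COLUMN SEPARATOR are left out because that character is not assumed to have a code point here, and the code positions for the C1 controls are taken from the order of the abbreviation list above.

```python
# Control code -> replacement character, following the table above.
CONTROL_MAP = {
    0x0082: "\u200B",  # BREAK PERMITTED HERE -> ZERO WIDTH SPACE
    0x000D: "\uFEFF",  # CARRIAGE RETURN -> ZERO WIDTH NO-BREAK SPACE
    0x000C: "\u2028",  # FORM FEED -> LINE SEPARATOR
    0x000A: "\u2028",  # LINE FEED -> LINE SEPARATOR
    0x008A: "\u2028",  # LINE TABULATION SET -> LINE SEPARATOR
    0x0085: "\u2028",  # NEXT LINE -> LINE SEPARATOR
    0x0083: "\uFEFF",  # NO BREAK HERE -> ZERO WIDTH NO-BREAK SPACE
    0x001A: "\uFFFD",  # SUBSTITUTE -> REPLACEMENT CHARACTER
    0x000B: "\u2028",  # VERTICAL TABULATION -> LINE SEPARATOR
}

def replace_controls(text):
    """Replace the mapped control codes; all other characters pass through."""
    return text.translate(CONTROL_MAP)
```

A CARRIAGE RETURN+LINE FEED pair therefore becomes an invisible ZERO WIDTH NO-BREAK SPACE followed by a single LINE SEPARATOR, so exactly one line break survives.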

SEMANTIC DOUBLE-STRUCK

Requests that a double-struck, ``open-face´´, ``blackboard bold´´ font be used.

Used in

DOUBLE-STRUCK CAPITAL C=LATIN CAPITAL LETTER C+SEMANTIC DOUBLE-STRUCK
DOUBLE-STRUCK CAPITAL H=LATIN CAPITAL LETTER H+SEMANTIC DOUBLE-STRUCK
DOUBLE-STRUCK CAPITAL N=LATIN CAPITAL LETTER N+SEMANTIC DOUBLE-STRUCK
DOUBLE-STRUCK CAPITAL P=LATIN CAPITAL LETTER P+SEMANTIC DOUBLE-STRUCK
DOUBLE-STRUCK CAPITAL Q=LATIN CAPITAL LETTER Q+SEMANTIC DOUBLE-STRUCK
DOUBLE-STRUCK CAPITAL R=LATIN CAPITAL LETTER R+SEMANTIC DOUBLE-STRUCK
DOUBLE-STRUCK CAPITAL Z=LATIN CAPITAL LETTER Z+SEMANTIC DOUBLE-STRUCK

and arguably in

CIRCLED OPEN CENTRE EIGHT POINTED STAR=EIGHT POINTED BLACK STAR+SEMANTIC DOUBLE-STRUCK+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE
DOWNWARDS DOUBLE ARROW=DOWNWARDS ARROW+SEMANTIC DOUBLE-STRUCK
LEFT RIGHT DOUBLE ARROW=LEFT RIGHT ARROW+SEMANTIC DOUBLE-STRUCK
LEFTWARDS DOUBLE ARROW=LEFTWARDS ARROW+SEMANTIC DOUBLE-STRUCK
NORTH EAST DOUBLE ARROW=NORTH EAST ARROW+SEMANTIC DOUBLE-STRUCK
NORTH WEST DOUBLE ARROW=NORTH WEST ARROW+SEMANTIC DOUBLE-STRUCK
OPEN CENTRE ASTERISK=HEAVY ASTERISK+SEMANTIC DOUBLE-STRUCK
OPEN CENTRE BLACK STAR=BLACK STAR+SEMANTIC DOUBLE-STRUCK
OPEN CENTRE CROSS=PLUS SIGN+SEMANTIC DOUBLE-STRUCK
OPEN CENTRE TEARDROP-SPOKED ASTERISK=TEARDROP-SPOKED ASTERISK+SEMANTIC DOUBLE-STRUCK
RIGHTWARDS DOUBLE ARROW=RIGHTWARDS ARROW+SEMANTIC DOUBLE-STRUCK
SOUTH EAST DOUBLE ARROW=SOUTH EAST ARROW+SEMANTIC DOUBLE-STRUCK
SOUTH WEST DOUBLE ARROW=SOUTH WEST ARROW+SEMANTIC DOUBLE-STRUCK
UP DOWN DOUBLE ARROW=UP DOWN ARROW+SEMANTIC DOUBLE-STRUCK
UPWARDS DOUBLE ARROW=UPWARDS ARROW+SEMANTIC DOUBLE-STRUCK

Hard to do algorithmically, though possible: the character is outlined, and then the original part of the character is removed, except that some strokes are left alone.

May well be productive---in particular, F (as in ``Let F be a field ...´´) is missing, but often seen in the literature.

If used but not rendered, confusion is likely to be minimal, so highly desirable.

SEMANTIC DROP-SHADOWED

Requests that a drop-shadow be drawn behind the glyph. Conventionally, the light source is behind the left shoulder of the observer, as if the observer were right-handed and working at a desk. (This can be changed by using TURNED, REVERSED or INVERTED.) The shadow is cast on a flat surface behind the glyph.

Could be used for

LOWER RIGHT DROP-SHADOWED WHITE SQUARE=WHITE SQUARE+SEMANTIC DROP-SHADOWED

Not much gain there, and unlikely to be useful for anything very much.

SEMANTIC FULLWIDTH

This is for characters whose decompositions include <wide>. It indicates that, if there is a choice between 2 glyphs (the single-cell one or the double-cell one), the double-cell one should be chosen. It enables software to use decomposition to get good results without needing to understand anything else about fullwidth/halfwidth characters.

Decompositions including <small> are replaced by ones involving SEMANTIC FULLWIDTH and SEMANTIC SMALL (q v), as the character glyph is small, but it is centred in a double-cell space.

If it is true that the difference between the FULLWIDTH and non-FULLWIDTH form is present merely to distinguish different glyphs that carry the same meaning, but are being used simultaneously because of a trip through a character encoding that had both, maybe this semantic is not needed, and can be replaced by a canonical one.

SEMANTIC HALFWIDTH

This is used for halfwidth characters: those with HALFWIDTH in the name, or whose decompositions include <narrow>. It indicates that, if there is a choice between 2 glyphs (the single-cell one or the double-cell one), the single-cell one should be chosen. It enables software to use decomposition and get good results without needing to understand anything else about fullwidth/halfwidth characters.

I also note the decompositions for non-Western characters, though they are not otherwise competently explored here.

If it is true that the difference between the HALFWIDTH and non-HALFWIDTH form is present merely to distinguish different glyphs that carry the same meaning, but are being used simultaneously because of a trip through a character encoding that had both, maybe this semantic is not needed, and can be replaced by a canonical one.
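The rewriting of the <wide>, <narrow> and <small> tags into explicit semantics could be sketched as below, using Python's standard unicodedata module, which exposes the decomposition field of the character database. The function name semantic_decomposition and the table TAG_TO_SEMANTICS are hypothetical, and the order in which SEMANTIC FULLWIDTH and SEMANTIC SMALL are emitted for <small> is an assumption, since the text does not fix it.

```python
import unicodedata

# Existing compatibility tag -> proposed SEMANTIC characters.
TAG_TO_SEMANTICS = {
    "<wide>": ["SEMANTIC FULLWIDTH"],
    "<narrow>": ["SEMANTIC HALFWIDTH"],
    "<small>": ["SEMANTIC FULLWIDTH", "SEMANTIC SMALL"],  # order assumed
}

def semantic_decomposition(ch):
    """Return the proposed decomposition of ch, or None if no tag applies."""
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0] not in TAG_TO_SEMANTICS:
        return None
    # Remaining fields are hexadecimal code points of the base characters.
    bases = [unicodedata.name(chr(int(cp, 16))) for cp in fields[1:]]
    return "+".join(bases + TAG_TO_SEMANTICS[fields[0]])
```

For FULLWIDTH LATIN CAPITAL LETTER A this yields LATIN CAPITAL LETTER A+SEMANTIC FULLWIDTH, and a character with no such tag yields None.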

SEMANTIC HEAVY

Requests the character be rendered in a heavy, ``bold´´, or ``black´´ font. The style is frequently used with important semantic content in mathematics, where it is used to represent a vector, and the magnitude of the vector is represented by the corresponding non-heavy character.

These are the heavy characters already in the U C S:

BLACK RIGHTWARDS ARROW=RIGHTWARDS ARROW+SEMANTIC HEAVY
BULLET OPERATOR=DOT OPERATOR+SEMANTIC HEAVY
BULLET=MIDDLE DOT+SEMANTIC HEAVY
HEAVY ASTERISK=ASTERISK OPERATOR+SEMANTIC HEAVY
HEAVY BALLOT X=BALLOT X+SEMANTIC HEAVY
HEAVY BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW=HEAVY BLACK CURVED UPWARDS AND RIGHTWARDS ARROW+SEMANTIC INVERTED
HEAVY BLACK HEART=BLACK HEART SUIT+SEMANTIC HEAVY
HEAVY BLACK-FEATHERED NORTH EAST ARROW=BLACK-FEATHERED NORTH EAST ARROW+SEMANTIC HEAVY
HEAVY BLACK-FEATHERED RIGHTWARDS ARROW=BLACK-FEATHERED RIGHTWARDS ARROW+SEMANTIC HEAVY
HEAVY BLACK-FEATHERED SOUTH EAST ARROW=BLACK-FEATHERED SOUTH EAST ARROW+SEMANTIC HEAVY
HEAVY CHECK MARK=CHECK MARK+SEMANTIC HEAVY
HEAVY CHEVRON SNOWFLAKE=SNOWFLAKE+SEMANTIC HEAVY
HYPHEN BULLET=HYPHEN+SEMANTIC HEAVY
TRIANGULAR BULLET=BLACK RIGHT-POINTING SMALL TRIANGLE+SEMANTIC HEAVY
HEAVY DASHED TRIANGLE-HEADED RIGHTWARDS ARROW=DASHED TRIANGLE-HEADED RIGHTWARDS ARROW+SEMANTIC HEAVY
HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT=RIGHT DOUBLE QUOTATION MARK+SEMANTIC HEAVY
HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT=LEFT DOUBLE QUOTATION MARK+SEMANTIC HEAVY
HEAVY EIGHT POINTED RECTILINEAR BLACK STAR=EIGHT POINTED RECTILINEAR BLACK STAR+SEMANTIC HEAVY
HEAVY EIGHT TEARDROP-SPOKED PROPELLER ASTERISK=EIGHT TEARDROP-SPOKED PROPELLER ASTERISK+SEMANTIC HEAVY
HEAVY EXCLAMATION MARK ORNAMENT=EXCLAMATION MARK+SEMANTIC HEAVY
HEAVY FOUR BALLOON-SPOKED ASTERISK=FOUR BALLOON-SPOKED ASTERISK+SEMANTIC HEAVY
HEAVY GREEK CROSS=PLUS SIGN+SEMANTIC HEAVY
HEAVY MULTIPLICATION X=MULTIPLICATION X+SEMANTIC HEAVY
HEAVY NORTH EAST ARROW=NORTH EAST ARROW+SEMANTIC HEAVY
HEAVY OPEN CENTRE CROSS=OPEN CENTRE CROSS+SEMANTIC HEAVY
HEAVY OUTLINED BLACK STAR=OUTLINED BLACK STAR+SEMANTIC HEAVY
HEAVY RIGHTWARDS ARROW=RIGHTWARDS ARROW+SEMANTIC HEAVY
HEAVY SINGLE COMMA QUOTATION MARK ORNAMENT=RIGHT SINGLE QUOTATION MARK+SEMANTIC HEAVY
HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT=LEFT SINGLE QUOTATION MARK+SEMANTIC HEAVY
HEAVY SOUTH EAST ARROW=SOUTH EAST ARROW+SEMANTIC HEAVY
HEAVY SPARKLE=SPARKLE+SEMANTIC HEAVY
HEAVY TEARDROP-SPOKED ASTERISK=TEARDROP-SPOKED ASTERISK+SEMANTIC HEAVY
HEAVY TRIANGLE-HEADED RIGHTWARDS ARROW=TRIANGLE-HEADED RIGHTWARDS ARROW+SEMANTIC HEAVY
HEAVY UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW=HEAVY LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW+SEMANTIC INVERTED
HEAVY VERTICAL BAR=MEDIUM VERTICAL BAR+SEMANTIC HEAVY
HEAVY WEDGE-TAILED RIGHTWARDS ARROW=WEDGE-TAILED RIGHTWARDS ARROW+SEMANTIC HEAVY

HEAVY TEARDROP-SPOKED PINWHEEL ASTERISK and HEAVY CHEVRON SNOWFLAKE both appear to be heavy, but the base form is not encoded. (This reminds me of the situation with proto-Indo-European *-words, whose existence we can deduce without direct evidence.) Maybe they should be added.

Hard to do well algorithmically, but easy to do to some legible standard.
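One crude way of reaching ``some legible standard´´ for a bit-mapped glyph is to smear every pixel one column to the right. The sketch below assumes a glyph is a list of equal-length strings in which '#' is an inked pixel and ' ' is blank; that representation and the function name embolden are hypothetical, and a real font engine would do something far more careful.

```python
def embolden(rows):
    """Thicken a bitmap glyph by OR-ing each pixel with its left neighbour.

    rows: list of equal-length strings, '#' for ink and ' ' for blank.
    Returns rows that are one column wider than the input.
    """
    out = []
    for row in rows:
        padded = row + " "  # room for the smear of the last column
        new_row = []
        for i, pixel in enumerate(padded):
            inked = pixel == "#" or (i > 0 and padded[i - 1] == "#")
            new_row.append("#" if inked else " ")
        out.append("".join(new_row))
    return out
```

Vertical strokes double in width while horizontal strokes merely lengthen by one pixel, which is why the result is legible rather than good.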

If used but not recognised, unlikely to cause the resulting text to be misinterpreted (except in the mathematical use), so highly desirable.

SEMANTIC INVERTED

Rotates the character (out of the paper) through a half-turn about a horizontal axis; equivalently, reflects the character about the horizontal axis. For characters where ``inverted´´ and ``turned´´ are equivalent, we describe the character as ``turned´´, out of deference to metal typography.

These characters are inverted copies of other characters:

BACK-TILTED SHADOWED WHITE RIGHTWARDS ARROW=FRONT-TILTED SHADOWED WHITE RIGHTWARDS ARROW+SEMANTIC INVERTED
BLACK LOWER RIGHT TRIANGLE=BLACK UPPER RIGHT TRIANGLE+SEMANTIC INVERTED
BLACK-FEATHERED SOUTH EAST ARROW=BLACK-FEATHERED NORTH EAST ARROW+SEMANTIC INVERTED
BOTTOM RIGHT CORNER=TOP RIGHT CORNER+SEMANTIC INVERTED
BOTTOM RIGHT CROP=TOP RIGHT CROP+SEMANTIC INVERTED
DOWNWARDS ARROW WITH TIP LEFTWARDS=UPWARDS ARROW WITH TIP LEFTWARDS+SEMANTIC INVERTED
DOWNWARDS ARROW WITH TIP RIGHTWARDS=UPWARDS ARROW WITH TIP RIGHTWARDS+SEMANTIC INVERTED
DOWNWARDS HARPOON WITH BARB LEFTWARDS=UPWARDS HARPOON WITH BARB LEFTWARDS+SEMANTIC INVERTED
LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE=LATIN LETTER GLOTTAL STOP WITH STROKE+SEMANTIC INVERTED
LATIN LETTER INVERTED GLOTTAL STOP=LATIN LETTER GLOTTAL STOP+SEMANTIC INVERTED
LATIN LETTER SMALL CAPITAL INVERTED R=LATIN LETTER SMALL CAPITAL R+SEMANTIC INVERTED
LEFT CEILING=LEFT FLOOR+SEMANTIC INVERTED
LOWER BLADE SCISSORS=UPPER BLADE SCISSORS+SEMANTIC INVERTED
LOWER RIGHT PENCIL=UPPER RIGHT PENCIL+SEMANTIC INVERTED
LOWER RIGHT QUADRANT CIRCULAR ARC=UPPER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC INVERTED
NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW=NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW+SEMANTIC INVERTED
RIGHT CEILING=RIGHT FLOOR+SEMANTIC INVERTED
RIGHTWARDS HARPOON WITH BARB DOWNWARDS=RIGHTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC INVERTED
SOUTH EAST ARROW=NORTH EAST ARROW+SEMANTIC INVERTED
THREE-D BOTTOM-LIGHTED RIGHTWARDS ARROWHEAD=THREE-D TOP-LIGHTED RIGHTWARDS ARROWHEAD+SEMANTIC INVERTED
UPPER RIGHT DROP-SHADOWED WHITE SQUARE=LOWER RIGHT DROP-SHADOWED WHITE SQUARE+SEMANTIC INVERTED
UPPER RIGHT SHADOWED WHITE SQUARE=LOWER RIGHT SHADOWED WHITE SQUARE+SEMANTIC INVERTED
WHITE DOWN POINTING INDEX=WHITE UP POINTING INDEX+SEMANTIC INVERTED

There is also INVERTED LAZY S, but no (*)LAZY S. (*)LAZY S could be seen as a rotated version of LATIN SMALL LETTER S obtained if we use a SEMANTIC ROTATED decomposition, described later, but the glyphs are very different. In fact, we are assured that `reversed tilde and lazy s are glyph variants´, so that´s how we´ll encode it.

For arrows, the one pointing up (mathematical positive) is the one we define as ``the right way up´´, and its image is the ``inverted´´ glyph.

This is very easy to do in software, and the consequences of ignoring it are likely to be severe if arrows are important. (This only affects people who try to make up new characters, as existing characters are already encoded and should be well understood.)
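For a bit-mapped glyph, the reflection about a horizontal axis amounts to reversing the order of the rows. A minimal sketch, assuming a glyph is a list of strings with one string per pixel row (the representation and the name invert_glyph are hypothetical):

```python
def invert_glyph(rows):
    """Reflect a bitmap glyph about a horizontal axis.

    rows: list of strings, top row first. The top row becomes the bottom
    row and vice versa; columns are left untouched.
    """
    return rows[::-1]
```

Applying the operation twice gives back the original glyph, as one would expect of a reflection.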

SEMANTIC ITALIC

Requests the glyph be rendered in an italic, oblique, or slanted font. This may be just slanted, or it may have additional ornamentation at the ends of strokes, but in this case should still be distinguishable from SCRIPT (q v).

Only needed for 2 characters currently encoded.

PLANCK CONSTANT=LATIN SMALL LETTER H+SEMANTIC ITALIC
PLANCK CONSTANT OVER TWO PI=LATIN SMALL LETTER H+SEMANTIC ITALIC+COMBINING SHORT SOLIDUS OVERLAY

In mathematical text, there is usually a font difference between the characters used in the running text and the characters used for ordinary mathematical symbols. The recommended way to mark the distinction is with SEMANTIC ITALIC. Symbols represented as Greek characters are sometimes printed in a recognisably italic font, and sometimes an upright one: if there is only an italic Greek font available, it should be used for Greek characters with or without a SEMANTIC ITALIC.

Slanting, at least, can be done algorithmically with little difficulty for both outline and bit-mapped fonts.
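For the bit-mapped case, slanting is a horizontal shear: each row is shifted to the right by an amount that grows with its height above the baseline. The following sketch assumes a glyph is a list of equal-length strings with the bottom row on the baseline; the function name slant_glyph and the rows_per_step parameter are hypothetical.

```python
def slant_glyph(rows, rows_per_step=2):
    """Shear a bitmap glyph to the right, pivoting on its bottom row.

    rows: list of equal-length strings, top row first.
    rows_per_step: how many rows to climb before shifting one more column.
    Returns rows of equal length, wider than the input by the top shift.
    """
    height = len(rows)
    if height == 0:
        return []
    max_shift = (height - 1) // rows_per_step
    out = []
    for i, row in enumerate(rows):
        shift = (height - 1 - i) // rows_per_step  # bottom row shifts by 0
        out.append(" " * shift + row + " " * (max_shift - shift))
    return out
```

A smaller rows_per_step gives a steeper slant; outline fonts would apply the same shear to their control points instead of to pixels.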

If used but not recognised, unlikely to cause the resulting text to be misinterpreted (even in a mathematical application), so this is very desirable even though it´s only used for 2 existing characters.

SEMANTIC LARGE

A larger version of the same character. Used in

LIGHT VERTICAL BAR=VERTICAL LINE+SEMANTIC LARGE
MULTIPLICATION X=MULTIPLICATION SIGN+SEMANTIC LARGE
N-ARY INTERSECTION=INTERSECTION+SEMANTIC LARGE
N-ARY LOGICAL AND=LOGICAL AND+SEMANTIC LARGE
N-ARY LOGICAL OR=LOGICAL OR+SEMANTIC LARGE
N-ARY PRODUCT=GREEK CAPITAL LETTER PI+SEMANTIC LARGE
N-ARY SUMMATION=GREEK CAPITAL LETTER SIGMA+SEMANTIC LARGE
N-ARY UNION=UNION+SEMANTIC LARGE

It´s odd that although there´s an N-ARY COPRODUCT, there´s no (*)COPRODUCT. It should be represented as GREEK CAPITAL LETTER PI+SEMANTIC TURNED. This semantic would also be the right one to use for the Hebrew wide letters:

HEBREW LETTER WIDE ALEF=HEBREW LETTER ALEF+SEMANTIC LARGE
HEBREW LETTER WIDE DALET=HEBREW LETTER DALET+SEMANTIC LARGE
HEBREW LETTER WIDE FINAL MEM=HEBREW LETTER FINAL MEM+SEMANTIC LARGE
HEBREW LETTER WIDE HE=HEBREW LETTER HE+SEMANTIC LARGE
HEBREW LETTER WIDE KAF=HEBREW LETTER KAF+SEMANTIC LARGE
HEBREW LETTER WIDE LAMED=HEBREW LETTER LAMED+SEMANTIC LARGE
HEBREW LETTER WIDE RESH=HEBREW LETTER RESH+SEMANTIC LARGE
HEBREW LETTER WIDE TAV=HEBREW LETTER TAV+SEMANTIC LARGE

SEMANTIC LIGATURE

Requests some kind of ``artistic combination´´ of 2 characters into a single glyph. This is another ``binary operation´´: the SEMANTIC LIGATURE stands between 2 characters to be ligated. Either or both may have their own combining marks: to give a combining mark to the whole ligature, it would have to be first enclosed in START GROUP ... POP DIRECTIONAL FORMATTING.

Used for characters with LIGATURE or DIGRAPH in their name (except Arabic characters, where the expectation is that letters are joined anyway).

AMPERSAND=LATIN CAPITAL LETTER E+SEMANTIC LIGATURE+LATIN SMALL LETTER T
ARMENIAN SMALL LIGATURE ECH YIWN=ARMENIAN SMALL LETTER ECH+SEMANTIC LIGATURE+ARMENIAN SMALL LETTER YIWN
ARMENIAN SMALL LIGATURE MEN ECH=ARMENIAN SMALL LETTER MEN+SEMANTIC LIGATURE+ARMENIAN SMALL LETTER ECH
ARMENIAN SMALL LIGATURE MEN INI=ARMENIAN SMALL LETTER MEN+SEMANTIC LIGATURE+ARMENIAN SMALL LETTER INI
ARMENIAN SMALL LIGATURE MEN NOW=ARMENIAN SMALL LETTER MEN+SEMANTIC LIGATURE+ARMENIAN SMALL LETTER NOW
ARMENIAN SMALL LIGATURE MEN XEH=ARMENIAN SMALL LETTER MEN+SEMANTIC LIGATURE+ARMENIAN SMALL LETTER XEH
ARMENIAN SMALL LIGATURE VEW NOW=ARMENIAN SMALL LETTER VEW+SEMANTIC LIGATURE+ARMENIAN SMALL LETTER NOW
BEAMED EIGHTH NOTES=EIGHTH NOTE+SEMANTIC LIGATURE+EIGHTH NOTE
BEAMED SIXTEENTH NOTES=EIGHTH NOTE+COMBINING HOOK+SEMANTIC LIGATURE+START GROUP+EIGHTH NOTE+COMBINING HOOK+POP DIRECTIONAL FORMATTING
CYRILLIC CAPITAL LETTER IOTIFIED BIG YUS=CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER BIG YUS
CYRILLIC CAPITAL LETTER IOTIFIED E=CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER IE
CYRILLIC CAPITAL LETTER IOTIFIED LITTLE YUS=CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER LITTLE YUS
CYRILLIC CAPITAL LETTER LJE=CYRILLIC CAPITAL LETTER EL+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER SOFT SIGN
CYRILLIC CAPITAL LETTER NJE=CYRILLIC CAPITAL LETTER EN+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER SOFT SIGN
CYRILLIC CAPITAL LETTER YU=CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER O
CYRILLIC CAPITAL LIGATURE A IE=CYRILLIC CAPITAL LETTER A+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER IE
CYRILLIC CAPITAL LIGATURE EN GHE=CYRILLIC CAPITAL LETTER EN+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER GHE
CYRILLIC CAPITAL LIGATURE TE TSE=CYRILLIC CAPITAL LETTER TE+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER TSE
CYRILLIC SMALL LETTER IOTIFIED BIG YUS=CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER BIG YUS
CYRILLIC SMALL LETTER IOTIFIED E=CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER E
CYRILLIC SMALL LETTER IOTIFIED LITTLE YUS=CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER LITTLE YUS
CYRILLIC SMALL LETTER LJE=CYRILLIC SMALL LETTER EL+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER SOFT SIGN
CYRILLIC SMALL LETTER NJE=CYRILLIC SMALL LETTER EN+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER SOFT SIGN
CYRILLIC SMALL LETTER YU=CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER O
CYRILLIC SMALL LIGATURE A IE=CYRILLIC SMALL LETTER A+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER IE
CYRILLIC SMALL LIGATURE EN GHE=CYRILLIC SMALL LETTER EN+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER GHE
CYRILLIC SMALL LIGATURE TE TSE=CYRILLIC SMALL LETTER TE+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER TSE
L B BAR SYMBOL=LATIN SMALL LETTER L+SEMANTIC LIGATURE+LATIN SMALL LETTER B
LAO HO MO=LAO LETTER HO SUNG+SEMANTIC LIGATURE+LAO LETTER MO
LAO HO NO=LAO LETTER HO SUNG+SEMANTIC LIGATURE+LAO LETTER NO
LATIN CAPITAL LETTER AE=LATIN CAPITAL LETTER A+SEMANTIC LIGATURE+LATIN CAPITAL LETTER E
LATIN CAPITAL LETTER OI=LATIN CAPITAL LETTER O+SEMANTIC LIGATURE+LATIN CAPITAL LETTER I
LATIN CAPITAL LIGATURE IJ=LATIN CAPITAL LETTER I+SEMANTIC LIGATURE+LATIN CAPITAL LETTER J
LATIN CAPITAL LIGATURE OE=LATIN CAPITAL LETTER O+SEMANTIC LIGATURE+LATIN CAPITAL LETTER E
LATIN SMALL LETTER AE=LATIN SMALL LETTER A+SEMANTIC LIGATURE+LATIN SMALL LETTER E
LATIN SMALL LETTER DEZH DIGRAPH=LATIN SMALL LETTER D+SEMANTIC LIGATURE+LATIN SMALL LETTER EZH
LATIN SMALL LETTER DZ DIGRAPH WITH CURL=LATIN SMALL LETTER D+SEMANTIC LIGATURE+LATIN SMALL LETTER Z WITH CURL
LATIN SMALL LETTER DZ DIGRAPH=LATIN SMALL LETTER D+SEMANTIC LIGATURE+LATIN SMALL LETTER Z
LATIN SMALL LETTER HV=LATIN SMALL LETTER H+SEMANTIC LIGATURE+LATIN SMALL LETTER V
LATIN SMALL LETTER LEZH=LATIN SMALL LETTER L+SEMANTIC LIGATURE+LATIN SMALL LETTER EZH
LATIN SMALL LETTER OI=LATIN SMALL LETTER O+SEMANTIC LIGATURE+LATIN SMALL LETTER DOTLESS I
LATIN SMALL LETTER REVERSED OPEN E WITH HOOK=LATIN SMALL LETTER REVERSED OPEN E+SEMANTIC LIGATURE+MODIFIER LETTER RHOTIC HOOK
LATIN SMALL LETTER SCHWA WITH HOOK=LATIN SMALL LETTER SCHWA+SEMANTIC LIGATURE+MODIFIER LETTER RHOTIC HOOK
LATIN SMALL LETTER SHARP S=LATIN SMALL LETTER S+SEMANTIC LIGATURE+LATIN SMALL LETTER S
LATIN SMALL LETTER TC DIGRAPH WITH CURL=LATIN SMALL LETTER T+SEMANTIC LIGATURE+LATIN SMALL LETTER C WITH CURL
LATIN SMALL LETTER TESH DIGRAPH=LATIN SMALL LETTER T+SEMANTIC LIGATURE+LATIN SMALL LETTER ESH
LATIN SMALL LETTER TS DIGRAPH=LATIN SMALL LETTER T+SEMANTIC LIGATURE+LATIN SMALL LETTER S
LATIN SMALL LIGATURE FF=LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER F
LATIN SMALL LIGATURE FFI=LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER I
LATIN SMALL LIGATURE FFL=LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER L
LATIN SMALL LIGATURE FI=LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER I
LATIN SMALL LIGATURE FL=LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER L
LATIN SMALL LIGATURE IJ=LATIN SMALL LETTER I+SEMANTIC LIGATURE+LATIN SMALL LETTER J
LATIN SMALL LIGATURE LONG S T=LATIN SMALL LETTER LONG S+SEMANTIC LIGATURE+LATIN SMALL LETTER T
LATIN SMALL LIGATURE OE=LATIN SMALL LETTER O+SEMANTIC LIGATURE+LATIN SMALL LETTER E
LATIN SMALL LIGATURE ST=LATIN SMALL LETTER S+SEMANTIC LIGATURE+LATIN SMALL LETTER T
NUMERO SIGN=LATIN CAPITAL LETTER N+SEMANTIC LIGATURE+LATIN SMALL LETTER O
PRESCRIPTION TAKE=LATIN CAPITAL LETTER P+SEMANTIC LIGATURE+LATIN SMALL LETTER X
ROMAN NUMERAL ONE THOUSAND C D=LATIN CAPITAL LETTER C+SEMANTIC LIGATURE+LATIN CAPITAL LETTER D

If the decomposition of CYRILLIC LETTER YU (historically CYRILLIC LETTER IOTIFIED O) as CYRILLIC LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC LETTER O horrifies you, see the remarks on O WITH STROKE, below.

There is an argument for some of these---e g, LATIN CAPITAL LETTER AE---that they are not ligatures, and should not be decomposed as such. However, even when LATIN CAPITAL LETTER AE is being used as the letter ash, it is still appropriate to render it as AE if its glyph is not available: you´d see text like ``AElfred the Great´´. (The comments for ``O WITH STROKE´´ also apply here.)

We should enclose each decomposition in ``brackets´´ (START GROUP, ..., POP DIRECTIONAL FORMATTING) for formal reasons: it ensures that the ligature is treated as a unit by any other processing that may be done.

I also note here the decompositions for Hebrew ligatures, though Hebrew is not otherwise competently explored here:

HEBREW LIGATURE ALEF LAMED=HEBREW LETTER ALEF+SEMANTIC LIGATURE+HEBREW LETTER LAMED
HEBREW LIGATURE YIDDISH DOUBLE VAV=HEBREW LETTER VAV+SEMANTIC LIGATURE+HEBREW LETTER VAV
HEBREW LIGATURE YIDDISH DOUBLE YOD=HEBREW LETTER YOD+SEMANTIC LIGATURE+HEBREW LETTER YOD
HEBREW LIGATURE YIDDISH VAV YOD=HEBREW LETTER VAV+SEMANTIC LIGATURE+HEBREW LETTER YOD

There is a 3rd set of ligatures as well: many mathematical symbols are composed as a combination of other characters, but in at least some cases the composition is no longer purely algorithmic. The list is ...

ALMOST EQUAL OR EQUAL TO=ALMOST EQUAL TO+SEMANTIC LIGATURE+EQUALS SIGN
BOWTIE=VERTICAL STROKE+SEMANTIC LIGATURE+MULTIPLICATION SIGN+SEMANTIC LIGATURE+VERTICAL STROKE
CONTAINS AS NORMAL SUBGROUP OR EQUAL TO=CONTAINS AS NORMAL SUBGROUP+SEMANTIC LIGATURE+EQUALS SIGN
EQUAL TO OR GREATER-THAN=EQUALS SIGN+SEMANTIC LIGATURE+GREATER-THAN SIGN
EQUAL TO OR LESS-THAN=EQUALS SIGN+SEMANTIC LIGATURE+LESS-THAN SIGN
EQUAL TO OR PRECEDES=EQUALS SIGN+SEMANTIC LIGATURE+PRECEDES
EQUAL TO OR SUCCEEDS=EQUALS SIGN+SEMANTIC LIGATURE+SUCCEEDS
GREATER-THAN BUT NOT EQUIVALENT TO=GREATER-THAN SIGN+SEMANTIC LIGATURE+NOT EQUIVALENT TO
GREATER-THAN EQUAL TO OR LESS-THAN=GREATER-THAN SIGN+SEMANTIC LIGATURE+EQUALS SIGN+SEMANTIC LIGATURE+LESS-THAN SIGN
GREATER-THAN OR EQUAL TO=GREATER-THAN SIGN+SEMANTIC LIGATURE+EQUALS SIGN
GREATER-THAN OR EQUIVALENT TO=GREATER-THAN SIGN+SEMANTIC LIGATURE+EQUIVALENT TO
GREATER-THAN OR LESS-THAN=GREATER-THAN SIGN+SEMANTIC LIGATURE+LESS-THAN SIGN
LEFT NORMAL FACTOR SEMIDIRECT PRODUCT=VERTICAL STROKE+SEMANTIC LIGATURE+MULTIPLICATION SIGN
LESS-THAN BUT NOT EQUIVALENT TO=LESS-THAN SIGN+SEMANTIC LIGATURE+NOT EQUIVALENT TO
LESS-THAN EQUAL TO OR GREATER-THAN=LESS-THAN SIGN+SEMANTIC LIGATURE+EQUALS SIGN+SEMANTIC LIGATURE+GREATER-THAN SIGN
LESS-THAN OR EQUAL TO=LESS-THAN SIGN+SEMANTIC LIGATURE+EQUALS SIGN
LESS-THAN OR EQUIVALENT TO=LESS-THAN SIGN+SEMANTIC LIGATURE+EQUIVALENT TO
LESS-THAN OR GREATER-THAN=LESS-THAN SIGN+SEMANTIC LIGATURE+GREATER-THAN SIGN
NORMAL SUBGROUP OF=LESS-THAN SIGN+SEMANTIC LIGATURE+VERTICAL STROKE
NORMAL SUBGROUP OF OR EQUAL TO=NORMAL SUBGROUP OF+SEMANTIC LIGATURE+EQUALS SIGN
POSTAL MARK FACE=POSTAL MARK+SEMANTIC LIGATURE+WHITE SMILING FACE
PRECEDES BUT NOT EQUIVALENT TO=PRECEDES+SEMANTIC LIGATURE+NOT EQUIVALENT TO
PRECEDES OR EQUAL TO=PRECEDES+SEMANTIC LIGATURE+EQUALS SIGN
PRECEDES OR EQUIVALENT TO=PRECEDES+SEMANTIC LIGATURE+EQUIVALENT TO
RIGHT NORMAL FACTOR SEMIDIRECT PRODUCT=MULTIPLICATION SIGN+SEMANTIC LIGATURE+VERTICAL STROKE
SQUARE IMAGE OF OR EQUAL TO=SQUARE IMAGE OF+SEMANTIC LIGATURE+EQUALS SIGN
SQUARE IMAGE OF OR NOT EQUAL TO=SQUARE IMAGE OF+SEMANTIC LIGATURE+NOT EQUAL TO
SQUARE ORIGINAL OF OR EQUAL TO=SQUARE ORIGINAL OF+SEMANTIC LIGATURE+EQUALS SIGN
SQUARE ORIGINAL OF OR NOT EQUAL TO=SQUARE ORIGINAL OF+SEMANTIC LIGATURE+NOT EQUAL TO
SUBSET OF OR EQUAL TO=SUBSET OF+SEMANTIC LIGATURE+EQUALS SIGN
SUBSET OF WITH NOT EQUAL TO=SUBSET OF+SEMANTIC LIGATURE+NOT EQUAL TO
SUCCEEDS BUT NOT EQUIVALENT TO=SUCCEEDS+SEMANTIC LIGATURE+NOT EQUIVALENT TO
SUCCEEDS OR EQUAL TO=SUCCEEDS+SEMANTIC LIGATURE+EQUALS SIGN
SUCCEEDS OR EQUIVALENT TO=SUCCEEDS+SEMANTIC LIGATURE+EQUIVALENT TO
SUPERSET OF OR EQUAL TO=SUPERSET OF+SEMANTIC LIGATURE+EQUALS SIGN
SUPERSET OF WITH NOT EQUAL TO=SUPERSET OF+SEMANTIC LIGATURE+NOT EQUAL TO

Strangely, the decomposition with ligature is most useful for renderers that can´t do ligatures, e g, cell-based character terminals. They can just look up the decomposition and render the two glyphs---ignoring the PRESENTATION REQUEST LIGATURE completely---and get good, legible results.
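
As a rough illustration of that fallback, here is a minimal Python sketch. It assumes the decomposition lists are available as plain NAME=A+B+... lines, exactly as written above. The helper names parse and fallback are my own inventions for this example, and since SEMANTIC LIGATURE is only a proposed character, the sketch simply drops it.

```python
import unicodedata

# A few entries copied from the ligature lists above, in the same NAME=A+B form.
TABLE = """\
LATIN SMALL LIGATURE FFI=LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER I
LATIN SMALL LIGATURE OE=LATIN SMALL LETTER O+SEMANTIC LIGATURE+LATIN SMALL LETTER E
LESS-THAN OR EQUAL TO=LESS-THAN SIGN+SEMANTIC LIGATURE+EQUALS SIGN
"""

def parse(table):
    """Map each composite name to the list of names in its decomposition."""
    entries = {}
    for line in table.splitlines():
        name, _, rhs = line.partition("=")
        entries[name] = rhs.split("+")
    return entries

def fallback(char, entries):
    """Return the plain characters to draw when no ligature glyph exists."""
    parts = entries.get(unicodedata.name(char, ""))
    if parts is None:
        return char  # no decomposition known: draw the character itself
    # SEMANTIC LIGATURE is a proposed character, not an encoded one, so it is
    # dropped; every other part is looked up by its Unicode character name.
    return "".join(unicodedata.lookup(p) for p in parts
                   if p != "SEMANTIC LIGATURE")

ENTRIES = parse(TABLE)
print(fallback("\ufb03", ENTRIES))  # the FFI ligature, drawn as three cells
```

A terminal that owns no glyph for the ligature can call fallback on each character and draw the result one cell at a time.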

SEMANTIC OUTLINED

Surrounds the character with a narrow line. 5 characters are described as ``outlined´´, and a few others fit the description.

BULLSEYE=WHITE BULLET+SEMANTIC OUTLINED
FISHEYE=BULLET+SEMANTIC OUTLINED
OPEN-OUTLINED RIGHTWARDS ARROW=RIGHTWARDS ARROW+SEMANTIC OUTLINED
OUTLINED BLACK STAR=BLACK STAR+SEMANTIC OUTLINED
OUTLINED GREEK CROSS=PLUS SIGN+SEMANTIC OUTLINED
OUTLINED LATIN CROSS=LATIN CROSS+SEMANTIC OUTLINED
STRESS OUTLINED WHITE STAR=WHITE STAR+SEMANTIC OUTLINED
WHITE DIAMOND CONTAINING BLACK SMALL DIAMOND=BLACK DIAMOND+SEMANTIC SMALL+SEMANTIC OUTLINED
WHITE SQUARE CONTAINING BLACK SMALL SQUARE=BLACK SMALL SQUARE+SEMANTIC OUTLINED

and EIGHT PETALLED OUTLINED BLACK FLORETTE lacks a base form.

This is possible to do algorithmically, but it seems a very specialised thing to do for so little gain.

Substitution of the non-outlined glyph is unlikely to cause legibility problems though, so this would be a good decomposition to have even if no one uses it.

SEMANTIC OVERPRINT

Requests that characters be overstruck. Applies to the 2 characters on either side of it (the one before and the one after), like a ``binary operator´´.

Although seemingly simple, this introduces a whole set of problems. What is the difference between following a character with a COMBINING ENCLOSING CIRCLE and OVERPRINTING it with a LARGE CIRCLE? Can you accent a character by composing it with a spacing accent character?

To avoid such problems, the OVERPRINT character is only used in cases where the derivation of the character is clearly understood, and known to be overstruck. This is a historical judgement.

It applies mostly to the A P L block, and there are 64 symbols:

APL FUNCTIONAL SYMBOL ALPHA UNDERBAR=APL FUNCTIONAL SYMBOL ALPHA+COMBINING LOW LINE
APL FUNCTIONAL SYMBOL BACKSLASH BAR=REVERSE SOLIDUS+SEMANTIC OVERPRINT+MINUS SIGN
APL FUNCTIONAL SYMBOL CIRCLE BACKSLASH=LARGE CIRCLE+SEMANTIC OVERPRINT+REVERSE SOLIDUS
APL FUNCTIONAL SYMBOL CIRCLE DIAERESIS=LARGE CIRCLE+COMBINING DIAERESIS
APL FUNCTIONAL SYMBOL CIRCLE JOT=LARGE CIRCLE+SEMANTIC OVERPRINT+RING OPERATOR
APL FUNCTIONAL SYMBOL CIRCLE STAR=LARGE CIRCLE+SEMANTIC OVERPRINT+ASTERISK OPERATOR
APL FUNCTIONAL SYMBOL CIRCLE STILE=LARGE CIRCLE+SEMANTIC OVERPRINT+VERTICAL LINE
APL FUNCTIONAL SYMBOL CIRCLE UNDERBAR=LARGE CIRCLE+COMBINING LOW LINE
APL FUNCTIONAL SYMBOL COMMA BAR=COMMA+SEMANTIC OVERPRINT+MINUS SIGN
APL FUNCTIONAL SYMBOL DEL DIAERESIS=NABLA+COMBINING DIAERESIS
APL FUNCTIONAL SYMBOL DEL STILE=NABLA+SEMANTIC OVERPRINT+VERTICAL LINE
APL FUNCTIONAL SYMBOL DEL TILDE=NABLA+SEMANTIC OVERPRINT+TILDE OPERATOR
APL FUNCTIONAL SYMBOL DELTA STILE=INCREMENT+SEMANTIC OVERPRINT+VERTICAL LINE
APL FUNCTIONAL SYMBOL DELTA UNDERBAR=INCREMENT+COMBINING LOW LINE
APL FUNCTIONAL SYMBOL DIAMOND UNDERBAR=DIAMOND OPERATOR+COMBINING LOW LINE
APL FUNCTIONAL SYMBOL DOWN CARET TILDE=DOWN ARROWHEAD+SEMANTIC OVERPRINT+TILDE OPERATOR
APL FUNCTIONAL SYMBOL DOWN SHOE STILE=UNION+SEMANTIC OVERPRINT+VERTICAL LINE
APL FUNCTIONAL SYMBOL DOWN TACK JOT=DOWN TACK+SEMANTIC OVERPRINT+RING OPERATOR
APL FUNCTIONAL SYMBOL DOWN TACK UNDERBAR=DOWN TACK+COMBINING LOW LINE
APL FUNCTIONAL SYMBOL DOWNWARDS VANE=MINUS SIGN+SEMANTIC OVERPRINT+DOWNWARDS ARROW
APL FUNCTIONAL SYMBOL EPSILON UNDERBAR=GREEK SMALL LETTER EPSILON+COMBINING LOW LINE
APL FUNCTIONAL SYMBOL GREATER-THAN DIAERESIS=GREATER-THAN SIGN+COMBINING DIAERESIS
APL FUNCTIONAL SYMBOL IOTA UNDERBAR=APL FUNCTIONAL SYMBOL IOTA+COMBINING LOW LINE
APL FUNCTIONAL SYMBOL JOT DIAERESIS=RING OPERATOR+COMBINING DIAERESIS
APL FUNCTIONAL SYMBOL JOT UNDERBAR=RING OPERATOR+COMBINING LOW LINE
APL FUNCTIONAL SYMBOL LEFT SHOE STILE=SUBSET OF+SEMANTIC OVERPRINT+VERTICAL LINE
APL FUNCTIONAL SYMBOL LEFTWARDS VANE=VERTICAL LINE+SEMANTIC OVERPRINT+LEFTWARDS ARROW
APL FUNCTIONAL SYMBOL OMEGA UNDERBAR=APL FUNCTIONAL SYMBOL OMEGA+COMBINING LOW LINE
APL FUNCTIONAL SYMBOL QUAD BACKSLASH=BALLOT BOX+SEMANTIC OVERPRINT+REVERSE SOLIDUS
APL FUNCTIONAL SYMBOL QUAD CIRCLE=BALLOT BOX+SEMANTIC OVERPRINT+LARGE CIRCLE
APL FUNCTIONAL SYMBOL QUAD COLON=BALLOT BOX+SEMANTIC OVERPRINT+COLON
APL FUNCTIONAL SYMBOL QUAD DEL=BALLOT BOX+SEMANTIC OVERPRINT+NABLA
APL FUNCTIONAL SYMBOL QUAD DELTA=BALLOT BOX+SEMANTIC OVERPRINT+INCREMENT
APL FUNCTIONAL SYMBOL QUAD DIAMOND=BALLOT BOX+SEMANTIC OVERPRINT+DIAMOND OPERATOR
APL FUNCTIONAL SYMBOL QUAD DIVIDE=BALLOT BOX+SEMANTIC OVERPRINT+DIVISION SIGN
APL FUNCTIONAL SYMBOL QUAD DOWN CARET=BALLOT BOX+SEMANTIC OVERPRINT+DOWN ARROWHEAD
APL FUNCTIONAL SYMBOL QUAD DOWNWARDS ARROW=BALLOT BOX+SEMANTIC OVERPRINT+DOWNWARDS ARROW
APL FUNCTIONAL SYMBOL QUAD EQUAL=BALLOT BOX+SEMANTIC OVERPRINT+EQUALS SIGN
APL FUNCTIONAL SYMBOL QUAD GREATER-THAN=BALLOT BOX+SEMANTIC OVERPRINT+GREATER-THAN SIGN
APL FUNCTIONAL SYMBOL QUAD JOT=BALLOT BOX+SEMANTIC OVERPRINT+RING OPERATOR
APL FUNCTIONAL SYMBOL QUAD LEFTWARDS ARROW=BALLOT BOX+SEMANTIC OVERPRINT+LEFTWARDS ARROW
APL FUNCTIONAL SYMBOL QUAD LESS-THAN=BALLOT BOX+SEMANTIC OVERPRINT+LESS-THAN SIGN
APL FUNCTIONAL SYMBOL QUAD NOT EQUAL=BALLOT BOX+SEMANTIC OVERPRINT+NOT EQUAL TO
APL FUNCTIONAL SYMBOL QUAD QUESTION=BALLOT BOX+SEMANTIC OVERPRINT+QUESTION MARK
APL FUNCTIONAL SYMBOL QUAD RIGHTWARDS ARROW=BALLOT BOX+SEMANTIC OVERPRINT+RIGHTWARDS ARROW
APL FUNCTIONAL SYMBOL QUAD SLASH=BALLOT BOX+SEMANTIC OVERPRINT+SOLIDUS
APL FUNCTIONAL SYMBOL QUAD UP CARET=BALLOT BOX+SEMANTIC OVERPRINT+UP ARROWHEAD
APL FUNCTIONAL SYMBOL QUAD UPWARDS ARROW=BALLOT BOX+SEMANTIC OVERPRINT+UPWARDS ARROW
APL FUNCTIONAL SYMBOL QUOTE QUAD=APOSTROPHE+SEMANTIC OVERPRINT+BALLOT BOX
APL FUNCTIONAL SYMBOL QUOTE UNDERBAR=APOSTROPHE+COMBINING LOW LINE
APL FUNCTIONAL SYMBOL RIGHTWARDS VANE=VERTICAL LINE+SEMANTIC OVERPRINT+RIGHTWARDS ARROW
APL FUNCTIONAL SYMBOL SEMICOLON UNDERBAR=SEMICOLON+COMBINING LOW LINE
APL FUNCTIONAL SYMBOL SLASH BAR=SOLIDUS+SEMANTIC OVERPRINT+MINUS SIGN
APL FUNCTIONAL SYMBOL STAR DIAERESIS=ASTERISK OPERATOR+COMBINING DIAERESIS
APL FUNCTIONAL SYMBOL STILE TILDE=VERTICAL LINE+SEMANTIC OVERPRINT+TILDE OPERATOR
APL FUNCTIONAL SYMBOL TILDE DIAERESIS=TILDE OPERATOR+COMBINING DIAERESIS
APL FUNCTIONAL SYMBOL UP CARET TILDE=UP ARROWHEAD+SEMANTIC OVERPRINT+TILDE OPERATOR
APL FUNCTIONAL SYMBOL UP SHOE JOT=INTERSECTION+SEMANTIC OVERPRINT+RING OPERATOR
APL FUNCTIONAL SYMBOL UP TACK DIAERESIS=UP TACK+COMBINING DIAERESIS
APL FUNCTIONAL SYMBOL UP TACK JOT=UP TACK+SEMANTIC OVERPRINT+RING OPERATOR
APL FUNCTIONAL SYMBOL UP TACK OVERBAR=UP TACK+COMBINING OVERLINE
APL FUNCTIONAL SYMBOL UPWARDS VANE=MINUS SIGN+SEMANTIC OVERPRINT+UPWARDS ARROW
APL FUNCTIONAL SYMBOL ZILDE=LARGE CIRCLE+SEMANTIC OVERPRINT+TILDE OPERATOR

Composition could also be used to shrink the number of box-drawing characters down to a very reasonable 10 or so, from which the rest can be built. It seems likely that a sophisticated renderer would not regard these as characters at all, but would convert them into drawing primitives taking into account the current leading.

Since there are no characters BOX DRAWINGS DOUBLE {DOWN, LEFT, RIGHT, UP}, we use BOX DRAWINGS LIGHT {DOWN, LEFT, RIGHT, UP}+SEMANTIC DOUBLE-STRUCK in their place.

BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS DOUBLE HORIZONTAL
BOX DRAWINGS DOUBLE DOWN AND LEFT=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS DOUBLE DOWN AND RIGHT=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS DOUBLE HORIZONTAL=BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS DOUBLE UP AND HORIZONTAL=BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS DOUBLE HORIZONTAL
BOX DRAWINGS DOUBLE UP AND LEFT=BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS DOUBLE UP AND RIGHT=BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL=BOX DRAWINGS DOUBLE VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS DOUBLE HORIZONTAL
BOX DRAWINGS DOUBLE VERTICAL AND LEFT=BOX DRAWINGS DOUBLE VERTICAL+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS DOUBLE VERTICAL AND RIGHT=BOX DRAWINGS DOUBLE VERTICAL+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS DOUBLE VERTICAL=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL
BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT
BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT
BOX DRAWINGS DOWN HEAVY AND HORIZONTAL LIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL
BOX DRAWINGS DOWN HEAVY AND LEFT LIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT
BOX DRAWINGS DOWN HEAVY AND LEFT UP LIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP
BOX DRAWINGS DOWN HEAVY AND RIGHT LIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT
BOX DRAWINGS DOWN HEAVY AND RIGHT UP LIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP
BOX DRAWINGS DOWN HEAVY AND UP HORIZONTAL LIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL
BOX DRAWINGS DOWN LIGHT AND HORIZONTAL HEAVY=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL
BOX DRAWINGS DOWN LIGHT AND LEFT HEAVY=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT
BOX DRAWINGS DOWN LIGHT AND LEFT UP HEAVY=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP
BOX DRAWINGS DOWN LIGHT AND RIGHT HEAVY=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT
BOX DRAWINGS DOWN LIGHT AND RIGHT UP HEAVY=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP
BOX DRAWINGS DOWN LIGHT AND UP HORIZONTAL HEAVY=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL
BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS DOUBLE HORIZONTAL
BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL=BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL+SEMANTIC HEAVY
BOX DRAWINGS HEAVY DOUBLE DASH VERTICAL=BOX DRAWINGS LIGHT DOUBLE DASH VERTICAL+SEMANTIC HEAVY
BOX DRAWINGS HEAVY DOWN AND HORIZONTAL=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL
BOX DRAWINGS HEAVY DOWN AND LEFT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT
BOX DRAWINGS HEAVY DOWN AND RIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT
BOX DRAWINGS HEAVY DOWN=BOX DRAWINGS LIGHT DOWN+SEMANTIC HEAVY
BOX DRAWINGS HEAVY HORIZONTAL=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT
BOX DRAWINGS HEAVY LEFT AND LIGHT RIGHT=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT
BOX DRAWINGS HEAVY LEFT=BOX DRAWINGS LIGHT LEFT+SEMANTIC HEAVY
BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL=BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL+SEMANTIC HEAVY
BOX DRAWINGS HEAVY QUADRUPLE DASH VERTICAL=BOX DRAWINGS LIGHT QUADRUPLE DASH VERTICAL+SEMANTIC HEAVY
BOX DRAWINGS HEAVY RIGHT=BOX DRAWINGS LIGHT RIGHT+SEMANTIC HEAVY
BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL=BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL+SEMANTIC HEAVY
BOX DRAWINGS HEAVY TRIPLE DASH VERTICAL=BOX DRAWINGS LIGHT TRIPLE DASH VERTICAL+SEMANTIC HEAVY
BOX DRAWINGS HEAVY UP AND HORIZONTAL=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL
BOX DRAWINGS HEAVY UP AND LEFT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT
BOX DRAWINGS HEAVY UP AND LIGHT DOWN=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN
BOX DRAWINGS HEAVY UP AND RIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT
BOX DRAWINGS HEAVY UP=BOX DRAWINGS LIGHT UP+SEMANTIC HEAVY
BOX DRAWINGS HEAVY VERTICAL AND HORIZONTAL=BOX DRAWINGS HEAVY VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL
BOX DRAWINGS HEAVY VERTICAL AND LEFT=BOX DRAWINGS HEAVY VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT
BOX DRAWINGS HEAVY VERTICAL AND RIGHT=BOX DRAWINGS HEAVY VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT
BOX DRAWINGS HEAVY VERTICAL=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP
BOX DRAWINGS LEFT DOWN HEAVY AND RIGHT UP LIGHT=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP
BOX DRAWINGS LEFT HEAVY AND RIGHT DOWN LIGHT=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN
BOX DRAWINGS LEFT HEAVY AND RIGHT UP LIGHT=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP
BOX DRAWINGS LEFT HEAVY AND RIGHT VERTICAL LIGHT=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT VERTICAL
BOX DRAWINGS LEFT LIGHT AND RIGHT DOWN HEAVY=BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN
BOX DRAWINGS LEFT LIGHT AND RIGHT UP HEAVY=BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP
BOX DRAWINGS LEFT LIGHT AND RIGHT VERTICAL HEAVY=BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY VERTICAL
BOX DRAWINGS LEFT UP HEAVY AND RIGHT DOWN LIGHT=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN
BOX DRAWINGS LIGHT ARC DOWN AND LEFT=BOX DRAWINGS LIGHT ARC UP AND RIGHT+SEMANTIC TURNED
BOX DRAWINGS LIGHT ARC DOWN AND RIGHT=BOX DRAWINGS LIGHT ARC UP AND RIGHT+SEMANTIC INVERTED
BOX DRAWINGS LIGHT ARC UP AND LEFT=BOX DRAWINGS LIGHT ARC UP AND RIGHT+SEMANTIC REVERSED
BOX DRAWINGS LIGHT DIAGONAL CROSS=BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT
BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT=BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT+SEMANTIC REVERSED
BOX DRAWINGS LIGHT DOWN AND HORIZONTAL=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL
BOX DRAWINGS LIGHT DOWN AND LEFT=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT
BOX DRAWINGS LIGHT DOWN AND RIGHT=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT
BOX DRAWINGS LIGHT DOWN=BOX DRAWINGS LIGHT UP+SEMANTIC TURNED
BOX DRAWINGS LIGHT HORIZONTAL=BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT
BOX DRAWINGS LIGHT LEFT AND HEAVY RIGHT=BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT
BOX DRAWINGS LIGHT LEFT=BOX DRAWINGS LIGHT RIGHT+SEMANTIC TURNED
BOX DRAWINGS LIGHT UP AND HEAVY DOWN=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN
BOX DRAWINGS LIGHT UP AND HORIZONTAL=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL
BOX DRAWINGS LIGHT UP AND LEFT=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT
BOX DRAWINGS LIGHT UP AND RIGHT=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT
BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL
BOX DRAWINGS LIGHT VERTICAL AND LEFT=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT
BOX DRAWINGS LIGHT VERTICAL AND RIGHT=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT
BOX DRAWINGS LIGHT VERTICAL=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP
BOX DRAWINGS RIGHT DOWN HEAVY AND LEFT UP LIGHT=BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP
BOX DRAWINGS RIGHT HEAVY AND LEFT DOWN LIGHT=BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN
BOX DRAWINGS RIGHT HEAVY AND LEFT UP LIGHT=BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP
BOX DRAWINGS RIGHT HEAVY AND LEFT VERTICAL LIGHT=BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT VERTICAL
BOX DRAWINGS RIGHT LIGHT AND LEFT DOWN HEAVY=BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN
BOX DRAWINGS RIGHT LIGHT AND LEFT UP HEAVY=BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP
BOX DRAWINGS RIGHT LIGHT AND LEFT VERTICAL HEAVY=BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY VERTICAL
BOX DRAWINGS RIGHT UP HEAVY AND LEFT DOWN LIGHT=BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN
BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE=BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL
BOX DRAWINGS UP DOUBLE AND LEFT SINGLE=BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT
BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE=BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT
BOX DRAWINGS UP HEAVY AND DOWN HORIZONTAL LIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL
BOX DRAWINGS UP HEAVY AND HORIZONTAL LIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL
BOX DRAWINGS UP HEAVY AND LEFT DOWN LIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN
BOX DRAWINGS UP HEAVY AND LEFT LIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT
BOX DRAWINGS UP HEAVY AND RIGHT DOWN LIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN
BOX DRAWINGS UP HEAVY AND RIGHT LIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT
BOX DRAWINGS UP LIGHT AND DOWN HORIZONTAL HEAVY=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL
BOX DRAWINGS UP LIGHT AND HORIZONTAL HEAVY=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL
BOX DRAWINGS UP LIGHT AND LEFT DOWN HEAVY=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN
BOX DRAWINGS UP LIGHT AND LEFT HEAVY=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT
BOX DRAWINGS UP LIGHT AND RIGHT DOWN HEAVY=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN
BOX DRAWINGS UP LIGHT AND RIGHT HEAVY=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT
BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS DOUBLE HORIZONTAL
BOX DRAWINGS UP SINGLE AND LEFT DOUBLE=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE=BOX DRAWINGS DOUBLE VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL
BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE=BOX DRAWINGS DOUBLE VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT
BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE=BOX DRAWINGS DOUBLE VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT
BOX DRAWINGS VERTICAL HEAVY AND HORIZONTAL LIGHT=BOX DRAWINGS HEAVY VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL
BOX DRAWINGS VERTICAL HEAVY AND LEFT LIGHT=BOX DRAWINGS HEAVY VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT
BOX DRAWINGS VERTICAL HEAVY AND RIGHT LIGHT=BOX DRAWINGS HEAVY VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT
BOX DRAWINGS VERTICAL LIGHT AND HORIZONTAL HEAVY=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL
BOX DRAWINGS VERTICAL LIGHT AND LEFT HEAVY=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT
BOX DRAWINGS VERTICAL LIGHT AND RIGHT HEAVY=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT
BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS DOUBLE HORIZONTAL
BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
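
To see how a list like this collapses onto a handful of primitives, the rules can be expanded recursively until nothing decomposes any further. The Python sketch below does this for a small subset of the entries above. The names RULES, expand and atoms are hypothetical. The expansion is deliberately flat, so it ignores the grouping question (START GROUP, ..., POP DIRECTIONAL FORMATTING) that a real implementation would have to respect.

```python
# A handful of entries copied from the box-drawing list above.
RULES = {
    "BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL":
        "BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL",
    "BOX DRAWINGS LIGHT VERTICAL":
        "BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP",
    "BOX DRAWINGS LIGHT HORIZONTAL":
        "BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT",
    "BOX DRAWINGS LIGHT DOWN": "BOX DRAWINGS LIGHT UP+SEMANTIC TURNED",
    "BOX DRAWINGS LIGHT LEFT": "BOX DRAWINGS LIGHT RIGHT+SEMANTIC TURNED",
    "BOX DRAWINGS HEAVY UP": "BOX DRAWINGS LIGHT UP+SEMANTIC HEAVY",
    "BOX DRAWINGS UP HEAVY AND LEFT LIGHT":
        "BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT",
}

def expand(name, rules):
    """Expand a name into a flat list of names that have no rule of their own."""
    if name not in rules:
        return [name]
    out = []
    for part in rules[name].split("+"):
        out.extend(expand(part, rules))
    return out

def atoms(names, rules):
    """Collect the undecomposable base characters, leaving out the SEMANTICs."""
    found = set()
    for name in names:
        found.update(p for p in expand(name, rules)
                     if not p.startswith("SEMANTIC "))
    return found

print(sorted(atoms(RULES, RULES)))
```

For this subset, every composite bottoms out in just BOX DRAWINGS LIGHT UP and BOX DRAWINGS LIGHT RIGHT; the full list needs a few more (the arcs, diagonals and dashed lines).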

It would also be hard to deny that

BALLOT BOX WITH CHECK=BALLOT BOX+SEMANTIC OVERPRINT+CHECK MARK
BALLOT BOX WITH X=BALLOT BOX+SEMANTIC OVERPRINT+BALLOT X
CHI RHO=GREEK CAPITAL LETTER CHI+SEMANTIC OVERPRINT+GREEK CAPITAL LETTER RHO
DIVISION TIMES=DIVISION SIGN+SEMANTIC OVERPRINT+MULTIPLICATION SIGN
EQUAL AND PARALLEL TO=EQUALS SIGN+SEMANTIC OVERPRINT+PARALLEL TO
GEOMETRIC PROPORTION=PROPORTION+SEMANTIC OVERPRINT+MINUS SIGN
GREATER-THAN WITH DOT=GREATER-THAN SIGN+SEMANTIC OVERPRINT+DOT OPERATOR
HOMOTHETIC=TILDE OPERATOR+SEMANTIC OVERPRINT+COLON
INTERROBANG=QUESTION MARK+SEMANTIC OVERPRINT+EXCLAMATION MARK
LESS-THAN WITH DOT=LESS-THAN SIGN+SEMANTIC OVERPRINT+DOT OPERATOR
MULTISET MULTIPLICATION=UNION+SEMANTIC OVERPRINT+DOT OPERATOR
MULTISET UNION=UNION+SEMANTIC OVERPRINT+STAR OPERATOR
MULTISET=UNION+SEMANTIC OVERPRINT+ELEMENT OF
PITCHFORK=INTERSECTION+SEMANTIC OVERPRINT+VERTICAL LINE
RING IN EQUAL TO=EQUALS SIGN+SEMANTIC OVERPRINT+RING OPERATOR
SQUARE WITH DIAGONAL CROSSHATCH FILL=SQUARE WITH UPPER LEFT TO LOWER RIGHT FILL+SEMANTIC OVERPRINT+SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL
SQUARE WITH ORTHOGONAL CROSSHATCH FILL=SQUARE WITH HORIZONTAL FILL+SEMANTIC OVERPRINT+SQUARE WITH VERTICAL FILL
WHITE UP-POINTING TRIANGLE WITH DOT=WHITE UP-POINTING TRIANGLE+SEMANTIC OVERPRINT+DOT OPERATOR

We should enclose each decomposition in ``brackets´´ (START GROUP, ..., POP DIRECTIONAL FORMATTING) similarly to the treatment of ligatures. When 2 characters are overprinted, order is important: the second may be modified to take account of the first. If the 2nd is LARGE CIRCLE, LOZENGE or BALLOT BOX, the assumption would be that it should enclose the first; if it is LOW LINE or OVERBAR, it should extend to the full width occupied by the first. This is analogous to the way in which the height of an accent above a base character depends on the height of that base character. But all the strokes of both characters will be fully visible. (There is no white ink!)
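
The ordering rule can be written down as a small decision function. This Python sketch is only one reading of the paragraph above: the function name overprint_plan and the three result labels are invented for the example, and the character names are compared as plain strings.

```python
# Second operands that are assumed to wrap around the first one.
ENCLOSING = {"LARGE CIRCLE", "LOZENGE", "BALLOT BOX"}
# Second operands that are assumed to stretch to the width of the first one.
EXTENDING = {"LOW LINE", "OVERBAR"}

def overprint_plan(first, second):
    """Say how SECOND is adapted before it is struck over FIRST."""
    if second in ENCLOSING:
        return "enclose"  # draw SECOND large enough to surround FIRST
    if second in EXTENDING:
        return "extend"   # stretch SECOND to the full width of FIRST
    return "plain"        # draw SECOND unchanged on top of FIRST

# APL FUNCTIONAL SYMBOL QUOTE QUAD=APOSTROPHE+SEMANTIC OVERPRINT+BALLOT BOX
print(overprint_plan("APOSTROPHE", "BALLOT BOX"))
```

Because only the second operand is ever adapted, swapping the operands can change the result, which is why the order in each decomposition matters.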

If widely deployed, the OVERPRINT operation could cause no end of havoc by encouraging the creation of new symbols in a very uncontrolled way. (On the other hand, maybe that´s a good thing.)

SEMANTIC REVERSED

Rotates the character (out of the paper) through a half-turn about a vertical axis; equivalently, reflects the character about the vertical axis. For characters where ``reversed´´ and ``turned´´ are equivalent, we describe the character as ``turned´´, out of deference to metal typography.

ANTICLOCKWISE OPEN CIRCLE ARROW=CLOCKWISE OPEN CIRCLE ARROW+SEMANTIC REVERSED
ANTICLOCKWISE TOP SEMICIRCLE ARROW=CLOCKWISE TOP SEMICIRCLE ARROW+SEMANTIC REVERSED
BLACK LEFT POINTING INDEX=BLACK RIGHT POINTING INDEX+SEMANTIC REVERSED
BLACK UPPER LEFT TRIANGLE=BLACK UPPER RIGHT TRIANGLE+SEMANTIC REVERSED
GRAVE ACCENT=ACUTE ACCENT+SEMANTIC REVERSED
HANGUL CHOSEONG CEONGCHIEUMCHIEUCH=HANGUL CHOSEONG CHITUEUMCHIEUCH+SEMANTIC REVERSED
HANGUL CHOSEONG CEONGCHIEUMCIEUC=HANGUL CHOSEONG CHITUEUMCIEUC+SEMANTIC REVERSED
HANGUL CHOSEONG CEONGCHIEUMSIOS=HANGUL CHOSEONG CHITUEUMSIOS+SEMANTIC REVERSED
LATIN CAPITAL LETTER D WITH TOPBAR=LATIN CAPITAL LETTER B WITH TOPBAR+SEMANTIC REVERSED
LATIN CAPITAL LETTER EZH REVERSED=LATIN CAPITAL LETTER EZH+SEMANTIC REVERSED
LATIN LETTER PHARYNGEAL VOICED FRICATIVE=LATIN LETTER GLOTTAL STOP+SEMANTIC REVERSED
LATIN LETTER REVERSED GLOTTAL STOP WITH STROKE=LATIN LETTER GLOTTAL STOP WITH STROKE+SEMANTIC REVERSED
LATIN SMALL LETTER CLOSED REVERSED OPEN E=LATIN SMALL LETTER CLOSED OPEN E+SEMANTIC REVERSED
LATIN SMALL LETTER D WITH TOPBAR=LATIN SMALL LETTER B WITH TOPBAR+SEMANTIC REVERSED
LATIN SMALL LETTER EZH REVERSED=LATIN SMALL LETTER EZH+SEMANTIC REVERSED
LATIN SMALL LETTER REVERSED E=LATIN SMALL LETTER E+SEMANTIC REVERSED
LATIN SMALL LETTER REVERSED OPEN E=LATIN SMALL LETTER OPEN E+SEMANTIC REVERSED
LATIN SMALL LETTER REVERSED R WITH FISHHOOK=LATIN SMALL LETTER R WITH FISHHOOK+SEMANTIC REVERSED
LEFTWARDS HARPOON WITH BARB UPWARDS=RIGHTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC REVERSED
NORTH WEST ARROW=NORTH EAST ARROW+SEMANTIC REVERSED
REVERSE SOLIDUS=SOLIDUS+SEMANTIC REVERSED
REVERSED DOUBLE PRIME QUOTATION MARK=DOUBLE PRIME QUOTATION MARK+SEMANTIC REVERSED
REVERSED DOUBLE PRIME=DOUBLE PRIME+SEMANTIC REVERSED
REVERSED NOT SIGN=NOT SIGN+SEMANTIC REVERSED
REVERSED PRIME=PRIME+SEMANTIC REVERSED
REVERSED TILDE=TILDE OPERATOR+SEMANTIC REVERSED
REVERSED TRIPLE PRIME=TRIPLE PRIME+SEMANTIC REVERSED
SINGLE HIGH-REVERSED-9 QUOTATION MARK=RIGHT SINGLE QUOTATION MARK+SEMANTIC REVERSED
SQUARE WITH UPPER LEFT DIAGONAL HALF BLACK=SQUARE WITH LOWER RIGHT DIAGONAL HALF BLACK+SEMANTIC REVERSED
SQUARE WITH UPPER LEFT TO LOWER RIGHT FILL=SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL+SEMANTIC REVERSED
TIBETAN LETTER DDA=TIBETAN LETTER DA+SEMANTIC REVERSED
TIBETAN LETTER NNA=TIBETAN LETTER NA+SEMANTIC REVERSED
TIBETAN LETTER SSA=TIBETAN LETTER SHA+SEMANTIC REVERSED
TIBETAN LETTER TTA=TIBETAN LETTER TA+SEMANTIC REVERSED
TIBETAN LETTER TTHA=TIBETAN LETTER THA+SEMANTIC REVERSED
TIBETAN MARK ANG KHANG GYAS=TIBETAN MARK ANG KHANG GYON+SEMANTIC REVERSED
TIBETAN MARK GUG RTAGS GYAS=TIBETAN MARK GUG RTAGS GYON+SEMANTIC REVERSED
TIBETAN VOWEL SIGN REVERSED I=TIBETAN VOWEL SIGN I+SEMANTIC REVERSED
TOP LEFT CORNER=TOP RIGHT CORNER+SEMANTIC REVERSED
TOP LEFT CROP=TOP RIGHT CROP+SEMANTIC REVERSED
UP-POINTING TRIANGLE WITH LEFT HALF BLACK=UP-POINTING TRIANGLE WITH RIGHT HALF BLACK+SEMANTIC REVERSED
UPPER LEFT QUADRANT CIRCULAR ARC=UPPER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC REVERSED
UPWARDS ARROW WITH TIP LEFTWARDS=UPWARDS ARROW WITH TIP RIGHTWARDS+SEMANTIC REVERSED
UPWARDS HARPOON WITH BARB RIGHTWARDS=UPWARDS HARPOON WITH BARB LEFTWARDS+SEMANTIC REVERSED

For arrows, the one pointing to the right (mathematically positive) is the one we define as ``forwards´´, and its mirror image is the ``reversed´´ glyph.

Combining characters cannot take direct advantage of the semantics, so there is no way to decompose a character like COMBINING REVERSED COMMA ABOVE by means of SEMANTIC REVERSED. (... Unless you wish to reverse the character, put an ordinary comma above on it, and turn it back: (*)COMBINING REVERSED COMMA ABOVE=SEMANTIC REVERSED+COMBINING COMMA ABOVE+SEMANTIC REVERSED? What sick mind would try to do such a thing??) But we can use a different composition to get the same effect.

It seems possible that there should be some relationship between SEMANTIC REVERSED and the ``symmetric swapping´´ that happens when text is flowing from right to left. It may be appropriate to decompose all ``right´´ characters as reversed ``left´´ characters, to make this explicit, as in RIGHT PARENTHESIS=LEFT PARENTHESIS+SEMANTIC REVERSED; etc. I´m not convinced this is so desirable, but if it were, we would get the following.

RIGHT ANGLE BRACKET=LEFT ANGLE BRACKET+SEMANTIC REVERSED
RIGHT BLACK LENTICULAR BRACKET=LEFT BLACK LENTICULAR BRACKET+SEMANTIC REVERSED
RIGHT CURLY BRACKET=LEFT CURLY BRACKET+SEMANTIC REVERSED
RIGHT FLOOR=LEFT FLOOR+SEMANTIC REVERSED
RIGHT PARENTHESIS=LEFT PARENTHESIS+SEMANTIC REVERSED
RIGHT SEMIDIRECT PRODUCT=LEFT SEMIDIRECT PRODUCT+SEMANTIC REVERSED
RIGHT SQUARE BRACKET=LEFT SQUARE BRACKET+SEMANTIC REVERSED
RIGHT TORTOISE SHELL BRACKET=LEFT TORTOISE SHELL BRACKET+SEMANTIC REVERSED
RIGHT-POINTING ANGLE BRACKET=LEFT-POINTING ANGLE BRACKET+SEMANTIC REVERSED
SINGLE RIGHT-POINTING ANGLE QUOTATION MARK=SINGLE LEFT-POINTING ANGLE QUOTATION MARK+SEMANTIC REVERSED

In order to implement symmetric swapping, a renderer is in any case going to need some extra reversed glyphs which are not in the U C S at all. This is no trouble for a renderer that implements the Atomic Theory, because it can do the REVERSED operation; but others may be surprised to find that they need such characters as ANGLE+SEMANTIC REVERSED, INTEGRAL+SEMANTIC REVERSED, PROPORTIONAL TO+SEMANTIC REVERSED and many more.
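As a rough sketch of that choice, a renderer working in a right-to-left run might proceed as follows. The two tables are tiny illustrative subsets that I have made up for the example (they are not the real mirroring data), and the function name is hypothetical.

```python
# Names of characters that must appear mirrored in right-to-left text.
# A tiny illustrative subset, not the real mirroring data.
MIRRORED = {
    "LEFT PARENTHESIS", "RIGHT PARENTHESIS",
    "ANGLE", "INTEGRAL", "PROPORTIONAL TO",
}

# Mirrored characters whose reversed form is itself an encoded character.
SWAP = {
    "LEFT PARENTHESIS": "RIGHT PARENTHESIS",
    "RIGHT PARENTHESIS": "LEFT PARENTHESIS",
}

def for_right_to_left(name):
    """Return the character names to render for `name` in a right-to-left run."""
    if name not in MIRRORED:
        return [name]                      # unaffected by direction
    if name in SWAP:
        return [SWAP[name]]                # an encoded partner exists
    return [name, "SEMANTIC REVERSED"]     # otherwise reverse the glyph itself
```

Here INTEGRAL has no encoded partner, so the sketch falls back to INTEGRAL+SEMANTIC REVERSED, which is exactly the kind of sequence mentioned above.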

Reversing is very easy to do in software, and the consequences of ignoring it are likely to be severe if arrows are important. (This only affects people who try to make up new symbols, as existing characters are already encoded and should be well understood.)
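To give an idea of how little is involved, here is a minimal sketch that assumes a glyph is held as a small bitmap (a list of equal-length strings, with `#´ for ink). That representation and the function name are my own assumptions for illustration; an outline font would negate x co-ordinates instead.

```python
def reversed_glyph(bitmap):
    """Mirror a glyph left-to-right; the bounding box does not change."""
    return [row[::-1] for row in bitmap]

# A crude 4x4 SOLIDUS.
SOLIDUS = [
    "...#",
    "..#.",
    ".#..",
    "#...",
]

# SOLIDUS+SEMANTIC REVERSED, i.e. a crude REVERSE SOLIDUS.
REVERSE_SOLIDUS = reversed_glyph(SOLIDUS)
```

Reversing twice gives back the original glyph, so a pair of SEMANTIC REVERSED characters cancels out.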

SEMANTIC ROTATED

This is a rotation of a quarter-turn clockwise (mathematically -90°), staying in the plane of the paper. Typographically, it is unusual to use rotated characters, because traditional type is designed to fit in a constant height, but with varying widths. (A rotated character would just fall out of the stick too easily.) There are only 2 characters in the U C S that are described as rotated by name, and they are rotated the other way.

However, lots of arrows could be described as rotated versions of other arrows, e g

BLACK RIGHT-POINTING TRIANGLE=BLACK UP-POINTING TRIANGLE+SEMANTIC ROTATED
DOWNWARDS ARROW WITH CORNER LEFTWARDS=RIGHTWARDS ARROW WITH CORNER DOWNWARDS+SEMANTIC ROTATED
LEFT RIGHT ARROW=UP DOWN ARROW+SEMANTIC ROTATED
RIGHT TACK=UP TACK+SEMANTIC ROTATED
RIGHTWARDS ARROW FROM BAR=UPWARDS ARROW FROM BAR+SEMANTIC ROTATED
RIGHTWARDS ARROW=UPWARDS ARROW+SEMANTIC ROTATED
RIGHTWARDS DASHED ARROW=UPWARDS DASHED ARROW+SEMANTIC ROTATED
RIGHTWARDS HARPOON WITH BARB UPWARDS=UPWARDS HARPOON WITH BARB LEFTWARDS+SEMANTIC ROTATED
RIGHTWARDS PAIRED ARROWS=UPWARDS PAIRED ARROWS+SEMANTIC ROTATED
RIGHTWARDS TWO HEADED ARROW=UPWARDS TWO HEADED ARROW+SEMANTIC ROTATED

But the main reason for the existence of this character is for decompositions involving the <vertical> tag (21 of them). The decomposition given for PRESENTATION FORM FOR VERTICAL WAVY LOW LINE loses information, so we replace it with

PRESENTATION FORM FOR VERTICAL WAVY LOW LINE=<vertical>+WAVY LOW LINE

It also serves for a few other characters:

BLACK VERTICAL RECTANGLE=BLACK RECTANGLE+SEMANTIC ROTATED
CIRCLE WITH RIGHT HALF BLACK=CIRCLE WITH UPPER HALF BLACK+SEMANTIC ROTATED
LEFT FIVE EIGHTHS BLOCK=LOWER FIVE EIGHTHS BLOCK+SEMANTIC ROTATED
LEFT HALF BLOCK=LOWER HALF BLOCK+SEMANTIC ROTATED
LEFT ONE EIGHTH BLOCK=LOWER ONE EIGHTH BLOCK+SEMANTIC ROTATED
LEFT ONE QUARTER BLOCK=LOWER ONE QUARTER BLOCK+SEMANTIC ROTATED
LEFT SEVEN EIGHTHS BLOCK=LOWER SEVEN EIGHTHS BLOCK+SEMANTIC ROTATED
LEFT THREE EIGHTHS BLOCK=LOWER THREE EIGHTHS BLOCK+SEMANTIC ROTATED
LEFT THREE QUARTERS BLOCK=LOWER THREE QUARTERS BLOCK+SEMANTIC ROTATED
ROTATED FLORAL HEART BULLET=FLORAL HEART+SEMANTIC ROTATED+SEMANTIC ROTATED+SEMANTIC ROTATED
ROTATED HEAVY BLACK HEART BULLET=HEAVY BLACK HEART+SEMANTIC ROTATED+SEMANTIC ROTATED+SEMANTIC ROTATED
SQUARE WITH VERTICAL FILL=SQUARE WITH HORIZONTAL FILL+SEMANTIC ROTATED
UP RIGHT DIAGONAL ELLIPSIS=DOWN RIGHT DIAGONAL ELLIPSIS+SEMANTIC ROTATED
UPPER HALF BLOCK=LEFT HALF BLOCK+SEMANTIC ROTATED
UPPER ONE EIGHTH BLOCK=LEFT ONE EIGHTH BLOCK+SEMANTIC ROTATED
VERTICAL ELLIPSIS=HORIZONTAL ELLIPSIS+SEMANTIC ROTATED
WAVY LINE=WAVY DASH+SEMANTIC ROTATED
WREATH PRODUCT=TILDE OPERATOR+SEMANTIC ROTATED

Rotation can be done algorithmically, but is harder than INVERTED, REVERSED or TURNED because the resulting character has a different bounding box. This means it is not just a question of moving the ink around, but has wider implications for line-length etc. (This is related to the typographic point.)
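The point about the bounding box can be seen in a small sketch. It assumes, purely for illustration, that a glyph is a bitmap held as a list of equal-length strings with `#´ for ink; the function names are hypothetical.

```python
def rotated_glyph(bitmap):
    """Rotate a glyph a quarter-turn clockwise."""
    # Reading each column from bottom to top gives the rows of the result.
    return ["".join(column) for column in zip(*bitmap[::-1])]

def size(bitmap):
    """Return (width, height) of the bounding box."""
    return (len(bitmap[0]), len(bitmap))

# A crude HORIZONTAL ELLIPSIS: 5 wide, 1 high.
HORIZONTAL_ELLIPSIS = ["#.#.#"]
VERTICAL_ELLIPSIS = rotated_glyph(HORIZONTAL_ELLIPSIS)

# A crude LOWER HALF BLOCK; one quarter-turn clockwise gives LEFT HALF BLOCK.
LOWER_HALF_BLOCK = ["....", "....", "####", "####"]
LEFT_HALF_BLOCK = rotated_glyph(LOWER_HALF_BLOCK)

def rotated_other_way(bitmap):
    """Three clockwise quarter-turns make one anticlockwise quarter-turn."""
    return rotated_glyph(rotated_glyph(rotated_glyph(bitmap)))
```

The ellipsis goes from a 5-by-1 box to a 1-by-5 box, which is why the line around it has to be laid out again. The last function is the triple SEMANTIC ROTATED used for the two ROTATED ... BULLET characters.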

If widely deployed, could be a very useful source of new symbols in many different disciplines.

SEMANTIC SANS-SERIF

This requests a sans-serif font to be used. Since there is no requirement in Unicode for a font to have serifs in the first place, this could easily be a null operation. However, the concept is in the U C S in the names of the dingbats DINGBAT CIRCLED SANS-SERIF DIGIT ONE--NUMBER TEN and DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE--NUMBER TEN. Even with that, it seems unlikely that anyone would support it as a character. If Dingbats are to be allowed decompositions (and there are good reasons to do so), maybe the sans-serif numbers could be decomposed using SEMANTIC VARIANT, together with SEMANTIC WHITE and COMBINING ENCLOSING CIRCLE.

DINGBAT CIRCLED SANS-SERIF DIGIT EIGHT=DIGIT EIGHT+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT FIVE=DIGIT FIVE+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT FOUR=DIGIT FOUR+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT NINE=DIGIT NINE+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT ONE=DIGIT ONE+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT SEVEN=DIGIT SEVEN+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT SIX=DIGIT SIX+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT THREE=DIGIT THREE+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT TWO=DIGIT TWO+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF NUMBER TEN=START GROUP+DIGIT ONE+DIGIT ZERO+POP DIRECTIONAL FORMATTING+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT EIGHT=DIGIT EIGHT+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT FIVE=DIGIT FIVE+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT FOUR=DIGIT FOUR+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT NINE=DIGIT NINE+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE=DIGIT ONE+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT SEVEN=DIGIT SEVEN+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT SIX=DIGIT SIX+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT THREE=DIGIT THREE+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT TWO=DIGIT TWO+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN=START GROUP+DIGIT ONE+DIGIT ZERO+POP DIRECTIONAL FORMATTING+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE

SEMANTIC SCRIPT

Requests a script font be used. Many script characters are already present:

LATIN CAPITAL LETTER V WITH HOOK=LATIN CAPITAL LETTER V+SEMANTIC SCRIPT
LATIN SMALL LETTER ALPHA=LATIN SMALL LETTER A+SEMANTIC SCRIPT
LATIN SMALL LETTER SCRIPT G=LATIN SMALL LETTER G+SEMANTIC SCRIPT
LATIN SMALL LETTER V WITH HOOK=LATIN SMALL LETTER V+SEMANTIC SCRIPT
SCRIPT CAPITAL B=LATIN CAPITAL LETTER B+SEMANTIC SCRIPT
SCRIPT CAPITAL E=LATIN CAPITAL LETTER E+SEMANTIC SCRIPT
SCRIPT CAPITAL F=LATIN CAPITAL LETTER F+SEMANTIC SCRIPT
SCRIPT CAPITAL H=LATIN CAPITAL LETTER H+SEMANTIC SCRIPT
SCRIPT CAPITAL I=LATIN CAPITAL LETTER I+SEMANTIC SCRIPT
SCRIPT CAPITAL L=LATIN CAPITAL LETTER L+SEMANTIC SCRIPT
SCRIPT CAPITAL M=LATIN CAPITAL LETTER M+SEMANTIC SCRIPT
SCRIPT CAPITAL P=LATIN CAPITAL LETTER P+SEMANTIC SCRIPT
SCRIPT CAPITAL R=LATIN CAPITAL LETTER R+SEMANTIC SCRIPT
SCRIPT SMALL E=LATIN SMALL LETTER E+SEMANTIC SCRIPT
SCRIPT SMALL L=LATIN SMALL LETTER L+SEMANTIC SCRIPT
SCRIPT SMALL O=LATIN SMALL LETTER O+SEMANTIC SCRIPT

(The v´s ``with hook´´ are really script letters, but there´s more on hooks later.)

SEMANTIC SHADOWED

Requests that a plinth-like shadow be drawn, with the glyph as the top surface. Conventionally, the light source is above and slightly to the left of the observer. (This can be changed by using TURNED, INVERTED or REVERSED.)

Could be used for

HEAVY LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW=HEAVY RIGHTWARDS ARROW+SEMANTIC WHITE+SEMANTIC SHADOWED
SHADOWED WHITE CIRCLE=WHITE CIRCLE+SEMANTIC SHADOWED
SHADOWED WHITE LATIN CROSS=LATIN CROSS+SEMANTIC WHITE+SEMANTIC SHADOWED
SHADOWED WHITE STAR=WHITE STAR+SEMANTIC SHADOWED
LOWER RIGHT SHADOWED WHITE SQUARE=WHITE SQUARE+SEMANTIC SHADOWED

and there are some shadowed characters that have no base form: BACKTILTED SHADOWED WHITE RIGHTWARDS ARROW, FRONT-TILTED SHADOWED WHITE RIGHTWARDS ARROW, NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW, NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW.

Hard to do well in software, but a missing shadow is not going to make the difference between comprehension and confusion, so the decompositions would be useful.

SEMANTIC SMALL

This SMALL just asks for a smaller version of the same character. It should not be confused with the SMALL in LATIN SMALL LETTER A, which means lower-case. It is also used, in conjunction with SEMANTIC FULLWIDTH, for decompositions including the tag <small>.

It is used in

BLACK DOWN-POINTING SMALL TRIANGLE=BLACK DOWN-POINTING TRIANGLE+SEMANTIC SMALL
BLACK LEFT-POINTING SMALL TRIANGLE=BLACK LEFT-POINTING TRIANGLE+SEMANTIC SMALL
BLACK RIGHT-POINTING SMALL TRIANGLE=BLACK RIGHT-POINTING TRIANGLE+SEMANTIC SMALL
BLACK SMALL SQUARE=BLACK SQUARE+SEMANTIC SMALL
BLACK UP-POINTING SMALL TRIANGLE=BLACK UP-POINTING TRIANGLE+SEMANTIC SMALL
HIRAGANA LETTER SMALL A=HIRAGANA LETTER A+SEMANTIC SMALL
HIRAGANA LETTER SMALL E=HIRAGANA LETTER E+SEMANTIC SMALL
HIRAGANA LETTER SMALL I=HIRAGANA LETTER I+SEMANTIC SMALL
HIRAGANA LETTER SMALL O=HIRAGANA LETTER O+SEMANTIC SMALL
HIRAGANA LETTER SMALL TU=HIRAGANA LETTER TU+SEMANTIC SMALL
HIRAGANA LETTER SMALL U=HIRAGANA LETTER U+SEMANTIC SMALL
HIRAGANA LETTER SMALL WA=HIRAGANA LETTER WA+SEMANTIC SMALL
HIRAGANA LETTER SMALL YA=HIRAGANA LETTER YA+SEMANTIC SMALL
HIRAGANA LETTER SMALL YO=HIRAGANA LETTER YO+SEMANTIC SMALL
HIRAGANA LETTER SMALL YU=HIRAGANA LETTER YU+SEMANTIC SMALL
KATAKANA LETTER SMALL A=KATAKANA LETTER A+SEMANTIC SMALL
KATAKANA LETTER SMALL E=KATAKANA LETTER E+SEMANTIC SMALL
KATAKANA LETTER SMALL I=KATAKANA LETTER I+SEMANTIC SMALL
KATAKANA LETTER SMALL KA=KATAKANA LETTER KA+SEMANTIC SMALL
KATAKANA LETTER SMALL KE=KATAKANA LETTER KE+SEMANTIC SMALL
KATAKANA LETTER SMALL O=KATAKANA LETTER O+SEMANTIC SMALL
KATAKANA LETTER SMALL TU=KATAKANA LETTER TU+SEMANTIC SMALL
KATAKANA LETTER SMALL U=KATAKANA LETTER U+SEMANTIC SMALL
KATAKANA LETTER SMALL WA=KATAKANA LETTER WA+SEMANTIC SMALL
KATAKANA LETTER SMALL YA=KATAKANA LETTER YA+SEMANTIC SMALL
KATAKANA LETTER SMALL YO=KATAKANA LETTER YO+SEMANTIC SMALL
KATAKANA LETTER SMALL YU=KATAKANA LETTER YU+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL B=LATIN CAPITAL LETTER B+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL G WITH HOOK=LATIN CAPITAL LETTER G WITH HOOK+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL G=LATIN CAPITAL LETTER G+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL H=LATIN CAPITAL LETTER H+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL I=LATIN CAPITAL LETTER I+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL L=LATIN CAPITAL LETTER L+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL N=LATIN CAPITAL LETTER N+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL OE=LATIN CAPITAL LIGATURE OE+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL R=LATIN CAPITAL LETTER R+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL Y=LATIN CAPITAL LETTER Y+SEMANTIC SMALL
LATIN SMALL LETTER KRA=LATIN CAPITAL LETTER K+SEMANTIC SMALL
MODIFIER LETTER DOWN TACK=DOWN TACK+SEMANTIC SMALL
MODIFIER LETTER MINUS SIGN=MINUS SIGN+SEMANTIC SMALL
MODIFIER LETTER PLUS SIGN=PLUS SIGN+SEMANTIC SMALL
MODIFIER LETTER UP TACK=UP TACK+SEMANTIC SMALL

SMALL ELEMENT OF and SMALL CONTAINS AS MEMBER are really presentation variants of GREEK SMALL LETTER EPSILON, treated below.

Although making a character smaller by algorithm seems easy, it is not: stroke widths should typically remain harmonious with the rest of the font, so a simple reduction of scale may not be appropriate.

Some of these characters are used in ways that may not be clear from their representation: for example, LATIN LETTER SMALL CAPITAL R (= LATIN CAPITAL LETTER R + SEMANTIC SMALL) is the lower case form of LATIN LETTER YR, and LATIN SMALL LETTER KRA (= LATIN CAPITAL LETTER K + SEMANTIC SMALL) is also a lower case letter, though its upper case form is not coded.

SEMANTIC SMALL LETTER TONE

This suggests, of a digit, that a variant glyph be used of a style suitable for marking Zhuang tone. It is for the following:

LATIN SMALL LETTER TONE FIVE=DIGIT FIVE+SEMANTIC SMALL LETTER TONE
LATIN SMALL LETTER TONE SIX=DIGIT SIX+SEMANTIC SMALL LETTER TONE
LATIN SMALL LETTER TONE TWO=DIGIT TWO+SEMANTIC SMALL LETTER TONE

There is a relationship between CYRILLIC SMALL LETTER CHE and DIGIT FOUR+SEMANTIC SMALL LETTER TONE, and also between CYRILLIC SMALL LETTER ZE and DIGIT THREE+SEMANTIC SMALL LETTER TONE, in that they are likely to be the same glyph; but it would be odd to give decompositions like (*)CYRILLIC SMALL LETTER CHE=DIGIT FOUR+SEMANTIC SMALL LETTER TONE, as this would imply the wrong historical relationship. Instead, uses of CYRILLIC SMALL LETTER CHE as a tone mark should simply be replaced by DIGIT FOUR+SEMANTIC SMALL LETTER TONE.

By encoding this character, it becomes possible for sophisticated software to render suitable glyphs for all the tone letters, without needing separate encodings for tones 3, 4.

SEMANTIC ABOVE, SEMANTIC AFTER, SEMANTIC BEFORE, SEMANTIC BELOW

Requests that characters be stacked in the given direction. The first character in the stack is placed at its normal position. The second is moved left, right, down or up to appear before, after, below or above the first. (`After´ and `before´ here refer to the current writing direction.)

This is another idea, like OVERPRINT, that could cause a lot of problems, as it is not obvious where a sensible place to stop might be.

Is an underlined character a character formed from COMBINING LOW LINE, or is it a down-stack with MINUS SIGN?

Can you make accented characters by stacking spacing accents above letters?

Is a LESS-THAN OR EQUAL TO sign a stack of a LESS-THAN SIGN and a MINUS SIGN? Although it looks like it in many fonts (including the one used in The Unicode Standard), we would really prefer it to be something to do with LESS-THAN SIGN and EQUALS SIGN, because that reflects the real meaning. (You might only get to see `<=´, which would still be very helpful.) As it´s also very often given its own glyph, e g with the underline parallel to the bottom part of the LESS-THAN SIGN, we prefer to regard this and similar characters as a ligature.

Despite these problems, the idea of a stack seems necessary. Consider the character EQUAL TO BY DEFINITION. This character is an equals sign with the small word `def´ on top of it. It seems ridiculous that this should be an atomic character, when the reason for its existence is the fact that d, e, f are the first 3 letters of the English word `definition´. Whichever mathematician invented that symbol was clearly ``sticking things together´´, and not just coming up with an arbitrary symbol from nowhere. Another mathematician might do a similar thing tomorrow, and it seems wrong that (in a perfectly logical world) that mathematician would have to get ``approval´´ from the Unicode Consortium (in the form of a character registration) before being able to publish a book.

We need 4 different stacking characters to ensure visual harmony between the different presentation forms that can be generated. (Recall that the first character is rendered at its normal place, and the stack is built around it.) All are binary.
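The placement rule can be sketched with simple boxes. Everything here is an assumption made for illustration: the Box type, the function name, centring the second glyph for ABOVE and BELOW, and sharing the bottom edge for AFTER and BEFORE. Only the rule that the first character stays put comes from the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    x: float  # left edge
    y: float  # bottom edge (y grows upwards)
    w: float  # width
    h: float  # height

def stack(first, w, h, op, rtl=False):
    """Place a w-by-h second glyph relative to `first`, which does not move."""
    centred_x = first.x + (first.w - w) / 2
    if op == "SEMANTIC ABOVE":
        return Box(centred_x, first.y + first.h, w, h)
    if op == "SEMANTIC BELOW":
        return Box(centred_x, first.y - h, w, h)
    if op in ("SEMANTIC AFTER", "SEMANTIC BEFORE"):
        # `After´ means rightwards in left-to-right text, leftwards otherwise.
        to_the_right = (op == "SEMANTIC AFTER") != rtl
        x = first.x + first.w if to_the_right else first.x - w
        return Box(x, first.y, w, h)
    raise ValueError(op)

# An equals sign 10 wide and 4 high, with a 6-by-3 glyph stacked on it.
EQUALS = Box(0, 0, 10, 4)
ON_TOP = stack(EQUALS, 6, 3, "SEMANTIC ABOVE")
```

Because each operation takes the existing stack and one new glyph, a longer stack is just the same step repeated.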

The SEMANTIC ABOVE concept has an antecedent in TEX, where it is called \buildrel.

The following are compositions using SEMANTIC ABOVE. In some cases (e g, MEASURED BY), there is a SEMANTIC SMALL for the second character.

ALL EQUAL TO=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+REVERSED TILDE+SEMANTIC VARIANT+POP DIRECTIONAL FORMATTING
ALMOST EQUAL TO=TILDE OPERATOR+SEMANTIC ABOVE+TILDE OPERATOR
APPROACHES THE LIMIT=EQUALS SIGN+SEMANTIC ABOVE+DOT OPERATOR
APPROXIMATELY BUT NOT ACTUALLY EQUAL TO=NOT EQUAL TO+SEMANTIC ABOVE+TILDE OPERATOR
APPROXIMATELY EQUAL TO=EQUALS SIGN+SEMANTIC ABOVE+TILDE OPERATOR
ASYMPTOTICALLY EQUAL TO=MINUS SIGN+SEMANTIC ABOVE+TILDE OPERATOR
CORRESPONDS TO=EQUALS SIGN+SEMANTIC ABOVE+FROWN
DELTA EQUAL TO=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+INCREMENT+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
DOT MINUS=MINUS SIGN+SEMANTIC ABOVE+DOT OPERATOR
DOT PLUS=PLUS SIGN+SEMANTIC ABOVE+DOT OPERATOR
DOUBLE INTERSECTION=INTERSECTION+SEMANTIC ABOVE+INTERSECTION
DOUBLE UNION=UNION+SEMANTIC ABOVE+UNION
EQUIANGULAR TO=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+LOGICAL OR+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
EQUIVALENT TO=FROWN+SEMANTIC ABOVE+SMILE
ESTIMATES=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+LOGICAL AND+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
MEASURED BY=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+LATIN SMALL LETTER M+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
MINUS-OR-PLUS SIGN=PLUS SIGN+SEMANTIC ABOVE+MINUS SIGN
NAND=LOGICAL AND+SEMANTIC ABOVE+LOW LINE
NOR=LOGICAL OR+SEMANTIC ABOVE+LOW LINE
NORTH WEST ARROW TO LONG BAR=NORTH WEST ARROW+COMBINING OVERLINE
PERSPECTIVE=UP ARROWHEAD+SEMANTIC ABOVE+LOW LINE+SEMANTIC ABOVE+LOW LINE
PROJECTIVE=UP ARROWHEAD+SEMANTIC ABOVE+LOW LINE
QUESTIONED EQUAL TO=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+QUESTION MARK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
REVERSED TILDE EQUALS=MINUS SIGN+SEMANTIC ABOVE+REVERSED TILDE
RING EQUAL TO=EQUALS SIGN+SEMANTIC ABOVE+RING OPERATOR
STAR EQUALS=EQUALS SIGN+SEMANTIC ABOVE+STAR OPERATOR
TRIPLE TILDE=TILDE OPERATOR+SEMANTIC ABOVE+TILDE OPERATOR+SEMANTIC ABOVE+TILDE OPERATOR

The following use SEMANTIC BELOW:

DOUBLE LOW LINE=LOW LINE+COMBINING LOW LINE
DOUBLE WAVY OVERLINE=WAVY OVERLINE+SEMANTIC ABOVE+WAVY OVERLINE
GREATER-THAN BUT NOT EQUAL TO=GREATER-THAN SIGN+SEMANTIC BELOW+NOT EQUAL TO
GREATER-THAN OVER EQUAL TO=GREATER-THAN SIGN+SEMANTIC BELOW+EQUALS SIGN
LEFTWARDS ARROW OVER RIGHTWARDS ARROW=LEFTWARDS ARROW+SEMANTIC BELOW+RIGHTWARDS ARROW
LEFTWARDS ARROW TO BAR OVER RIGHTWARDS ARROW TO BAR=LEFTWARDS ARROW TO BAR+SEMANTIC BELOW+RIGHTWARDS ARROW TO BAR
LEFTWARDS HARPOON OVER RIGHTWARDS HARPOON=LEFTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC BELOW+RIGHTWARDS HARPOON WITH BARB DOWNWARDS
LESS-THAN BUT NOT EQUAL TO=LESS-THAN SIGN+SEMANTIC BELOW+NOT EQUAL TO
LESS-THAN OVER EQUAL TO=LESS-THAN SIGN+SEMANTIC BELOW+EQUALS SIGN
MINUS TILDE=MINUS SIGN+SEMANTIC BELOW+TILDE OPERATOR
PLUS-MINUS SIGN=PLUS SIGN+SEMANTIC BELOW+MINUS SIGN
RIGHTWARDS ARROW OVER LEFTWARDS ARROW=RIGHTWARDS ARROW+SEMANTIC BELOW+LEFTWARDS ARROW
RIGHTWARDS HARPOON OVER LEFTWARDS HARPOON=RIGHTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC BELOW+LEFTWARDS HARPOON WITH BARB DOWNWARDS
UP DOWN ARROW WITH BASE=UP DOWN ARROW+COMBINING LOW LINE
UPWARDS ARROW FROM BAR=UPWARDS ARROW+COMBINING LOW LINE
XOR=LOGICAL OR+COMBINING LOW LINE

Some characters need both:

DIVISION SIGN=MINUS SIGN+SEMANTIC ABOVE+DOT OPERATOR+SEMANTIC BELOW+DOT OPERATOR
GEOMETRICALLY EQUAL TO=EQUALS SIGN+SEMANTIC ABOVE+DOT OPERATOR+SEMANTIC BELOW+DOT OPERATOR

(This assumes that SEMANTIC characters, if viewed as operators, are of equal precedence and associate to the left.) And of course there is the character that caused all these problems:

EQUAL TO BY DEFINITION=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+START GROUP+LATIN SMALL LETTER D+LATIN SMALL LETTER E+LATIN SMALL LETTER F+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING

(it seems horrible, but I see no real alternative). An unsophisticated rendering engine will be able to make a shot at this as `=def´, which seems like as good a result as one might hope for.
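The two readings of such a sequence, the structured one and the unsophisticated one, can be sketched as follows. The tuple layout, the `GROUP´ label and the function names are my own assumptions; the sketch handles only complete decompositions (not the bare SEMANTIC ABOVE+X form), and it uses Python´s unicodedata.lookup to turn real character names into characters.

```python
import unicodedata

BINARY = {"SEMANTIC ABOVE", "SEMANTIC BELOW", "SEMANTIC AFTER", "SEMANTIC BEFORE"}
OPEN, CLOSE = "START GROUP", "POP DIRECTIONAL FORMATTING"

def parse(decomposition):
    """Build a nested tuple from a complete `A+B+...´ decomposition string."""
    tokens = decomposition.split("+")
    pos = 0

    def group():
        nonlocal pos
        items = []       # operands seen so far in this group
        pending = None   # binary operator waiting for its right operand
        while pos < len(tokens) and tokens[pos] != CLOSE:
            tok = tokens[pos]
            pos += 1
            if tok in BINARY:
                pending = tok
            elif tok.startswith("SEMANTIC "):
                items[-1] = (tok, items[-1])             # postfix modifier
            else:
                if tok == OPEN:
                    node = group()
                    pos += 1                             # step over CLOSE
                else:
                    node = tok
                if pending:
                    items[-1] = (pending, items[-1], node)   # left-associative
                    pending = None
                else:
                    items.append(node)
        return items[0] if len(items) == 1 else ("GROUP", *items)

    return group()

def fallback(decomposition):
    """What an unsophisticated renderer shows: drop every control, keep the rest."""
    kept = [tok for tok in decomposition.split("+")
            if not tok.startswith("SEMANTIC ") and tok not in (OPEN, CLOSE)]
    return "".join(unicodedata.lookup(name) for name in kept)

DIVISION_SIGN = "MINUS SIGN+SEMANTIC ABOVE+DOT OPERATOR+SEMANTIC BELOW+DOT OPERATOR"
EQUAL_TO_BY_DEFINITION = (
    "EQUALS SIGN+SEMANTIC ABOVE+START GROUP+START GROUP+LATIN SMALL LETTER D"
    "+LATIN SMALL LETTER E+LATIN SMALL LETTER F+POP DIRECTIONAL FORMATTING"
    "+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING"
)
```

For DIVISION SIGN the parse puts the ABOVE stack inside the BELOW stack, as left association requires, and the fallback for EQUAL TO BY DEFINITION is the four characters `=def´.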

Many ``double´´ or ``triple´´ characters are made with AFTER:

ASTERISM=ASTERISK OPERATOR+SEMANTIC SMALL+SEMANTIC AFTER+START GROUP+ASTERISK OPERATOR+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING+SEMANTIC ABOVE+START GROUP+ASTERISK OPERATOR+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
DOUBLE EXCLAMATION MARK=EXCLAMATION MARK+SEMANTIC AFTER+EXCLAMATION MARK
DOUBLE HIGH-REVERSED-9 QUOTATION MARK=SINGLE HIGH-REVERSED-9 QUOTATION MARK+SEMANTIC AFTER+SINGLE HIGH-REVERSED-9 QUOTATION MARK
DOUBLE LOW-9 QUOTATION MARK=SINGLE LOW-9 QUOTATION MARK+SEMANTIC AFTER+SINGLE LOW-9 QUOTATION MARK
DOUBLE PRIME=PRIME+SEMANTIC AFTER+PRIME
DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE=VERTICAL LINE+SEMANTIC AFTER+TRUE
DOUBLE VERTICAL LINE=VERTICAL LINE+SEMANTIC AFTER+VERTICAL LINE
DOWNWARDS PAIRED ARROWS=DOWNWARDS ARROW+SEMANTIC AFTER+DOWNWARDS ARROW
EQUALS COLON=EQUALS SIGN+SEMANTIC AFTER+COLON
EXCESS=MINUS SIGN+SEMANTIC AFTER+COLON
FORCES=VERTICAL LINE+SEMANTIC AFTER+ASSERTION
IDEOGRAPHIC TELEGRAPH LINE FEED SEPARATOR SYMBOL=SALTIRE+SEMANTIC AFTER+SALTIRE
LATIN LETTER LATERAL CLICK=LATIN LETTER DENTAL CLICK+SEMANTIC AFTER+LATIN LETTER DENTAL CLICK
LEFT DOUBLE ANGLE BRACKET=LEFT ANGLE BRACKET+SEMANTIC AFTER+LEFT ANGLE BRACKET
LEFT DOUBLE QUOTATION MARK=LEFT SINGLE QUOTATION MARK+SEMANTIC AFTER+LEFT SINGLE QUOTATION MARK
LEFT-POINTING DOUBLE ANGLE QUOTATION MARK=SINGLE LEFT-POINTING ANGLE QUOTATION MARK+SEMANTIC AFTER+SINGLE LEFT-POINTING ANGLE QUOTATION MARK
LOW DOUBLE PRIME QUOTATION MARK=MODIFIER LETTER PRIME+SEMANTIC SUBSCRIPT+SEMANTIC AFTER+MODIFIER LETTER PRIME+SEMANTIC SUBSCRIPT
PARALLEL TO=DIVIDES+SEMANTIC AFTER+DIVIDES
PROPORTION=RATIO+SEMANTIC AFTER+RATIO
QUOTATION MARK=APOSTROPHE+SEMANTIC AFTER+APOSTROPHE
RIGHT DOUBLE ANGLE BRACKET=RIGHT ANGLE BRACKET+SEMANTIC AFTER+RIGHT ANGLE BRACKET
RIGHT DOUBLE QUOTATION MARK=RIGHT SINGLE QUOTATION MARK+SEMANTIC AFTER+RIGHT SINGLE QUOTATION MARK
RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK=SINGLE RIGHT-POINTING ANGLE QUOTATION MARK+SEMANTIC AFTER+SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
TIBETAN MARK NYIS SHAD=TIBETAN MARK SHAD+SEMANTIC AFTER+TIBETAN MARK SHAD
TRIPLE PRIME=PRIME+SEMANTIC AFTER+PRIME+SEMANTIC AFTER+PRIME
TRIPLE VERTICAL BAR RIGHT TURNSTILE=VERTICAL LINE+SEMANTIC AFTER+VERTICAL LINE+SEMANTIC AFTER+ASSERTION
UPWARDS ARROW LEFTWARDS OF DOWNWARDS ARROW=UPWARDS ARROW+SEMANTIC AFTER+DOWNWARDS ARROW
UPWARDS PAIRED ARROWS=UPWARDS ARROW+SEMANTIC AFTER+UPWARDS ARROW

Some characters would class as ligatures, except that there is no modification involved---one is just written straight after the other.

LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON=LATIN CAPITAL LETTER D+SEMANTIC AFTER+START GROUP+LATIN SMALL LETTER Z+COMBINING CARON+POP DIRECTIONAL FORMATTING
LATIN CAPITAL LETTER D WITH SMALL LETTER Z=LATIN CAPITAL LETTER D+SEMANTIC AFTER+LATIN SMALL LETTER Z
LATIN CAPITAL LETTER DZ WITH CARON=LATIN CAPITAL LETTER D+SEMANTIC AFTER+START GROUP+LATIN CAPITAL LETTER Z+COMBINING CARON+POP DIRECTIONAL FORMATTING
LATIN CAPITAL LETTER DZ=LATIN CAPITAL LETTER D+SEMANTIC AFTER+LATIN CAPITAL LETTER Z
LATIN CAPITAL LETTER L WITH SMALL LETTER J=LATIN CAPITAL LETTER L+SEMANTIC AFTER+LATIN SMALL LETTER J
LATIN CAPITAL LETTER LJ=LATIN CAPITAL LETTER L+SEMANTIC AFTER+LATIN CAPITAL LETTER J
LATIN CAPITAL LETTER N WITH SMALL LETTER J=LATIN CAPITAL LETTER N+SEMANTIC AFTER+LATIN SMALL LETTER J
LATIN CAPITAL LETTER NJ=LATIN CAPITAL LETTER N+SEMANTIC AFTER+LATIN CAPITAL LETTER J
LATIN SMALL LETTER DZ WITH CARON=LATIN SMALL LETTER D+SEMANTIC AFTER+START GROUP+LATIN SMALL LETTER Z+COMBINING CARON+POP DIRECTIONAL FORMATTING
LATIN SMALL LETTER DZ=LATIN SMALL LETTER D+SEMANTIC AFTER+LATIN SMALL LETTER Z
LATIN SMALL LETTER LJ=LATIN SMALL LETTER L+SEMANTIC AFTER+LATIN SMALL LETTER J
LATIN SMALL LETTER NJ=LATIN SMALL LETTER N+SEMANTIC AFTER+LATIN SMALL LETTER J

There are even a few characters which seem to use BEFORE in a natural way.

CUBE ROOT=SQUARE ROOT+SEMANTIC BEFORE+SUPERSCRIPT THREE
FOURTH ROOT=SQUARE ROOT+SEMANTIC BEFORE+SUPERSCRIPT FOUR

If widely deployed, the stack operations could cause no end of havoc by encouraging the creation of new ``symbols´´ in a very uncontrolled way. (On the other hand, maybe that´s a good thing.)

On the third hand, once we have accepted that this is a necessary operation, we could follow through and complete the job. If we consider SEMANTIC ABOVE to be a legitimate composition tool, we can see that a character like LATIN CAPITAL LETTER A WITH GRAVE is really just LATIN CAPITAL LETTER A+SEMANTIC ABOVE+GRAVE ACCENT. Since we already know that LATIN CAPITAL LETTER A WITH GRAVE=LATIN CAPITAL LETTER A+COMBINING GRAVE ACCENT, we are led to conclude that COMBINING GRAVE ACCENT=SEMANTIC ABOVE+GRAVE ACCENT. After a little consideration, this starts to seem a more natural view than the truth (which is that GRAVE ACCENT=SPACE+COMBINING GRAVE ACCENT), as it could allow us to decompose all combining characters into sequences involving ABOVE, BELOW and OVERPRINT. This approach allows a font designer to design just 1 ACUTE ACCENT glyph, which can then be used automatically in all the places where it is right to do so.

COMBINING ACUTE ACCENT BELOW=SEMANTIC BELOW+ACUTE ACCENT
COMBINING ACUTE ACCENT=SEMANTIC ABOVE+ACUTE ACCENT
COMBINING ACUTE TONE MARK=SEMANTIC ABOVE+ACUTE ACCENT
COMBINING ANTICLOCKWISE ARROW ABOVE=SEMANTIC ABOVE+ANTICLOCKWISE TOP SEMICIRCLE ARROW
COMBINING ANTICLOCKWISE RING OVERLAY=SEMANTIC OVERPRINT+ANTICLOCKWISE OPEN CIRCLE ARROW
COMBINING BREVE BELOW=SEMANTIC BELOW+BREVE
COMBINING BREVE=SEMANTIC ABOVE+BREVE
COMBINING BRIDGE BELOW=SEMANTIC BELOW+START GROUP+OPEN BOX+SEMANTIC TURNED+POP DIRECTIONAL FORMATTING
COMBINING CANDRABINDU=COMBINING BREVE+COMBINING DOT ABOVE
COMBINING CARON BELOW=SEMANTIC BELOW+CARON
COMBINING CARON=SEMANTIC ABOVE+CARON
COMBINING CEDILLA=SEMANTIC BELOW+CEDILLA
COMBINING CIRCUMFLEX ACCENT BELOW=SEMANTIC BELOW+MODIFIER LETTER CIRCUMFLEX ACCENT
COMBINING CIRCUMFLEX ACCENT=SEMANTIC ABOVE+MODIFIER LETTER CIRCUMFLEX ACCENT
COMBINING CLOCKWISE ARROW ABOVE=SEMANTIC ABOVE+CLOCKWISE TOP SEMICIRCLE ARROW
COMBINING CLOCKWISE RING OVERLAY=SEMANTIC OVERPRINT+CLOCKWISE OPEN CIRCLE ARROW
COMBINING COMMA ABOVE RIGHT=SEMANTIC AFTER+RIGHT SINGLE QUOTATION MARK
COMBINING COMMA ABOVE=SEMANTIC ABOVE+COMMA
COMBINING COMMA BELOW=SEMANTIC BELOW+COMMA
COMBINING CYRILLIC DASIA PNEUMATA=SEMANTIC ABOVE+START GROUP+RIGHT TACK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING CYRILLIC PSILI PNEUMATA=SEMANTIC ABOVE+START GROUP+LEFT TACK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING DIAERESIS BELOW=SEMANTIC BELOW+DIAERESIS
COMBINING DIAERESIS=SEMANTIC ABOVE+DIAERESIS
COMBINING DOT ABOVE=SEMANTIC ABOVE+DOT ABOVE
COMBINING DOT BELOW=SEMANTIC BELOW+DOT ABOVE
COMBINING DOUBLE ACUTE ACCENT=SEMANTIC ABOVE+DOUBLE ACUTE ACCENT
COMBINING DOUBLE GRAVE ACCENT=SEMANTIC ABOVE+START GROUP+GRAVE ACCENT+SEMANTIC AFTER+GRAVE ACCENT+POP DIRECTIONAL FORMATTING
COMBINING DOUBLE LOW LINE=SEMANTIC BELOW+DOUBLE LOW LINE
COMBINING DOUBLE OVERLINE=SEMANTIC ABOVE+LOW LINE+SEMANTIC ABOVE+LOW LINE
COMBINING DOUBLE VERTICAL LINE ABOVE=SEMANTIC ABOVE+START GROUP+MODIFIER LETTER VERTICAL LINE+MODIFIER LETTER VERTICAL LINE+POP DIRECTIONAL FORMATTING
COMBINING DOWN TACK BELOW=SEMANTIC BELOW+START GROUP+DOWN TACK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING ENCLOSING CIRCLE BACKSLASH=SEMANTIC OVERPRINT+LARGE CIRCLE+SEMANTIC OVERPRINT+REVERSE SOLIDUS
COMBINING ENCLOSING CIRCLE=SEMANTIC OVERPRINT+LARGE CIRCLE
COMBINING ENCLOSING DIAMOND=SEMANTIC OVERPRINT+LOZENGE
COMBINING ENCLOSING SQUARE=SEMANTIC OVERPRINT+BALLOT BOX
COMBINING FOUR DOTS ABOVE=SEMANTIC ABOVE+START GROUP+DOT ABOVE+DOT ABOVE+DOT ABOVE+DOT ABOVE+POP DIRECTIONAL FORMATTING
COMBINING GRAVE ACCENT BELOW=SEMANTIC BELOW+GRAVE ACCENT
COMBINING GRAVE ACCENT=SEMANTIC ABOVE+GRAVE ACCENT
COMBINING GRAVE TONE MARK=SEMANTIC ABOVE+GRAVE ACCENT
COMBINING GREEK DIALYTIKA TONOS=COMBINING DIAERESIS+COMBINING VERTICAL LINE ABOVE
COMBINING GREEK KORONIS=SEMANTIC ABOVE+GREEK KORONIS
COMBINING GREEK PERISPOMENI=SEMANTIC ABOVE+GREEK PERISPOMENI
COMBINING GREEK YPOGEGRAMMENI=SEMANTIC BELOW+GREEK YPOGEGRAMMENI
COMBINING HOOK ABOVE=SEMANTIC ABOVE+MODIFIER LETTER GLOTTAL STOP
COMBINING INVERTED BREVE BELOW=SEMANTIC BELOW+START GROUP+BREVE+SEMANTIC TURNED+POP DIRECTIONAL FORMATTING
COMBINING INVERTED BREVE=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC TURNED+POP DIRECTIONAL FORMATTING
COMBINING INVERTED BRIDGE BELOW=SEMANTIC BELOW+OPEN BOX
COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK=SEMANTIC AFTER+KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK=SEMANTIC AFTER+KATAKANA-HIRAGANA VOICED SOUND MARK
COMBINING LEFT ANGLE ABOVE=SEMANTIC ABOVE+START GROUP+NOT SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING LEFT ARROW ABOVE=SEMANTIC ABOVE+START GROUP+LEFTWARDS ARROW+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING LEFT HALF RING BELOW=SEMANTIC BELOW+MODIFIER LETTER CENTRED LEFT HALF RING
COMBINING LEFT HARPOON ABOVE=SEMANTIC ABOVE+START GROUP+LEFTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING LEFT RIGHT ARROW ABOVE=SEMANTIC ABOVE+START GROUP+LEFT RIGHT ARROW+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING LEFT TACK BELOW=SEMANTIC BELOW+START GROUP+LEFT TACK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING LONG SOLIDUS OVERLAY=SEMANTIC OVERPRINT+SOLIDUS
COMBINING LONG STROKE OVERLAY=SEMANTIC OVERPRINT+EN DASH
COMBINING LONG VERTICAL LINE OVERLAY=SEMANTIC OVERPRINT+VERTICAL LINE
COMBINING LOW LINE=SEMANTIC BELOW+LOW LINE
COMBINING MACRON BELOW=SEMANTIC BELOW+MACRON
COMBINING MACRON=SEMANTIC ABOVE+MACRON
COMBINING MINUS SIGN BELOW=SEMANTIC BELOW+START GROUP+MINUS SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING OGONEK=SEMANTIC BELOW+OGONEK
COMBINING OVERLINE=SEMANTIC ABOVE+LOW LINE
COMBINING PLUS SIGN BELOW=SEMANTIC BELOW+START GROUP+PLUS SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING REVERSED COMMA ABOVE=SEMANTIC ABOVE+START GROUP+COMMA+SEMANTIC REVERSED+POP DIRECTIONAL FORMATTING
COMBINING RIGHT ARROW ABOVE=SEMANTIC ABOVE+START GROUP+RIGHTWARDS ARROW+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING RIGHT HALF RING BELOW=SEMANTIC BELOW+MODIFIER LETTER CENTRED RIGHT HALF RING
COMBINING RIGHT HARPOON ABOVE=SEMANTIC ABOVE+START GROUP+RIGHTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING RIGHT TACK BELOW=SEMANTIC BELOW+START GROUP+RIGHT TACK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING RING ABOVE=SEMANTIC ABOVE+RING ABOVE
COMBINING RING BELOW=SEMANTIC BELOW+RING ABOVE
COMBINING RING OVERLAY=SEMANTIC OVERPRINT+RING OPERATOR
COMBINING SHORT STROKE OVERLAY=SEMANTIC OVERPRINT+NON-BREAKING HYPHEN
COMBINING SHORT VERTICAL LINE OVERLAY=SEMANTIC OVERPRINT+VERTICAL STROKE
COMBINING SQUARE BELOW=SEMANTIC BELOW+START GROUP+BALLOT BOX+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING THREE DOTS ABOVE=SEMANTIC ABOVE+START GROUP+DOT ABOVE+DOT ABOVE+DOT ABOVE+POP DIRECTIONAL FORMATTING
COMBINING TILDE BELOW=SEMANTIC BELOW+SMALL TILDE
COMBINING TILDE OVERLAY=SEMANTIC OVERPRINT+TILDE OPERATOR
COMBINING TILDE=SEMANTIC ABOVE+SMALL TILDE
COMBINING TURNED COMMA ABOVE=SEMANTIC ABOVE+START GROUP+COMMA+SEMANTIC TURNED+POP DIRECTIONAL FORMATTING
COMBINING UP TACK BELOW=SEMANTIC BELOW+START GROUP+UP TACK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING VERTICAL LINE ABOVE=SEMANTIC ABOVE+MODIFIER LETTER VERTICAL LINE
COMBINING VERTICAL LINE BELOW=SEMANTIC BELOW+MODIFIER LETTER LOW VERTICAL LINE
COMBINING VERTICAL TILDE=SEMANTIC ABOVE+START GROUP+SMALL TILDE+SEMANTIC ROTATED+POP DIRECTIONAL FORMATTING
COMBINING X ABOVE=SEMANTIC ABOVE+START GROUP+MULTIPLICATION SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HANGUL DOUBLE DOT TONE MARK=SEMANTIC BEFORE+COLON
HANGUL SINGLE DOT TONE MARK=SEMANTIC BEFORE+MIDDLE DOT
IDEOGRAPHIC DEPARTING TONE MARK=SEMANTIC AFTER+RING ABOVE
IDEOGRAPHIC ENTERING TONE MARK=SEMANTIC AFTER+START GROUP+ZERO WIDTH NO-BREAK SPACE+COMBINING RING BELOW+POP DIRECTIONAL FORMATTING
IDEOGRAPHIC LEVEL TONE MARK=SEMANTIC BEFORE+START GROUP+ZERO WIDTH NO-BREAK SPACE+COMBINING RING BELOW+POP DIRECTIONAL FORMATTING
IDEOGRAPHIC RISING TONE MARK=SEMANTIC BEFORE+RING ABOVE
LATIN CAPITAL LETTER L WITH MIDDLE DOT=LATIN CAPITAL LETTER L+SEMANTIC AFTER+MIDDLE DOT
LATIN SMALL LETTER L WITH MIDDLE DOT=LATIN SMALL LETTER L+SEMANTIC AFTER+MIDDLE DOT
LATIN SMALL LETTER N PRECEDED BY APOSTROPHE=LATIN SMALL LETTER N+SEMANTIC BEFORE+RIGHT SINGLE QUOTATION MARK
TIBETAN MARK NGAS BZUNG NYI ZLA=COMBINING RING BELOW+COMBINING BREVE BELOW
TIBETAN MARK NGAS BZUNG SGOR RTAGS=COMBINING RING BELOW
TIBETAN SIGN RJES SU NGA RO=COMBINING RING ABOVE
TIBETAN SIGN SNA LDAN=COMBINING BREVE+COMBINING RING ABOVE
TIBETAN SIGN YANG RTAGS=COMBINING VERTICAL LINE ABOVE
TIBETAN VOWEL SIGN EE=TIBETAN VOWEL SIGN E+TIBETAN VOWEL SIGN E
TIBETAN VOWEL SIGN OO=TIBETAN VOWEL SIGN O+TIBETAN VOWEL SIGN O

The characters COMBINING DOUBLE TILDE and COMBINING DOUBLE INVERTED BREVE already have their own syntax (defined in section 3.9 of Unicode 2.0). For the purposes of the Atomic Theory, we prefer to brush these under the carpet: since they don´t fit the theory, we just pretend they don´t exist. (This isn´t as silly as it sounds: since we have a grouping mechanism, we can use that instead.) The 2 examples given on p3-9 of The Unicode Standard, Version 2·0 (LATIN SMALL LETTER O, COMBINING CIRCUMFLEX, COMBINING DOUBLE TILDE, LATIN SMALL LETTER O, COMBINING DIAERESIS; and LATIN SMALL LETTER O, COMBINING DOUBLE TILDE, COMBINING CIRCUMFLEX, LATIN SMALL LETTER O, COMBINING DIAERESIS) would be replaced by START GROUP, LATIN SMALL LETTER O, COMBINING CIRCUMFLEX, LATIN SMALL LETTER O, COMBINING DIAERESIS, POP DIRECTIONAL FORMATTING, COMBINING TILDE, where the renderer is assumed to be clever enough to work out that it needs an extra-big tilde to cover a group of 2 characters. (Or not, of course.)

In order to at least do something presentable with these characters, we say that

COMBINING DOUBLE INVERTED BREVE=COMBINING INVERTED BREVE
COMBINING DOUBLE TILDE=COMBINING TILDE
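
The grouping replacement described above can be sketched in Python. This is only an illustration under assumptions of my own: the function name `parse_groups` and the nested-list representation are hypothetical, and START GROUP and POP DIRECTIONAL FORMATTING are simply treated as an opening and a closing bracket, as they are in the decompositions listed earlier.

```python
# Hypothetical helper: turn a flat list of character names into a nested
# structure, treating START GROUP / POP DIRECTIONAL FORMATTING as brackets.
OPEN = "START GROUP"
CLOSE = "POP DIRECTIONAL FORMATTING"


def parse_groups(names):
    """Return a nested list; each bracketed group becomes a sub-list."""
    stack = [[]]
    for name in names:
        if name == OPEN:
            stack.append([])
        elif name == CLOSE:
            if len(stack) == 1:
                raise ValueError("POP DIRECTIONAL FORMATTING without START GROUP")
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(name)
    if len(stack) != 1:
        raise ValueError("unclosed START GROUP")
    return stack[0]


# The replacement sequence for the 2 double-tilde examples.
double_tilde_example = [
    "START GROUP",
    "LATIN SMALL LETTER O", "COMBINING CIRCUMFLEX",
    "LATIN SMALL LETTER O", "COMBINING DIAERESIS",
    "POP DIRECTIONAL FORMATTING",
    "COMBINING TILDE",
]
parsed = parse_groups(double_tilde_example)
```

A renderer working from this structure sees that COMBINING TILDE follows a group of 2 accented letters rather than a single character, which is the information it needs if it wants to draw an extra-wide tilde.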

Since WHITE CIRCLE, WHITE DIAMOND and WHITE SQUARE are themselves composite, we prefer to regard COMBINING ENCLOSING CIRCLE, COMBINING ENCLOSING DIAMOND and COMBINING ENCLOSING SQUARE as composed from LARGE CIRCLE, LOZENGE and BALLOT BOX respectively.

We also have to delete the existing decompositions of everything named on the right-hand side above (the ones starting with SPACE), or decomposition would regress infinitely.

ACUTE ACCENT=
BREVE=
CEDILLA=
CENTRELINE LOW LINE=
DASHED LOW LINE=
DIAERESIS=DOT ABOVE+SEMANTIC AFTER+DOT ABOVE
DOT ABOVE=
DOUBLE ACUTE ACCENT=ACUTE ACCENT+SEMANTIC AFTER+ACUTE ACCENT
KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK=
KATAKANA-HIRAGANA VOICED SOUND MARK=
LOW LINE=
MACRON=
MODIFIER LETTER CIRCUMFLEX ACCENT=
OGONEK=
RING ABOVE=
SMALL TILDE=
WAVY LOW LINE=
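
To see why the deletions are needed, here is a minimal Python sketch of a check for circular decompositions. The function `find_cycle` and the dictionary representation are my own assumptions, not part of the theory. The 2 small tables contrast the existing UNIDATA decomposition of MACRON (SPACE followed by COMBINING MACRON) with the emptied one given above.

```python
def find_cycle(table):
    """Return one cycle of names as a list, or None if the table is acyclic.

    `table` maps a character name to the list of names it decomposes to.
    """
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(name):
        state[name] = ACTIVE
        path.append(name)
        for part in table.get(name, []):
            s = state.get(part, UNSEEN)
            if s == ACTIVE:
                # `part` is already on the current path: we have looped.
                return path[path.index(part):] + [part]
            if s == UNSEEN:
                found = visit(part)
                if found:
                    return found
        path.pop()
        state[name] = DONE
        return None

    for name in table:
        if state.get(name, UNSEEN) == UNSEEN:
            found = visit(name)
            if found:
                return found
    return None


# Keeping the old decomposition of MACRON alongside the new one
# for COMBINING MACRON produces a loop.
with_old = {
    "COMBINING MACRON": ["SEMANTIC ABOVE", "MACRON"],
    "MACRON": ["SPACE", "COMBINING MACRON"],
}
# With the decomposition of MACRON deleted, the loop is gone.
with_deleted = {
    "COMBINING MACRON": ["SEMANTIC ABOVE", "MACRON"],
    "MACRON": [],
}
```

Running `find_cycle` over the whole table after every change would be one way to confirm that no regress has crept back in.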

The Tibetan subjoined letters are BELOW.

TIBETAN SUBJOINED LETTER KA=SEMANTIC BELOW+TIBETAN LETTER KA
TIBETAN SUBJOINED LETTER KHA=SEMANTIC BELOW+TIBETAN LETTER KHA
TIBETAN SUBJOINED LETTER GA=SEMANTIC BELOW+TIBETAN LETTER GA
TIBETAN SUBJOINED LETTER NGA=SEMANTIC BELOW+TIBETAN LETTER NGA
TIBETAN SUBJOINED LETTER CA=SEMANTIC BELOW+TIBETAN LETTER CA
TIBETAN SUBJOINED LETTER JA=SEMANTIC BELOW+TIBETAN LETTER JA
TIBETAN SUBJOINED LETTER NYA=SEMANTIC BELOW+TIBETAN LETTER NYA
TIBETAN SUBJOINED LETTER TTA=SEMANTIC BELOW+TIBETAN LETTER TTA
TIBETAN SUBJOINED LETTER TTHA=SEMANTIC BELOW+TIBETAN LETTER TTHA
TIBETAN SUBJOINED LETTER DDA=SEMANTIC BELOW+TIBETAN LETTER DDA
TIBETAN SUBJOINED LETTER NNA=SEMANTIC BELOW+TIBETAN LETTER NNA
TIBETAN SUBJOINED LETTER TA=SEMANTIC BELOW+TIBETAN LETTER TA
TIBETAN SUBJOINED LETTER THA=SEMANTIC BELOW+TIBETAN LETTER THA
TIBETAN SUBJOINED LETTER DA=SEMANTIC BELOW+TIBETAN LETTER DA
TIBETAN SUBJOINED LETTER NA=SEMANTIC BELOW+TIBETAN LETTER NA
TIBETAN SUBJOINED LETTER PA=SEMANTIC BELOW+TIBETAN LETTER PA
TIBETAN SUBJOINED LETTER PHA=SEMANTIC BELOW+TIBETAN LETTER PHA
TIBETAN SUBJOINED LETTER BA=SEMANTIC BELOW+TIBETAN LETTER BA
TIBETAN SUBJOINED LETTER MA=SEMANTIC BELOW+TIBETAN LETTER MA
TIBETAN SUBJOINED LETTER TSA=SEMANTIC BELOW+TIBETAN LETTER TSA
TIBETAN SUBJOINED LETTER TSHA=SEMANTIC BELOW+TIBETAN LETTER TSHA
TIBETAN SUBJOINED LETTER DZA=SEMANTIC BELOW+TIBETAN LETTER DZA
TIBETAN SUBJOINED LETTER WA=SEMANTIC BELOW+TIBETAN LETTER WA
TIBETAN SUBJOINED LETTER YA=SEMANTIC BELOW+TIBETAN LETTER YA
TIBETAN SUBJOINED LETTER RA=SEMANTIC BELOW+TIBETAN LETTER RA
TIBETAN SUBJOINED LETTER LA=SEMANTIC BELOW+TIBETAN LETTER LA
TIBETAN SUBJOINED LETTER SHA=SEMANTIC BELOW+TIBETAN LETTER SHA
TIBETAN SUBJOINED LETTER SSA=SEMANTIC BELOW+TIBETAN LETTER SSA
TIBETAN SUBJOINED LETTER SA=SEMANTIC BELOW+TIBETAN LETTER SA
TIBETAN SUBJOINED LETTER HA=SEMANTIC BELOW+TIBETAN LETTER HA

A glyph may be moved vertically relative to the baseline, without being changed in size or orientation. We can use this idea to ``decompose´´ many characters which share the same glyph, but at different positions. We take the most central one as the base form. The raised one is then considered to be stacked ABOVE an invisible character of height 1ex. (We use ZERO WIDTH NO-BREAK SPACE for this.) The lowered one is considered to be stacked BELOW the same character. The idea that a space character might have height is a little strange, but it allows us to ``reuse´´ many glyph designs, for even more characters.

CENTRELINE OVERLINE=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+CENTRELINE LOW LINE
DASHED OVERLINE=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+DASHED LOW LINE
MODIFIER LETTER LEFT HALF RING=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+MODIFIER LETTER CENTRED LEFT HALF RING
MODIFIER LETTER LOW ACUTE ACCENT=ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+ACUTE ACCENT
MODIFIER LETTER LOW GRAVE ACCENT=ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+GRAVE ACCENT
MODIFIER LETTER LOW MACRON=ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+MACRON
MODIFIER LETTER RIGHT HALF RING=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+MODIFIER LETTER CENTRED RIGHT HALF RING
OVERLINE=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+LOW LINE
WAVY OVERLINE=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+WAVY LOW LINE

SEMANTIC SUBSCRIPT

Requests that a character be rendered at a smaller size, and with a lower baseline.

Would be used in all characters whose decomposition includes <sub> (there are 15 of these) as well as

GREEK LOWER NUMERAL SIGN=<sub>+MODIFIER LETTER PRIME
MODIFIER LETTER LOW VERTICAL LINE=<sub>+VERTICAL LINE

Easy to do algorithmically.

If used but not recognised, the resulting text will be wrong, but still better than if a substitute character were used.

SEMANTIC SUPERSCRIPT

Requests that a character be rendered at a smaller size, and above the baseline. A superscript is equivalent to a raised subscript.

Would be used in all characters whose decomposition includes <super> (there are about 50 of these) as well as:

ASTERISK=<super>+ASTERISK OPERATOR
CIRCUMFLEX ACCENT=<super>+UP ARROWHEAD
DEGREE SIGN=<super>+RING OPERATOR
MODIFIER LETTER VERTICAL LINE=<super>+VERTICAL LINE
PRIME=<super>+MODIFIER LETTER PRIME
TILDE=<super>+TILDE OPERATOR

The character used as a tilde accent is SMALL TILDE, not this TILDE. The character TILDE is a mixed-use character, so we may as well make it look consistent with ASTERISK (which is ``clearly´´ a superscript of some sort, and ASTERISK OPERATOR is the only possibility) and DEGREE SIGN. The same applies to circumflex---it appears that MODIFIER LETTER CIRCUMFLEX ACCENT is the recommended character for making accents, as CIRCUMFLEX ACCENT is very ugly. The idea that PRIME is a superscript is due to TeX, and extremely convincing.

Easy to do algorithmically.

If used but not recognised, the resulting text will be wrong, but still better than if a substitute character were used.

SEMANTIC TURNED

Rotates the character through half a turn in its own plane. Equivalent to REVERSED followed by INVERTED, or to ROTATED twice.
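
The 2 equivalences can be checked with a little matrix arithmetic. The Python sketch below is only an illustration: it assumes that REVERSED mirrors a glyph left-to-right, that INVERTED mirrors it top-to-bottom, and that ROTATED is a quarter turn anticlockwise. The direction of the quarter turn does not affect the result. The helper `compose` is hypothetical.

```python
# 2x2 matrices acting on glyph coordinates (x to the right, y upwards).
REVERSED = ((-1, 0), (0, 1))   # mirror left-to-right (assumed meaning)
INVERTED = ((1, 0), (0, -1))   # mirror top-to-bottom (assumed meaning)
ROTATED = ((0, -1), (1, 0))    # quarter turn anticlockwise (assumed meaning)
TURNED = ((-1, 0), (0, -1))    # half turn in the plane


def compose(a, b):
    """Matrix product a*b: apply b first, then a."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


reversed_then_inverted = compose(INVERTED, REVERSED)
rotated_twice = compose(ROTATED, ROTATED)
```

Both products come out equal to `TURNED`, so a renderer that implements any 2 of these operations gets the half turn for free.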

Could be used in:

BECAUSE=THEREFORE+SEMANTIC TURNED
BLACK DOWN-POINTING TRIANGLE=BLACK UP-POINTING TRIANGLE+SEMANTIC TURNED
BLACK LEFT-POINTING POINTER=BLACK RIGHT-POINTING POINTER+SEMANTIC TURNED
BLACK LEFT-POINTING TRIANGLE=BLACK RIGHT-POINTING TRIANGLE+SEMANTIC TURNED
BLACK LOWER LEFT TRIANGLE=BLACK UPPER RIGHT TRIANGLE+SEMANTIC TURNED
BOTTOM LEFT CORNER=TOP RIGHT CORNER+SEMANTIC TURNED
BOTTOM LEFT CROP=TOP RIGHT CROP+SEMANTIC TURNED
CIRCLE WITH LEFT HALF BLACK=CIRCLE WITH RIGHT HALF BLACK+SEMANTIC TURNED
CIRCLE WITH LOWER HALF BLACK=CIRCLE WITH UPPER HALF BLACK+SEMANTIC TURNED
CONTAINS AS MEMBER=ELEMENT OF+SEMANTIC TURNED
CONTAINS AS NORMAL SUBGROUP=NORMAL SUBGROUP OF+SEMANTIC TURNED
DESCENDING NODE=ASCENDING NODE+SEMANTIC TURNED
DOWN ARROWHEAD=UP ARROWHEAD+SEMANTIC TURNED
DOWN TACK=UP TACK+SEMANTIC TURNED
DOWNWARDS ARROW FROM BAR=UPWARDS ARROW FROM BAR+SEMANTIC TURNED
DOWNWARDS ARROW=UPWARDS ARROW+SEMANTIC TURNED
DOWNWARDS DASHED ARROW=UPWARDS DASHED ARROW+SEMANTIC TURNED
DOWNWARDS HARPOON WITH BARB RIGHTWARDS=UPWARDS HARPOON WITH BARB LEFTWARDS+SEMANTIC TURNED
DOWNWARDS TWO HEADED ARROW=UPWARDS TWO HEADED ARROW+SEMANTIC TURNED
ERASE TO THE RIGHT=ERASE TO THE LEFT+SEMANTIC TURNED
FOR ALL=LATIN CAPITAL LETTER A+SEMANTIC TURNED
FROWN=SMILE+SEMANTIC TURNED
GREATER-THAN SIGN=LESS-THAN SIGN+SEMANTIC TURNED
INTERSECTION=UNION+SEMANTIC TURNED
INVERTED EXCLAMATION MARK=EXCLAMATION MARK+SEMANTIC TURNED
INVERTED OHM SIGN=OHM SIGN+SEMANTIC TURNED
INVERTED QUESTION MARK=QUESTION MARK+SEMANTIC TURNED
LAST QUARTER MOON=FIRST QUARTER MOON+SEMANTIC TURNED
LATIN CAPITAL LETTER OPEN O=LATIN CAPITAL LETTER C+SEMANTIC TURNED
LATIN CAPITAL LETTER REVERSED E=LATIN CAPITAL LETTER E+SEMANTIC TURNED
LATIN SMALL LETTER DOTLESS J WITH STROKE AND HOOK=LATIN SMALL LETTER F WITH HOOK+SEMANTIC TURNED
LATIN SMALL LETTER DOTLESS J WITH STROKE=LATIN SMALL LETTER F+SEMANTIC TURNED
LATIN SMALL LETTER OPEN O=LATIN SMALL LETTER C+SEMANTIC TURNED
LATIN SMALL LETTER SCHWA=LATIN SMALL LETTER E+SEMANTIC TURNED
LATIN SMALL LETTER TURNED A=LATIN SMALL LETTER A+SEMANTIC TURNED
LATIN SMALL LETTER TURNED ALPHA=LATIN SMALL LETTER ALPHA+SEMANTIC TURNED
LATIN SMALL LETTER TURNED DELTA=GREEK SMALL LETTER DELTA+SEMANTIC TURNED
LATIN SMALL LETTER TURNED E=LATIN SMALL LETTER E+SEMANTIC TURNED
LATIN SMALL LETTER TURNED H=LATIN SMALL LETTER H+SEMANTIC TURNED
LATIN SMALL LETTER TURNED K=LATIN SMALL LETTER K+SEMANTIC TURNED
LATIN SMALL LETTER TURNED M=LATIN SMALL LETTER M+SEMANTIC TURNED
LATIN SMALL LETTER TURNED R WITH LONG LEG=LATIN SMALL LETTER R WITH LONG LEG+SEMANTIC TURNED
LATIN SMALL LETTER TURNED R=LATIN SMALL LETTER R+SEMANTIC TURNED
LATIN SMALL LETTER TURNED T=LATIN SMALL LETTER T+SEMANTIC TURNED
LATIN SMALL LETTER TURNED V=LATIN SMALL LETTER V+SEMANTIC TURNED
LATIN SMALL LETTER TURNED W=LATIN SMALL LETTER W+SEMANTIC TURNED
LATIN SMALL LETTER TURNED Y=LATIN SMALL LETTER Y+SEMANTIC TURNED
LEFT HALF BLACK CIRCLE=RIGHT HALF BLACK CIRCLE+SEMANTIC TURNED
LEFT TACK=RIGHT TACK+SEMANTIC TURNED
LEFTWARDS ARROW FROM BAR=RIGHTWARDS ARROW FROM BAR+SEMANTIC TURNED
LEFTWARDS ARROW TO BAR=RIGHTWARDS ARROW TO BAR+SEMANTIC TURNED
LEFTWARDS ARROW WITH TAIL=RIGHTWARDS ARROW WITH TAIL+SEMANTIC TURNED
LEFTWARDS ARROW=RIGHTWARDS ARROW+SEMANTIC TURNED
LEFTWARDS DASHED ARROW=RIGHTWARDS DASHED ARROW+SEMANTIC TURNED
LEFTWARDS HARPOON WITH BARB DOWNWARDS=RIGHTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC TURNED
LEFTWARDS PAIRED ARROWS=RIGHTWARDS PAIRED ARROWS+SEMANTIC TURNED
LEFTWARDS SQUIGGLE ARROW=RIGHTWARDS SQUIGGLE ARROW+SEMANTIC TURNED
LEFTWARDS TRIPLE ARROW=RIGHTWARDS TRIPLE ARROW+SEMANTIC TURNED
LEFTWARDS TWO HEADED ARROW=RIGHTWARDS TWO HEADED ARROW+SEMANTIC TURNED
LEFTWARDS WAVE ARROW=RIGHTWARDS WAVE ARROW+SEMANTIC TURNED
LOGICAL AND=LOGICAL OR+SEMANTIC TURNED
LOWER LEFT QUADRANT CIRCULAR ARC=UPPER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC TURNED
N-ARY COPRODUCT=N-ARY PRODUCT+SEMANTIC TURNED
NABLA=INCREMENT+SEMANTIC TURNED
OCR INVERTED FORK=OCR FORK+SEMANTIC TURNED
ORIGINAL OF=IMAGE OF+SEMANTIC TURNED
RIGHT HALF BLOCK=LEFT HALF BLOCK+SEMANTIC TURNED
SMALL CONTAINS AS MEMBER=SMALL ELEMENT OF+SEMANTIC TURNED
SOUTH WEST ARROW=NORTH EAST ARROW+SEMANTIC TURNED
SQUARE WITH LEFT HALF BLACK=SQUARE WITH RIGHT HALF BLACK+SEMANTIC TURNED
SUCCEEDS UNDER RELATION=PRECEDES UNDER RELATION+SEMANTIC TURNED
SUPERSET OF=SUBSET OF+SEMANTIC TURNED
THERE EXISTS=LATIN CAPITAL LETTER E+SEMANTIC TURNED
TURNED CAPITAL F=LATIN CAPITAL LETTER F+SEMANTIC TURNED
TURNED GREEK SMALL LETTER IOTA=GREEK SMALL LETTER IOTA+SEMANTIC TURNED
TURNED NOT SIGN=NOT SIGN+SEMANTIC TURNED
UNDERTIE=CHARACTER TIE+SEMANTIC TURNED

There are 2 characters with TURNED in the name (implicitly, anyway) that would not be coded directly with SEMANTIC TURNED: LATIN CAPITAL LETTER TURNED M and LATIN CAPITAL LETTER SCHWA. These are large turned versions of a lower-case character. Maybe they should be decomposed as

LATIN CAPITAL LETTER SCHWA=LATIN SMALL LETTER E+SEMANTIC LARGE+SEMANTIC TURNED
LATIN CAPITAL LETTER TURNED M=LATIN SMALL LETTER M+SEMANTIC LARGE+SEMANTIC TURNED

---or maybe these characters have a completely different origin?

SEMANTIC VARIANT

This is glyph modification. It´s a bit of a miscellany, but it has sound antecedents (e g, in TeX), and provided it is not overused, seems to be useful. One character is described as a variant of another if the shapes are similar, and if the reason for the similarity is related to the history of the character in some way. Often (e g, INVERTED LAZY S) this is explicitly stated in the text of the standard. Sometimes (e g, LATIN SMALL LETTER LONG S) it is obvious from the name of the character. Other times (e g, SMALL ELEMENT OF), it is just a fact that we happen to know.

CURVED STEM PARAGRAPH SIGN ORNAMENT=PILCROW SIGN+SEMANTIC VARIANT
CYRILLIC CAPITAL LETTER BASHKIR KA=CYRILLIC CAPITAL LETTER KA+SEMANTIC VARIANT
CYRILLIC CAPITAL LETTER GHE WITH UPTURN=CYRILLIC CAPITAL LETTER GHE+SEMANTIC VARIANT
CYRILLIC CAPITAL LETTER ROUND OMEGA=CYRILLIC CAPITAL LETTER O+SEMANTIC VARIANT
CYRILLIC CAPITAL LETTER STRAIGHT U=CYRILLIC CAPITAL LETTER U+SEMANTIC VARIANT
CYRILLIC SMALL LETTER BASHKIR KA=CYRILLIC SMALL LETTER KA+SEMANTIC VARIANT
CYRILLIC SMALL LETTER GHE WITH UPTURN=CYRILLIC SMALL LETTER GHE+SEMANTIC VARIANT
CYRILLIC SMALL LETTER ROUND OMEGA=CYRILLIC SMALL LETTER OMEGA+SEMANTIC VARIANT
CYRILLIC SMALL LETTER STRAIGHT U=CYRILLIC SMALL LETTER U+SEMANTIC VARIANT
EIGHT POINTED PINWHEEL STAR=PINWHEEL STAR+SEMANTIC VARIANT
FLORAL HEART=BLACK HEART SUIT+SEMANTIC VARIANT
GREEK BETA SYMBOL=GREEK SMALL LETTER BETA+SEMANTIC VARIANT
GREEK KAPPA SYMBOL=GREEK SMALL LETTER KAPPA+SEMANTIC VARIANT
GREEK LUNATE SIGMA SYMBOL=GREEK SMALL LETTER SIGMA+SEMANTIC VARIANT
GREEK PHI SYMBOL=GREEK SMALL LETTER PHI+SEMANTIC VARIANT
GREEK PI SYMBOL=GREEK SMALL LETTER PI+SEMANTIC VARIANT
GREEK RHO SYMBOL=GREEK SMALL LETTER RHO+SEMANTIC VARIANT
GREEK THETA SYMBOL=GREEK SMALL LETTER THETA+SEMANTIC VARIANT
GREEK UPSILON WITH HOOK SYMBOL=GREEK CAPITAL LETTER UPSILON+SEMANTIC VARIANT
HANGUL CHOSEONG CHITUEUMCHIEUCH=HANGUL CHOSEONG CHIEUCH+SEMANTIC VARIANT
HANGUL CHOSEONG CHITUEUMCIEUC=HANGUL CHOSEONG CIEUC+SEMANTIC VARIANT
HANGUL CHOSEONG CHITUEUMSIOS=HANGUL CHOSEONG SIOS+SEMANTIC VARIANT
HANGUL CHOSEONG PANSIOS=HANGUL CHOSEONG SIOS+SEMANTIC VARIANT
HANGUL JONGSEONG PANSIOS=HANGUL JONGSEONG SIOS+SEMANTIC VARIANT
HEBREW LETTER ALTERNATIVE AYIN=HEBREW LETTER AYIN+SEMANTIC VARIANT
HEBREW LETTER ALTERNATIVE PLUS SIGN=PLUS SIGN+SEMANTIC VARIANT
INVERTED LAZY S=TILDE OPERATOR+SEMANTIC VARIANT
LATIN CAPITAL LETTER B WITH TOPBAR=CYRILLIC CAPITAL LETTER BE+SEMANTIC VARIANT
LATIN CAPITAL LETTER OPEN E=LATIN CAPITAL LETTER E+SEMANTIC VARIANT
LATIN LETTER STRETCHED C=LATIN CAPITAL LETTER C+SEMANTIC VARIANT
LATIN SMALL LETTER B WITH TOPBAR=CYRILLIC SMALL LETTER BE+SEMANTIC VARIANT
LATIN SMALL LETTER LONG S=LATIN SMALL LETTER S+SEMANTIC VARIANT
LATIN SMALL LETTER R WITH FISHHOOK=LATIN SMALL LETTER R+SEMANTIC VARIANT
MODIFIER LETTER TRIANGULAR COLON=COLON+SEMANTIC VARIANT
ORNATE LEFT PARENTHESIS=LEFT PARENTHESIS+SEMANTIC VARIANT
ORNATE RIGHT PARENTHESIS=RIGHT PARENTHESIS+SEMANTIC VARIANT
PARTIAL DIFFERENTIAL=LATIN SMALL LETTER D+SEMANTIC VARIANT
SMALL ELEMENT OF=GREEK SMALL LETTER EPSILON+SEMANTIC VARIANT
TIBETAN MARK RIN CHEN SPUNGS SHAD=TIBETAN MARK SHAD+SEMANTIC VARIANT
TIGHT TRIFOLIATE SNOWFLAKE=SNOWFLAKE+SEMANTIC VARIANT

(Despite its name, LETTER ROUND OMEGA is a variant of LETTER O.) There are also some mathematical symbols that are best considered as glyph variants.

BROKEN BAR=VERTICAL LINE+SEMANTIC VARIANT
CURLY LOGICAL AND=LOGICAL AND+SEMANTIC VARIANT
CURLY LOGICAL OR=LOGICAL OR+SEMANTIC VARIANT
DIVISION SLASH=SOLIDUS+SEMANTIC VARIANT
PRECEDES=LESS-THAN SIGN+SEMANTIC VARIANT
RATIO=COLON+SEMANTIC VARIANT
SET MINUS=REVERSE SOLIDUS+SEMANTIC VARIANT
SQUARE CAP=INTERSECTION+SEMANTIC VARIANT
SQUARE CUP=UNION+SEMANTIC VARIANT
SQUARE IMAGE OF=SUBSET OF+SEMANTIC VARIANT
SQUARE ORIGINAL OF=SUPERSET OF+SEMANTIC VARIANT
SUCCEEDS=GREATER-THAN SIGN+SEMANTIC VARIANT
WHITE SQUARE WITH ROUNDED CORNERS=WHITE SQUARE+SEMANTIC VARIANT

Decomposition of the CURLY and SQUARE mathematical symbols as variants may seem capricious, but it is correct in 2 ways:

Cannot be done algorithmically: either you have a variant glyph, or you don´t.

Falling back to the base form is likely to give good results, except in specialised fields, so this is a desirable decomposition to encode.
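
Falling back to the base form might be sketched like this in Python. The function `fall_back` and its `recognised` parameter are hypothetical: a renderer is assumed to receive a decomposed sequence of character names and to drop any SEMANTIC modifier it has no means of honouring.

```python
def fall_back(names, recognised=frozenset()):
    """Drop SEMANTIC modifiers that the renderer does not recognise."""
    return [
        name for name in names
        if not name.startswith("SEMANTIC ") or name in recognised
    ]


# PRECEDES=LESS-THAN SIGN+SEMANTIC VARIANT, from the list above.
precedes = ["LESS-THAN SIGN", "SEMANTIC VARIANT"]
plain = fall_back(precedes)
full = fall_back(precedes, recognised={"SEMANTIC VARIANT"})
```

A renderer with no variant glyph draws a plain LESS-THAN SIGN, which is the wrong shape but rarely the wrong meaning outside specialised fields.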

SEMANTIC WHITE

We assume that the ordinary state for a character is to be ``black´´, as this is the colour of ink. Some characters---normally those with large solid regions---also exist in ``white´´ variants. This is a request for those characters to be used. Many characters have the word ``black´´ in their name. We just ignore this (except in those cases where it means HEAVY), claiming that it carries no semantic value apart from emphasis.

The following characters are white variants of others:

BLACK CENTRE WHITE STAR=OPEN CENTRE BLACK STAR+SEMANTIC WHITE
BLACK DIAMOND MINUS WHITE X=MULTIPLICATION SIGN+COMBINING ENCLOSING DIAMOND+SEMANTIC WHITE
BLACK SMILING FACE=WHITE SMILING FACE+SEMANTIC WHITE
DINGBAT NEGATIVE CIRCLED DIGIT EIGHT=DIGIT EIGHT+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE
DINGBAT NEGATIVE CIRCLED DIGIT FIVE=DIGIT FIVE+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE
DINGBAT NEGATIVE CIRCLED DIGIT FOUR=DIGIT FOUR+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE
DINGBAT NEGATIVE CIRCLED DIGIT NINE=DIGIT NINE+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE
DINGBAT NEGATIVE CIRCLED DIGIT ONE=DIGIT ONE+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE
DINGBAT NEGATIVE CIRCLED DIGIT SEVEN=DIGIT SEVEN+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE
DINGBAT NEGATIVE CIRCLED DIGIT SIX=DIGIT SIX+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE
DINGBAT NEGATIVE CIRCLED DIGIT THREE=DIGIT THREE+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE
DINGBAT NEGATIVE CIRCLED DIGIT TWO=DIGIT TWO+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE
DINGBAT NEGATIVE CIRCLED NUMBER TEN=START GROUP+DIGIT ONE+DIGIT ZERO+POP DIRECTIONAL FORMATTING+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE
DOWNWARDS WHITE ARROW=DOWNWARDS ARROW+SEMANTIC WHITE
INVERSE BULLET=BULLET+COMBINING ENCLOSING SQUARE+SEMANTIC WHITE
INVERSE WHITE CIRCLE=WHITE CIRCLE+COMBINING ENCLOSING SQUARE+SEMANTIC WHITE
LEFT WHITE CORNER BRACKET=LEFT CORNER BRACKET+SEMANTIC WHITE
LEFT WHITE LENTICULAR BRACKET=LEFT BLACK LENTICULAR BRACKET+SEMANTIC WHITE
LEFT WHITE SQUARE BRACKET=LEFT SQUARE BRACKET+SEMANTIC WHITE
LEFT WHITE TORTOISE SHELL BRACKET=LEFT TORTOISE SHELL BRACKET+SEMANTIC WHITE
LEFTWARDS WHITE ARROW=LEFTWARDS ARROW+SEMANTIC WHITE
RIGHT WHITE CORNER BRACKET=RIGHT CORNER BRACKET+SEMANTIC WHITE
RIGHT WHITE LENTICULAR BRACKET=RIGHT BLACK LENTICULAR BRACKET+SEMANTIC WHITE
RIGHT WHITE SQUARE BRACKET=RIGHT SQUARE BRACKET+SEMANTIC WHITE
RIGHT WHITE TORTOISE SHELL BRACKET=RIGHT TORTOISE SHELL BRACKET+SEMANTIC WHITE
RIGHTWARDS WHITE ARROW=RIGHTWARDS ARROW+SEMANTIC WHITE
UPWARDS WHITE ARROW FROM BAR=UPWARDS ARROW FROM BAR+SEMANTIC WHITE
UPWARDS WHITE ARROW=UPWARDS ARROW+SEMANTIC WHITE
WHITE BULLET=BULLET+SEMANTIC WHITE
WHITE CHESS BISHOP=BLACK CHESS BISHOP+SEMANTIC WHITE
WHITE CHESS KING=BLACK CHESS KING+SEMANTIC WHITE
WHITE CHESS KNIGHT=BLACK CHESS KNIGHT+SEMANTIC WHITE
WHITE CHESS PAWN=BLACK CHESS PAWN+SEMANTIC WHITE
WHITE CHESS QUEEN=BLACK CHESS QUEEN+SEMANTIC WHITE
WHITE CHESS ROOK=BLACK CHESS ROOK+SEMANTIC WHITE
WHITE CIRCLE=BLACK CIRCLE+SEMANTIC WHITE
WHITE CLUB SUIT=BLACK CLUB SUIT+SEMANTIC WHITE
WHITE DIAMOND SUIT=BLACK DIAMOND SUIT+SEMANTIC WHITE
WHITE DIAMOND=BLACK DIAMOND+SEMANTIC WHITE
WHITE DOWN-POINTING SMALL TRIANGLE=BLACK DOWN-POINTING SMALL TRIANGLE+SEMANTIC WHITE
WHITE DOWN-POINTING TRIANGLE=BLACK DOWN-POINTING TRIANGLE+SEMANTIC WHITE
WHITE FLORETTE=BLACK FLORETTE+SEMANTIC WHITE
WHITE FOUR POINTED STAR=BLACK FOUR POINTED STAR+SEMANTIC WHITE
WHITE HEART SUIT=BLACK HEART SUIT+SEMANTIC WHITE
WHITE LEFT POINTING INDEX=BLACK LEFT POINTING INDEX+SEMANTIC WHITE
WHITE LEFT-POINTING POINTER=BLACK LEFT-POINTING POINTER+SEMANTIC WHITE
WHITE LEFT-POINTING SMALL TRIANGLE=BLACK LEFT-POINTING SMALL TRIANGLE+SEMANTIC WHITE
WHITE LEFT-POINTING TRIANGLE=BLACK LEFT-POINTING TRIANGLE+SEMANTIC WHITE
WHITE NIB=BLACK NIB+SEMANTIC WHITE
WHITE PARALLELOGRAM=BLACK PARALLELOGRAM+SEMANTIC WHITE
WHITE RECTANGLE=BLACK RECTANGLE+SEMANTIC WHITE
WHITE RIGHT POINTING INDEX=BLACK RIGHT POINTING INDEX+SEMANTIC WHITE
WHITE RIGHT-POINTING POINTER=BLACK RIGHT-POINTING POINTER+SEMANTIC WHITE
WHITE RIGHT-POINTING SMALL TRIANGLE=BLACK RIGHT-POINTING SMALL TRIANGLE+SEMANTIC WHITE
WHITE RIGHT-POINTING TRIANGLE=BLACK RIGHT-POINTING TRIANGLE+SEMANTIC WHITE
WHITE SCISSORS=BLACK SCISSORS+SEMANTIC WHITE
WHITE SMALL SQUARE=BLACK SMALL SQUARE+SEMANTIC WHITE
WHITE SPADE SUIT=BLACK SPADE SUIT+SEMANTIC WHITE
WHITE SQUARE=BLACK SQUARE+SEMANTIC WHITE
WHITE STAR=BLACK STAR+SEMANTIC WHITE
WHITE SUN WITH RAYS=BLACK SUN WITH RAYS+SEMANTIC WHITE
WHITE TELEPHONE=BLACK TELEPHONE+SEMANTIC WHITE
WHITE UP-POINTING SMALL TRIANGLE=BLACK UP-POINTING SMALL TRIANGLE+SEMANTIC WHITE
WHITE UP-POINTING TRIANGLE=BLACK UP-POINTING TRIANGLE+SEMANTIC WHITE
WHITE VERTICAL RECTANGLE=BLACK VERTICAL RECTANGLE+SEMANTIC WHITE

but not BACK-TILTED SHADOWED WHITE RIGHTWARDS ARROW, NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW or WHITE UP POINTING INDEX, because these have no black forms.

For SMILING FACE, we derive the black character from the white, which looks the ``wrong way round´´; but that´s because we assume it´s really (*)SMILING FEATURES+COMBINING ENCLOSING CIRCLE. The interaction of these is described later.

This could be done algorithmically, but it requires clever image processing capability: software could do something like surrounding the character with a thin black line, and then invert the interior of the region so delineated. It might be sufficient to convey the concept just to exchange black and white in a character cell, though this wouldn´t work if an attempt was made to use extended runs of white text.
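
Both approaches can be sketched on a toy bitmap. The Python below is an assumption-laden illustration, not a real rendering method: glyphs are lists of strings with `#` for black and `.` for white, the functions `whiten` and `invert_cell` are hypothetical, and `whiten` only approximates the outlining idea for solid glyphs that have at least one pixel of margin.

```python
def whiten(bitmap):
    """Surround a black glyph with a 1-pixel line, then clear the glyph itself."""
    height, width = len(bitmap), len(bitmap[0])

    def black(r, c):
        return 0 <= r < height and 0 <= c < width and bitmap[r][c] == "#"

    out = []
    for r in range(height):
        row = ""
        for c in range(width):
            # A pixel joins the outline if it touches the glyph
            # (4-neighbourhood) without being part of it.
            near = (black(r - 1, c) or black(r + 1, c)
                    or black(r, c - 1) or black(r, c + 1))
            row += "#" if near and not black(r, c) else "."
        out.append(row)
    return out


def invert_cell(bitmap):
    """The cruder alternative: exchange black and white in the whole cell."""
    return ["".join("." if ch == "#" else "#" for ch in row) for row in bitmap]


black_square = [
    ".....",
    ".###.",
    ".###.",
    ".###.",
    ".....",
]
outlined = whiten(black_square)
inverted = invert_cell(black_square)
```

The outlined result is a hollow shape on a white ground, whereas the inverted cell is a white shape on a black ground, which is why runs of the latter would merge into a solid bar.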

I assume in the above that if a character in a COMBINING ENCLOSING CIRCLE is made white, the character itself becomes white, and the space between it and the circle becomes black.

It could be argued that there is no need for a ``double-struck´´ semantic, because the double-struck characters are just white versions of heavy ones: (*)SEMANTIC DOUBLE-STRUCK=SEMANTIC HEAVY+SEMANTIC WHITE? I am not going to argue that here though.

If used but not interpreted, unlikely to result in misinterpretation: a black symbol is likely to be a good stand-in for a white one.

Miscellaneous other forms

There are many other cases where 2 characters have been encoded separately because they differ in some way other than visual appearance. We give decompositions for these, since the Atomic Theory is concerned foremost with rendering.

We take a rigorous approach to this; we do not merge, e g, HYPHEN, MINUS, EN DASH or EM DASH, because they require separate consideration when designing a font. The Atomic Theory is universal: it is not a way of providing fallback glyphs for renderers that do not have them---that is just a consequence of its expression of the (perfectly real) relationships between characters. It is designed to be used by the highest-quality renderers without any compromise in the visual appearance of the output.

APL FUNCTIONAL SYMBOL ALPHA=GREEK SMALL LETTER ALPHA
APL FUNCTIONAL SYMBOL IOTA=GREEK SMALL LETTER IOTA
APL FUNCTIONAL SYMBOL OMEGA=GREEK SMALL LETTER OMEGA
APL FUNCTIONAL SYMBOL RHO=GREEK SMALL LETTER RHO
ARMENIAN COMMA=GRAVE ACCENT
ARMENIAN EMPHASIS MARK=ACUTE ACCENT
ARMENIAN FULL STOP=COLON
ARMENIAN MODIFIER LETTER LEFT HALF RING=MODIFIER LETTER LEFT HALF RING
ASSERTION=LEFT TACK
COMPLEMENT=LATIN LETTER STRETCHED C
DEVANAGARI ABBREVIATION SIGN=RING OPERATOR
DIGIT EIGHT FULL STOP=DIGIT EIGHT+FULL STOP
DIGIT FIVE FULL STOP=DIGIT FIVE+FULL STOP
DIGIT FOUR FULL STOP=DIGIT FOUR+FULL STOP
DIGIT NINE FULL STOP=DIGIT NINE+FULL STOP
DIGIT ONE FULL STOP=DIGIT ONE+FULL STOP
DIGIT SEVEN FULL STOP=DIGIT SEVEN+FULL STOP
DIGIT SIX FULL STOP=DIGIT SIX+FULL STOP
DIGIT THREE FULL STOP=DIGIT THREE+FULL STOP
DIGIT TWO FULL STOP=DIGIT TWO+FULL STOP
DITTO MARK=DOUBLE PRIME
DOT OPERATOR=MIDDLE DOT
DOUBLE PRIME QUOTATION MARK=DOUBLE PRIME
END OF PROOF=BLACK SQUARE
GREEK DASIA=SINGLE HIGH-REVERSED-9 QUOTATION MARK
GREEK DIALYTIKA AND OXIA=DIAERESIS+COMBINING VERTICAL LINE ABOVE
GREEK DIALYTIKA AND PERISPOMENI=DIAERESIS+COMBINING GREEK PERISPOMENI
GREEK DIALYTIKA AND VARIA=DIAERESIS+COMBINING GRAVE ACCENT
GREEK DIALYTIKA TONOS=DIAERESIS+COMBINING VERTICAL LINE ABOVE
GREEK KORONIS=RIGHT SINGLE QUOTATION MARK
GREEK NUMERAL SIGN=PRIME
GREEK PERISPOMENI=SMALL TILDE
GREEK PSILI=RIGHT SINGLE QUOTATION MARK
GREEK TONOS=MODIFIER LETTER VERTICAL LINE
GREEK YPOGEGRAMMENI=GREEK SMALL LETTER IOTA+SEMANTIC SUBSCRIPT
HYPHEN-MINUS=EN DASH
HYPHENATION POINT=MIDDLE DOT
IDEOGRAPHIC NUMBER ZERO=LARGE CIRCLE
INCREMENT=GREEK CAPITAL LETTER DELTA
KATAKANA MIDDLE DOT=MIDDLE DOT
KATAKANA-HIRAGANA PROLONGED SOUND MARK=EM DASH
LATIN CAPITAL LETTER AFRICAN D=LATIN CAPITAL LETTER D WITH STROKE
LATIN CAPITAL LETTER ETH=LATIN CAPITAL LETTER D WITH STROKE
LATIN CAPITAL LETTER ESH=GREEK CAPITAL LETTER SIGMA
LATIN CAPITAL LETTER GAMMA=LATIN SMALL LETTER GAMMA+SEMANTIC LARGE
LATIN CAPITAL LETTER IOTA=LATIN SMALL LETTER IOTA+SEMANTIC LARGE
LATIN CAPITAL LETTER UPSILON=LATIN SMALL LETTER UPSILON+SEMANTIC LARGE
LATIN LETTER BILABIAL CLICK=FISHEYE
LATIN LETTER DENTAL CLICK=VERTICAL LINE
LATIN LETTER RETROFLEX CLICK=EXCLAMATION MARK
LATIN SMALL LETTER A WITH RIGHT HALF RING=LATIN SMALL LETTER A+MODIFIER LETTER RIGHT HALF RING
LATIN SMALL LETTER ETH=LATIN SMALL LETTER D+SEMANTIC VARIANT+COMBINING SHORT SOLIDUS OVERLAY
LATIN SMALL LETTER GAMMA=GREEK SMALL LETTER GAMMA
LATIN SMALL LETTER IOTA=GREEK SMALL LETTER IOTA
LATIN SMALL LETTER OPEN E=GREEK SMALL LETTER EPSILON
LATIN SMALL LETTER PHI=GREEK SMALL LETTER PHI
LATIN SMALL LETTER RAMS HORN=LATIN SMALL LETTER GAMMA+SEMANTIC SMALL
LATIN SMALL LETTER UPSILON=GREEK SMALL LETTER UPSILON
LEFT CORNER BRACKET=LEFT CEILING
MEDIUM VERTICAL BAR=BLACK VERTICAL RECTANGLE
MODELS=TRUE
MODIFIER LETTER ACUTE ACCENT=ACUTE ACCENT
MODIFIER LETTER APOSTROPHE=RIGHT SINGLE QUOTATION MARK
MODIFIER LETTER DOUBLE PRIME=DOUBLE PRIME
MODIFIER LETTER DOWN ARROWHEAD=DOWN ARROWHEAD
MODIFIER LETTER GLOTTAL STOP=LATIN LETTER GLOTTAL STOP
MODIFIER LETTER GRAVE ACCENT=GRAVE ACCENT
MODIFIER LETTER HALF TRIANGULAR COLON=BLACK DOWN-POINTING SMALL TRIANGLE
MODIFIER LETTER LEFT ARROWHEAD=LESS-THAN SIGN
MODIFIER LETTER MACRON=MACRON
MODIFIER LETTER REVERSED COMMA=SINGLE HIGH-REVERSED-9 QUOTATION MARK
MODIFIER LETTER REVERSED GLOTTAL STOP=LATIN LETTER PHARYNGEAL VOICED FRICATIVE
MODIFIER LETTER RIGHT ARROWHEAD=GREATER-THAN SIGN
MODIFIER LETTER TURNED COMMA=LEFT SINGLE QUOTATION MARK
MODIFIER LETTER UP ARROWHEAD=UP ARROWHEAD
NUMBER EIGHTEEN FULL STOP=DIGIT ONE+DIGIT EIGHT+FULL STOP
NUMBER ELEVEN FULL STOP=DIGIT ONE+DIGIT ONE+FULL STOP
NUMBER FIFTEEN FULL STOP=DIGIT ONE+DIGIT FIVE+FULL STOP
NUMBER FOURTEEN FULL STOP=DIGIT ONE+DIGIT FOUR+FULL STOP
NUMBER NINETEEN FULL STOP=DIGIT ONE+DIGIT NINE+FULL STOP
NUMBER SEVENTEEN FULL STOP=DIGIT ONE+DIGIT SEVEN+FULL STOP
NUMBER SIXTEEN FULL STOP=DIGIT ONE+DIGIT SIX+FULL STOP
NUMBER TEN FULL STOP=DIGIT ONE+DIGIT ZERO+FULL STOP
NUMBER THIRTEEN FULL STOP=DIGIT ONE+DIGIT THREE+FULL STOP
NUMBER TWELVE FULL STOP=DIGIT ONE+DIGIT TWO+FULL STOP
NUMBER TWENTY FULL STOP=DIGIT TWO+DIGIT ZERO+FULL STOP
OHM SIGN=GREEK CAPITAL LETTER OMEGA
RIGHT CORNER BRACKET=RIGHT FLOOR
SINGLE LOW-9 QUOTATION MARK=COMMA
STAR OPERATOR=ARABIC FIVE POINTED STAR
TIBETAN SIGN RDEL DKAR GNYIS=TIBETAN SIGN RDEL DKAR GCIG+TIBETAN SIGN RDEL DKAR GCIG
TIBETAN SIGN RDEL DKAR GSUM=TIBETAN SIGN RDEL DKAR GCIG+TIBETAN SIGN RDEL DKAR GCIG+TIBETAN SIGN RDEL DKAR GCIG
TIBETAN SIGN RDEL DKAR RDEL NAG=TIBETAN SIGN RDEL DKAR GCIG+TIBETAN SIGN RDEL NAG GCIG
TIBETAN SIGN RDEL NAG GNYIS=TIBETAN SIGN RDEL NAG GCIG+TIBETAN SIGN RDEL NAG GCIG
TIBETAN VOWEL SIGN VOCALIC LL=TIBETAN SUBJOINED LETTER LA+TIBETAN VOWEL SIGN AA+TIBETAN VOWEL SIGN REVERSED I
TIBETAN VOWEL SIGN VOCALIC RR=TIBETAN SUBJOINED LETTER RA+TIBETAN VOWEL SIGN AA+TIBETAN VOWEL SIGN REVERSED I
WAVE DASH=TILDE OPERATOR
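
Since decompositions may refer to characters that are themselves decomposed, a reader of these lists has to apply them repeatedly. Here is a minimal Python sketch, assuming the NAME=COMPONENT+COMPONENT notation used throughout and a table free of cycles. The functions `parse_line` and `expand` are hypothetical names of my own.

```python
def parse_line(line):
    """Split 'NAME=A+B+C' into (NAME, [A, B, C]); an empty right side gives []."""
    name, _, rhs = line.partition("=")
    return name, [part for part in rhs.split("+") if part]


def expand(name, table):
    """Recursively replace a name by its decomposition until nothing changes."""
    parts = table.get(name)
    if not parts:
        return [name]          # no entry, or an emptied one: already primitive
    result = []
    for part in parts:
        result.extend(expand(part, table))
    return result


# A few lines copied from the lists in this document.
lines = [
    "LATIN CAPITAL LETTER GAMMA=LATIN SMALL LETTER GAMMA+SEMANTIC LARGE",
    "LATIN SMALL LETTER GAMMA=GREEK SMALL LETTER GAMMA",
    "NUMBER TEN FULL STOP=DIGIT ONE+DIGIT ZERO+FULL STOP",
    "MACRON=",
]
table = dict(parse_line(line) for line in lines)
capital_gamma = expand("LATIN CAPITAL LETTER GAMMA", table)
```

Here LATIN CAPITAL LETTER GAMMA passes through LATIN SMALL LETTER GAMMA on its way to GREEK SMALL LETTER GAMMA followed by SEMANTIC LARGE.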

New uses for existing characters

COMBINING COMMA BELOW

These diacritics should take the comma accent, not the cedilla form (except where the design of the cedilla is such that it may serve as either accent---i e, an unattached cedilla with a shallow curve). These characters are misnamed in Unicode due to an early failure to distinguish between the 2 accent marks.

LATIN CAPITAL LETTER G WITH CEDILLA=LATIN CAPITAL LETTER G+COMBINING COMMA BELOW
LATIN CAPITAL LETTER K WITH CEDILLA=LATIN CAPITAL LETTER K+COMBINING COMMA BELOW
LATIN CAPITAL LETTER L WITH CEDILLA=LATIN CAPITAL LETTER L+COMBINING COMMA BELOW
LATIN CAPITAL LETTER N WITH CEDILLA=LATIN CAPITAL LETTER N+COMBINING COMMA BELOW
LATIN CAPITAL LETTER R WITH CEDILLA=LATIN CAPITAL LETTER R+COMBINING COMMA BELOW
LATIN SMALL LETTER G WITH CEDILLA=LATIN SMALL LETTER G+COMBINING COMMA BELOW
LATIN SMALL LETTER K WITH CEDILLA=LATIN SMALL LETTER K+COMBINING COMMA BELOW
LATIN SMALL LETTER L WITH CEDILLA=LATIN SMALL LETTER L+COMBINING COMMA BELOW
LATIN SMALL LETTER N WITH CEDILLA=LATIN SMALL LETTER N+COMBINING COMMA BELOW
LATIN SMALL LETTER R WITH CEDILLA=LATIN SMALL LETTER R+COMBINING COMMA BELOW

COMBINING CYRILLIC TITLO

This decomposition is missing for historical reasons.

CYRILLIC CAPITAL LETTER OMEGA WITH TITLO=CYRILLIC CAPITAL LETTER OMEGA+COMBINING CYRILLIC TITLO
CYRILLIC SMALL LETTER OMEGA WITH TITLO=CYRILLIC SMALL LETTER OMEGA+COMBINING CYRILLIC TITLO

COMBINING DOT ABOVE

The following are frankly silly, though visually appealing.

LATIN SMALL LETTER I=LATIN SMALL LETTER DOTLESS I+COMBINING DOT ABOVE
LATIN SMALL LETTER J=LATIN SMALL LETTER DOTLESS J+COMBINING DOT ABOVE

Having done this, we are then obliged to respecify most existing decompositions with LATIN SMALL LETTER I/J to use the dotless form. This lets rendering software ignore the requirement of ``dot removal´´ when drawing these characters.

LATIN SMALL LETTER I WITH ACUTE=LATIN SMALL LETTER DOTLESS I+COMBINING ACUTE ACCENT
LATIN SMALL LETTER I WITH BREVE=LATIN SMALL LETTER DOTLESS I+COMBINING BREVE
LATIN SMALL LETTER I WITH CARON=LATIN SMALL LETTER DOTLESS I+COMBINING CARON
LATIN SMALL LETTER I WITH CIRCUMFLEX=LATIN SMALL LETTER DOTLESS I+COMBINING CIRCUMFLEX ACCENT
LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE=LATIN SMALL LETTER DOTLESS I+COMBINING DIAERESIS+COMBINING ACUTE ACCENT
LATIN SMALL LETTER I WITH DIAERESIS=LATIN SMALL LETTER DOTLESS I+COMBINING DIAERESIS
LATIN SMALL LETTER I WITH DOUBLE GRAVE=LATIN SMALL LETTER DOTLESS I+COMBINING DOUBLE GRAVE ACCENT
LATIN SMALL LETTER I WITH GRAVE=LATIN SMALL LETTER DOTLESS I+COMBINING GRAVE ACCENT
LATIN SMALL LETTER I WITH HOOK ABOVE=LATIN SMALL LETTER DOTLESS I+SEMANTIC ABOVE+START GROUP+LATIN LETTER GLOTTAL STOP+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
LATIN SMALL LETTER I WITH INVERTED BREVE=LATIN SMALL LETTER DOTLESS I+COMBINING INVERTED BREVE
LATIN SMALL LETTER I WITH MACRON=LATIN SMALL LETTER DOTLESS I+COMBINING MACRON
LATIN SMALL LETTER I WITH TILDE=LATIN SMALL LETTER DOTLESS I+COMBINING TILDE
LATIN SMALL LETTER J WITH CARON=LATIN SMALL LETTER DOTLESS J+COMBINING CARON
LATIN SMALL LETTER J WITH CIRCUMFLEX=LATIN SMALL LETTER DOTLESS J+COMBINING CIRCUMFLEX ACCENT
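
The respecification is mechanical enough to sketch in Python. This is only an illustration, assuming the decompositions are held as text lines in the NAME=A+B notation used here; the names respecify_dotless and leaves_dot_alone are hypothetical, and the rule that marks below, overlays and the ogonek leave the dot in place is an assumption, not something the standard states.

```python
# Dotted base letters and their dotless counterparts, as proposed above.
DOTLESS = {
    "LATIN SMALL LETTER I": "LATIN SMALL LETTER DOTLESS I",
    "LATIN SMALL LETTER J": "LATIN SMALL LETTER DOTLESS J",
}

def leaves_dot_alone(mark: str) -> bool:
    # Assumption: marks below, overlays and the ogonek do not displace the dot.
    return (mark.endswith("BELOW") or mark.endswith("OVERLAY")
            or mark == "COMBINING OGONEK")

def respecify_dotless(line: str) -> str:
    """Rewrite one NAME=A+B+... line so that a marked i or j uses the dotless base."""
    if "=" not in line:
        return line
    name, _, decomposition = line.partition("=")
    parts = decomposition.split("+")
    marks = parts[1:]
    # Rewrite only when at least one following mark would collide with the dot.
    if parts[0] in DOTLESS and any(not leaves_dot_alone(m) for m in marks):
        parts[0] = DOTLESS[parts[0]]
    return name + "=" + "+".join(parts)

print(respecify_dotless(
    "LATIN SMALL LETTER I WITH ACUTE=LATIN SMALL LETTER I+COMBINING ACUTE ACCENT"))
```

A line such as LATIN SMALL LETTER I WITH STROKE, whose only mark is an overlay, passes through unchanged under this assumption.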

COMBINING ENCLOSING CIRCLE

Everything marked as <circle> (there are 197 of these) should be modified to a canonical decomposition involving COMBINING ENCLOSING CIRCLE, as well as

CIRCLED ASTERISK OPERATOR=<circle>+ASTERISK OPERATOR
CIRCLED DASH=<circle>+EN DASH
CIRCLED DIVISION SLASH=<circle>+DIVISION SLASH
CIRCLED DOT OPERATOR=<circle>+DOT OPERATOR
CIRCLED EQUALS=<circle>+EQUALS SIGN
CIRCLED HEAVY WHITE RIGHTWARDS ARROW=RIGHTWARDS WHITE ARROW+SEMANTIC HEAVY+COMBINING ENCLOSING CIRCLE
CIRCLED MINUS=<circle>+MINUS SIGN
CIRCLED PLUS=<circle>+PLUS SIGN
CIRCLED POSTAL MARK=<circle>+POSTAL MARK
CIRCLED RING OPERATOR=<circle>+RING OPERATOR
CIRCLED TIMES=<circle>+MULTIPLICATION SIGN
CIRCLED WHITE STAR=<circle>+WHITE STAR
COMMERCIAL AT=LATIN SMALL LETTER A+SEMANTIC SCRIPT+COMBINING ENCLOSING CIRCLE
COPYRIGHT SIGN=LATIN CAPITAL LETTER C+COMBINING ENCLOSING CIRCLE+SEMANTIC SUPERSCRIPT
REGISTERED SIGN=LATIN CAPITAL LETTER R+COMBINING ENCLOSING CIRCLE+SEMANTIC SUPERSCRIPT
SOUND RECORDING COPYRIGHT=LATIN CAPITAL LETTER P+COMBINING ENCLOSING CIRCLE+SEMANTIC SUPERSCRIPT
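
The conversion of the <circle> entries can be sketched with the unicodedata module from the Python standard library. This is an illustration only: circled_to_combining is a hypothetical name, and the number of <circle> entries found depends on the Unicode version shipped with the interpreter, so it need not match the count given above.

```python
import unicodedata

ENCLOSING_CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE

def circled_to_combining(ch: str):
    """Return base text + COMBINING ENCLOSING CIRCLE for a <circle> character,
    or None if the character has no <circle> decomposition."""
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0] != "<circle>":
        return None
    # The remaining fields are hexadecimal code points of the base text.
    base = "".join(chr(int(code, 16)) for code in fields[1:])
    return base + ENCLOSING_CIRCLE

# CIRCLED DIGIT ONE has the compatibility decomposition "<circle> 0031".
print([hex(ord(c)) for c in circled_to_combining("\u2460")])
```

Where the base is more than 1 character long (as in the circled numbers from 10 upwards), a plain trailing COMBINING ENCLOSING CIRCLE would apply to the last character only, so those cases would need the grouping characters as well.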

COMBINING ENCLOSING SQUARE

Just used for 5 characters.

SQUARED DOT OPERATOR=DOT OPERATOR+COMBINING ENCLOSING SQUARE
SQUARED MINUS=MINUS SIGN+COMBINING ENCLOSING SQUARE
SQUARED PLUS=PLUS SIGN+COMBINING ENCLOSING SQUARE
SQUARED TIMES=MULTIPLICATION SIGN+COMBINING ENCLOSING SQUARE
IDEOGRAPHIC HALF FILL SPACE=SALTIRE+COMBINING ENCLOSING SQUARE

Characters marked as <square> are not enclosed in a square; they are just rendered as is. I suppose that if <square> was replaced by <compat> throughout, or just deleted (thereby making the composition canonical), no-one would notice. This would add 194 canonical decompositions.

COMBINING LONG SOLIDUS OVERLAY

This is used to ``cross things out´´, and also in

EMPTY SET=DIGIT ZERO+COMBINING LONG SOLIDUS OVERLAY
NOT TILDE=TILDE OPERATOR+COMBINING LONG SOLIDUS OVERLAY
RESPONSE=LATIN CAPITAL LETTER R+COMBINING LONG SOLIDUS OVERLAY
VERSICLE=LATIN CAPITAL LETTER V+COMBINING LONG SOLIDUS OVERLAY

COMBINING LONG VERTICAL LINE OVERLAY

Seen in

WHITE SQUARE WITH VERTICAL BISECTING LINE=WHITE SQUARE+COMBINING LONG VERTICAL LINE OVERLAY

COMBINING PALATALIZED HOOK BELOW

The following decomposition is missing for historical reasons only.

LATIN SMALL LETTER T WITH PALATAL HOOK=LATIN SMALL LETTER T+COMBINING PALATALIZED HOOK BELOW

The following are missing in order to ensure the rendering process does not apply the hook to the wrong leg of the letter n (which would produce a lower case eng). We can simply note that a renderer had better get it right.

LATIN CAPITAL LETTER N WITH LEFT HOOK=LATIN CAPITAL LETTER N+COMBINING PALATALIZED HOOK BELOW
LATIN SMALL LETTER N WITH LEFT HOOK=LATIN SMALL LETTER N+COMBINING PALATALIZED HOOK BELOW

COMBINING RETROFLEX HOOK BELOW

Some decompositions involving this character are also missing, for historical reasons.

LATIN CAPITAL LETTER T WITH RETROFLEX HOOK=LATIN CAPITAL LETTER T+COMBINING RETROFLEX HOOK BELOW
LATIN SMALL LETTER D WITH TAIL=LATIN SMALL LETTER D+COMBINING RETROFLEX HOOK BELOW
LATIN SMALL LETTER EZH WITH TAIL=LATIN SMALL LETTER EZH+COMBINING RETROFLEX HOOK BELOW
LATIN SMALL LETTER L WITH RETROFLEX HOOK=LATIN SMALL LETTER L+COMBINING RETROFLEX HOOK BELOW
LATIN SMALL LETTER N WITH RETROFLEX HOOK=LATIN SMALL LETTER N+COMBINING RETROFLEX HOOK BELOW
LATIN SMALL LETTER R WITH TAIL=LATIN SMALL LETTER R+COMBINING RETROFLEX HOOK BELOW
LATIN SMALL LETTER S WITH HOOK=LATIN SMALL LETTER S+COMBINING RETROFLEX HOOK BELOW
LATIN SMALL LETTER SQUAT REVERSED ESH=LATIN SMALL LETTER REVERSED R WITH FISHHOOK+COMBINING RETROFLEX HOOK BELOW
LATIN SMALL LETTER T WITH RETROFLEX HOOK=LATIN SMALL LETTER T+COMBINING RETROFLEX HOOK BELOW
LATIN SMALL LETTER TURNED R WITH HOOK=LATIN SMALL LETTER TURNED R+COMBINING RETROFLEX HOOK BELOW
LATIN SMALL LETTER Z WITH RETROFLEX HOOK=LATIN SMALL LETTER Z+COMBINING RETROFLEX HOOK BELOW

The decomposition of LATIN SMALL LETTER EZH WITH TAIL is based on appearance, but that´s allowed for combining characters.

COMBINING RING OVERLAY

This should be used to decompose

CONTOUR INTEGRAL=INTEGRAL+COMBINING RING OVERLAY
SURFACE INTEGRAL=DOUBLE INTEGRAL+COMBINING RING OVERLAY
VOLUME INTEGRAL=TRIPLE INTEGRAL+COMBINING RING OVERLAY

Also related are

ANTICLOCKWISE CONTOUR INTEGRAL=INTEGRAL+COMBINING ANTICLOCKWISE RING OVERLAY
CLOCKWISE CONTOUR INTEGRAL=INTEGRAL+COMBINING CLOCKWISE RING OVERLAY
CLOCKWISE INTEGRAL=INTEGRAL+COMBINING CLOCKWISE ARROW ABOVE

COMBINING LONG STROKE OVERLAY

The Tibetan half-numbers are overprints with something that looks like a long stroke---I assume the curve and hook of the stroke are to do with the font, not semantic.

LATIN CAPITAL LETTER H WITH STROKE=LATIN CAPITAL LETTER H+COMBINING LONG STROKE OVERLAY
TIBETAN DIGIT HALF EIGHT=TIBETAN DIGIT EIGHT+COMBINING LONG STROKE OVERLAY
TIBETAN DIGIT HALF FIVE=TIBETAN DIGIT FIVE+COMBINING LONG STROKE OVERLAY
TIBETAN DIGIT HALF FOUR=TIBETAN DIGIT FOUR+COMBINING LONG STROKE OVERLAY
TIBETAN DIGIT HALF NINE=TIBETAN DIGIT NINE+COMBINING LONG STROKE OVERLAY
TIBETAN DIGIT HALF ONE=TIBETAN DIGIT ONE+COMBINING LONG STROKE OVERLAY
TIBETAN DIGIT HALF SEVEN=TIBETAN DIGIT SEVEN+COMBINING LONG STROKE OVERLAY
TIBETAN DIGIT HALF SIX=TIBETAN DIGIT SIX+COMBINING LONG STROKE OVERLAY
TIBETAN DIGIT HALF THREE=TIBETAN DIGIT THREE+COMBINING LONG STROKE OVERLAY
TIBETAN DIGIT HALF TWO=TIBETAN DIGIT TWO+COMBINING LONG STROKE OVERLAY
TIBETAN DIGIT HALF ZERO=TIBETAN DIGIT ZERO+COMBINING LONG STROKE OVERLAY

COMBINING SHORT SOLIDUS OVERLAY

Many of the characters described as ``with stroke´´ could be provided with decompositions using this character. The list is

BLANK SYMBOL=LATIN SMALL LETTER B+COMBINING SHORT SOLIDUS OVERLAY
LATIN CAPITAL LETTER L WITH STROKE=LATIN CAPITAL LETTER L+COMBINING SHORT SOLIDUS OVERLAY
LATIN CAPITAL LETTER O WITH STROKE AND ACUTE=LATIN CAPITAL LETTER O WITH ACUTE+COMBINING SHORT SOLIDUS OVERLAY
LATIN CAPITAL LETTER O WITH STROKE=LATIN CAPITAL LETTER O+COMBINING SHORT SOLIDUS OVERLAY
LATIN SMALL LETTER L WITH STROKE=LATIN SMALL LETTER L+COMBINING SHORT SOLIDUS OVERLAY
LATIN SMALL LETTER LAMBDA WITH STROKE=GREEK SMALL LETTER LAMDA+COMBINING SHORT SOLIDUS OVERLAY
LATIN SMALL LETTER O WITH STROKE AND ACUTE=LATIN SMALL LETTER O WITH ACUTE+COMBINING SHORT SOLIDUS OVERLAY
LATIN SMALL LETTER O WITH STROKE=LATIN SMALL LETTER O+COMBINING SHORT SOLIDUS OVERLAY
LEFT RIGHT ARROW WITH STROKE=LEFT RIGHT ARROW+COMBINING SHORT SOLIDUS OVERLAY
LEFT RIGHT DOUBLE ARROW WITH STROKE=LEFT RIGHT DOUBLE ARROW+COMBINING SHORT SOLIDUS OVERLAY
LEFTWARDS ARROW WITH STROKE=LEFTWARDS ARROW+COMBINING SHORT SOLIDUS OVERLAY
LEFTWARDS DOUBLE ARROW WITH STROKE=LEFTWARDS DOUBLE ARROW+COMBINING SHORT SOLIDUS OVERLAY
ORTHODOX CROSS=CROSS OF LORRAINE+COMBINING SHORT SOLIDUS OVERLAY
RIGHTWARDS ARROW WITH STROKE=RIGHTWARDS ARROW+COMBINING SHORT SOLIDUS OVERLAY
RIGHTWARDS DOUBLE ARROW WITH STROKE=RIGHTWARDS DOUBLE ARROW+COMBINING SHORT SOLIDUS OVERLAY

I suppose making this suggestion would result in howls of outrage from people whose alphabets contain these characters, as, e g, ``O WITH STROKE´´ is a letter in its own right, not a composed character, in these alphabets. There are 3 points in favour of making it a composite character, though:

Most other characters ``with stroke´´ are encoded with COMBINING SHORT STROKE OVERLAY.

COMBINING SHORT STROKE OVERLAY

The situation here is similar to the one for COMBINING SHORT SOLIDUS OVERLAY: a lot of characters described as ``with stroke´´, ``with bar´´, ``with quill´´, ``barred´´ or ``bar´´ could be provided with decompositions using this character.

CYRILLIC CAPITAL LETTER BARRED O WITH DIAERESIS=CYRILLIC CAPITAL LETTER BARRED O+COMBINING DIAERESIS
CYRILLIC CAPITAL LETTER BARRED O=CYRILLIC CAPITAL LETTER O+COMBINING SHORT STROKE OVERLAY
CYRILLIC CAPITAL LETTER GHE WITH STROKE=CYRILLIC CAPITAL LETTER GHE+COMBINING SHORT STROKE OVERLAY
CYRILLIC CAPITAL LETTER KA WITH STROKE=CYRILLIC CAPITAL LETTER KA+COMBINING SHORT STROKE OVERLAY
CYRILLIC CAPITAL LETTER STRAIGHT U WITH STROKE=CYRILLIC CAPITAL LETTER STRAIGHT U+COMBINING SHORT STROKE OVERLAY
CYRILLIC CAPITAL LETTER YAT=CYRILLIC CAPITAL LETTER SOFT SIGN+COMBINING SHORT STROKE OVERLAY
CYRILLIC SMALL LETTER BARRED O WITH DIAERESIS=CYRILLIC SMALL LETTER BARRED O+COMBINING DIAERESIS
CYRILLIC SMALL LETTER BARRED O=CYRILLIC SMALL LETTER O+COMBINING SHORT STROKE OVERLAY
CYRILLIC SMALL LETTER GHE WITH STROKE=CYRILLIC SMALL LETTER GHE+COMBINING SHORT STROKE OVERLAY
CYRILLIC SMALL LETTER KA WITH STROKE=CYRILLIC SMALL LETTER KA+COMBINING SHORT STROKE OVERLAY
CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE=CYRILLIC SMALL LETTER STRAIGHT U+COMBINING SHORT STROKE OVERLAY
CYRILLIC SMALL LETTER YAT=CYRILLIC SMALL LETTER SOFT SIGN+COMBINING SHORT STROKE OVERLAY
LATIN CAPITAL LETTER D WITH STROKE=LATIN CAPITAL LETTER D+COMBINING SHORT STROKE OVERLAY
LATIN CAPITAL LETTER G WITH STROKE=LATIN CAPITAL LETTER G+COMBINING SHORT STROKE OVERLAY
LATIN CAPITAL LETTER I WITH STROKE=LATIN CAPITAL LETTER I+COMBINING SHORT STROKE OVERLAY
LATIN CAPITAL LETTER T WITH STROKE=LATIN CAPITAL LETTER T+COMBINING SHORT STROKE OVERLAY
LATIN CAPITAL LETTER Z WITH STROKE=LATIN CAPITAL LETTER Z+COMBINING SHORT STROKE OVERLAY
LATIN LETTER GLOTTAL STOP WITH STROKE=LATIN LETTER GLOTTAL STOP+COMBINING SHORT STROKE OVERLAY
LATIN LETTER TWO WITH STROKE=DIGIT TWO+COMBINING SHORT STROKE OVERLAY
LATIN SMALL LETTER B WITH STROKE=LATIN SMALL LETTER B+COMBINING SHORT STROKE OVERLAY
LATIN SMALL LETTER BARRED O=LATIN SMALL LETTER O+COMBINING SHORT STROKE OVERLAY
LATIN SMALL LETTER D WITH STROKE=LATIN SMALL LETTER D+COMBINING SHORT STROKE OVERLAY
LATIN SMALL LETTER G WITH STROKE=LATIN SMALL LETTER G+COMBINING SHORT STROKE OVERLAY
LATIN SMALL LETTER H WITH STROKE=LATIN SMALL LETTER H+COMBINING SHORT STROKE OVERLAY
LATIN SMALL LETTER I WITH STROKE=LATIN SMALL LETTER I+COMBINING SHORT STROKE OVERLAY
LATIN SMALL LETTER L WITH BAR=LATIN SMALL LETTER L+COMBINING SHORT STROKE OVERLAY
LATIN SMALL LETTER T WITH STROKE=LATIN SMALL LETTER T+COMBINING SHORT STROKE OVERLAY
LATIN SMALL LETTER U BAR=LATIN SMALL LETTER U+COMBINING SHORT STROKE OVERLAY
LATIN SMALL LETTER Z WITH STROKE=LATIN SMALL LETTER Z+COMBINING SHORT STROKE OVERLAY
LEFT SQUARE BRACKET WITH QUILL=LEFT SQUARE BRACKET+COMBINING SHORT STROKE OVERLAY
RIGHT SQUARE BRACKET WITH QUILL=RIGHT SQUARE BRACKET+COMBINING SHORT STROKE OVERLAY

Also, some characters have a ``double stroke´´.

DOWNWARDS ARROW WITH DOUBLE STROKE=DOWNWARDS ARROW+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING
LATIN LETTER ALVEOLAR CLICK=VERTICAL LINE+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING
UPWARDS ARROW WITH DOUBLE STROKE=UPWARDS ARROW+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING

Although some algorithmic sofistikashun would be required to get the bar in exactly the right place, in practice it might be well enough to just go ahead and overprint it, with maybe a few special cases for instances where it is in an unusual position (e g, LATIN SMALL LETTER G WITH STROKE).

COMBINING SHORT VERTICAL LINE OVERLAY

This might have a few uses apart from the 4 Cyrillic characters ``with vertical stroke´´ which could be composed from this character.

CYRILLIC CAPITAL LETTER CHE WITH VERTICAL STROKE=CYRILLIC CAPITAL LETTER CHE+COMBINING SHORT VERTICAL LINE OVERLAY
CYRILLIC CAPITAL LETTER KA WITH VERTICAL STROKE=CYRILLIC CAPITAL LETTER KA+COMBINING SHORT VERTICAL LINE OVERLAY
CYRILLIC SMALL LETTER CHE WITH VERTICAL STROKE=CYRILLIC SMALL LETTER CHE+COMBINING SHORT VERTICAL LINE OVERLAY
CYRILLIC SMALL LETTER KA WITH VERTICAL STROKE=CYRILLIC SMALL LETTER KA+COMBINING SHORT VERTICAL LINE OVERLAY

COMBINING TILDE OVERLAY

Used in

LATIN CAPITAL LETTER O WITH MIDDLE TILDE=LATIN CAPITAL LETTER O+COMBINING TILDE OVERLAY
LATIN SMALL LETTER L WITH MIDDLE TILDE=LATIN SMALL LETTER L+COMBINING TILDE OVERLAY

COMBINING VERTICAL LINE BELOW

This seems to be another case where some decompositions have been omitted in order to ensure that the rendering process does not put the mark in the wrong place. Caveat implementor!

LATIN SMALL LETTER N WITH LONG RIGHT LEG=LATIN SMALL LETTER N+COMBINING VERTICAL LINE BELOW
LATIN SMALL LETTER R WITH LONG LEG=LATIN SMALL LETTER R+COMBINING VERTICAL LINE BELOW
LATIN SMALL LETTER TURNED M WITH LONG LEG=LATIN SMALL LETTER TURNED M+COMBINING VERTICAL LINE BELOW

FRACTION SLASH

This is an existing character, but we give it more precise semantics by specifying that it lies between 1 character or group on the left, and 1 on the right. In other words, it is ``binary´´, just like SEMANTICS LIGATURE, COMPOSE, ABOVE and BELOW. It is used in all the decompositions marked with <fraction> (there are 16 of these), and the following:

ACCOUNT OF=LATIN SMALL LETTER A+FRACTION SLASH+LATIN SMALL LETTER C
ADDRESSED TO THE SUBJECT=LATIN SMALL LETTER A+FRACTION SLASH+LATIN SMALL LETTER S
CADA UNA=LATIN SMALL LETTER C+FRACTION SLASH+LATIN SMALL LETTER U
CARE OF=LATIN SMALL LETTER C+FRACTION SLASH+LATIN SMALL LETTER O
FRACTION NUMERATOR ONE=DIGIT ONE+FRACTION SLASH
PER MILLE SIGN=DIGIT ZERO+FRACTION SLASH+START GROUP+DIGIT ZERO+DIGIT ZERO+POP DIRECTIONAL FORMATTING
PER TEN THOUSAND SIGN=DIGIT ZERO+FRACTION SLASH+START GROUP+DIGIT ZERO+DIGIT ZERO+DIGIT ZERO+POP DIRECTIONAL FORMATTING
PERCENT SIGN=DIGIT ZERO+FRACTION SLASH+DIGIT ZERO
VULGAR FRACTION FIVE EIGHTHS=DIGIT FIVE+FRACTION SLASH+DIGIT EIGHT
VULGAR FRACTION FIVE SIXTHS=DIGIT FIVE+FRACTION SLASH+DIGIT SIX
VULGAR FRACTION FOUR FIFTHS=DIGIT FOUR+FRACTION SLASH+DIGIT FIVE
VULGAR FRACTION ONE EIGHTH=DIGIT ONE+FRACTION SLASH+DIGIT EIGHT
VULGAR FRACTION ONE FIFTH=DIGIT ONE+FRACTION SLASH+DIGIT FIVE
VULGAR FRACTION ONE HALF=DIGIT ONE+FRACTION SLASH+DIGIT TWO
VULGAR FRACTION ONE QUARTER=DIGIT ONE+FRACTION SLASH+DIGIT FOUR
VULGAR FRACTION ONE SIXTH=DIGIT ONE+FRACTION SLASH+DIGIT SIX
VULGAR FRACTION ONE THIRD=DIGIT ONE+FRACTION SLASH+DIGIT THREE
VULGAR FRACTION SEVEN EIGHTHS=DIGIT SEVEN+FRACTION SLASH+DIGIT EIGHT
VULGAR FRACTION THREE EIGHTHS=DIGIT THREE+FRACTION SLASH+DIGIT EIGHT
VULGAR FRACTION THREE FIFTHS=DIGIT THREE+FRACTION SLASH+DIGIT FIVE
VULGAR FRACTION THREE QUARTERS=DIGIT THREE+FRACTION SLASH+DIGIT FOUR
VULGAR FRACTION TWO FIFTHS=DIGIT TWO+FRACTION SLASH+DIGIT FIVE
VULGAR FRACTION TWO THIRDS=DIGIT TWO+FRACTION SLASH+DIGIT THREE

A sophisticated rendering agent is explicitly allowed to stack the top and bottom of a fraction over each other (maybe varying their size as well), and use a horizontal or oblique rule to represent the division. This is because a decomposition like VULGAR FRACTION ONE QUARTER=DIGIT ONE+FRACTION SLASH+DIGIT FOUR is canonical, so it is permitted (but not required) to use a special glyph, such as would be present in a Latin-1 font.
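
The ``binary´´ reading of FRACTION SLASH can be sketched as a small parser over the notation used in this document. It is an illustration under stated assumptions: the function names are hypothetical, groups are taken to be delimited by START GROUP and POP DIRECTIONAL FORMATTING as in the lists above, and nested groups are not handled.

```python
START = "START GROUP"
END = "POP DIRECTIONAL FORMATTING"
SLASH = "FRACTION SLASH"

def items(decomposition: str):
    """Split a decomposition into items; each item is a list of names
    (1 name for a single character, several for a group)."""
    out, group = [], None
    for token in decomposition.split("+"):
        if token == START:
            group = []                 # open a group
        elif token == END and group is not None:
            out.append(group)          # close the group: it counts as 1 item
            group = None
        elif group is not None:
            group.append(token)
        else:
            out.append([token])
    return out

def split_fraction(decomposition: str):
    """Return (numerator, denominator): the 1 item on each side of FRACTION SLASH."""
    seq = items(decomposition)
    i = seq.index([SLASH])             # raises ValueError if there is no slash
    numerator = seq[i - 1] if i > 0 else []
    denominator = seq[i + 1] if i + 1 < len(seq) else []
    return numerator, denominator

print(split_fraction("DIGIT ZERO+FRACTION SLASH+START GROUP+DIGIT ZERO+DIGIT ZERO"
                     "+POP DIRECTIONAL FORMATTING"))
```

For PER MILLE SIGN this yields 1 zero above and a group of 2 zeros below; for FRACTION NUMERATOR ONE the denominator comes back empty, which a renderer could take as the cue to leave room for whatever follows.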

LEFT-TO-RIGHT OVERRIDE

Some Hebrew characters are used in mathematical text. These have obvious decompositions which should be encoded. Doing this will enable mathematicians to use any other Hebrew characters as symbols (by using the decomposition) without needing to get them encoded in the U C S first.

ALEF SYMBOL=LEFT-TO-RIGHT OVERRIDE+HEBREW LETTER ALEF+POP DIRECTIONAL FORMATTING
BET SYMBOL=LEFT-TO-RIGHT OVERRIDE+HEBREW LETTER BET+POP DIRECTIONAL FORMATTING
DALET SYMBOL=LEFT-TO-RIGHT OVERRIDE+HEBREW LETTER DALET+POP DIRECTIONAL FORMATTING
GIMEL SYMBOL=LEFT-TO-RIGHT OVERRIDE+HEBREW LETTER GIMEL+POP DIRECTIONAL FORMATTING

ZERO WIDTH JOINER, ZERO WIDTH NON-JOINER

All compatibility decompositions involving <isolated>, <initial>, <medial> and <final> could be replaced by canonical ones involving the characters which already exist for this purpose.

My knowledge of the languages that use these characters is almost 0, but I think the following are required to work according to the existing standard.

Any character with a compatibility decomposition including <isolated> gets a canonical decomposition by deleting the <isolated>, and adding a ZERO WIDTH NON-JOINER at the start, and a ZERO WIDTH NON-JOINER at the end.

Any character with a compatibility decomposition including <initial> gets a canonical decomposition by deleting the <initial>, and adding a ZERO WIDTH NON-JOINER at the start, and a ZERO WIDTH JOINER at the end.

Any character with a compatibility decomposition including <medial> gets a canonical decomposition by deleting the <medial>, and adding a ZERO WIDTH JOINER at the start, and a ZERO WIDTH JOINER at the end.

Any character with a compatibility decomposition including <final> gets a canonical decomposition by deleting the <final>, and adding a ZERO WIDTH JOINER at the start, and a ZERO WIDTH NON-JOINER at the end.
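
The 4 rules above fit in a small table. The following Python sketch applies them to the compatibility decompositions reported by the standard unicodedata module; joiner_decomposition and WRAPPERS are hypothetical names, and the result is the decomposition proposed here, not anything the existing standard defines.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER

# (character added at the start, character added at the end) for each tag
WRAPPERS = {
    "<isolated>": (ZWNJ, ZWNJ),
    "<initial>": (ZWNJ, ZWJ),
    "<medial>": (ZWJ, ZWJ),
    "<final>": (ZWJ, ZWNJ),
}

def joiner_decomposition(ch: str):
    """Return the proposed joiner-based decomposition of a positional form,
    or None if the character carries none of the 4 tags."""
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0] not in WRAPPERS:
        return None
    before, after = WRAPPERS[fields[0]]
    # Delete the tag and keep the code points that follow it.
    body = "".join(chr(int(code, 16)) for code in fields[1:])
    return before + body + after

# ARABIC LETTER BEH INITIAL FORM is "<initial> 0628" in the existing data.
print([hex(ord(c)) for c in joiner_decomposition("\uFE91")])
```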

There are also a few other characters that can be treated in this way. Although there are 5 special FINAL forms for Hebrew, these are not really required: it is just as easy for a user interface to insert a ZERO WIDTH JOINER+HEBREW LETTER KAF+ZERO WIDTH NON-JOINER as it is to insert a HEBREW LETTER FINAL KAF, for whatever reason, and both have to work anyway. We also encode GREEK SMALL LETTER FINAL SIGMA in the same way, in case a renderer chooses to treat this combination specially. (If it does, users must be ready to write a word like Eros as GREEK CAPITAL LETTER EPSILON+GREEK SMALL LETTER RHO+GREEK SMALL LETTER OMEGA+GREEK SMALL LETTER SIGMA+ZERO WIDTH JOINER, unless they really do want to see the final form of sigma.) Admittedly, a typical user might be very surprised to see a final character changing shape before their eyes, but then, that is what it´s for.

There certainly seems little excuse to omit the decompositions in languages like Arabic and Hebrew where it is expected.

ARABIC LETTER UIGHUR KAZAKH KIRGHIZ ALEF MAKSURA INITIAL FORM=<initial>+ARABIC LETTER ALEF MAKSURA
ARABIC LETTER UIGHUR KAZAKH KIRGHIZ ALEF MAKSURA MEDIAL FORM=<medial>+ARABIC LETTER ALEF MAKSURA
ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA FINAL FORM=<final>+ARABIC LETTER YEH WITH HAMZA ABOVE+ARABIC LETTER ALEF MAKSURA
ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA INITIAL FORM=<initial>+ARABIC LETTER YEH WITH HAMZA ABOVE+ARABIC LETTER ALEF MAKSURA
ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM=<isolated>+ARABIC LETTER YEH WITH HAMZA ABOVE+ARABIC LETTER ALEF MAKSURA
GREEK SMALL LETTER FINAL SIGMA=<final>+GREEK SMALL LETTER SIGMA
HEBREW LETTER FINAL KAF=<final>+HEBREW LETTER KAF
HEBREW LETTER FINAL MEM=<final>+HEBREW LETTER MEM
HEBREW LETTER FINAL NUN=<final>+HEBREW LETTER NUN
HEBREW LETTER FINAL PE=<final>+HEBREW LETTER PE
HEBREW LETTER FINAL TSADI=<final>+HEBREW LETTER TSADI

We also have to replace the decompositions for the combining marks (starting with SPACE), or we get unnecessary extra space characters.

ARABIC DAMMA ISOLATED FORM=<isolated>+ARABIC DAMMA
ARABIC DAMMATAN ISOLATED FORM=<isolated>+ARABIC DAMMATAN
ARABIC FATHA ISOLATED FORM=<isolated>+ARABIC FATHA
ARABIC FATHATAN ISOLATED FORM=<isolated>+ARABIC FATHATAN
ARABIC KASRA ISOLATED FORM=<isolated>+ARABIC KASRA
ARABIC KASRATAN ISOLATED FORM=<isolated>+ARABIC KASRATAN
ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM=<isolated>+ARABIC SHADDA+ARABIC DAMMA
ARABIC LIGATURE SHADDA WITH DAMMA MEDIAL FORM=<medial>+ARABIC SHADDA+ARABIC DAMMA
ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM=<isolated>+ARABIC SHADDA+ARABIC DAMMATAN
ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM=<isolated>+ARABIC SHADDA+ARABIC FATHA
ARABIC LIGATURE SHADDA WITH FATHA MEDIAL FORM=<medial>+ARABIC SHADDA+ARABIC FATHA
ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM=<isolated>+ARABIC SHADDA+ARABIC KASRA
ARABIC LIGATURE SHADDA WITH KASRA MEDIAL FORM=<medial>+ARABIC SHADDA+ARABIC KASRA
ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM=<isolated>+ARABIC SHADDA+ARABIC KASRATAN
ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM=<isolated>+ARABIC SHADDA+ARABIC LETTER SUPERSCRIPT ALEF
ARABIC SHADDA ISOLATED FORM=<isolated>+ARABIC SHADDA
ARABIC SUKUN ISOLATED FORM=<isolated>+ARABIC SUKUN

New combining characters

COMBINING HOOK

Many characters are described as ``with hook´´ or ``with middle hook´´, but no combining form of this mark is encoded. This is probably because the position of the hook moves around a lot depending on which character is to receive it, and because there are a few different forms of hook, 3 of which are encoded separately and were considered above. The fact that the hook moves around should be seen as a rendering problem, easily solved by a repository of precomposed glyphs for the cases that are actually used.

If there was to be a COMBINING HOOK character, the characters that use it would be

CYRILLIC CAPITAL LETTER EN WITH HOOK=CYRILLIC CAPITAL LETTER EN+COMBINING HOOK
CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK=CYRILLIC CAPITAL LETTER GHE+COMBINING HOOK
CYRILLIC CAPITAL LETTER KA WITH HOOK=CYRILLIC CAPITAL LETTER KA+COMBINING HOOK
CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK=CYRILLIC CAPITAL LETTER PE+COMBINING HOOK
CYRILLIC SMALL LETTER EN WITH HOOK=CYRILLIC SMALL LETTER EN+COMBINING HOOK
CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK=CYRILLIC SMALL LETTER GHE+COMBINING HOOK
CYRILLIC SMALL LETTER KA WITH HOOK=CYRILLIC SMALL LETTER KA+COMBINING HOOK
CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK=CYRILLIC SMALL LETTER PE+COMBINING HOOK
EIGHTH NOTE=QUARTER NOTE+COMBINING HOOK
LATIN CAPITAL LETTER B WITH HOOK=LATIN CAPITAL LETTER B+COMBINING HOOK
LATIN CAPITAL LETTER C WITH HOOK=LATIN CAPITAL LETTER C+COMBINING HOOK
LATIN CAPITAL LETTER D WITH HOOK=LATIN CAPITAL LETTER D+COMBINING HOOK
LATIN CAPITAL LETTER F WITH HOOK=LATIN CAPITAL LETTER F+COMBINING HOOK
LATIN CAPITAL LETTER G WITH HOOK=LATIN CAPITAL LETTER G+COMBINING HOOK
LATIN CAPITAL LETTER K WITH HOOK=LATIN CAPITAL LETTER K+COMBINING HOOK
LATIN CAPITAL LETTER P WITH HOOK=LATIN CAPITAL LETTER P+COMBINING HOOK
LATIN CAPITAL LETTER T WITH HOOK=LATIN CAPITAL LETTER T+COMBINING HOOK
LATIN CAPITAL LETTER Y WITH HOOK=LATIN CAPITAL LETTER Y+COMBINING HOOK
LATIN SMALL LETTER B WITH HOOK=LATIN SMALL LETTER B+COMBINING HOOK
LATIN SMALL LETTER C WITH HOOK=LATIN SMALL LETTER C+COMBINING HOOK
LATIN SMALL LETTER D WITH HOOK=LATIN SMALL LETTER D+COMBINING HOOK
LATIN SMALL LETTER F WITH HOOK=LATIN SMALL LETTER F+COMBINING HOOK
LATIN SMALL LETTER G WITH HOOK=LATIN SMALL LETTER G+COMBINING HOOK
LATIN SMALL LETTER H WITH HOOK=LATIN SMALL LETTER H+COMBINING HOOK
LATIN SMALL LETTER K WITH HOOK=LATIN SMALL LETTER K+COMBINING HOOK
LATIN SMALL LETTER M WITH HOOK=LATIN SMALL LETTER M+COMBINING HOOK
LATIN SMALL LETTER P WITH HOOK=LATIN SMALL LETTER P+COMBINING HOOK
LATIN SMALL LETTER Q WITH HOOK=LATIN SMALL LETTER Q+COMBINING HOOK
LATIN SMALL LETTER T WITH HOOK=LATIN SMALL LETTER T+COMBINING HOOK
LATIN SMALL LETTER Y WITH HOOK=LATIN SMALL LETTER Y+COMBINING HOOK
LEFTWARDS ARROW WITH HOOK=LEFTWARDS ARROW+COMBINING HOOK
RIGHTWARDS ARROW WITH HOOK=RIGHTWARDS ARROW+COMBINING HOOK

It´s odd that although there is a LATIN SMALL LETTER HENG WITH HOOK, there is no LATIN SMALL LETTER HENG. It should be represented as a ligature of h and eng, and that gives us

LATIN SMALL LETTER HENG WITH HOOK=LATIN SMALL LETTER H+SEMANTIC LIGATURE+LATIN SMALL LETTER ENG+COMBINING HOOK

COMBINING CURL

The case of characters ``with curl´´ is similar to those ``with hook´´, in that the curl moves around depending on the character being modified. But if there was a combining curl, it would be used for 12 characters, if we also follow the principle of Occam´s Razor and include crossed-tail, belted, looped and closed characters in this set, as is justified by their visual appearance.

LATIN LETTER REVERSED ESH LOOP=LATIN SMALL LETTER ESH+SEMANTIC REVERSED+COMBINING CURL
LATIN SMALL LETTER C WITH CURL=LATIN SMALL LETTER C+COMBINING CURL
LATIN SMALL LETTER CLOSED OMEGA=GREEK SMALL LETTER OMEGA+COMBINING CURL
LATIN SMALL LETTER CLOSED OPEN E=LATIN SMALL LETTER OPEN E+COMBINING CURL
LATIN SMALL LETTER ESH WITH CURL=LATIN SMALL LETTER ESH+COMBINING CURL
LATIN SMALL LETTER EZH WITH CURL=LATIN SMALL LETTER EZH+COMBINING CURL
LATIN SMALL LETTER J WITH CROSSED-TAIL=LATIN SMALL LETTER J+COMBINING CURL
LATIN SMALL LETTER L WITH BELT=LATIN SMALL LETTER L+COMBINING CURL
LATIN SMALL LETTER Z WITH CURL=LATIN SMALL LETTER Z+COMBINING CURL
LEFTWARDS ARROW WITH LOOP=LEFTWARDS ARROW+COMBINING CURL
RIGHTWARDS ARROW WITH LOOP=RIGHTWARDS ARROW+COMBINING CURL
SCRIPT SMALL G=LATIN SMALL LETTER G+SEMANTIC SCRIPT+COMBINING CURL

SCRIPT SMALL G is here because the only difference between it and LATIN SMALL LETTER SCRIPT G (at least, as they appear in The Book) is the fact that the descender crosses itself.

COMBINING CYRILLIC DESCENDER

These characters are described as ``with descender´´. The visual appearance of the descender is variable.

CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER=CYRILLIC CAPITAL LETTER ABKHASIAN CHE+COMBINING CYRILLIC DESCENDER
CYRILLIC CAPITAL LETTER CHE WITH DESCENDER=CYRILLIC CAPITAL LETTER CHE+COMBINING CYRILLIC DESCENDER
CYRILLIC CAPITAL LETTER EN WITH DESCENDER=CYRILLIC CAPITAL LETTER EN+COMBINING CYRILLIC DESCENDER
CYRILLIC CAPITAL LETTER ES WITH DESCENDER=CYRILLIC CAPITAL LETTER ES+COMBINING CYRILLIC DESCENDER
CYRILLIC CAPITAL LETTER HA WITH DESCENDER=CYRILLIC CAPITAL LETTER HA+COMBINING CYRILLIC DESCENDER
CYRILLIC CAPITAL LETTER KA WITH DESCENDER=CYRILLIC CAPITAL LETTER KA+COMBINING CYRILLIC DESCENDER
CYRILLIC CAPITAL LETTER TE WITH DESCENDER=CYRILLIC CAPITAL LETTER TE+COMBINING CYRILLIC DESCENDER
CYRILLIC CAPITAL LETTER ZE WITH DESCENDER=CYRILLIC CAPITAL LETTER ZE+COMBINING CYRILLIC DESCENDER
CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER=CYRILLIC CAPITAL LETTER ZHE+COMBINING CYRILLIC DESCENDER
CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER=CYRILLIC SMALL LETTER ABKHASIAN CHE+COMBINING CYRILLIC DESCENDER
CYRILLIC SMALL LETTER CHE WITH DESCENDER=CYRILLIC SMALL LETTER CHE+COMBINING CYRILLIC DESCENDER
CYRILLIC SMALL LETTER EN WITH DESCENDER=CYRILLIC SMALL LETTER EN+COMBINING CYRILLIC DESCENDER
CYRILLIC SMALL LETTER ES WITH DESCENDER=CYRILLIC SMALL LETTER ES+COMBINING CYRILLIC DESCENDER
CYRILLIC SMALL LETTER HA WITH DESCENDER=CYRILLIC SMALL LETTER HA+COMBINING CYRILLIC DESCENDER
CYRILLIC SMALL LETTER KA WITH DESCENDER=CYRILLIC SMALL LETTER KA+COMBINING CYRILLIC DESCENDER
CYRILLIC SMALL LETTER TE WITH DESCENDER=CYRILLIC SMALL LETTER TE+COMBINING CYRILLIC DESCENDER
CYRILLIC SMALL LETTER ZE WITH DESCENDER=CYRILLIC SMALL LETTER ZE+COMBINING CYRILLIC DESCENDER
CYRILLIC SMALL LETTER ZHE WITH DESCENDER=CYRILLIC SMALL LETTER ZHE+COMBINING CYRILLIC DESCENDER

COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK

This can be used to build up the following character.

VERTICAL KANA REPEAT WITH VOICED SOUND MARK=VERTICAL KANA REPEAT MARK+COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK

COLUMN SEPARATOR

This character does the same job for table layout and HORIZONTAL TABULATION that LINE SEPARATOR does for vertical layout and LINE FEED, namely, defining a character to ``unambiguously represent this semantic´´. A renderer should arrange all lines within a paragraph (as delimited by PARAGRAPH SEPARATOR characters) that contain this character so that the columns so marked line up vertically. Apart from that, the character has zero width, so adjacent columns of the table will be contiguous. (They could still contain space characters though ...)

Tables defined in this way are logically unable to nest, so the complexity of a Unicode renderer that implements COLUMN SEPARATOR is much, much less than that of an H T M L browser, which might have to recursively format tables of tables of ... of tables. All it has to do is split the paragraph into lines and then the lines into columns, padding shorter columns out to the lengths of the longer ones.
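
The split-and-pad procedure can be sketched in a few lines of Python. Everything here is illustrative: COLUMN SEPARATOR has no code point, so a private-use character stands in for it; layout_table is a hypothetical name; and the sketch assumes a fixed-pitch display where every character is 1 cell wide.

```python
LINE_SEPARATOR = "\u2028"
COLUMN_SEPARATOR = "\uE000"  # hypothetical stand-in: a private-use code point

def layout_table(paragraph: str):
    """Split a paragraph into lines and the lines into columns, padding each
    column to the width of its longest cell. Returns the laid-out lines."""
    rows = [line.split(COLUMN_SEPARATOR)
            for line in paragraph.split(LINE_SEPARATOR)]
    # Only lines that contain the separator take part in the alignment.
    widths = {}
    for row in rows:
        if len(row) > 1:
            for i, cell in enumerate(row[:-1]):
                widths[i] = max(widths.get(i, 0), len(cell))
    out = []
    for row in rows:
        # The separator itself has zero width, so padded cells are contiguous;
        # the last cell of a line needs no padding.
        padded = [cell.ljust(widths[i]) for i, cell in enumerate(row[:-1])]
        out.append("".join(padded) + row[-1])
    return out

text = "a" + COLUMN_SEPARATOR + "bb" + LINE_SEPARATOR + "ccc" + COLUMN_SEPARATOR + "d"
print(layout_table(text))
```

There is no recursion anywhere: 1 pass measures the columns and 1 pass pads them, which is the whole of the saving over nested tables.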

NEGATIVE SPACE

When placed between 2 glyphs, this character causes them to be placed closer together than if they were simply written one after the other. The characters may or may not remain distinct: the spacing between them may be removed entirely, causing them to touch. There is no expectation that multiple uses of NEGATIVE SPACE can be used to move back through running text. This is not BACKSPACE!

The question of whether to prefer NEGATIVE SPACE over SEMANTIC AFTER is not very obvious. It is used where some extra ``jostling together´´ appears to be called for.

APPROXIMATELY EQUAL TO OR THE IMAGE OF=DOT ABOVE+NEGATIVE SPACE+EQUALS SIGN+NEGATIVE SPACE+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+DOT ABOVE+POP DIRECTIONAL FORMATTING
BETWEEN=LEFT PARENTHESIS+NEGATIVE SPACE+RIGHT PARENTHESIS
COLON EQUALS=COLON+NEGATIVE SPACE+EQUALS SIGN
DOUBLE INTEGRAL=INTEGRAL+NEGATIVE SPACE+INTEGRAL
DOUBLE SUBSET=SUBSET OF+NEGATIVE SPACE+SUBSET OF
DOUBLE SUPERSET=SUPERSET OF+NEGATIVE SPACE+SUPERSET OF
IMAGE OF OR APPROXIMATELY EQUAL TO=ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+DOT ABOVE+NEGATIVE SPACE+EQUALS SIGN+NEGATIVE SPACE+DOT ABOVE
MUCH GREATER-THAN=GREATER-THAN SIGN+NEGATIVE SPACE+GREATER-THAN SIGN
MUCH LESS-THAN=LESS-THAN SIGN+NEGATIVE SPACE+LESS-THAN SIGN
RIGHTWARDS ARROW TO BAR=RIGHTWARDS ARROW+NEGATIVE SPACE+VERTICAL STROKE
TRIPLE INTEGRAL=INTEGRAL+NEGATIVE SPACE+INTEGRAL+NEGATIVE SPACE+INTEGRAL
VERY MUCH GREATER-THAN=GREATER-THAN SIGN+NEGATIVE SPACE+GREATER-THAN SIGN+NEGATIVE SPACE+GREATER-THAN SIGN
VERY MUCH LESS-THAN=LESS-THAN SIGN+NEGATIVE SPACE+LESS-THAN SIGN+NEGATIVE SPACE+LESS-THAN SIGN

It seems sensible to provide the following as well, where the spacing goes in the other direction.

HORIZONTAL ELLIPSIS=FULL STOP+THIN SPACE+FULL STOP+THIN SPACE+FULL STOP
MIDLINE HORIZONTAL ELLIPSIS=MIDDLE DOT+THIN SPACE+MIDDLE DOT+THIN SPACE+MIDDLE DOT
TWO DOT LEADER=FULL STOP+THIN SPACE+FULL STOP

Easy to do algorithmically.

Unlikely to be very productive in forming new characters, as it´s easier to just write a character twice, and there is visually little difference. But has ``prior art´´ in TEX, where it is represented as \<.

If used but not recognised, quite likely to cause the resulting text to be misinterpreted.

Currency symbols

Since we know that the currency symbols were invented as typographic variants of existing characters, it seems a good idea to encode this. Then (a) software with no glyph can generate an acceptable alternative and (b) when a new currency is invented, a symbol can be given to it without needing to go through a standardisation process. I suggest that

POUND SIGN=LATIN CAPITAL LETTER L+SEMANTIC SCRIPT+COMBINING SHORT STROKE OVERLAY

is historically right, right by current usage, and gives a result that will be understandable to an English national if there is no better glyph available (namely, `L´). Other currency symbols should be treated the same way (including the symbol for Euro which looks to me like LATIN SMALL LETTER C+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING---no doubt, any such suggestion would give the European Commission a collective fit ...).

CENT SIGN=LATIN SMALL LETTER C+COMBINING LONG VERTICAL LINE OVERLAY
COLON SIGN=LATIN CAPITAL LETTER C+SEMANTIC OVERPRINT+START GROUP+SOLIDUS+SOLIDUS+POP DIRECTIONAL FORMATTING
CRUZEIRO SIGN=LATIN CAPITAL LETTER C+SEMANTIC OVERPRINT+LATIN SMALL LETTER R
DOLLAR SIGN=LATIN CAPITAL LETTER S+COMBINING LONG VERTICAL LINE OVERLAY
DONG SIGN=LATIN SMALL LETTER D+COMBINING LOW LINE+COMBINING SHORT STROKE OVERLAY
EURO SIGN=LATIN SMALL LETTER C+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING
EURO-CURRENCY SIGN=LATIN CAPITAL LETTER C+SEMANTIC LIGATURE+LATIN CAPITAL LETTER E
FRENCH FRANC SIGN=LATIN CAPITAL LETTER F+SEMANTIC OVERPRINT+LATIN SMALL LETTER R
LIRA SIGN=LATIN CAPITAL LETTER L+SEMANTIC SCRIPT+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING
MILL SIGN=LATIN SMALL LETTER M+COMBINING SHORT SOLIDUS OVERLAY
NAIRA SIGN=LATIN CAPITAL LETTER N+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING
PESETA SIGN=LATIN CAPITAL LETTER P+COMBINING SHORT STROKE OVERLAY
RUPEE SIGN=LATIN CAPITAL LETTER R+COMBINING SHORT STROKE OVERLAY+SEMANTIC LIGATURE+LATIN SMALL LETTER S
THAI CURRENCY SYMBOL BAHT=LATIN CAPITAL LETTER B+COMBINING LONG SOLIDUS OVERLAY
WON SIGN=LATIN CAPITAL LETTER W+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING
YEN SIGN=LATIN CAPITAL LETTER Y+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING

It seems futile to deny history and claim that these are fully-formed characters in their own right. This doesn´t deny anyone the right to design specialised glyphs, if they wish. There is no reason to change current practice, just to systematise it.
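As an illustration of point (a), here is a sketch of how software with no glyph for a currency symbol might derive a plain-text alternative from these decompositions. It assumes one particular fallback policy, which is mine and not part of the proposal: keep the components that are ordinary letters and drop everything else. The proposed SEMANTIC characters have no code points, so unicodedata.lookup fails for them and they are skipped. The names CURRENCY and fallback_text are hypothetical.

```python
import unicodedata

# A few entries copied from the list above.
CURRENCY = {
    "POUND SIGN": "LATIN CAPITAL LETTER L+SEMANTIC SCRIPT+COMBINING SHORT STROKE OVERLAY",
    "DOLLAR SIGN": "LATIN CAPITAL LETTER S+COMBINING LONG VERTICAL LINE OVERLAY",
    "CRUZEIRO SIGN": "LATIN CAPITAL LETTER C+SEMANTIC OVERPRINT+LATIN SMALL LETTER R",
}


def fallback_text(name: str) -> str:
    """Keep only the letter components of a currency decomposition."""
    letters = []
    for part in CURRENCY[name].split("+"):
        try:
            char = unicodedata.lookup(part)
        except KeyError:
            continue  # a proposed character with no code point: ignore it
        if unicodedata.category(char).startswith("L"):
            letters.append(char)  # combining overlays (category Mn) are dropped
    return "".join(letters)
```

Under this policy POUND SIGN falls back to `L´, as suggested above, and CRUZEIRO SIGN to `Cr´.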

Spacing characters

According to the Unicode Standard, a rendering agent is allowed some discretion in its approach to line breaking. For instance, it may choose to break lines at HYPHEN; but it doesn´t have to.

Since such a lot is specified about line breaking, it would be interesting to specify it completely.

Let a space character have one or more of the following properties:

A certain natural width.
It may allow line breaks.
It may be flexible, in the sense that extra space can be inserted there for justification.
It may be a word-constituent.

To be able to analyse spacing characters in a way consistent with the rest of the Atomic Theory, we need to ascribe these properties to certain characters, and build the rest up from them.

The most natural choice of atoms seems to be:

THIN SPACE: has a width of 1/6 en, but is otherwise just like any other character (no special line-breaking property, not flexible).
ZERO WIDTH SPACE: marks a place where a line may be broken.
HAIR SPACE: marks a place where justification may be inserted.
ZERO WIDTH JOINER: continues a word: the characters before and after it (if letters) are not considered to be in final or initial position.

Then we can express all other spacing characters in terms of these. We go along with TEX by specifying that EM QUAD (\qquad) and EN QUAD (\quad) provide spaces that are legitimate breakpoints within a paragraph, while EN SPACE (\enspace) and THIN SPACE (\thinspace) produce space that cannot cause a break. TEX´s \enskip is our EN QUAD+HAIR SPACE, and

EM QUAD=EM SPACE+ZERO WIDTH SPACE
EM SPACE=EN SPACE+EN SPACE
EN QUAD=EN SPACE+ZERO WIDTH SPACE
EN SPACE=FOUR-PER-EM SPACE+FOUR-PER-EM SPACE
FOUR-PER-EM SPACE=THIN SPACE+THIN SPACE+THIN SPACE
HYPHEN=NON-BREAKING HYPHEN+ZERO WIDTH SPACE
SIX-PER-EM SPACE=THIN SPACE+THIN SPACE
SPACE=NO-BREAK SPACE+ZERO WIDTH SPACE+HAIR SPACE
THREE-PER-EM SPACE=SIX-PER-EM SPACE+SIX-PER-EM SPACE
ZERO WIDTH NON-JOINER=ZERO WIDTH NO-BREAK SPACE

Since lines are only broken at ZERO WIDTH SPACE, we also have

TIBETAN MARK DELIMITER TSHEG BSTAR=
TIBETAN MARK INTERSYLLABIC TSHEG=TIBETAN MARK DELIMITER TSHEG BSTAR+ZERO WIDTH SPACE

We also have to delete a few existing decompositions for this to work.

HAIR SPACE=
NO-BREAK SPACE=
NON-BREAKING HYPHEN=
THIN SPACE=
ZERO WIDTH JOINER=
ZERO WIDTH SPACE=
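To check that the spacing decompositions hang together, they can be expanded recursively down to the atoms and the properties read off. The sketch below assumes that width can be counted in THIN SPACE units (six to the en), that a character is breakable exactly when its expansion contains ZERO WIDTH SPACE, and flexible exactly when it contains HAIR SPACE. The names RULES, expand and properties are hypothetical.

```python
# Decompositions copied from the lists above; anything not listed is an atom.
RULES = {
    "EM QUAD": ["EM SPACE", "ZERO WIDTH SPACE"],
    "EM SPACE": ["EN SPACE", "EN SPACE"],
    "EN QUAD": ["EN SPACE", "ZERO WIDTH SPACE"],
    "EN SPACE": ["FOUR-PER-EM SPACE", "FOUR-PER-EM SPACE"],
    "FOUR-PER-EM SPACE": ["THIN SPACE"] * 3,
    "HYPHEN": ["NON-BREAKING HYPHEN", "ZERO WIDTH SPACE"],
    "SIX-PER-EM SPACE": ["THIN SPACE"] * 2,
    "SPACE": ["NO-BREAK SPACE", "ZERO WIDTH SPACE", "HAIR SPACE"],
    "THREE-PER-EM SPACE": ["SIX-PER-EM SPACE"] * 2,
}


def expand(name: str) -> list[str]:
    """Expand a character name into its sequence of atoms."""
    if name not in RULES:
        return [name]
    atoms = []
    for part in RULES[name]:
        atoms.extend(expand(part))
    return atoms


def properties(name: str) -> dict:
    """Read width, breakability and flexibility off the atoms."""
    atoms = expand(name)
    return {
        "thin_units": atoms.count("THIN SPACE"),   # width in units of 1/6 en
        "breakable": "ZERO WIDTH SPACE" in atoms,
        "flexible": "HAIR SPACE" in atoms,
    }
```

For example, properties("EM QUAD") reports 12 thin units and a legal breakpoint, while properties("EN SPACE") reports 6 thin units and no breakpoint. The width of NO-BREAK SPACE is left to the font, so SPACE contributes no thin units here.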

HAIR SPACE can also be used in tables (using the COLUMN SEPARATOR character) to mark places where extra spacing can be added. This can provide control over right justification or centring of cell contents.
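One possible reading of this, sketched below, is that when a cell is narrower than its column the extra width is shared out among the HAIR SPACE positions in the cell. The policy for a cell with no HAIR SPACE (pad at the end) is my assumption, as is treating HAIR SPACE itself as having no width; pad_cell is a hypothetical name.

```python
HAIR_SPACE = "\u200A"


def pad_cell(cell: str, width: int) -> str:
    """Pad `cell` to `width`, inserting the extra spaces at HAIR SPACE positions."""
    parts = cell.split(HAIR_SPACE)
    extra = max(width - sum(len(part) for part in parts), 0)
    slots = len(parts) - 1
    if slots == 0:
        return cell + " " * extra  # assumption: no markers means pad on the right
    share, remainder = divmod(extra, slots)
    padded = parts[0]
    for index, part in enumerate(parts[1:]):
        # Earlier slots absorb the remainder when the division is uneven.
        padded += " " * (share + (1 if index < remainder else 0)) + part
    return padded
```

A HAIR SPACE before the contents then right-justifies the cell, and one on each side centres it.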

The widths of NO-BREAK SPACE (the typical interword gap), FIGURE SPACE (a digit) and PUNCTUATION SPACE (a FULL STOP) are not given here as they are at the separate discretion of a font designer, rather than being a certain specific width (even in ens). We follow TEX (``\phantom´´) once again and invent a (*)SEMANTIC PHANTOM, which when applied to a glyph causes all the ink to be removed---but without changing the bounding box. Of course, the main reason for providing this is to provide another tool for those writing complex documents using plain text: it allows fine control over certain sorts of layout.

FIGURE SPACE=DIGIT ZERO+SEMANTIC PHANTOM
PUNCTUATION SPACE=FULL STOP+SEMANTIC PHANTOM

(Unicode provides compatibility decompositions that assume they are all the same width.)
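The effect of SEMANTIC PHANTOM can be modelled very simply if a glyph is taken to be an advance width plus a flag saying whether any ink is drawn. The Glyph class and the widths used here are purely illustrative assumptions, not anything a real font engine defines.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Glyph:
    char: str
    advance: float      # width of the bounding box, in ems (illustrative value)
    inked: bool = True  # False once SEMANTIC PHANTOM has been applied


def phantom(glyph: Glyph) -> Glyph:
    """Remove the ink but leave the bounding box alone."""
    return replace(glyph, inked=False)


def line_width(glyphs: list[Glyph]) -> float:
    """Total advance of a run of glyphs; ink plays no part."""
    return sum(glyph.advance for glyph in glyphs)
```

FIGURE SPACE is then phantom applied to the glyph for DIGIT ZERO, so a column of figures padded with it lines up whatever width the font gives its digits.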

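So, an ordinary SPACE is NO-BREAK SPACE+ZERO WIDTH SPACE+HAIR SPACE, which means that it has the width of the typical interword gap, that a line may be broken there, and that extra space may be inserted there for justification.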

Who´d have thought it was so complicated?

Utterly useless characters

This sounds a bit harsh, but some of the existing characters are present purely for compatibility purposes, and are so specialised that a general-purpose renderer should never see them, can make no real sense of them, and so needs no glyphs for them.

So we just throw them away (while preserving them in user data, of course).

BOTTOM HALF INTEGRAL=REPLACEMENT CHARACTER
COMBINING DOUBLE TILDE LEFT HALF=SEMANTIC OVERPRINT+REPLACEMENT CHARACTER
COMBINING DOUBLE TILDE RIGHT HALF=SEMANTIC OVERPRINT+REPLACEMENT CHARACTER
COMBINING LIGATURE LEFT HALF=SEMANTIC OVERPRINT+REPLACEMENT CHARACTER
COMBINING LIGATURE RIGHT HALF=SEMANTIC OVERPRINT+REPLACEMENT CHARACTER
LOWER HALF INVERSE WHITE CIRCLE=REPLACEMENT CHARACTER
TOP HALF INTEGRAL=REPLACEMENT CHARACTER
UPPER HALF INVERSE WHITE CIRCLE=REPLACEMENT CHARACTER
VERTICAL KANA REPEAT MARK LOWER HALF=REPLACEMENT CHARACTER
VERTICAL KANA REPEAT MARK UPPER HALF=REPLACEMENT CHARACTER
VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF=REPLACEMENT CHARACTER
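The distinction between throwing a character away for display and preserving it in user data can be sketched as follows. Only two of the characters listed above are included, and the names USELESS and display_form are hypothetical; the stored text is never modified, only the copy handed to the renderer.

```python
import unicodedata

# Two of the characters from the list above, looked up by name.
USELESS = {
    unicodedata.lookup("TOP HALF INTEGRAL"),
    unicodedata.lookup("BOTTOM HALF INTEGRAL"),
}
REPLACEMENT_CHARACTER = "\uFFFD"


def display_form(stored: str) -> str:
    """Return the text as a general-purpose renderer would see it."""
    return "".join(
        REPLACEMENT_CHARACTER if char in USELESS else char
        for char in stored
    )
```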

Hebrew decomposition

Most of the Hebrew accents and points are easily built up from other characters.

HEBREW ACCENT DEHI=SEMANTIC BEFORE+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+START GROUP+LOWER LEFT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING+POP DIRECTIONAL FORMATTING
HEBREW ACCENT GERESH MUQDAM=SEMANTIC BEFORE+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+START GROUP+UPPER LEFT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING+POP DIRECTIONAL FORMATTING
HEBREW ACCENT GERESH=SEMANTIC ABOVE+START GROUP+UPPER LEFT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW ACCENT GERSHAYIM=SEMANTIC BEFORE+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+START GROUP+START GROUP+UPPER LEFT QUADRANT CIRCULAR ARC+UPPER LEFT QUADRANT CIRCULAR ARC+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING+POP DIRECTIONAL FORMATTING
HEBREW ACCENT ILUY=SEMANTIC ABOVE+START GROUP+RIGHT FLOOR+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW ACCENT MAHAPAKH=SEMANTIC BELOW+START GROUP+LESS-THAN SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW ACCENT MERKHA KEFULA=SEMANTIC BELOW+START GROUP+START GROUP+LOWER RIGHT QUADRANT CIRCULAR ARC+LOWER RIGHT QUADRANT CIRCULAR ARC+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW ACCENT MERKHA=SEMANTIC BELOW+START GROUP+LOWER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW ACCENT MUNAH=SEMANTIC BELOW+START GROUP+RIGHT FLOOR+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW ACCENT OLE=SEMANTIC ABOVE+START GROUP+LESS-THAN SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW ACCENT PASHTA=SEMANTIC AFTER+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+START GROUP+UPPER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING+POP DIRECTIONAL FORMATTING
HEBREW ACCENT QADMA=SEMANTIC ABOVE+START GROUP+UPPER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW ACCENT REVIA=SEMANTIC ABOVE+START GROUP+BLACK DIAMOND+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW ACCENT SEGOL=COMBINING DIAERESIS+COMBINING DOT ABOVE
HEBREW ACCENT TEVIR=SEMANTIC BELOW+START GROUP+START GROUP+LOWER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC OVERPRINT+MIDDLE DOT+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW ACCENT TIPEHA=SEMANTIC BELOW+START GROUP+LOWER LEFT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW ACCENT YETIV=SEMANTIC BEFORE+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+START GROUP+LESS-THAN SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING+POP DIRECTIONAL FORMATTING
HEBREW ACCENT ZAQEF GADOL=SEMANTIC ABOVE+START GROUP+START GROUP+VERTICAL STROKE+SEMANTIC BEFORE+COLON+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW ACCENT ZAQEF QATAN=COMBINING DOT ABOVE+COMBINING DOT ABOVE
HEBREW ACCENT ZARQA=SEMANTIC ABOVE+INVERTED LAZY S
HEBREW ACCENT ZINOR=SEMANTIC AFTER+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+INVERTED LAZY S+POP DIRECTIONAL FORMATTING
HEBREW MARK MASORA CIRCLE=COMBINING RING ABOVE
HEBREW MARK UPPER DOT=COMBINING DOT ABOVE
HEBREW POINT DAGESH OR MAPIQ=SEMANTIC OVERPRINT+MIDDLE DOT
HEBREW POINT HATAF PATAH=SEMANTIC BELOW+START GROUP+MACRON+SEMANTIC BEFORE+COLON+POP DIRECTIONAL FORMATTING
HEBREW POINT HATAF QAMATS=SEMANTIC BELOW+START GROUP+DOWN TACK+SEMANTIC SMALL+SEMANTIC BEFORE+COLON+POP DIRECTIONAL FORMATTING
HEBREW POINT HATAF SEGOL=SEMANTIC BELOW+START GROUP+DIAERESIS+COMBINING DOT BELOW+SEMANTIC BEFORE+COLON+POP DIRECTIONAL FORMATTING
HEBREW POINT HIRIQ=COMBINING DOT BELOW
HEBREW POINT HOLAM=SEMANTIC AFTER+DOT ABOVE
HEBREW POINT JUDEO-SPANISH VARIKA=COMBINING BREVE BELOW
HEBREW POINT METEG=COMBINING VERTICAL LINE BELOW
HEBREW POINT PATAH=COMBINING MACRON BELOW
HEBREW POINT QAMATS=COMBINING DOWN TACK BELOW
HEBREW POINT QUBUTS=SEMANTIC BELOW+START GROUP+DOWN RIGHT DIAGONAL ELLIPSIS+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HEBREW POINT RAFE=COMBINING MACRON
HEBREW POINT SEGOL=COMBINING DIAERESIS BELOW+COMBINING DOT BELOW
HEBREW POINT SHEVA=COMBINING DOT BELOW+COMBINING DOT BELOW
HEBREW POINT SHIN DOT=SEMANTIC BEFORE+DOT ABOVE
HEBREW POINT SIN DOT=SEMANTIC AFTER+DOT ABOVE
HEBREW POINT TSERE=COMBINING DIAERESIS BELOW
HEBREW PUNCTUATION GERESH=PRIME
HEBREW PUNCTUATION GERSHAYIM=DOUBLE PRIME
HEBREW PUNCTUATION MAQAF=MACRON
HEBREW PUNCTUATION PASEQ=VERTICAL LINE
HEBREW PUNCTUATION SOF PASUQ=COLON
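Decompositions like these are much easier to read, and to check, once the START GROUP and POP DIRECTIONAL FORMATTING pairs have been turned into explicit nesting. The sketch below does only that bracket matching; it makes no attempt to interpret the semantics, and it assumes the NAME+NAME notation used in these lists. The function name parse is hypothetical.

```python
START = "START GROUP"
POP = "POP DIRECTIONAL FORMATTING"


def parse(decomposition: str) -> list:
    """Turn a flat NAME+NAME+... string into nested lists, one per group."""
    root: list = []
    stack = [root]
    for token in decomposition.split("+"):
        if token == START:
            group: list = []
            stack[-1].append(group)  # the group is an element of its parent
            stack.append(group)
        elif token == POP:
            if len(stack) == 1:
                raise ValueError("POP DIRECTIONAL FORMATTING without START GROUP")
            stack.pop()
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("START GROUP without POP DIRECTIONAL FORMATTING")
    return root
```

Applied to the decomposition of HEBREW ACCENT GERESH, it gives SEMANTIC ABOVE followed by a single group holding the arc and SEMANTIC SMALL. Because it raises an error on an unmatched group, it can also be run over a whole list as a consistency check.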

Arabic decomposition

So many Arabic characters are compositions involving the 29 letters of the alphabet that we may as well just take them in order and see what happens.

ARABIC DAMMA=COMBINING COMMA ABOVE
ARABIC DAMMATAN=ARABIC DAMMA+ARABIC DAMMA
ARABIC EMPTY CENTRE HIGH STOP=COMBINING RING ABOVE
ARABIC EMPTY CENTRE LOW STOP=COMBINING RING BELOW
ARABIC FATHA=COMBINING ACUTE ACCENT
ARABIC FATHATAN=ARABIC FATHA+ARABIC FATHA
ARABIC FULL STOP=NON-BREAKING HYPHEN
ARABIC KASRA=COMBINING GRAVE ACCENT
ARABIC KASRATAN=ARABIC KASRA+ARABIC KASRA
ARABIC LETTER AIN WITH THREE DOTS ABOVE=ARABIC LETTER AIN+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER ALEF WITH HAMZA ABOVE=ARABIC LETTER ALEF+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER ALEF WITH HAMZA BELOW=ARABIC LETTER ALEF+SEMANTIC BELOW+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER ALEF WITH MADDA ABOVE=ARABIC LETTER ALEF+ARABIC SMALL HIGH MADDA
ARABIC LETTER ALEF WITH WAVY HAMZA ABOVE=ARABIC LETTER ALEF+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+SEMANTIC VARIANT+POP DIRECTIONAL FORMATTING
ARABIC LETTER ALEF WITH WAVY HAMZA BELOW=ARABIC LETTER ALEF+SEMANTIC BELOW+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+SEMANTIC VARIANT+POP DIRECTIONAL FORMATTING
ARABIC LETTER BEEH=ARABIC LETTER DOTLESS BEH+COMBINING DOT BELOW+COMBINING DOT BELOW
ARABIC LETTER BEH=ARABIC LETTER DOTLESS BEH+COMBINING DOT BELOW
ARABIC LETTER BEHEH=ARABIC LETTER DOTLESS BEH+COMBINING DIAERESIS BELOW+COMBINING DIAERESIS BELOW
ARABIC LETTER DAD=ARABIC LETTER SAD+COMBINING DOT ABOVE
ARABIC LETTER DAHAL=ARABIC LETTER DAL+COMBINING DIAERESIS
ARABIC LETTER DAL WITH DOT BELOW AND SMALL TAH=ARABIC LETTER DAL+COMBINING DOT BELOW+SEMANTIC ABOVE+START GROUP+ARABIC LETTER TAH+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER DAL WITH DOT BELOW=ARABIC LETTER DAL+COMBINING DOT BELOW
ARABIC LETTER DAL WITH FOUR DOTS ABOVE=ARABIC LETTER DAL+COMBINING DIAERESIS+COMBINING DIAERESIS
ARABIC LETTER DAL WITH RING=ARABIC LETTER DAL+COMBINING RING BELOW
ARABIC LETTER DAL WITH THREE DOTS ABOVE DOWNWARDS=ARABIC LETTER DAL+COMBINING DOT ABOVE+COMBINING DIAERESIS
ARABIC LETTER DDAHAL=ARABIC LETTER DAL+COMBINING DIAERESIS BELOW
ARABIC LETTER DDAL=ARABIC LETTER DAL+SEMANTIC ABOVE+START GROUP+ARABIC LETTER TAH+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER DUL=ARABIC LETTER DAL+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER DYEH=ARABIC LETTER HAH+SEMANTIC OVERPRINT+COLON
ARABIC LETTER E=ARABIC LETTER FARSI YEH+COMBINING DOT BELOW+COMBINING DOT BELOW
ARABIC LETTER FEH WITH DOT BELOW=ARABIC LETTER FEH+COMBINING DOT BELOW
ARABIC LETTER FEH WITH DOT MOVED BELOW=ARABIC LETTER DOTLESS FEH+COMBINING DOT BELOW
ARABIC LETTER FEH WITH THREE DOTS BELOW=ARABIC LETTER DOTLESS FEH+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW
ARABIC LETTER FEH=ARABIC LETTER DOTLESS FEH+COMBINING DOT ABOVE
ARABIC LETTER GAF WITH RING=ARABIC LETTER GAF+COMBINING RING OVERLAY
ARABIC LETTER GAF WITH THREE DOTS ABOVE=ARABIC LETTER GAF+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER GAF WITH TWO DOTS BELOW=ARABIC LETTER GAF+COMBINING DIAERESIS BELOW
ARABIC LETTER GAF=ARABIC LETTER KEHEH+ARABIC FATHA
ARABIC LETTER GHAIN=ARABIC LETTER AIN+COMBINING DOT ABOVE
ARABIC LETTER GUEH=ARABIC LETTER GAF+COMBINING DOT BELOW+COMBINING DOT BELOW
ARABIC LETTER HAH WITH HAMZA ABOVE=ARABIC LETTER HAH+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER HAH WITH THREE DOTS ABOVE=ARABIC LETTER HAH+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER HAH WITH TWO DOTS VERTICAL ABOVE=ARABIC LETTER HAH+COMBINING DOT ABOVE+COMBINING DOT ABOVE
ARABIC LETTER HEH DOACHASHMEE=ARABIC LETTER HEH+SEMANTIC VARIANT
ARABIC LETTER HEH GOAL WITH HAMZA ABOVE=ARABIC LETTER HEH GOAL+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER HEH WITH YEH ABOVE=ARABIC LETTER AE+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER HIGH HAMZA ALEF=ARABIC LETTER ALEF+SEMANTIC BEFORE+ARABIC LETTER HIGH HAMZA
ARABIC LETTER HIGH HAMZA WAW=ARABIC LETTER WAW+SEMANTIC BEFORE+ARABIC LETTER HIGH HAMZA
ARABIC LETTER HIGH HAMZA YEH=ARABIC LETTER YEH+SEMANTIC BEFORE+ARABIC LETTER HIGH HAMZA
ARABIC LETTER HIGH HAMZA=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER JEEM=ARABIC LETTER HAH+SEMANTIC OVERPRINT+MIDDLE DOT
ARABIC LETTER JEH=ARABIC LETTER REH+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER KAF WITH DOT ABOVE=ARABIC LETTER KAF+COMBINING DOT ABOVE
ARABIC LETTER KAF WITH RING=ARABIC LETTER KEHEH+COMBINING RING OVERLAY
ARABIC LETTER KAF WITH THREE DOTS BELOW=ARABIC LETTER KAF+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW
ARABIC LETTER KHAH=ARABIC LETTER HAH+COMBINING DOT ABOVE
ARABIC LETTER KIRGHIZ OE=ARABIC LETTER WAW+COMBINING SHORT STROKE OVERLAY
ARABIC LETTER KIRGHIZ YU=ARABIC LETTER WAW+COMBINING CIRCUMFLEX ACCENT
ARABIC LETTER LAM WITH DOT ABOVE=ARABIC LETTER LAM+COMBINING DOT ABOVE
ARABIC LETTER LAM WITH SMALL V=ARABIC LETTER LAM+COMBINING CARON
ARABIC LETTER LAM WITH THREE DOTS ABOVE=ARABIC LETTER LAM+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER NG=ARABIC LETTER KAF+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER NGOEH=ARABIC LETTER GAF+COMBINING DIAERESIS
ARABIC LETTER NOON WITH RING=ARABIC LETTER NOON+COMBINING RING BELOW
ARABIC LETTER NOON WITH THREE DOTS ABOVE=ARABIC LETTER NOON GHUNNA+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER NOON=ARABIC LETTER NOON GHUNNA+COMBINING DOT ABOVE
ARABIC LETTER NYEH=ARABIC LETTER HAH+SEMANTIC OVERPRINT+START GROUP+MIDDLE DOT+MIDDLE DOT+POP DIRECTIONAL FORMATTING
ARABIC LETTER OE=ARABIC LETTER WAW+COMBINING CARON
ARABIC LETTER PEH=ARABIC LETTER DOTLESS BEH+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW
ARABIC LETTER PEHEH=ARABIC LETTER DOTLESS FEH+COMBINING DIAERESIS+COMBINING DIAERESIS
ARABIC LETTER QAF WITH DOT ABOVE=ARABIC LETTER DOTLESS QAF+COMBINING DOT ABOVE
ARABIC LETTER QAF WITH THREE DOTS ABOVE=ARABIC LETTER DOTLESS QAF+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER QAF=ARABIC LETTER DOTLESS QAF+COMBINING DIAERESIS
ARABIC LETTER REH WITH DOT BELOW AND DOT ABOVE=ARABIC LETTER REH+COMBINING DOT BELOW+SEMANTIC OVERPRINT+MIDDLE DOT
ARABIC LETTER REH WITH DOT BELOW=ARABIC LETTER REH+COMBINING DOT BELOW
ARABIC LETTER REH WITH FOUR DOTS ABOVE=ARABIC LETTER REH+COMBINING DIAERESIS+COMBINING DIAERESIS
ARABIC LETTER REH WITH RING=ARABIC LETTER REH+COMBINING RING BELOW
ARABIC LETTER REH WITH SMALL V BELOW=ARABIC LETTER REH+COMBINING CARON BELOW
ARABIC LETTER REH WITH SMALL V=ARABIC LETTER REH+COMBINING CARON
ARABIC LETTER REH WITH TWO DOTS ABOVE=ARABIC LETTER REH+COMBINING DIAERESIS
ARABIC LETTER RNOON=ARABIC LETTER NOON GHUNNA+SEMANTIC ABOVE+START GROUP+ARABIC LETTER TAH+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER RREH=ARABIC LETTER REH+SEMANTIC ABOVE+START GROUP+ARABIC LETTER TAH+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER SAD WITH THREE DOTS ABOVE=ARABIC LETTER SAD+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER SAD WITH TWO DOTS BELOW=ARABIC LETTER SAD+COMBINING DIAERESIS BELOW
ARABIC LETTER SEEN WITH DOT BELOW AND DOT ABOVE=ARABIC LETTER SEEN+COMBINING DOT ABOVE+COMBINING DOT BELOW
ARABIC LETTER SEEN WITH THREE DOTS BELOW AND THREE DOTS ABOVE=ARABIC LETTER SEEN+COMBINING DIAERESIS+COMBINING DOT ABOVE+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW
ARABIC LETTER SEEN WITH THREE DOTS BELOW=ARABIC LETTER SEEN+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW
ARABIC LETTER SHEEN=ARABIC LETTER SEEN+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER SUPERSCRIPT ALEF=SEMANTIC ABOVE+START GROUP+ARABIC LETTER ALEF+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER SWASH KAF=ARABIC LETTER KEHEH+SEMANTIC VARIANT
ARABIC LETTER TAH WITH THREE DOTS ABOVE=ARABIC LETTER TAH+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER TCHEH=ARABIC LETTER HAH+SEMANTIC OVERPRINT+START GROUP+START GROUP+MIDDLE DOT+MIDDLE DOT+POP DIRECTIONAL FORMATTING+SEMANTIC BELOW+MIDDLE DOT+POP DIRECTIONAL FORMATTING
ARABIC LETTER TCHEHEH=ARABIC LETTER HAH+SEMANTIC OVERPRINT+START GROUP+COLON+COLON+POP DIRECTIONAL FORMATTING
ARABIC LETTER TEH MARBUTA GOAL=ARABIC LETTER HEH GOAL+COMBINING DIAERESIS
ARABIC LETTER TEH MARBUTA=ARABIC LETTER AE+COMBINING DIAERESIS
ARABIC LETTER TEH WITH RING=ARABIC LETTER TEH+COMBINING RING BELOW
ARABIC LETTER TEH WITH THREE DOTS ABOVE DOWNWARDS=ARABIC LETTER DOTLESS BEH+COMBINING DOT ABOVE+COMBINING DIAERESIS
ARABIC LETTER TEH=ARABIC LETTER DOTLESS BEH+COMBINING DIAERESIS
ARABIC LETTER TEHEH=ARABIC LETTER DOTLESS BEH+COMBINING DIAERESIS+COMBINING DIAERESIS
ARABIC LETTER THAL=ARABIC LETTER DAL+COMBINING DOT ABOVE
ARABIC LETTER THEH=ARABIC LETTER DOTLESS BEH+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER TTEH=ARABIC LETTER DOTLESS BEH+SEMANTIC ABOVE+START GROUP+ARABIC LETTER TAH+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER TTEHEH=ARABIC LETTER DOTLESS BEH+COMBINING DOT ABOVE+COMBINING DOT ABOVE
ARABIC LETTER U WITH HAMZA ABOVE=ARABIC LETTER U+SEMANTIC BEFORE+ARABIC LETTER HIGH HAMZA
ARABIC LETTER U=ARABIC LETTER WAW+ARABIC DAMMA
ARABIC LETTER VE=ARABIC LETTER WAW+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER VEH=ARABIC LETTER DOTLESS FEH+COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC LETTER WAW WITH HAMZA ABOVE=ARABIC LETTER WAW+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER WAW WITH RING=ARABIC LETTER WAW+COMBINING RING OVERLAY
ARABIC LETTER WAW WITH TWO DOTS ABOVE=ARABIC LETTER WAW+COMBINING DIAERESIS
ARABIC LETTER YEH BARREE WITH HAMZA ABOVE=ARABIC LETTER YEH BARREE+SEMANTIC VARIANT+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER YEH BARREE=ARABIC LETTER FARSI YEH+SEMANTIC VARIANT
ARABIC LETTER YEH WITH HAMZA ABOVE=ARABIC LETTER YEH+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC LETTER YEH WITH SMALL V=ARABIC LETTER FARSI YEH+COMBINING CARON
ARABIC LETTER YEH WITH TAIL=ARABIC LETTER FARSI YEH+COMBINING HOOK
ARABIC LETTER YEH WITH THREE DOTS BELOW=ARABIC LETTER FARSI YEH+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW
ARABIC LETTER YEH=ARABIC LETTER ALEF MAKSURA+COMBINING DIAERESIS BELOW
ARABIC LETTER YU=ARABIC LETTER WAW+ARABIC LETTER SUPERSCRIPT ALEF
ARABIC LETTER ZAH=ARABIC LETTER TAH+COMBINING DOT ABOVE
ARABIC LETTER ZAIN=ARABIC LETTER REH+COMBINING DOT ABOVE
ARABIC PERCENT SIGN=MIDDLE DOT+FRACTION SLASH+MIDDLE DOT
ARABIC QUESTION MARK=QUESTION MARK+SEMANTIC REVERSED
ARABIC ROUNDED HIGH STOP WITH FILLED CENTRE=COMBINING DOT ABOVE
ARABIC SMALL HIGH DOTLESS HEAD OF KHAH=SEMANTIC ABOVE+START GROUP+GREATER-THAN SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL HIGH JEEM=SEMANTIC ABOVE+START GROUP+ARABIC LETTER JEEM+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL HIGH LAM ALEF=SEMANTIC ABOVE+START GROUP+START GROUP+ARABIC LETTER LAM+ARABIC LETTER ALEF+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL HIGH LIGATURE QAF WITH LAM WITH ALEF MAKSURA=SEMANTIC ABOVE+START GROUP+START GROUP+ARABIC LETTER QAF+ARABIC LETTER LAM+ARABIC LETTER ALEF MAKSURA+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA=SEMANTIC ABOVE+START GROUP+START GROUP+ARABIC LETTER SAD+ARABIC LETTER LAM+ARABIC LETTER ALEF MAKSURA+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL HIGH MEEM INITIAL FORM=SEMANTIC ABOVE+START GROUP+START GROUP+ZERO WIDTH NON-JOINER+ARABIC LETTER MEEM+ZERO WIDTH JOINER+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL HIGH MEEM ISOLATED FORM=SEMANTIC ABOVE+START GROUP+START GROUP+ZERO WIDTH NON-JOINER+ARABIC LETTER MEEM+ZERO WIDTH NON-JOINER+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL HIGH NOON=SEMANTIC ABOVE+START GROUP+ARABIC LETTER NOON+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL HIGH ROUNDED ZERO=COMBINING DOT ABOVE
ARABIC SMALL HIGH SEEN=SEMANTIC ABOVE+START GROUP+ARABIC LETTER SEEN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL HIGH THREE DOTS=COMBINING DIAERESIS+COMBINING DOT ABOVE
ARABIC SMALL HIGH YEH=SEMANTIC ABOVE+START GROUP+ARABIC LETTER YEH BARREE+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL LOW MEEM=SEMANTIC BELOW+START GROUP+ARABIC LETTER MEEM+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL LOW SEEN=SEMANTIC BELOW+START GROUP+ARABIC LETTER SEEN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL WAW=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+START GROUP+ARABIC LETTER WAW+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC SMALL YEH=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+START GROUP+ARABIC LETTER YEH BARREE+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
ARABIC TATWEEL=ZERO WIDTH JOINER+EN DASH+ZERO WIDTH JOINER
ARABIC THOUSANDS SEPARATOR=COMMA
EXTENDED ARABIC-INDIC DIGIT EIGHT=ARABIC-INDIC DIGIT EIGHT+SEMANTIC VARIANT
EXTENDED ARABIC-INDIC DIGIT FIVE=ARABIC-INDIC DIGIT FIVE+SEMANTIC VARIANT
EXTENDED ARABIC-INDIC DIGIT FOUR=ARABIC-INDIC DIGIT FOUR+SEMANTIC VARIANT
EXTENDED ARABIC-INDIC DIGIT NINE=ARABIC-INDIC DIGIT NINE+SEMANTIC VARIANT
EXTENDED ARABIC-INDIC DIGIT ONE=ARABIC-INDIC DIGIT ONE+SEMANTIC VARIANT
EXTENDED ARABIC-INDIC DIGIT SEVEN=ARABIC-INDIC DIGIT SEVEN+SEMANTIC VARIANT
EXTENDED ARABIC-INDIC DIGIT SIX=ARABIC-INDIC DIGIT SIX+SEMANTIC VARIANT
EXTENDED ARABIC-INDIC DIGIT THREE=ARABIC-INDIC DIGIT THREE+SEMANTIC VARIANT
EXTENDED ARABIC-INDIC DIGIT TWO=ARABIC-INDIC DIGIT TWO+SEMANTIC VARIANT
EXTENDED ARABIC-INDIC DIGIT ZERO=ARABIC-INDIC DIGIT ZERO+SEMANTIC VARIANT
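A pattern runs through the dotted letters in this list: one dot is a COMBINING DOT, two are a COMBINING DIAERESIS, three are a diaeresis plus a dot, and four are two diaereses. A sketch of that rule follows. It covers only this common arrangement; the ``downwards´´ and ``vertical´´ variants in the list order or choose their marks differently and are not handled. The names dot_marks and with_dots are hypothetical.

```python
def dot_marks(count: int, below: bool = False) -> list[str]:
    """Combining marks for `count` dots: diaereses for pairs, then a single dot."""
    suffix = " BELOW" if below else ""
    pair = "COMBINING DIAERESIS" + suffix
    single = "COMBINING DOT BELOW" if below else "COMBINING DOT ABOVE"
    return [pair] * (count // 2) + [single] * (count % 2)


def with_dots(base: str, count: int, below: bool = False) -> str:
    """Decomposition of `base` carrying `count` dots above or below."""
    return "+".join([base] + dot_marks(count, below))
```

So with_dots("ARABIC LETTER SEEN", 3) reproduces the decomposition given above for ARABIC LETTER SHEEN.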

Indic decompositions

There is structure within each of the Indic scripts which is described in the Unicode standard, but not specified in the code charts: any letter which is a vowel can be represented as a letter A followed by the corresponding vowel sign. This is completely hopeless from the point of view of simplifying rendering: in fact, it complicates it, as the glyphs bear little visual relationship to one another. But the structure is there, and should be dealt with correctly.

BENGALI LETTER AA=BENGALI LETTER A+BENGALI VOWEL SIGN AA
BENGALI LETTER AI=BENGALI LETTER A+BENGALI VOWEL SIGN AI
BENGALI LETTER AU=BENGALI LETTER A+BENGALI VOWEL SIGN AU
BENGALI LETTER E=BENGALI LETTER A+BENGALI VOWEL SIGN E
BENGALI LETTER I=BENGALI LETTER A+BENGALI VOWEL SIGN I
BENGALI LETTER II=BENGALI LETTER A+BENGALI VOWEL SIGN II
BENGALI LETTER O=BENGALI LETTER A+BENGALI VOWEL SIGN O
BENGALI LETTER U=BENGALI LETTER A+BENGALI VOWEL SIGN U
BENGALI LETTER UU=BENGALI LETTER A+BENGALI VOWEL SIGN UU
BENGALI LETTER VOCALIC L=BENGALI LETTER A+BENGALI VOWEL SIGN VOCALIC L
BENGALI LETTER VOCALIC LL=BENGALI LETTER A+BENGALI VOWEL SIGN VOCALIC LL
BENGALI LETTER VOCALIC R=BENGALI LETTER A+BENGALI VOWEL SIGN VOCALIC R
BENGALI LETTER VOCALIC RR=BENGALI LETTER A+BENGALI VOWEL SIGN VOCALIC RR
DEVANAGARI LETTER AA=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN AA
DEVANAGARI LETTER AI=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN AI
DEVANAGARI LETTER AU=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN AU
DEVANAGARI LETTER CANDRA E=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN CANDRA E
DEVANAGARI LETTER CANDRA O=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN CANDRA O
DEVANAGARI LETTER E=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN E
DEVANAGARI LETTER I=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN I
DEVANAGARI LETTER II=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN II
DEVANAGARI LETTER NGA=DEVANAGARI LETTER DDA+SEMANTIC AFTER+MIDDLE DOT
DEVANAGARI LETTER O=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN O
DEVANAGARI LETTER SHORT E=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN SHORT E
DEVANAGARI LETTER SHORT O=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN SHORT O
DEVANAGARI LETTER U=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN U
DEVANAGARI LETTER UU=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN UU
DEVANAGARI LETTER VOCALIC L=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN VOCALIC L
DEVANAGARI LETTER VOCALIC LL=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN VOCALIC LL
DEVANAGARI LETTER VOCALIC R=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN VOCALIC R
DEVANAGARI LETTER VOCALIC RR=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN VOCALIC RR
GUJARATI LETTER AA=GUJARATI LETTER A+GUJARATI VOWEL SIGN AA
GUJARATI LETTER AI=GUJARATI LETTER A+GUJARATI VOWEL SIGN AI
GUJARATI LETTER AU=GUJARATI LETTER A+GUJARATI VOWEL SIGN AU
GUJARATI LETTER E=GUJARATI LETTER A+GUJARATI VOWEL SIGN E
GUJARATI LETTER I=GUJARATI LETTER A+GUJARATI VOWEL SIGN I
GUJARATI LETTER II=GUJARATI LETTER A+GUJARATI VOWEL SIGN II
GUJARATI LETTER O=GUJARATI LETTER A+GUJARATI VOWEL SIGN O
GUJARATI LETTER U=GUJARATI LETTER A+GUJARATI VOWEL SIGN U
GUJARATI LETTER UU=GUJARATI LETTER A+GUJARATI VOWEL SIGN UU
GUJARATI LETTER VOCALIC R=GUJARATI LETTER A+GUJARATI VOWEL SIGN VOCALIC R
GUJARATI LETTER VOCALIC RR=GUJARATI LETTER A+GUJARATI VOWEL SIGN VOCALIC RR
GUJARATI VOWEL CANDRA E=GUJARATI LETTER A+GUJARATI VOWEL SIGN CANDRA E
GUJARATI VOWEL CANDRA O=GUJARATI LETTER A+GUJARATI VOWEL SIGN CANDRA O
GURMUKHI LETTER AA=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN AA
GURMUKHI LETTER AI=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN AI
GURMUKHI LETTER AU=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN AU
GURMUKHI LETTER EE=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN EE
GURMUKHI LETTER I=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN I
GURMUKHI LETTER II=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN II
GURMUKHI LETTER OO=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN OO
GURMUKHI LETTER U=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN U
GURMUKHI LETTER UU=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN UU
KANNADA LETTER AA=KANNADA LETTER A+KANNADA VOWEL SIGN AA
KANNADA LETTER AI=KANNADA LETTER A+KANNADA VOWEL SIGN AI
KANNADA LETTER AU=KANNADA LETTER A+KANNADA VOWEL SIGN AU
KANNADA LETTER E=KANNADA LETTER A+KANNADA VOWEL SIGN E
KANNADA LETTER EE=KANNADA LETTER A+KANNADA VOWEL SIGN EE
KANNADA LETTER I=KANNADA LETTER A+KANNADA VOWEL SIGN I
KANNADA LETTER II=KANNADA LETTER A+KANNADA VOWEL SIGN II
KANNADA LETTER O=KANNADA LETTER A+KANNADA VOWEL SIGN O
KANNADA LETTER OO=KANNADA LETTER A+KANNADA VOWEL SIGN OO
KANNADA LETTER U=KANNADA LETTER A+KANNADA VOWEL SIGN U
KANNADA LETTER UU=KANNADA LETTER A+KANNADA VOWEL SIGN UU
KANNADA LETTER VOCALIC R=KANNADA LETTER A+KANNADA VOWEL SIGN VOCALIC R
KANNADA LETTER VOCALIC RR=KANNADA LETTER A+KANNADA VOWEL SIGN VOCALIC RR
MALAYALAM LETTER AA=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN AA
MALAYALAM LETTER AI=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN AI
MALAYALAM LETTER AU=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN AU
MALAYALAM LETTER E=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN E
MALAYALAM LETTER EE=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN EE
MALAYALAM LETTER I=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN I
MALAYALAM LETTER II=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN II
MALAYALAM LETTER O=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN O
MALAYALAM LETTER OO=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN OO
MALAYALAM LETTER U=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN U
MALAYALAM LETTER UU=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN UU
MALAYALAM LETTER VOCALIC R=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN VOCALIC R
ORIYA LETTER AA=ORIYA LETTER A+ORIYA VOWEL SIGN AA
ORIYA LETTER AI=ORIYA LETTER A+ORIYA VOWEL SIGN AI
ORIYA LETTER AU=ORIYA LETTER A+ORIYA VOWEL SIGN AU
ORIYA LETTER E=ORIYA LETTER A+ORIYA VOWEL SIGN E
ORIYA LETTER I=ORIYA LETTER A+ORIYA VOWEL SIGN I
ORIYA LETTER II=ORIYA LETTER A+ORIYA VOWEL SIGN II
ORIYA LETTER O=ORIYA LETTER A+ORIYA VOWEL SIGN O
ORIYA LETTER U=ORIYA LETTER A+ORIYA VOWEL SIGN U
ORIYA LETTER UU=ORIYA LETTER A+ORIYA VOWEL SIGN UU
ORIYA LETTER VOCALIC R=ORIYA LETTER A+ORIYA VOWEL SIGN VOCALIC R
TAMIL LETTER AA=TAMIL LETTER A+TAMIL VOWEL SIGN AA
TAMIL LETTER AI=TAMIL LETTER A+TAMIL VOWEL SIGN AI
TAMIL LETTER E=TAMIL LETTER A+TAMIL VOWEL SIGN E
TAMIL LETTER EE=TAMIL LETTER A+TAMIL VOWEL SIGN EE
TAMIL LETTER I=TAMIL LETTER A+TAMIL VOWEL SIGN I
TAMIL LETTER II=TAMIL LETTER A+TAMIL VOWEL SIGN II
TAMIL LETTER O=TAMIL LETTER A+TAMIL VOWEL SIGN O
TAMIL LETTER OO=TAMIL LETTER A+TAMIL VOWEL SIGN OO
TAMIL LETTER U=TAMIL LETTER A+TAMIL VOWEL SIGN U
TAMIL LETTER UU=TAMIL LETTER A+TAMIL VOWEL SIGN UU
TELUGU LETTER AA=TELUGU LETTER A+TELUGU VOWEL SIGN AA
TELUGU LETTER AI=TELUGU LETTER A+TELUGU VOWEL SIGN AI
TELUGU LETTER AU=TELUGU LETTER A+TELUGU VOWEL SIGN AU
TELUGU LETTER E=TELUGU LETTER A+TELUGU VOWEL SIGN E
TELUGU LETTER EE=TELUGU LETTER A+TELUGU VOWEL SIGN EE
TELUGU LETTER I=TELUGU LETTER A+TELUGU VOWEL SIGN I
TELUGU LETTER II=TELUGU LETTER A+TELUGU VOWEL SIGN II
TELUGU LETTER O=TELUGU LETTER A+TELUGU VOWEL SIGN O
TELUGU LETTER OO=TELUGU LETTER A+TELUGU VOWEL SIGN OO
TELUGU LETTER U=TELUGU LETTER A+TELUGU VOWEL SIGN U
TELUGU LETTER UU=TELUGU LETTER A+TELUGU VOWEL SIGN UU
TELUGU LETTER VOCALIC R=TELUGU LETTER A+TELUGU VOWEL SIGN VOCALIC R
TELUGU LETTER VOCALIC RR=TELUGU LETTER A+TELUGU VOWEL SIGN VOCALIC RR

Furthermore, the various Indic scripts are carefully kept in numerical harmony, with corresponding characters at corresponding positions in each block, in order to facilitate a simple algorithmic transliteration between them. But if this structural parallel is not explicitly exposed, an ordinary user cannot make use of it (unless the implementor has gone out of their way to give help). Therefore, we introduce a set of semantics to represent the relationship: SEMANTIC BENGALI, SEMANTIC GUJARATI, SEMANTIC GURMUKHI, SEMANTIC KANNADA, SEMANTIC MALAYALAM, SEMANTIC ORIYA, SEMANTIC TAMIL and SEMANTIC TELUGU.
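
As a rough illustration of the ``numerical harmony´´ mentioned above, here is a minimal sketch in Python of transliteration by block offset. The block start values are the standard Unicode ones; the names BLOCK_START and transliterate are invented for this example. It is only a sketch: not every position is assigned in every script, so a shifted character may land on an unassigned code point.

```python
# Start of each 128-code-point Indic block in Unicode.
BLOCK_START = {
    "DEVANAGARI": 0x0900,
    "BENGALI": 0x0980,
    "GURMUKHI": 0x0A00,
    "GUJARATI": 0x0A80,
    "ORIYA": 0x0B00,
    "TAMIL": 0x0B80,
    "TELUGU": 0x0C00,
    "KANNADA": 0x0C80,
    "MALAYALAM": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Move each character of the source block to the same offset in the target block.

    Characters outside the source block are copied unchanged. No check is
    made that the resulting code point is actually assigned in the target.
    """
    src = BLOCK_START[source]
    dst = BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + dst))
        else:
            out.append(ch)
    return "".join(out)
```

For instance, transliterate("\u0915", "DEVANAGARI", "BENGALI") maps DEVANAGARI LETTER KA (U+0915) to BENGALI LETTER KA (U+0995).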

If a font engine uses these in the obvious way, it is possible to switch between two Indic scripts by enclosing arbitrary text between START GROUP and POP DIRECTIONAL FORMATTING characters, and appending a SEMANTIC character for the new script. (This assumes that only the last such suggestion has any effect.)

Since Devanagari is the oldest of the scripts, we use that as the basis for decomposition. But in order to maintain cultural neutrality we invent some non-existent Devanagari characters, so that all the scripts share the same structure. We also introduce a semantic for Devanagari itself: if these characters appear ``unadorned´´ by any script suggestion, a rendering agent could decide how to present them based on global user settings (e g, the current locale), rather than just assuming that they are ``real´´ Devanagari.
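
One way a rendering agent might apply these rules is sketched below in Python. It is an assumption-laden toy, not a description of any real font engine: characters are represented by their names as plain strings, groups are not handled, and the function name resolve_script and the default_script parameter (standing in for a locale setting) are invented for the example. It shows only the two rules stated above: the last script suggestion wins, and an unadorned character falls back to a global default.

```python
SCRIPTS = (
    "BENGALI", "DEVANAGARI", "GUJARATI", "GURMUKHI", "KANNADA",
    "MALAYALAM", "ORIYA", "TAMIL", "TELUGU",
)
SCRIPT_SEMANTICS = {"SEMANTIC " + script for script in SCRIPTS}


def resolve_script(tokens, default_script="DEVANAGARI"):
    """Pair each base token with the script it should be rendered in.

    A script semantic applies to the base token immediately before it; if
    several follow one another, only the last has any effect. A base token
    with no script semantic gets default_script (hypothetically taken from
    the user's locale). A script semantic with no preceding base is ignored.
    """
    resolved = []
    for token in tokens:
        if token in SCRIPT_SEMANTICS:
            if resolved:
                base, _ = resolved[-1]
                resolved[-1] = (base, token[len("SEMANTIC "):])
        else:
            resolved.append((token, default_script))
    return resolved
```

A real implementation would also have to deal with START GROUP and POP DIRECTIONAL FORMATTING, which this sketch deliberately leaves out.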

Amusing engineering side-effects of this machinery include the ability to code (in some sense) a whole new script, by adding a single character to the U C S---and still get some legibility if the character is not understood by the renderer.

So, here´s the list. The boy must be mad ...

BENGALI AU LENGTH MARK=DEVANAGARI AU LENGTH MARK+SEMANTIC BENGALI
BENGALI DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC BENGALI
BENGALI DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC BENGALI
BENGALI DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC BENGALI
BENGALI DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC BENGALI
BENGALI DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC BENGALI
BENGALI DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC BENGALI
BENGALI DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC BENGALI
BENGALI DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC BENGALI
BENGALI DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC BENGALI
BENGALI DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC BENGALI
BENGALI LETTER A=DEVANAGARI LETTER A+SEMANTIC BENGALI
BENGALI LETTER BA=DEVANAGARI LETTER BA+SEMANTIC BENGALI
BENGALI LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC BENGALI
BENGALI LETTER CA=DEVANAGARI LETTER CA+SEMANTIC BENGALI
BENGALI LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC BENGALI
BENGALI LETTER DA=DEVANAGARI LETTER DA+SEMANTIC BENGALI
BENGALI LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC BENGALI
BENGALI LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC BENGALI
BENGALI LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC BENGALI
BENGALI LETTER GA=DEVANAGARI LETTER GA+SEMANTIC BENGALI
BENGALI LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC BENGALI
BENGALI LETTER HA=DEVANAGARI LETTER HA+SEMANTIC BENGALI
BENGALI LETTER JA=DEVANAGARI LETTER JA+SEMANTIC BENGALI
BENGALI LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC BENGALI
BENGALI LETTER KA=DEVANAGARI LETTER KA+SEMANTIC BENGALI
BENGALI LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC BENGALI
BENGALI LETTER LA=DEVANAGARI LETTER LA+SEMANTIC BENGALI
BENGALI LETTER MA=DEVANAGARI LETTER MA+SEMANTIC BENGALI
BENGALI LETTER NA=DEVANAGARI LETTER NA+SEMANTIC BENGALI
BENGALI LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC BENGALI
BENGALI LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC BENGALI
BENGALI LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC BENGALI
BENGALI LETTER PA=DEVANAGARI LETTER PA+SEMANTIC BENGALI
BENGALI LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC BENGALI
BENGALI LETTER SA=DEVANAGARI LETTER SA+SEMANTIC BENGALI
BENGALI LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC BENGALI
BENGALI LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC BENGALI
BENGALI LETTER TA=DEVANAGARI LETTER TA+SEMANTIC BENGALI
BENGALI LETTER THA=DEVANAGARI LETTER THA+SEMANTIC BENGALI
BENGALI LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC BENGALI
BENGALI LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC BENGALI
BENGALI LETTER YA=DEVANAGARI LETTER YA+SEMANTIC BENGALI
BENGALI SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC BENGALI+POP DIRECTIONAL FORMATTING
BENGALI SIGN CANDRABINDU=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC ABOVE+DOT ABOVE+SEMANTIC BENGALI+POP DIRECTIONAL FORMATTING
BENGALI SIGN NUKTA=SEMANTIC BELOW+START GROUP+DOT ABOVE+SEMANTIC BENGALI+POP DIRECTIONAL FORMATTING
BENGALI SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC BENGALI
BENGALI SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC BENGALI+POP DIRECTIONAL FORMATTING
BENGALI VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC BENGALI
BENGALI VOWEL SIGN AI=DEVANAGARI VOWEL SIGN AI+SEMANTIC BENGALI
BENGALI VOWEL SIGN E=DEVANAGARI VOWEL SIGN E+SEMANTIC BENGALI
BENGALI VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC BENGALI
BENGALI VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC BENGALI
BENGALI VOWEL SIGN O=DEVANAGARI VOWEL SIGN O+SEMANTIC BENGALI
BENGALI VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC BENGALI
BENGALI VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC BENGALI
BENGALI VOWEL SIGN VOCALIC L=DEVANAGARI VOWEL SIGN VOCALIC L+SEMANTIC BENGALI
BENGALI VOWEL SIGN VOCALIC LL=DEVANAGARI VOWEL SIGN VOCALIC LL+SEMANTIC BENGALI
BENGALI VOWEL SIGN VOCALIC R=DEVANAGARI VOWEL SIGN VOCALIC R+SEMANTIC BENGALI
BENGALI VOWEL SIGN VOCALIC RR=DEVANAGARI VOWEL SIGN VOCALIC RR+SEMANTIC BENGALI
GUJARATI DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC GUJARATI
GUJARATI DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC GUJARATI
GUJARATI DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC GUJARATI
GUJARATI DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC GUJARATI
GUJARATI DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC GUJARATI
GUJARATI DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC GUJARATI
GUJARATI DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC GUJARATI
GUJARATI DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC GUJARATI
GUJARATI DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC GUJARATI
GUJARATI DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC GUJARATI
GUJARATI LETTER A=DEVANAGARI LETTER A+SEMANTIC GUJARATI
GUJARATI LETTER BA=DEVANAGARI LETTER BA+SEMANTIC GUJARATI
GUJARATI LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC GUJARATI
GUJARATI LETTER CA=DEVANAGARI LETTER CA+SEMANTIC GUJARATI
GUJARATI LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC GUJARATI
GUJARATI LETTER DA=DEVANAGARI LETTER DA+SEMANTIC GUJARATI
GUJARATI LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC GUJARATI
GUJARATI LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC GUJARATI
GUJARATI LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC GUJARATI
GUJARATI LETTER GA=DEVANAGARI LETTER GA+SEMANTIC GUJARATI
GUJARATI LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC GUJARATI
GUJARATI LETTER HA=DEVANAGARI LETTER HA+SEMANTIC GUJARATI
GUJARATI LETTER JA=DEVANAGARI LETTER JA+SEMANTIC GUJARATI
GUJARATI LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC GUJARATI
GUJARATI LETTER KA=DEVANAGARI LETTER KA+SEMANTIC GUJARATI
GUJARATI LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC GUJARATI
GUJARATI LETTER LA=DEVANAGARI LETTER LA+SEMANTIC GUJARATI
GUJARATI LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC GUJARATI
GUJARATI LETTER MA=DEVANAGARI LETTER MA+SEMANTIC GUJARATI
GUJARATI LETTER NA=DEVANAGARI LETTER NA+SEMANTIC GUJARATI
GUJARATI LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC GUJARATI
GUJARATI LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC GUJARATI
GUJARATI LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC GUJARATI
GUJARATI LETTER PA=DEVANAGARI LETTER PA+SEMANTIC GUJARATI
GUJARATI LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC GUJARATI
GUJARATI LETTER RA=DEVANAGARI LETTER RA+SEMANTIC GUJARATI
GUJARATI LETTER SA=DEVANAGARI LETTER SA+SEMANTIC GUJARATI
GUJARATI LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC GUJARATI
GUJARATI LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC GUJARATI
GUJARATI LETTER TA=DEVANAGARI LETTER TA+SEMANTIC GUJARATI
GUJARATI LETTER THA=DEVANAGARI LETTER THA+SEMANTIC GUJARATI
GUJARATI LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC GUJARATI
GUJARATI LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC GUJARATI
GUJARATI LETTER VA=DEVANAGARI LETTER VA+SEMANTIC GUJARATI
GUJARATI LETTER YA=DEVANAGARI LETTER YA+SEMANTIC GUJARATI
GUJARATI OM=DEVANAGARI OM+SEMANTIC GUJARATI
GUJARATI SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC GUJARATI+POP DIRECTIONAL FORMATTING
GUJARATI SIGN AVAGRAHA=DEVANAGARI SIGN AVAGRAHA+SEMANTIC GUJARATI
GUJARATI SIGN CANDRABINDU=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC ABOVE+DOT ABOVE+SEMANTIC GUJARATI+POP DIRECTIONAL FORMATTING
GUJARATI SIGN NUKTA=SEMANTIC BELOW+START GROUP+DOT ABOVE+SEMANTIC GUJARATI+POP DIRECTIONAL FORMATTING
GUJARATI SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC GUJARATI
GUJARATI SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC GUJARATI+POP DIRECTIONAL FORMATTING
GUJARATI VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC GUJARATI
GUJARATI VOWEL SIGN AI=DEVANAGARI VOWEL SIGN AI+SEMANTIC GUJARATI
GUJARATI VOWEL SIGN AU=DEVANAGARI VOWEL SIGN AU+SEMANTIC GUJARATI
GUJARATI VOWEL SIGN CANDRA E=DEVANAGARI VOWEL SIGN CANDRA E+SEMANTIC GUJARATI
GUJARATI VOWEL SIGN CANDRA O=DEVANAGARI VOWEL SIGN CANDRA O+SEMANTIC GUJARATI
GUJARATI VOWEL SIGN E=DEVANAGARI VOWEL SIGN E+SEMANTIC GUJARATI
GUJARATI VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC GUJARATI
GUJARATI VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC GUJARATI
GUJARATI VOWEL SIGN O=DEVANAGARI VOWEL SIGN O+SEMANTIC GUJARATI
GUJARATI VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC GUJARATI
GUJARATI VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC GUJARATI
GUJARATI VOWEL SIGN VOCALIC R=DEVANAGARI VOWEL SIGN VOCALIC R+SEMANTIC GUJARATI
GUJARATI VOWEL SIGN VOCALIC RR=DEVANAGARI VOWEL SIGN VOCALIC RR+SEMANTIC GUJARATI
GURMUKHI ADDAK=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC GURMUKHI+POP DIRECTIONAL FORMATTING
GURMUKHI DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC GURMUKHI
GURMUKHI DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC GURMUKHI
GURMUKHI DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC GURMUKHI
GURMUKHI DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC GURMUKHI
GURMUKHI DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC GURMUKHI
GURMUKHI DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC GURMUKHI
GURMUKHI DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC GURMUKHI
GURMUKHI DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC GURMUKHI
GURMUKHI DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC GURMUKHI
GURMUKHI DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC GURMUKHI
GURMUKHI LETTER A=DEVANAGARI LETTER A+SEMANTIC GURMUKHI
GURMUKHI LETTER BA=DEVANAGARI LETTER BA+SEMANTIC GURMUKHI
GURMUKHI LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC GURMUKHI
GURMUKHI LETTER CA=DEVANAGARI LETTER CA+SEMANTIC GURMUKHI
GURMUKHI LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC GURMUKHI
GURMUKHI LETTER DA=DEVANAGARI LETTER DA+SEMANTIC GURMUKHI
GURMUKHI LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC GURMUKHI
GURMUKHI LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC GURMUKHI
GURMUKHI LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC GURMUKHI
GURMUKHI LETTER GA=DEVANAGARI LETTER GA+SEMANTIC GURMUKHI
GURMUKHI LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC GURMUKHI
GURMUKHI LETTER HA=DEVANAGARI LETTER HA+SEMANTIC GURMUKHI
GURMUKHI LETTER JA=DEVANAGARI LETTER JA+SEMANTIC GURMUKHI
GURMUKHI LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC GURMUKHI
GURMUKHI LETTER KA=DEVANAGARI LETTER KA+SEMANTIC GURMUKHI
GURMUKHI LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC GURMUKHI
GURMUKHI LETTER LA=DEVANAGARI LETTER LA+SEMANTIC GURMUKHI
GURMUKHI LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC GURMUKHI
GURMUKHI LETTER MA=DEVANAGARI LETTER MA+SEMANTIC GURMUKHI
GURMUKHI LETTER NA=DEVANAGARI LETTER NA+SEMANTIC GURMUKHI
GURMUKHI LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC GURMUKHI
GURMUKHI LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC GURMUKHI
GURMUKHI LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC GURMUKHI
GURMUKHI LETTER PA=DEVANAGARI LETTER PA+SEMANTIC GURMUKHI
GURMUKHI LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC GURMUKHI
GURMUKHI LETTER RA=DEVANAGARI LETTER RA+SEMANTIC GURMUKHI
GURMUKHI LETTER SA=DEVANAGARI LETTER SA+SEMANTIC GURMUKHI
GURMUKHI LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC GURMUKHI
GURMUKHI LETTER TA=DEVANAGARI LETTER TA+SEMANTIC GURMUKHI
GURMUKHI LETTER THA=DEVANAGARI LETTER THA+SEMANTIC GURMUKHI
GURMUKHI LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC GURMUKHI
GURMUKHI LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC GURMUKHI
GURMUKHI LETTER VA=DEVANAGARI LETTER VA+SEMANTIC GURMUKHI
GURMUKHI LETTER YA=DEVANAGARI LETTER YA+SEMANTIC GURMUKHI
GURMUKHI SIGN BINDI=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC GURMUKHI+POP DIRECTIONAL FORMATTING
GURMUKHI SIGN NUKTA=SEMANTIC BELOW+START GROUP+DOT ABOVE+SEMANTIC GURMUKHI+POP DIRECTIONAL FORMATTING
GURMUKHI SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC GURMUKHI
GURMUKHI TIPPI=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC TURNED+SEMANTIC GURMUKHI+POP DIRECTIONAL FORMATTING
GURMUKHI VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC GURMUKHI
GURMUKHI VOWEL SIGN AI=DEVANAGARI VOWEL SIGN AI+SEMANTIC GURMUKHI
GURMUKHI VOWEL SIGN AU=DEVANAGARI VOWEL SIGN AU+SEMANTIC GURMUKHI
GURMUKHI VOWEL SIGN EE=DEVANAGARI VOWEL SIGN E+SEMANTIC GURMUKHI
GURMUKHI VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC GURMUKHI
GURMUKHI VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC GURMUKHI
GURMUKHI VOWEL SIGN OO=DEVANAGARI VOWEL SIGN O+SEMANTIC GURMUKHI
GURMUKHI VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC GURMUKHI
GURMUKHI VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC GURMUKHI
KANNADA AI LENGTH MARK=DEVANAGARI AI LENGTH MARK+SEMANTIC KANNADA
KANNADA DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC KANNADA
KANNADA DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC KANNADA
KANNADA DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC KANNADA
KANNADA DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC KANNADA
KANNADA DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC KANNADA
KANNADA DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC KANNADA
KANNADA DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC KANNADA
KANNADA DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC KANNADA
KANNADA DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC KANNADA
KANNADA DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC KANNADA
KANNADA LENGTH MARK=DEVANAGARI LENGTH MARK+SEMANTIC KANNADA
KANNADA LETTER A=DEVANAGARI LETTER A+SEMANTIC KANNADA
KANNADA LETTER BA=DEVANAGARI LETTER BA+SEMANTIC KANNADA
KANNADA LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC KANNADA
KANNADA LETTER CA=DEVANAGARI LETTER CA+SEMANTIC KANNADA
KANNADA LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC KANNADA
KANNADA LETTER DA=DEVANAGARI LETTER DA+SEMANTIC KANNADA
KANNADA LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC KANNADA
KANNADA LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC KANNADA
KANNADA LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC KANNADA
KANNADA LETTER FA=DEVANAGARI LETTER FA+SEMANTIC KANNADA
KANNADA LETTER GA=DEVANAGARI LETTER GA+SEMANTIC KANNADA
KANNADA LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC KANNADA
KANNADA LETTER HA=DEVANAGARI LETTER HA+SEMANTIC KANNADA
KANNADA LETTER JA=DEVANAGARI LETTER JA+SEMANTIC KANNADA
KANNADA LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC KANNADA
KANNADA LETTER KA=DEVANAGARI LETTER KA+SEMANTIC KANNADA
KANNADA LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC KANNADA
KANNADA LETTER LA=DEVANAGARI LETTER LA+SEMANTIC KANNADA
KANNADA LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC KANNADA
KANNADA LETTER MA=DEVANAGARI LETTER MA+SEMANTIC KANNADA
KANNADA LETTER NA=DEVANAGARI LETTER NA+SEMANTIC KANNADA
KANNADA LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC KANNADA
KANNADA LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC KANNADA
KANNADA LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC KANNADA
KANNADA LETTER PA=DEVANAGARI LETTER PA+SEMANTIC KANNADA
KANNADA LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC KANNADA
KANNADA LETTER RA=DEVANAGARI LETTER RA+SEMANTIC KANNADA
KANNADA LETTER RRA=DEVANAGARI LETTER RRA+SEMANTIC KANNADA
KANNADA LETTER SA=DEVANAGARI LETTER SA+SEMANTIC KANNADA
KANNADA LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC KANNADA
KANNADA LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC KANNADA
KANNADA LETTER TA=DEVANAGARI LETTER TA+SEMANTIC KANNADA
KANNADA LETTER THA=DEVANAGARI LETTER THA+SEMANTIC KANNADA
KANNADA LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC KANNADA
KANNADA LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC KANNADA
KANNADA LETTER VA=DEVANAGARI LETTER VA+SEMANTIC KANNADA
KANNADA LETTER VOCALIC L=DEVANAGARI LETTER VOCALIC L+SEMANTIC KANNADA
KANNADA LETTER VOCALIC LL=DEVANAGARI LETTER VOCALIC LL+SEMANTIC KANNADA
KANNADA LETTER YA=DEVANAGARI LETTER YA+SEMANTIC KANNADA
KANNADA SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC KANNADA+POP DIRECTIONAL FORMATTING
KANNADA SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC KANNADA
KANNADA SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC KANNADA+POP DIRECTIONAL FORMATTING
KANNADA VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC KANNADA
KANNADA VOWEL SIGN AU=DEVANAGARI VOWEL SIGN AU+SEMANTIC KANNADA
KANNADA VOWEL SIGN E=DEVANAGARI VOWEL SIGN SHORT E+SEMANTIC KANNADA
KANNADA VOWEL SIGN EE=DEVANAGARI VOWEL SIGN E+SEMANTIC KANNADA
KANNADA VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC KANNADA
KANNADA VOWEL SIGN O=DEVANAGARI VOWEL SIGN SHORT O+SEMANTIC KANNADA
KANNADA VOWEL SIGN OO=DEVANAGARI VOWEL SIGN O+SEMANTIC KANNADA
KANNADA VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC KANNADA
KANNADA VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC KANNADA
KANNADA VOWEL SIGN VOCALIC R=DEVANAGARI VOWEL SIGN VOCALIC R+SEMANTIC KANNADA
KANNADA VOWEL SIGN VOCALIC RR=DEVANAGARI VOWEL SIGN VOCALIC RR+SEMANTIC KANNADA
MALAYALAM AU LENGTH MARK=DEVANAGARI AU LENGTH MARK+SEMANTIC MALAYALAM
MALAYALAM DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC MALAYALAM
MALAYALAM DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC MALAYALAM
MALAYALAM DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC MALAYALAM
MALAYALAM DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC MALAYALAM
MALAYALAM DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC MALAYALAM
MALAYALAM DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC MALAYALAM
MALAYALAM DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC MALAYALAM
MALAYALAM DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC MALAYALAM
MALAYALAM DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC MALAYALAM
MALAYALAM DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC MALAYALAM
MALAYALAM LETTER A=DEVANAGARI LETTER A+SEMANTIC MALAYALAM
MALAYALAM LETTER BA=DEVANAGARI LETTER BA+SEMANTIC MALAYALAM
MALAYALAM LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC MALAYALAM
MALAYALAM LETTER CA=DEVANAGARI LETTER CA+SEMANTIC MALAYALAM
MALAYALAM LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC MALAYALAM
MALAYALAM LETTER DA=DEVANAGARI LETTER DA+SEMANTIC MALAYALAM
MALAYALAM LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC MALAYALAM
MALAYALAM LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC MALAYALAM
MALAYALAM LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC MALAYALAM
MALAYALAM LETTER GA=DEVANAGARI LETTER GA+SEMANTIC MALAYALAM
MALAYALAM LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC MALAYALAM
MALAYALAM LETTER HA=DEVANAGARI LETTER HA+SEMANTIC MALAYALAM
MALAYALAM LETTER JA=DEVANAGARI LETTER JA+SEMANTIC MALAYALAM
MALAYALAM LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC MALAYALAM
MALAYALAM LETTER KA=DEVANAGARI LETTER KA+SEMANTIC MALAYALAM
MALAYALAM LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC MALAYALAM
MALAYALAM LETTER LA=DEVANAGARI LETTER LA+SEMANTIC MALAYALAM
MALAYALAM LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC MALAYALAM
MALAYALAM LETTER LLLA=DEVANAGARI LETTER LLLA+SEMANTIC MALAYALAM
MALAYALAM LETTER MA=DEVANAGARI LETTER MA+SEMANTIC MALAYALAM
MALAYALAM LETTER NA=DEVANAGARI LETTER NA+SEMANTIC MALAYALAM
MALAYALAM LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC MALAYALAM
MALAYALAM LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC MALAYALAM
MALAYALAM LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC MALAYALAM
MALAYALAM LETTER PA=DEVANAGARI LETTER PA+SEMANTIC MALAYALAM
MALAYALAM LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC MALAYALAM
MALAYALAM LETTER RA=DEVANAGARI LETTER RA+SEMANTIC MALAYALAM
MALAYALAM LETTER RRA=DEVANAGARI LETTER RRA+SEMANTIC MALAYALAM
MALAYALAM LETTER SA=DEVANAGARI LETTER SA+SEMANTIC MALAYALAM
MALAYALAM LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC MALAYALAM
MALAYALAM LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC MALAYALAM
MALAYALAM LETTER TA=DEVANAGARI LETTER TA+SEMANTIC MALAYALAM
MALAYALAM LETTER THA=DEVANAGARI LETTER THA+SEMANTIC MALAYALAM
MALAYALAM LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC MALAYALAM
MALAYALAM LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC MALAYALAM
MALAYALAM LETTER VA=DEVANAGARI LETTER VA+SEMANTIC MALAYALAM
MALAYALAM LETTER VOCALIC L=DEVANAGARI LETTER VOCALIC L+SEMANTIC MALAYALAM
MALAYALAM LETTER VOCALIC LL=DEVANAGARI LETTER VOCALIC LL+SEMANTIC MALAYALAM
MALAYALAM LETTER VOCALIC RR=DEVANAGARI LETTER VOCALIC RR+SEMANTIC MALAYALAM
MALAYALAM LETTER YA=DEVANAGARI LETTER YA+SEMANTIC MALAYALAM
MALAYALAM SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC MALAYALAM+POP DIRECTIONAL FORMATTING
MALAYALAM SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC MALAYALAM
MALAYALAM SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC MALAYALAM+POP DIRECTIONAL FORMATTING
MALAYALAM VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC MALAYALAM
MALAYALAM VOWEL SIGN AI=DEVANAGARI VOWEL SIGN AI+SEMANTIC MALAYALAM
MALAYALAM VOWEL SIGN E=DEVANAGARI VOWEL SIGN SHORT E+SEMANTIC MALAYALAM
MALAYALAM VOWEL SIGN EE=DEVANAGARI VOWEL SIGN E+SEMANTIC MALAYALAM
MALAYALAM VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC MALAYALAM
MALAYALAM VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC MALAYALAM
MALAYALAM VOWEL SIGN O=DEVANAGARI VOWEL SIGN SHORT O+SEMANTIC MALAYALAM
MALAYALAM VOWEL SIGN OO=DEVANAGARI VOWEL SIGN O+SEMANTIC MALAYALAM
MALAYALAM VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC MALAYALAM
MALAYALAM VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC MALAYALAM
MALAYALAM VOWEL SIGN VOCALIC R=DEVANAGARI VOWEL SIGN VOCALIC R+SEMANTIC MALAYALAM
ORIYA AI LENGTH MARK=DEVANAGARI AI LENGTH MARK+SEMANTIC ORIYA
ORIYA AU LENGTH MARK=DEVANAGARI AU LENGTH MARK+SEMANTIC ORIYA
ORIYA DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC ORIYA
ORIYA DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC ORIYA
ORIYA DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC ORIYA
ORIYA DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC ORIYA
ORIYA DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC ORIYA
ORIYA DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC ORIYA
ORIYA DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC ORIYA
ORIYA DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC ORIYA
ORIYA DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC ORIYA
ORIYA DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC ORIYA
ORIYA LETTER A=DEVANAGARI LETTER A+SEMANTIC ORIYA
ORIYA LETTER BA=DEVANAGARI LETTER BA+SEMANTIC ORIYA
ORIYA LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC ORIYA
ORIYA LETTER CA=DEVANAGARI LETTER CA+SEMANTIC ORIYA
ORIYA LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC ORIYA
ORIYA LETTER DA=DEVANAGARI LETTER DA+SEMANTIC ORIYA
ORIYA LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC ORIYA
ORIYA LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC ORIYA
ORIYA LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC ORIYA
ORIYA LETTER GA=DEVANAGARI LETTER GA+SEMANTIC ORIYA
ORIYA LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC ORIYA
ORIYA LETTER HA=DEVANAGARI LETTER HA+SEMANTIC ORIYA
ORIYA LETTER JA=DEVANAGARI LETTER JA+SEMANTIC ORIYA
ORIYA LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC ORIYA
ORIYA LETTER KA=DEVANAGARI LETTER KA+SEMANTIC ORIYA
ORIYA LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC ORIYA
ORIYA LETTER LA=DEVANAGARI LETTER LA+SEMANTIC ORIYA
ORIYA LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC ORIYA
ORIYA LETTER MA=DEVANAGARI LETTER MA+SEMANTIC ORIYA
ORIYA LETTER NA=DEVANAGARI LETTER NA+SEMANTIC ORIYA
ORIYA LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC ORIYA
ORIYA LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC ORIYA
ORIYA LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC ORIYA
ORIYA LETTER PA=DEVANAGARI LETTER PA+SEMANTIC ORIYA
ORIYA LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC ORIYA
ORIYA LETTER RA=DEVANAGARI LETTER RA+SEMANTIC ORIYA
ORIYA LETTER SA=DEVANAGARI LETTER SA+SEMANTIC ORIYA
ORIYA LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC ORIYA
ORIYA LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC ORIYA
ORIYA LETTER TA=DEVANAGARI LETTER TA+SEMANTIC ORIYA
ORIYA LETTER THA=DEVANAGARI LETTER THA+SEMANTIC ORIYA
ORIYA LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC ORIYA
ORIYA LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC ORIYA
ORIYA LETTER VOCALIC L=DEVANAGARI LETTER VOCALIC L+SEMANTIC ORIYA
ORIYA LETTER VOCALIC LL=DEVANAGARI LETTER VOCALIC LL+SEMANTIC ORIYA
ORIYA LETTER VOCALIC RR=DEVANAGARI LETTER VOCALIC RR+SEMANTIC ORIYA
ORIYA LETTER YA=DEVANAGARI LETTER YA+SEMANTIC ORIYA
ORIYA SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC ORIYA+POP DIRECTIONAL FORMATTING
ORIYA SIGN AVAGRAHA=DEVANAGARI SIGN AVAGRAHA+SEMANTIC ORIYA
ORIYA SIGN CANDRABINDU=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC ABOVE+DOT ABOVE+SEMANTIC ORIYA+POP DIRECTIONAL FORMATTING
ORIYA SIGN NUKTA=SEMANTIC BELOW+START GROUP+DOT ABOVE+SEMANTIC ORIYA+POP DIRECTIONAL FORMATTING
ORIYA SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC ORIYA
ORIYA SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC ORIYA+POP DIRECTIONAL FORMATTING
ORIYA VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC ORIYA
ORIYA VOWEL SIGN E=DEVANAGARI VOWEL SIGN E+SEMANTIC ORIYA
ORIYA VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC ORIYA
ORIYA VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC ORIYA
ORIYA VOWEL SIGN O=DEVANAGARI VOWEL SIGN O+SEMANTIC ORIYA
ORIYA VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC ORIYA
ORIYA VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC ORIYA
ORIYA VOWEL SIGN VOCALIC R=DEVANAGARI VOWEL SIGN VOCALIC R+SEMANTIC ORIYA
TAMIL AU LENGTH MARK=DEVANAGARI AU LENGTH MARK+SEMANTIC TAMIL
TAMIL DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC TAMIL
TAMIL DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC TAMIL
TAMIL DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC TAMIL
TAMIL DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC TAMIL
TAMIL DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC TAMIL
TAMIL DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC TAMIL
TAMIL DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC TAMIL
TAMIL DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC TAMIL
TAMIL DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC TAMIL
TAMIL LETTER A=DEVANAGARI LETTER A+SEMANTIC TAMIL
TAMIL LETTER CA=DEVANAGARI LETTER CA+SEMANTIC TAMIL
TAMIL LETTER HA=DEVANAGARI LETTER HA+SEMANTIC TAMIL
TAMIL LETTER JA=DEVANAGARI LETTER JA+SEMANTIC TAMIL
TAMIL LETTER KA=DEVANAGARI LETTER KA+SEMANTIC TAMIL
TAMIL LETTER LA=DEVANAGARI LETTER LA+SEMANTIC TAMIL
TAMIL LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC TAMIL
TAMIL LETTER LLLA=DEVANAGARI LETTER LLLA+SEMANTIC TAMIL
TAMIL LETTER MA=DEVANAGARI LETTER MA+SEMANTIC TAMIL
TAMIL LETTER NA=DEVANAGARI LETTER NA+SEMANTIC TAMIL
TAMIL LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC TAMIL
TAMIL LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC TAMIL
TAMIL LETTER NNNA=DEVANAGARI LETTER NNNA+SEMANTIC TAMIL
TAMIL LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC TAMIL
TAMIL LETTER PA=DEVANAGARI LETTER PA+SEMANTIC TAMIL
TAMIL LETTER RA=DEVANAGARI LETTER RA+SEMANTIC TAMIL
TAMIL LETTER RRA=DEVANAGARI LETTER RRA+SEMANTIC TAMIL
TAMIL LETTER SA=DEVANAGARI LETTER SA+SEMANTIC TAMIL
TAMIL LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC TAMIL
TAMIL LETTER TA=DEVANAGARI LETTER TA+SEMANTIC TAMIL
TAMIL LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC TAMIL
TAMIL LETTER VA=DEVANAGARI LETTER VA+SEMANTIC TAMIL
TAMIL LETTER YA=DEVANAGARI LETTER YA+SEMANTIC TAMIL
TAMIL SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC TAMIL+POP DIRECTIONAL FORMATTING
TAMIL SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC TAMIL
TAMIL SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC TAMIL+POP DIRECTIONAL FORMATTING
TAMIL VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC TAMIL
TAMIL VOWEL SIGN AI=DEVANAGARI VOWEL SIGN AI+SEMANTIC TAMIL
TAMIL VOWEL SIGN E=DEVANAGARI VOWEL SIGN SHORT E+SEMANTIC TAMIL
TAMIL VOWEL SIGN EE=DEVANAGARI VOWEL SIGN E+SEMANTIC TAMIL
TAMIL VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC TAMIL
TAMIL VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC TAMIL
TAMIL VOWEL SIGN O=DEVANAGARI VOWEL SIGN SHORT O+SEMANTIC TAMIL
TAMIL VOWEL SIGN OO=DEVANAGARI VOWEL SIGN O+SEMANTIC TAMIL
TAMIL VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC TAMIL
TAMIL VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC TAMIL
TELUGU AI LENGTH MARK=DEVANAGARI AI LENGTH MARK+SEMANTIC TELUGU
TELUGU DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC TELUGU
TELUGU DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC TELUGU
TELUGU DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC TELUGU
TELUGU DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC TELUGU
TELUGU DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC TELUGU
TELUGU DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC TELUGU
TELUGU DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC TELUGU
TELUGU DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC TELUGU
TELUGU DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC TELUGU
TELUGU DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC TELUGU
TELUGU LENGTH MARK=DEVANAGARI LENGTH MARK+SEMANTIC TELUGU
TELUGU LETTER A=DEVANAGARI LETTER A+SEMANTIC TELUGU
TELUGU LETTER BA=DEVANAGARI LETTER BA+SEMANTIC TELUGU
TELUGU LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC TELUGU
TELUGU LETTER CA=DEVANAGARI LETTER CA+SEMANTIC TELUGU
TELUGU LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC TELUGU
TELUGU LETTER DA=DEVANAGARI LETTER DA+SEMANTIC TELUGU
TELUGU LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC TELUGU
TELUGU LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC TELUGU
TELUGU LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC TELUGU
TELUGU LETTER GA=DEVANAGARI LETTER GA+SEMANTIC TELUGU
TELUGU LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC TELUGU
TELUGU LETTER HA=DEVANAGARI LETTER HA+SEMANTIC TELUGU
TELUGU LETTER JA=DEVANAGARI LETTER JA+SEMANTIC TELUGU
TELUGU LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC TELUGU
TELUGU LETTER KA=DEVANAGARI LETTER KA+SEMANTIC TELUGU
TELUGU LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC TELUGU
TELUGU LETTER LA=DEVANAGARI LETTER LA+SEMANTIC TELUGU
TELUGU LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC TELUGU
TELUGU LETTER MA=DEVANAGARI LETTER MA+SEMANTIC TELUGU
TELUGU LETTER NA=DEVANAGARI LETTER NA+SEMANTIC TELUGU
TELUGU LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC TELUGU
TELUGU LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC TELUGU
TELUGU LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC TELUGU
TELUGU LETTER PA=DEVANAGARI LETTER PA+SEMANTIC TELUGU
TELUGU LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC TELUGU
TELUGU LETTER RA=DEVANAGARI LETTER RA+SEMANTIC TELUGU
TELUGU LETTER RRA=DEVANAGARI LETTER RRA+SEMANTIC TELUGU
TELUGU LETTER SA=DEVANAGARI LETTER SA+SEMANTIC TELUGU
TELUGU LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC TELUGU
TELUGU LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC TELUGU
TELUGU LETTER TA=DEVANAGARI LETTER TA+SEMANTIC TELUGU
TELUGU LETTER THA=DEVANAGARI LETTER THA+SEMANTIC TELUGU
TELUGU LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC TELUGU
TELUGU LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC TELUGU
TELUGU LETTER VA=DEVANAGARI LETTER VA+SEMANTIC TELUGU
TELUGU LETTER VOCALIC L=DEVANAGARI LETTER VOCALIC L+SEMANTIC TELUGU
TELUGU LETTER VOCALIC LL=DEVANAGARI LETTER VOCALIC LL+SEMANTIC TELUGU
TELUGU LETTER YA=DEVANAGARI LETTER YA+SEMANTIC TELUGU
TELUGU SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC TELUGU+POP DIRECTIONAL FORMATTING
TELUGU SIGN CANDRABINDU=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC ABOVE+DOT ABOVE+SEMANTIC TELUGU+POP DIRECTIONAL FORMATTING
TELUGU SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC TELUGU
TELUGU SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC TELUGU+POP DIRECTIONAL FORMATTING
TELUGU VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC TELUGU
TELUGU VOWEL SIGN AU=DEVANAGARI VOWEL SIGN AU+SEMANTIC TELUGU
TELUGU VOWEL SIGN E=DEVANAGARI VOWEL SIGN SHORT E+SEMANTIC TELUGU
TELUGU VOWEL SIGN EE=DEVANAGARI VOWEL SIGN E+SEMANTIC TELUGU
TELUGU VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC TELUGU
TELUGU VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC TELUGU
TELUGU VOWEL SIGN O=DEVANAGARI VOWEL SIGN SHORT O+SEMANTIC TELUGU
TELUGU VOWEL SIGN OO=DEVANAGARI VOWEL SIGN O+SEMANTIC TELUGU
TELUGU VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC TELUGU
TELUGU VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC TELUGU
TELUGU VOWEL SIGN VOCALIC R=DEVANAGARI VOWEL SIGN VOCALIC R+SEMANTIC TELUGU
TELUGU VOWEL SIGN VOCALIC RR=DEVANAGARI VOWEL SIGN VOCALIC RR+SEMANTIC TELUGU

Some accents are present in other parts of Unicode, and (provided the script suggestion is taken seriously) can be represented as follows:

DEVANAGARI ACUTE ACCENT=SEMANTIC ABOVE+START GROUP+ACUTE ACCENT+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI DANDA=VERTICAL LINE+SEMANTIC DEVANAGARI
DEVANAGARI DOUBLE DANDA=DOUBLE VERTICAL LINE+SEMANTIC DEVANAGARI
DEVANAGARI GRAVE ACCENT=SEMANTIC ABOVE+START GROUP+GRAVE ACCENT+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI SIGN CANDRABINDU=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC ABOVE+DOT ABOVE+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI SIGN NUKTA=SEMANTIC BELOW+START GROUP+DOT ABOVE+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI STRESS SIGN ANUDATTA=SEMANTIC BELOW+START GROUP+MACRON+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI STRESS SIGN UDATTA=SEMANTIC ABOVE+START GROUP+MODIFIER LETTER VERTICAL LINE+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING

Thai/Lao

There is a similar relationship between the Thai and Lao scripts. Again, we provide both SEMANTIC THAI and SEMANTIC LAO, even though only one of them is used in the decompositions below, so that automatic transliteration can be done in either direction.

LAO CANCELLATION MARK=THAI CHARACTER THANTHAKHAT+SEMANTIC LAO
LAO DIGIT EIGHT=THAI DIGIT EIGHT+SEMANTIC LAO
LAO DIGIT FIVE=THAI DIGIT FIVE+SEMANTIC LAO
LAO DIGIT FOUR=THAI DIGIT FOUR+SEMANTIC LAO
LAO DIGIT NINE=THAI DIGIT NINE+SEMANTIC LAO
LAO DIGIT ONE=THAI DIGIT ONE+SEMANTIC LAO
LAO DIGIT SEVEN=THAI DIGIT SEVEN+SEMANTIC LAO
LAO DIGIT SIX=THAI DIGIT SIX+SEMANTIC LAO
LAO DIGIT THREE=THAI DIGIT THREE+SEMANTIC LAO
LAO DIGIT TWO=THAI DIGIT TWO+SEMANTIC LAO
LAO DIGIT ZERO=THAI DIGIT ZERO+SEMANTIC LAO
LAO ELLIPSIS=THAI CHARACTER PAIYANNOI+SEMANTIC LAO
LAO KO LA=THAI CHARACTER MAIYAMOK+SEMANTIC LAO
LAO LETTER BO=THAI CHARACTER BO BAIMAI+SEMANTIC LAO
LAO LETTER CO=THAI CHARACTER CHO CHAN+SEMANTIC LAO
LAO LETTER DO=THAI CHARACTER DO DEK+SEMANTIC LAO
LAO LETTER FO SUNG=THAI CHARACTER FO FAN+SEMANTIC LAO
LAO LETTER FO TAM=THAI CHARACTER FO FA+SEMANTIC LAO
LAO LETTER HO SUNG=THAI CHARACTER HO HIP+SEMANTIC LAO
LAO LETTER HO TAM=THAI CHARACTER HO NOKHUK+SEMANTIC LAO
LAO LETTER KHO SUNG=THAI CHARACTER KHO KHAI+SEMANTIC LAO
LAO LETTER KHO TAM=THAI CHARACTER KHO KHWAI+SEMANTIC LAO
LAO LETTER KO=THAI CHARACTER KO KAI+SEMANTIC LAO
LAO LETTER LO LING=THAI CHARACTER RO RUA+SEMANTIC LAO
LAO LETTER LO LOOT=THAI CHARACTER LO LING+SEMANTIC LAO
LAO LETTER MO=THAI CHARACTER MO MA+SEMANTIC LAO
LAO LETTER NGO=THAI CHARACTER NGO NGU+SEMANTIC LAO
LAO LETTER NO=THAI CHARACTER NO NU+SEMANTIC LAO
LAO LETTER NYO=THAI CHARACTER YO YING+SEMANTIC LAO
LAO LETTER O=THAI CHARACTER O ANG+SEMANTIC LAO
LAO LETTER PHO SUNG=THAI CHARACTER PHO PHUNG+SEMANTIC LAO
LAO LETTER PHO TAM=THAI CHARACTER PHO PHAN+SEMANTIC LAO
LAO LETTER PO=THAI CHARACTER PO PLA+SEMANTIC LAO
LAO LETTER SO SUNG=THAI CHARACTER SO SUA+SEMANTIC LAO
LAO LETTER SO TAM=THAI CHARACTER CHO CHANG+SEMANTIC LAO
LAO LETTER THO SUNG=THAI CHARACTER THO THUNG+SEMANTIC LAO
LAO LETTER THO TAM=THAI CHARACTER THO THAHAN+SEMANTIC LAO
LAO LETTER TO=THAI CHARACTER TO TAO+SEMANTIC LAO
LAO LETTER WO=THAI CHARACTER WO WAEN+SEMANTIC LAO
LAO LETTER YO=THAI CHARACTER YO YAK+SEMANTIC LAO
LAO NIGGAHITA=THAI CHARACTER NIKHAHIT+SEMANTIC LAO
LAO TONE MAI CATAWA=THAI CHARACTER MAI CHATTAWA+SEMANTIC LAO
LAO TONE MAI EK=THAI CHARACTER MAI EK+SEMANTIC LAO
LAO TONE MAI THO=THAI CHARACTER MAI THO+SEMANTIC LAO
LAO TONE MAI TI=THAI CHARACTER MAI TRI+SEMANTIC LAO
LAO VOWEL SIGN A=THAI CHARACTER SARA A+SEMANTIC LAO
LAO VOWEL SIGN AA=THAI CHARACTER SARA AA+SEMANTIC LAO
LAO VOWEL SIGN AI=THAI CHARACTER SARA AI MAIMALAI+SEMANTIC LAO
LAO VOWEL SIGN AY=THAI CHARACTER SARA AI MAIMUAN+SEMANTIC LAO
LAO VOWEL SIGN E=THAI CHARACTER SARA E+SEMANTIC LAO
LAO VOWEL SIGN EI=THAI CHARACTER SARA AE+SEMANTIC LAO
LAO VOWEL SIGN I=THAI CHARACTER SARA I+SEMANTIC LAO
LAO VOWEL SIGN II=THAI CHARACTER SARA II+SEMANTIC LAO
LAO VOWEL SIGN MAI KAN=THAI CHARACTER MAI HAN-AKAT+SEMANTIC LAO
LAO VOWEL SIGN O=THAI CHARACTER SARA O+SEMANTIC LAO
LAO VOWEL SIGN U=THAI CHARACTER SARA U+SEMANTIC LAO
LAO VOWEL SIGN UU=THAI CHARACTER SARA UU+SEMANTIC LAO
LAO VOWEL SIGN Y=THAI CHARACTER SARA UE+SEMANTIC LAO
LAO VOWEL SIGN YY=THAI CHARACTER SARA UEE+SEMANTIC LAO
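
Because every rule above has the shape NAME=BASE+SEMANTIC LAO, transliteration reduces to a table lookup. The following is a minimal sketch in Python, not a definitive implementation. It assumes characters are handled by name, since the SEMANTIC characters are proposals and have no code points. The helper names parse_rules, lao_to_thai and thai_to_lao are hypothetical, and only three rules are copied in for brevity.

```python
# Sketch only: three rules copied from the list above.
# A full table would hold every Lao rule.
RULES = """\
LAO LETTER KO=THAI CHARACTER KO KAI+SEMANTIC LAO
LAO LETTER MO=THAI CHARACTER MO MA+SEMANTIC LAO
LAO VOWEL SIGN AA=THAI CHARACTER SARA AA+SEMANTIC LAO
"""


def parse_rules(text):
    """Map each composed name to the list of names in its decomposition."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, rhs = line.partition("=")
        table[name] = rhs.split("+")
    return table


def lao_to_thai(names, table):
    """Replace each Lao name by its Thai base, dropping SEMANTIC LAO."""
    out = []
    for name in names:
        parts = table.get(name)
        if parts and parts[-1] == "SEMANTIC LAO":
            out.extend(parts[:-1])
        else:
            out.append(name)  # no rule known: pass the name through
    return out


def thai_to_lao(names, table):
    """Invert the table: a Thai base name maps back to the Lao name."""
    inverse = {
        parts[0]: name
        for name, parts in table.items()
        if len(parts) == 2 and parts[1] == "SEMANTIC LAO"
    }
    return [inverse.get(name, name) for name in names]
```

Names with no rule pass through unchanged in both directions, so mixed text survives a round trip.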

Also, the AM vowel signs of both scripts decompose into a nasal sign followed by the A vowel sign:

THAI CHARACTER SARA AM=ZERO WIDTH NO-BREAK SPACE+THAI CHARACTER NIKHAHIT+THAI CHARACTER SARA A
LAO VOWEL SIGN AM=ZERO WIDTH NO-BREAK SPACE+LAO NIGGAHITA+LAO VOWEL SIGN A

Hangul

A good deal of structure remains in the Hangul Jamo block, even after factoring out the compatibility breakdowns and the glyph variants already treated. The following 17 glyphs are encoded twice each (16 of them once as CHOSEONG and once as JONGSEONG, and the filler once as CHOSEONG and once as JUNGSEONG), but the two encodings are visually identical. Maybe they need a helpful ``SEMANTIC´´ marker to record which position is meant---for sorting purposes, maybe?---but I don´t know enough to say.

HANGUL JONGSEONG KIYEOK=HANGUL CHOSEONG KIYEOK
HANGUL JONGSEONG NIEUN=HANGUL CHOSEONG NIEUN
HANGUL JONGSEONG TIKEUT=HANGUL CHOSEONG TIKEUT
HANGUL JONGSEONG RIEUL=HANGUL CHOSEONG RIEUL
HANGUL JONGSEONG MIEUM=HANGUL CHOSEONG MIEUM
HANGUL JONGSEONG PIEUP=HANGUL CHOSEONG PIEUP
HANGUL JONGSEONG SIOS=HANGUL CHOSEONG SIOS
HANGUL JONGSEONG IEUNG=HANGUL CHOSEONG IEUNG
HANGUL JONGSEONG CIEUC=HANGUL CHOSEONG CIEUC
HANGUL JONGSEONG CHIEUCH=HANGUL CHOSEONG CHIEUCH
HANGUL JONGSEONG KHIEUKH=HANGUL CHOSEONG KHIEUKH
HANGUL JONGSEONG THIEUTH=HANGUL CHOSEONG THIEUTH
HANGUL JONGSEONG PHIEUPH=HANGUL CHOSEONG PHIEUPH
HANGUL JONGSEONG HIEUH=HANGUL CHOSEONG HIEUH
HANGUL JONGSEONG YESIEUNG=HANGUL CHOSEONG YESIEUNG
HANGUL JONGSEONG YEORINHIEUH=HANGUL CHOSEONG YEORINHIEUH
HANGUL JUNGSEONG FILLER=HANGUL CHOSEONG FILLER
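
The 16 consonant pairs above can be looked up by name to confirm that each really is two separate code points. This is a small sketch in Python using the standard unicodedata module. The function name jamo_pairs is hypothetical, and the filler pair is left out because its two names do not follow the CHOSEONG/JONGSEONG pattern.

```python
import unicodedata

# Name suffixes shared by the CHOSEONG and JONGSEONG entries listed above.
SHARED = [
    "KIYEOK", "NIEUN", "TIKEUT", "RIEUL", "MIEUM", "PIEUP", "SIOS", "IEUNG",
    "CIEUC", "CHIEUCH", "KHIEUKH", "THIEUTH", "PHIEUPH", "HIEUH",
    "YESIEUNG", "YEORINHIEUH",
]


def jamo_pairs(suffixes=SHARED):
    """Map each trailing (JONGSEONG) jamo to the leading (CHOSEONG) jamo
    that carries the same name suffix."""
    pairs = {}
    for suffix in suffixes:
        lead = unicodedata.lookup("HANGUL CHOSEONG " + suffix)
        tail = unicodedata.lookup("HANGUL JONGSEONG " + suffix)
        pairs[tail] = lead
    return pairs
```

Each key differs from its value, which is the sense in which the same glyph is ``encoded twice´´.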

Syntax

In the sequence

      LATIN CAPITAL LETTER A
      SEMANTIC LIGATURE
      LATIN CAPITAL LETTER E
      SEMANTIC ITALIC
      

does the effect of the ``italic´´ include both of the characters, or only one?

We answer all questions of this type by giving a formal syntax of the relationships between the characters.

This syntax is not intended to be prescriptive---any sequence of characters can be considered ``valid´´, in some sense---but clearly, the introduction of characters which affect the others around them leads to some questions over how far the effects travel.

Define a binary character BIN as one of the six characters FRACTION SLASH, SEMANTIC OVERPRINT, SEMANTIC ABOVE, SEMANTIC AFTER, SEMANTIC BEFORE and SEMANTIC BELOW. (These can be given a new combining class---say 999---to indicate their special status.) Define a unary character UN as any of the existing combining characters, recognised by having a combining class > 0 but < 999. Define an opener SG as any of the characters START GROUP, LEFT-TO-RIGHT EMBEDDING, RIGHT-TO-LEFT EMBEDDING, LEFT-TO-RIGHT OVERRIDE and RIGHT-TO-LEFT OVERRIDE, each of which starts a new level of grouping, and a closer PDF as the character POP DIRECTIONAL FORMATTING. Define a base character BASE as anything else.

Then we can define a list of characters, called list, as follows.

      list            = empty
                      | primary list
      primary         = secondary
                      | secondary unary-primary
      secondary       = BASE
                      | SG list PDF
      unary-primary   = unary-secondary
                      | unary-secondary unary-primary
      unary-secondary = BIN secondary
                      | UN
      binary-primary  = BIN
      

which means we parse the example above as

      list
         primary
            secondary
               BASE
                  LATIN CAPITAL LETTER A
            unary-primary
               unary-secondary
                  BIN
                     SEMANTIC LIGATURE
                  secondary
                     BASE
                        LATIN CAPITAL LETTER E
               unary-primary
                  unary-secondary
                     UN
                        SEMANTIC ITALIC
         list
            empty
      

and the answer to the question is that the italic applies to the whole ligature. For a ligature of roman A with italic E, you would write

      LATIN CAPITAL LETTER A
      SEMANTIC LIGATURE
      START GROUP
      LATIN CAPITAL LETTER E
      SEMANTIC ITALIC
      POP DIRECTIONAL FORMATTING
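
The grammar lends itself to a small recursive-descent parser. The following Python sketch is one possible reading, not a definitive implementation. Characters are handled by name. SEMANTIC LIGATURE is treated as a binary character, following the parse shown above, although it is not in the list of six. SEMANTIC ITALIC is assumed to be a unary character. The function names kind, parse_secondary, parse_primary, parse_list and parse, and the tuple shapes used for the tree, are all hypothetical.

```python
import unicodedata

# Assumption: SEMANTIC LIGATURE is binary, as in the parse tree above.
BIN = {"FRACTION SLASH", "SEMANTIC OVERPRINT", "SEMANTIC ABOVE",
       "SEMANTIC AFTER", "SEMANTIC BEFORE", "SEMANTIC BELOW",
       "SEMANTIC LIGATURE"}
SG = {"START GROUP", "LEFT-TO-RIGHT EMBEDDING", "RIGHT-TO-LEFT EMBEDDING",
      "LEFT-TO-RIGHT OVERRIDE", "RIGHT-TO-LEFT OVERRIDE"}
PDF = "POP DIRECTIONAL FORMATTING"
UN_EXTRA = {"SEMANTIC ITALIC"}  # assumption: proposed unary characters


def kind(name):
    """Classify a character name as BIN, SG, PDF, UN or BASE."""
    if name in BIN:
        return "BIN"
    if name in SG:
        return "SG"
    if name == PDF:
        return "PDF"
    if name in UN_EXTRA:
        return "UN"
    try:
        if unicodedata.combining(unicodedata.lookup(name)) > 0:
            return "UN"  # an existing combining character
    except KeyError:
        pass  # not a name known to this Unicode database
    return "BASE"


def parse_secondary(toks, i):
    """secondary = BASE | SG list PDF"""
    k = kind(toks[i]) if i < len(toks) else None
    if k == "BASE":
        return toks[i], i + 1
    if k == "SG":
        items, j = parse_list(toks, i + 1)
        if j >= len(toks) or kind(toks[j]) != "PDF":
            raise ValueError("group is not closed")
        return ("GROUP", items), j + 1
    raise ValueError("expected a base character or a group")


def parse_primary(toks, i):
    """primary = secondary followed by any number of unary-secondaries.
    Each modifier wraps everything read so far."""
    node, i = parse_secondary(toks, i)
    while i < len(toks) and kind(toks[i]) in ("BIN", "UN"):
        op = toks[i]
        if kind(op) == "UN":
            node, i = (op, node), i + 1
        else:
            right, i = parse_secondary(toks, i + 1)
            node = (op, node, right)
    return node, i


def parse_list(toks, i=0):
    """list = zero or more primaries"""
    items = []
    while i < len(toks) and kind(toks[i]) in ("BASE", "SG"):
        node, i = parse_primary(toks, i)
        items.append(node)
    return items, i


def parse(toks):
    items, i = parse_list(toks)
    if i != len(toks):
        raise ValueError("unexpected " + toks[i])
    return items
```

In the first example SEMANTIC ITALIC ends up as the outermost node, wrapping the whole ligature. In the second it sits inside the group, wrapping only LATIN CAPITAL LETTER E.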
      

The definitions of unary-primary and binary-primary are also useful because we can give this rule for decompositions: the decomposition of a base character must be a primary; of a unary character, a unary-primary; and of a binary character, a binary-primary. (Actually, there are none of the last.) This assures us that decompositions will behave syntactically in the same way regardless of whether they are treated as a single character or a character list.
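
The rule can be checked mechanically. The following Python sketch works on sequences of character classes (the strings "BASE", "UN", "BIN", "SG" and "PDF") rather than on characters. The function names are hypothetical, and treating the SEMANTIC script markers such as SEMANTIC DEVANAGARI as unary characters is an assumption.

```python
def _secondary(k, i):
    """Index just past one secondary starting at i, or None if none fits."""
    if i < len(k) and k[i] == "BASE":
        return i + 1
    if i < len(k) and k[i] == "SG":
        j = _list(k, i + 1)
        if j < len(k) and k[j] == "PDF":
            return j + 1
    return None


def _unary_tail(k, i):
    """Consume zero or more unary-secondaries; return the index reached."""
    while i < len(k):
        if k[i] == "UN":
            i += 1
        elif k[i] == "BIN":
            j = _secondary(k, i + 1)
            if j is None:
                return i  # BIN lacks its right operand: stop here
            i = j
        else:
            break
    return i


def _primary(k, i):
    j = _secondary(k, i)
    return None if j is None else _unary_tail(k, j)


def _list(k, i):
    """Consume as many primaries as possible; return the index reached."""
    while True:
        j = _primary(k, i)
        if j is None:
            return i
        i = j


def is_primary(k):
    """Required of the decomposition of a base character."""
    return _primary(k, 0) == len(k)


def is_unary_primary(k):
    """Required of the decomposition of a unary character."""
    return len(k) > 0 and k[0] in ("UN", "BIN") and _unary_tail(k, 0) == len(k)


def is_binary_primary(k):
    """Required of the decomposition of a binary character."""
    return list(k) == ["BIN"]
```

Under these assumptions, the decomposition of DEVANAGARI DANDA has classes BASE UN and is a primary, while that of DEVANAGARI SIGN ANUSVARA has classes BIN SG BASE UN PDF and is a unary-primary.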

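The reason for having a completely left-associative grammar is to allow a renderer to render ``as much as it has´´ at any time: each modifier applies to what has already been read, so a renderer can draw what it has received so far and refine it as further characters arrive.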
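
This property can be illustrated with a streaming sketch in Python. It handles only base, unary and binary characters within a single primary. Groups and lists of several primaries are left out. The input is assumed to be already classified into (class, name) pairs, and the function name prefixes is hypothetical.

```python
def prefixes(tokens):
    """Yield the structure built so far after each (class, name) pair.

    Sketch only: one primary, no groups. A binary character waits in
    `pending` until its right-hand base character arrives.
    """
    node, pending = None, None
    for cls, name in tokens:
        if cls == "BASE" and pending is not None:
            node, pending = (pending, node, name), None
        elif cls == "BASE":
            node = name
        elif cls == "UN":
            node = (name, node)  # wraps everything read so far
        elif cls == "BIN":
            pending = name  # nothing new to draw yet
        yield node
```

Each value yielded contains the previous one unchanged, so what has already been drawn is only ever wrapped by later modifiers, never taken apart.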

Summary

This note considers the 6588 letters and symbols encoded in Unicode 2.1, as well as CJK ideographs, and attempts to find and encode as much structure as possible. The result is a list of characters which can be considered ``primitive´´ in some sense. It turns out that there are only 1419 of these.