I wonder how many ``primitive characters´´ are encoded in Unicode?
This question has diverted me from several perspectives, and here I attempt to answer it.
I posted ``Some thoughts on character decomposition´´ to the Unicode mailing list on 4th June 1999. Since then I have made a more thorough examination of the ideas I considered there. The main motivation is to simplify Unicode for engineers by providing more structure within the standard, so that a lot of characters can be implemented by following a few clearly-stated rules; and, at the same time, to make the character set more extensible, thereby making it more universal.
It has the side effect of giving more control to the users of the standard by ``opening it up´´ so that people in special fields (e g, mathematics, phonetics), or those who just want novel effects in text, can have them without needing to petition the standardising body. This, I think, is what makes it more than just an exercise in classification.
Since this has been an exercise in trying to understand the internal structure of the U C S, I have called it the ``Atomic Theory´´ of Unicode. Maybe the analogy with chemistry would be closer, as single characters are like atoms, able to interact in their own right, or to join in various ways to make molecules.
My aim has been to identify the largest possible set of semantic decompositions, by using (or abusing) the ``markup´´ tags present in the decomposition fields of UNIDATA.TXT as explicit modifier characters, and by making explicit some of the information that is only represented in the name or visual appearance of the character. This is done with a mixture of existing combining characters, some new combining characters, and a new type of character called a SEMANTIC.
This resolves another question as well: there is a script alphabet in the U C S, consisting of the characters a B Ee F g H I Ll M o P R Vv. There is also a ``turned´´ alphabet of Aa Cc Ee Ff h k m r t v w y (the longest word you can write upside-down in Unicode is `aftereffect´). Similar remarks apply to other alphabets. On the one hand, it doesn´t make sense to have such an arbitrary set of characters; on the other hand, there is no obvious requirement for the others. The resolution is to give them all decompositions, putting them all on an equal footing.
The character START GROUP is needed to make this work. It is an open bracket, like LEFT-TO-RIGHT OVERRIDE but without any directional implication, terminated in the same way: by POP DIRECTIONAL FORMATTING.
I see the value of a decomposition as lying in 2 places: first, it provides new structure to existing characters, which can let rendering software make substitutions in an intelligent way, and thereby increase the readability of text for everyone (in other words, `R´ is better than `?´ as a rendering of DOUBLE-STRUCK CAPITAL R); and second, it may be productive as a means of generating characters without having to get new characters encoded (in other words, it gives access to (*)DOUBLE-STRUCK CAPITAL F, should anyone need it).
The second point is important, because it allows us to recapitulate the way many characters entered common use in the first place. The character LATIN SMALL LETTER TURNED Y did not just appear: it was adopted because a new symbol was needed, and the typographic technology made it convenient. So it seems sensible to acknowledge that LATIN SMALL LETTER TURNED Y is a LATIN SMALL LETTER Y that has had some sort of process applied to it.
I have also listed the characters whose definition would be affected; the list is intended to be complete. An accompanying Tcl programme, vunicode, reads in the files UNIDATA.TXT and PUDATA.TXT, which define the available characters, and also reads this file and locates the new decompositions. It then emits 2 files: prim.txt, a list of primitive characters, and comp.txt, a list of composite characters with their decompositions. As far as possible, it also checks that there are no errors.
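As a rough illustration of that split, here is a Python sketch. It is not the Tcl programme itself: the function names and the in-memory input format (a set of character names plus lines of the form NAME=PART+PART) are assumptions made for the example.

```python
def parse_decompositions(lines):
    """Map each composite name to the list of its component names."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip prose and blank lines
        name, rhs = line.split("=", 1)
        table[name.strip()] = [part.strip() for part in rhs.split("+")]
    return table


def partition(all_names, table):
    """Split names into primitives (no decomposition) and composites."""
    prim = sorted(n for n in all_names if n not in table)
    comp = {n: table[n] for n in sorted(all_names) if n in table}
    return prim, comp


def undefined_components(all_names, table):
    """List (composite, part) pairs whose part is neither a known
    character nor one of the proposed SEMANTIC characters."""
    errors = []
    for name, parts in table.items():
        for part in parts:
            if part not in all_names and not part.startswith("SEMANTIC "):
                errors.append((name, part))
    return errors
```

Given the names DIGIT FIVE and LATIN CAPITAL LETTER TONE FIVE and the rule LATIN CAPITAL LETTER TONE FIVE=DIGIT FIVE+SEMANTIC CAPITAL LETTER TONE, the first name would land in the primitive list and the second in the composite list.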
This requests that a black-letter, or fraktur, font be used. Certain mathematical symbols are conventionally written this way, and German mathematical publishing sometimes uses fraktur rather than heavy (or bold) for vectors.
There are 5 black-letter characters in the U C S.
BLACK-LETTER CAPITAL C=LATIN CAPITAL LETTER C+SEMANTIC BLACK-LETTER
BLACK-LETTER CAPITAL H=LATIN CAPITAL LETTER H+SEMANTIC BLACK-LETTER
BLACK-LETTER CAPITAL I=LATIN CAPITAL LETTER I+SEMANTIC BLACK-LETTER
BLACK-LETTER CAPITAL R=LATIN CAPITAL LETTER R+SEMANTIC BLACK-LETTER
BLACK-LETTER CAPITAL Z=LATIN CAPITAL LETTER Z+SEMANTIC BLACK-LETTER
The lower-case alphabet is available as LATIN SMALL LETTER whatever+SEMANTIC BLACK-LETTER, and because these are canonical decompositions, the resulting output would be completely compatible, visually and for all processing purposes, with the 5 precomposed forms already encoded.
In handwriting, these letters are written in a form called ``Sütterlin´´. This is regarded as a glyph variation of the fraktur alphabet.
Cannot be done algorithmically: either you have the right font, or you don´t. Falling back to the base glyph is likely to give good results though.
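A minimal sketch of that fallback, in Python, assuming a font that has glyphs only for the precomposed black-letter capitals. The function name is hypothetical, and since SEMANTIC BLACK-LETTER is only a proposed character, it is modelled here as a function call rather than as a code point.

```python
import unicodedata


def render_black_letter(base: str) -> str:
    """Return the precomposed black-letter form of `base` if one is
    encoded under the name BLACK-LETTER CAPITAL x; otherwise fall back
    to the base character itself."""
    name = unicodedata.name(base, "")
    prefix = "LATIN CAPITAL LETTER "
    if name.startswith(prefix):
        letter = name[len(prefix):]
        try:
            return unicodedata.lookup("BLACK-LETTER CAPITAL " + letter)
        except KeyError:
            pass  # no precomposed form under that name
    return base  # fallback: the plain base glyph
```

With this, LATIN CAPITAL LETTER C comes back as BLACK-LETTER CAPITAL C, while LATIN CAPITAL LETTER F and all the small letters come back unchanged.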
This suggests, of a digit, that a variant glyph be used with a style suitable for marking Zhuang tone. It is for the following:
LATIN CAPITAL LETTER TONE FIVE=DIGIT FIVE+SEMANTIC CAPITAL LETTER TONE
LATIN CAPITAL LETTER TONE SIX=DIGIT SIX+SEMANTIC CAPITAL LETTER TONE
LATIN CAPITAL LETTER TONE TWO=DIGIT TWO+SEMANTIC CAPITAL LETTER TONE
There is a relationship between CYRILLIC CAPITAL LETTER CHE and DIGIT FOUR+SEMANTIC CAPITAL LETTER TONE, and also between CYRILLIC CAPITAL LETTER ZE and DIGIT THREE+SEMANTIC CAPITAL LETTER TONE, in that they are likely to be the same glyph; but it would be odd to give decompositions like (*)CYRILLIC CAPITAL LETTER CHE=DIGIT FOUR+SEMANTIC CAPITAL LETTER TONE, as this would imply the wrong historical relationship. Instead, uses of CYRILLIC CAPITAL LETTER CHE as a tone mark should simply be superseded by DIGIT FOUR+SEMANTIC CAPITAL LETTER TONE.
By encoding this character, it becomes possible for sophisticated software to render suitable glyphs for all the tone letters, without needing separate encodings for tones 3, 4.
Requests that a sequence of characters be rendered as if they were the name of an ISO control character. There are 35 of these encoded already, but the C1 range contains another 32 which at the moment are second-class citizens. This SEMANTIC puts them all on an even footing.
SYMBOL FOR ACKNOWLEDGE=START GROUP+LATIN CAPITAL LETTER A+LATIN CAPITAL LETTER C+LATIN CAPITAL LETTER K+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR BACKSPACE=START GROUP+LATIN CAPITAL LETTER B+LATIN CAPITAL LETTER S+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR BELL=START GROUP+LATIN CAPITAL LETTER B+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER L+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR CANCEL=START GROUP+LATIN CAPITAL LETTER C+LATIN CAPITAL LETTER A+LATIN CAPITAL LETTER N+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR CARRIAGE RETURN=START GROUP+LATIN CAPITAL LETTER C+LATIN CAPITAL LETTER R+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR DATA LINK ESCAPE=START GROUP+LATIN CAPITAL LETTER D+LATIN CAPITAL LETTER L+LATIN CAPITAL LETTER E+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR DELETE=START GROUP+LATIN CAPITAL LETTER D+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER L+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR DEVICE CONTROL FOUR=START GROUP+LATIN CAPITAL LETTER D+LATIN CAPITAL LETTER C+DIGIT FOUR+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR DEVICE CONTROL ONE=START GROUP+LATIN CAPITAL LETTER D+LATIN CAPITAL LETTER C+DIGIT ONE+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR DEVICE CONTROL THREE=START GROUP+LATIN CAPITAL LETTER D+LATIN CAPITAL LETTER C+DIGIT THREE+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR DEVICE CONTROL TWO=START GROUP+LATIN CAPITAL LETTER D+LATIN CAPITAL LETTER C+DIGIT TWO+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR END OF MEDIUM=START GROUP+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER M+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR END OF TEXT=START GROUP+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER T+LATIN CAPITAL LETTER X+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR END OF TRANSMISSION=START GROUP+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER O+LATIN CAPITAL LETTER T+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR END OF TRANSMISSION BLOCK=START GROUP+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER T+LATIN CAPITAL LETTER B+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR ENQUIRY=START GROUP+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER N+LATIN CAPITAL LETTER Q+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR ESCAPE=START GROUP+LATIN CAPITAL LETTER E+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER C+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR FILE SEPARATOR=START GROUP+LATIN CAPITAL LETTER F+LATIN CAPITAL LETTER S+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR FORM FEED=START GROUP+LATIN CAPITAL LETTER F+LATIN CAPITAL LETTER F+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR GROUP SEPARATOR=START GROUP+LATIN CAPITAL LETTER G+LATIN CAPITAL LETTER S+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR HORIZONTAL TABULATION=START GROUP+LATIN CAPITAL LETTER H+LATIN CAPITAL LETTER T+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR LINE FEED=START GROUP+LATIN CAPITAL LETTER L+LATIN CAPITAL LETTER F+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR NEGATIVE ACKNOWLEDGE=START GROUP+LATIN CAPITAL LETTER N+LATIN CAPITAL LETTER A+LATIN CAPITAL LETTER K+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR NEWLINE=START GROUP+LATIN CAPITAL LETTER N+LATIN CAPITAL LETTER L+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR NULL=START GROUP+LATIN CAPITAL LETTER N+LATIN CAPITAL LETTER U+LATIN CAPITAL LETTER L+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR RECORD SEPARATOR=START GROUP+LATIN CAPITAL LETTER R+LATIN CAPITAL LETTER S+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR SHIFT IN=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER I+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR SHIFT OUT=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER O+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR SPACE=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER P+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR START OF HEADING=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER O+LATIN CAPITAL LETTER H+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR START OF TEXT=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER T+LATIN CAPITAL LETTER X+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR SUBSTITUTE=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER U+LATIN CAPITAL LETTER B+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR SYNCHRONOUS IDLE=START GROUP+LATIN CAPITAL LETTER S+LATIN CAPITAL LETTER Y+LATIN CAPITAL LETTER N+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR UNIT SEPARATOR=START GROUP+LATIN CAPITAL LETTER U+LATIN CAPITAL LETTER S+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
SYMBOL FOR VERTICAL TABULATION=START GROUP+LATIN CAPITAL LETTER V+LATIN CAPITAL LETTER T+POP DIRECTIONAL FORMATTING+SEMANTIC CONTROL SYMBOL
The glyphs could be rendered smaller, or as sequences with a raised first character and a lowered second, or in inverse video, for example. The standard abbreviations for the C1 range of control codes are (in order 0080 to 00A0) PAD, HOP, BPH, NBH, IND, NEL, SSA, ESA, HTS, HTJ, VTS, PLD, PLU, RI, SS2, SS3, DCS, PU1, PU2, STS, CCH, MW, SPA, EPA, SOS, SGCI, SCI, CSI, ST, OSC, PM, APC, NBSP, and we also have
BREAK PERMITTED HERE=ZERO WIDTH SPACE
CARRIAGE RETURN=ZERO WIDTH NO-BREAK SPACE
CHARACTER TABULATION SET=COLUMN SEPARATOR
CHARACTER TABULATION WITH JUSTIFICATION=HAIR SPACE+COLUMN SEPARATOR
FORM FEED=LINE SEPARATOR
HORIZONTAL TABULATION=COLUMN SEPARATOR
LINE FEED=LINE SEPARATOR
LINE TABULATION SET=LINE SEPARATOR
NEXT LINE=LINE SEPARATOR
NO BREAK HERE=ZERO WIDTH NO-BREAK SPACE
SUBSTITUTE=REPLACEMENT CHARACTER
VERTICAL TABULATION=LINE SEPARATOR
(Formerly, something like a printer would have expected any of CARRIAGE RETURN+LINE FEED, CARRIAGE RETURN+VERTICAL TABULATION or CARRIAGE RETURN+FORM FEED to start a new line. This is why we discard CARRIAGE RETURN but treat LINE FEED, VERTICAL TABULATION, FORM FEED or NEXT LINE as LINE SEPARATOR.)
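Because every control symbol decomposition has the same shape, the sequences for the C1 abbreviations can be generated mechanically. The following Python sketch assumes that an abbreviation consists only of characters whose Unicode names can be looked up (capital letters and digits); the function name is hypothetical, and START GROUP and SEMANTIC CONTROL SYMBOL are the proposed characters, written here as names.

```python
import unicodedata


def control_symbol_sequence(abbreviation: str) -> str:
    """Build the decomposition, as a '+'-joined list of character
    names, for the control symbol with the given abbreviation."""
    parts = ["START GROUP"]
    parts += [unicodedata.name(ch) for ch in abbreviation]
    parts += ["POP DIRECTIONAL FORMATTING", "SEMANTIC CONTROL SYMBOL"]
    return "+".join(parts)
```

Called with ACK it reproduces the decomposition given for SYMBOL FOR ACKNOWLEDGE, and called with NEL or SS2 it yields sequences of the same shape for the C1 controls.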
Requests that a double-struck, ``open-face´´, ``blackboard bold´´ font be used.
Used in
DOUBLE-STRUCK CAPITAL C=LATIN CAPITAL LETTER C+SEMANTIC DOUBLE-STRUCK
DOUBLE-STRUCK CAPITAL H=LATIN CAPITAL LETTER H+SEMANTIC DOUBLE-STRUCK
DOUBLE-STRUCK CAPITAL N=LATIN CAPITAL LETTER N+SEMANTIC DOUBLE-STRUCK
DOUBLE-STRUCK CAPITAL P=LATIN CAPITAL LETTER P+SEMANTIC DOUBLE-STRUCK
DOUBLE-STRUCK CAPITAL Q=LATIN CAPITAL LETTER Q+SEMANTIC DOUBLE-STRUCK
DOUBLE-STRUCK CAPITAL R=LATIN CAPITAL LETTER R+SEMANTIC DOUBLE-STRUCK
DOUBLE-STRUCK CAPITAL Z=LATIN CAPITAL LETTER Z+SEMANTIC DOUBLE-STRUCK
and arguably in
CIRCLED OPEN CENTRE EIGHT POINTED STAR=EIGHT POINTED BLACK STAR+SEMANTIC DOUBLE-STRUCK+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE
DOWNWARDS DOUBLE ARROW=DOWNWARDS ARROW+SEMANTIC DOUBLE-STRUCK
LEFT RIGHT DOUBLE ARROW=LEFT RIGHT ARROW+SEMANTIC DOUBLE-STRUCK
LEFTWARDS DOUBLE ARROW=LEFTWARDS ARROW+SEMANTIC DOUBLE-STRUCK
NORTH EAST DOUBLE ARROW=NORTH EAST ARROW+SEMANTIC DOUBLE-STRUCK
NORTH WEST DOUBLE ARROW=NORTH WEST ARROW+SEMANTIC DOUBLE-STRUCK
OPEN CENTRE ASTERISK=HEAVY ASTERISK+SEMANTIC DOUBLE-STRUCK
OPEN CENTRE BLACK STAR=BLACK STAR+SEMANTIC DOUBLE-STRUCK
OPEN CENTRE CROSS=PLUS SIGN+SEMANTIC DOUBLE-STRUCK
OPEN CENTRE TEARDROP-SPOKED ASTERISK=TEARDROP-SPOKED ASTERISK+SEMANTIC DOUBLE-STRUCK
RIGHTWARDS DOUBLE ARROW=RIGHTWARDS ARROW+SEMANTIC DOUBLE-STRUCK
SOUTH EAST DOUBLE ARROW=SOUTH EAST ARROW+SEMANTIC DOUBLE-STRUCK
SOUTH WEST DOUBLE ARROW=SOUTH WEST ARROW+SEMANTIC DOUBLE-STRUCK
UP DOWN DOUBLE ARROW=UP DOWN ARROW+SEMANTIC DOUBLE-STRUCK
UPWARDS DOUBLE ARROW=UPWARDS ARROW+SEMANTIC DOUBLE-STRUCK
Hard to do algorithmically, though possible: the character is outlined, and then the original part of the character is removed, except that some strokes are left alone.
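The first two steps (outline, then remove the original) can be sketched on a bit-mapped glyph. This Python sketch assumes a glyph given as a list of equal-length strings with # for ink and . for background, and treats every ink pixel alike; the part about leaving some strokes alone, which is what makes a real double-struck letter, is not attempted. The function name is hypothetical.

```python
def outline(bitmap):
    """Return the one-pixel outline of a glyph: every background cell
    that touches ink (including diagonally) becomes ink, and the
    original ink is removed. The result is one cell larger on each side."""
    rows = len(bitmap)
    cols = len(bitmap[0]) if rows else 0

    def ink(r, c):
        return 0 <= r < rows and 0 <= c < cols and bitmap[r][c] == "#"

    out = []
    for r in range(-1, rows + 1):
        line = ""
        for c in range(-1, cols + 1):
            near = any(ink(r + dr, c + dc)
                       for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            line += "#" if near and not ink(r, c) else "."
        out.append(line)
    return out
```

A single ink pixel, for example, becomes a hollow three-by-three ring.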
May well be productive---in particular, F (as in ``Let F be a field ...´´) is missing, but often seen in the literature.
If used but not rendered, confusion is likely to be minimal, so highly desirable.
Requests that a drop-shadow be drawn behind the glyph. Conventionally, the light source is behind the left shoulder of the observer, as if the observer were right-handed and working at a desk. (This can be changed by using TURNED, REVERSED or INVERTED.) The shadow is cast on a flat surface behind the glyph.
Could be used for
LOWER RIGHT DROP-SHADOWED WHITE SQUARE=WHITE SQUARE+SEMANTIC DROP-SHADOWED
Not much gain there, and unlikely to be useful for anything very much.
This is for characters whose decompositions include <wide>. It indicates that, if there is a choice between 2 glyphs (the single-cell one or the double-cell one), the double-cell one should be chosen. It enables software to use decomposition to get good results without needing to understand anything else about fullwidth/halfwidth characters.
Decompositions including <small> are replaced by ones involving SEMANTIC FULLWIDTH and SEMANTIC SMALL (q v), as the character glyph is small, but it is centred in a double-cell space.
If it is true that the difference between the FULLWIDTH and non-FULLWIDTH form is present merely to distinguish different glyphs that carry the same meaning, but are being used simultaneously because of a trip through a character encoding that had both, maybe this semantic is not needed, and can be replaced by a canonical one.
This is used for halfwidth characters: those with HALFWIDTH in the name, or whose decompositions include <narrow>. It indicates that, if there is a choice between 2 glyphs (the single-cell one or the double-cell one), the single-cell one should be chosen. It enables software to use decomposition and get good results without needing to understand anything else about fullwidth/halfwidth characters.
I also note the decompositions for non-Western characters, though they are not otherwise competently explored here.
If it is true that the difference between the HALFWIDTH and non-HALFWIDTH form is present merely to distinguish different glyphs that carry the same meaning, but are being used simultaneously because of a trip through a character encoding that had both, maybe this semantic is not needed, and can be replaced by a canonical one.
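The replacement of the <wide> and <narrow> tags by explicit modifiers can be tried out against the decomposition data shipped with Python, which exposes the same field through unicodedata.decomposition. This is only a sketch: the function name is hypothetical, and the name SEMANTIC HALFWIDTH is assumed here by analogy with SEMANTIC FULLWIDTH.

```python
import unicodedata

# Tag in the decomposition field -> proposed modifier character.
TAG_TO_SEMANTIC = {
    "<wide>": "SEMANTIC FULLWIDTH",
    "<narrow>": "SEMANTIC HALFWIDTH",  # assumed name, by analogy
}


def semantic_decomposition(ch):
    """Rewrite a tagged compatibility decomposition as
    BASE+SEMANTIC ..., or return None if the tag is not handled."""
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0] not in TAG_TO_SEMANTIC:
        return None
    bases = [unicodedata.name(chr(int(cp, 16))) for cp in fields[1:]]
    return "+".join(bases + [TAG_TO_SEMANTIC[fields[0]]])
```

FULLWIDTH LATIN CAPITAL LETTER A then comes out as LATIN CAPITAL LETTER A+SEMANTIC FULLWIDTH, and HALFWIDTH KATAKANA LETTER A as KATAKANA LETTER A+SEMANTIC HALFWIDTH.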
Requests the character be rendered in a heavy, ``bold´´, or ``black´´ font. The style is frequently used with important semantic content in mathematics, where it is used to represent a vector, and the magnitude of the vector is represented by the corresponding non-heavy character.
These are the heavy characters already in the U C S:
BLACK RIGHTWARDS ARROW=RIGHTWARDS ARROW+SEMANTIC HEAVY
BULLET OPERATOR=DOT OPERATOR+SEMANTIC HEAVY
BULLET=MIDDLE DOT+SEMANTIC HEAVY
HEAVY ASTERISK=ASTERISK OPERATOR+SEMANTIC HEAVY
HEAVY BALLOT X=BALLOT X+SEMANTIC HEAVY
HEAVY BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW=HEAVY BLACK CURVED UPWARDS AND RIGHTWARDS ARROW+SEMANTIC INVERTED
HEAVY BLACK HEART=BLACK HEART SUIT+SEMANTIC HEAVY
HEAVY BLACK-FEATHERED NORTH EAST ARROW=BLACK-FEATHERED NORTH EAST ARROW+SEMANTIC HEAVY
HEAVY BLACK-FEATHERED RIGHTWARDS ARROW=BLACK-FEATHERED RIGHTWARDS ARROW+SEMANTIC HEAVY
HEAVY BLACK-FEATHERED SOUTH EAST ARROW=BLACK-FEATHERED SOUTH EAST ARROW+SEMANTIC HEAVY
HEAVY CHECK MARK=CHECK MARK+SEMANTIC HEAVY
HEAVY CHEVRON SNOWFLAKE=SNOWFLAKE+SEMANTIC HEAVY
HYPHEN BULLET=HYPHEN+SEMANTIC HEAVY
TRIANGULAR BULLET=BLACK RIGHT-POINTING SMALL TRIANGLE+SEMANTIC HEAVY
HEAVY DASHED TRIANGLE-HEADED RIGHTWARDS ARROW=DASHED TRIANGLE-HEADED RIGHTWARDS ARROW+SEMANTIC HEAVY
HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT=RIGHT DOUBLE QUOTATION MARK+SEMANTIC HEAVY
HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT=LEFT DOUBLE QUOTATION MARK+SEMANTIC HEAVY
HEAVY EIGHT POINTED RECTILINEAR BLACK STAR=EIGHT POINTED RECTILINEAR BLACK STAR+SEMANTIC HEAVY
HEAVY EIGHT TEARDROP-SPOKED PROPELLER ASTERISK=EIGHT TEARDROP-SPOKED PROPELLER ASTERISK+SEMANTIC HEAVY
HEAVY EXCLAMATION MARK ORNAMENT=EXCLAMATION MARK+SEMANTIC HEAVY
HEAVY FOUR BALLOON-SPOKED ASTERISK=FOUR BALLOON-SPOKED ASTERISK+SEMANTIC HEAVY
HEAVY GREEK CROSS=PLUS SIGN+SEMANTIC HEAVY
HEAVY MULTIPLICATION X=MULTIPLICATION X+SEMANTIC HEAVY
HEAVY NORTH EAST ARROW=NORTH EAST ARROW+SEMANTIC HEAVY
HEAVY OPEN CENTRE CROSS=OPEN CENTRE CROSS+SEMANTIC HEAVY
HEAVY OUTLINED BLACK STAR=OUTLINED BLACK STAR+SEMANTIC HEAVY
HEAVY RIGHTWARDS ARROW=RIGHTWARDS ARROW+SEMANTIC HEAVY
HEAVY SINGLE COMMA QUOTATION MARK ORNAMENT=RIGHT SINGLE QUOTATION MARK+SEMANTIC HEAVY
HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT=LEFT SINGLE QUOTATION MARK+SEMANTIC HEAVY
HEAVY SOUTH EAST ARROW=SOUTH EAST ARROW+SEMANTIC HEAVY
HEAVY SPARKLE=SPARKLE+SEMANTIC HEAVY
HEAVY TEARDROP-SPOKED ASTERISK=TEARDROP-SPOKED ASTERISK+SEMANTIC HEAVY
HEAVY TRIANGLE-HEADED RIGHTWARDS ARROW=TRIANGLE-HEADED RIGHTWARDS ARROW+SEMANTIC HEAVY
HEAVY UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW=HEAVY LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW+SEMANTIC INVERTED
HEAVY VERTICAL BAR=MEDIUM VERTICAL BAR+SEMANTIC HEAVY
HEAVY WEDGE-TAILED RIGHTWARDS ARROW=WEDGE-TAILED RIGHTWARDS ARROW+SEMANTIC HEAVY
HEAVY TEARDROP-SPOKED PINWHEEL ASTERISK and HEAVY CHEVRON SNOWFLAKE both appear to be heavy, but their base forms are not encoded. (This reminds me of the situation with proto-Indo-European *-words, whose existence we can deduce without direct evidence.) Maybe they should be added.
Hard to do well algorithmically, but easy to do to some legible standard.
If used but not recognised, unlikely to cause the resulting text to be misinterpreted (except in the mathematical use), so highly desirable.
Rotates the character (out of the paper) through a half-turn about a horizontal axis; equivalently, reflects the character about the horizontal axis. For characters where ``inverted´´ and ``turned´´ are equivalent, we describe the character as ``turned´´, out of deference to metal typography.
These characters are inverted copies of other characters:
BACK-TILTED SHADOWED WHITE RIGHTWARDS ARROW=FRONT-TILTED SHADOWED WHITE RIGHTWARDS ARROW+SEMANTIC INVERTED
BLACK LOWER RIGHT TRIANGLE=BLACK UPPER RIGHT TRIANGLE+SEMANTIC INVERTED
BLACK-FEATHERED SOUTH EAST ARROW=BLACK-FEATHERED NORTH EAST ARROW+SEMANTIC INVERTED
BOTTOM RIGHT CORNER=TOP RIGHT CORNER+SEMANTIC INVERTED
BOTTOM RIGHT CROP=TOP RIGHT CROP+SEMANTIC INVERTED
DOWNWARDS ARROW WITH TIP LEFTWARDS=UPWARDS ARROW WITH TIP LEFTWARDS+SEMANTIC INVERTED
DOWNWARDS ARROW WITH TIP RIGHTWARDS=UPWARDS ARROW WITH TIP RIGHTWARDS+SEMANTIC INVERTED
DOWNWARDS HARPOON WITH BARB LEFTWARDS=UPWARDS HARPOON WITH BARB LEFTWARDS+SEMANTIC INVERTED
LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE=LATIN LETTER GLOTTAL STOP WITH STROKE+SEMANTIC INVERTED
LATIN LETTER INVERTED GLOTTAL STOP=LATIN LETTER GLOTTAL STOP+SEMANTIC INVERTED
LATIN LETTER SMALL CAPITAL INVERTED R=LATIN LETTER SMALL CAPITAL R+SEMANTIC INVERTED
LEFT CEILING=LEFT FLOOR+SEMANTIC INVERTED
LOWER BLADE SCISSORS=UPPER BLADE SCISSORS+SEMANTIC INVERTED
LOWER RIGHT PENCIL=UPPER RIGHT PENCIL+SEMANTIC INVERTED
LOWER RIGHT QUADRANT CIRCULAR ARC=UPPER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC INVERTED
NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW=NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW+SEMANTIC INVERTED
RIGHT CEILING=RIGHT FLOOR+SEMANTIC INVERTED
RIGHTWARDS HARPOON WITH BARB DOWNWARDS=RIGHTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC INVERTED
SOUTH EAST ARROW=NORTH EAST ARROW+SEMANTIC INVERTED
THREE-D BOTTOM-LIGHTED RIGHTWARDS ARROWHEAD=THREE-D TOP-LIGHTED RIGHTWARDS ARROWHEAD+SEMANTIC INVERTED
UPPER RIGHT DROP-SHADOWED WHITE SQUARE=LOWER RIGHT DROP-SHADOWED WHITE SQUARE+SEMANTIC INVERTED
UPPER RIGHT SHADOWED WHITE SQUARE=LOWER RIGHT SHADOWED WHITE SQUARE+SEMANTIC INVERTED
WHITE DOWN POINTING INDEX=WHITE UP POINTING INDEX+SEMANTIC INVERTED
There is also INVERTED LAZY S, but no (*)LAZY S. (*)LAZY S could be seen as a rotated version of LATIN SMALL LETTER S obtained if we use a SEMANTIC ROTATED decomposition, described later, but the glyphs are very different. In fact, we are assured that `reversed tilde and lazy s are glyph variants´, so that´s how we´ll encode it.
For arrows, the one pointing up (mathematical positive) is the one we define as ``the right way up´´, and its image is the ``inverted´´ glyph.
This is very easy to do in software, and the consequences of ignoring it are likely to be severe if arrows are important. (This only affects people who try to make up new characters, as existing characters are already encoded and should be well understood.)
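To show how little is involved, here is a Python sketch on a bit-mapped glyph, assuming the glyph is a list of equal-length strings, top row first. The function names are hypothetical. Inversion reverses the order of the rows; turning (a half-turn in the plane of the paper) also reverses each row.

```python
def inverted(bitmap):
    """Reflect the glyph about the horizontal axis: top row becomes bottom."""
    return bitmap[::-1]


def turned(bitmap):
    """Rotate the glyph through a half-turn in the plane of the paper."""
    return [row[::-1] for row in bitmap[::-1]]
```

For a glyph that is left-right symmetric the 2 functions give the same result, which is the case where the character is described as ``turned´´.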
Requests the glyph be rendered in an italic, oblique, or slanted font. This may be just slanted, or it may have additional ornamentation at the ends of strokes, but in this case should still be distinguishable from SCRIPT (q v).
Only needed for 2 characters currently encoded.
PLANCK CONSTANT=LATIN SMALL LETTER H+SEMANTIC ITALIC
PLANCK CONSTANT OVER TWO PI=LATIN SMALL LETTER H+SEMANTIC ITALIC+COMBINING SHORT SOLIDUS OVERLAY
In mathematical text, there is usually a font difference between the characters used in the running text and the characters used for ordinary mathematical symbols. The recommended way to mark the distinction is with SEMANTIC ITALIC. Symbols represented as Greek characters are sometimes printed in a recognisably italic font, and sometimes an upright one: if there is only an italic Greek font available, it should be used for Greek characters with or without a SEMANTIC ITALIC.
Slanting, at least, can be done algorithmically with little difficulty for both outline and bit-mapped fonts.
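For a bit-mapped font, slanting amounts to shearing: each row is shifted to the right by an amount that grows with its height above the baseline. The Python sketch below assumes the glyph is a list of equal-length strings with . as background and the last row on the baseline; the function name and the rows_per_step parameter (how many rows share one pixel of shift) are hypothetical.

```python
def slant(bitmap, rows_per_step=2):
    """Shear a glyph to the right: the bottom row stays put and each
    further `rows_per_step` rows upward shift one more cell right."""
    height = len(bitmap)
    max_shift = (height - 1) // rows_per_step if height else 0
    out = []
    for r, row in enumerate(bitmap):
        shift = (height - 1 - r) // rows_per_step  # larger near the top
        # Pad on both sides so every row keeps the same width.
        out.append("." * shift + row + "." * (max_shift - shift))
    return out
```

A vertical stroke 4 rows high, for instance, comes out with its top 2 rows moved one cell to the right.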
If used but not recognised, unlikely to cause the resulting text to be misinterpreted (even in a mathematical application), so this is very desirable even though it´s only used for 2 existing characters.
A larger version of the same character. Used in
LIGHT VERTICAL BAR=VERTICAL LINE+SEMANTIC LARGE
MULTIPLICATION X=MULTIPLICATION SIGN+SEMANTIC LARGE
N-ARY INTERSECTION=INTERSECTION+SEMANTIC LARGE
N-ARY LOGICAL AND=LOGICAL AND+SEMANTIC LARGE
N-ARY LOGICAL OR=LOGICAL OR+SEMANTIC LARGE
N-ARY PRODUCT=GREEK CAPITAL LETTER PI+SEMANTIC LARGE
N-ARY SUMMATION=GREEK CAPITAL LETTER SIGMA+SEMANTIC LARGE
N-ARY UNION=UNION+SEMANTIC LARGE
It´s odd that although there´s an N-ARY COPRODUCT, there´s no (*)COPRODUCT. It should be represented as GREEK CAPITAL LETTER PI+SEMANTIC TURNED. This semantic would also be the right one to use for the Hebrew wide letters:
HEBREW LETTER WIDE ALEF=HEBREW LETTER ALEF+SEMANTIC LARGE
HEBREW LETTER WIDE DALET=HEBREW LETTER DALET+SEMANTIC LARGE
HEBREW LETTER WIDE FINAL MEM=HEBREW LETTER FINAL MEM+SEMANTIC LARGE
HEBREW LETTER WIDE HE=HEBREW LETTER HE+SEMANTIC LARGE
HEBREW LETTER WIDE KAF=HEBREW LETTER KAF+SEMANTIC LARGE
HEBREW LETTER WIDE LAMED=HEBREW LETTER LAMED+SEMANTIC LARGE
HEBREW LETTER WIDE RESH=HEBREW LETTER RESH+SEMANTIC LARGE
HEBREW LETTER WIDE TAV=HEBREW LETTER TAV+SEMANTIC LARGE
Requests some kind of ``artistic combination´´ of 2 characters into a single glyph. This is another ``binary operation´´: the SEMANTIC LIGATURE stands between 2 characters to be ligated. Either or both may have their own combining marks: to give a combining mark to the whole ligature, it would have to be first enclosed in START GROUP ... POP DIRECTIONAL FORMATTING.
Used for characters with LIGATURE or DIGRAPH in their name (except Arabic characters, where the expectation is that letters are joined anyway).
AMPERSAND=LATIN CAPITAL LETTER E+SEMANTIC LIGATURE+LATIN SMALL LETTER T
ARMENIAN SMALL LIGATURE ECH YIWN=ARMENIAN SMALL LETTER ECH+SEMANTIC LIGATURE+ARMENIAN SMALL LETTER YIWN
ARMENIAN SMALL LIGATURE MEN ECH=ARMENIAN SMALL LETTER MEN+SEMANTIC LIGATURE+ARMENIAN SMALL LETTER ECH
ARMENIAN SMALL LIGATURE MEN INI=ARMENIAN SMALL LETTER MEN+SEMANTIC LIGATURE+ARMENIAN SMALL LETTER INI
ARMENIAN SMALL LIGATURE MEN NOW=ARMENIAN SMALL LETTER MEN+SEMANTIC LIGATURE+ARMENIAN SMALL LETTER NOW
ARMENIAN SMALL LIGATURE MEN XEH=ARMENIAN SMALL LETTER MEN+SEMANTIC LIGATURE+ARMENIAN SMALL LETTER XEH
ARMENIAN SMALL LIGATURE VEW NOW=ARMENIAN SMALL LETTER VEW+SEMANTIC LIGATURE+ARMENIAN SMALL LETTER NOW
BEAMED EIGHTH NOTES=EIGHTH NOTE+SEMANTIC LIGATURE+EIGHTH NOTE
BEAMED SIXTEENTH NOTES=EIGHTH NOTE+COMBINING HOOK+SEMANTIC LIGATURE+START GROUP+EIGHTH NOTE+COMBINING HOOK+POP DIRECTIONAL FORMATTING
CYRILLIC CAPITAL LETTER IOTIFIED BIG YUS=CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER BIG YUS
CYRILLIC CAPITAL LETTER IOTIFIED E=CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER IE
CYRILLIC CAPITAL LETTER IOTIFIED LITTLE YUS=CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER LITTLE YUS
CYRILLIC CAPITAL LETTER LJE=CYRILLIC CAPITAL LETTER EL+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER SOFT SIGN
CYRILLIC CAPITAL LETTER NJE=CYRILLIC CAPITAL LETTER EN+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER SOFT SIGN
CYRILLIC CAPITAL LETTER YU=CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER O
CYRILLIC CAPITAL LIGATURE A IE=CYRILLIC CAPITAL LETTER A+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER IE
CYRILLIC CAPITAL LIGATURE EN GHE=CYRILLIC CAPITAL LETTER EN+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER GHE
CYRILLIC CAPITAL LIGATURE TE TSE=CYRILLIC CAPITAL LETTER TE+SEMANTIC LIGATURE+CYRILLIC CAPITAL LETTER TSE
CYRILLIC SMALL LETTER IOTIFIED BIG YUS=CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER BIG YUS
CYRILLIC SMALL LETTER IOTIFIED E=CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER E
CYRILLIC SMALL LETTER IOTIFIED LITTLE YUS=CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER LITTLE YUS
CYRILLIC SMALL LETTER LJE=CYRILLIC SMALL LETTER EL+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER SOFT SIGN
CYRILLIC SMALL LETTER NJE=CYRILLIC SMALL LETTER EN+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER SOFT SIGN
CYRILLIC SMALL LETTER YU=CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER O
CYRILLIC SMALL LIGATURE A IE=CYRILLIC SMALL LETTER A+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER IE
CYRILLIC SMALL LIGATURE EN GHE=CYRILLIC SMALL LETTER EN+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER GHE
CYRILLIC SMALL LIGATURE TE TSE=CYRILLIC SMALL LETTER TE+SEMANTIC LIGATURE+CYRILLIC SMALL LETTER TSE
L B BAR SYMBOL=LATIN SMALL LETTER L+SEMANTIC LIGATURE+LATIN SMALL LETTER B
LAO HO MO=LAO LETTER HO SUNG+SEMANTIC LIGATURE+LAO LETTER MO
LAO HO NO=LAO LETTER HO SUNG+SEMANTIC LIGATURE+LAO LETTER NO
LATIN CAPITAL LETTER AE=LATIN CAPITAL LETTER A+SEMANTIC LIGATURE+LATIN CAPITAL LETTER E
LATIN CAPITAL LETTER OI=LATIN CAPITAL LETTER O+SEMANTIC LIGATURE+LATIN CAPITAL LETTER I
LATIN CAPITAL LIGATURE IJ=LATIN CAPITAL LETTER I+SEMANTIC LIGATURE+LATIN CAPITAL LETTER J
LATIN CAPITAL LIGATURE OE=LATIN CAPITAL LETTER O+SEMANTIC LIGATURE+LATIN SMALL LETTER E
LATIN SMALL LETTER AE=LATIN SMALL LETTER A+SEMANTIC LIGATURE+LATIN SMALL LETTER E
LATIN SMALL LETTER DEZH DIGRAPH=LATIN SMALL LETTER D+SEMANTIC LIGATURE+LATIN SMALL LETTER EZH
LATIN SMALL LETTER DZ DIGRAPH WITH CURL=LATIN SMALL LETTER D+SEMANTIC LIGATURE+LATIN SMALL LETTER Z WITH CURL
LATIN SMALL LETTER DZ DIGRAPH=LATIN SMALL LETTER D+SEMANTIC LIGATURE+LATIN SMALL LETTER Z
LATIN SMALL LETTER HV=LATIN SMALL LETTER H+SEMANTIC LIGATURE+LATIN SMALL LETTER V
LATIN SMALL LETTER LEZH=LATIN SMALL LETTER L+SEMANTIC LIGATURE+LATIN SMALL LETTER EZH
LATIN SMALL LETTER OI=LATIN SMALL LETTER O+SEMANTIC LIGATURE+LATIN SMALL LETTER DOTLESS I
LATIN SMALL LETTER REVERSED OPEN E WITH HOOK=LATIN SMALL LETTER REVERSED OPEN E+SEMANTIC LIGATURE+MODIFIER LETTER RHOTIC HOOK
LATIN SMALL LETTER SCHWA WITH HOOK=LATIN SMALL LETTER SCHWA+SEMANTIC LIGATURE+MODIFIER LETTER RHOTIC HOOK
LATIN SMALL LETTER SHARP S=LATIN SMALL LETTER S+SEMANTIC LIGATURE+LATIN SMALL LETTER S
LATIN SMALL LETTER TC DIGRAPH WITH CURL=LATIN SMALL LETTER T+SEMANTIC LIGATURE+LATIN SMALL LETTER C WITH CURL
LATIN SMALL LETTER TESH DIGRAPH=LATIN SMALL LETTER T+SEMANTIC LIGATURE+LATIN SMALL LETTER ESH
LATIN SMALL LETTER TS DIGRAPH=LATIN SMALL LETTER T+SEMANTIC LIGATURE+LATIN SMALL LETTER S
LATIN SMALL LIGATURE FF=LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER F
LATIN SMALL LIGATURE FFI=LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER I
LATIN SMALL LIGATURE FFL=LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER L
LATIN SMALL LIGATURE FI=LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER I
LATIN SMALL LIGATURE FL=LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER L
LATIN SMALL LIGATURE IJ=LATIN SMALL LETTER I+SEMANTIC LIGATURE+LATIN SMALL LETTER J
LATIN SMALL LIGATURE LONG S T=LATIN SMALL LETTER LONG S+SEMANTIC LIGATURE+LATIN SMALL LETTER T
LATIN SMALL LIGATURE OE=LATIN SMALL LETTER O+SEMANTIC LIGATURE+LATIN SMALL LETTER E
LATIN SMALL LIGATURE ST=LATIN SMALL LETTER S+SEMANTIC LIGATURE+LATIN SMALL LETTER T
NUMERO SIGN=LATIN CAPITAL LETTER N+SEMANTIC LIGATURE+LATIN SMALL LETTER O
PRESCRIPTION TAKE=LATIN CAPITAL LETTER P+SEMANTIC LIGATURE+LATIN SMALL LETTER X
ROMAN NUMERAL ONE THOUSAND C D=LATIN CAPITAL LETTER C+SEMANTIC LIGATURE+LATIN CAPITAL LETTER D
If the decomposition of CYRILLIC LETTER YU (historically CYRILLIC LETTER IOTIFIED O) as CYRILLIC LETTER BYELORUSSIAN-UKRAINIAN I+SEMANTIC LIGATURE+CYRILLIC LETTER O horrifies you, see the remarks on O WITH STROKE, below.
There is an argument for some of these---e g, LATIN CAPITAL LETTER AE---that they are not ligatures, and should not be decomposed as such. However, even when LATIN CAPITAL LETTER AE is being used as the letter ash, it is still appropriate to render it as AE if its glyph is not available: you´d see text like ``AElfred the Great´´. (The comments for ``O WITH STROKE´´ also apply here.)
We should enclose each decomposition in ``brackets´´ (START GROUP, ..., POP DIRECTIONAL FORMATTING) for formal reasons: it ensures that the ligature is treated as a unit by any other processing that may be done.
I also note here the decompositions for Hebrew ligatures, though Hebrew is not otherwise competently explored here:
HEBREW LIGATURE ALEF LAMED=HEBREW LETTER ALEF+SEMANTIC LIGATURE+HEBREW LETTER LAMED
HEBREW LIGATURE YIDDISH DOUBLE VAV=HEBREW LETTER VAV+SEMANTIC LIGATURE+HEBREW LETTER VAV
HEBREW LIGATURE YIDDISH DOUBLE YOD=HEBREW LETTER YOD+SEMANTIC LIGATURE+HEBREW LETTER YOD
HEBREW LIGATURE YIDDISH VAV YOD=HEBREW LETTER VAV+SEMANTIC LIGATURE+HEBREW LETTER YOD
There is a 3rd set of ligatures as well: many mathematical symbols are composed as a combination of other characters, but in at least some cases the composition is no longer purely algorithmic. The list is ...
ALMOST EQUAL OR EQUAL TO=ALMOST EQUAL TO+SEMANTIC LIGATURE+EQUALS SIGN
BOWTIE=VERTICAL STROKE+SEMANTIC LIGATURE+MULTIPLICATION SIGN+SEMANTIC LIGATURE+VERTICAL STROKE
CONTAINS AS NORMAL SUBGROUP OR EQUAL TO=CONTAINS AS NORMAL SUBGROUP+SEMANTIC LIGATURE+EQUALS SIGN
EQUAL TO OR GREATER-THAN=EQUALS SIGN+SEMANTIC LIGATURE+GREATER-THAN SIGN
EQUAL TO OR LESS-THAN=EQUALS SIGN+SEMANTIC LIGATURE+LESS-THAN SIGN
EQUAL TO OR PRECEDES=EQUALS SIGN+SEMANTIC LIGATURE+PRECEDES
EQUAL TO OR SUCCEEDS=EQUALS SIGN+SEMANTIC LIGATURE+SUCCEEDS
GREATER-THAN BUT NOT EQUIVALENT TO=GREATER-THAN SIGN+SEMANTIC LIGATURE+NOT EQUIVALENT TO
GREATER-THAN EQUAL TO OR LESS-THAN=GREATER-THAN SIGN+SEMANTIC LIGATURE+EQUALS SIGN+SEMANTIC LIGATURE+LESS-THAN SIGN
GREATER-THAN OR EQUAL TO=GREATER-THAN SIGN+SEMANTIC LIGATURE+EQUALS SIGN
GREATER-THAN OR EQUIVALENT TO=GREATER-THAN SIGN+SEMANTIC LIGATURE+EQUIVALENT TO
GREATER-THAN OR LESS-THAN=GREATER-THAN SIGN+SEMANTIC LIGATURE+LESS-THAN SIGN
LEFT NORMAL FACTOR SEMIDIRECT PRODUCT=VERTICAL STROKE+SEMANTIC LIGATURE+MULTIPLICATION SIGN
LESS-THAN BUT NOT EQUIVALENT TO=LESS-THAN SIGN+SEMANTIC LIGATURE+NOT EQUIVALENT TO
LESS-THAN EQUAL TO OR GREATER-THAN=LESS-THAN SIGN+SEMANTIC LIGATURE+EQUALS SIGN+SEMANTIC LIGATURE+GREATER-THAN SIGN
LESS-THAN OR EQUAL TO=LESS-THAN SIGN+SEMANTIC LIGATURE+EQUALS SIGN
LESS-THAN OR EQUIVALENT TO=LESS-THAN SIGN+SEMANTIC LIGATURE+EQUIVALENT TO
LESS-THAN OR GREATER-THAN=LESS-THAN SIGN+SEMANTIC LIGATURE+GREATER-THAN SIGN
NORMAL SUBGROUP OF=LESS-THAN SIGN+SEMANTIC LIGATURE+VERTICAL STROKE
NORMAL SUBGROUP OF OR EQUAL TO=NORMAL SUBGROUP OF+SEMANTIC LIGATURE+EQUALS SIGN
POSTAL MARK FACE=POSTAL MARK+SEMANTIC LIGATURE+WHITE SMILING FACE
PRECEDES BUT NOT EQUIVALENT TO=PRECEDES+SEMANTIC LIGATURE+NOT EQUIVALENT TO
PRECEDES OR EQUAL TO=PRECEDES+SEMANTIC LIGATURE+EQUALS SIGN
PRECEDES OR EQUIVALENT TO=PRECEDES+SEMANTIC LIGATURE+EQUIVALENT TO
RIGHT NORMAL FACTOR SEMIDIRECT PRODUCT=MULTIPLICATION SIGN+SEMANTIC LIGATURE+VERTICAL STROKE
SQUARE IMAGE OF OR EQUAL TO=SQUARE IMAGE OF+SEMANTIC LIGATURE+EQUALS SIGN
SQUARE IMAGE OF OR NOT EQUAL TO=SQUARE IMAGE OF+SEMANTIC LIGATURE+NOT EQUAL TO
SQUARE ORIGINAL OF OR EQUAL TO=SQUARE ORIGINAL OF+SEMANTIC LIGATURE+EQUALS SIGN
SQUARE ORIGINAL OF OR NOT EQUAL TO=SQUARE ORIGINAL OF+SEMANTIC LIGATURE+NOT EQUAL TO
SUBSET OF OR EQUAL TO=SUBSET OF+SEMANTIC LIGATURE+EQUALS SIGN
SUBSET OF WITH NOT EQUAL TO=SUBSET OF+SEMANTIC LIGATURE+NOT EQUAL TO
SUCCEEDS BUT NOT EQUIVALENT TO=SUCCEEDS+SEMANTIC LIGATURE+NOT EQUIVALENT TO
SUCCEEDS OR EQUAL TO=SUCCEEDS+SEMANTIC LIGATURE+EQUALS SIGN
SUCCEEDS OR EQUIVALENT TO=SUCCEEDS+SEMANTIC LIGATURE+EQUIVALENT TO
SUPERSET OF OR EQUAL TO=SUPERSET OF+SEMANTIC LIGATURE+EQUALS SIGN
SUPERSET OF WITH NOT EQUAL TO=SUPERSET OF+SEMANTIC LIGATURE+NOT EQUAL TO
Strangely, the decomposition with ligature is most useful for renderers that can´t do ligatures, e g, cell-based character terminals. They can just look up the decomposition and render the component glyphs---ignoring the SEMANTIC LIGATURE completely---and get good, legible results.
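As a rough illustration of that fallback, here is a minimal sketch in Python. It assumes a hypothetical table DECOMPOSITIONS holding a few of the entries listed above as plain strings, and it treats SEMANTIC LIGATURE purely as a marker to be skipped, since no such character exists in the U C S. The function name fallback is mine, not part of any standard.

```python
import unicodedata

# SEMANTIC LIGATURE is the proposed character from this essay; it has no
# real code point, so it appears here only as a name inside the rules.
SEMANTIC_LIGATURE = "SEMANTIC LIGATURE"

# Hypothetical lookup table: three entries copied from the ligature list.
DECOMPOSITIONS = {
    "LATIN SMALL LIGATURE FI":
        "LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER I",
    "LATIN SMALL LIGATURE FFL":
        "LATIN SMALL LETTER F+SEMANTIC LIGATURE+LATIN SMALL LETTER F"
        "+SEMANTIC LIGATURE+LATIN SMALL LETTER L",
    "NUMERO SIGN":
        "LATIN CAPITAL LETTER N+SEMANTIC LIGATURE+LATIN SMALL LETTER O",
}


def fallback(text):
    """Replace each ligature by its components, dropping the ligature marker."""
    out = []
    for ch in text:
        rule = DECOMPOSITIONS.get(unicodedata.name(ch, ""))
        if rule is None:
            out.append(ch)  # no decomposition known: keep the character
            continue
        for part in rule.split("+"):
            if part != SEMANTIC_LIGATURE:
                out.append(unicodedata.lookup(part))
    return "".join(out)


print(fallback("ba\ufb04e"))       # the FFL ligature becomes three letters
print(fallback("\u2116 5"))        # NUMERO SIGN becomes N followed by o
```

A renderer that can form ligatures would use the same table in the other direction, treating the marker as a request rather than discarding it.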
Surrounds the character with a narrow line. 5 characters are described as ``outlined´´, and a few others fit the description.
BULLSEYE=WHITE BULLET+SEMANTIC OUTLINED
FISHEYE=BULLET+SEMANTIC OUTLINED
OPEN-OUTLINED RIGHTWARDS ARROW=RIGHTWARDS ARROW+SEMANTIC OUTLINED
OUTLINED BLACK STAR=BLACK STAR+SEMANTIC OUTLINED
OUTLINED GREEK CROSS=PLUS SIGN+SEMANTIC OUTLINED
OUTLINED LATIN CROSS=LATIN CROSS+SEMANTIC OUTLINED
STRESS OUTLINED WHITE STAR=WHITE STAR+SEMANTIC OUTLINED
WHITE DIAMOND CONTAINING BLACK SMALL DIAMOND=BLACK DIAMOND+SEMANTIC SMALL+SEMANTIC OUTLINED
WHITE SQUARE CONTAINING BLACK SMALL SQUARE=BLACK SMALL SQUARE+SEMANTIC OUTLINED
and EIGHT PETALLED OUTLINED BLACK FLORETTE lacks a base form.
Possible to do algorithmically, but seems like a very specialised thing to do for such little gain.
Substitution of the non-outlined glyph is unlikely to cause legibility problems though, so this would be a good decomposition to have even if no one uses it.
Requests that characters be overstruck. Applies to the 2 characters on each side, like a ``binary operator´´.
Although seemingly simple, this introduces a whole set of problems. What is the difference between following a character with a COMBINING ENCLOSING CIRCLE and OVERPRINTING it with a LARGE CIRCLE? Can you accent a character by composing it with a spacing accent character?
To avoid such problems, the OVERPRINT character is only used in cases where the derivation of the character is clearly understood, and known to be overstruck. This is a historical judgement.
It applies mostly to the A P L block, and there are 64 symbols:
APL FUNCTIONAL SYMBOL ALPHA UNDERBAR=APL FUNCTIONAL SYMBOL ALPHA+COMBINING LOW LINE APL FUNCTIONAL SYMBOL BACKSLASH BAR=REVERSE SOLIDUS+SEMANTIC OVERPRINT+MINUS SIGN APL FUNCTIONAL SYMBOL CIRCLE BACKSLASH=LARGE CIRCLE+SEMANTIC OVERPRINT+REVERSE SOLIDUS APL FUNCTIONAL SYMBOL CIRCLE DIAERESIS=LARGE CIRCLE+COMBINING DIAERESIS APL FUNCTIONAL SYMBOL CIRCLE JOT=LARGE CIRCLE+SEMANTIC OVERPRINT+RING OPERATOR APL FUNCTIONAL SYMBOL CIRCLE STAR=LARGE CIRCLE+SEMANTIC OVERPRINT+ASTERISK OPERATOR APL FUNCTIONAL SYMBOL CIRCLE STILE=LARGE CIRCLE+SEMANTIC OVERPRINT+VERTICAL LINE APL FUNCTIONAL SYMBOL CIRCLE UNDERBAR=LARGE CIRCLE+COMBINING LOW LINE APL FUNCTIONAL SYMBOL COMMA BAR=COMMA+SEMANTIC OVERPRINT+MINUS SIGN APL FUNCTIONAL SYMBOL DEL DIAERESIS=INCREMENT+COMBINING DIAERESIS APL FUNCTIONAL SYMBOL DEL STILE=INCREMENT+SEMANTIC OVERPRINT+VERTICAL LINE APL FUNCTIONAL SYMBOL DEL TILDE=INCREMENT+SEMANTIC OVERPRINT+TILDE OPERATOR APL FUNCTIONAL SYMBOL DELTA STILE=INCREMENT+SEMANTIC OVERPRINT+VERTICAL LINE APL FUNCTIONAL SYMBOL DELTA UNDERBAR=INCREMENT+COMBINING LOW LINE APL FUNCTIONAL SYMBOL DIAMOND UNDERBAR=DIAMOND OPERATOR+COMBINING LOW LINE APL FUNCTIONAL SYMBOL DOWN CARET TILDE=DOWN ARROWHEAD+SEMANTIC OVERPRINT+TILDE OPERATOR APL FUNCTIONAL SYMBOL DOWN SHOE STILE=UNION+SEMANTIC OVERPRINT+VERTICAL LINE APL FUNCTIONAL SYMBOL DOWN TACK JOT=DOWN TACK+SEMANTIC OVERPRINT+RING OPERATOR APL FUNCTIONAL SYMBOL DOWN TACK UNDERBAR=DOWN TACK+COMBINING LOW LINE APL FUNCTIONAL SYMBOL DOWNWARDS VANE=MINUS SIGN+SEMANTIC OVERPRINT+DOWNWARDS ARROW APL FUNCTIONAL SYMBOL EPSILON UNDERBAR=GREEK SMALL LETTER EPSILON+COMBINING LOW LINE APL FUNCTIONAL SYMBOL GREATER-THAN DIAERESIS=GREATER-THAN SIGN+COMBINING DIAERESIS APL FUNCTIONAL SYMBOL IOTA UNDERBAR=APL FUNCTIONAL SYMBOL IOTA+COMBINING LOW LINE APL FUNCTIONAL SYMBOL JOT DIAERESIS=RING OPERATOR+COMBINING DIAERESIS APL FUNCTIONAL SYMBOL JOT UNDERBAR=RING OPERATOR+COMBINING LOW LINE APL FUNCTIONAL SYMBOL LEFT SHOE STILE=SUBSET OF+SEMANTIC 
OVERPRINT+VERTICAL LINE APL FUNCTIONAL SYMBOL LEFTWARDS VANE=VERTICAL LINE+SEMANTIC OVERPRINT+LEFTWARDS ARROW APL FUNCTIONAL SYMBOL OMEGA UNDERBAR=APL FUNCTIONAL SYMBOL OMEGA+COMBINING LOW LINE APL FUNCTIONAL SYMBOL QUAD BACKSLASH=BALLOT BOX+SEMANTIC OVERPRINT+REVERSE SOLIDUS APL FUNCTIONAL SYMBOL QUAD CIRCLE=BALLOT BOX+SEMANTIC OVERPRINT+LARGE CIRCLE APL FUNCTIONAL SYMBOL QUAD COLON=BALLOT BOX+SEMANTIC OVERPRINT+COLON APL FUNCTIONAL SYMBOL QUAD DEL=BALLOT BOX+SEMANTIC OVERPRINT+INCREMENT APL FUNCTIONAL SYMBOL QUAD DELTA=BALLOT BOX+SEMANTIC OVERPRINT+INCREMENT APL FUNCTIONAL SYMBOL QUAD DIAMOND=BALLOT BOX+SEMANTIC OVERPRINT+DIAMOND OPERATOR APL FUNCTIONAL SYMBOL QUAD DIVIDE=BALLOT BOX+SEMANTIC OVERPRINT+DIVISION SIGN APL FUNCTIONAL SYMBOL QUAD DOWN CARET=BALLOT BOX+SEMANTIC OVERPRINT+DOWN ARROWHEAD APL FUNCTIONAL SYMBOL QUAD DOWNWARDS ARROW=BALLOT BOX+SEMANTIC OVERPRINT+DOWNWARDS ARROW APL FUNCTIONAL SYMBOL QUAD EQUAL=BALLOT BOX+SEMANTIC OVERPRINT+EQUALS SIGN APL FUNCTIONAL SYMBOL QUAD GREATER-THAN=BALLOT BOX+SEMANTIC OVERPRINT+GREATER-THAN SIGN APL FUNCTIONAL SYMBOL QUAD JOT=BALLOT BOX+SEMANTIC OVERPRINT+RING OPERATOR APL FUNCTIONAL SYMBOL QUAD LEFTWARDS ARROW=BALLOT BOX+SEMANTIC OVERPRINT+LEFTWARDS ARROW APL FUNCTIONAL SYMBOL QUAD LESS-THAN=BALLOT BOX+SEMANTIC OVERPRINT+LESS-THAN SIGN APL FUNCTIONAL SYMBOL QUAD NOT EQUAL=BALLOT BOX+SEMANTIC OVERPRINT+NOT EQUAL TO APL FUNCTIONAL SYMBOL QUAD QUESTION=BALLOT BOX+SEMANTIC OVERPRINT+QUESTION MARK APL FUNCTIONAL SYMBOL QUAD RIGHTWARDS ARROW=BALLOT BOX+SEMANTIC OVERPRINT+RIGHTWARDS ARROW APL FUNCTIONAL SYMBOL QUAD SLASH=BALLOT BOX+SEMANTIC OVERPRINT+SOLIDUS APL FUNCTIONAL SYMBOL QUAD UP CARET=BALLOT BOX+SEMANTIC OVERPRINT+UP ARROWHEAD APL FUNCTIONAL SYMBOL QUAD UPWARDS ARROW=BALLOT BOX+SEMANTIC OVERPRINT+UPWARDS ARROW APL FUNCTIONAL SYMBOL QUOTE QUAD=APOSTROPHE+SEMANTIC OVERPRINT+BALLOT BOX APL FUNCTIONAL SYMBOL QUOTE UNDERBAR=APOSTROPHE+COMBINING LOW LINE APL FUNCTIONAL SYMBOL RIGHTWARDS VANE=VERTICAL LINE+SEMANTIC 
OVERPRINT+RIGHTWARDS ARROW APL FUNCTIONAL SYMBOL SEMICOLON UNDERBAR=SEMICOLON+COMBINING LOW LINE APL FUNCTIONAL SYMBOL SLASH BAR=SOLIDUS+SEMANTIC OVERPRINT+MINUS SIGN APL FUNCTIONAL SYMBOL STAR DIAERESIS=ASTERISK OPERATOR+COMBINING DIAERESIS APL FUNCTIONAL SYMBOL STILE TILDE=VERTICAL LINE+SEMANTIC OVERPRINT+TILDE OPERATOR APL FUNCTIONAL SYMBOL TILDE DIAERESIS=TILDE OPERATOR+COMBINING DIAERESIS APL FUNCTIONAL SYMBOL UP CARET TILDE=UP ARROWHEAD+SEMANTIC OVERPRINT+TILDE OPERATOR APL FUNCTIONAL SYMBOL UP SHOE JOT=INTERSECTION+SEMANTIC OVERPRINT+RING OPERATOR APL FUNCTIONAL SYMBOL UP TACK DIAERESIS=UP TACK+COMBINING DIAERESIS APL FUNCTIONAL SYMBOL UP TACK JOT=UP TACK+SEMANTIC OVERPRINT+RING OPERATOR APL FUNCTIONAL SYMBOL UP TACK OVERBAR=UP TACK+COMBINING OVERLINE APL FUNCTIONAL SYMBOL UPWARDS VANE=MINUS SIGN+SEMANTIC OVERPRINT+UPWARDS ARROW APL FUNCTIONAL SYMBOL ZILDE=LARGE CIRCLE+SEMANTIC OVERPRINT+TILDE OPERATOR
Composition could also be used to shrink the number of box-drawing characters down to a very reasonable 10 or so, from which the rest can be built. It seems likely that a sophisticated renderer would not regard these as characters at all, but would convert them into drawing primitives taking into account the current leading.
Since there are no characters BOX DRAWINGS DOUBLE {DOWN, LEFT, RIGHT, UP}, we use BOX DRAWINGS LIGHT {DOWN, LEFT, RIGHT, UP}+SEMANTIC DOUBLE-STRUCK in their place.
BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS DOUBLE HORIZONTAL BOX DRAWINGS DOUBLE DOWN AND LEFT=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS DOUBLE DOWN AND RIGHT=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS DOUBLE HORIZONTAL=BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS DOUBLE UP AND HORIZONTAL=BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS DOUBLE HORIZONTAL BOX DRAWINGS DOUBLE UP AND LEFT=BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS DOUBLE UP AND RIGHT=BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL=BOX DRAWINGS DOUBLE VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS DOUBLE HORIZONTAL BOX DRAWINGS DOUBLE VERTICAL AND LEFT=BOX DRAWINGS DOUBLE VERTICAL+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS DOUBLE VERTICAL AND RIGHT=BOX DRAWINGS DOUBLE VERTICAL+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS DOUBLE VERTICAL=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL BOX DRAWINGS 
DOWN DOUBLE AND LEFT SINGLE=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE=BOX DRAWINGS LIGHT DOWN+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT BOX DRAWINGS DOWN HEAVY AND HORIZONTAL LIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL BOX DRAWINGS DOWN HEAVY AND LEFT LIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT BOX DRAWINGS DOWN HEAVY AND LEFT UP LIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP BOX DRAWINGS DOWN HEAVY AND RIGHT LIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT BOX DRAWINGS DOWN HEAVY AND RIGHT UP LIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP BOX DRAWINGS DOWN HEAVY AND UP HORIZONTAL LIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL BOX DRAWINGS DOWN LIGHT AND HORIZONTAL HEAVY=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL BOX DRAWINGS DOWN LIGHT AND LEFT HEAVY=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT BOX DRAWINGS DOWN LIGHT AND LEFT UP HEAVY=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP BOX DRAWINGS DOWN LIGHT AND RIGHT HEAVY=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT BOX DRAWINGS DOWN LIGHT AND RIGHT UP HEAVY=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP BOX DRAWINGS DOWN LIGHT AND UP HORIZONTAL HEAVY=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS DOUBLE HORIZONTAL BOX DRAWINGS DOWN SINGLE AND LEFT 
DOUBLE=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL=BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL+SEMANTIC HEAVY BOX DRAWINGS HEAVY DOUBLE DASH VERTICAL=BOX DRAWINGS LIGHT DOUBLE DASH VERTICAL+SEMANTIC HEAVY BOX DRAWINGS HEAVY DOWN AND HORIZONTAL=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL BOX DRAWINGS HEAVY DOWN AND LEFT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT BOX DRAWINGS HEAVY DOWN AND RIGHT=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT BOX DRAWINGS HEAVY DOWN=BOX DRAWINGS LIGHT DOWN+SEMANTIC HEAVY BOX DRAWINGS HEAVY HORIZONTAL=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT BOX DRAWINGS HEAVY LEFT AND LIGHT RIGHT=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT BOX DRAWINGS HEAVY LEFT=BOX DRAWINGS LIGHT LEFT+SEMANTIC HEAVY BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL=BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL+SEMANTIC HEAVY BOX DRAWINGS HEAVY QUADRUPLE DASH VERTICAL=BOX DRAWINGS LIGHT QUADRUPLE DASH VERTICAL+SEMANTIC HEAVY BOX DRAWINGS HEAVY RIGHT=BOX DRAWINGS LIGHT RIGHT+SEMANTIC HEAVY BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL=BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL+SEMANTIC HEAVY BOX DRAWINGS HEAVY TRIPLE DASH VERTICAL=BOX DRAWINGS LIGHT TRIPLE DASH VERTICAL+SEMANTIC HEAVY BOX DRAWINGS HEAVY UP AND HORIZONTAL=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL BOX DRAWINGS HEAVY UP AND LEFT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT BOX DRAWINGS HEAVY UP AND LIGHT DOWN=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN BOX DRAWINGS HEAVY UP AND RIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX 
DRAWINGS HEAVY RIGHT BOX DRAWINGS HEAVY UP=BOX DRAWINGS LIGHT UP+SEMANTIC HEAVY BOX DRAWINGS HEAVY VERTICAL AND HORIZONTAL=BOX DRAWINGS HEAVY VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL BOX DRAWINGS HEAVY VERTICAL AND LEFT=BOX DRAWINGS HEAVY VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT BOX DRAWINGS HEAVY VERTICAL AND RIGHT=BOX DRAWINGS HEAVY VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT BOX DRAWINGS HEAVY VERTICAL=BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP BOX DRAWINGS LEFT DOWN HEAVY AND RIGHT UP LIGHT=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP BOX DRAWINGS LEFT HEAVY AND RIGHT DOWN LIGHT=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN BOX DRAWINGS LEFT HEAVY AND RIGHT UP LIGHT=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP BOX DRAWINGS LEFT HEAVY AND RIGHT VERTICAL LIGHT=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT VERTICAL BOX DRAWINGS LEFT LIGHT AND RIGHT DOWN HEAVY=BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN BOX DRAWINGS LEFT LIGHT AND RIGHT UP HEAVY=BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP BOX DRAWINGS LEFT LIGHT AND RIGHT VERTICAL HEAVY=BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY VERTICAL BOX DRAWINGS LEFT UP HEAVY AND RIGHT DOWN LIGHT=BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN BOX DRAWINGS LIGHT ARC DOWN AND LEFT=BOX DRAWINGS LIGHT ARC UP AND RIGHT+SEMANTIC TURNED BOX DRAWINGS LIGHT ARC DOWN AND RIGHT=BOX DRAWINGS 
LIGHT ARC UP AND RIGHT+SEMANTIC INVERTED BOX DRAWINGS LIGHT ARC UP AND LEFT=BOX DRAWINGS LIGHT ARC UP AND RIGHT+SEMANTIC REVERSED BOX DRAWINGS LIGHT DIAGONAL CROSS=BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT=BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT+SEMANTIC REVERSED BOX DRAWINGS LIGHT DOWN AND HORIZONTAL=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL BOX DRAWINGS LIGHT DOWN AND LEFT=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT BOX DRAWINGS LIGHT DOWN AND RIGHT=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT BOX DRAWINGS LIGHT DOWN=BOX DRAWINGS LIGHT UP+SEMANTIC TURNED BOX DRAWINGS LIGHT HORIZONTAL=BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT BOX DRAWINGS LIGHT LEFT AND HEAVY RIGHT=BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT BOX DRAWINGS LIGHT LEFT=BOX DRAWINGS LIGHT RIGHT+SEMANTIC TURNED BOX DRAWINGS LIGHT UP AND HEAVY DOWN=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN BOX DRAWINGS LIGHT UP AND HORIZONTAL=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL BOX DRAWINGS LIGHT UP AND LEFT=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT BOX DRAWINGS LIGHT UP AND RIGHT=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL BOX DRAWINGS LIGHT VERTICAL AND LEFT=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT BOX DRAWINGS LIGHT VERTICAL AND RIGHT=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT BOX DRAWINGS LIGHT VERTICAL=BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP BOX DRAWINGS RIGHT DOWN HEAVY AND LEFT UP LIGHT=BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX 
DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP BOX DRAWINGS RIGHT HEAVY AND LEFT DOWN LIGHT=BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN BOX DRAWINGS RIGHT HEAVY AND LEFT UP LIGHT=BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP BOX DRAWINGS RIGHT HEAVY AND LEFT VERTICAL LIGHT=BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT VERTICAL BOX DRAWINGS RIGHT LIGHT AND LEFT DOWN HEAVY=BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN BOX DRAWINGS RIGHT LIGHT AND LEFT UP HEAVY=BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP BOX DRAWINGS RIGHT LIGHT AND LEFT VERTICAL HEAVY=BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY VERTICAL BOX DRAWINGS RIGHT UP HEAVY AND LEFT DOWN LIGHT=BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE=BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL BOX DRAWINGS UP DOUBLE AND LEFT SINGLE=BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE=BOX DRAWINGS LIGHT UP+SEMANTIC DOUBLE-STRUCK+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT BOX DRAWINGS UP HEAVY AND DOWN HORIZONTAL LIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL BOX DRAWINGS UP HEAVY AND HORIZONTAL LIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL BOX DRAWINGS UP HEAVY AND LEFT DOWN LIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC 
OVERPRINT+BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN BOX DRAWINGS UP HEAVY AND LEFT LIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT BOX DRAWINGS UP HEAVY AND RIGHT DOWN LIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT DOWN BOX DRAWINGS UP HEAVY AND RIGHT LIGHT=BOX DRAWINGS HEAVY UP+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT BOX DRAWINGS UP LIGHT AND DOWN HORIZONTAL HEAVY=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL BOX DRAWINGS UP LIGHT AND HORIZONTAL HEAVY=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL BOX DRAWINGS UP LIGHT AND LEFT DOWN HEAVY=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN BOX DRAWINGS UP LIGHT AND LEFT HEAVY=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT BOX DRAWINGS UP LIGHT AND RIGHT DOWN HEAVY=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY DOWN BOX DRAWINGS UP LIGHT AND RIGHT HEAVY=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+BOX DRAWINGS DOUBLE HORIZONTAL BOX DRAWINGS UP SINGLE AND LEFT DOUBLE=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE=BOX DRAWINGS LIGHT UP+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE=BOX DRAWINGS DOUBLE VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE=BOX DRAWINGS DOUBLE VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE=BOX DRAWINGS DOUBLE VERTICAL+SEMANTIC 
OVERPRINT+BOX DRAWINGS LIGHT RIGHT BOX DRAWINGS VERTICAL HEAVY AND HORIZONTAL LIGHT=BOX DRAWINGS HEAVY VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT HORIZONTAL BOX DRAWINGS VERTICAL HEAVY AND LEFT LIGHT=BOX DRAWINGS HEAVY VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT LEFT BOX DRAWINGS VERTICAL HEAVY AND RIGHT LIGHT=BOX DRAWINGS HEAVY VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT BOX DRAWINGS VERTICAL LIGHT AND HORIZONTAL HEAVY=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY HORIZONTAL BOX DRAWINGS VERTICAL LIGHT AND LEFT HEAVY=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY LEFT BOX DRAWINGS VERTICAL LIGHT AND RIGHT HEAVY=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS HEAVY RIGHT BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+BOX DRAWINGS DOUBLE HORIZONTAL BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT LEFT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE=BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT+START GROUP+BOX DRAWINGS LIGHT RIGHT+SEMANTIC DOUBLE-STRUCK+POP DIRECTIONAL FORMATTING
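To show how a renderer might turn these decompositions into drawing primitives, here is a minimal sketch in Python. It makes several assumptions of my own: the four light ``arms´´ are taken as given rather than derived from one another by SEMANTIC TURNED, a glyph is modelled as a mapping from direction to line weight, and only SEMANTIC OVERPRINT and SEMANTIC HEAVY are handled (no DOUBLE-STRUCK, no START GROUP brackets). The five rules are copied from the list above; the names PRIMITIVES, RULES and expand are hypothetical.

```python
LIGHT, HEAVY = "light", "heavy"

# Assumed starting points: one arm from the cell centre in each direction.
PRIMITIVES = {
    "BOX DRAWINGS LIGHT UP": {"up": LIGHT},
    "BOX DRAWINGS LIGHT DOWN": {"down": LIGHT},
    "BOX DRAWINGS LIGHT LEFT": {"left": LIGHT},
    "BOX DRAWINGS LIGHT RIGHT": {"right": LIGHT},
}

# A handful of decompositions taken from the list above.
RULES = {
    "BOX DRAWINGS LIGHT VERTICAL":
        "BOX DRAWINGS LIGHT DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT UP",
    "BOX DRAWINGS LIGHT HORIZONTAL":
        "BOX DRAWINGS LIGHT LEFT+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT",
    "BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL":
        "BOX DRAWINGS LIGHT VERTICAL+SEMANTIC OVERPRINT"
        "+BOX DRAWINGS LIGHT HORIZONTAL",
    "BOX DRAWINGS HEAVY DOWN":
        "BOX DRAWINGS LIGHT DOWN+SEMANTIC HEAVY",
    "BOX DRAWINGS DOWN HEAVY AND RIGHT LIGHT":
        "BOX DRAWINGS HEAVY DOWN+SEMANTIC OVERPRINT+BOX DRAWINGS LIGHT RIGHT",
}


def expand(name):
    """Reduce a box-drawing character to a {direction: weight} mapping."""
    if name in PRIMITIVES:
        return dict(PRIMITIVES[name])
    arms = {}      # everything overprinted so far
    current = {}   # the operand a following modifier applies to
    for token in RULES[name].split("+"):
        if token == "SEMANTIC OVERPRINT":
            arms.update(current)   # commit the left operand
            current = {}
        elif token == "SEMANTIC HEAVY":
            current = {direction: HEAVY for direction in current}
        else:
            current = expand(token)  # a character name: expand recursively
    arms.update(current)
    return arms


print(expand("BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL"))
print(expand("BOX DRAWINGS DOWN HEAVY AND RIGHT LIGHT"))
```

Once a character is reduced to its arms, drawing it is a matter of stroking up to four line segments from the centre of the cell, which is where the current leading could be taken into account.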
It would also be hard to deny that
BALLOT BOX WITH CHECK=BALLOT BOX+SEMANTIC OVERPRINT+CHECK MARK
BALLOT BOX WITH X=BALLOT BOX+SEMANTIC OVERPRINT+BALLOT X
CHI RHO=GREEK CAPITAL LETTER CHI+SEMANTIC OVERPRINT+GREEK CAPITAL LETTER RHO
DIVISION TIMES=DIVISION SIGN+SEMANTIC OVERPRINT+MULTIPLICATION SIGN
EQUAL AND PARALLEL TO=EQUALS SIGN+SEMANTIC OVERPRINT+PARALLEL TO
GEOMETRIC PROPORTION=PROPORTION+SEMANTIC OVERPRINT+MINUS SIGN
GREATER-THAN WITH DOT=GREATER-THAN SIGN+SEMANTIC OVERPRINT+DOT OPERATOR
HOMOTHETIC=TILDE OPERATOR+SEMANTIC OVERPRINT+COLON
INTERROBANG=QUESTION MARK+SEMANTIC OVERPRINT+EXCLAMATION MARK
LESS-THAN WITH DOT=LESS-THAN SIGN+SEMANTIC OVERPRINT+DOT OPERATOR
MULTISET MULTIPLICATION=UNION+SEMANTIC OVERPRINT+DOT OPERATOR
MULTISET UNION=UNION+SEMANTIC OVERPRINT+STAR OPERATOR
MULTISET=UNION+SEMANTIC OVERPRINT+ELEMENT OF
PITCHFORK=INTERSECTION+SEMANTIC OVERPRINT+VERTICAL LINE
RING IN EQUAL TO=EQUALS SIGN+SEMANTIC OVERPRINT+RING OPERATOR
SQUARE WITH DIAGONAL CROSSHATCH FILL=SQUARE WITH UPPER LEFT TO LOWER RIGHT FILL+SEMANTIC OVERPRINT+SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL
SQUARE WITH ORTHOGONAL CROSSHATCH FILL=SQUARE WITH HORIZONTAL FILL+SEMANTIC OVERPRINT+SQUARE WITH VERTICAL FILL
WHITE UP-POINTING TRIANGLE WITH DOT=WHITE UP-POINTING TRIANGLE+SEMANTIC OVERPRINT+DOT OPERATOR
We should enclose each decomposition in ``brackets´´ (START GROUP, ..., POP DIRECTIONAL FORMATTING) similarly to the treatment of ligatures. When 2 characters are overprinted, order is important: the second may be modified to take account of the first. If the 2nd is LARGE CIRCLE, LOZENGE or BALLOT BOX, the assumption would be that it should enclose the first; if it is LOW LINE or OVERBAR, it should extend to the full width occupied by the first. This is analogous to the way in which the height of an accent above a base character depends on the height of that base character. But all the strokes of both characters will be fully visible. (There is no white ink!)
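The ``no white ink´´ rule can be sketched very simply if we pretend glyphs are small bitmaps. The following Python fragment is only an illustration under that assumption: the 5x5 bitmaps are invented for the example, and the function overprint (a hypothetical name) does nothing clever about resizing the second glyph to enclose or underline the first.

```python
def overprint(first, second):
    """Combine two equal-sized glyph bitmaps; ink from both stays visible."""
    if len(first) != len(second) or any(
        len(a) != len(b) for a, b in zip(first, second)
    ):
        raise ValueError("glyph bitmaps must have the same dimensions")
    return [
        "".join("#" if "#" in (a, b) else "." for a, b in zip(row1, row2))
        for row1, row2 in zip(first, second)
    ]


# Invented 5x5 bitmaps, '#' for ink and '.' for paper.
SOLIDUS = [
    "....#",
    "...#.",
    "..#..",
    ".#...",
    "#....",
]
MINUS_SIGN = [
    ".....",
    ".....",
    "#####",
    ".....",
    ".....",
]

# APL FUNCTIONAL SYMBOL SLASH BAR=SOLIDUS+SEMANTIC OVERPRINT+MINUS SIGN
for row in overprint(SOLIDUS, MINUS_SIGN):
    print(row)
```

On bitmaps of a fixed size the operation is symmetric; the order of the operands only starts to matter once the renderer is allowed to adapt the second glyph to the first, as described above.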
If widely deployed, the OVERPRINT operation could cause no end of havoc by encouraging the creation of new symbols in a very uncontrolled way. (On the other hand, maybe that´s a good thing.)
Rotates the character (out of the paper) through a half-turn about a vertical axis; equivalently, reflects the character about the vertical axis. For characters where ``reversed´´ and ``turned´´ are equivalent, we describe the character as ``turned´´, out of deference to metal typography.
ANTICLOCKWISE OPEN CIRCLE ARROW=CLOCKWISE OPEN CIRCLE ARROW+SEMANTIC REVERSED ANTICLOCKWISE TOP SEMICIRCLE ARROW=CLOCKWISE TOP SEMICIRCLE ARROW+SEMANTIC REVERSED BLACK LEFT POINTING INDEX=BLACK RIGHT POINTING INDEX+SEMANTIC REVERSED BLACK UPPER LEFT TRIANGLE=BLACK UPPER RIGHT TRIANGLE+SEMANTIC REVERSED GRAVE ACCENT=ACUTE ACCENT+SEMANTIC REVERSED HANGUL CHOSEONG CEONGCHIEUMCHIEUCH=HANGUL CHOSEONG CHITUEUMCHIEUCH+SEMANTIC REVERSED HANGUL CHOSEONG CEONGCHIEUMCIEUC=HANGUL CHOSEONG CHITUEUMCIEUC+SEMANTIC REVERSED HANGUL CHOSEONG CEONGCHIEUMSIOS=HANGUL CHOSEONG CHITUEUMSIOS+SEMANTIC REVERSED LATIN CAPITAL LETTER D WITH TOPBAR=LATIN CAPITAL LETTER B WITH TOPBAR+SEMANTIC REVERSED LATIN CAPITAL LETTER EZH REVERSED=LATIN CAPITAL LETTER EZH+SEMANTIC REVERSED LATIN LETTER PHARYNGEAL VOICED FRICATIVE=LATIN LETTER GLOTTAL STOP+SEMANTIC REVERSED LATIN LETTER REVERSED GLOTTAL STOP WITH STROKE=LATIN LETTER GLOTTAL STOP WITH STROKE+SEMANTIC REVERSED LATIN SMALL LETTER CLOSED REVERSED OPEN E=LATIN SMALL LETTER CLOSED OPEN E+SEMANTIC REVERSED LATIN SMALL LETTER D WITH TOPBAR=LATIN SMALL LETTER B WITH TOPBAR+SEMANTIC REVERSED LATIN SMALL LETTER EZH REVERSED=LATIN SMALL LETTER EZH+SEMANTIC REVERSED LATIN SMALL LETTER REVERSED E=LATIN SMALL LETTER E+SEMANTIC REVERSED LATIN SMALL LETTER REVERSED OPEN E=LATIN SMALL LETTER OPEN E+SEMANTIC REVERSED LATIN SMALL LETTER REVERSED R WITH FISHHOOK=LATIN SMALL LETTER R WITH FISHHOOK+SEMANTIC REVERSED LEFTWARDS HARPOON WITH BARB UPWARDS=RIGHTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC REVERSED NORTH WEST ARROW=NORTH EAST ARROW+SEMANTIC REVERSED REVERSE SOLIDUS=SOLIDUS+SEMANTIC REVERSED REVERSED DOUBLE PRIME QUOTATION MARK=DOUBLE PRIME QUOTATION MARK+SEMANTIC REVERSED REVERSED DOUBLE PRIME=DOUBLE PRIME+SEMANTIC REVERSED REVERSED NOT SIGN=NOT SIGN+SEMANTIC REVERSED REVERSED PRIME=PRIME+SEMANTIC REVERSED REVERSED TILDE=TILDE OPERATOR+SEMANTIC REVERSED REVERSED TRIPLE PRIME=TRIPLE PRIME+SEMANTIC REVERSED SINGLE HIGH-REVERSED-9 QUOTATION MARK=RIGHT SINGLE 
QUOTATION MARK+SEMANTIC REVERSED SQUARE WITH UPPER LEFT DIAGONAL HALF BLACK=SQUARE WITH LOWER RIGHT DIAGONAL HALF BLACK+SEMANTIC REVERSED SQUARE WITH UPPER LEFT TO LOWER RIGHT FILL=SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL+SEMANTIC REVERSED TIBETAN LETTER DDA=TIBETAN LETTER DA+SEMANTIC REVERSED TIBETAN LETTER NNA=TIBETAN LETTER NA+SEMANTIC REVERSED TIBETAN LETTER SSA=TIBETAN LETTER SHA+SEMANTIC REVERSED TIBETAN LETTER TTA=TIBETAN LETTER TA+SEMANTIC REVERSED TIBETAN LETTER TTHA=TIBETAN LETTER THA+SEMANTIC REVERSED TIBETAN MARK ANG KHANG GYAS=TIBETAN MARK ANG KHANG GYON+SEMANTIC REVERSED TIBETAN MARK GUG RTAGS GYAS=TIBETAN MARK GUG RTAGS GYON+SEMANTIC REVERSED TIBETAN VOWEL SIGN REVERSED I=TIBETAN VOWEL SIGN I+SEMANTIC REVERSED TOP LEFT CORNER=TOP RIGHT CORNER+SEMANTIC REVERSED TOP LEFT CROP=TOP RIGHT CROP+SEMANTIC REVERSED UP-POINTING TRIANGLE WITH LEFT HALF BLACK=UP-POINTING TRIANGLE WITH RIGHT HALF BLACK+SEMANTIC REVERSED UPPER LEFT QUADRANT CIRCULAR ARC=UPPER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC REVERSED UPWARDS ARROW WITH TIP LEFTWARDS=UPWARDS ARROW WITH TIP RIGHTWARDS+SEMANTIC REVERSED UPWARDS HARPOON WITH BARB RIGHTWARDS=UPWARDS HARPOON WITH BARB LEFTWARDS+SEMANTIC REVERSED
For arrows, the one pointing to the right (mathematically positive) is the one we define as ``forwards´´, and its image is the ``reversed´´ glyph.
Combining characters cannot take direct advantage of the semantics, so there is no way to decompose a character like COMBINING REVERSED COMMA ABOVE by means of SEMANTIC REVERSED. (... Unless you wish to reverse the character, put an ordinary comma above on it, and turn it back: (*)COMBINING REVERSED COMMA ABOVE=SEMANTIC REVERSED+COMBINING COMMA ABOVE+SEMANTIC REVERSED? What sick mind would try to do such a thing??) But we can use a different composition to get the same effect.
It seems possible that there should be some relationship between SEMANTIC REVERSED and the ``symmetric swapping´´ that happens when text is flowing from right to left. It may be appropriate to decompose all ``right´´ characters as reversed ``left´´ characters, to make this explicit, as in RIGHT PARENTHESIS=LEFT PARENTHESIS+SEMANTIC REVERSED; etc. I´m not convinced this is so desirable, but if it were, we would get the following.
RIGHT ANGLE BRACKET=LEFT ANGLE BRACKET+SEMANTIC REVERSED
RIGHT BLACK LENTICULAR BRACKET=LEFT BLACK LENTICULAR BRACKET+SEMANTIC REVERSED
RIGHT CURLY BRACKET=LEFT CURLY BRACKET+SEMANTIC REVERSED
RIGHT FLOOR=LEFT FLOOR+SEMANTIC REVERSED
RIGHT PARENTHESIS=LEFT PARENTHESIS+SEMANTIC REVERSED
RIGHT SEMIDIRECT PRODUCT=LEFT SEMIDIRECT PRODUCT+SEMANTIC REVERSED
RIGHT SQUARE BRACKET=LEFT SQUARE BRACKET+SEMANTIC REVERSED
RIGHT TORTOISE SHELL BRACKET=LEFT TORTOISE SHELL BRACKET+SEMANTIC REVERSED
RIGHT-POINTING ANGLE BRACKET=LEFT-POINTING ANGLE BRACKET+SEMANTIC REVERSED
SINGLE RIGHT-POINTING ANGLE QUOTATION MARK=SINGLE LEFT-POINTING ANGLE QUOTATION MARK+SEMANTIC REVERSED
In order to implement symmetric swapping, a renderer is in any case going to need some extra reversed glyphs which are not in the U C S at all. This is no trouble for a renderer that implements the Atomic Theory, because it can do the REVERSE operation; but others may be surprised to find that they need such characters as ANGLE+SEMANTIC REVERSED, INTEGRAL+SEMANTIC REVERSED, PROPORTIONAL TO+SEMANTIC REVERSED and many more.
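As a rough check on which characters are affected, Python´s standard `unicodedata` module exposes the ``mirrored´´ property that drives symmetric swapping. The sketch below only reads that property from whatever version of the character database ships with the interpreter; it draws nothing, and the helper name `needs_reversed_glyph` is made up for illustration.

```python
import unicodedata

def needs_reversed_glyph(name):
    """True if the named character is flagged as mirrored in right-to-left text."""
    return bool(unicodedata.mirrored(unicodedata.lookup(name)))

# A bracket, the three symbols named in the text, and an ordinary letter.
for name in ("LEFT PARENTHESIS", "ANGLE", "INTEGRAL", "PROPORTIONAL TO",
             "LATIN SMALL LETTER A"):
    print(name, needs_reversed_glyph(name))
```

A renderer without a REVERSED operation would need a hand-drawn mirror glyph for every flagged character that has no encoded partner.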
Reversing is very easy to do in software, and the consequences of ignoring it are likely to be severe if arrows are important. (This only affects people who try to make up new symbols, as existing characters are already encoded and should be well understood.)
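To illustrate how little is involved, here is a minimal sketch of the operation on a toy bitmap glyph. The representation (a list of strings, one per pixel row) and the name `reverse_glyph` are assumptions made for the example; a real renderer would work on outlines, but the idea is the same.

```python
def reverse_glyph(bitmap):
    """Mirror a glyph left-to-right; the bounding box does not change."""
    return [row[::-1] for row in bitmap]

# A crude rightwards arrow, '#' for ink and '.' for paper.
RIGHT_ARROW = [
    "...#.",
    "#####",
    "...#.",
]

for row in reverse_glyph(RIGHT_ARROW):
    print(row)
```

Applying the operation twice gives back the original glyph, and the width and height are untouched, which is why REVERSED has no effect on line length.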
This is a rotation of a quarter-turn clockwise (mathematically -90°), staying in the plane of the paper. Typographically, it is unusual to use rotated characters, because traditional type is designed to fit in a constant height, but with varying widths. (A rotated character would just fall out of the stick too easily.) There are only 2 characters in the U C S that are described as rotated by name, and they are rotated the other way.
However, lots of arrows could be described as rotated versions of other arrows, e g
BLACK RIGHT-POINTING TRIANGLE=BLACK UP-POINTING TRIANGLE+SEMANTIC ROTATED
DOWNWARDS ARROW WITH CORNER LEFTWARDS=RIGHTWARDS ARROW WITH CORNER DOWNWARDS+SEMANTIC ROTATED
LEFT RIGHT ARROW=UP DOWN ARROW+SEMANTIC ROTATED
RIGHT TACK=UP TACK+SEMANTIC ROTATED
RIGHTWARDS ARROW FROM BAR=UPWARDS ARROW FROM BAR+SEMANTIC ROTATED
RIGHTWARDS ARROW=UPWARDS ARROW+SEMANTIC ROTATED
RIGHTWARDS DASHED ARROW=UPWARDS DASHED ARROW+SEMANTIC ROTATED
RIGHTWARDS HARPOON WITH BARB UPWARDS=UPWARDS HARPOON WITH BARB LEFTWARDS+SEMANTIC ROTATED
RIGHTWARDS PAIRED ARROWS=UPWARDS PAIRED ARROWS+SEMANTIC ROTATED
RIGHTWARDS TWO HEADED ARROW=UPWARDS TWO HEADED ARROW+SEMANTIC ROTATED
But the main reason for the existence of this character is for decompositions involving the <vertical> tag (21 of them). The decomposition given for PRESENTATION FORM FOR VERTICAL WAVY LOW LINE loses information, so we replace it with
PRESENTATION FORM FOR VERTICAL WAVY LOW LINE=<vertical>+WAVY LOW LINE
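The loss of information can be seen in the decomposition data that ships with Python´s standard `unicodedata` module. This is only a sketch for inspection, and the module reflects a later version of the character database than the one discussed here, though this particular entry is assumed to be unchanged.

```python
import unicodedata

ch = unicodedata.lookup("PRESENTATION FORM FOR VERTICAL WAVY LOW LINE")

# The decomposition field is a tag followed by hexadecimal code points.
tag, *targets = unicodedata.decomposition(ch).split()
names = [unicodedata.name(chr(int(cp, 16))) for cp in targets]

print(tag, names)
```

The tag is reported as `<vertical>`, but the target is plain LOW LINE: nothing in the data says that the line was wavy.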
Also, a few other characters
BLACK VERTICAL RECTANGLE=BLACK RECTANGLE+SEMANTIC ROTATED
CIRCLE WITH RIGHT HALF BLACK=CIRCLE WITH UPPER HALF BLACK+SEMANTIC ROTATED
LEFT FIVE EIGHTHS BLOCK=LOWER FIVE EIGHTHS BLOCK+SEMANTIC ROTATED
LEFT HALF BLOCK=LOWER HALF BLOCK+SEMANTIC ROTATED
LEFT ONE EIGHTH BLOCK=LOWER ONE EIGHTH BLOCK+SEMANTIC ROTATED
LEFT ONE QUARTER BLOCK=LOWER ONE QUARTER BLOCK+SEMANTIC ROTATED
LEFT SEVEN EIGHTHS BLOCK=LOWER SEVEN EIGHTHS BLOCK+SEMANTIC ROTATED
LEFT THREE EIGHTHS BLOCK=LOWER THREE EIGHTHS BLOCK+SEMANTIC ROTATED
LEFT THREE QUARTERS BLOCK=LOWER THREE QUARTERS BLOCK+SEMANTIC ROTATED
ROTATED FLORAL HEART BULLET=FLORAL HEART+SEMANTIC ROTATED+SEMANTIC ROTATED+SEMANTIC ROTATED
ROTATED HEAVY BLACK HEART BULLET=HEAVY BLACK HEART+SEMANTIC ROTATED+SEMANTIC ROTATED+SEMANTIC ROTATED
SQUARE WITH VERTICAL FILL=SQUARE WITH HORIZONTAL FILL+SEMANTIC ROTATED
UP RIGHT DIAGONAL ELLIPSIS=DOWN RIGHT DIAGONAL ELLIPSIS+SEMANTIC ROTATED
UPPER HALF BLOCK=LEFT HALF BLOCK+SEMANTIC ROTATED
UPPER ONE EIGHTH BLOCK=LEFT ONE EIGHTH BLOCK+SEMANTIC ROTATED
VERTICAL ELLIPSIS=HORIZONTAL ELLIPSIS+SEMANTIC ROTATED
WAVY LINE=WAVY DASH+SEMANTIC ROTATED
WREATH PRODUCT=TILDE OPERATOR+SEMANTIC ROTATED
Rotation can be done algorithmically, but is harder than INVERTED, REVERSED or TURNED because the resulting character has a different bounding box. This means it is not just a question of moving the ink around, but has wider implications for line-length etc. (This is related to the typographic point.)
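The change of bounding box is easy to see on a toy bitmap glyph. As before, the list-of-strings representation and the name `rotate_glyph` are assumptions made for this sketch, not part of any real renderer.

```python
def rotate_glyph(bitmap):
    """Rotate a glyph a quarter-turn clockwise.

    Each column of the original, read from bottom to top, becomes a row.
    """
    return ["".join(column) for column in zip(*bitmap[::-1])]

# A crude upwards arrow, 3 pixels wide and 5 pixels tall.
UP_ARROW = [
    ".#.",
    "###",
    ".#.",
    ".#.",
    ".#.",
]

rotated = rotate_glyph(UP_ARROW)
for row in rotated:
    print(row)
print("before:", len(UP_ARROW[0]), "x", len(UP_ARROW))
print("after: ", len(rotated[0]), "x", len(rotated))
```

The result is a rightwards arrow whose width and height have changed places, so the advance width of the character changes too. Three applications give the anticlockwise quarter-turn used for the two ROTATED bullets, and four give back the original.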
If widely deployed, could be a very useful source of new symbols in many different disciplines.
This requests a sans-serif font to be used. Since there is no requirement in Unicode for a font to have serifs in the first place, this could easily be a null operation. However, the concept is in the U C S in the names of the dingbats DINGBAT CIRCLED SANS-SERIF DIGIT ONE--NUMBER TEN and DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE--NUMBER TEN. Even with that, it seems unlikely that anyone would support it as a character. If Dingbats are to be allowed decompositions (and there are good reasons to do so), maybe the sans-serif numbers could be decomposed using SEMANTIC VARIANT, together with SEMANTIC WHITE and COMBINING ENCLOSING CIRCLE.
DINGBAT CIRCLED SANS-SERIF DIGIT EIGHT=DIGIT EIGHT+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT FIVE=DIGIT FIVE+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT FOUR=DIGIT FOUR+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT NINE=DIGIT NINE+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT ONE=DIGIT ONE+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT SEVEN=DIGIT SEVEN+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT SIX=DIGIT SIX+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT THREE=DIGIT THREE+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF DIGIT TWO=DIGIT TWO+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT CIRCLED SANS-SERIF NUMBER TEN=START GROUP+DIGIT ONE+DIGIT ZERO+POP DIRECTIONAL FORMATTING+SEMANTIC VARIANT+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT EIGHT=DIGIT EIGHT+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT FIVE=DIGIT FIVE+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT FOUR=DIGIT FOUR+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT NINE=DIGIT NINE+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE=DIGIT ONE+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT SEVEN=DIGIT SEVEN+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT SIX=DIGIT SIX+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT THREE=DIGIT THREE+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT TWO=DIGIT TWO+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN=START GROUP+DIGIT ONE+DIGIT ZERO+POP DIRECTIONAL FORMATTING+SEMANTIC VARIANT+SEMANTIC WHITE+COMBINING ENCLOSING CIRCLE
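These twenty decompositions follow one pattern, which is the sort of regularity the Atomic Theory is meant to expose. The sketch below regenerates them from that pattern; the function name `dingbat_rules` and the dictionary layout are assumptions for illustration, and SEMANTIC VARIANT, SEMANTIC WHITE and START GROUP are of course the proposed characters, not encoded ones.

```python
DIGITS = ["ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]
GROUPED_TEN = ["START GROUP", "DIGIT ONE", "DIGIT ZERO", "POP DIRECTIONAL FORMATTING"]

def dingbat_rules():
    """Map each sans-serif dingbat name to its list of components."""
    rules = {}
    for negative in (False, True):
        prefix = "DINGBAT NEGATIVE CIRCLED" if negative else "DINGBAT CIRCLED"
        # The negative forms differ only by the extra SEMANTIC WHITE.
        tail = ["SEMANTIC VARIANT"]
        if negative:
            tail.append("SEMANTIC WHITE")
        tail.append("COMBINING ENCLOSING CIRCLE")
        for digit in DIGITS:
            rules[f"{prefix} SANS-SERIF DIGIT {digit}"] = [f"DIGIT {digit}"] + tail
        rules[f"{prefix} SANS-SERIF NUMBER TEN"] = GROUPED_TEN + tail
    return rules

for name, parts in sorted(dingbat_rules().items()):
    print(name + "=" + "+".join(parts))
```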
Requests a script font be used. Many script characters are already present:
LATIN CAPITAL LETTER V WITH HOOK=LATIN CAPITAL LETTER V+SEMANTIC SCRIPT
LATIN SMALL LETTER ALPHA=LATIN SMALL LETTER A+SEMANTIC SCRIPT
LATIN SMALL LETTER SCRIPT G=LATIN SMALL LETTER G+SEMANTIC SCRIPT
LATIN SMALL LETTER V WITH HOOK=LATIN SMALL LETTER V+SEMANTIC SCRIPT
SCRIPT CAPITAL B=LATIN CAPITAL LETTER B+SEMANTIC SCRIPT
SCRIPT CAPITAL E=LATIN CAPITAL LETTER E+SEMANTIC SCRIPT
SCRIPT CAPITAL F=LATIN CAPITAL LETTER F+SEMANTIC SCRIPT
SCRIPT CAPITAL H=LATIN CAPITAL LETTER H+SEMANTIC SCRIPT
SCRIPT CAPITAL I=LATIN CAPITAL LETTER I+SEMANTIC SCRIPT
SCRIPT CAPITAL L=LATIN CAPITAL LETTER L+SEMANTIC SCRIPT
SCRIPT CAPITAL M=LATIN CAPITAL LETTER M+SEMANTIC SCRIPT
SCRIPT CAPITAL P=LATIN CAPITAL LETTER P+SEMANTIC SCRIPT
SCRIPT CAPITAL R=LATIN CAPITAL LETTER R+SEMANTIC SCRIPT
SCRIPT SMALL E=LATIN SMALL LETTER E+SEMANTIC SCRIPT
SCRIPT SMALL L=LATIN SMALL LETTER L+SEMANTIC SCRIPT
SCRIPT SMALL O=LATIN SMALL LETTER O+SEMANTIC SCRIPT
(The v´s ``with hook´´ are really script letters, but there´s more on hooks later.)
Requests that a plinth-like shadow be drawn, with the glyph as the top surface. Conventionally, the light source is above and slightly to the left of the observer. (This can be changed by using TURNED, INVERTED or REVERSED.)
Could be used for
HEAVY LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW=HEAVY RIGHTWARDS ARROW+SEMANTIC WHITE+SEMANTIC SHADOWED
SHADOWED WHITE CIRCLE=WHITE CIRCLE+SEMANTIC SHADOWED
SHADOWED WHITE LATIN CROSS=LATIN CROSS+SEMANTIC WHITE+SEMANTIC SHADOWED
SHADOWED WHITE STAR=WHITE STAR+SEMANTIC SHADOWED
LOWER RIGHT SHADOWED WHITE SQUARE=WHITE SQUARE+SEMANTIC SHADOWED
and there are some shadowed characters that have no base form: BACKTILTED SHADOWED WHITE RIGHTWARDS ARROW, FRONT-TILTED SHADOWED WHITE RIGHTWARDS ARROW, NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW, NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW.
Hard to do well in software, but a missing shadow is not going to make the difference between comprehension and confusion, so the decompositions would be useful.
This SMALL just asks for a smaller version of the same character. It should not be confused with the SMALL in LATIN SMALL LETTER A, which means lower-case. It is also used, in conjunction with SEMANTIC FULLWIDTH, for decompositions including the tag <small>.
It is used in
BLACK DOWN-POINTING SMALL TRIANGLE=BLACK DOWN-POINTING TRIANGLE+SEMANTIC SMALL
BLACK LEFT-POINTING SMALL TRIANGLE=BLACK LEFT-POINTING TRIANGLE+SEMANTIC SMALL
BLACK RIGHT-POINTING SMALL TRIANGLE=BLACK RIGHT-POINTING TRIANGLE+SEMANTIC SMALL
BLACK SMALL SQUARE=BLACK SQUARE+SEMANTIC SMALL
BLACK UP-POINTING SMALL TRIANGLE=BLACK UP-POINTING TRIANGLE+SEMANTIC SMALL
HIRAGANA LETTER SMALL A=HIRAGANA LETTER A+SEMANTIC SMALL
HIRAGANA LETTER SMALL E=HIRAGANA LETTER E+SEMANTIC SMALL
HIRAGANA LETTER SMALL I=HIRAGANA LETTER I+SEMANTIC SMALL
HIRAGANA LETTER SMALL O=HIRAGANA LETTER O+SEMANTIC SMALL
HIRAGANA LETTER SMALL TU=HIRAGANA LETTER TU+SEMANTIC SMALL
HIRAGANA LETTER SMALL U=HIRAGANA LETTER U+SEMANTIC SMALL
HIRAGANA LETTER SMALL WA=HIRAGANA LETTER WA+SEMANTIC SMALL
HIRAGANA LETTER SMALL YA=HIRAGANA LETTER YA+SEMANTIC SMALL
HIRAGANA LETTER SMALL YO=HIRAGANA LETTER YO+SEMANTIC SMALL
HIRAGANA LETTER SMALL YU=HIRAGANA LETTER YU+SEMANTIC SMALL
KATAKANA LETTER SMALL A=KATAKANA LETTER A+SEMANTIC SMALL
KATAKANA LETTER SMALL E=KATAKANA LETTER E+SEMANTIC SMALL
KATAKANA LETTER SMALL I=KATAKANA LETTER I+SEMANTIC SMALL
KATAKANA LETTER SMALL KA=KATAKANA LETTER KA+SEMANTIC SMALL
KATAKANA LETTER SMALL KE=KATAKANA LETTER KE+SEMANTIC SMALL
KATAKANA LETTER SMALL O=KATAKANA LETTER O+SEMANTIC SMALL
KATAKANA LETTER SMALL TU=KATAKANA LETTER TU+SEMANTIC SMALL
KATAKANA LETTER SMALL U=KATAKANA LETTER U+SEMANTIC SMALL
KATAKANA LETTER SMALL WA=KATAKANA LETTER WA+SEMANTIC SMALL
KATAKANA LETTER SMALL YA=KATAKANA LETTER YA+SEMANTIC SMALL
KATAKANA LETTER SMALL YO=KATAKANA LETTER YO+SEMANTIC SMALL
KATAKANA LETTER SMALL YU=KATAKANA LETTER YU+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL B=LATIN CAPITAL LETTER B+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL G WITH HOOK=LATIN CAPITAL LETTER G WITH HOOK+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL G=LATIN CAPITAL LETTER G+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL H=LATIN CAPITAL LETTER H+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL I=LATIN CAPITAL LETTER I+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL L=LATIN CAPITAL LETTER L+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL N=LATIN CAPITAL LETTER N+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL OE=LATIN CAPITAL LIGATURE OE+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL R=LATIN CAPITAL LETTER R+SEMANTIC SMALL
LATIN LETTER SMALL CAPITAL Y=LATIN CAPITAL LETTER Y+SEMANTIC SMALL
LATIN SMALL LETTER KRA=LATIN CAPITAL LETTER K+SEMANTIC SMALL
MODIFIER LETTER DOWN TACK=DOWN TACK+SEMANTIC SMALL
MODIFIER LETTER MINUS SIGN=MINUS SIGN+SEMANTIC SMALL
MODIFIER LETTER PLUS SIGN=PLUS SIGN+SEMANTIC SMALL
MODIFIER LETTER UP TACK=UP TACK+SEMANTIC SMALL
SMALL ELEMENT OF and SMALL CONTAINS AS MEMBER are really presentation variants of GREEK SMALL LETTER EPSILON, treated below.
Although it seems an easy thing to make a character smaller by algorithm, a simple reduction of scale may not be appropriate: stroke widths should typically remain harmonious with the rest of the font.
Some of these characters are used in ways that may not be clear from their representation: for example, LATIN LETTER SMALL CAPITAL R (= LATIN CAPITAL LETTER R + SEMANTIC SMALL) is the lower case form of LATIN LETTER YR, and LATIN SMALL LETTER KRA (= LATIN CAPITAL LETTER K + SEMANTIC SMALL) is also a lower case letter, though its upper case form is not coded.
Applied to a digit, this suggests that a variant glyph be used, in a style suitable for marking Zhuang tone. It is used for the following:
LATIN SMALL LETTER TONE FIVE=DIGIT FIVE+SEMANTIC SMALL LETTER TONE
LATIN SMALL LETTER TONE SIX=DIGIT SIX+SEMANTIC SMALL LETTER TONE
LATIN SMALL LETTER TONE TWO=DIGIT TWO+SEMANTIC SMALL LETTER TONE
There is a relationship between CYRILLIC SMALL LETTER CHE and DIGIT FOUR+SEMANTIC SMALL LETTER TONE, and also between CYRILLIC SMALL LETTER ZE and DIGIT THREE+SEMANTIC SMALL LETTER TONE, in that they are likely to be the same glyph; but it would be odd to give decompositions like (*)CYRILLIC SMALL LETTER CHE=DIGIT FOUR+SEMANTIC SMALL LETTER TONE, as this would imply the wrong historical relationship. Instead, uses of CYRILLIC SMALL LETTER CHE as a tone mark should simply be replaced by DIGIT FOUR+SEMANTIC SMALL LETTER TONE.
By encoding this character, it becomes possible for sophisticated software to render suitable glyphs for all the tone letters, without needing separate encodings for tones 3 and 4.
Requests that characters be stacked in the given direction. The first character in the stack is placed at its normal position. The second is moved left, right, down or up to appear before, after, below or above the first. (`After´ and `before´ here refer to the current writing direction.)
This is another idea, like OVERPRINT, that could cause a lot of problems, as it is not obvious where a sensible place to stop might be.
Is an underlined character a character formed from COMBINING LOW LINE, or is it a down-stack with MINUS SIGN?
Can you make accented characters by stacking spacing accents above letters?
Is a LESS-THAN OR EQUAL TO sign a stack of a LESS-THAN SIGN and a MINUS SIGN? Although it looks like it in many fonts (including the one used in The Unicode Standard), we would really prefer it to be something to do with LESS-THAN SIGN and EQUALS SIGN, because that reflects the real meaning. (You might only get to see `<=´, which would still be very helpful.) As it´s also very often given its own glyph, e g with the underline parallel to the bottom part of the LESS-THAN SIGN, we prefer to regard this and similar characters as a ligature.
Despite these problems, the idea of a stack seems necessary. Consider the character EQUAL TO BY DEFINITION. This character is an equals sign with the small word `def´ on top of it. It seems ridiculous that this should be an atomic character, when the reason for its existence is the fact that d, e, f are the first 3 letters of the English word `definition´. Whichever mathematician invented that symbol was clearly ``sticking things together´´, and not just coming up with an arbitrary symbol from nowhere. Another mathematician might do a similar thing tomorrow, and it seems wrong that (in a perfectly logical world) that mathematician would have to get ``approval´´ from the Unicode Consortium (in the form of a character registration) before it could publish its book.
We need 4 different stacking characters to ensure visual harmony between the different presentation forms that can be generated. (Recall that the first character is rendered at its normal place, and the stack is built around it.) All are binary.
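The placement rule can be sketched in a few lines, working only with bounding boxes. Everything here is an assumption made for illustration: the `Box` type, the choice to centre the second glyph horizontally for ABOVE and BELOW, and the `rtl` flag standing in for ``the current writing direction´´.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x: float  # left edge
    y: float  # bottom edge; y grows upwards
    w: float  # width
    h: float  # height

def stack(first, second, direction, rtl=False):
    """Return the box of `second` once stacked; `first` never moves."""
    centred_x = first.x + (first.w - second.w) / 2
    if direction == "SEMANTIC ABOVE":
        return Box(centred_x, first.y + first.h, second.w, second.h)
    if direction == "SEMANTIC BELOW":
        return Box(centred_x, first.y - second.h, second.w, second.h)
    if direction not in ("SEMANTIC AFTER", "SEMANTIC BEFORE"):
        raise ValueError(direction)
    # AFTER follows the writing direction, BEFORE goes against it.
    to_the_right = (direction == "SEMANTIC AFTER") != rtl
    x = first.x + first.w if to_the_right else first.x - second.w
    return Box(x, second.y, second.w, second.h)

equals, dot = Box(0, 0, 10, 4), Box(0, 0, 2, 2)
print(stack(equals, dot, "SEMANTIC ABOVE"))
print(stack(equals, dot, "SEMANTIC AFTER", rtl=True))
```

Because the first box is fixed, the same base character sits at the same place in every stack built on it, which is where the visual harmony comes from.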
The SEMANTIC ABOVE concept has an antecedent in
T
The following are compositions using SEMANTIC ABOVE. In some cases (e g, MEASURED BY), there is a SEMANTIC SMALL for the second character.
ALL EQUAL TO=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+REVERSED TILDE+SEMANTIC VARIANT+POP DIRECTIONAL FORMATTING
ALMOST EQUAL TO=TILDE OPERATOR+SEMANTIC ABOVE+TILDE OPERATOR
APPROACHES THE LIMIT=EQUALS SIGN+SEMANTIC ABOVE+DOT OPERATOR
APPROXIMATELY BUT NOT ACTUALLY EQUAL TO=NOT EQUAL TO+SEMANTIC ABOVE+TILDE OPERATOR
APPROXIMATELY EQUAL TO=EQUALS SIGN+SEMANTIC ABOVE+TILDE OPERATOR
ASYMPTOTICALLY EQUAL TO=MINUS SIGN+SEMANTIC ABOVE+TILDE OPERATOR
CORRESPONDS TO=EQUALS SIGN+SEMANTIC ABOVE+FROWN
DELTA EQUAL TO=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+INCREMENT+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
DOT MINUS=MINUS SIGN+SEMANTIC ABOVE+DOT OPERATOR
DOT PLUS=PLUS SIGN+SEMANTIC ABOVE+DOT OPERATOR
DOUBLE INTERSECTION=INTERSECTION+SEMANTIC ABOVE+INTERSECTION
DOUBLE UNION=UNION+SEMANTIC ABOVE+UNION
EQUIANGULAR TO=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+LOGICAL OR+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
EQUIVALENT TO=FROWN+SEMANTIC ABOVE+SMILE
ESTIMATES=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+LOGICAL AND+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
MEASURED BY=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+LATIN SMALL LETTER M+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
MINUS-OR-PLUS SIGN=PLUS SIGN+SEMANTIC ABOVE+MINUS SIGN
NAND=LOGICAL AND+SEMANTIC ABOVE+LOW LINE
NOR=LOGICAL OR+SEMANTIC ABOVE+LOW LINE
NORTH WEST ARROW TO LONG BAR=NORTH WEST ARROW+COMBINING OVERLINE
PERSPECTIVE=UP ARROWHEAD+SEMANTIC ABOVE+LOW LINE+SEMANTIC ABOVE+LOW LINE
PROJECTIVE=UP ARROWHEAD+SEMANTIC ABOVE+LOW LINE
QUESTIONED EQUAL TO=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+QUESTION MARK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
REVERSED TILDE EQUALS=MINUS SIGN+SEMANTIC ABOVE+REVERSED TILDE
RING EQUAL TO=EQUALS SIGN+SEMANTIC ABOVE+RING OPERATOR
STAR EQUALS=EQUALS SIGN+SEMANTIC ABOVE+STAR OPERATOR
TRIPLE TILDE=TILDE OPERATOR+SEMANTIC ABOVE+TILDE OPERATOR+SEMANTIC ABOVE+TILDE OPERATOR
The following use SEMANTIC BELOW:
DOUBLE LOW LINE=LOW LINE+COMBINING LOW LINE
DOUBLE WAVY OVERLINE=WAVY OVERLINE+SEMANTIC ABOVE+WAVY OVERLINE
GREATER-THAN BUT NOT EQUAL TO=GREATER-THAN SIGN+SEMANTIC BELOW+NOT EQUAL TO
GREATER-THAN OVER EQUAL TO=GREATER-THAN SIGN+SEMANTIC BELOW+EQUALS SIGN
LEFTWARDS ARROW OVER RIGHTWARDS ARROW=LEFTWARDS ARROW+SEMANTIC BELOW+RIGHTWARDS ARROW
LEFTWARDS ARROW TO BAR OVER RIGHTWARDS ARROW TO BAR=LEFTWARDS ARROW TO BAR+SEMANTIC BELOW+RIGHTWARDS ARROW TO BAR
LEFTWARDS HARPOON OVER RIGHTWARDS HARPOON=LEFTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC BELOW+RIGHTWARDS HARPOON WITH BARB DOWNWARDS
LESS-THAN BUT NOT EQUAL TO=LESS-THAN SIGN+SEMANTIC BELOW+NOT EQUAL TO
LESS-THAN OVER EQUAL TO=LESS-THAN SIGN+SEMANTIC BELOW+EQUALS SIGN
MINUS TILDE=MINUS SIGN+SEMANTIC BELOW+TILDE OPERATOR
PLUS-MINUS SIGN=PLUS SIGN+SEMANTIC BELOW+MINUS SIGN
RIGHTWARDS ARROW OVER LEFTWARDS ARROW=RIGHTWARDS ARROW+SEMANTIC BELOW+LEFTWARDS ARROW
RIGHTWARDS HARPOON OVER LEFTWARDS HARPOON=RIGHTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC BELOW+LEFTWARDS HARPOON WITH BARB DOWNWARDS
UP DOWN ARROW WITH BASE=UP DOWN ARROW+COMBINING LOW LINE
UPWARDS ARROW FROM BAR=UPWARDS ARROW+COMBINING LOW LINE
XOR=LOGICAL OR+COMBINING LOW LINE
Some characters need both:
DIVISION SIGN=MINUS SIGN+SEMANTIC ABOVE+DOT OPERATOR+SEMANTIC BELOW+DOT OPERATOR
GEOMETRICALLY EQUAL TO=EQUALS SIGN+SEMANTIC ABOVE+DOT OPERATOR+SEMANTIC BELOW+DOT OPERATOR
(This assumes that SEMANTIC characters, if viewed as operators, are of equal precedence and associate to the left.) And of course there is the character that caused all these problems:
EQUAL TO BY DEFINITION=EQUALS SIGN+SEMANTIC ABOVE+START GROUP+START GROUP+LATIN SMALL LETTER D+LATIN SMALL LETTER E+LATIN SMALL LETTER F+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
(it seems horrible, but I see no real alternative). An unsophisticated rendering engine will be able to make a shot at this as `=def´, which seems like as good a result as one might hope for.
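Both points (left association with grouping, and the plain-text fallback) can be made concrete with a small sketch. The tree shape, the function names `parse` and `fallback`, and the use of a "SEQUENCE" node for characters simply written one after another inside a group are all assumptions of this example. Component names that are real characters are resolved with Python´s standard `unicodedata` module; the proposed SEMANTIC and grouping characters are recognised by name only.

```python
import unicodedata

OPEN, CLOSE = "START GROUP", "POP DIRECTIONAL FORMATTING"
BINARY = {"SEMANTIC ABOVE", "SEMANTIC BELOW", "SEMANTIC AFTER", "SEMANTIC BEFORE"}

def _operand(tokens, pos):
    """Read one character or one bracketed group starting at pos."""
    if tokens[pos] == OPEN:
        tree, pos = _sequence(tokens, pos + 1)
        return tree, pos + 1  # step over the closing token
    return tokens[pos], pos + 1

def _sequence(tokens, pos):
    """Fold operators onto the tree built so far, i.e. associate to the left."""
    tree, pos = _operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] != CLOSE:
        tok = tokens[pos]
        if tok in BINARY:
            right, pos = _operand(tokens, pos + 1)
            tree = (tok, tree, right)
        elif tok.startswith("SEMANTIC "):
            tree, pos = (tok, tree), pos + 1  # unary modifier
        else:
            right, pos = _operand(tokens, pos)
            tree = ("SEQUENCE", tree, right)
    return tree, pos

def parse(tokens):
    return _sequence(tokens, 0)[0]

def fallback(tokens):
    """What an unsophisticated renderer might show: the real characters only."""
    return "".join(unicodedata.lookup(t) for t in tokens
                   if t not in (OPEN, CLOSE) and not t.startswith("SEMANTIC "))

DIVISION_SIGN = "MINUS SIGN+SEMANTIC ABOVE+DOT OPERATOR+SEMANTIC BELOW+DOT OPERATOR"
EQUAL_TO_BY_DEFINITION = (
    "EQUALS SIGN+SEMANTIC ABOVE+START GROUP+START GROUP+LATIN SMALL LETTER D"
    "+LATIN SMALL LETTER E+LATIN SMALL LETTER F+POP DIRECTIONAL FORMATTING"
    "+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING"
)

print(parse(DIVISION_SIGN.split("+")))
print(fallback(EQUAL_TO_BY_DEFINITION.split("+")))
```

For DIVISION SIGN the BELOW operator ends up applied to the whole ``minus with dot above´´ stack, as the left-association rule requires.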
Many ``double´´ or ``triple´´ characters are made with AFTER:
ASTERISM=ASTERISK OPERATOR+SEMANTIC SMALL+SEMANTIC AFTER+START GROUP+ASTERISK OPERATOR+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING+SEMANTIC ABOVE+START GROUP+ASTERISK OPERATOR+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
DOUBLE EXCLAMATION MARK=EXCLAMATION MARK+SEMANTIC AFTER+EXCLAMATION MARK
DOUBLE HIGH-REVERSED-9 QUOTATION MARK=SINGLE HIGH-REVERSED-9 QUOTATION MARK+SEMANTIC AFTER+SINGLE HIGH-REVERSED-9 QUOTATION MARK
DOUBLE LOW-9 QUOTATION MARK=SINGLE LOW-9 QUOTATION MARK+SEMANTIC AFTER+SINGLE LOW-9 QUOTATION MARK
DOUBLE PRIME=PRIME+SEMANTIC AFTER+PRIME
DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE=VERTICAL LINE+SEMANTIC AFTER+TRUE
DOUBLE VERTICAL LINE=VERTICAL LINE+SEMANTIC AFTER+VERTICAL LINE
DOWNWARDS PAIRED ARROWS=DOWNWARDS ARROW+SEMANTIC AFTER+DOWNWARDS ARROW
EQUALS COLON=EQUALS SIGN+SEMANTIC AFTER+COLON
EXCESS=MINUS SIGN+SEMANTIC AFTER+COLON
FORCES=VERTICAL LINE+SEMANTIC AFTER+ASSERTION
IDEOGRAPHIC TELEGRAPH LINE FEED SEPARATOR SYMBOL=SALTIRE+SEMANTIC AFTER+SALTIRE
LATIN LETTER LATERAL CLICK=LATIN LETTER DENTAL CLICK+SEMANTIC AFTER+LATIN LETTER DENTAL CLICK
LEFT DOUBLE ANGLE BRACKET=LEFT ANGLE BRACKET+SEMANTIC AFTER+LEFT ANGLE BRACKET
LEFT DOUBLE QUOTATION MARK=LEFT SINGLE QUOTATION MARK+SEMANTIC AFTER+LEFT SINGLE QUOTATION MARK
LEFT-POINTING DOUBLE ANGLE QUOTATION MARK=SINGLE LEFT-POINTING ANGLE QUOTATION MARK+SEMANTIC AFTER+SINGLE LEFT-POINTING ANGLE QUOTATION MARK
LOW DOUBLE PRIME QUOTATION MARK=MODIFIER LETTER PRIME+SEMANTIC SUBSCRIPT+SEMANTIC AFTER+MODIFIER LETTER PRIME+SEMANTIC SUBSCRIPT
PARALLEL TO=DIVIDES+SEMANTIC AFTER+DIVIDES
PROPORTION=RATIO+SEMANTIC AFTER+RATIO
QUOTATION MARK=APOSTROPHE+SEMANTIC AFTER+APOSTROPHE
RIGHT DOUBLE ANGLE BRACKET=RIGHT ANGLE BRACKET+SEMANTIC AFTER+RIGHT ANGLE BRACKET
RIGHT DOUBLE QUOTATION MARK=RIGHT SINGLE QUOTATION MARK+SEMANTIC AFTER+RIGHT SINGLE QUOTATION MARK
RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK=SINGLE RIGHT-POINTING ANGLE QUOTATION MARK+SEMANTIC AFTER+SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
TIBETAN MARK NYIS SHAD=TIBETAN MARK SHAD+SEMANTIC AFTER+TIBETAN MARK SHAD
TRIPLE PRIME=PRIME+SEMANTIC AFTER+PRIME+SEMANTIC AFTER+PRIME
TRIPLE VERTICAL BAR RIGHT TURNSTILE=VERTICAL LINE+SEMANTIC AFTER+VERTICAL LINE+SEMANTIC AFTER+ASSERTION
UPWARDS ARROW LEFTWARDS OF DOWNWARDS ARROW=UPWARDS ARROW+SEMANTIC AFTER+DOWNWARDS ARROW
UPWARDS PAIRED ARROWS=UPWARDS ARROW+SEMANTIC AFTER+UPWARDS ARROW
Some characters would class as ligatures, except that there is no modification involved---one is just written straight after the other.
LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON=LATIN CAPITAL LETTER D+SEMANTIC AFTER+START GROUP+LATIN SMALL LETTER Z+COMBINING CARON+POP DIRECTIONAL FORMATTING
LATIN CAPITAL LETTER D WITH SMALL LETTER Z=LATIN CAPITAL LETTER D+SEMANTIC AFTER+LATIN SMALL LETTER Z
LATIN CAPITAL LETTER DZ WITH CARON=LATIN CAPITAL LETTER D+SEMANTIC AFTER+START GROUP+LATIN CAPITAL LETTER Z+COMBINING CARON+POP DIRECTIONAL FORMATTING
LATIN CAPITAL LETTER DZ=LATIN CAPITAL LETTER D+SEMANTIC AFTER+LATIN CAPITAL LETTER Z
LATIN CAPITAL LETTER L WITH SMALL LETTER J=LATIN CAPITAL LETTER L+SEMANTIC AFTER+LATIN SMALL LETTER J
LATIN CAPITAL LETTER LJ=LATIN CAPITAL LETTER L+SEMANTIC AFTER+LATIN CAPITAL LETTER J
LATIN CAPITAL LETTER N WITH SMALL LETTER J=LATIN CAPITAL LETTER N+SEMANTIC AFTER+LATIN SMALL LETTER J
LATIN CAPITAL LETTER NJ=LATIN CAPITAL LETTER N+SEMANTIC AFTER+LATIN CAPITAL LETTER J
LATIN SMALL LETTER DZ WITH CARON=LATIN SMALL LETTER D+SEMANTIC AFTER+START GROUP+LATIN SMALL LETTER Z+COMBINING CARON+POP DIRECTIONAL FORMATTING
LATIN SMALL LETTER DZ=LATIN SMALL LETTER D+SEMANTIC AFTER+LATIN SMALL LETTER Z
LATIN SMALL LETTER LJ=LATIN SMALL LETTER L+SEMANTIC AFTER+LATIN SMALL LETTER J
LATIN SMALL LETTER NJ=LATIN SMALL LETTER N+SEMANTIC AFTER+LATIN SMALL LETTER J
There are even a few characters which seem to use BEFORE in a natural way.
CUBE ROOT=SQUARE ROOT+SEMANTIC BEFORE+SUPERSCRIPT THREE
FOURTH ROOT=SQUARE ROOT+SEMANTIC BEFORE+SUPERSCRIPT FOUR
If widely deployed, the stack operations could cause no end of havoc by encouraging the creation of new ``symbols´´ in a very uncontrolled way. (On the other hand, maybe that´s a good thing.)
On the third hand, once we have accepted that this is a necessary operation, we could follow through and complete the job. If we consider SEMANTIC ABOVE to be a legitimate composition tool, we can see that a character like LATIN CAPITAL LETTER A WITH GRAVE is really just LATIN CAPITAL LETTER A+SEMANTIC ABOVE+GRAVE ACCENT. Since we already know that LATIN CAPITAL LETTER A WITH GRAVE=LATIN CAPITAL LETTER A+COMBINING GRAVE ACCENT, we are led to conclude that COMBINING GRAVE ACCENT=SEMANTIC ABOVE+GRAVE ACCENT. After a little consideration, this starts to seem a more natural view than the truth (which is that GRAVE ACCENT=SPACE+COMBINING GRAVE ACCENT), as it could allow us to decompose all combining characters into sequences involving ABOVE, BELOW and OVERPRINT. This approach allows a font designer to design just 1 ACUTE ACCENT glyph, which can then be used automatically in all the places where it is right to do so.
COMBINING ACUTE ACCENT BELOW=SEMANTIC BELOW+ACUTE ACCENT
COMBINING ACUTE ACCENT=SEMANTIC ABOVE+ACUTE ACCENT
COMBINING ACUTE TONE MARK=SEMANTIC ABOVE+ACUTE ACCENT
COMBINING ANTICLOCKWISE ARROW ABOVE=SEMANTIC ABOVE+ANTICLOCKWISE TOP SEMICIRCLE ARROW
COMBINING ANTICLOCKWISE RING OVERLAY=SEMANTIC OVERPRINT+ANTICLOCKWISE OPEN CIRCLE ARROW
COMBINING BREVE BELOW=SEMANTIC BELOW+BREVE
COMBINING BREVE=SEMANTIC ABOVE+BREVE
COMBINING BRIDGE BELOW=SEMANTIC BELOW+START GROUP+OPEN BOX+SEMANTIC TURNED+POP DIRECTIONAL FORMATTING
COMBINING CANDRABINDU=COMBINING BREVE+COMBINING DOT ABOVE
COMBINING CARON BELOW=SEMANTIC BELOW+CARON
COMBINING CARON=SEMANTIC ABOVE+CARON
COMBINING CEDILLA=SEMANTIC BELOW+CEDILLA
COMBINING CIRCUMFLEX ACCENT BELOW=SEMANTIC BELOW+MODIFIER LETTER CIRCUMFLEX ACCENT
COMBINING CIRCUMFLEX ACCENT=SEMANTIC ABOVE+MODIFIER LETTER CIRCUMFLEX ACCENT
COMBINING CLOCKWISE ARROW ABOVE=SEMANTIC ABOVE+CLOCKWISE TOP SEMICIRCLE ARROW
COMBINING CLOCKWISE RING OVERLAY=SEMANTIC OVERPRINT+CLOCKWISE OPEN CIRCLE ARROW
COMBINING COMMA ABOVE RIGHT=SEMANTIC AFTER+RIGHT SINGLE QUOTATION MARK
COMBINING COMMA ABOVE=SEMANTIC ABOVE+COMMA
COMBINING COMMA BELOW=SEMANTIC BELOW+COMMA
COMBINING CYRILLIC DASIA PNEUMATA=SEMANTIC ABOVE+START GROUP+RIGHT TACK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING CYRILLIC PSILI PNEUMATA=SEMANTIC ABOVE+START GROUP+LEFT TACK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING DIAERESIS BELOW=SEMANTIC BELOW+DIAERESIS
COMBINING DIAERESIS=SEMANTIC ABOVE+DIAERESIS
COMBINING DOT ABOVE=SEMANTIC ABOVE+DOT ABOVE
COMBINING DOT BELOW=SEMANTIC BELOW+DOT ABOVE
COMBINING DOUBLE ACUTE ACCENT=SEMANTIC ABOVE+DOUBLE ACUTE ACCENT
COMBINING DOUBLE GRAVE ACCENT=SEMANTIC ABOVE+START GROUP+GRAVE ACCENT+SEMANTIC AFTER+GRAVE ACCENT+POP DIRECTIONAL FORMATTING
COMBINING DOUBLE LOW LINE=SEMANTIC BELOW+DOUBLE LOW LINE
COMBINING DOUBLE OVERLINE=SEMANTIC ABOVE+LOW LINE+SEMANTIC ABOVE+LOW LINE
COMBINING DOUBLE VERTICAL LINE ABOVE=SEMANTIC ABOVE+START GROUP+MODIFIER LETTER VERTICAL LINE+MODIFIER LETTER VERTICAL LINE+POP DIRECTIONAL FORMATTING
COMBINING DOWN TACK BELOW=SEMANTIC BELOW+START GROUP+DOWN TACK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING ENCLOSING CIRCLE BACKSLASH=SEMANTIC OVERPRINT+LARGE CIRCLE+SEMANTIC OVERPRINT+REVERSE SOLIDUS
COMBINING ENCLOSING CIRCLE=SEMANTIC OVERPRINT+LARGE CIRCLE
COMBINING ENCLOSING DIAMOND=SEMANTIC OVERPRINT+LOZENGE
COMBINING ENCLOSING SQUARE=SEMANTIC OVERPRINT+BALLOT BOX
COMBINING FOUR DOTS ABOVE=SEMANTIC ABOVE+START GROUP+DOT ABOVE+DOT ABOVE+DOT ABOVE+DOT ABOVE+POP DIRECTIONAL FORMATTING
COMBINING GRAVE ACCENT BELOW=SEMANTIC BELOW+GRAVE ACCENT
COMBINING GRAVE ACCENT=SEMANTIC ABOVE+GRAVE ACCENT
COMBINING GRAVE TONE MARK=SEMANTIC ABOVE+GRAVE ACCENT
COMBINING GREEK DIALYTIKA TONOS=COMBINING DIAERESIS+COMBINING VERTICAL LINE ABOVE
COMBINING GREEK KORONIS=SEMANTIC ABOVE+GREEK KORONIS
COMBINING GREEK PERISPOMENI=SEMANTIC ABOVE+GREEK PERISPOMENI
COMBINING GREEK YPOGEGRAMMENI=SEMANTIC BELOW+GREEK YPOGEGRAMMENI
COMBINING HOOK ABOVE=SEMANTIC ABOVE+MODIFIER LETTER GLOTTAL STOP
COMBINING INVERTED BREVE BELOW=SEMANTIC BELOW+START GROUP+BREVE+SEMANTIC TURNED+POP DIRECTIONAL FORMATTING
COMBINING INVERTED BREVE=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC TURNED+POP DIRECTIONAL FORMATTING
COMBINING INVERTED BRIDGE BELOW=SEMANTIC BELOW+OPEN BOX
COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK=SEMANTIC AFTER+KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK=SEMANTIC AFTER+KATAKANA-HIRAGANA VOICED SOUND MARK
COMBINING LEFT ANGLE ABOVE=SEMANTIC ABOVE+START GROUP+NOT SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING LEFT ARROW ABOVE=SEMANTIC ABOVE+START GROUP+LEFTWARDS ARROW+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING LEFT HALF RING BELOW=SEMANTIC BELOW+MODIFIER LETTER CENTRED LEFT HALF RING
COMBINING LEFT HARPOON ABOVE=SEMANTIC ABOVE+START GROUP+LEFTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING LEFT RIGHT ARROW ABOVE=SEMANTIC ABOVE+START GROUP+LEFT RIGHT ARROW+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING LEFT TACK BELOW=SEMANTIC BELOW+START GROUP+LEFT TACK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING LONG SOLIDUS OVERLAY=SEMANTIC OVERPRINT+SOLIDUS
COMBINING LONG STROKE OVERLAY=SEMANTIC OVERPRINT+EN DASH
COMBINING LONG VERTICAL LINE OVERLAY=SEMANTIC OVERPRINT+VERTICAL LINE
COMBINING LOW LINE=SEMANTIC BELOW+LOW LINE
COMBINING MACRON BELOW=SEMANTIC BELOW+MACRON
COMBINING MACRON=SEMANTIC ABOVE+MACRON
COMBINING MINUS SIGN BELOW=SEMANTIC BELOW+START GROUP+MINUS SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING OGONEK=SEMANTIC BELOW+OGONEK
COMBINING OVERLINE=SEMANTIC ABOVE+LOW LINE
COMBINING PLUS SIGN BELOW=SEMANTIC BELOW+START GROUP+PLUS SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING REVERSED COMMA ABOVE=SEMANTIC ABOVE+START GROUP+COMMA+SEMANTIC REVERSED+POP DIRECTIONAL FORMATTING
COMBINING RIGHT ARROW ABOVE=SEMANTIC ABOVE+START GROUP+RIGHTWARDS ARROW+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING RIGHT HALF RING BELOW=SEMANTIC BELOW+MODIFIER LETTER CENTRED RIGHT HALF RING
COMBINING RIGHT HARPOON ABOVE=SEMANTIC ABOVE+START GROUP+RIGHTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING RIGHT TACK BELOW=SEMANTIC BELOW+START GROUP+RIGHT TACK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING RING ABOVE=SEMANTIC ABOVE+RING ABOVE
COMBINING RING BELOW=SEMANTIC BELOW+RING ABOVE
COMBINING RING OVERLAY=SEMANTIC OVERPRINT+RING OPERATOR
COMBINING SHORT STROKE OVERLAY=SEMANTIC OVERPRINT+NON-BREAKING HYPHEN
COMBINING SHORT VERTICAL LINE OVERLAY=SEMANTIC OVERPRINT+VERTICAL STROKE
COMBINING SQUARE BELOW=SEMANTIC BELOW+START GROUP+BALLOT BOX+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING THREE DOTS ABOVE=SEMANTIC ABOVE+START GROUP+DOT ABOVE+DOT ABOVE+DOT ABOVE+POP DIRECTIONAL FORMATTING
COMBINING TILDE BELOW=SEMANTIC BELOW+SMALL TILDE
COMBINING TILDE OVERLAY=SEMANTIC OVERPRINT+TILDE OPERATOR
COMBINING TILDE=SEMANTIC ABOVE+SMALL TILDE
COMBINING TURNED COMMA ABOVE=SEMANTIC ABOVE+START GROUP+COMMA+SEMANTIC TURNED+POP DIRECTIONAL FORMATTING
COMBINING UP TACK BELOW=SEMANTIC BELOW+START GROUP+UP TACK+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
COMBINING VERTICAL LINE ABOVE=SEMANTIC ABOVE+MODIFIER LETTER VERTICAL LINE
COMBINING VERTICAL LINE BELOW=SEMANTIC BELOW+MODIFIER LETTER LOW VERTICAL LINE
COMBINING VERTICAL TILDE=SEMANTIC ABOVE+START GROUP+SMALL TILDE+SEMANTIC ROTATED+POP DIRECTIONAL FORMATTING
COMBINING X ABOVE=SEMANTIC ABOVE+START GROUP+MULTIPLICATION SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
HANGUL DOUBLE DOT TONE MARK=SEMANTIC BEFORE+COLON
HANGUL SINGLE DOT TONE MARK=SEMANTIC BEFORE+MIDDLE DOT
IDEOGRAPHIC DEPARTING TONE MARK=SEMANTIC AFTER+RING ABOVE
IDEOGRAPHIC ENTERING TONE MARK=SEMANTIC AFTER+START GROUP+ZERO WIDTH NO-BREAK SPACE+COMBINING RING BELOW+POP DIRECTIONAL FORMATTING
IDEOGRAPHIC LEVEL TONE MARK=SEMANTIC BEFORE+START GROUP+ZERO WIDTH NO-BREAK SPACE+COMBINING RING BELOW+POP DIRECTIONAL FORMATTING
IDEOGRAPHIC RISING TONE MARK=SEMANTIC BEFORE+RING ABOVE
LATIN CAPITAL LETTER L WITH MIDDLE DOT=LATIN CAPITAL LETTER L+SEMANTIC AFTER+MIDDLE DOT
LATIN SMALL LETTER L WITH MIDDLE DOT=LATIN SMALL LETTER L+SEMANTIC AFTER+MIDDLE DOT
LATIN SMALL LETTER N PRECEDED BY APOSTROPHE=LATIN SMALL LETTER N+SEMANTIC BEFORE+RIGHT SINGLE QUOTATION MARK
TIBETAN MARK NGAS BZUNG NYI ZLA=COMBINING RING BELOW+COMBINING BREVE BELOW
TIBETAN MARK NGAS BZUNG SGOR RTAGS=COMBINING RING BELOW
TIBETAN SIGN RJES SU NGA RO=COMBINING RING ABOVE
TIBETAN SIGN SNA LDAN=COMBINING BREVE+COMBINING RING ABOVE
TIBETAN SIGN YANG RTAGS=COMBINING VERTICAL LINE ABOVE
TIBETAN VOWEL SIGN EE=TIBETAN VOWEL SIGN E+TIBETAN VOWEL SIGN E
TIBETAN VOWEL SIGN OO=TIBETAN VOWEL SIGN O+TIBETAN VOWEL SIGN O
The characters COMBINING DOUBLE TILDE and COMBINING DOUBLE INVERTED BREVE already have their own syntax (defined in section 3.9 of Unicode 2.0). For the purposes of the Atomic Theory, we prefer to brush these under the carpet: since they don´t fit the theory, we just pretend they don´t exist. (This isn´t as silly as it sounds: since we have a grouping mechanism, we can use that instead.) The 2 examples given on p3-9 of The Unicode Standard, Version 2·0 (LATIN SMALL LETTER O, COMBINING CIRCUMFLEX, COMBINING DOUBLE TILDE, LATIN SMALL LETTER O, COMBINING DIAERESIS; and LATIN SMALL LETTER O, COMBINING DOUBLE TILDE, COMBINING CIRCUMFLEX, LATIN SMALL LETTER O, COMBINING DIAERESIS) would be replaced by START GROUP, LATIN SMALL LETTER O, COMBINING CIRCUMFLEX, LATIN SMALL LETTER O, COMBINING DIAERESIS, POP DIRECTIONAL FORMATTING, COMBINING TILDE, where the renderer is assumed to be clever enough to work out that it needs an extra-big tilde to cover a group of 2 characters. (Or not, of course.)
In order to at least do something presentable with these characters, we say that
COMBINING DOUBLE INVERTED BREVE=COMBINING INVERTED BREVE
COMBINING DOUBLE TILDE=COMBINING TILDE
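The grouped replacement for the double diacritics can be sketched in a few lines. This is only an illustration, assuming a character sequence is held as a Python list of character names; the helper `spanning_mark` is a hypothetical name, not part of any standard.

```python
# Group delimiters as used in the decompositions of this document.
START_GROUP = "START GROUP"
END_GROUP = "POP DIRECTIONAL FORMATTING"


def spanning_mark(clusters, mark):
    """Wrap several base+accent clusters in a group, then apply one mark.

    `clusters` is a list of lists of character names (hypothetical
    representation); `mark` is the combining character that should span
    the whole group.
    """
    sequence = [START_GROUP]
    for cluster in clusters:
        sequence.extend(cluster)
    sequence.append(END_GROUP)
    sequence.append(mark)
    return sequence


example = spanning_mark(
    [
        ["LATIN SMALL LETTER O", "COMBINING CIRCUMFLEX"],
        ["LATIN SMALL LETTER O", "COMBINING DIAERESIS"],
    ],
    "COMBINING TILDE",
)
```

The list `example` is the grouped sequence given in the text; working out how wide to draw the tilde is left entirely to the renderer.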
Since WHITE CIRCLE, WHITE DIAMOND and WHITE SQUARE are themselves composite, we prefer to regard COMBINING ENCLOSING CIRCLE, COMBINING ENCLOSING DIAMOND and COMBINING ENCLOSING SQUARE as composed from LARGE CIRCLE, LOZENGE and BALLOT BOX respectively.
We also have to delete the existing decompositions of everything named on the right above (the ones starting with SPACE), or decomposition would regress infinitely.
ACUTE ACCENT=
BREVE=
CEDILLA=
CENTRELINE LOW LINE=
DASHED LOW LINE=
DIAERESIS=DOT ABOVE+SEMANTIC AFTER+DOT ABOVE
DOT ABOVE=
DOUBLE ACUTE ACCENT=ACUTE ACCENT+SEMANTIC AFTER+ACUTE ACCENT
KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK=
KATAKANA-HIRAGANA VOICED SOUND MARK=
LOW LINE=
MACRON=
MODIFIER LETTER CIRCUMFLEX ACCENT=
OGONEK=
RING ABOVE=
SMALL TILDE=
WAVY LOW LINE=
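Whether any regress remains can be checked mechanically. The following is a minimal sketch, assuming the decompositions are held in a Python dict that maps a character name to the list of names it decomposes into (a hypothetical representation); `find_cycle` is an illustrative helper doing an ordinary depth-first search.

```python
def find_cycle(table):
    """Return a list of names forming a decomposition cycle, or None.

    Names that do not appear as keys of `table` are treated as primitive.
    """
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {}

    def visit(name, path):
        state[name] = ACTIVE
        path.append(name)
        for part in table.get(name, []):
            status = state.get(part, UNSEEN)
            if status == ACTIVE:
                # `part` is already on the current path: a regress.
                return path[path.index(part):] + [part]
            if status == UNSEEN:
                found = visit(part, path)
                if found:
                    return found
        path.pop()
        state[name] = DONE
        return None

    for name in table:
        if state.get(name, UNSEEN) == UNSEEN:
            cycle = visit(name, [])
            if cycle:
                return cycle
    return None


# The new decomposition of COMBINING MACRON, together with the existing
# compatibility decomposition of MACRON (SPACE + COMBINING MACRON).
table = {
    "COMBINING MACRON": ["SEMANTIC ABOVE", "MACRON"],
    "MACRON": ["SPACE", "COMBINING MACRON"],
}
before = find_cycle(table)

# Deleting the decomposition of MACRON, as listed above, breaks the loop.
table["MACRON"] = []
after = find_cycle(table)
```

Here `before` reports the loop through COMBINING MACRON and MACRON, and `after` is `None` once the decomposition of MACRON has been emptied.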
The Tibetan subjoined letters are BELOW.
TIBETAN SUBJOINED LETTER KA=SEMANTIC BELOW+TIBETAN LETTER KA
TIBETAN SUBJOINED LETTER KHA=SEMANTIC BELOW+TIBETAN LETTER KHA
TIBETAN SUBJOINED LETTER GA=SEMANTIC BELOW+TIBETAN LETTER GA
TIBETAN SUBJOINED LETTER NGA=SEMANTIC BELOW+TIBETAN LETTER NGA
TIBETAN SUBJOINED LETTER CA=SEMANTIC BELOW+TIBETAN LETTER CA
TIBETAN SUBJOINED LETTER JA=SEMANTIC BELOW+TIBETAN LETTER JA
TIBETAN SUBJOINED LETTER NYA=SEMANTIC BELOW+TIBETAN LETTER NYA
TIBETAN SUBJOINED LETTER TTA=SEMANTIC BELOW+TIBETAN LETTER TTA
TIBETAN SUBJOINED LETTER TTHA=SEMANTIC BELOW+TIBETAN LETTER TTHA
TIBETAN SUBJOINED LETTER DDA=SEMANTIC BELOW+TIBETAN LETTER DDA
TIBETAN SUBJOINED LETTER NNA=SEMANTIC BELOW+TIBETAN LETTER NNA
TIBETAN SUBJOINED LETTER TA=SEMANTIC BELOW+TIBETAN LETTER TA
TIBETAN SUBJOINED LETTER THA=SEMANTIC BELOW+TIBETAN LETTER THA
TIBETAN SUBJOINED LETTER DA=SEMANTIC BELOW+TIBETAN LETTER DA
TIBETAN SUBJOINED LETTER NA=SEMANTIC BELOW+TIBETAN LETTER NA
TIBETAN SUBJOINED LETTER PA=SEMANTIC BELOW+TIBETAN LETTER PA
TIBETAN SUBJOINED LETTER PHA=SEMANTIC BELOW+TIBETAN LETTER PHA
TIBETAN SUBJOINED LETTER BA=SEMANTIC BELOW+TIBETAN LETTER BA
TIBETAN SUBJOINED LETTER MA=SEMANTIC BELOW+TIBETAN LETTER MA
TIBETAN SUBJOINED LETTER TSA=SEMANTIC BELOW+TIBETAN LETTER TSA
TIBETAN SUBJOINED LETTER TSHA=SEMANTIC BELOW+TIBETAN LETTER TSHA
TIBETAN SUBJOINED LETTER DZA=SEMANTIC BELOW+TIBETAN LETTER DZA
TIBETAN SUBJOINED LETTER WA=SEMANTIC BELOW+TIBETAN LETTER WA
TIBETAN SUBJOINED LETTER YA=SEMANTIC BELOW+TIBETAN LETTER YA
TIBETAN SUBJOINED LETTER RA=SEMANTIC BELOW+TIBETAN LETTER RA
TIBETAN SUBJOINED LETTER LA=SEMANTIC BELOW+TIBETAN LETTER LA
TIBETAN SUBJOINED LETTER SHA=SEMANTIC BELOW+TIBETAN LETTER SHA
TIBETAN SUBJOINED LETTER SSA=SEMANTIC BELOW+TIBETAN LETTER SSA
TIBETAN SUBJOINED LETTER SA=SEMANTIC BELOW+TIBETAN LETTER SA
TIBETAN SUBJOINED LETTER HA=SEMANTIC BELOW+TIBETAN LETTER HA
A glyph may be moved vertically relative to the baseline, without being changed in size or orientation. We can use this idea to ``decompose´´ many characters which share the same glyph, but at different positions. We take the most central one as the base form. The raised one is then considered to be stacked ABOVE an invisible character of height 1ex. (We use ZERO WIDTH NO-BREAK SPACE for this.) The lowered one is considered to be stacked BELOW the same character. The idea that a space character might have height is a little strange, but it allows us to ``reuse´´ many glyph designs, for even more characters.
CENTRELINE OVERLINE=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+CENTRELINE LOW LINE
DASHED OVERLINE=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+DASHED LOW LINE
MODIFIER LETTER LEFT HALF RING=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+MODIFIER LETTER CENTRED LEFT HALF RING
MODIFIER LETTER LOW ACUTE ACCENT=ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+ACUTE ACCENT
MODIFIER LETTER LOW GRAVE ACCENT=ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+GRAVE ACCENT
MODIFIER LETTER LOW MACRON=ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+MACRON
MODIFIER LETTER RIGHT HALF RING=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+MODIFIER LETTER CENTRED RIGHT HALF RING
OVERLINE=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+LOW LINE
WAVY OVERLINE=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+WAVY LOW LINE
Requests that a character be rendered at a smaller size, and with a lower baseline.
Would be used in all characters whose decomposition includes <sub> (there are 15 of these) as well as
GREEK LOWER NUMERAL SIGN=<sub>+MODIFIER LETTER PRIME
MODIFIER LETTER LOW VERTICAL LINE=<sub>+VERTICAL LINE
Easy to do algorithmically.
If used but not recognised, the resulting text will be wrong, but still better than if a substitute character was used.
Requests that a character be rendered at a smaller size, and above the baseline. A superscript is equivalent to a raised subscript.
Would be used in all characters whose decomposition includes <super> (there are about 50 of these) as well as:
ASTERISK=<super>+ASTERISK OPERATOR
CIRCUMFLEX ACCENT=<super>+UP ARROWHEAD
DEGREE SIGN=<super>+RING OPERATOR
MODIFIER LETTER VERTICAL LINE=<super>+VERTICAL LINE
PRIME=<super>+MODIFIER LETTER PRIME
TILDE=<super>+TILDE OPERATOR
The character used as a tilde accent is SMALL TILDE, not this TILDE. The
character TILDE is a mixed-use character, so we may as well make it look
consistent with ASTERISK (which is ``clearly´´ a superscript of some
sort, and ASTERISK OPERATOR is the only possibility) and DEGREE SIGN. The same
applies to circumflex---it appears that MODIFIER LETTER CIRCUMFLEX ACCENT is
the recommended character for making accents, as CIRCUMFLEX ACCENT is very ugly.
The idea that PRIME is a superscript is due to T
Easy to do algorithmically.
If used but not recognised, the resulting text will be wrong, but still better than if a substitute character was used.
Rotates the character through half a turn in its own plane. Equivalent to REVERSED followed by INVERTED, or to ROTATED twice.
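The two equivalences can be checked on a small bitmap. This is a sketch only: it assumes that REVERSED means a left-to-right mirror image, INVERTED a top-to-bottom mirror image, and ROTATED a quarter turn, and it represents a glyph as a list of equal-length strings. The function names are hypothetical.

```python
def reversed_glyph(glyph):
    """Mirror left-to-right (assumed meaning of REVERSED)."""
    return [row[::-1] for row in glyph]


def inverted_glyph(glyph):
    """Mirror top-to-bottom (assumed meaning of INVERTED)."""
    return glyph[::-1]


def rotated_glyph(glyph):
    """Quarter turn clockwise (assumed meaning of ROTATED)."""
    return ["".join(column) for column in zip(*glyph[::-1])]


def turned_glyph(glyph):
    """Half turn in the plane of the glyph (TURNED)."""
    return [row[::-1] for row in glyph[::-1]]


# A crude letter L on a 3x3 grid.
glyph = [
    "#..",
    "#..",
    "###",
]

via_mirrors = inverted_glyph(reversed_glyph(glyph))
via_rotation = rotated_glyph(rotated_glyph(glyph))
```

Both `via_mirrors` and `via_rotation` come out identical to `turned_glyph(glyph)`, whichever direction is chosen for the quarter turn.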
Could be used in:
BECAUSE=THEREFORE+SEMANTIC TURNED BLACK DOWN-POINTING TRIANGLE=BLACK UP-POINTING TRIANGLE+SEMANTIC TURNED BLACK LEFT-POINTING POINTER=BLACK RIGHT-POINTING POINTER+SEMANTIC TURNED BLACK LEFT-POINTING TRIANGLE=BLACK RIGHT-POINTING TRIANGLE+SEMANTIC TURNED BLACK LOWER LEFT TRIANGLE=BLACK UPPER RIGHT TRIANGLE+SEMANTIC TURNED BOTTOM LEFT CORNER=TOP RIGHT CORNER+SEMANTIC TURNED BOTTOM LEFT CROP=TOP RIGHT CROP+SEMANTIC TURNED CIRCLE WITH LEFT HALF BLACK=CIRCLE WITH RIGHT HALF BLACK+SEMANTIC TURNED CIRCLE WITH LOWER HALF BLACK=CIRCLE WITH UPPER HALF BLACK+SEMANTIC TURNED CONTAINS AS MEMBER=ELEMENT OF+SEMANTIC TURNED CONTAINS AS NORMAL SUBGROUP=NORMAL SUBGROUP OF+SEMANTIC TURNED DESCENDING NODE=ASCENDING NODE+SEMANTIC TURNED DOWN ARROWHEAD=UP ARROWHEAD+SEMANTIC TURNED DOWN TACK=UP TACK+SEMANTIC TURNED DOWNWARDS ARROW FROM BAR=UPWARDS ARROW FROM BAR+SEMANTIC TURNED DOWNWARDS ARROW=UPWARDS ARROW+SEMANTIC TURNED DOWNWARDS DASHED ARROW=UPWARDS DASHED ARROW+SEMANTIC TURNED DOWNWARDS HARPOON WITH BARB RIGHTWARDS=UPWARDS HARPOON WITH BARB LEFTWARDS+SEMANTIC TURNED DOWNWARDS TWO HEADED ARROW=UPWARDS TWO HEADED ARROW+SEMANTIC TURNED ERASE TO THE RIGHT=ERASE TO THE LEFT+SEMANTIC TURNED FOR ALL=LATIN CAPITAL LETTER A+SEMANTIC TURNED FROWN=SMILE+SEMANTIC TURNED GREATER-THAN SIGN=LESS-THAN SIGN+SEMANTIC TURNED INTERSECTION=UNION+SEMANTIC TURNED INVERTED EXCLAMATION MARK=EXCLAMATION MARK+SEMANTIC TURNED INVERTED OHM SIGN=OHM SIGN+SEMANTIC TURNED INVERTED QUESTION MARK=QUESTION MARK+SEMANTIC TURNED LAST QUARTER MOON=FIRST QUARTER MOON+SEMANTIC TURNED LATIN CAPITAL LETTER OPEN O=LATIN CAPITAL LETTER C+SEMANTIC TURNED LATIN CAPITAL LETTER REVERSED E=LATIN CAPITAL LETTER E+SEMANTIC TURNED LATIN SMALL LETTER DOTLESS J WITH STROKE AND HOOK=LATIN SMALL LETTER F WITH HOOK+SEMANTIC TURNED LATIN SMALL LETTER DOTLESS J WITH STROKE=LATIN SMALL LETTER F+SEMANTIC TURNED LATIN SMALL LETTER OPEN O=LATIN SMALL LETTER C+SEMANTIC TURNED LATIN SMALL LETTER SCHWA=LATIN SMALL LETTER E+SEMANTIC TURNED LATIN 
SMALL LETTER TURNED A=LATIN SMALL LETTER A+SEMANTIC TURNED LATIN SMALL LETTER TURNED ALPHA=LATIN SMALL LETTER ALPHA+SEMANTIC TURNED LATIN SMALL LETTER TURNED DELTA=GREEK SMALL LETTER DELTA+SEMANTIC TURNED LATIN SMALL LETTER TURNED E=LATIN SMALL LETTER E+SEMANTIC TURNED LATIN SMALL LETTER TURNED H=LATIN SMALL LETTER H+SEMANTIC TURNED LATIN SMALL LETTER TURNED K=LATIN SMALL LETTER K+SEMANTIC TURNED LATIN SMALL LETTER TURNED M=LATIN SMALL LETTER M+SEMANTIC TURNED LATIN SMALL LETTER TURNED R WITH LONG LEG=LATIN SMALL LETTER R WITH LONG LEG+SEMANTIC TURNED LATIN SMALL LETTER TURNED R=LATIN SMALL LETTER R+SEMANTIC TURNED LATIN SMALL LETTER TURNED T=LATIN SMALL LETTER T+SEMANTIC TURNED LATIN SMALL LETTER TURNED V=LATIN SMALL LETTER V+SEMANTIC TURNED LATIN SMALL LETTER TURNED W=LATIN SMALL LETTER W+SEMANTIC TURNED LATIN SMALL LETTER TURNED Y=LATIN SMALL LETTER Y+SEMANTIC TURNED LEFT HALF BLACK CIRCLE=RIGHT HALF BLACK CIRCLE+SEMANTIC TURNED LEFT TACK=RIGHT TACK+SEMANTIC TURNED LEFTWARDS ARROW FROM BAR=RIGHTWARDS ARROW FROM BAR+SEMANTIC TURNED LEFTWARDS ARROW TO BAR=RIGHTWARDS ARROW TO BAR+SEMANTIC TURNED LEFTWARDS ARROW WITH TAIL=RIGHTWARDS ARROW WITH TAIL+SEMANTIC TURNED LEFTWARDS ARROW=RIGHTWARDS ARROW+SEMANTIC TURNED LEFTWARDS DASHED ARROW=RIGHTWARDS DASHED ARROW+SEMANTIC TURNED LEFTWARDS HARPOON WITH BARB DOWNWARDS=RIGHTWARDS HARPOON WITH BARB UPWARDS+SEMANTIC TURNED LEFTWARDS PAIRED ARROWS=RIGHTWARDS PAIRED ARROWS+SEMANTIC TURNED LEFTWARDS SQUIGGLE ARROW=RIGHTWARDS SQUIGGLE ARROW+SEMANTIC TURNED LEFTWARDS TRIPLE ARROW=RIGHTWARDS TRIPLE ARROW+SEMANTIC TURNED LEFTWARDS TWO HEADED ARROW=RIGHTWARDS TWO HEADED ARROW+SEMANTIC TURNED LEFTWARDS WAVE ARROW=RIGHTWARDS WAVE ARROW+SEMANTIC TURNED LOGICAL AND=LOGICAL OR+SEMANTIC TURNED LOWER LEFT QUADRANT CIRCULAR ARC=UPPER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC TURNED N-ARY COPRODUCT=N-ARY PRODUCT+SEMANTIC TURNED NABLA=INCREMENT+SEMANTIC TURNED OCR INVERTED FORK=OCR FORK+SEMANTIC TURNED ORIGINAL OF=IMAGE OF+SEMANTIC TURNED RIGHT 
HALF BLOCK=LEFT HALF BLOCK+SEMANTIC TURNED SMALL CONTAINS AS MEMBER=SMALL ELEMENT OF+SEMANTIC TURNED SOUTH WEST ARROW=NORTH EAST ARROW+SEMANTIC TURNED SQUARE WITH LEFT HALF BLACK=SQUARE WITH RIGHT HALF BLACK+SEMANTIC TURNED SUCCEEDS UNDER RELATION=PRECEDES UNDER RELATION+SEMANTIC TURNED SUPERSET OF=SUBSET OF+SEMANTIC TURNED THERE EXISTS=LATIN CAPITAL LETTER E+SEMANTIC TURNED TURNED CAPITAL F=LATIN CAPITAL LETTER F+SEMANTIC TURNED TURNED GREEK SMALL LETTER IOTA=GREEK SMALL LETTER IOTA+SEMANTIC TURNED TURNED NOT SIGN=NOT SIGN+SEMANTIC TURNED UNDERTIE=CHARACTER TIE+SEMANTIC TURNED
There are 2 characters with TURNED in the name (implicitly, anyway) that would not be coded directly with SEMANTIC TURNED: LATIN CAPITAL LETTER TURNED M and LATIN CAPITAL LETTER SCHWA. These are large turned versions of a lower-case character. Maybe they should be decomposed as
LATIN CAPITAL LETTER SCHWA=LATIN SMALL LETTER E+SEMANTIC LARGE+SEMANTIC TURNED
LATIN CAPITAL LETTER TURNED M=LATIN SMALL LETTER M+SEMANTIC LARGE+SEMANTIC TURNED
---or maybe these characters have a completely different origin?
This is glyph modification. It´s a bit of a miscellany, but it has
sound antecedents (e g, in T
CURVED STEM PARAGRAPH SIGN ORNAMENT=PILCROW SIGN+SEMANTIC VARIANT CYRILLIC CAPITAL LETTER BASHKIR KA=CYRILLIC CAPITAL LETTER KA+SEMANTIC VARIANT CYRILLIC CAPITAL LETTER GHE WITH UPTURN=CYRILLIC CAPITAL LETTER GHE+SEMANTIC VARIANT CYRILLIC CAPITAL LETTER ROUND OMEGA=CYRILLIC CAPITAL LETTER O+SEMANTIC VARIANT CYRILLIC CAPITAL LETTER STRAIGHT U=CYRILLIC CAPITAL LETTER U+SEMANTIC VARIANT CYRILLIC SMALL LETTER BASHKIR KA=CYRILLIC SMALL LETTER KA+SEMANTIC VARIANT CYRILLIC SMALL LETTER GHE WITH UPTURN=CYRILLIC SMALL LETTER GHE+SEMANTIC VARIANT CYRILLIC SMALL LETTER ROUND OMEGA=CYRILLIC SMALL LETTER OMEGA+SEMANTIC VARIANT CYRILLIC SMALL LETTER STRAIGHT U=CYRILLIC SMALL LETTER U+SEMANTIC VARIANT EIGHT POINTED PINWHEEL STAR=PINWHEEL STAR+SEMANTIC VARIANT FLORAL HEART=BLACK HEART SUIT+SEMANTIC VARIANT GREEK BETA SYMBOL=GREEK SMALL LETTER BETA+SEMANTIC VARIANT GREEK KAPPA SYMBOL=GREEK SMALL LETTER KAPPA+SEMANTIC VARIANT GREEK LUNATE SIGMA SYMBOL=GREEK SMALL LETTER SIGMA+SEMANTIC VARIANT GREEK PHI SYMBOL=GREEK SMALL LETTER PHI+SEMANTIC VARIANT GREEK PI SYMBOL=GREEK SMALL LETTER PI+SEMANTIC VARIANT GREEK RHO SYMBOL=GREEK SMALL LETTER RHO+SEMANTIC VARIANT GREEK THETA SYMBOL=GREEK SMALL LETTER THETA+SEMANTIC VARIANT GREEK UPSILON WITH HOOK SYMBOL=GREEK CAPITAL LETTER UPSILON+SEMANTIC VARIANT HANGUL CHOSEONG CHITUEUMCHIEUCH=HANGUL CHOSEONG CHIEUCH+SEMANTIC VARIANT HANGUL CHOSEONG CHITUEUMCIEUC=HANGUL CHOSEONG CIEUC+SEMANTIC VARIANT HANGUL CHOSEONG CHITUEUMSIOS=HANGUL CHOSEONG SIOS+SEMANTIC VARIANT HANGUL CHOSEONG PANSIOS=HANGUL CHOSEONG SIOS+SEMANTIC VARIANT HANGUL JONGSEONG PANSIOS=HANGUL JONGSEONG SIOS+SEMANTIC VARIANT HEBREW LETTER ALTERNATIVE AYIN=HEBREW LETTER AYIN+SEMANTIC VARIANT HEBREW LETTER ALTERNATIVE PLUS SIGN=PLUS SIGN+SEMANTIC VARIANT INVERTED LAZY S=TILDE OPERATOR+SEMANTIC VARIANT LATIN CAPITAL LETTER B WITH TOPBAR=CYRILLIC CAPITAL LETTER BE+SEMANTIC VARIANT LATIN CAPITAL LETTER OPEN E=LATIN CAPITAL LETTER E+SEMANTIC VARIANT LATIN LETTER STRETCHED C=LATIN CAPITAL 
LETTER C+SEMANTIC VARIANT LATIN SMALL LETTER B WITH TOPBAR=CYRILLIC SMALL LETTER BE+SEMANTIC VARIANT LATIN SMALL LETTER LONG S=LATIN SMALL LETTER S+SEMANTIC VARIANT LATIN SMALL LETTER R WITH FISHHOOK=LATIN SMALL LETTER R+SEMANTIC VARIANT MODIFIER LETTER TRIANGULAR COLON=COLON+SEMANTIC VARIANT ORNATE LEFT PARENTHESIS=LEFT PARENTHESIS+SEMANTIC VARIANT ORNATE RIGHT PARENTHESIS=RIGHT PARENTHESIS+SEMANTIC VARIANT PARTIAL DIFFERENTIAL=LATIN SMALL LETTER D+SEMANTIC VARIANT SMALL ELEMENT OF=GREEK SMALL LETTER EPSILON+SEMANTIC VARIANT TIBETAN MARK RIN CHEN SPUNGS SHAD=TIBETAN MARK SHAD+SEMANTIC VARIANT TIGHT TRIFOLIATE SNOWFLAKE=SNOWFLAKE+SEMANTIC VARIANT
(Despite its name, LETTER ROUND OMEGA is a variant of LETTER O.) There are also some mathematical symbols that are best considered as glyph variants.
BROKEN BAR=VERTICAL LINE+SEMANTIC VARIANT
CURLY LOGICAL AND=LOGICAL AND+SEMANTIC VARIANT
CURLY LOGICAL OR=LOGICAL OR+SEMANTIC VARIANT
DIVISION SLASH=SOLIDUS+SEMANTIC VARIANT
PRECEDES=LESS-THAN SIGN+SEMANTIC VARIANT
RATIO=COLON+SEMANTIC VARIANT
SET MINUS=REVERSE SOLIDUS+SEMANTIC VARIANT
SQUARE CAP=INTERSECTION+SEMANTIC VARIANT
SQUARE CUP=UNION+SEMANTIC VARIANT
SQUARE IMAGE OF=SUBSET OF+SEMANTIC VARIANT
SQUARE ORIGINAL OF=SUPERSET OF+SEMANTIC VARIANT
SUCCEEDS=GREATER-THAN SIGN+SEMANTIC VARIANT
WHITE SQUARE WITH ROUNDED CORNERS=WHITE SQUARE+SEMANTIC VARIANT
Decomposition of the CURLY and SQUARE mathematical symbols as variants may seem capricious, but it is correct in 2 ways:
Cannot be done algorithmically: either you have a variant glyph, or you don´t.
Falling back to the base form is likely to give good results, except in specialised fields, so this is a desirable decomposition to encode.
We assume that the ordinary state for a character is to be ``black´´, as this is the colour of ink. Some characters---normally those with large solid regions---also exist in ``white´´ variants. This is a request for those characters to be used. Many characters have the word ``black´´ in their name. We just ignore this (except in those cases where it means HEAVY), claiming that it carries no semantic value apart from emphasis.
The following characters are white variants of others:
BLACK CENTRE WHITE STAR=OPEN CENTRE BLACK STAR+SEMANTIC WHITE BLACK DIAMOND MINUS WHITE X=MULTIPLICATION SIGN+COMBINING ENCLOSING DIAMOND+SEMANTIC WHITE BLACK SMILING FACE=WHITE SMILING FACE+SEMANTIC WHITE DINGBAT NEGATIVE CIRCLED DIGIT EIGHT=DIGIT EIGHT+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE DINGBAT NEGATIVE CIRCLED DIGIT FIVE=DIGIT FIVE+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE DINGBAT NEGATIVE CIRCLED DIGIT FOUR=DIGIT FOUR+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE DINGBAT NEGATIVE CIRCLED DIGIT NINE=DIGIT NINE+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE DINGBAT NEGATIVE CIRCLED DIGIT ONE=DIGIT ONE+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE DINGBAT NEGATIVE CIRCLED DIGIT SEVEN=DIGIT SEVEN+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE DINGBAT NEGATIVE CIRCLED DIGIT SIX=DIGIT SIX+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE DINGBAT NEGATIVE CIRCLED DIGIT THREE=DIGIT THREE+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE DINGBAT NEGATIVE CIRCLED DIGIT TWO=DIGIT TWO+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE DINGBAT NEGATIVE CIRCLED NUMBER TEN=START GROUP+DIGIT ONE+DIGIT ZERO+POP DIRECTIONAL FORMATTING+COMBINING ENCLOSING CIRCLE+SEMANTIC WHITE DOWNWARDS WHITE ARROW=DOWNWARDS ARROW+SEMANTIC WHITE INVERSE BULLET=BULLET+COMBINING ENCLOSING SQUARE+SEMANTIC WHITE INVERSE WHITE CIRCLE=WHITE CIRCLE+COMBINING ENCLOSING SQUARE+SEMANTIC WHITE LEFT WHITE CORNER BRACKET=LEFT CORNER BRACKET+SEMANTIC WHITE LEFT WHITE LENTICULAR BRACKET=LEFT BLACK LENTICULAR BRACKET+SEMANTIC WHITE LEFT WHITE SQUARE BRACKET=LEFT SQUARE BRACKET+SEMANTIC WHITE LEFT WHITE TORTOISE SHELL BRACKET=LEFT TORTOISE SHELL BRACKET+SEMANTIC WHITE LEFTWARDS WHITE ARROW=LEFTWARDS ARROW+SEMANTIC WHITE RIGHT WHITE CORNER BRACKET=RIGHT CORNER BRACKET+SEMANTIC WHITE RIGHT WHITE LENTICULAR BRACKET=RIGHT BLACK LENTICULAR BRACKET+SEMANTIC WHITE RIGHT WHITE SQUARE BRACKET=RIGHT SQUARE BRACKET+SEMANTIC WHITE RIGHT WHITE TORTOISE SHELL BRACKET=RIGHT TORTOISE SHELL BRACKET+SEMANTIC WHITE RIGHTWARDS WHITE ARROW=RIGHTWARDS ARROW+SEMANTIC 
WHITE UPWARDS WHITE ARROW FROM BAR=UPWARDS ARROW FROM BAR+SEMANTIC WHITE UPWARDS WHITE ARROW=UPWARDS ARROW+SEMANTIC WHITE WHITE BULLET=BULLET+SEMANTIC WHITE WHITE CHESS BISHOP=BLACK CHESS BISHOP+SEMANTIC WHITE WHITE CHESS KING=BLACK CHESS KING+SEMANTIC WHITE WHITE CHESS KNIGHT=BLACK CHESS KNIGHT+SEMANTIC WHITE WHITE CHESS PAWN=BLACK CHESS PAWN+SEMANTIC WHITE WHITE CHESS QUEEN=BLACK CHESS QUEEN+SEMANTIC WHITE WHITE CHESS ROOK=BLACK CHESS ROOK+SEMANTIC WHITE WHITE CIRCLE=BLACK CIRCLE+SEMANTIC WHITE WHITE CLUB SUIT=BLACK CLUB SUIT+SEMANTIC WHITE WHITE DIAMOND SUIT=BLACK DIAMOND SUIT+SEMANTIC WHITE WHITE DIAMOND=BLACK DIAMOND+SEMANTIC WHITE WHITE DOWN-POINTING SMALL TRIANGLE=BLACK DOWN-POINTING SMALL TRIANGLE+SEMANTIC WHITE WHITE DOWN-POINTING TRIANGLE=BLACK DOWN-POINTING TRIANGLE+SEMANTIC WHITE WHITE FLORETTE=BLACK FLORETTE+SEMANTIC WHITE WHITE FOUR POINTED STAR=BLACK FOUR POINTED STAR+SEMANTIC WHITE WHITE HEART SUIT=BLACK HEART SUIT+SEMANTIC WHITE WHITE LEFT POINTING INDEX=BLACK LEFT POINTING INDEX+SEMANTIC WHITE WHITE LEFT-POINTING POINTER=BLACK LEFT-POINTING POINTER+SEMANTIC WHITE WHITE LEFT-POINTING SMALL TRIANGLE=BLACK LEFT-POINTING SMALL TRIANGLE+SEMANTIC WHITE WHITE LEFT-POINTING TRIANGLE=BLACK LEFT-POINTING TRIANGLE+SEMANTIC WHITE WHITE NIB=BLACK NIB+SEMANTIC WHITE WHITE PARALLELOGRAM=BLACK PARALLELOGRAM+SEMANTIC WHITE WHITE RECTANGLE=BLACK RECTANGLE+SEMANTIC WHITE WHITE RIGHT POINTING INDEX=BLACK RIGHT POINTING INDEX+SEMANTIC WHITE WHITE RIGHT-POINTING POINTER=BLACK RIGHT-POINTING POINTER+SEMANTIC WHITE WHITE RIGHT-POINTING SMALL TRIANGLE=BLACK RIGHT-POINTING SMALL TRIANGLE+SEMANTIC WHITE WHITE RIGHT-POINTING TRIANGLE=BLACK RIGHT-POINTING TRIANGLE+SEMANTIC WHITE WHITE SCISSORS=BLACK SCISSORS+SEMANTIC WHITE WHITE SMALL SQUARE=BLACK SMALL SQUARE+SEMANTIC WHITE WHITE SPADE SUIT=BLACK SPADE SUIT+SEMANTIC WHITE WHITE SQUARE=BLACK SQUARE+SEMANTIC WHITE WHITE STAR=BLACK STAR+SEMANTIC WHITE WHITE SUN WITH RAYS=BLACK SUN WITH RAYS+SEMANTIC WHITE WHITE TELEPHONE=BLACK 
TELEPHONE+SEMANTIC WHITE WHITE UP-POINTING SMALL TRIANGLE=BLACK UP-POINTING SMALL TRIANGLE+SEMANTIC WHITE WHITE UP-POINTING TRIANGLE=BLACK UP-POINTING TRIANGLE+SEMANTIC WHITE WHITE VERTICAL RECTANGLE=BLACK VERTICAL RECTANGLE+SEMANTIC WHITE
but not BACK-TILTED SHADOWED WHITE RIGHTWARDS ARROW, NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW, WHITE UP POINTING INDEX, because there are no black forms.
For SMILING FACE, we derive the black character from the white, which looks the ``wrong way round´´; but that´s because we assume it´s really (*)SMILING FEATURES+COMBINING ENCLOSING CIRCLE. The interaction of these is described later.
This could be done algorithmically, but it requires clever image processing capability: software could do something like surrounding the character with a thin black line, and then invert the interior of the region so delineated. It might be sufficient to convey the concept just to exchange black and white in a character cell, though this wouldn´t work if an attempt was made to use extended runs of white text.
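Both approaches can be sketched on a bitmap, where 1 is black and 0 is white. The code below is an illustration under my own assumptions, not a rendering algorithm: `swap_cell` exchanges black and white over the whole character cell, and `hollow` is a crude stand-in for the outline-and-invert idea, keeping only those black pixels that touch white (or the cell edge). Both names are hypothetical.

```python
def swap_cell(cell):
    """Exchange black (1) and white (0) over the whole character cell."""
    return [[1 - pixel for pixel in row] for row in cell]


def hollow(cell):
    """Keep only the boundary of each black region.

    A black pixel survives if at least one of its four neighbours is
    white or lies outside the cell; interior pixels become white.
    """
    height, width = len(cell), len(cell[0])

    def is_black(y, x):
        return 0 <= y < height and 0 <= x < width and cell[y][x] == 1

    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            interior = all(is_black(y + dy, x + dx) for dy, dx in neighbours)
            row.append(1 if cell[y][x] == 1 and not interior else 0)
        result.append(row)
    return result


# A solid 3x3 square in a 5x5 cell.
black_square = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

white_square = hollow(black_square)
negative = swap_cell(black_square)
```

The hollowed form still reads as a square on a white page, whereas the swapped cell brings a black surround with it, which is why runs of swapped cells would merge into a solid bar.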
I assume in the above that if a character in a COMBINING ENCLOSING CIRCLE is made white, the character itself becomes white, and the space between it and the circle becomes black.
It could be argued that there is no need for a ``double-struck´´ semantic, because the double-struck characters are just white versions of heavy ones: (*)SEMANTIC DOUBLE-STRUCK=SEMANTIC HEAVY+SEMANTIC WHITE? I am not going to argue that here though.
If used but not interpreted, unlikely to result in misinterpretation: a black symbol is likely to be a good stand-in for a white one.
There are many other cases where 2 characters have been encoded separately because they differ in some way other than visual appearance. We give decompositions for these, since the Atomic Theory is concerned foremost with rendering.
We take a rigorous approach to this; we do not merge, e g, HYPHEN, MINUS, EN DASH or EM DASH, because they require separate consideration when designing a font. The Atomic Theory is universal: it is not a way of providing fallback glyphs for renderers that do not have them---that is just a consequence of its expression of the (perfectly real) relationships between characters. It is designed to be used by the highest-quality renderers without any compromise in the visual appearance of the output.
APL FUNCTIONAL SYMBOL ALPHA=GREEK SMALL LETTER ALPHA APL FUNCTIONAL SYMBOL IOTA=GREEK SMALL LETTER IOTA APL FUNCTIONAL SYMBOL OMEGA=GREEK SMALL LETTER OMEGA APL FUNCTIONAL SYMBOL RHO=GREEK SMALL LETTER RHO ARMENIAN COMMA=GRAVE ACCENT ARMENIAN EMPHASIS MARK=ACUTE ACCENT ARMENIAN FULL STOP=COLON ARMENIAN MODIFIER LETTER LEFT HALF RING=MODIFIER LETTER LEFT HALF RING ASSERTION=LEFT TACK COMPLEMENT=LATIN LETTER STRETCHED C DEVANAGARI ABBREVIATION SIGN=RING OPERATOR DIGIT EIGHT FULL STOP=DIGIT EIGHT+FULL STOP DIGIT FIVE FULL STOP=DIGIT FIVE+FULL STOP DIGIT FOUR FULL STOP=DIGIT FOUR+FULL STOP DIGIT NINE FULL STOP=DIGIT NINE+FULL STOP DIGIT ONE FULL STOP=DIGIT ONE+FULL STOP DIGIT SEVEN FULL STOP=DIGIT SEVEN+FULL STOP DIGIT SIX FULL STOP=DIGIT SIX+FULL STOP DIGIT THREE FULL STOP=DIGIT THREE+FULL STOP DIGIT TWO FULL STOP=DIGIT TWO+FULL STOP DITTO MARK=DOUBLE PRIME DOT OPERATOR=MIDDLE DOT DOUBLE PRIME QUOTATION MARK=DOUBLE PRIME END OF PROOF=BLACK SQUARE GREEK DASIA=SINGLE HIGH-REVERSED-9 QUOTATION MARK GREEK DIALYTIKA AND OXIA=DIAERESIS+COMBINING VERTICAL LINE ABOVE GREEK DIALYTIKA AND PERISPOMENI=DIAERESIS+COMBINING GREEK PERISPOMENI GREEK DIALYTIKA AND VARIA=DIAERESIS+COMBINING GRAVE ACCENT GREEK DIALYTIKA TONOS=DIAERESIS+COMBINING VERTICAL LINE ABOVE GREEK KORONIS=RIGHT SINGLE QUOTATION MARK GREEK NUMERAL SIGN=PRIME GREEK PERISPOMENI=SMALL TILDE GREEK PSILI=RIGHT SINGLE QUOTATION MARK GREEK TONOS=MODIFIER LETTER VERTICAL LINE GREEK YPOGEGRAMMENI=GREEK SMALL LETTER IOTA+SEMANTIC SUBSCRIPT HYPHEN-MINUS=EN DASH HYPHENATION POINT=MIDDLE DOT IDEOGRAPHIC NUMBER ZERO=LARGE CIRCLE INCREMENT=GREEK CAPITAL LETTER DELTA KATAKANA MIDDLE DOT=MIDDLE DOT KATAKANA-HIRAGANA PROLONGED SOUND MARK=EM DASH LATIN CAPITAL LETTER AFRICAN D=LATIN CAPITAL LETTER D WITH STROKE LATIN CAPITAL LETTER ETH=LATIN CAPITAL LETTER D WITH STROKE LATIN CAPITAL LETTER ESH=GREEK CAPITAL LETTER SIGMA LATIN CAPITAL LETTER GAMMA=LATIN SMALL LETTER GAMMA+SEMANTIC LARGE LATIN CAPITAL LETTER IOTA=LATIN SMALL LETTER 
IOTA+SEMANTIC LARGE LATIN CAPITAL LETTER UPSILON=LATIN SMALL LETTER UPSILON+SEMANTIC LARGE LATIN LETTER BILABIAL CLICK=FISHEYE LATIN LETTER DENTAL CLICK=VERTICAL LINE LATIN LETTER RETROFLEX CLICK=EXCLAMATION MARK LATIN SMALL LETTER A WITH RIGHT HALF RING=LATIN SMALL LETTER A+MODIFIER LETTER RIGHT HALF RING LATIN SMALL LETTER ETH=LATIN SMALL LETTER D+SEMANTIC VARIANT+COMBINING SHORT SOLIDUS OVERLAY LATIN SMALL LETTER GAMMA=GREEK SMALL LETTER GAMMA LATIN SMALL LETTER IOTA=GREEK SMALL LETTER IOTA LATIN SMALL LETTER OPEN E=GREEK SMALL LETTER EPSILON LATIN SMALL LETTER PHI=GREEK SMALL LETTER PHI LATIN SMALL LETTER RAMS HORN=LATIN SMALL LETTER GAMMA+SEMANTIC SMALL LATIN SMALL LETTER UPSILON=GREEK SMALL LETTER UPSILON LEFT CORNER BRACKET=LEFT CEILING MEDIUM VERTICAL BAR=BLACK VERTICAL RECTANGLE MODELS=TRUE MODIFIER LETTER ACUTE ACCENT=ACUTE ACCENT MODIFIER LETTER APOSTROPHE=RIGHT SINGLE QUOTATION MARK MODIFIER LETTER DOUBLE PRIME=DOUBLE PRIME MODIFIER LETTER DOWN ARROWHEAD=DOWN ARROWHEAD MODIFIER LETTER GLOTTAL STOP=LATIN LETTER GLOTTAL STOP MODIFIER LETTER GRAVE ACCENT=GRAVE ACCENT MODIFIER LETTER HALF TRIANGULAR COLON=BLACK DOWN-POINTING SMALL TRIANGLE MODIFIER LETTER LEFT ARROWHEAD=LESS-THAN SIGN MODIFIER LETTER MACRON=MACRON MODIFIER LETTER REVERSED COMMA=SINGLE HIGH-REVERSED-9 QUOTATION MARK MODIFIER LETTER REVERSED GLOTTAL STOP=LATIN LETTER PHARYNGEAL VOICED FRICATIVE MODIFIER LETTER RIGHT ARROWHEAD=GREATER-THAN SIGN MODIFIER LETTER TURNED COMMA=LEFT SINGLE QUOTATION MARK MODIFIER LETTER UP ARROWHEAD=UP ARROWHEAD NUMBER EIGHTEEN FULL STOP=DIGIT ONE+DIGIT EIGHT+FULL STOP NUMBER ELEVEN FULL STOP=DIGIT ONE+DIGIT ONE+FULL STOP NUMBER FIFTEEN FULL STOP=DIGIT ONE+DIGIT FIVE+FULL STOP NUMBER FOURTEEN FULL STOP=DIGIT ONE+DIGIT FOUR+FULL STOP NUMBER NINETEEN FULL STOP=DIGIT ONE+DIGIT NINE+FULL STOP NUMBER SEVENTEEN FULL STOP=DIGIT ONE+DIGIT SEVEN+FULL STOP NUMBER SIXTEEN FULL STOP=DIGIT ONE+DIGIT SIX+FULL STOP NUMBER TEN FULL STOP=DIGIT ONE+DIGIT ZERO+FULL STOP NUMBER 
THIRTEEN FULL STOP=DIGIT ONE+DIGIT THREE+FULL STOP NUMBER TWELVE FULL STOP=DIGIT ONE+DIGIT TWO+FULL STOP NUMBER TWENTY FULL STOP=DIGIT TWO+DIGIT ZERO+FULL STOP OHM SIGN=GREEK CAPITAL LETTER OMEGA RIGHT CORNER BRACKET=RIGHT FLOOR SINGLE LOW-9 QUOTATION MARK=COMMA STAR OPERATOR=ARABIC FIVE POINTED STAR TIBETAN SIGN RDEL DKAR GNYIS=TIBETAN SIGN RDEL DKAR GCIG+TIBETAN SIGN RDEL DKAR GCIG TIBETAN SIGN RDEL DKAR GSUM=TIBETAN SIGN RDEL DKAR GCIG+TIBETAN SIGN RDEL DKAR GCIG+TIBETAN SIGN RDEL DKAR GCIG TIBETAN SIGN RDEL DKAR RDEL NAG=TIBETAN SIGN RDEL DKAR GCIG+TIBETAN SIGN RDEL NAG GCIG TIBETAN SIGN RDEL NAG GNYIS=TIBETAN SIGN RDEL NAG GCIG+TIBETAN SIGN RDEL NAG GCIG TIBETAN VOWEL SIGN VOCALIC LL=TIBETAN SUBJOINED LETTER LA+TIBETAN VOWEL SIGN AA+TIBETAN VOWEL SIGN REVERSED I TIBETAN VOWEL SIGN VOCALIC RR=TIBETAN SUBJOINED LETTER RA+TIBETAN VOWEL SIGN AA+TIBETAN VOWEL SIGN REVERSED I WAVE DASH=TILDE OPERATOR
These diacritics should take the comma accent, not the cedilla form (except where the design of the cedilla is such that it may serve as either accent---i e, an unattached cedilla with a shallow curve). These characters are misnamed in Unicode due to an early failure to distinguish between the 2 accent marks.
LATIN CAPITAL LETTER G WITH CEDILLA=LATIN CAPITAL LETTER G+COMBINING COMMA BELOW
LATIN CAPITAL LETTER K WITH CEDILLA=LATIN CAPITAL LETTER K+COMBINING COMMA BELOW
LATIN CAPITAL LETTER L WITH CEDILLA=LATIN CAPITAL LETTER L+COMBINING COMMA BELOW
LATIN CAPITAL LETTER N WITH CEDILLA=LATIN CAPITAL LETTER N+COMBINING COMMA BELOW
LATIN CAPITAL LETTER R WITH CEDILLA=LATIN CAPITAL LETTER R+COMBINING COMMA BELOW
LATIN SMALL LETTER G WITH CEDILLA=LATIN SMALL LETTER G+COMBINING COMMA BELOW
LATIN SMALL LETTER K WITH CEDILLA=LATIN SMALL LETTER K+COMBINING COMMA BELOW
LATIN SMALL LETTER L WITH CEDILLA=LATIN SMALL LETTER L+COMBINING COMMA BELOW
LATIN SMALL LETTER N WITH CEDILLA=LATIN SMALL LETTER N+COMBINING COMMA BELOW
LATIN SMALL LETTER R WITH CEDILLA=LATIN SMALL LETTER R+COMBINING COMMA BELOW
This decomposition is missing for historical reasons.
CYRILLIC CAPITAL LETTER OMEGA WITH TITLO=CYRILLIC CAPITAL LETTER OMEGA+COMBINING CYRILLIC TITLO
CYRILLIC SMALL LETTER OMEGA WITH TITLO=CYRILLIC SMALL LETTER OMEGA+COMBINING CYRILLIC TITLO
The following are frankly silly, though visually appealing.
LATIN SMALL LETTER I=LATIN SMALL LETTER DOTLESS I+COMBINING DOT ABOVE
LATIN SMALL LETTER J=LATIN SMALL LETTER DOTLESS J+COMBINING DOT ABOVE
Having done this, we are then obliged to respecify most existing decompositions with LATIN SMALL LETTER I/J to use the dotless form. This lets rendering software ignore the requirement of ``dot removal´´ when drawing these characters.
LATIN SMALL LETTER I WITH ACUTE=LATIN SMALL LETTER DOTLESS I+COMBINING ACUTE ACCENT
LATIN SMALL LETTER I WITH BREVE=LATIN SMALL LETTER DOTLESS I+COMBINING BREVE
LATIN SMALL LETTER I WITH CARON=LATIN SMALL LETTER DOTLESS I+COMBINING CARON
LATIN SMALL LETTER I WITH CIRCUMFLEX=LATIN SMALL LETTER DOTLESS I+COMBINING CIRCUMFLEX ACCENT
LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE=LATIN SMALL LETTER DOTLESS I+COMBINING DIAERESIS+COMBINING ACUTE ACCENT
LATIN SMALL LETTER I WITH DIAERESIS=LATIN SMALL LETTER DOTLESS I+COMBINING DIAERESIS
LATIN SMALL LETTER I WITH DOUBLE GRAVE=LATIN SMALL LETTER DOTLESS I+COMBINING DOUBLE GRAVE ACCENT
LATIN SMALL LETTER I WITH GRAVE=LATIN SMALL LETTER DOTLESS I+COMBINING GRAVE ACCENT
LATIN SMALL LETTER I WITH HOOK ABOVE=LATIN SMALL LETTER DOTLESS I+SEMANTIC ABOVE+START GROUP+LATIN LETTER GLOTTAL STOP+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING
LATIN SMALL LETTER I WITH INVERTED BREVE=LATIN SMALL LETTER DOTLESS I+COMBINING INVERTED BREVE
LATIN SMALL LETTER I WITH MACRON=LATIN SMALL LETTER DOTLESS I+COMBINING MACRON
LATIN SMALL LETTER I WITH TILDE=LATIN SMALL LETTER DOTLESS I+COMBINING TILDE
LATIN SMALL LETTER J WITH CARON=LATIN SMALL LETTER DOTLESS J+COMBINING CARON
LATIN SMALL LETTER J WITH CIRCUMFLEX=LATIN SMALL LETTER DOTLESS J+COMBINING CIRCUMFLEX ACCENT
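The respecification is mechanical for the simple accents. A minimal sketch follows, assuming each decomposition is a list of character names and that only a base letter I or J immediately followed by an accent drawn above needs changing. The set `ABOVE_MARKS` is just the accents that occur in the list above, and `respecify` is a hypothetical helper. HOOK ABOVE is left out because its expansion is written out differently.

```python
DOTLESS = {
    "LATIN SMALL LETTER I": "LATIN SMALL LETTER DOTLESS I",
    "LATIN SMALL LETTER J": "LATIN SMALL LETTER DOTLESS J",
}

# Accents drawn above the letter, taken from the list above (assumption:
# marks below the letter leave the dot in place).
ABOVE_MARKS = {
    "COMBINING ACUTE ACCENT",
    "COMBINING BREVE",
    "COMBINING CARON",
    "COMBINING CIRCUMFLEX ACCENT",
    "COMBINING DIAERESIS",
    "COMBINING DOUBLE GRAVE ACCENT",
    "COMBINING GRAVE ACCENT",
    "COMBINING INVERTED BREVE",
    "COMBINING MACRON",
    "COMBINING TILDE",
}


def respecify(parts):
    """Swap a leading I or J for its dotless form before an accent above."""
    if len(parts) >= 2 and parts[0] in DOTLESS and parts[1] in ABOVE_MARKS:
        return [DOTLESS[parts[0]]] + list(parts[1:])
    return list(parts)


i_acute = respecify(["LATIN SMALL LETTER I", "COMBINING ACUTE ACCENT"])
i_ogonek = respecify(["LATIN SMALL LETTER I", "COMBINING OGONEK"])
```

An accent below, such as the ogonek in `i_ogonek`, leaves the dotted letter alone, which is why only most of the existing decompositions change.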
Everything marked as <circle> (there are 197 of these) should be modified to a canonical decomposition involving COMBINING ENCLOSING CIRCLE, as well as
CIRCLED ASTERISK OPERATOR=<circle>+ASTERISK OPERATOR CIRCLED DASH=<circle>+EN DASH CIRCLED DIVISION SLASH=<circle>+DIVISION SLASH CIRCLED DOT OPERATOR=<circle>+DOT OPERATOR CIRCLED EQUALS=<circle>+EQUALS SIGN CIRCLED HEAVY WHITE RIGHTWARDS ARROW=RIGHTWARDS WHITE ARROW+SEMANTIC HEAVY+COMBINING ENCLOSING CIRCLE CIRCLED MINUS=<circle>+MINUS SIGN CIRCLED PLUS=<circle>+PLUS SIGN CIRCLED POSTAL MARK=<circle>+POSTAL MARK CIRCLED RING OPERATOR=<circle>+RING OPERATOR CIRCLED TIMES=<circle>+MULTIPLICATION SIGN CIRCLED WHITE STAR=<circle>+WHITE STAR COMMERCIAL AT=LATIN SMALL LETTER A+SEMANTIC SCRIPT+COMBINING ENCLOSING CIRCLE COPYRIGHT SIGN=LATIN CAPITAL LETTER C+COMBINING ENCLOSING CIRCLE+SEMANTIC SUPERSCRIPT REGISTERED SIGN=LATIN CAPITAL LETTER R+COMBINING ENCLOSING CIRCLE+SEMANTIC SUPERSCRIPT SOUND RECORDING COPYRIGHT=LATIN CAPITAL LETTER P+COMBINING ENCLOSING CIRCLE+SEMANTIC SUPERSCRIPT
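The <circle> rewrite is mechanical enough to sketch in a few lines of Python. The function names here are hypothetical, and the count depends on the version of the character database that ships with the interpreter, so it will not in general match the figure of 197 quoted above.

```python
import unicodedata

ENCLOSING_CIRCLE = "\u20dd"  # COMBINING ENCLOSING CIRCLE

def circled_to_combining(ch):
    """Turn a <circle> compatibility decomposition into base + enclosing
    circle. Returns None when ch carries no <circle> tag."""
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0] != "<circle>":
        return None
    base = "".join(chr(int(cp, 16)) for cp in fields[1:])
    # A multi-character base (e.g. a circled two-digit number) would need
    # explicit grouping in the full scheme; this sketch does not add it.
    return base + ENCLOSING_CIRCLE

def count_circled():
    """Count the characters tagged <circle> in the installed database."""
    return sum(
        1 for cp in range(0x110000)
        if circled_to_combining(chr(cp)) is not None
    )

# CIRCLED DIGIT ONE becomes DIGIT ONE + COMBINING ENCLOSING CIRCLE.
print([f"U+{ord(c):04X}" for c in circled_to_combining("\u2460")])
```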
Just used for 5 characters.
SQUARED DOT OPERATOR=DOT OPERATOR+COMBINING ENCLOSING SQUARE SQUARED MINUS=MINUS SIGN+COMBINING ENCLOSING SQUARE SQUARED PLUS=PLUS SIGN+COMBINING ENCLOSING SQUARE SQUARED TIMES=MULTIPLICATION SIGN+COMBINING ENCLOSING SQUARE IDEOGRAPHIC HALF FILL SPACE=SALTIRE+COMBINING ENCLOSING SQUARE
Characters marked as <square> are not enclosed in a square; they are just rendered as is. I suppose that if <square> were replaced by <compat> throughout, or just deleted (thereby making the decomposition canonical), no-one would notice. Deleting it would add 194 canonical decompositions.
This is used to ``cross things out´´, and also in
EMPTY SET=DIGIT ZERO+COMBINING LONG SOLIDUS OVERLAY NOT TILDE=TILDE OPERATOR+COMBINING LONG SOLIDUS OVERLAY RESPONSE=LATIN CAPITAL LETTER R+COMBINING LONG SOLIDUS OVERLAY VERSICLE=LATIN CAPITAL LETTER V+COMBINING LONG SOLIDUS OVERLAY
Seen in
WHITE SQUARE WITH VERTICAL BISECTING LINE=WHITE SQUARE+COMBINING LONG VERTICAL LINE OVERLAY
The following decomposition is missing for historical reasons only.
LATIN SMALL LETTER T WITH PALATAL HOOK=LATIN SMALL LETTER T+COMBINING PALATALIZED HOOK BELOW
The following are missing in order to ensure the rendering process does not apply the hook to the wrong leg of the letter n (which would produce a lower case eng). We can simply note that a renderer had better get it right.
LATIN CAPITAL LETTER N WITH LEFT HOOK=LATIN CAPITAL LETTER N+COMBINING PALATALIZED HOOK BELOW LATIN SMALL LETTER N WITH LEFT HOOK=LATIN SMALL LETTER N+COMBINING PALATALIZED HOOK BELOW
Some decompositions involving this character are also missing, for historical reasons.
LATIN CAPITAL LETTER T WITH RETROFLEX HOOK=LATIN CAPITAL LETTER T+COMBINING RETROFLEX HOOK BELOW LATIN SMALL LETTER D WITH TAIL=LATIN SMALL LETTER D+COMBINING RETROFLEX HOOK BELOW LATIN SMALL LETTER EZH WITH TAIL=LATIN SMALL LETTER EZH+COMBINING RETROFLEX HOOK BELOW LATIN SMALL LETTER L WITH RETROFLEX HOOK=LATIN SMALL LETTER L+COMBINING RETROFLEX HOOK BELOW LATIN SMALL LETTER N WITH RETROFLEX HOOK=LATIN SMALL LETTER N+COMBINING RETROFLEX HOOK BELOW LATIN SMALL LETTER R WITH TAIL=LATIN SMALL LETTER R+COMBINING RETROFLEX HOOK BELOW LATIN SMALL LETTER S WITH HOOK=LATIN SMALL LETTER S+COMBINING RETROFLEX HOOK BELOW LATIN SMALL LETTER SQUAT REVERSED ESH=LATIN SMALL LETTER REVERSED R WITH FISHHOOK+COMBINING RETROFLEX HOOK BELOW LATIN SMALL LETTER T WITH RETROFLEX HOOK=LATIN SMALL LETTER T+COMBINING RETROFLEX HOOK BELOW LATIN SMALL LETTER TURNED R WITH HOOK=LATIN SMALL LETTER TURNED R+COMBINING RETROFLEX HOOK BELOW LATIN SMALL LETTER Z WITH RETROFLEX HOOK=LATIN SMALL LETTER Z+COMBINING RETROFLEX HOOK BELOW
The decomposition of LATIN SMALL LETTER EZH WITH TAIL is based on appearance, but that´s allowed for combining characters.
This should be used to decompose
CONTOUR INTEGRAL=INTEGRAL+COMBINING RING OVERLAY SURFACE INTEGRAL=DOUBLE INTEGRAL+COMBINING RING OVERLAY VOLUME INTEGRAL=TRIPLE INTEGRAL+COMBINING RING OVERLAY
Also related are
ANTICLOCKWISE CONTOUR INTEGRAL=INTEGRAL+COMBINING ANTICLOCKWISE RING OVERLAY CLOCKWISE CONTOUR INTEGRAL=INTEGRAL+COMBINING CLOCKWISE RING OVERLAY CLOCKWISE INTEGRAL=INTEGRAL+COMBINING CLOCKWISE ARROW ABOVE
The Tibetan half-numbers are overprints with something that looks like a long stroke; I assume the curve and hook of the stroke belong to the font and are not semantic.
LATIN CAPITAL LETTER H WITH STROKE=LATIN CAPITAL LETTER H+COMBINING LONG STROKE OVERLAY TIBETAN DIGIT HALF EIGHT=TIBETAN DIGIT EIGHT+COMBINING LONG STROKE OVERLAY TIBETAN DIGIT HALF FIVE=TIBETAN DIGIT FIVE+COMBINING LONG STROKE OVERLAY TIBETAN DIGIT HALF FOUR=TIBETAN DIGIT FOUR+COMBINING LONG STROKE OVERLAY TIBETAN DIGIT HALF NINE=TIBETAN DIGIT NINE+COMBINING LONG STROKE OVERLAY TIBETAN DIGIT HALF ONE=TIBETAN DIGIT ONE+COMBINING LONG STROKE OVERLAY TIBETAN DIGIT HALF SEVEN=TIBETAN DIGIT SEVEN+COMBINING LONG STROKE OVERLAY TIBETAN DIGIT HALF SIX=TIBETAN DIGIT SIX+COMBINING LONG STROKE OVERLAY TIBETAN DIGIT HALF THREE=TIBETAN DIGIT THREE+COMBINING LONG STROKE OVERLAY TIBETAN DIGIT HALF TWO=TIBETAN DIGIT TWO+COMBINING LONG STROKE OVERLAY TIBETAN DIGIT HALF ZERO=TIBETAN DIGIT ZERO+COMBINING LONG STROKE OVERLAY
Many of the characters described as ``with stroke´´ could be provided with decompositions using this character. The list is
BLANK SYMBOL=LATIN SMALL LETTER B+COMBINING SHORT SOLIDUS OVERLAY LATIN CAPITAL LETTER L WITH STROKE=LATIN CAPITAL LETTER L+COMBINING SHORT SOLIDUS OVERLAY LATIN CAPITAL LETTER O WITH STROKE AND ACUTE=LATIN CAPITAL LETTER O WITH ACUTE+COMBINING SHORT SOLIDUS OVERLAY LATIN CAPITAL LETTER O WITH STROKE=LATIN CAPITAL LETTER O+COMBINING SHORT SOLIDUS OVERLAY LATIN SMALL LETTER L WITH STROKE=LATIN SMALL LETTER L+COMBINING SHORT SOLIDUS OVERLAY LATIN SMALL LETTER LAMBDA WITH STROKE=GREEK SMALL LETTER LAMDA+COMBINING SHORT SOLIDUS OVERLAY LATIN SMALL LETTER O WITH STROKE AND ACUTE=LATIN SMALL LETTER O WITH ACUTE+COMBINING SHORT SOLIDUS OVERLAY LATIN SMALL LETTER O WITH STROKE=LATIN SMALL LETTER O+COMBINING SHORT SOLIDUS OVERLAY LEFT RIGHT ARROW WITH STROKE=LEFT RIGHT ARROW+COMBINING SHORT SOLIDUS OVERLAY LEFT RIGHT DOUBLE ARROW WITH STROKE=LEFT RIGHT DOUBLE ARROW+COMBINING SHORT SOLIDUS OVERLAY LEFTWARDS ARROW WITH STROKE=LEFTWARDS ARROW+COMBINING SHORT SOLIDUS OVERLAY LEFTWARDS DOUBLE ARROW WITH STROKE=LEFTWARDS DOUBLE ARROW+COMBINING SHORT SOLIDUS OVERLAY ORTHODOX CROSS=CROSS OF LORRAINE+COMBINING SHORT SOLIDUS OVERLAY RIGHTWARDS ARROW WITH STROKE=RIGHTWARDS ARROW+COMBINING SHORT SOLIDUS OVERLAY RIGHTWARDS DOUBLE ARROW WITH STROKE=RIGHTWARDS DOUBLE ARROW+COMBINING SHORT SOLIDUS OVERLAY
I suppose making this suggestion would result in howls of outrage from people whose alphabets contain these characters, as, e g, ``O WITH STROKE´´ is a letter in its own right, not a composed character, in these alphabets. There are 3 points in favour of making it a composite character though
Most other characters ``with stroke´´ are encoded with COMBINING SHORT STROKE OVERLAY.
The situation here is similar to the one for COMBINING SHORT SOLIDUS OVERLAY: a lot of characters described as ``with stroke´´, ``with bar´´, ``with quill´´, ``barred´´ or ``bar´´ could be provided with decompositions using this character.
CYRILLIC CAPITAL LETTER BARRED O WITH DIAERESIS=CYRILLIC CAPITAL LETTER BARRED O+COMBINING DIAERESIS CYRILLIC CAPITAL LETTER BARRED O=CYRILLIC CAPITAL LETTER O+COMBINING SHORT STROKE OVERLAY CYRILLIC CAPITAL LETTER GHE WITH STROKE=CYRILLIC CAPITAL LETTER GHE+COMBINING SHORT STROKE OVERLAY CYRILLIC CAPITAL LETTER KA WITH STROKE=CYRILLIC CAPITAL LETTER KA+COMBINING SHORT STROKE OVERLAY CYRILLIC CAPITAL LETTER STRAIGHT U WITH STROKE=CYRILLIC CAPITAL LETTER STRAIGHT U+COMBINING SHORT STROKE OVERLAY CYRILLIC CAPITAL LETTER YAT=CYRILLIC CAPITAL LETTER SOFT SIGN+COMBINING SHORT STROKE OVERLAY CYRILLIC SMALL LETTER BARRED O WITH DIAERESIS=CYRILLIC SMALL LETTER BARRED O+COMBINING DIAERESIS CYRILLIC SMALL LETTER BARRED O=CYRILLIC SMALL LETTER O+COMBINING SHORT STROKE OVERLAY CYRILLIC SMALL LETTER GHE WITH STROKE=CYRILLIC SMALL LETTER GHE+COMBINING SHORT STROKE OVERLAY CYRILLIC SMALL LETTER KA WITH STROKE=CYRILLIC SMALL LETTER KA+COMBINING SHORT STROKE OVERLAY CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE=CYRILLIC SMALL LETTER STRAIGHT U+COMBINING SHORT STROKE OVERLAY CYRILLIC SMALL LETTER YAT=CYRILLIC SMALL LETTER SOFT SIGN+COMBINING SHORT STROKE OVERLAY LATIN CAPITAL LETTER D WITH STROKE=LATIN CAPITAL LETTER D+COMBINING SHORT STROKE OVERLAY LATIN CAPITAL LETTER G WITH STROKE=LATIN CAPITAL LETTER G+COMBINING SHORT STROKE OVERLAY LATIN CAPITAL LETTER I WITH STROKE=LATIN CAPITAL LETTER I+COMBINING SHORT STROKE OVERLAY LATIN CAPITAL LETTER T WITH STROKE=LATIN CAPITAL LETTER T+COMBINING SHORT STROKE OVERLAY LATIN CAPITAL LETTER Z WITH STROKE=LATIN CAPITAL LETTER Z+COMBINING SHORT STROKE OVERLAY LATIN LETTER GLOTTAL STOP WITH STROKE=LATIN LETTER GLOTTAL STOP+COMBINING SHORT STROKE OVERLAY LATIN LETTER TWO WITH STROKE=DIGIT TWO+COMBINING SHORT STROKE OVERLAY LATIN SMALL LETTER B WITH STROKE=LATIN SMALL LETTER B+COMBINING SHORT STROKE OVERLAY LATIN SMALL LETTER BARRED O=LATIN SMALL LETTER O+COMBINING SHORT STROKE OVERLAY LATIN SMALL LETTER D WITH STROKE=LATIN SMALL LETTER D+COMBINING SHORT STROKE OVERLAY LATIN SMALL LETTER G WITH STROKE=LATIN SMALL LETTER G+COMBINING SHORT STROKE OVERLAY LATIN SMALL LETTER H WITH STROKE=LATIN SMALL LETTER H+COMBINING SHORT STROKE OVERLAY LATIN SMALL LETTER I WITH STROKE=LATIN SMALL LETTER I+COMBINING SHORT STROKE OVERLAY LATIN SMALL LETTER L WITH BAR=LATIN SMALL LETTER L+COMBINING SHORT STROKE OVERLAY LATIN SMALL LETTER T WITH STROKE=LATIN SMALL LETTER T+COMBINING SHORT STROKE OVERLAY LATIN SMALL LETTER U BAR=LATIN SMALL LETTER U+COMBINING SHORT STROKE OVERLAY LATIN SMALL LETTER Z WITH STROKE=LATIN SMALL LETTER Z+COMBINING SHORT STROKE OVERLAY LEFT SQUARE BRACKET WITH QUILL=LEFT SQUARE BRACKET+COMBINING SHORT STROKE OVERLAY RIGHT SQUARE BRACKET WITH QUILL=RIGHT SQUARE BRACKET+COMBINING SHORT STROKE OVERLAY
Also, some characters have a ``double stroke´´.
DOWNWARDS ARROW WITH DOUBLE STROKE=DOWNWARDS ARROW+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING LATIN LETTER ALVEOLAR CLICK=VERTICAL LINE+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING UPWARDS ARROW WITH DOUBLE STROKE=UPWARDS ARROW+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING
Although some algorithmic sophistication would be required to get the bar in exactly the right place, in practice it might be good enough to just go ahead and overprint it, with maybe a few special cases for instances where it is in an unusual position (e g, LATIN SMALL LETTER G WITH STROKE).
This might have a few uses apart from the 4 Cyrillic characters ``with vertical stroke´´, which could be composed from this character.
CYRILLIC CAPITAL LETTER CHE WITH VERTICAL STROKE=CYRILLIC CAPITAL LETTER CHE+COMBINING SHORT VERTICAL LINE OVERLAY CYRILLIC CAPITAL LETTER KA WITH VERTICAL STROKE=CYRILLIC CAPITAL LETTER KA+COMBINING SHORT VERTICAL LINE OVERLAY CYRILLIC SMALL LETTER CHE WITH VERTICAL STROKE=CYRILLIC SMALL LETTER CHE+COMBINING SHORT VERTICAL LINE OVERLAY CYRILLIC SMALL LETTER KA WITH VERTICAL STROKE=CYRILLIC SMALL LETTER KA+COMBINING SHORT VERTICAL LINE OVERLAY
Used in
LATIN CAPITAL LETTER O WITH MIDDLE TILDE=LATIN CAPITAL LETTER O+COMBINING TILDE OVERLAY LATIN SMALL LETTER L WITH MIDDLE TILDE=LATIN SMALL LETTER L+COMBINING TILDE OVERLAY
This seems to be another case where some decompositions have been omitted in order to ensure that the rendering process does not put the mark in the wrong place. Caveat implementor!
LATIN SMALL LETTER N WITH LONG RIGHT LEG=LATIN SMALL LETTER N+COMBINING VERTICAL LINE BELOW LATIN SMALL LETTER R WITH LONG LEG=LATIN SMALL LETTER R+COMBINING VERTICAL LINE BELOW LATIN SMALL LETTER TURNED M WITH LONG LEG=LATIN SMALL LETTER TURNED M+COMBINING VERTICAL LINE BELOW
This is an existing character, but we give it more precise semantics by specifying that it lies between 1 character or group on the left, and 1 on the right. In other words, it is ``binary´´, just like SEMANTICS LIGATURE, COMPOSE, ABOVE and BELOW. It is used in all the decompositions marked with <fraction> (there are 16 of these), and the following:
ACCOUNT OF=LATIN SMALL LETTER A+FRACTION SLASH+LATIN SMALL LETTER C ADDRESSED TO THE SUBJECT=LATIN SMALL LETTER A+FRACTION SLASH+LATIN SMALL LETTER S CADA UNA=LATIN SMALL LETTER C+FRACTION SLASH+LATIN SMALL LETTER U CARE OF=LATIN SMALL LETTER C+FRACTION SLASH+LATIN SMALL LETTER O FRACTION NUMERATOR ONE=DIGIT ONE+FRACTION SLASH PER MILLE SIGN=DIGIT ZERO+FRACTION SLASH+START GROUP+DIGIT ZERO+DIGIT ZERO+POP DIRECTIONAL FORMATTING PER TEN THOUSAND SIGN=DIGIT ZERO+FRACTION SLASH+START GROUP+DIGIT ZERO+DIGIT ZERO+DIGIT ZERO+POP DIRECTIONAL FORMATTING PERCENT SIGN=DIGIT ZERO+FRACTION SLASH+DIGIT ZERO VULGAR FRACTION FIVE EIGHTHS=DIGIT FIVE+FRACTION SLASH+DIGIT EIGHT VULGAR FRACTION FIVE SIXTHS=DIGIT FIVE+FRACTION SLASH+DIGIT SIX VULGAR FRACTION FOUR FIFTHS=DIGIT FOUR+FRACTION SLASH+DIGIT FIVE VULGAR FRACTION ONE EIGHTH=DIGIT ONE+FRACTION SLASH+DIGIT EIGHT VULGAR FRACTION ONE FIFTH=DIGIT ONE+FRACTION SLASH+DIGIT FIVE VULGAR FRACTION ONE HALF=DIGIT ONE+FRACTION SLASH+DIGIT TWO VULGAR FRACTION ONE QUARTER=DIGIT ONE+FRACTION SLASH+DIGIT FOUR VULGAR FRACTION ONE SIXTH=DIGIT ONE+FRACTION SLASH+DIGIT SIX VULGAR FRACTION ONE THIRD=DIGIT ONE+FRACTION SLASH+DIGIT THREE VULGAR FRACTION SEVEN EIGHTHS=DIGIT SEVEN+FRACTION SLASH+DIGIT EIGHT VULGAR FRACTION THREE EIGHTHS=DIGIT THREE+FRACTION SLASH+DIGIT EIGHT VULGAR FRACTION THREE FIFTHS=DIGIT THREE+FRACTION SLASH+DIGIT FIVE VULGAR FRACTION THREE QUARTERS=DIGIT THREE+FRACTION SLASH+DIGIT FOUR VULGAR FRACTION TWO FIFTHS=DIGIT TWO+FRACTION SLASH+DIGIT FIVE VULGAR FRACTION TWO THIRDS=DIGIT TWO+FRACTION SLASH+DIGIT THREE
A sophisticated rendering agent is explicitly allowed to stack the top and bottom of a fraction over each other (maybe varying their size as well), and use a horizontal or oblique rule to represent the division. This is because a decomposition like VULGAR FRACTION ONE QUARTER=DIGIT ONE+FRACTION SLASH+DIGIT FOUR is canonical, so it is permitted (but not required) to use a special glyph, such as would be present in a Latin-1 font.
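To make the ``permitted but not required´´ point concrete, here is a small Python sketch. It builds a fraction from DIGIT+FRACTION SLASH+DIGIT and then looks for a precomposed glyph, much as a renderer with a Latin-1 font might. The helper names and the two code point ranges searched are my own choices for illustration, and the lookup relies on the existing <fraction> compatibility decompositions in the interpreter's character database.

```python
import unicodedata

FRACTION_SLASH = "\u2044"

def fraction(numerator, denominator):
    """Spell a fraction as numerator + FRACTION SLASH + denominator."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"

def precomposed_glyph(sequence):
    """Return a precomposed fraction whose decomposition equals sequence,
    or None, in which case the renderer must stack the parts itself."""
    candidates = list(range(0x00BC, 0x00BF)) + list(range(0x2150, 0x2160))
    for cp in candidates:
        ch = chr(cp)
        if (unicodedata.decomposition(ch).startswith("<fraction>")
                and unicodedata.normalize("NFKD", ch) == sequence):
            return ch
    return None

print(precomposed_glyph(fraction("1", "4")))  # a single glyph is available
print(precomposed_glyph(fraction("7", "9")))  # None: no special glyph
```

Either outcome is acceptable: the sequence means the same thing whether or not a special glyph happens to exist for it.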
Some Hebrew characters are used in mathematical text. These have obvious decompositions which should be encoded. Doing this will enable mathematicians to use any other Hebrew characters as symbols (by using the decomposition) without needing to get them encoded in the U C S first.
ALEF SYMBOL=LEFT-TO-RIGHT OVERRIDE+HEBREW LETTER ALEF+POP DIRECTIONAL FORMATTING BET SYMBOL=LEFT-TO-RIGHT OVERRIDE+HEBREW LETTER BET+POP DIRECTIONAL FORMATTING DALET SYMBOL=LEFT-TO-RIGHT OVERRIDE+HEBREW LETTER DALET+POP DIRECTIONAL FORMATTING GIMEL SYMBOL=LEFT-TO-RIGHT OVERRIDE+HEBREW LETTER GIMEL+POP DIRECTIONAL FORMATTING
All compatibility decompositions involving <isolated>, <initial>, <medial> and <final> could be replaced by canonical ones involving the characters which already exist for this purpose.
My knowledge of the languages that use these characters is almost 0, but I think the following are required to work according to the existing standard.
Any character with a compatibility decomposition including <isolated> gets a canonical decomposition by deleting the <isolated>, and adding a ZERO WIDTH NON-JOINER at the start, and a ZERO WIDTH NON-JOINER at the end.
Any character with a compatibility decomposition including <initial> gets a canonical decomposition by deleting the <initial>, and adding a ZERO WIDTH NON-JOINER at the start, and a ZERO WIDTH JOINER at the end.
Any character with a compatibility decomposition including <medial> gets a canonical decomposition by deleting the <medial>, and adding a ZERO WIDTH JOINER at the start, and a ZERO WIDTH JOINER at the end.
Any character with a compatibility decomposition including <final> gets a canonical decomposition by deleting the <final>, and adding a ZERO WIDTH JOINER at the start, and a ZERO WIDTH NON-JOINER at the end.
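The four rules above amount to a small lookup table. The following Python sketch applies them to the decomposition field of a UNIDATA.TXT entry, given as a string. The function name is hypothetical, and the sketch does not handle the combining marks whose decompositions start with SPACE.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# For each tag: (character added at the start, character added at the end).
WRAPPERS = {
    "<isolated>": (ZWNJ, ZWNJ),
    "<initial>": (ZWNJ, ZWJ),
    "<medial>": (ZWJ, ZWJ),
    "<final>": (ZWJ, ZWNJ),
}

def positional_to_joiners(decomposition_field):
    """Rewrite e.g. '<initial> 0628' as ZWNJ + U+0628 + ZWJ.
    Returns None for fields that carry none of the four tags."""
    fields = decomposition_field.split()
    if not fields or fields[0] not in WRAPPERS:
        return None
    before, after = WRAPPERS[fields[0]]
    body = "".join(chr(int(cp, 16)) for cp in fields[1:])
    return before + body + after

# An initial form of ARABIC LETTER BEH (U+0628).
print([f"U+{ord(c):04X}" for c in positional_to_joiners("<initial> 0628")])
```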
There are also a few other characters that can be treated in this way. Although there are 5 special FINAL forms for Hebrew, these are not really required: it is just as easy for a user interface to insert a ZERO WIDTH JOINER+HEBREW LETTER KAF+ZERO WIDTH NON-JOINER as it is to insert a HEBREW LETTER FINAL KAF, for whatever reason, and both have to work anyway. We also encode GREEK SMALL LETTER FINAL SIGMA in the same way, in case a renderer chooses to treat this combination specially. (If it does, users must be ready to write a word like Eros as GREEK CAPITAL LETTER EPSILON+GREEK SMALL LETTER RHO+GREEK SMALL LETTER OMEGA+GREEK SMALL LETTER SIGMA+ZERO WIDTH JOINER, unless they really do want to see the final form of sigma.) Admittedly, a typical user might be very surprised to see a final character changing shape before their eyes, but then, that is what it´s for.
There certainly seems little excuse to omit the decompositions in languages like Arabic and Hebrew where it is expected.
ARABIC LETTER UIGHUR KAZAKH KIRGHIZ ALEF MAKSURA INITIAL FORM=<initial>+ARABIC LETTER ALEF MAKSURA ARABIC LETTER UIGHUR KAZAKH KIRGHIZ ALEF MAKSURA MEDIAL FORM=<medial>+ARABIC LETTER ALEF MAKSURA ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA FINAL FORM=<final>+ARABIC LETTER YEH WITH HAMZA ABOVE+ARABIC LETTER ALEF MAKSURA ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA INITIAL FORM=<initial>+ARABIC LETTER YEH WITH HAMZA ABOVE+ARABIC LETTER ALEF MAKSURA ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM=<isolated>+ARABIC LETTER YEH WITH HAMZA ABOVE+ARABIC LETTER ALEF MAKSURA GREEK SMALL LETTER FINAL SIGMA=<final>+GREEK SMALL LETTER SIGMA HEBREW LETTER FINAL KAF=<final>+HEBREW LETTER KAF HEBREW LETTER FINAL MEM=<final>+HEBREW LETTER MEM HEBREW LETTER FINAL NUN=<final>+HEBREW LETTER NUN HEBREW LETTER FINAL PE=<final>+HEBREW LETTER PE HEBREW LETTER FINAL TSADI=<final>+HEBREW LETTER TSADI
We also have to replace the decompositions for the combining marks (starting with SPACE), or we get unnecessary extra space characters.
ARABIC DAMMA ISOLATED FORM=<isolated>+ARABIC DAMMA ARABIC DAMMATAN ISOLATED FORM=<isolated>+ARABIC DAMMATAN ARABIC FATHA ISOLATED FORM=<isolated>+ARABIC FATHA ARABIC FATHATAN ISOLATED FORM=<isolated>+ARABIC FATHATAN ARABIC KASRA ISOLATED FORM=<isolated>+ARABIC KASRA ARABIC KASRATAN ISOLATED FORM=<isolated>+ARABIC KASRATAN ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM=<isolated>+ARABIC SHADDA+ARABIC DAMMA ARABIC LIGATURE SHADDA WITH DAMMA MEDIAL FORM=<medial>+ARABIC SHADDA+ARABIC DAMMA ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM=<isolated>+ARABIC SHADDA+ARABIC DAMMATAN ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM=<isolated>+ARABIC SHADDA+ARABIC FATHA ARABIC LIGATURE SHADDA WITH FATHA MEDIAL FORM=<medial>+ARABIC SHADDA+ARABIC FATHA ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM=<isolated>+ARABIC SHADDA+ARABIC KASRA ARABIC LIGATURE SHADDA WITH KASRA MEDIAL FORM=<medial>+ARABIC SHADDA+ARABIC KASRA ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM=<isolated>+ARABIC SHADDA+ARABIC KASRATAN ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM=<isolated>+ARABIC SHADDA+ARABIC LETTER SUPERSCRIPT ALEF ARABIC SHADDA ISOLATED FORM=<isolated>+ARABIC SHADDA ARABIC SUKUN ISOLATED FORM=<isolated>+ARABIC SUKUN
Many characters are described as ``with hook´´ or ``with middle hook´´, but no combining form of this mark is encoded. This is probably because the position of the hook moves around a lot depending on which character is to receive it, and because there are a few different forms of hook, 3 of which are encoded separately and were considered above. The fact that the hook moves around should be seen as a rendering problem, easily solved by a repository of precomposed glyphs for the cases that are actually used.
If there were to be a COMBINING HOOK character, the characters that use it would be
CYRILLIC CAPITAL LETTER EN WITH HOOK=CYRILLIC CAPITAL LETTER EN+COMBINING HOOK CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK=CYRILLIC CAPITAL LETTER GHE+COMBINING HOOK CYRILLIC CAPITAL LETTER KA WITH HOOK=CYRILLIC CAPITAL LETTER KA+COMBINING HOOK CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK=CYRILLIC CAPITAL LETTER PE+COMBINING HOOK CYRILLIC SMALL LETTER EN WITH HOOK=CYRILLIC SMALL LETTER EN+COMBINING HOOK CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK=CYRILLIC SMALL LETTER GHE+COMBINING HOOK CYRILLIC SMALL LETTER KA WITH HOOK=CYRILLIC SMALL LETTER KA+COMBINING HOOK CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK=CYRILLIC SMALL LETTER PE+COMBINING HOOK EIGHTH NOTE=QUARTER NOTE+COMBINING HOOK LATIN CAPITAL LETTER B WITH HOOK=LATIN CAPITAL LETTER B+COMBINING HOOK LATIN CAPITAL LETTER C WITH HOOK=LATIN CAPITAL LETTER C+COMBINING HOOK LATIN CAPITAL LETTER D WITH HOOK=LATIN CAPITAL LETTER D+COMBINING HOOK LATIN CAPITAL LETTER F WITH HOOK=LATIN CAPITAL LETTER F+COMBINING HOOK LATIN CAPITAL LETTER G WITH HOOK=LATIN CAPITAL LETTER G+COMBINING HOOK LATIN CAPITAL LETTER K WITH HOOK=LATIN CAPITAL LETTER K+COMBINING HOOK LATIN CAPITAL LETTER P WITH HOOK=LATIN CAPITAL LETTER P+COMBINING HOOK LATIN CAPITAL LETTER T WITH HOOK=LATIN CAPITAL LETTER T+COMBINING HOOK LATIN CAPITAL LETTER Y WITH HOOK=LATIN CAPITAL LETTER Y+COMBINING HOOK LATIN SMALL LETTER B WITH HOOK=LATIN SMALL LETTER B+COMBINING HOOK LATIN SMALL LETTER C WITH HOOK=LATIN SMALL LETTER C+COMBINING HOOK LATIN SMALL LETTER D WITH HOOK=LATIN SMALL LETTER D+COMBINING HOOK LATIN SMALL LETTER F WITH HOOK=LATIN SMALL LETTER F+COMBINING HOOK LATIN SMALL LETTER G WITH HOOK=LATIN SMALL LETTER G+COMBINING HOOK LATIN SMALL LETTER H WITH HOOK=LATIN SMALL LETTER H+COMBINING HOOK LATIN SMALL LETTER K WITH HOOK=LATIN SMALL LETTER K+COMBINING HOOK LATIN SMALL LETTER M WITH HOOK=LATIN SMALL LETTER M+COMBINING HOOK LATIN SMALL LETTER P WITH HOOK=LATIN SMALL LETTER P+COMBINING HOOK LATIN SMALL LETTER Q WITH HOOK=LATIN SMALL LETTER Q+COMBINING HOOK 
LATIN SMALL LETTER T WITH HOOK=LATIN SMALL LETTER T+COMBINING HOOK LATIN SMALL LETTER Y WITH HOOK=LATIN SMALL LETTER Y+COMBINING HOOK LEFTWARDS ARROW WITH HOOK=LEFTWARDS ARROW+COMBINING HOOK RIGHTWARDS ARROW WITH HOOK=RIGHTWARDS ARROW+COMBINING HOOK
It´s odd that although there is a LATIN SMALL LETTER HENG WITH HOOK, there is no LATIN SMALL LETTER HENG. It should be represented as a ligature of h and eng, and that gives us
LATIN SMALL LETTER HENG WITH HOOK=LATIN SMALL LETTER H+SEMANTIC LIGATURE+LATIN SMALL LETTER ENG+COMBINING HOOK
The case of characters ``with curl´´ is similar to those ``with hook´´, in that the curl moves around depending on the character being modified. But if there was a combining curl, it would be used for 12 characters, if we also follow the principle of Occam´s Razor and include crossed-tail, belted, looped and closed characters in this set, as is justified by their visual appearance.
LATIN LETTER REVERSED ESH LOOP=LATIN SMALL LETTER ESH+SEMANTIC REVERSED+COMBINING CURL LATIN SMALL LETTER C WITH CURL=LATIN SMALL LETTER C+COMBINING CURL LATIN SMALL LETTER CLOSED OMEGA=GREEK SMALL LETTER OMEGA+COMBINING CURL LATIN SMALL LETTER CLOSED OPEN E=LATIN SMALL LETTER OPEN E+COMBINING CURL LATIN SMALL LETTER ESH WITH CURL=LATIN SMALL LETTER ESH+COMBINING CURL LATIN SMALL LETTER EZH WITH CURL=LATIN SMALL LETTER EZH+COMBINING CURL LATIN SMALL LETTER J WITH CROSSED-TAIL=LATIN SMALL LETTER J+COMBINING CURL LATIN SMALL LETTER L WITH BELT=LATIN SMALL LETTER L+COMBINING CURL LATIN SMALL LETTER Z WITH CURL=LATIN SMALL LETTER Z+COMBINING CURL LEFTWARDS ARROW WITH LOOP=LEFTWARDS ARROW+COMBINING CURL RIGHTWARDS ARROW WITH LOOP=RIGHTWARDS ARROW+COMBINING CURL SCRIPT SMALL G=LATIN SMALL LETTER G+SEMANTIC SCRIPT+COMBINING CURL
SCRIPT SMALL G is here because the only difference between it and LATIN SMALL LETTER SCRIPT G (at least, as they appear in The Book) is the fact that the descender crosses itself.
These characters are described as ``with descender´´. The visual appearance of the descender is variable.
CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER=CYRILLIC CAPITAL LETTER ABKHASIAN CHE+COMBINING CYRILLIC DESCENDER CYRILLIC CAPITAL LETTER CHE WITH DESCENDER=CYRILLIC CAPITAL LETTER CHE+COMBINING CYRILLIC DESCENDER CYRILLIC CAPITAL LETTER EN WITH DESCENDER=CYRILLIC CAPITAL LETTER EN+COMBINING CYRILLIC DESCENDER CYRILLIC CAPITAL LETTER ES WITH DESCENDER=CYRILLIC CAPITAL LETTER ES+COMBINING CYRILLIC DESCENDER CYRILLIC CAPITAL LETTER HA WITH DESCENDER=CYRILLIC CAPITAL LETTER HA+COMBINING CYRILLIC DESCENDER CYRILLIC CAPITAL LETTER KA WITH DESCENDER=CYRILLIC CAPITAL LETTER KA+COMBINING CYRILLIC DESCENDER CYRILLIC CAPITAL LETTER TE WITH DESCENDER=CYRILLIC CAPITAL LETTER TE+COMBINING CYRILLIC DESCENDER CYRILLIC CAPITAL LETTER ZE WITH DESCENDER=CYRILLIC CAPITAL LETTER ZE+COMBINING CYRILLIC DESCENDER CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER=CYRILLIC CAPITAL LETTER ZHE+COMBINING CYRILLIC DESCENDER CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER=CYRILLIC SMALL LETTER ABKHASIAN CHE+COMBINING CYRILLIC DESCENDER CYRILLIC SMALL LETTER CHE WITH DESCENDER=CYRILLIC SMALL LETTER CHE+COMBINING CYRILLIC DESCENDER CYRILLIC SMALL LETTER EN WITH DESCENDER=CYRILLIC SMALL LETTER EN+COMBINING CYRILLIC DESCENDER CYRILLIC SMALL LETTER ES WITH DESCENDER=CYRILLIC SMALL LETTER ES+COMBINING CYRILLIC DESCENDER CYRILLIC SMALL LETTER HA WITH DESCENDER=CYRILLIC SMALL LETTER HA+COMBINING CYRILLIC DESCENDER CYRILLIC SMALL LETTER KA WITH DESCENDER=CYRILLIC SMALL LETTER KA+COMBINING CYRILLIC DESCENDER CYRILLIC SMALL LETTER TE WITH DESCENDER=CYRILLIC SMALL LETTER TE+COMBINING CYRILLIC DESCENDER CYRILLIC SMALL LETTER ZE WITH DESCENDER=CYRILLIC SMALL LETTER ZE+COMBINING CYRILLIC DESCENDER CYRILLIC SMALL LETTER ZHE WITH DESCENDER=CYRILLIC SMALL LETTER ZHE+COMBINING CYRILLIC DESCENDER
This can be used to build up the following character.
VERTICAL KANA REPEAT WITH VOICED SOUND MARK=VERTICAL KANA REPEAT MARK+COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
This character does the same job for table layout and HORIZONTAL TABULATION that LINE SEPARATOR does for vertical layout and LINE FEED, namely, defining a character to ``unambiguously represent this semantic´´. A renderer should arrange all lines within a paragraph (as delimited by PARAGRAPH SEPARATOR characters) that contain this character so that the columns so marked line up vertically. Apart from that, the character has zero width, so adjacent columns of the table will be contiguous. (They could still contain space characters though ...)
Tables defined in this way are logically unable to nest, so the complexity of a Unicode renderer that implements COLUMN SEPARATOR is much, much less than an H T M L browser, which might have to recursively format tables of tables of ... of tables. All it has to do is split the paragraph into lines and then the lines into columns, padding shorter columns out to the lengths of the longer ones.
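The two-step procedure just described (split the paragraph into lines, then the lines into columns, and pad) fits in a few lines of Python. This is only a sketch under stated assumptions: COLUMN SEPARATOR has no code point, so a private-use character stands in for it; lines are taken to be delimited by LINE SEPARATOR; and every character is counted as one cell wide, as in a monospaced font.

```python
LINE_SEPARATOR = "\u2028"
PARAGRAPH_SEPARATOR = "\u2029"
COLUMN_SEPARATOR = "\ue000"  # hypothetical stand-in from the private use area

def layout_paragraph(paragraph):
    """Return the lines of one paragraph with their columns lined up."""
    rows = [line.split(COLUMN_SEPARATOR)
            for line in paragraph.split(LINE_SEPARATOR)]
    # Only lines that contain a separator take part in the alignment.
    widths = [0] * max(len(row) for row in rows)
    for row in rows:
        if len(row) > 1:
            for i, cell in enumerate(row):
                widths[i] = max(widths[i], len(cell))
    laid_out = []
    for row in rows:
        # Pad every column but the last; the separator itself has zero
        # width, so adjacent columns end up contiguous.
        padded = [cell.ljust(widths[i]) for i, cell in enumerate(row[:-1])]
        laid_out.append("".join(padded) + row[-1])
    return laid_out

def layout_text(text):
    """Lay out each paragraph independently; tables cannot nest."""
    return [layout_paragraph(p) for p in text.split(PARAGRAPH_SEPARATOR)]

table = ("a" + COLUMN_SEPARATOR + "bb" + LINE_SEPARATOR
         + "ccc" + COLUMN_SEPARATOR + "d")
for line in layout_paragraph(table):
    print(line)
```

There is no recursion anywhere, which is the point of the comparison with an H T M L browser.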
When placed between 2 glyphs, this character causes them to be placed closer together than if they were simply written one after the other. The characters may or may not remain distinct: the spacing between them may be removed entirely, causing them to touch. There is no expectation that multiple uses of NEGATIVE SPACE can be used to move back through running text. This is not BACKSPACE!
The question of whether to prefer NEGATIVE SPACE over SEMANTIC AFTER is not very obvious. NEGATIVE SPACE is used where some extra ``jostling together´´ appears to be called for.
APPROXIMATELY EQUAL TO OR THE IMAGE OF=DOT ABOVE+NEGATIVE SPACE+EQUALS SIGN+NEGATIVE SPACE+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+DOT ABOVE+POP DIRECTIONAL FORMATTING BETWEEN=LEFT PARENTHESIS+NEGATIVE SPACE+RIGHT PARENTHESIS COLON EQUALS=COLON+NEGATIVE SPACE+EQUALS SIGN DOUBLE INTEGRAL=INTEGRAL+NEGATIVE SPACE+INTEGRAL DOUBLE SUBSET=SUBSET OF+NEGATIVE SPACE+SUBSET OF DOUBLE SUPERSET=SUPERSET OF+NEGATIVE SPACE+SUPERSET OF IMAGE OF OR APPROXIMATELY EQUAL TO=ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+DOT ABOVE+NEGATIVE SPACE+EQUALS SIGN+NEGATIVE SPACE+DOT ABOVE MUCH GREATER-THAN=GREATER-THAN SIGN+NEGATIVE SPACE+GREATER-THAN SIGN MUCH LESS-THAN=LESS-THAN SIGN+NEGATIVE SPACE+LESS-THAN SIGN RIGHTWARDS ARROW TO BAR=RIGHTWARDS ARROW+NEGATIVE SPACE+VERTICAL STROKE TRIPLE INTEGRAL=INTEGRAL+NEGATIVE SPACE+INTEGRAL+NEGATIVE SPACE+INTEGRAL VERY MUCH GREATER-THAN=GREATER-THAN SIGN+NEGATIVE SPACE+GREATER-THAN SIGN+NEGATIVE SPACE+GREATER-THAN SIGN VERY MUCH LESS-THAN=LESS-THAN SIGN+NEGATIVE SPACE+LESS-THAN SIGN+NEGATIVE SPACE+LESS-THAN SIGN
It seems sensible to provide the following as well, where the spacing goes in the other direction.
HORIZONTAL ELLIPSIS=FULL STOP+THIN SPACE+FULL STOP+THIN SPACE+FULL STOP MIDLINE HORIZONTAL ELLIPSIS=MIDDLE DOT+THIN SPACE+MIDDLE DOT+THIN SPACE+MIDDLE DOT TWO DOT LEADER=FULL STOP+THIN SPACE+FULL STOP
Easy to do algorithmically.
Unlikely to be very productive in forming new characters, as it´s easier to just write a character twice, and there is visually little difference.
But has ``prior art´´ in T
If used but not recognised, quite likely to cause the resulting text to be misinterpreted.
Since we know that the currency symbols were invented as typographic variants of existing characters, it seems a good idea to encode this. Then (a) software with no glyph can generate an acceptable alternative and (b) when a new currency is invented, a symbol can be given to it without needing to go through a standardisation process. I suggest that
POUND SIGN=LATIN CAPITAL LETTER L+SEMANTIC SCRIPT+COMBINING SHORT STROKE OVERLAY
is historically right, right by current usage, and gives a result that will be understandable to an English national if there is no better glyph available (namely, `L´). Other currency symbols should be treated the same way (including the symbol for Euro which looks to me like LATIN SMALL LETTER C+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING---no doubt, any such suggestion would give the European Commission a collective fit ...).
CENT SIGN=LATIN SMALL LETTER C+COMBINING LONG VERTICAL LINE OVERLAY COLON SIGN=LATIN CAPITAL LETTER C+SEMANTIC OVERPRINT+START GROUP+SOLIDUS+SOLIDUS+POP DIRECTIONAL FORMATTING CRUZEIRO SIGN=LATIN CAPITAL LETTER C+SEMANTIC OVERPRINT+LATIN SMALL LETTER R DOLLAR SIGN=LATIN CAPITAL LETTER S+COMBINING LONG VERTICAL LINE OVERLAY DONG SIGN=LATIN SMALL LETTER D+COMBINING LOW LINE+COMBINING SHORT STROKE OVERLAY EURO SIGN=LATIN SMALL LETTER C+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING EURO-CURRENCY SIGN=LATIN CAPITAL LETTER C+SEMANTIC LIGATURE+LATIN CAPITAL LETTER E FRENCH FRANC SIGN=LATIN CAPITAL LETTER F+SEMANTIC OVERPRINT+LATIN SMALL LETTER R LIRA SIGN=LATIN CAPITAL LETTER L+SEMANTIC SCRIPT+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING MILL SIGN=LATIN SMALL LETTER M+COMBINING SHORT SOLIDUS OVERLAY NAIRA SIGN=LATIN CAPITAL LETTER N+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING PESETA SIGN=LATIN CAPITAL LETTER P+COMBINING SHORT STROKE OVERLAY RUPEE SIGN=LATIN CAPITAL LETTER R+COMBINING SHORT STROKE OVERLAY+SEMANTIC LIGATURE+LATIN SMALL LETTER S THAI CURRENCY SYMBOL BAHT=LATIN CAPITAL LETTER B+COMBINING LONG SOLIDUS OVERLAY WON SIGN=LATIN CAPITAL LETTER W+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING YEN SIGN=LATIN CAPITAL LETTER Y+SEMANTIC OVERPRINT+START GROUP+NON-BREAKING HYPHEN+SEMANTIC ABOVE+NON-BREAKING HYPHEN+POP DIRECTIONAL FORMATTING
It seems futile to deny history and claim that these are fully-formed characters in their own right. This doesn´t deny anyone the right to design specialised glyphs, if they wish. There is no reason to change current practice, just to systematise it.
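Point (a) above, that software with no glyph can generate an acceptable alternative, can be sketched in Python. The table copies four of the proposed decompositions; the function names are hypothetical. The SEMANTIC characters are proposals with no code point, so the sketch simply drops any component that the character database cannot resolve.

```python
import unicodedata

# A few of the proposed decompositions, copied from the list above.
PROPOSED = {
    "POUND SIGN": ["LATIN CAPITAL LETTER L", "SEMANTIC SCRIPT",
                   "COMBINING SHORT STROKE OVERLAY"],
    "DOLLAR SIGN": ["LATIN CAPITAL LETTER S",
                    "COMBINING LONG VERTICAL LINE OVERLAY"],
    "CENT SIGN": ["LATIN SMALL LETTER C",
                  "COMBINING LONG VERTICAL LINE OVERLAY"],
    "PESETA SIGN": ["LATIN CAPITAL LETTER P",
                    "COMBINING SHORT STROKE OVERLAY"],
}

def base_letter(sign):
    """Crudest fallback: the letter the symbol was derived from."""
    return unicodedata.lookup(PROPOSED[sign][0])

def approximate(sign):
    """Better fallback: keep every component that has a real code point."""
    resolved = []
    for name in PROPOSED[sign]:
        try:
            resolved.append(unicodedata.lookup(name))
        except KeyError:
            # Not an encoded character (e.g. SEMANTIC SCRIPT): skip it.
            continue
    return "".join(resolved)

print(base_letter("POUND SIGN"))   # the plain letter
print(approximate("POUND SIGN"))   # the letter with a stroke overlay
```

A newly invented currency would need only a new entry in the table, not a new code point.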
According to the Unicode Standard, a rendering agent is allowed some discretion in its approach to line breaking. For instance, it may choose to break lines at HYPHEN; but it doesn´t have to.
Since such a lot is specified about line breaking, it would be interesting to specify it completely.
Let a space character have one or more of the following properties:
| A certain natural width. |
| It may allow line breaks. |
| It may be flexible, in the sense that extra space can be inserted there for justification. |
| It may be a word-constituent. |
To be able to analyse spacing characters in a way consistent with the rest of the Atomic Theory, we need to ascribe these properties to certain characters, and build the rest up from them.
The most natural choice of atoms seems to be:
| THIN SPACE | Has a width of 1/6 en, but is otherwise just like any other character (no special line-breaking property, not flexible). |
| ZERO WIDTH SPACE | Marks a place where a line may be broken. |
| HAIR SPACE | Marks a place where justification may be inserted. |
| ZERO WIDTH JOINER | Continues a word: the characters before and after it (if letters) are not considered to be in final or initial position. |
Then we can express all other spacing characters in terms of these. We go
along with T
EM QUAD=EM SPACE+ZERO WIDTH SPACE
EM SPACE=EN SPACE+EN SPACE
EN QUAD=EN SPACE+ZERO WIDTH SPACE
EN SPACE=FOUR-PER-EM SPACE+FOUR-PER-EM SPACE
FOUR-PER-EM SPACE=THIN SPACE+THIN SPACE+THIN SPACE
HYPHEN=NON-BREAKING HYPHEN+ZERO WIDTH SPACE
SIX-PER-EM SPACE=THIN SPACE+THIN SPACE
SPACE=NO-BREAK SPACE+ZERO WIDTH SPACE+HAIR SPACE
THREE-PER-EM SPACE=SIX-PER-EM SPACE+SIX-PER-EM SPACE
ZERO WIDTH NON-JOINER=ZERO WIDTH NO-BREAK SPACE
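As a check that the fixed-width rules hang together, the following Python sketch expands a spacing character down to atoms and adds up the THIN SPACEs, taking each as 1/6 en as stated in the table of atoms. The names `RULES`, `expand` and `width_in_ens` are mine, and the sketch deliberately ignores atoms whose width is left to the font designer.

```python
from fractions import Fraction

# The fixed-width rules from the list above, as name -> components.
RULES = {
    "EM QUAD": ["EM SPACE", "ZERO WIDTH SPACE"],
    "EM SPACE": ["EN SPACE", "EN SPACE"],
    "EN QUAD": ["EN SPACE", "ZERO WIDTH SPACE"],
    "EN SPACE": ["FOUR-PER-EM SPACE", "FOUR-PER-EM SPACE"],
    "FOUR-PER-EM SPACE": ["THIN SPACE"] * 3,
    "SIX-PER-EM SPACE": ["THIN SPACE"] * 2,
    "THREE-PER-EM SPACE": ["SIX-PER-EM SPACE"] * 2,
}

THIN_SPACE_WIDTH = Fraction(1, 6)   # in ens, per the table of atoms

def expand(name):
    """Recursively replace a character by its components until only atoms remain."""
    if name not in RULES:
        return [name]
    atoms = []
    for part in RULES[name]:
        atoms.extend(expand(part))
    return atoms

def width_in_ens(name):
    """Total width contributed by THIN SPACE atoms; other atoms count as zero here."""
    return expand(name).count("THIN SPACE") * THIN_SPACE_WIDTH

for name in ("EN SPACE", "EM SPACE", "THREE-PER-EM SPACE", "FOUR-PER-EM SPACE"):
    print(name, width_in_ens(name))
```

With these rules an EN SPACE comes out as exactly 1 en and an EM SPACE as 2 ens, so the per-em names are consistent with their expansions.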
Since lines are only broken at ZERO WIDTH SPACE, we also have
TIBETAN MARK DELIMITER TSHEG BSTAR=
TIBETAN MARK INTERSYLLABIC TSHEG=TIBETAN MARK DELIMITER TSHEG BSTAR+ZERO WIDTH SPACE
We also have to delete a few existing decompositions for this to work.
HAIR SPACE=
NO-BREAK SPACE=
NON-BREAKING HYPHEN=
THIN SPACE=
ZERO WIDTH JOINER=
ZERO WIDTH SPACE=
HAIR SPACE can also be used in tables (using the COLUMN SEPARATOR character) to mark places where extra spacing can be added. This can provide control over right justification or centring of cell contents.
The widths of NO-BREAK SPACE (the typical interword gap), FIGURE SPACE (a
digit) and PUNCTUATION SPACE (a FULL STOP) are not given here as they are at the
separate discretion of a font designer, rather than being a certain specific
width (even in ens). We follow T
FIGURE SPACE=DIGIT ZERO+SEMANTIC PHANTOM
PUNCTUATION SPACE=FULL STOP+SEMANTIC PHANTOM
(Unicode provides compatibility decompositions that assume they are all the same width.)
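So, an ordinary SPACE is NO-BREAK SPACE+ZERO WIDTH SPACE+HAIR SPACE, which means: a gap of the typical interword width, followed by a place where the line may be broken, followed by a place where justification may be inserted.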
Who´d have thought it was so complicated?
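The same bookkeeping can be done mechanically. This Python sketch, using my own names `atoms` and `properties`, reads two of the properties straight off the expansion: a character allows a line break if ZERO WIDTH SPACE appears among its atoms, and is flexible if HAIR SPACE does. Only two rules are included, as an illustration.

```python
# Two of the rules given earlier; anything not listed is treated as an atom.
RULES = {
    "SPACE": ["NO-BREAK SPACE", "ZERO WIDTH SPACE", "HAIR SPACE"],
    "HYPHEN": ["NON-BREAKING HYPHEN", "ZERO WIDTH SPACE"],
}

def atoms(name):
    """Expand a character name into the list of atoms it is built from."""
    if name not in RULES:
        return [name]
    result = []
    for part in RULES[name]:
        result.extend(atoms(part))
    return result

def properties(name):
    """Derive line-breaking and justification behaviour from the atoms alone."""
    parts = atoms(name)
    return {
        "may_break": "ZERO WIDTH SPACE" in parts,
        "flexible": "HAIR SPACE" in parts,
    }

for name in ("SPACE", "NO-BREAK SPACE", "HYPHEN"):
    print(name, properties(name))
```

SPACE comes out as breakable and flexible, HYPHEN as breakable only, and NO-BREAK SPACE as neither.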
This sounds a bit harsh, but some of the existing characters are present purely for compatibility purposes, and are so specialised that a general-purpose renderer should never see them, can make no real sense of them, and so needs no glyphs for them.
So we just throw them away (while preserving them in user data, of course).
BOTTOM HALF INTEGRAL=REPLACEMENT CHARACTER
COMBINING DOUBLE TILDE LEFT HALF=SEMANTIC OVERPRINT+REPLACEMENT CHARACTER
COMBINING DOUBLE TILDE RIGHT HALF=SEMANTIC OVERPRINT+REPLACEMENT CHARACTER
COMBINING LIGATURE LEFT HALF=SEMANTIC OVERPRINT+REPLACEMENT CHARACTER
COMBINING LIGATURE RIGHT HALF=SEMANTIC OVERPRINT+REPLACEMENT CHARACTER
LOWER HALF INVERSE WHITE CIRCLE=REPLACEMENT CHARACTER
TOP HALF INTEGRAL=REPLACEMENT CHARACTER
UPPER HALF INVERSE WHITE CIRCLE=REPLACEMENT CHARACTER
VERTICAL KANA REPEAT MARK LOWER HALF=REPLACEMENT CHARACTER
VERTICAL KANA REPEAT MARK UPPER HALF=REPLACEMENT CHARACTER
VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF=REPLACEMENT CHARACTER
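One way to read ``throw them away while preserving them in user data´´ is that the substitution happens only on the way to the screen. The Python sketch below is an illustration under that assumption; the function name `for_display` is mine, and it covers only four of the entries that map to a bare REPLACEMENT CHARACTER (the combining halves, which also involve SEMANTIC OVERPRINT, are left out).

```python
import unicodedata

# A few of the characters listed above, looked up by their standard names.
DISCARDED_NAMES = [
    "TOP HALF INTEGRAL",
    "BOTTOM HALF INTEGRAL",
    "UPPER HALF INVERSE WHITE CIRCLE",
    "LOWER HALF INVERSE WHITE CIRCLE",
]
DISCARDED = {unicodedata.lookup(name) for name in DISCARDED_NAMES}
REPLACEMENT = "\N{REPLACEMENT CHARACTER}"

def for_display(text):
    """Return a copy for rendering; the caller's stored text is not modified."""
    return "".join(REPLACEMENT if ch in DISCARDED else ch for ch in text)

stored = "x" + unicodedata.lookup("TOP HALF INTEGRAL") + "y"
shown = for_display(stored)
print(repr(stored), repr(shown))
```

The stored string still contains the original character, so nothing is lost if the text is later passed to a specialised renderer that does understand it.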
Most of the Hebrew accents and points are easily built up from other characters.
HEBREW ACCENT DEHI=SEMANTIC BEFORE+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+START GROUP+LOWER LEFT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT GERESH MUQDAM=SEMANTIC BEFORE+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+START GROUP+UPPER LEFT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT GERESH=SEMANTIC ABOVE+START GROUP+UPPER LEFT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT GERSHAYIM=SEMANTIC BEFORE+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+START GROUP+START GROUP+UPPER LEFT QUADRANT CIRCULAR ARC+UPPER LEFT QUADRANT CIRCULAR ARC+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT ILUY=SEMANTIC ABOVE+START GROUP+RIGHT FLOOR+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT MAHAPAKH=SEMANTIC BELOW+START GROUP+LESS-THAN SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT MERKHA KEFULA=SEMANTIC BELOW+START GROUP+START GROUP+LOWER RIGHT QUADRANT CIRCULAR ARC+LOWER RIGHT QUADRANT CIRCULAR ARC+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT MERKHA=SEMANTIC BELOW+START GROUP+LOWER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT MUNAH=SEMANTIC BELOW+START GROUP+RIGHT FLOOR+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT OLE=SEMANTIC ABOVE+START GROUP+LESS-THAN SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT PASHTA=SEMANTIC AFTER+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+START GROUP+UPPER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT QADMA=SEMANTIC ABOVE+START GROUP+UPPER RIGHT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT REVIA=SEMANTIC ABOVE+START GROUP+BLACK DIAMOND+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT SEGOL=COMBINING DIAERESIS+COMBINING DOT ABOVE HEBREW ACCENT TEVIR=SEMANTIC BELOW+START GROUP+START GROUP+LOWER 
RIGHT QUADRANT CIRCULAR ARC+SEMANTIC OVERPRINT+MIDDLE DOT+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT TIPEHA=SEMANTIC BELOW+START GROUP+LOWER LEFT QUADRANT CIRCULAR ARC+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT YETIV=SEMANTIC BEFORE+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC BELOW+START GROUP+LESS-THAN SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT ZAQEF GADOL=SEMANTIC ABOVE+START GROUP+START GROUP+VERTICAL STROKE+SEMANTIC BEFORE+COLON+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW ACCENT ZAQEF QATAN=COMBINING DOT ABOVE+COMBINING DOT ABOVE HEBREW ACCENT ZARQA=SEMANTIC ABOVE+INVERTED LAZY S HEBREW ACCENT ZINOR=SEMANTIC AFTER+START GROUP+ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+INVERTED LAZY S+POP DIRECTIONAL FORMATTING HEBREW MARK MASORA CIRCLE=COMBINING RING ABOVE HEBREW MARK UPPER DOT=COMBINING DOT ABOVE HEBREW POINT DAGESH OR MAPIQ=SEMANTIC OVERPRINT+MIDDLE DOT HEBREW POINT HATAF PATAH=SEMANTIC BELOW+START GROUP+MACRON+SEMANTIC BEFORE+COLON+POP DIRECTIONAL FORMATTING HEBREW POINT HATAF QAMATS=SEMANTIC BELOW+START GROUP+DOWN TACK+SEMANTIC SMALL+SEMANTIC BEFORE+COLON+POP DIRECTIONAL FORMATTING HEBREW POINT HATAF SEGOL=SEMANTIC BELOW+START GROUP+DIAERESIS+COMBINING DOT BELOW+SEMANTIC BEFORE+COLON+POP DIRECTIONAL FORMATTING HEBREW POINT HIRIQ=COMBINING DOT BELOW HEBREW POINT HOLAM=SEMANTIC AFTER+DOT ABOVE HEBREW POINT JUDEO-SPANISH VARIKA=COMBINING BREVE BELOW HEBREW POINT METEG=COMBINING VERTICAL LINE BELOW HEBREW POINT PATAH=COMBINING MACRON BELOW HEBREW POINT QAMATS=COMBINING DOWN TACK BELOW HEBREW POINT QUBUTS=SEMANTIC BELOW+START GROUP+DOWN RIGHT DIAGONAL ELLIPSIS+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING HEBREW POINT RAFE=COMBINING MACRON HEBREW POINT SEGOL=COMBINING DIAERESIS BELOW+COMBINING DOT BELOW HEBREW POINT SHEVA=COMBINING DOT BELOW+COMBINING DOT BELOW HEBREW POINT SHIN DOT=SEMANTIC BEFORE+DOT ABOVE HEBREW POINT SIN DOT=SEMANTIC AFTER+DOT ABOVE HEBREW 
POINT TSERE=COMBINING DIAERESIS BELOW HEBREW PUNCTUATION GERESH=PRIME HEBREW PUNCTUATION GERSHAYIM=DOUBLE PRIME HEBREW PUNCTUATION MAQAF=MACRON HEBREW PUNCTUATION PASEQ=VERTICAL LINE HEBREW PUNCTUATION SOF PASUQ=COLON
So many Arabic characters are compositions involving the 29 letters of the alphabet that we may as well just take them in order and see what happens.
ARABIC DAMMA=COMBINING COMMA ABOVE ARABIC DAMMATAN=ARABIC DAMMA+ARABIC DAMMA ARABIC EMPTY CENTRE HIGH STOP=COMBINING RING ABOVE ARABIC EMPTY CENTRE LOW STOP=COMBINING RING BELOW ARABIC FATHA=COMBINING ACUTE ACCENT ARABIC FATHATAN=ARABIC FATHA+ARABIC FATHA ARABIC FULL STOP=NON-BREAKING HYPHEN ARABIC KASRA=COMBINING GRAVE ACCENT ARABIC KASRATAN=ARABIC KASRA+ARABIC KASRA ARABIC LETTER AIN WITH THREE DOTS ABOVE=ARABIC LETTER AIN+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER ALEF WITH HAMZA ABOVE=ARABIC LETTER ALEF+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER ALEF WITH HAMZA BELOW=ARABIC LETTER ALEF+SEMANTIC BELOW+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER ALEF WITH MADDA ABOVE=ARABIC LETTER ALEF+ARABIC SMALL HIGH MADDA ARABIC LETTER ALEF WITH WAVY HAMZA ABOVE=ARABIC LETTER ALEF+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+SEMANTIC VARIANT+POP DIRECTIONAL FORMATTING ARABIC LETTER ALEF WITH WAVY HAMZA BELOW=ARABIC LETTER ALEF+SEMANTIC BELOW+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+SEMANTIC VARIANT+POP DIRECTIONAL FORMATTING ARABIC LETTER BEEH=ARABIC LETTER DOTLESS BEH+COMBINING DOT BELOW+COMBINING DOT BELOW ARABIC LETTER BEH=ARABIC LETTER DOTLESS BEH+COMBINING DOT BELOW ARABIC LETTER BEHEH=ARABIC LETTER DOTLESS BEH+COMBINING DIAERESIS BELOW+COMBINING DIAERESIS BELOW ARABIC LETTER DAD=ARABIC LETTER SAD+COMBINING DOT ABOVE ARABIC LETTER DAHAL=ARABIC LETTER DAL+COMBINING DIAERESIS ARABIC LETTER DAL WITH DOT BELOW AND SMALL TAH=ARABIC LETTER DAL+COMBINING DOT BELOW+SEMANTIC ABOVE+START GROUP+ARABIC LETTER TAH+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER DAL WITH DOT BELOW=ARABIC LETTER DAL+COMBINING DOT BELOW ARABIC LETTER DAL WITH FOUR DOTS ABOVE=ARABIC LETTER DAL+COMBINING DIAERESIS+COMBINING DIAERESIS ARABIC LETTER DAL WITH RING=ARABIC LETTER DAL+COMBINING RING BELOW ARABIC LETTER DAL WITH THREE DOTS ABOVE DOWNWARDS=ARABIC LETTER 
DAL+COMBINING DOT ABOVE+COMBINING DIAERESIS ARABIC LETTER DDAHAL=ARABIC LETTER DAL+COMBINING DIAERESIS BELOW ARABIC LETTER DDAL=ARABIC LETTER DAL+SEMANTIC ABOVE+START GROUP+ARABIC LETTER TAH+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER DUL=ARABIC LETTER DAL+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER DYEH=ARABIC LETTER HAH+SEMANTIC OVERPRINT+COLON ARABIC LETTER E=ARABIC LETTER FARSI YEH+COMBINING DOT BELOW+COMBINING DOT BELOW ARABIC LETTER FEH WITH DOT BELOW=ARABIC LETTER FEH+COMBINING DOT BELOW ARABIC LETTER FEH WITH DOT MOVED BELOW=ARABIC LETTER DOTLESS FEH+COMBINING DOT BELOW ARABIC LETTER FEH WITH THREE DOTS BELOW=ARABIC LETTER DOTLESS FEH+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW ARABIC LETTER FEH=ARABIC LETTER DOTLESS FEH+COMBINING DOT ABOVE ARABIC LETTER GAF WITH RING=ARABIC LETTER GAF+COMBINING RING OVERLAY ARABIC LETTER GAF WITH THREE DOTS ABOVE=ARABIC LETTER GAF+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER GAF WITH TWO DOTS BELOW=ARABIC LETTER GAF+COMBINING DIAERESIS BELOW ARABIC LETTER GAF=ARABIC LETTER KEHEH+ARABIC FATHA ARABIC LETTER GHAIN=ARABIC LETTER AIN+COMBINING DOT ABOVE ARABIC LETTER GUEH=ARABIC LETTER GAF+COMBINING DOT BELOW+COMBINING DOT BELOW ARABIC LETTER HAH WITH HAMZA ABOVE=ARABIC LETTER HAH+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER HAH WITH THREE DOTS ABOVE=ARABIC LETTER HAH+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER HAH WITH TWO DOTS VERTICAL ABOVE=ARABIC LETTER HAH+COMBINING DOT ABOVE+COMBINING DOT ABOVE ARABIC LETTER HEH DOACHASHMEE=ARABIC LETTER HEH+SEMANTIC VARIANT ARABIC LETTER HEH GOAL WITH HAMZA ABOVE=ARABIC LETTER HEH GOAL+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER HEH WITH YEH ABOVE=ARABIC LETTER AE+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER HIGH HAMZA ALEF=ARABIC LETTER ALEF+SEMANTIC BEFORE+ARABIC LETTER 
HIGH HAMZA ARABIC LETTER HIGH HAMZA WAW=ARABIC LETTER WAW+SEMANTIC BEFORE+ARABIC LETTER HIGH HAMZA ARABIC LETTER HIGH HAMZA YEH=ARABIC LETTER YEH+SEMANTIC BEFORE+ARABIC LETTER HIGH HAMZA ARABIC LETTER HIGH HAMZA=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER JEEM=ARABIC LETTER HAH+SEMANTIC OVERPRINT+MIDDLE DOT ARABIC LETTER JEH=ARABIC LETTER REH+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER KAF WITH DOT ABOVE=ARABIC LETTER KAF+COMBINING DOT ABOVE ARABIC LETTER KAF WITH RING=ARABIC LETTER KEHEH+COMBINING RING OVERLAY ARABIC LETTER KAF WITH THREE DOTS BELOW=ARABIC LETTER KAF+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW ARABIC LETTER KHAH=ARABIC LETTER HAH+COMBINING DOT ABOVE ARABIC LETTER KIRGHIZ OE=ARABIC LETTER WAW+COMBINING SHORT STROKE OVERLAY ARABIC LETTER KIRGHIZ YU=ARABIC LETTER WAW+COMBINING CIRCUMFLEX ACCENT ARABIC LETTER LAM WITH DOT ABOVE=ARABIC LETTER LAM+COMBINING DOT ABOVE ARABIC LETTER LAM WITH SMALL V=ARABIC LETTER LAM+COMBINING CARON ARABIC LETTER LAM WITH THREE DOTS ABOVE=ARABIC LETTER LAM+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER NG=ARABIC LETTER KAF+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER NGOEH=ARABIC LETTER GAF+COMBINING DIAERESIS ARABIC LETTER NOON WITH RING=ARABIC LETTER NOON+COMBINING RING BELOW ARABIC LETTER NOON WITH THREE DOTS ABOVE=ARABIC LETTER NOON GHUNNA+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER NOON=ARABIC LETTER NOON GHUNNA+COMBINING DOT ABOVE ARABIC LETTER NYEH=ARABIC LETTER HAH+SEMANTIC OVERPRINT+START GROUP+MIDDLE DOT+MIDDLE DOT+POP DIRECTIONAL FORMATTING ARABIC LETTER OE=ARABIC LETTER WAW+COMBINING CARON ARABIC LETTER PEH=ARABIC LETTER DOTLESS BEH+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW ARABIC LETTER PEHEH=ARABIC LETTER DOTLESS FEH+COMBINING DIAERESIS+COMBINING DIAERESIS ARABIC LETTER QAF WITH DOT ABOVE=ARABIC LETTER DOTLESS QAF+COMBINING DOT ABOVE ARABIC LETTER QAF WITH THREE DOTS ABOVE=ARABIC 
LETTER DOTLESS QAF+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER QAF=ARABIC LETTER DOTLESS QAF+COMBINING DIAERESIS ARABIC LETTER REH WITH DOT BELOW AND DOT ABOVE=ARABIC LETTER REH+COMBINING DOT BELOW+SEMANTIC OVERPRINT+MIDDLE DOT ARABIC LETTER REH WITH DOT BELOW=ARABIC LETTER REH+COMBINING DOT BELOW ARABIC LETTER REH WITH FOUR DOTS ABOVE=ARABIC LETTER REH+COMBINING DIAERESIS+COMBINING DIAERESIS ARABIC LETTER REH WITH RING=ARABIC LETTER REH+COMBINING RING BELOW ARABIC LETTER REH WITH SMALL V BELOW=ARABIC LETTER REH+COMBINING CARON BELOW ARABIC LETTER REH WITH SMALL V=ARABIC LETTER REH+COMBINING CARON ARABIC LETTER REH WITH TWO DOTS ABOVE=ARABIC LETTER REH+COMBINING DIAERESIS ARABIC LETTER RNOON=ARABIC LETTER NOON GHUNNA+SEMANTIC ABOVE+START GROUP+ARABIC LETTER TAH+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER RREH=ARABIC LETTER REH+SEMANTIC ABOVE+START GROUP+ARABIC LETTER TAH+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER SAD WITH THREE DOTS ABOVE=ARABIC LETTER SAD+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER SAD WITH TWO DOTS BELOW=ARABIC LETTER SAD+COMBINING DIAERESIS BELOW ARABIC LETTER SEEN WITH DOT BELOW AND DOT ABOVE=ARABIC LETTER SEEN+COMBINING DOT ABOVE+COMBINING DOT BELOW ARABIC LETTER SEEN WITH THREE DOTS BELOW AND THREE DOTS ABOVE=ARABIC LETTER SEEN+COMBINING DIAERESIS+COMBINING DOT ABOVE+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW ARABIC LETTER SEEN WITH THREE DOTS BELOW=ARABIC LETTER SEEN+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW ARABIC LETTER SHEEN=ARABIC LETTER SEEN+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER SUPERSCRIPT ALEF=SEMANTIC ABOVE+START GROUP+ARABIC LETTER ALEF+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER SWASH KAF=ARABIC LETTER KEHEH+SEMANTIC VARIANT ARABIC LETTER TAH WITH THREE DOTS ABOVE=ARABIC LETTER TAH+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER TCHEH=ARABIC LETTER HAH+SEMANTIC OVERPRINT+START GROUP+START GROUP+MIDDLE DOT+MIDDLE DOT+POP DIRECTIONAL 
FORMATTING+SEMANTIC BELOW+MIDDLE DOT+POP DIRECTIONAL FORMATTING ARABIC LETTER TCHEHEH=ARABIC LETTER HAH+SEMANTIC OVERPRINT+START GROUP+COLON+COLON+POP DIRECTIONAL FORMATTING ARABIC LETTER TEH MARBUTA GOAL=ARABIC LETTER HEH GOAL+COMBINING DIAERESIS ARABIC LETTER TEH MARBUTA=ARABIC LETTER AE+COMBINING DIAERESIS ARABIC LETTER TEH WITH RING=ARABIC LETTER TEH+COMBINING RING BELOW ARABIC LETTER TEH WITH THREE DOTS ABOVE DOWNWARDS=ARABIC LETTER DOTLESS BEH+COMBINING DOT ABOVE+COMBINING DIAERESIS ARABIC LETTER TEH=ARABIC LETTER DOTLESS BEH+COMBINING DIAERESIS ARABIC LETTER TEHEH=ARABIC LETTER DOTLESS BEH+COMBINING DIAERESIS+COMBINING DIAERESIS ARABIC LETTER THAL=ARABIC LETTER DAL+COMBINING DOT ABOVE ARABIC LETTER THEH=ARABIC LETTER DOTLESS BEH+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER TTEH=ARABIC LETTER DOTLESS BEH+SEMANTIC ABOVE+START GROUP+ARABIC LETTER TAH+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER TTEHEH=ARABIC LETTER DOTLESS BEH+COMBINING DOT ABOVE+COMBINING DOT ABOVE ARABIC LETTER U WITH HAMZA ABOVE=ARABIC LETTER U+SEMANTIC BEFORE+ARABIC LETTER HIGH HAMZA ARABIC LETTER U=ARABIC LETTER WAW+ARABIC DAMMA ARABIC LETTER VE=ARABIC LETTER WAW+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER VEH=ARABIC LETTER DOTLESS FEH+COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC LETTER WAW WITH HAMZA ABOVE=ARABIC LETTER WAW+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER WAW WITH RING=ARABIC LETTER WAW+COMBINING RING OVERLAY ARABIC LETTER WAW WITH TWO DOTS ABOVE=ARABIC LETTER WAW+COMBINING DIAERESIS ARABIC LETTER YEH BARREE WITH HAMZA ABOVE=ARABIC LETTER YEH BARREE+SEMANTIC VARIANT+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER YEH BARREE=ARABIC LETTER FARSI YEH+SEMANTIC VARIANT ARABIC LETTER YEH WITH HAMZA ABOVE=ARABIC LETTER YEH+SEMANTIC ABOVE+START GROUP+ARABIC LETTER HAMZA+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC LETTER YEH WITH SMALL 
V=ARABIC LETTER FARSI YEH+COMBINING CARON ARABIC LETTER YEH WITH TAIL=ARABIC LETTER FARSI YEH+COMBINING HOOK ARABIC LETTER YEH WITH THREE DOTS BELOW=ARABIC LETTER FARSI YEH+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW ARABIC LETTER YEH=ARABIC LETTER ALEF MAKSURA+COMBINING DIAERESIS BELOW ARABIC LETTER YU=ARABIC LETTER WAW+ARABIC LETTER SUPERSCRIPT ALEF ARABIC LETTER ZAH=ARABIC LETTER TAH+COMBINING DOT ABOVE ARABIC LETTER ZAIN=ARABIC LETTER REH+COMBINING DOT ABOVE ARABIC PERCENT SIGN=MIDDLE DOT+FRACTION SLASH+MIDDLE DOT ARABIC QUESTION MARK=QUESTION MARK+SEMANTIC REVERSED ARABIC ROUNDED HIGH STOP WITH FILLED CENTRE=COMBINING DOT ABOVE ARABIC SMALL HIGH DOTLESS HEAD OF KHAH=SEMANTIC ABOVE+START GROUP+GREATER-THAN SIGN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL HIGH JEEM=SEMANTIC ABOVE+START GROUP+ARABIC LETTER JEEM+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL HIGH LAM ALEF=SEMANTIC ABOVE+START GROUP+START GROUP+ARABIC LETTER LAM+ARABIC LETTER ALEF+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL HIGH LIGATURE QAF WITH LAM WITH ALEF MAKSURA=SEMANTIC ABOVE+START GROUP+START GROUP+ARABIC LETTER QAF+ARABIC LETTER LAM+ARABIC LETTER ALEF MAKSURA+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA=SEMANTIC ABOVE+START GROUP+START GROUP+ARABIC LETTER SAD+ARABIC LETTER LAM+ARABIC LETTER ALEF MAKSURA+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL HIGH MEEM INITIAL FORM=SEMANTIC ABOVE+START GROUP+START GROUP+ZERO WIDTH NON-JOINER+ARABIC LETTER MEEM+ZERO WIDTH JOINER+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL HIGH MEEM ISOLATED FORM=SEMANTIC ABOVE+START GROUP+START GROUP+ZERO WIDTH NON-JOINER+ARABIC LETTER MEEM+ZERO WIDTH NON-JOINER+POP DIRECTIONAL FORMATTING+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL HIGH NOON=SEMANTIC ABOVE+START GROUP+ARABIC LETTER NOON+SEMANTIC 
SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL HIGH ROUNDED ZERO=COMBINING DOT ABOVE ARABIC SMALL HIGH SEEN=SEMANTIC ABOVE+START GROUP+ARABIC LETTER SEEN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL HIGH THREE DOTS=COMBINING DIAERESIS+COMBINING DOT ABOVE ARABIC SMALL HIGH YEH=SEMANTIC ABOVE+START GROUP+ARABIC LETTER YEH BARREE+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL LOW MEEM=SEMANTIC BELOW+START GROUP+ARABIC LETTER MEEM+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL LOW SEEN=SEMANTIC BELOW+START GROUP+ARABIC LETTER SEEN+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL WAW=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+START GROUP+ARABIC LETTER WAW+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC SMALL YEH=ZERO WIDTH NO-BREAK SPACE+SEMANTIC ABOVE+START GROUP+ARABIC LETTER YEH BARREE+SEMANTIC SMALL+POP DIRECTIONAL FORMATTING ARABIC TATWEEL=ZERO WIDTH JOINER+EN DASH+ZERO WIDTH JOINER ARABIC THOUSANDS SEPARATOR=COMMA EXTENDED ARABIC-INDIC DIGIT EIGHT=ARABIC-INDIC DIGIT EIGHT+SEMANTIC VARIANT EXTENDED ARABIC-INDIC DIGIT FIVE=ARABIC-INDIC DIGIT FIVE+SEMANTIC VARIANT EXTENDED ARABIC-INDIC DIGIT FOUR=ARABIC-INDIC DIGIT FOUR+SEMANTIC VARIANT EXTENDED ARABIC-INDIC DIGIT NINE=ARABIC-INDIC DIGIT NINE+SEMANTIC VARIANT EXTENDED ARABIC-INDIC DIGIT ONE=ARABIC-INDIC DIGIT ONE+SEMANTIC VARIANT EXTENDED ARABIC-INDIC DIGIT SEVEN=ARABIC-INDIC DIGIT SEVEN+SEMANTIC VARIANT EXTENDED ARABIC-INDIC DIGIT SIX=ARABIC-INDIC DIGIT SIX+SEMANTIC VARIANT EXTENDED ARABIC-INDIC DIGIT THREE=ARABIC-INDIC DIGIT THREE+SEMANTIC VARIANT EXTENDED ARABIC-INDIC DIGIT TWO=ARABIC-INDIC DIGIT TWO+SEMANTIC VARIANT EXTENDED ARABIC-INDIC DIGIT ZERO=ARABIC-INDIC DIGIT ZERO+SEMANTIC VARIANT
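Many of the Arabic entries differ only in how many dots are hung above or below a base letter. As a small illustration, this Python sketch counts them from a decomposition string, reading COMBINING DOT ABOVE and COMBINING DOT BELOW as one dot each and the two diaereses as two. The name `dot_counts` is mine, and the function looks only at the marks named directly in the string it is given: it does not expand the base letter further, and it ignores dots expressed with SEMANTIC OVERPRINT.

```python
# How many dots each combining mark contributes, and on which side.
DOTS = {
    "COMBINING DOT ABOVE": ("above", 1),
    "COMBINING DIAERESIS": ("above", 2),
    "COMBINING DOT BELOW": ("below", 1),
    "COMBINING DIAERESIS BELOW": ("below", 2),
}

def dot_counts(decomposition):
    """Count dots above and below that are named directly in a decomposition."""
    counts = {"above": 0, "below": 0}
    for part in decomposition.split("+"):
        if part in DOTS:
            side, number = DOTS[part]
            counts[side] += number
    return counts

sheen = "ARABIC LETTER SEEN+COMBINING DIAERESIS+COMBINING DOT ABOVE"
peh = "ARABIC LETTER DOTLESS BEH+COMBINING DIAERESIS BELOW+COMBINING DOT BELOW"
print(dot_counts(sheen))
print(dot_counts(peh))
```

SHEEN gives three dots above and PEH three dots below, matching the conventional descriptions of the letters.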
There is structure within each of the Indic scripts which is described in the Unicode standard, but not specified in the code charts: any letter which is a vowel can be represented as a letter A followed by the corresponding vowel sign. This is completely hopeless from the point of view of simplifying rendering---in fact, it complicates it, as the glyphs bear little visual relationship. But the structure is there, and should be dealt with correctly.
BENGALI LETTER AA=BENGALI LETTER A+BENGALI VOWEL SIGN AA BENGALI LETTER AI=BENGALI LETTER A+BENGALI VOWEL SIGN AI BENGALI LETTER AU=BENGALI LETTER A+BENGALI VOWEL SIGN AU BENGALI LETTER E=BENGALI LETTER A+BENGALI VOWEL SIGN E BENGALI LETTER I=BENGALI LETTER A+BENGALI VOWEL SIGN I BENGALI LETTER II=BENGALI LETTER A+BENGALI VOWEL SIGN II BENGALI LETTER O=BENGALI LETTER A+BENGALI VOWEL SIGN O BENGALI LETTER U=BENGALI LETTER A+BENGALI VOWEL SIGN U BENGALI LETTER UU=BENGALI LETTER A+BENGALI VOWEL SIGN UU BENGALI LETTER VOCALIC L=BENGALI LETTER A+BENGALI VOWEL SIGN VOCALIC L BENGALI LETTER VOCALIC LL=BENGALI LETTER A+BENGALI VOWEL SIGN VOCALIC LL BENGALI LETTER VOCALIC R=BENGALI LETTER A+BENGALI VOWEL SIGN VOCALIC R BENGALI LETTER VOCALIC RR=BENGALI LETTER A+BENGALI VOWEL SIGN VOCALIC RR DEVANAGARI LETTER AA=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN AA DEVANAGARI LETTER AI=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN AI DEVANAGARI LETTER AU=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN AU DEVANAGARI LETTER CANDRA E=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN CANDRA E DEVANAGARI LETTER CANDRA O=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN CANDRA O DEVANAGARI LETTER E=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN E DEVANAGARI LETTER I=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN I DEVANAGARI LETTER II=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN II DEVANAGARI LETTER NGA=DEVANAGARI LETTER DDA+SEMANTIC AFTER+MIDDLE DOT DEVANAGARI LETTER O=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN O DEVANAGARI LETTER SHORT E=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN SHORT E DEVANAGARI LETTER SHORT O=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN SHORT O DEVANAGARI LETTER U=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN U DEVANAGARI LETTER UU=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN UU DEVANAGARI LETTER VOCALIC L=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN VOCALIC L DEVANAGARI LETTER VOCALIC LL=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN VOCALIC LL DEVANAGARI LETTER VOCALIC R=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN VOCALIC R
DEVANAGARI LETTER VOCALIC RR=DEVANAGARI LETTER A+DEVANAGARI VOWEL SIGN VOCALIC RR GUJARATI LETTER AA=GUJARATI LETTER A+GUJARATI VOWEL SIGN AA GUJARATI LETTER AI=GUJARATI LETTER A+GUJARATI VOWEL SIGN AI GUJARATI LETTER AU=GUJARATI LETTER A+GUJARATI VOWEL SIGN AU GUJARATI LETTER E=GUJARATI LETTER A+GUJARATI VOWEL SIGN E GUJARATI LETTER I=GUJARATI LETTER A+GUJARATI VOWEL SIGN I GUJARATI LETTER II=GUJARATI LETTER A+GUJARATI VOWEL SIGN II GUJARATI LETTER O=GUJARATI LETTER A+GUJARATI VOWEL SIGN O GUJARATI LETTER U=GUJARATI LETTER A+GUJARATI VOWEL SIGN U GUJARATI LETTER UU=GUJARATI LETTER A+GUJARATI VOWEL SIGN UU GUJARATI LETTER VOCALIC R=GUJARATI LETTER A+GUJARATI VOWEL SIGN VOCALIC R GUJARATI LETTER VOCALIC RR=GUJARATI LETTER A+GUJARATI VOWEL SIGN VOCALIC RR GUJARATI VOWEL CANDRA E=GUJARATI LETTER A+GUJARATI VOWEL SIGN CANDRA E GUJARATI VOWEL CANDRA O=GUJARATI LETTER A+GUJARATI VOWEL SIGN CANDRA O GURMUKHI LETTER AA=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN AA GURMUKHI LETTER AI=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN AI GURMUKHI LETTER AU=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN AU GURMUKHI LETTER EE=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN EE GURMUKHI LETTER I=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN I GURMUKHI LETTER II=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN II GURMUKHI LETTER OO=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN OO GURMUKHI LETTER U=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN U GURMUKHI LETTER UU=GURMUKHI LETTER A+GURMUKHI VOWEL SIGN UU KANNADA LETTER AA=KANNADA LETTER A+KANNADA VOWEL SIGN AA KANNADA LETTER AI=KANNADA LETTER A+KANNADA VOWEL SIGN AI KANNADA LETTER AU=KANNADA LETTER A+KANNADA VOWEL SIGN AU KANNADA LETTER E=KANNADA LETTER A+KANNADA VOWEL SIGN E KANNADA LETTER EE=KANNADA LETTER A+KANNADA VOWEL SIGN EE KANNADA LETTER I=KANNADA LETTER A+KANNADA VOWEL SIGN I KANNADA LETTER II=KANNADA LETTER A+KANNADA VOWEL SIGN II KANNADA LETTER O=KANNADA LETTER A+KANNADA VOWEL SIGN O KANNADA LETTER OO=KANNADA LETTER A+KANNADA VOWEL SIGN OO KANNADA LETTER U=KANNADA LETTER A+KANNADA VOWEL 
SIGN U KANNADA LETTER UU=KANNADA LETTER A+KANNADA VOWEL SIGN UU KANNADA LETTER VOCALIC R=KANNADA LETTER A+KANNADA VOWEL SIGN VOCALIC R KANNADA LETTER VOCALIC RR=KANNADA LETTER A+KANNADA VOWEL SIGN VOCALIC RR MALAYALAM LETTER AA=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN AA MALAYALAM LETTER AI=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN AI MALAYALAM LETTER AU=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN AU MALAYALAM LETTER E=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN E MALAYALAM LETTER EE=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN EE MALAYALAM LETTER I=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN I MALAYALAM LETTER II=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN II MALAYALAM LETTER O=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN O MALAYALAM LETTER OO=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN OO MALAYALAM LETTER U=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN U MALAYALAM LETTER UU=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN UU MALAYALAM LETTER VOCALIC R=MALAYALAM LETTER A+MALAYALAM VOWEL SIGN VOCALIC R ORIYA LETTER AA=ORIYA LETTER A+ORIYA VOWEL SIGN AA ORIYA LETTER AI=ORIYA LETTER A+ORIYA VOWEL SIGN AI ORIYA LETTER AU=ORIYA LETTER A+ORIYA VOWEL SIGN AU ORIYA LETTER E=ORIYA LETTER A+ORIYA VOWEL SIGN E ORIYA LETTER I=ORIYA LETTER A+ORIYA VOWEL SIGN I ORIYA LETTER II=ORIYA LETTER A+ORIYA VOWEL SIGN II ORIYA LETTER O=ORIYA LETTER A+ORIYA VOWEL SIGN O ORIYA LETTER U=ORIYA LETTER A+ORIYA VOWEL SIGN U ORIYA LETTER UU=ORIYA LETTER A+ORIYA VOWEL SIGN UU ORIYA LETTER VOCALIC R=ORIYA LETTER A+ORIYA VOWEL SIGN VOCALIC R TAMIL LETTER AA=TAMIL LETTER A+TAMIL VOWEL SIGN AA TAMIL LETTER AI=TAMIL LETTER A+TAMIL VOWEL SIGN AI TAMIL LETTER E=TAMIL LETTER A+TAMIL VOWEL SIGN E TAMIL LETTER EE=TAMIL LETTER A+TAMIL VOWEL SIGN EE TAMIL LETTER I=TAMIL LETTER A+TAMIL VOWEL SIGN I TAMIL LETTER II=TAMIL LETTER A+TAMIL VOWEL SIGN II TAMIL LETTER O=TAMIL LETTER A+TAMIL VOWEL SIGN O TAMIL LETTER OO=TAMIL LETTER A+TAMIL VOWEL SIGN OO TAMIL LETTER U=TAMIL LETTER A+TAMIL VOWEL SIGN U TAMIL LETTER UU=TAMIL LETTER A+TAMIL VOWEL SIGN UU TELUGU 
LETTER AA=TELUGU LETTER A+TELUGU VOWEL SIGN AA TELUGU LETTER AI=TELUGU LETTER A+TELUGU VOWEL SIGN AI TELUGU LETTER AU=TELUGU LETTER A+TELUGU VOWEL SIGN AU TELUGU LETTER E=TELUGU LETTER A+TELUGU VOWEL SIGN E TELUGU LETTER EE=TELUGU LETTER A+TELUGU VOWEL SIGN EE TELUGU LETTER I=TELUGU LETTER A+TELUGU VOWEL SIGN I TELUGU LETTER II=TELUGU LETTER A+TELUGU VOWEL SIGN II TELUGU LETTER O=TELUGU LETTER A+TELUGU VOWEL SIGN O TELUGU LETTER OO=TELUGU LETTER A+TELUGU VOWEL SIGN OO TELUGU LETTER U=TELUGU LETTER A+TELUGU VOWEL SIGN U TELUGU LETTER UU=TELUGU LETTER A+TELUGU VOWEL SIGN UU TELUGU LETTER VOCALIC R=TELUGU LETTER A+TELUGU VOWEL SIGN VOCALIC R TELUGU LETTER VOCALIC RR=TELUGU LETTER A+TELUGU VOWEL SIGN VOCALIC RR
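Nearly every entry in this list follows one pattern, so it can be generated rather than typed. The Python sketch below builds a rule from a script name and a vowel name, and uses the standard `unicodedata` module to check that the character names it produces really exist. The function names are mine; the pattern does not cover entries whose names are irregular (the Gujarati candra vowels, for instance, are named VOWEL rather than LETTER).

```python
import unicodedata

def independent_vowel_rule(script, vowel):
    """Build 'LETTER x = LETTER A + VOWEL SIGN x' for the given script."""
    letter = f"{script} LETTER {vowel}"
    parts = [f"{script} LETTER A", f"{script} VOWEL SIGN {vowel}"]
    return letter, parts

def names_exist(names):
    """True if every name is a character name known to this Python's Unicode data."""
    try:
        for name in names:
            unicodedata.lookup(name)
    except KeyError:
        return False
    return True

letter, parts = independent_vowel_rule("DEVANAGARI", "CANDRA O")
print(letter + "=" + "+".join(parts))
print(names_exist([letter] + parts))
```

Generating the rules this way also guards against the easiest slip in a hand-written list, which is pairing a letter with the vowel sign of its neighbour.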
Furthermore, the various Indic scripts are carefully kept in numerical harmony in order to facilitate a simple algorithmic transliteration between them. But unless this structural parallel is explicitly exposed, an ordinary user cannot make use of it (unless the implementor has gone out of their way to give help). Therefore, we introduce a set of semantics to represent the relationship: SEMANTIC BENGALI, SEMANTIC GUJARATI, SEMANTIC GURMUKHI, SEMANTIC KANNADA, SEMANTIC MALAYALAM, SEMANTIC ORIYA, SEMANTIC TAMIL and SEMANTIC TELUGU.
If a font engine uses these in the obvious way, it is possible to switch between two Indic scripts by enclosing arbitrary text between START GROUP and POP DIRECTIONAL FORMATTING characters, and appending a SEMANTIC character for the new script. (This assumes that only the last such suggestion has any effect.)
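The ``numerical harmony´´ is easy to demonstrate at the code-point level. This Python sketch is not the SEMANTIC mechanism itself, only the arithmetic that underlies it: each of these scripts occupies a 128-code-point block, and corresponding characters sit at the same offset within their blocks. The block starts are those of the Unicode code charts; the names `BLOCK_START` and `switch_script` are mine. A real implementation would also have to cope with offsets that are unassigned in the target script, which this sketch does not.

```python
import unicodedata

# First code point of each script's block in the Unicode code charts.
BLOCK_START = {
    "DEVANAGARI": 0x0900,
    "BENGALI": 0x0980,
    "GURMUKHI": 0x0A00,
    "GUJARATI": 0x0A80,
    "ORIYA": 0x0B00,
    "TAMIL": 0x0B80,
    "TELUGU": 0x0C00,
    "KANNADA": 0x0C80,
    "MALAYALAM": 0x0D00,
}
BLOCK_SIZE = 0x80

def switch_script(text, source, target):
    """Move every character of the source block to the same offset in the target block."""
    low = BLOCK_START[source]
    shift = BLOCK_START[target] - low
    return "".join(
        chr(ord(ch) + shift) if low <= ord(ch) < low + BLOCK_SIZE else ch
        for ch in text
    )

ka = unicodedata.lookup("DEVANAGARI LETTER KA")
print(unicodedata.name(switch_script(ka, "DEVANAGARI", "BENGALI")))
```

Characters outside the source block, such as spaces and Latin punctuation, pass through unchanged.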
Since Devanagari is the oldest of the scripts, we use that as the basis for decomposition. But in order to maintain cultural neutrality we invent some non-existent Devanagari characters, so that all the scripts share the same structure. We also introduce a semantic for Devanagari itself: if these characters appear ``unadorned´´ by any script suggestion, a rendering agent could decide how to present them based on global user settings (e g, the current locale), rather than just assuming that they are ``real´´ Devanagari.
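A renderer's choice of script could then be sketched as follows, in Python. This assumes that the semantic for Devanagari is named SEMANTIC DEVANAGARI by analogy with the others, that the last script suggestion wins, and that unadorned text falls back to a default supplied by the caller (standing in for the locale). The function name `effective_script` and its parameter are my own.

```python
SCRIPTS = {
    "BENGALI", "DEVANAGARI", "GUJARATI", "GURMUKHI", "KANNADA",
    "MALAYALAM", "ORIYA", "TAMIL", "TELUGU",
}
PREFIX = "SEMANTIC "

def effective_script(tokens, locale_default="DEVANAGARI"):
    """Return the script named by the last script semantic, else the default."""
    script = locale_default
    for token in tokens:
        # Other semantics, such as SEMANTIC SMALL, are not script suggestions.
        if token.startswith(PREFIX) and token[len(PREFIX):] in SCRIPTS:
            script = token[len(PREFIX):]
    return script

print(effective_script(["DEVANAGARI LETTER KA", "SEMANTIC BENGALI"]))
print(effective_script(["DEVANAGARI LETTER KA"], locale_default="TAMIL"))
```

The first call resolves to BENGALI because of the explicit suggestion; the second has no suggestion and so takes whatever the user's settings supply.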
Amusing engineering side-effects of this machinery include the ability to code (in some sense) a whole new script, by adding a single character to the U C S---and still get some legibility if the character is not understood by the renderer.
So, here´s the list. The boy must be mad ...
BENGALI AU LENGTH MARK=DEVANAGARI AU LENGTH MARK+SEMANTIC BENGALI BENGALI DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC BENGALI BENGALI DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC BENGALI BENGALI DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC BENGALI BENGALI DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC BENGALI BENGALI DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC BENGALI BENGALI DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC BENGALI BENGALI DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC BENGALI BENGALI DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC BENGALI BENGALI DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC BENGALI BENGALI DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC BENGALI BENGALI LETTER A=DEVANAGARI LETTER A+SEMANTIC BENGALI BENGALI LETTER BA=DEVANAGARI LETTER BA+SEMANTIC BENGALI BENGALI LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC BENGALI BENGALI LETTER CA=DEVANAGARI LETTER CA+SEMANTIC BENGALI BENGALI LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC BENGALI BENGALI LETTER DA=DEVANAGARI LETTER DA+SEMANTIC BENGALI BENGALI LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC BENGALI BENGALI LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC BENGALI BENGALI LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC BENGALI BENGALI LETTER GA=DEVANAGARI LETTER GA+SEMANTIC BENGALI BENGALI LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC BENGALI BENGALI LETTER HA=DEVANAGARI LETTER HA+SEMANTIC BENGALI BENGALI LETTER JA=DEVANAGARI LETTER JA+SEMANTIC BENGALI BENGALI LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC BENGALI BENGALI LETTER KA=DEVANAGARI LETTER KA+SEMANTIC BENGALI BENGALI LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC BENGALI BENGALI LETTER LA=DEVANAGARI LETTER LA+SEMANTIC BENGALI BENGALI LETTER MA=DEVANAGARI LETTER MA+SEMANTIC BENGALI BENGALI LETTER NA=DEVANAGARI LETTER NA+SEMANTIC BENGALI BENGALI LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC BENGALI BENGALI LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC BENGALI BENGALI LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC BENGALI BENGALI LETTER PA=DEVANAGARI LETTER PA+SEMANTIC BENGALI BENGALI LETTER PHA=DEVANAGARI LETTER 
PHA+SEMANTIC BENGALI BENGALI LETTER SA=DEVANAGARI LETTER SA+SEMANTIC BENGALI BENGALI LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC BENGALI BENGALI LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC BENGALI BENGALI LETTER TA=DEVANAGARI LETTER TA+SEMANTIC BENGALI BENGALI LETTER THA=DEVANAGARI LETTER THA+SEMANTIC BENGALI BENGALI LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC BENGALI BENGALI LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC BENGALI BENGALI LETTER YA=DEVANAGARI LETTER YA+SEMANTIC BENGALI BENGALI SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC BENGALI+POP DIRECTIONAL FORMATTING BENGALI SIGN CANDRABINDU=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC ABOVE+DOT ABOVE+SEMANTIC BENGALI+POP DIRECTIONAL FORMATTING BENGALI SIGN NUKTA=SEMANTIC BELOW+START GROUP+DOT ABOVE+SEMANTIC BENGALI+POP DIRECTIONAL FORMATTING BENGALI SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC BENGALI BENGALI SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC BENGALI+POP DIRECTIONAL FORMATTING BENGALI VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC BENGALI BENGALI VOWEL SIGN AI=DEVANAGARI VOWEL SIGN AI+SEMANTIC BENGALI BENGALI VOWEL SIGN E=DEVANAGARI VOWEL SIGN E+SEMANTIC BENGALI BENGALI VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC BENGALI BENGALI VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC BENGALI BENGALI VOWEL SIGN O=DEVANAGARI VOWEL SIGN O+SEMANTIC BENGALI BENGALI VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC BENGALI BENGALI VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC BENGALI BENGALI VOWEL SIGN VOCALIC L=DEVANAGARI VOWEL SIGN VOCALIC L+SEMANTIC BENGALI BENGALI VOWEL SIGN VOCALIC LL=DEVANAGARI VOWEL SIGN VOCALIC LL+SEMANTIC BENGALI BENGALI VOWEL SIGN VOCALIC R=DEVANAGARI VOWEL SIGN VOCALIC R+SEMANTIC BENGALI BENGALI VOWEL SIGN VOCALIC RR=DEVANAGARI VOWEL SIGN VOCALIC RR+SEMANTIC BENGALI GUJARATI DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC GUJARATI GUJARATI DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC GUJARATI GUJARATI DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC GUJARATI GUJARATI DIGIT 
NINE=DEVANAGARI DIGIT NINE+SEMANTIC GUJARATI GUJARATI DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC GUJARATI GUJARATI DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC GUJARATI GUJARATI DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC GUJARATI GUJARATI DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC GUJARATI GUJARATI DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC GUJARATI GUJARATI DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC GUJARATI GUJARATI LETTER A=DEVANAGARI LETTER A+SEMANTIC GUJARATI GUJARATI LETTER BA=DEVANAGARI LETTER BA+SEMANTIC GUJARATI GUJARATI LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC GUJARATI GUJARATI LETTER CA=DEVANAGARI LETTER CA+SEMANTIC GUJARATI GUJARATI LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC GUJARATI GUJARATI LETTER DA=DEVANAGARI LETTER DA+SEMANTIC GUJARATI GUJARATI LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC GUJARATI GUJARATI LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC GUJARATI GUJARATI LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC GUJARATI GUJARATI LETTER GA=DEVANAGARI LETTER GA+SEMANTIC GUJARATI GUJARATI LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC GUJARATI GUJARATI LETTER HA=DEVANAGARI LETTER HA+SEMANTIC GUJARATI GUJARATI LETTER JA=DEVANAGARI LETTER JA+SEMANTIC GUJARATI GUJARATI LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC GUJARATI GUJARATI LETTER KA=DEVANAGARI LETTER KA+SEMANTIC GUJARATI GUJARATI LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC GUJARATI GUJARATI LETTER LA=DEVANAGARI LETTER LA+SEMANTIC GUJARATI GUJARATI LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC GUJARATI GUJARATI LETTER MA=DEVANAGARI LETTER MA+SEMANTIC GUJARATI GUJARATI LETTER NA=DEVANAGARI LETTER NA+SEMANTIC GUJARATI GUJARATI LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC GUJARATI GUJARATI LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC GUJARATI GUJARATI LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC GUJARATI GUJARATI LETTER PA=DEVANAGARI LETTER PA+SEMANTIC GUJARATI GUJARATI LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC GUJARATI GUJARATI LETTER RA=DEVANAGARI LETTER RA+SEMANTIC GUJARATI GUJARATI LETTER SA=DEVANAGARI LETTER SA+SEMANTIC GUJARATI 
GUJARATI LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC GUJARATI GUJARATI LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC GUJARATI GUJARATI LETTER TA=DEVANAGARI LETTER TA+SEMANTIC GUJARATI GUJARATI LETTER THA=DEVANAGARI LETTER THA+SEMANTIC GUJARATI GUJARATI LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC GUJARATI GUJARATI LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC GUJARATI GUJARATI LETTER VA=DEVANAGARI LETTER VA+SEMANTIC GUJARATI GUJARATI LETTER YA=DEVANAGARI LETTER YA+SEMANTIC GUJARATI GUJARATI OM=DEVANAGARI OM+SEMANTIC GUJARATI GUJARATI SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC GUJARATI+POP DIRECTIONAL FORMATTING GUJARATI SIGN AVAGRAHA=DEVANAGARI SIGN AVAGRAHA+SEMANTIC GUJARATI GUJARATI SIGN CANDRABINDU=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC ABOVE+DOT ABOVE+SEMANTIC GUJARATI+POP DIRECTIONAL FORMATTING GUJARATI SIGN NUKTA=SEMANTIC BELOW+START GROUP+DOT ABOVE+SEMANTIC GUJARATI+POP DIRECTIONAL FORMATTING GUJARATI SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC GUJARATI GUJARATI SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC GUJARATI+POP DIRECTIONAL FORMATTING GUJARATI VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC GUJARATI GUJARATI VOWEL SIGN AI=DEVANAGARI VOWEL SIGN AI+SEMANTIC GUJARATI GUJARATI VOWEL SIGN AU=DEVANAGARI VOWEL SIGN AU+SEMANTIC GUJARATI GUJARATI VOWEL SIGN CANDRA E=DEVANAGARI VOWEL SIGN CANDRA E+SEMANTIC GUJARATI GUJARATI VOWEL SIGN CANDRA O=DEVANAGARI VOWEL SIGN CANDRA O+SEMANTIC GUJARATI GUJARATI VOWEL SIGN E=DEVANAGARI VOWEL SIGN E+SEMANTIC GUJARATI GUJARATI VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC GUJARATI GUJARATI VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC GUJARATI GUJARATI VOWEL SIGN O=DEVANAGARI VOWEL SIGN O+SEMANTIC GUJARATI GUJARATI VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC GUJARATI GUJARATI VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC GUJARATI GUJARATI VOWEL SIGN VOCALIC R=DEVANAGARI VOWEL SIGN VOCALIC R+SEMANTIC GUJARATI GUJARATI VOWEL SIGN VOCALIC RR=DEVANAGARI VOWEL SIGN VOCALIC RR+SEMANTIC GUJARATI GURMUKHI 
ADDAK=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC GURMUKHI+POP DIRECTIONAL FORMATTING GURMUKHI DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC GURMUKHI GURMUKHI DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC GURMUKHI GURMUKHI DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC GURMUKHI GURMUKHI DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC GURMUKHI GURMUKHI DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC GURMUKHI GURMUKHI DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC GURMUKHI GURMUKHI DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC GURMUKHI GURMUKHI DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC GURMUKHI GURMUKHI DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC GURMUKHI GURMUKHI DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC GURMUKHI GURMUKHI LETTER A=DEVANAGARI LETTER A+SEMANTIC GURMUKHI GURMUKHI LETTER BA=DEVANAGARI LETTER BA+SEMANTIC GURMUKHI GURMUKHI LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC GURMUKHI GURMUKHI LETTER CA=DEVANAGARI LETTER CA+SEMANTIC GURMUKHI GURMUKHI LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC GURMUKHI GURMUKHI LETTER DA=DEVANAGARI LETTER DA+SEMANTIC GURMUKHI GURMUKHI LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC GURMUKHI GURMUKHI LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC GURMUKHI GURMUKHI LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC GURMUKHI GURMUKHI LETTER GA=DEVANAGARI LETTER GA+SEMANTIC GURMUKHI GURMUKHI LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC GURMUKHI GURMUKHI LETTER HA=DEVANAGARI LETTER HA+SEMANTIC GURMUKHI GURMUKHI LETTER JA=DEVANAGARI LETTER JA+SEMANTIC GURMUKHI GURMUKHI LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC GURMUKHI GURMUKHI LETTER KA=DEVANAGARI LETTER KA+SEMANTIC GURMUKHI GURMUKHI LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC GURMUKHI GURMUKHI LETTER LA=DEVANAGARI LETTER LA+SEMANTIC GURMUKHI GURMUKHI LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC GURMUKHI GURMUKHI LETTER MA=DEVANAGARI LETTER MA+SEMANTIC GURMUKHI GURMUKHI LETTER NA=DEVANAGARI LETTER NA+SEMANTIC GURMUKHI GURMUKHI LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC GURMUKHI GURMUKHI LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC GURMUKHI GURMUKHI LETTER 
NYA=DEVANAGARI LETTER NYA+SEMANTIC GURMUKHI GURMUKHI LETTER PA=DEVANAGARI LETTER PA+SEMANTIC GURMUKHI GURMUKHI LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC GURMUKHI GURMUKHI LETTER RA=DEVANAGARI LETTER RA+SEMANTIC GURMUKHI GURMUKHI LETTER SA=DEVANAGARI LETTER SA+SEMANTIC GURMUKHI GURMUKHI LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC GURMUKHI GURMUKHI LETTER TA=DEVANAGARI LETTER TA+SEMANTIC GURMUKHI GURMUKHI LETTER THA=DEVANAGARI LETTER THA+SEMANTIC GURMUKHI GURMUKHI LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC GURMUKHI GURMUKHI LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC GURMUKHI GURMUKHI LETTER VA=DEVANAGARI LETTER VA+SEMANTIC GURMUKHI GURMUKHI LETTER YA=DEVANAGARI LETTER YA+SEMANTIC GURMUKHI GURMUKHI SIGN BINDI=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC GURMUKHI+POP DIRECTIONAL FORMATTING GURMUKHI SIGN NUKTA=SEMANTIC BELOW+START GROUP+DOT ABOVE+SEMANTIC GURMUKHI+POP DIRECTIONAL FORMATTING GURMUKHI SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC GURMUKHI GURMUKHI TIPPI=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC TURNED+SEMANTIC GURMUKHI+POP DIRECTIONAL FORMATTING GURMUKHI VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC GURMUKHI GURMUKHI VOWEL SIGN AI=DEVANAGARI VOWEL SIGN AI+SEMANTIC GURMUKHI GURMUKHI VOWEL SIGN AU=DEVANAGARI VOWEL SIGN AU+SEMANTIC GURMUKHI GURMUKHI VOWEL SIGN EE=DEVANAGARI VOWEL SIGN E+SEMANTIC GURMUKHI GURMUKHI VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC GURMUKHI GURMUKHI VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC GURMUKHI GURMUKHI VOWEL SIGN OO=DEVANAGARI VOWEL SIGN O+SEMANTIC GURMUKHI GURMUKHI VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC GURMUKHI GURMUKHI VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC GURMUKHI KANNADA AI LENGTH MARK=DEVANAGARI AI LENGTH MARK+SEMANTIC KANNADA KANNADA DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC KANNADA KANNADA DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC KANNADA KANNADA DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC KANNADA KANNADA DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC KANNADA KANNADA DIGIT ONE=DEVANAGARI DIGIT 
ONE+SEMANTIC KANNADA KANNADA DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC KANNADA KANNADA DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC KANNADA KANNADA DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC KANNADA KANNADA DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC KANNADA KANNADA DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC KANNADA KANNADA LENGTH MARK=DEVANAGARI LENGTH MARK+SEMANTIC KANNADA KANNADA LETTER A=DEVANAGARI LETTER A+SEMANTIC KANNADA KANNADA LETTER BA=DEVANAGARI LETTER BA+SEMANTIC KANNADA KANNADA LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC KANNADA KANNADA LETTER CA=DEVANAGARI LETTER CA+SEMANTIC KANNADA KANNADA LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC KANNADA KANNADA LETTER DA=DEVANAGARI LETTER DA+SEMANTIC KANNADA KANNADA LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC KANNADA KANNADA LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC KANNADA KANNADA LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC KANNADA KANNADA LETTER FA=DEVANAGARI LETTER FA+SEMANTIC KANNADA KANNADA LETTER GA=DEVANAGARI LETTER GA+SEMANTIC KANNADA KANNADA LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC KANNADA KANNADA LETTER HA=DEVANAGARI LETTER HA+SEMANTIC KANNADA KANNADA LETTER JA=DEVANAGARI LETTER JA+SEMANTIC KANNADA KANNADA LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC KANNADA KANNADA LETTER KA=DEVANAGARI LETTER KA+SEMANTIC KANNADA KANNADA LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC KANNADA KANNADA LETTER LA=DEVANAGARI LETTER LA+SEMANTIC KANNADA KANNADA LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC KANNADA KANNADA LETTER MA=DEVANAGARI LETTER MA+SEMANTIC KANNADA KANNADA LETTER NA=DEVANAGARI LETTER NA+SEMANTIC KANNADA KANNADA LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC KANNADA KANNADA LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC KANNADA KANNADA LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC KANNADA KANNADA LETTER PA=DEVANAGARI LETTER PA+SEMANTIC KANNADA KANNADA LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC KANNADA KANNADA LETTER RA=DEVANAGARI LETTER RA+SEMANTIC KANNADA KANNADA LETTER RRA=DEVANAGARI LETTER RRA+SEMANTIC KANNADA KANNADA LETTER SA=DEVANAGARI 
LETTER SA+SEMANTIC KANNADA KANNADA LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC KANNADA KANNADA LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC KANNADA KANNADA LETTER TA=DEVANAGARI LETTER TA+SEMANTIC KANNADA KANNADA LETTER THA=DEVANAGARI LETTER THA+SEMANTIC KANNADA KANNADA LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC KANNADA KANNADA LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC KANNADA KANNADA LETTER VA=DEVANAGARI LETTER VA+SEMANTIC KANNADA KANNADA LETTER VOCALIC L=DEVANAGARI LETTER VOCALIC L+SEMANTIC KANNADA KANNADA LETTER VOCALIC LL=DEVANAGARI LETTER VOCALIC LL+SEMANTIC KANNADA KANNADA LETTER YA=DEVANAGARI LETTER YA+SEMANTIC KANNADA KANNADA SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC KANNADA+POP DIRECTIONAL FORMATTING KANNADA SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC KANNADA KANNADA SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC KANNADA+POP DIRECTIONAL FORMATTING KANNADA VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC KANNADA KANNADA VOWEL SIGN AU=DEVANAGARI VOWEL SIGN AU+SEMANTIC KANNADA KANNADA VOWEL SIGN E=DEVANAGARI VOWEL SIGN SHORT E+SEMANTIC KANNADA KANNADA VOWEL SIGN EE=DEVANAGARI VOWEL SIGN E+SEMANTIC KANNADA KANNADA VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC KANNADA KANNADA VOWEL SIGN O=DEVANAGARI VOWEL SIGN SHORT O+SEMANTIC KANNADA KANNADA VOWEL SIGN OO=DEVANAGARI VOWEL SIGN O+SEMANTIC KANNADA KANNADA VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC KANNADA KANNADA VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC KANNADA KANNADA VOWEL SIGN VOCALIC R=DEVANAGARI VOWEL SIGN VOCALIC R+SEMANTIC KANNADA KANNADA VOWEL SIGN VOCALIC RR=DEVANAGARI VOWEL SIGN VOCALIC RR+SEMANTIC KANNADA MALAYALAM AU LENGTH MARK=DEVANAGARI AU LENGTH MARK+SEMANTIC MALAYALAM MALAYALAM DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC MALAYALAM MALAYALAM DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC MALAYALAM MALAYALAM DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC MALAYALAM MALAYALAM DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC MALAYALAM MALAYALAM DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC 
MALAYALAM MALAYALAM DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC MALAYALAM MALAYALAM DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC MALAYALAM MALAYALAM DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC MALAYALAM MALAYALAM DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC MALAYALAM MALAYALAM DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC MALAYALAM MALAYALAM LETTER A=DEVANAGARI LETTER A+SEMANTIC MALAYALAM MALAYALAM LETTER BA=DEVANAGARI LETTER BA+SEMANTIC MALAYALAM MALAYALAM LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC MALAYALAM MALAYALAM LETTER CA=DEVANAGARI LETTER CA+SEMANTIC MALAYALAM MALAYALAM LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC MALAYALAM MALAYALAM LETTER DA=DEVANAGARI LETTER DA+SEMANTIC MALAYALAM MALAYALAM LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC MALAYALAM MALAYALAM LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC MALAYALAM MALAYALAM LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC MALAYALAM MALAYALAM LETTER GA=DEVANAGARI LETTER GA+SEMANTIC MALAYALAM MALAYALAM LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC MALAYALAM MALAYALAM LETTER HA=DEVANAGARI LETTER HA+SEMANTIC MALAYALAM MALAYALAM LETTER JA=DEVANAGARI LETTER JA+SEMANTIC MALAYALAM MALAYALAM LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC MALAYALAM MALAYALAM LETTER KA=DEVANAGARI LETTER KA+SEMANTIC MALAYALAM MALAYALAM LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC MALAYALAM MALAYALAM LETTER LA=DEVANAGARI LETTER LA+SEMANTIC MALAYALAM MALAYALAM LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC MALAYALAM MALAYALAM LETTER LLLA=DEVANAGARI LETTER LLLA+SEMANTIC MALAYALAM MALAYALAM LETTER MA=DEVANAGARI LETTER MA+SEMANTIC MALAYALAM MALAYALAM LETTER NA=DEVANAGARI LETTER NA+SEMANTIC MALAYALAM MALAYALAM LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC MALAYALAM MALAYALAM LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC MALAYALAM MALAYALAM LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC MALAYALAM MALAYALAM LETTER PA=DEVANAGARI LETTER PA+SEMANTIC MALAYALAM MALAYALAM LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC MALAYALAM MALAYALAM LETTER RA=DEVANAGARI LETTER RA+SEMANTIC MALAYALAM MALAYALAM LETTER 
RRA=DEVANAGARI LETTER RRA+SEMANTIC MALAYALAM MALAYALAM LETTER SA=DEVANAGARI LETTER SA+SEMANTIC MALAYALAM MALAYALAM LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC MALAYALAM MALAYALAM LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC MALAYALAM MALAYALAM LETTER TA=DEVANAGARI LETTER TA+SEMANTIC MALAYALAM MALAYALAM LETTER THA=DEVANAGARI LETTER THA+SEMANTIC MALAYALAM MALAYALAM LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC MALAYALAM MALAYALAM LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC MALAYALAM MALAYALAM LETTER VA=DEVANAGARI LETTER VA+SEMANTIC MALAYALAM MALAYALAM LETTER VOCALIC L=DEVANAGARI LETTER VOCALIC L+SEMANTIC MALAYALAM MALAYALAM LETTER VOCALIC LL=DEVANAGARI LETTER VOCALIC LL+SEMANTIC MALAYALAM MALAYALAM LETTER VOCALIC RR=DEVANAGARI LETTER VOCALIC RR+SEMANTIC MALAYALAM MALAYALAM LETTER YA=DEVANAGARI LETTER YA+SEMANTIC MALAYALAM MALAYALAM SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC MALAYALAM+POP DIRECTIONAL FORMATTING MALAYALAM SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC MALAYALAM MALAYALAM SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC MALAYALAM+POP DIRECTIONAL FORMATTING MALAYALAM VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC MALAYALAM MALAYALAM VOWEL SIGN AI=DEVANAGARI VOWEL SIGN AI+SEMANTIC MALAYALAM MALAYALAM VOWEL SIGN E=DEVANAGARI VOWEL SIGN SHORT E+SEMANTIC MALAYALAM MALAYALAM VOWEL SIGN EE=DEVANAGARI VOWEL SIGN E+SEMANTIC MALAYALAM MALAYALAM VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC MALAYALAM MALAYALAM VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC MALAYALAM MALAYALAM VOWEL SIGN O=DEVANAGARI VOWEL SIGN SHORT O+SEMANTIC MALAYALAM MALAYALAM VOWEL SIGN OO=DEVANAGARI VOWEL SIGN O+SEMANTIC MALAYALAM MALAYALAM VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC MALAYALAM MALAYALAM VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC MALAYALAM MALAYALAM VOWEL SIGN VOCALIC R=DEVANAGARI VOWEL SIGN VOCALIC R+SEMANTIC MALAYALAM ORIYA AI LENGTH MARK=DEVANAGARI AI LENGTH MARK+SEMANTIC ORIYA ORIYA AU LENGTH MARK=DEVANAGARI AU LENGTH MARK+SEMANTIC ORIYA ORIYA DIGIT 
EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC ORIYA ORIYA DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC ORIYA ORIYA DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC ORIYA ORIYA DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC ORIYA ORIYA DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC ORIYA ORIYA DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC ORIYA ORIYA DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC ORIYA ORIYA DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC ORIYA ORIYA DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC ORIYA ORIYA DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC ORIYA ORIYA LETTER A=DEVANAGARI LETTER A+SEMANTIC ORIYA ORIYA LETTER BA=DEVANAGARI LETTER BA+SEMANTIC ORIYA ORIYA LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC ORIYA ORIYA LETTER CA=DEVANAGARI LETTER CA+SEMANTIC ORIYA ORIYA LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC ORIYA ORIYA LETTER DA=DEVANAGARI LETTER DA+SEMANTIC ORIYA ORIYA LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC ORIYA ORIYA LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC ORIYA ORIYA LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC ORIYA ORIYA LETTER GA=DEVANAGARI LETTER GA+SEMANTIC ORIYA ORIYA LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC ORIYA ORIYA LETTER HA=DEVANAGARI LETTER HA+SEMANTIC ORIYA ORIYA LETTER JA=DEVANAGARI LETTER JA+SEMANTIC ORIYA ORIYA LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC ORIYA ORIYA LETTER KA=DEVANAGARI LETTER KA+SEMANTIC ORIYA ORIYA LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC ORIYA ORIYA LETTER LA=DEVANAGARI LETTER LA+SEMANTIC ORIYA ORIYA LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC ORIYA ORIYA LETTER MA=DEVANAGARI LETTER MA+SEMANTIC ORIYA ORIYA LETTER NA=DEVANAGARI LETTER NA+SEMANTIC ORIYA ORIYA LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC ORIYA ORIYA LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC ORIYA ORIYA LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC ORIYA ORIYA LETTER PA=DEVANAGARI LETTER PA+SEMANTIC ORIYA ORIYA LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC ORIYA ORIYA LETTER RA=DEVANAGARI LETTER RA+SEMANTIC ORIYA ORIYA LETTER SA=DEVANAGARI LETTER SA+SEMANTIC ORIYA ORIYA LETTER SHA=DEVANAGARI LETTER 
SHA+SEMANTIC ORIYA ORIYA LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC ORIYA ORIYA LETTER TA=DEVANAGARI LETTER TA+SEMANTIC ORIYA ORIYA LETTER THA=DEVANAGARI LETTER THA+SEMANTIC ORIYA ORIYA LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC ORIYA ORIYA LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC ORIYA ORIYA LETTER VOCALIC L=DEVANAGARI LETTER VOCALIC L+SEMANTIC ORIYA ORIYA LETTER VOCALIC LL=DEVANAGARI LETTER VOCALIC LL+SEMANTIC ORIYA ORIYA LETTER VOCALIC RR=DEVANAGARI LETTER VOCALIC RR+SEMANTIC ORIYA ORIYA LETTER YA=DEVANAGARI LETTER YA+SEMANTIC ORIYA ORIYA SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC ORIYA+POP DIRECTIONAL FORMATTING ORIYA SIGN AVAGRAHA=DEVANAGARI SIGN AVAGRAHA+SEMANTIC ORIYA ORIYA SIGN CANDRABINDU=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC ABOVE+DOT ABOVE+SEMANTIC ORIYA+POP DIRECTIONAL FORMATTING ORIYA SIGN NUKTA=SEMANTIC BELOW+START GROUP+DOT ABOVE+SEMANTIC ORIYA+POP DIRECTIONAL FORMATTING ORIYA SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC ORIYA ORIYA SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC ORIYA+POP DIRECTIONAL FORMATTING ORIYA VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC ORIYA ORIYA VOWEL SIGN E=DEVANAGARI VOWEL SIGN E+SEMANTIC ORIYA ORIYA VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC ORIYA ORIYA VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC ORIYA ORIYA VOWEL SIGN O=DEVANAGARI VOWEL SIGN O+SEMANTIC ORIYA ORIYA VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC ORIYA ORIYA VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC ORIYA ORIYA VOWEL SIGN VOCALIC R=DEVANAGARI VOWEL SIGN VOCALIC R+SEMANTIC ORIYA TAMIL AU LENGTH MARK=DEVANAGARI AU LENGTH MARK+SEMANTIC TAMIL TAMIL DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC TAMIL TAMIL DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC TAMIL TAMIL DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC TAMIL TAMIL DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC TAMIL TAMIL DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC TAMIL TAMIL DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC TAMIL TAMIL DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC TAMIL 
TAMIL DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC TAMIL TAMIL DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC TAMIL TAMIL LETTER A=DEVANAGARI LETTER A+SEMANTIC TAMIL TAMIL LETTER CA=DEVANAGARI LETTER CA+SEMANTIC TAMIL TAMIL LETTER HA=DEVANAGARI LETTER HA+SEMANTIC TAMIL TAMIL LETTER JA=DEVANAGARI LETTER JA+SEMANTIC TAMIL TAMIL LETTER KA=DEVANAGARI LETTER KA+SEMANTIC TAMIL TAMIL LETTER LA=DEVANAGARI LETTER LA+SEMANTIC TAMIL TAMIL LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC TAMIL TAMIL LETTER LLLA=DEVANAGARI LETTER LLLA+SEMANTIC TAMIL TAMIL LETTER MA=DEVANAGARI LETTER MA+SEMANTIC TAMIL TAMIL LETTER NA=DEVANAGARI LETTER NA+SEMANTIC TAMIL TAMIL LETTER NGA=DEVANAGARI LETTER NGA+SEMANTIC TAMIL TAMIL LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC TAMIL TAMIL LETTER NNNA=DEVANAGARI LETTER NNNA+SEMANTIC TAMIL TAMIL LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC TAMIL TAMIL LETTER PA=DEVANAGARI LETTER PA+SEMANTIC TAMIL TAMIL LETTER RA=DEVANAGARI LETTER RA+SEMANTIC TAMIL TAMIL LETTER RRA=DEVANAGARI LETTER RRA+SEMANTIC TAMIL TAMIL LETTER SA=DEVANAGARI LETTER SA+SEMANTIC TAMIL TAMIL LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC TAMIL TAMIL LETTER TA=DEVANAGARI LETTER TA+SEMANTIC TAMIL TAMIL LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC TAMIL TAMIL LETTER VA=DEVANAGARI LETTER VA+SEMANTIC TAMIL TAMIL LETTER YA=DEVANAGARI LETTER YA+SEMANTIC TAMIL TAMIL SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC TAMIL+POP DIRECTIONAL FORMATTING TAMIL SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC TAMIL TAMIL SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC TAMIL+POP DIRECTIONAL FORMATTING TAMIL VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC TAMIL TAMIL VOWEL SIGN AI=DEVANAGARI VOWEL SIGN AI+SEMANTIC TAMIL TAMIL VOWEL SIGN E=DEVANAGARI VOWEL SIGN SHORT E+SEMANTIC TAMIL TAMIL VOWEL SIGN EE=DEVANAGARI VOWEL SIGN E+SEMANTIC TAMIL TAMIL VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC TAMIL TAMIL VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC TAMIL TAMIL VOWEL SIGN O=DEVANAGARI VOWEL SIGN SHORT O+SEMANTIC TAMIL 
TAMIL VOWEL SIGN OO=DEVANAGARI VOWEL SIGN O+SEMANTIC TAMIL TAMIL VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC TAMIL TAMIL VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC TAMIL TELUGU AI LENGTH MARK=DEVANAGARI AI LENGTH MARK+SEMANTIC TELUGU TELUGU DIGIT EIGHT=DEVANAGARI DIGIT EIGHT+SEMANTIC TELUGU TELUGU DIGIT FIVE=DEVANAGARI DIGIT FIVE+SEMANTIC TELUGU TELUGU DIGIT FOUR=DEVANAGARI DIGIT FOUR+SEMANTIC TELUGU TELUGU DIGIT NINE=DEVANAGARI DIGIT NINE+SEMANTIC TELUGU TELUGU DIGIT ONE=DEVANAGARI DIGIT ONE+SEMANTIC TELUGU TELUGU DIGIT SEVEN=DEVANAGARI DIGIT SEVEN+SEMANTIC TELUGU TELUGU DIGIT SIX=DEVANAGARI DIGIT SIX+SEMANTIC TELUGU TELUGU DIGIT THREE=DEVANAGARI DIGIT THREE+SEMANTIC TELUGU TELUGU DIGIT TWO=DEVANAGARI DIGIT TWO+SEMANTIC TELUGU TELUGU DIGIT ZERO=DEVANAGARI DIGIT ZERO+SEMANTIC TELUGU TELUGU LENGTH MARK=DEVANAGARI LENGTH MARK+SEMANTIC TELUGU TELUGU LETTER A=DEVANAGARI LETTER A+SEMANTIC TELUGU TELUGU LETTER BA=DEVANAGARI LETTER BA+SEMANTIC TELUGU TELUGU LETTER BHA=DEVANAGARI LETTER BHA+SEMANTIC TELUGU TELUGU LETTER CA=DEVANAGARI LETTER CA+SEMANTIC TELUGU TELUGU LETTER CHA=DEVANAGARI LETTER CHA+SEMANTIC TELUGU TELUGU LETTER DA=DEVANAGARI LETTER DA+SEMANTIC TELUGU TELUGU LETTER DDA=DEVANAGARI LETTER DDA+SEMANTIC TELUGU TELUGU LETTER DDHA=DEVANAGARI LETTER DDHA+SEMANTIC TELUGU TELUGU LETTER DHA=DEVANAGARI LETTER DHA+SEMANTIC TELUGU TELUGU LETTER GA=DEVANAGARI LETTER GA+SEMANTIC TELUGU TELUGU LETTER GHA=DEVANAGARI LETTER GHA+SEMANTIC TELUGU TELUGU LETTER HA=DEVANAGARI LETTER HA+SEMANTIC TELUGU TELUGU LETTER JA=DEVANAGARI LETTER JA+SEMANTIC TELUGU TELUGU LETTER JHA=DEVANAGARI LETTER JHA+SEMANTIC TELUGU TELUGU LETTER KA=DEVANAGARI LETTER KA+SEMANTIC TELUGU TELUGU LETTER KHA=DEVANAGARI LETTER KHA+SEMANTIC TELUGU TELUGU LETTER LA=DEVANAGARI LETTER LA+SEMANTIC TELUGU TELUGU LETTER LLA=DEVANAGARI LETTER LLA+SEMANTIC TELUGU TELUGU LETTER MA=DEVANAGARI LETTER MA+SEMANTIC TELUGU TELUGU LETTER NA=DEVANAGARI LETTER NA+SEMANTIC TELUGU TELUGU LETTER NGA=DEVANAGARI LETTER 
NGA+SEMANTIC TELUGU TELUGU LETTER NNA=DEVANAGARI LETTER NNA+SEMANTIC TELUGU TELUGU LETTER NYA=DEVANAGARI LETTER NYA+SEMANTIC TELUGU TELUGU LETTER PA=DEVANAGARI LETTER PA+SEMANTIC TELUGU TELUGU LETTER PHA=DEVANAGARI LETTER PHA+SEMANTIC TELUGU TELUGU LETTER RA=DEVANAGARI LETTER RA+SEMANTIC TELUGU TELUGU LETTER RRA=DEVANAGARI LETTER RRA+SEMANTIC TELUGU TELUGU LETTER SA=DEVANAGARI LETTER SA+SEMANTIC TELUGU TELUGU LETTER SHA=DEVANAGARI LETTER SHA+SEMANTIC TELUGU TELUGU LETTER SSA=DEVANAGARI LETTER SSA+SEMANTIC TELUGU TELUGU LETTER TA=DEVANAGARI LETTER TA+SEMANTIC TELUGU TELUGU LETTER THA=DEVANAGARI LETTER THA+SEMANTIC TELUGU TELUGU LETTER TTA=DEVANAGARI LETTER TTA+SEMANTIC TELUGU TELUGU LETTER TTHA=DEVANAGARI LETTER TTHA+SEMANTIC TELUGU TELUGU LETTER VA=DEVANAGARI LETTER VA+SEMANTIC TELUGU TELUGU LETTER VOCALIC L=DEVANAGARI LETTER VOCALIC L+SEMANTIC TELUGU TELUGU LETTER VOCALIC LL=DEVANAGARI LETTER VOCALIC LL+SEMANTIC TELUGU TELUGU LETTER YA=DEVANAGARI LETTER YA+SEMANTIC TELUGU TELUGU SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC TELUGU+POP DIRECTIONAL FORMATTING TELUGU SIGN CANDRABINDU=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC ABOVE+DOT ABOVE+SEMANTIC TELUGU+POP DIRECTIONAL FORMATTING TELUGU SIGN VIRAMA=DEVANAGARI SIGN VIRAMA+SEMANTIC TELUGU TELUGU SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC TELUGU+POP DIRECTIONAL FORMATTING TELUGU VOWEL SIGN AA=DEVANAGARI VOWEL SIGN AA+SEMANTIC TELUGU TELUGU VOWEL SIGN AU=DEVANAGARI VOWEL SIGN AU+SEMANTIC TELUGU TELUGU VOWEL SIGN E=DEVANAGARI VOWEL SIGN SHORT E+SEMANTIC TELUGU TELUGU VOWEL SIGN EE=DEVANAGARI VOWEL SIGN E+SEMANTIC TELUGU TELUGU VOWEL SIGN I=DEVANAGARI VOWEL SIGN I+SEMANTIC TELUGU TELUGU VOWEL SIGN II=DEVANAGARI VOWEL SIGN II+SEMANTIC TELUGU TELUGU VOWEL SIGN O=DEVANAGARI VOWEL SIGN SHORT O+SEMANTIC TELUGU TELUGU VOWEL SIGN OO=DEVANAGARI VOWEL SIGN O+SEMANTIC TELUGU TELUGU VOWEL SIGN U=DEVANAGARI VOWEL SIGN U+SEMANTIC TELUGU TELUGU VOWEL SIGN UU=DEVANAGARI VOWEL SIGN UU+SEMANTIC TELUGU TELUGU 
VOWEL SIGN VOCALIC R=DEVANAGARI VOWEL SIGN VOCALIC R+SEMANTIC TELUGU TELUGU VOWEL SIGN VOCALIC RR=DEVANAGARI VOWEL SIGN VOCALIC RR+SEMANTIC TELUGU
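Most entries in the list above follow one rule: swap the script name for DEVANAGARI and append SEMANTIC plus the script name. The following is a minimal Python sketch of that rule only. The names `INDIC_SCRIPTS`, `regular_decomposition` and `format_entry` are invented for illustration, and the sketch deliberately ignores the irregular entries (for example the signs built with START GROUP, or GURMUKHI VOWEL SIGN EE, which the list maps to DEVANAGARI VOWEL SIGN E); those would need an explicit exception table.

```python
# Scripts that the list above decomposes onto Devanagari.
INDIC_SCRIPTS = (
    "BENGALI", "GUJARATI", "GURMUKHI", "KANNADA",
    "MALAYALAM", "ORIYA", "TAMIL", "TELUGU",
)


def regular_decomposition(name):
    """Return [Devanagari counterpart, SEMANTIC script] for a character name.

    Returns None when the name does not start with one of INDIC_SCRIPTS.
    Only the regular pattern is modelled; irregular entries are not.
    """
    script, _, rest = name.partition(" ")
    if script not in INDIC_SCRIPTS or not rest:
        return None
    return ["DEVANAGARI " + rest, "SEMANTIC " + script]


def format_entry(name):
    """Render a decomposition in the NAME=PART+PART notation used in the list."""
    parts = regular_decomposition(name)
    if parts is None:
        return None
    return name + "=" + "+".join(parts)


if __name__ == "__main__":
    print(format_entry("BENGALI LETTER KA"))
    print(format_entry("TELUGU DIGIT ZERO"))
```

Generating the regular entries from a rule like this, and keeping only the exceptions as data, is one way the list could be checked for consistency.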
Some accents are present in other parts of Unicode, and (provided the script suggestion is taken seriously) can be represented as follows:
DEVANAGARI ACUTE ACCENT=SEMANTIC ABOVE+START GROUP+ACUTE ACCENT+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI DANDA=VERTICAL LINE+SEMANTIC DEVANAGARI
DEVANAGARI DOUBLE DANDA=DOUBLE VERTICAL LINE+SEMANTIC DEVANAGARI
DEVANAGARI GRAVE ACCENT=SEMANTIC ABOVE+START GROUP+GRAVE ACCENT+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI SIGN ANUSVARA=SEMANTIC ABOVE+START GROUP+DOT ABOVE+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI SIGN CANDRABINDU=SEMANTIC ABOVE+START GROUP+BREVE+SEMANTIC ABOVE+DOT ABOVE+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI SIGN NUKTA=SEMANTIC BELOW+START GROUP+DOT ABOVE+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI SIGN VISARGA=SEMANTIC AFTER+START GROUP+COLON+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI STRESS SIGN ANUDATTA=SEMANTIC BELOW+START GROUP+MACRON+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
DEVANAGARI STRESS SIGN UDATTA=SEMANTIC ABOVE+START GROUP+MODIFIER LETTER VERTICAL LINE+SEMANTIC DEVANAGARI+POP DIRECTIONAL FORMATTING
There is a similar relationship between the Thai and Lao scripts. Again, we provide both SEMANTIC THAI and SEMANTIC LAO, even though only one of them is used here, so that automatic transliteration can be done in either direction.
LAO CANCELLATION MARK=THAI CHARACTER THANTHAKHAT+SEMANTIC LAO
LAO DIGIT EIGHT=THAI DIGIT EIGHT+SEMANTIC LAO
LAO DIGIT FIVE=THAI DIGIT FIVE+SEMANTIC LAO
LAO DIGIT FOUR=THAI DIGIT FOUR+SEMANTIC LAO
LAO DIGIT NINE=THAI DIGIT NINE+SEMANTIC LAO
LAO DIGIT ONE=THAI DIGIT ONE+SEMANTIC LAO
LAO DIGIT SEVEN=THAI DIGIT SEVEN+SEMANTIC LAO
LAO DIGIT SIX=THAI DIGIT SIX+SEMANTIC LAO
LAO DIGIT THREE=THAI DIGIT THREE+SEMANTIC LAO
LAO DIGIT TWO=THAI DIGIT TWO+SEMANTIC LAO
LAO DIGIT ZERO=THAI DIGIT ZERO+SEMANTIC LAO
LAO ELLIPSIS=THAI CHARACTER PAIYANNOI+SEMANTIC LAO
LAO KO LA=THAI CHARACTER MAIYAMOK+SEMANTIC LAO
LAO LETTER BO=THAI CHARACTER BO BAIMAI+SEMANTIC LAO
LAO LETTER CO=THAI CHARACTER CHO CHAN+SEMANTIC LAO
LAO LETTER DO=THAI CHARACTER DO DEK+SEMANTIC LAO
LAO LETTER FO SUNG=THAI CHARACTER FO FAN+SEMANTIC LAO
LAO LETTER FO TAM=THAI CHARACTER FO FA+SEMANTIC LAO
LAO LETTER HO SUNG=THAI CHARACTER HO HIP+SEMANTIC LAO
LAO LETTER HO TAM=THAI CHARACTER HO NOKHUK+SEMANTIC LAO
LAO LETTER KHO SUNG=THAI CHARACTER KHO KHAI+SEMANTIC LAO
LAO LETTER KHO TAM=THAI CHARACTER KHO KHWAI+SEMANTIC LAO
LAO LETTER KO=THAI CHARACTER KO KAI+SEMANTIC LAO
LAO LETTER LO LING=THAI CHARACTER RO RUA+SEMANTIC LAO
LAO LETTER LO LOOT=THAI CHARACTER LO LING+SEMANTIC LAO
LAO LETTER MO=THAI CHARACTER MO MA+SEMANTIC LAO
LAO LETTER NGO=THAI CHARACTER NGO NGU+SEMANTIC LAO
LAO LETTER NO=THAI CHARACTER NO NU+SEMANTIC LAO
LAO LETTER NYO=THAI CHARACTER YO YING+SEMANTIC LAO
LAO LETTER O=THAI CHARACTER O ANG+SEMANTIC LAO
LAO LETTER PHO SUNG=THAI CHARACTER PHO PHUNG+SEMANTIC LAO
LAO LETTER PHO TAM=THAI CHARACTER PHO PHAN+SEMANTIC LAO
LAO LETTER PO=THAI CHARACTER PO PLA+SEMANTIC LAO
LAO LETTER SO SUNG=THAI CHARACTER SO SUA+SEMANTIC LAO
LAO LETTER SO TAM=THAI CHARACTER CHO CHANG+SEMANTIC LAO
LAO LETTER THO SUNG=THAI CHARACTER THO THUNG+SEMANTIC LAO
LAO LETTER THO TAM=THAI CHARACTER THO THAHAN+SEMANTIC LAO
LAO LETTER TO=THAI CHARACTER TO TAO+SEMANTIC LAO
LAO LETTER WO=THAI CHARACTER WO WAEN+SEMANTIC LAO
LAO LETTER YO=THAI CHARACTER YO YAK+SEMANTIC LAO
LAO NIGGAHITA=THAI CHARACTER NIKHAHIT+SEMANTIC LAO
LAO TONE MAI CATAWA=THAI CHARACTER MAI CHATTAWA+SEMANTIC LAO
LAO TONE MAI EK=THAI CHARACTER MAI EK+SEMANTIC LAO
LAO TONE MAI THO=THAI CHARACTER MAI THO+SEMANTIC LAO
LAO TONE MAI TI=THAI CHARACTER MAI TRI+SEMANTIC LAO
LAO VOWEL SIGN A=THAI CHARACTER SARA A+SEMANTIC LAO
LAO VOWEL SIGN AA=THAI CHARACTER SARA AA+SEMANTIC LAO
LAO VOWEL SIGN AI=THAI CHARACTER SARA AI MAIMALAI+SEMANTIC LAO
LAO VOWEL SIGN AY=THAI CHARACTER SARA AI MAIMUAN+SEMANTIC LAO
LAO VOWEL SIGN E=THAI CHARACTER SARA E+SEMANTIC LAO
LAO VOWEL SIGN EI=THAI CHARACTER SARA AE+SEMANTIC LAO
LAO VOWEL SIGN I=THAI CHARACTER SARA I+SEMANTIC LAO
LAO VOWEL SIGN II=THAI CHARACTER SARA II+SEMANTIC LAO
LAO VOWEL SIGN MAI KAN=THAI CHARACTER MAI HAN-AKAT+SEMANTIC LAO
LAO VOWEL SIGN O=THAI CHARACTER SARA O+SEMANTIC LAO
LAO VOWEL SIGN U=THAI CHARACTER SARA U+SEMANTIC LAO
LAO VOWEL SIGN UU=THAI CHARACTER SARA UU+SEMANTIC LAO
LAO VOWEL SIGN Y=THAI CHARACTER SARA UE+SEMANTIC LAO
LAO VOWEL SIGN YY=THAI CHARACTER SARA UEE+SEMANTIC LAO
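To illustrate how the Lao table above could drive transliteration in either direction, here is a minimal Python sketch. It works on character names rather than code points, uses only three entries from the table, and the function names are invented for illustration. It does not model SEMANTIC THAI; it simply assumes a Thai name with no Lao counterpart is passed through unchanged.

```python
# Excerpt of the Lao-to-Thai table above (three entries only).
LAO_TO_THAI = {
    "LAO LETTER KO": "THAI CHARACTER KO KAI",
    "LAO LETTER MO": "THAI CHARACTER MO MA",
    "LAO VOWEL SIGN AA": "THAI CHARACTER SARA AA",
}
# Inverse table, built mechanically so the two directions cannot drift apart.
THAI_TO_LAO = {thai: lao for lao, thai in LAO_TO_THAI.items()}


def decompose(names):
    """Expand each Lao name into its Thai base followed by SEMANTIC LAO."""
    out = []
    for name in names:
        if name in LAO_TO_THAI:
            out.extend([LAO_TO_THAI[name], "SEMANTIC LAO"])
        else:
            out.append(name)
    return out


def lao_to_thai(names):
    """Transliterate Lao to Thai: decompose, then drop the SEMANTIC LAO markers."""
    return [n for n in decompose(names) if n != "SEMANTIC LAO"]


def thai_to_lao(names):
    """Transliterate Thai to Lao for every name that has a Lao counterpart."""
    return [THAI_TO_LAO.get(n, n) for n in names]


if __name__ == "__main__":
    lao = ["LAO LETTER KO", "LAO VOWEL SIGN AA"]
    thai = lao_to_thai(lao)
    print(thai)
    print(thai_to_lao(thai))
```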
Also,
THAI CHARACTER SARA AM=ZERO WIDTH NO-BREAK SPACE+THAI CHARACTER NIKHAHIT+THAI CHARACTER SARA AA
LAO VOWEL SIGN AM=ZERO WIDTH NO-BREAK SPACE+LAO NIGGAHITA+LAO VOWEL SIGN AA
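As a rough illustration of how the Lao rows could be applied mechanically, here is a minimal Python sketch. It copies only four rows of the table, and it stands in for SEMANTIC LAO with a private-use code point (U+E000), which is purely my assumption: the proposed character has no code point. The names `SEMANTIC_LAO`, `LAO_TO_THAI` and `decompose` are likewise hypothetical.

```python
import unicodedata

# Hypothetical stand-in for the proposed SEMANTIC LAO character. It has no
# real code point, so a private-use one is borrowed for illustration only.
SEMANTIC_LAO = "\uE000"

# Four rows copied from the table (Lao character name -> Thai character name).
LAO_TO_THAI = {
    "LAO LETTER KO": "THAI CHARACTER KO KAI",
    "LAO LETTER MO": "THAI CHARACTER MO MA",
    "LAO DIGIT ONE": "THAI DIGIT ONE",
    "LAO VOWEL SIGN AA": "THAI CHARACTER SARA AA",
}


def decompose(text):
    """Replace each listed Lao character by its Thai counterpart + SEMANTIC LAO."""
    table = {
        unicodedata.lookup(lao): unicodedata.lookup(thai) + SEMANTIC_LAO
        for lao, thai in LAO_TO_THAI.items()
    }
    # Characters outside the table pass through unchanged.
    return "".join(table.get(ch, ch) for ch in text)
```

Because every row has the same shape, the whole table reduces to a name-to-name mapping plus one marker character.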
There is a lot of structure left in the Hangul Jamo block, even after factoring out the compatibility breakdowns and the glyph variants already treated. The following 17 glyphs are encoded twice each (16 consonants once as CHOSEONG and once as JONGSEONG, plus the filler once as CHOSEONG and once as JUNGSEONG), but they are visually identical. Maybe they need a helpful ``SEMANTIC´´ marker to say what´s happened---for sorting purposes, maybe?---but I don´t know enough to say.
HANGUL JONGSEONG KIYEOK=HANGUL CHOSEONG KIYEOK
HANGUL JONGSEONG NIEUN=HANGUL CHOSEONG NIEUN
HANGUL JONGSEONG TIKEUT=HANGUL CHOSEONG TIKEUT
HANGUL JONGSEONG RIEUL=HANGUL CHOSEONG RIEUL
HANGUL JONGSEONG MIEUM=HANGUL CHOSEONG MIEUM
HANGUL JONGSEONG PIEUP=HANGUL CHOSEONG PIEUP
HANGUL JONGSEONG SIOS=HANGUL CHOSEONG SIOS
HANGUL JONGSEONG IEUNG=HANGUL CHOSEONG IEUNG
HANGUL JONGSEONG CIEUC=HANGUL CHOSEONG CIEUC
HANGUL JONGSEONG CHIEUCH=HANGUL CHOSEONG CHIEUCH
HANGUL JONGSEONG KHIEUKH=HANGUL CHOSEONG KHIEUKH
HANGUL JONGSEONG THIEUTH=HANGUL CHOSEONG THIEUTH
HANGUL JONGSEONG PHIEUPH=HANGUL CHOSEONG PHIEUPH
HANGUL JONGSEONG HIEUH=HANGUL CHOSEONG HIEUH
HANGUL JONGSEONG YESIEUNG=HANGUL CHOSEONG YESIEUNG
HANGUL JONGSEONG YEORINHIEUH=HANGUL CHOSEONG YEORINHIEUH
HANGUL JUNGSEONG FILLER=HANGUL CHOSEONG FILLER
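The pairing is regular enough to be generated from the character names alone. The following Python sketch builds it with the standard `unicodedata` module; the helper name `jamo_pairs` and the list `SHARED_NAMES` are mine, not part of any standard, and the sketch says nothing about whether a ``SEMANTIC´´ marker is needed.

```python
import unicodedata

# Consonant names that occur both as syllable-initial (CHOSEONG) and as
# syllable-final (JONGSEONG) jamo in the list above.
SHARED_NAMES = [
    "KIYEOK", "NIEUN", "TIKEUT", "RIEUL", "MIEUM", "PIEUP", "SIOS", "IEUNG",
    "CIEUC", "CHIEUCH", "KHIEUKH", "THIEUTH", "PHIEUPH", "HIEUH",
    "YESIEUNG", "YEORINHIEUH",
]


def jamo_pairs():
    """Map each duplicated jamo to the CHOSEONG character it repeats."""
    pairs = {}
    for name in SHARED_NAMES:
        final = unicodedata.lookup("HANGUL JONGSEONG " + name)
        initial = unicodedata.lookup("HANGUL CHOSEONG " + name)
        pairs[final] = initial
    # The filler is the one entry that pairs JUNGSEONG with CHOSEONG.
    pairs[unicodedata.lookup("HANGUL JUNGSEONG FILLER")] = unicodedata.lookup(
        "HANGUL CHOSEONG FILLER"
    )
    return pairs
```

If any of the names were misspelt, `unicodedata.lookup` would raise `KeyError`, so simply building the table doubles as a check that all 17 pairs exist.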
In the sequence
LATIN CAPITAL LETTER A
SEMANTIC LIGATURE
LATIN CAPITAL LETTER E
SEMANTIC ITALIC
does the effect of the ``italic´´ include both of the characters, or only one?
We answer all questions of this type by giving a formal syntax of the relationships between the characters.
This syntax is not intended to be prescriptive---any sequence of characters can be considered ``valid´´, in some sense---but clearly, the introduction of characters which affect the others around them leads to some questions over how far the effects travel.
Define a binary character BIN as one of the 6 characters FRACTION SLASH, SEMANTIC OVERPRINT, SEMANTIC ABOVE, SEMANTIC AFTER, SEMANTIC BEFORE and SEMANTIC BELOW. (These can be given a new combining class---say 999---to indicate their special status.) Define a unary character UN as any of the existing combining characters, recognised by having a combining class > 0 but < 999. Define an opener SG as any of the characters START GROUP, LEFT-TO-RIGHT EMBEDDING, RIGHT-TO-LEFT EMBEDDING, LEFT-TO-RIGHT OVERRIDE and RIGHT-TO-LEFT OVERRIDE, each of which starts a new level of grouping, and a closer PDF as the character POP DIRECTIONAL FORMATTING. Define a base character BASE as anything else.
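These four definitions can be turned into a small classifier. The Python sketch below works on character names rather than code points, because the SEMANTIC characters and START GROUP are proposals with no code points. The function name `classify` is hypothetical. Returning `None` for a name that `unicodedata` does not know is my own choice: the combining class of such a proposed character cannot be looked up.

```python
import unicodedata

# The six binary characters, exactly as listed in the definition above.
BIN_NAMES = {
    "FRACTION SLASH", "SEMANTIC OVERPRINT", "SEMANTIC ABOVE",
    "SEMANTIC AFTER", "SEMANTIC BEFORE", "SEMANTIC BELOW",
}

# The five openers; START GROUP is a proposed character, the rest exist today.
SG_NAMES = {
    "START GROUP", "LEFT-TO-RIGHT EMBEDDING", "RIGHT-TO-LEFT EMBEDDING",
    "LEFT-TO-RIGHT OVERRIDE", "RIGHT-TO-LEFT OVERRIDE",
}


def classify(name):
    """Return 'BIN', 'UN', 'SG', 'PDF' or 'BASE' for a character name.

    Returns None for a name unicodedata does not know (an assumption of
    this sketch, since its combining class is unavailable).
    """
    if name in BIN_NAMES:
        return "BIN"
    if name in SG_NAMES:
        return "SG"
    if name == "POP DIRECTIONAL FORMATTING":
        return "PDF"
    try:
        ch = unicodedata.lookup(name)
    except KeyError:
        return None
    # Existing combining characters have a non-zero combining class; the
    # suggested class 999 for binary characters is handled by name above.
    return "UN" if unicodedata.combining(ch) > 0 else "BASE"
```

For example, `classify("COMBINING ACUTE ACCENT")` gives `"UN"` and `classify("LATIN CAPITAL LETTER A")` gives `"BASE"`.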
Then we can define a list of characters, list, as follows.
list            = empty
                | primary list
primary         = secondary
                | secondary unary-primary
secondary       = BASE
                | SG list PDF
unary-primary   = unary-secondary
                | unary-secondary unary-primary
unary-secondary = BIN secondary
                | UN
binary-primary  = BIN
which means we parse the example above as
list
  primary
    secondary
      BASE
        LATIN CAPITAL LETTER A
    unary-primary
      unary-secondary
        BIN
          SEMANTIC LIGATURE
        secondary
          BASE
            LATIN CAPITAL LETTER E
      unary-primary
        unary-secondary
          UN
            SEMANTIC ITALIC
  list
    empty
and the answer to the question is that the italic applies to the whole ligature. For a ligature of roman A with italic E, you would write
LATIN CAPITAL LETTER A
SEMANTIC LIGATURE
START GROUP
LATIN CAPITAL LETTER E
SEMANTIC ITALIC
POP DIRECTIONAL FORMATTING
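Both example sequences can be checked with a short recursive-descent parser. The Python sketch below takes tokens that are already tagged as (kind, name) pairs, with the kinds taken from the parse shown above. It does not return the literal parse tree. Instead it returns a nested ``application´´ structure in which each modifier wraps everything read so far in the same primary; that is my reading of how the effects travel. The function names and the tuple layout are hypothetical, and raising an error on an unparsable sequence is a choice of the sketch, not a claim that such sequences are invalid.

```python
def parse(tokens):
    """Parse a whole token sequence as a `list`; return its primaries."""
    items, pos = parse_list(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos][0]} at position {pos}")
    return items


def parse_list(tokens, pos):
    # list = empty | primary list
    items = []
    while pos < len(tokens) and tokens[pos][0] in ("BASE", "SG"):
        node, pos = parse_primary(tokens, pos)
        items.append(node)
    return items, pos


def parse_primary(tokens, pos):
    # primary = secondary | secondary unary-primary
    node, pos = parse_secondary(tokens, pos)
    while pos < len(tokens) and tokens[pos][0] in ("BIN", "UN"):
        kind, name = tokens[pos]
        if kind == "UN":
            # unary-secondary = UN: wrap everything accumulated so far.
            node = (name, node)
            pos += 1
        else:
            # unary-secondary = BIN secondary: join with the next secondary.
            right, pos = parse_secondary(tokens, pos + 1)
            node = (name, node, right)
    return node, pos


def parse_secondary(tokens, pos):
    # secondary = BASE | SG list PDF
    if pos >= len(tokens):
        raise ValueError("expected BASE or SG at end of input")
    kind, name = tokens[pos]
    if kind == "BASE":
        return name, pos + 1
    if kind == "SG":
        items, pos = parse_list(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos][0] != "PDF":
            raise ValueError("group not closed by PDF")
        return ("GROUP", items), pos + 1
    raise ValueError(f"expected BASE or SG, got {kind}")


A = ("BASE", "LATIN CAPITAL LETTER A")
E = ("BASE", "LATIN CAPITAL LETTER E")
LIG = ("BIN", "SEMANTIC LIGATURE")
ITALIC = ("UN", "SEMANTIC ITALIC")
SG = ("SG", "START GROUP")
PDF = ("PDF", "POP DIRECTIONAL FORMATTING")

whole_ligature_italic = parse([A, LIG, E, ITALIC])
only_e_italic = parse([A, LIG, SG, E, ITALIC, PDF])
```

In the first result SEMANTIC ITALIC is the outermost wrapper, around the whole ligature. In the second it sits inside the group, around LATIN CAPITAL LETTER E alone.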
The definitions of unary-primary and binary-primary are also useful because we can give this rule for decompositions: the decomposition of a base character must be a primary; of a unary character, a unary-primary; and of a binary character, a binary-primary. (Actually, there are none of the last.) This assures us that decompositions will behave syntactically in the same way regardless of whether they are treated as a single character or a character list.
The reason for having a completely left-associative grammar is to allow a renderer to render ``as much as it has´´ at any time. This is obviously important.
This note considers the 6588 letters and symbols encoded in Unicode 2.1, as well as CJK ideographs, and attempts to find and encode as much structure as possible. The result is a list of characters which can be considered ``primitive´´ in some sense. It turns out that there are only 1419 of these.