Family Tree Notation

Jonathan Coxhead, 2001-09-24

This notation was designed to let me describe what information I have about my family in a convenient textual form that can be easily read and written by human beings, but that can also be parsed by a simple machine grammar and used to generate data files for any of the popular genealogy programs.

It is applied in 3 documents: my family tree, a family tree of the Greek gods (where it gets a thorough stress-test), and a generic family tree, which (if you can interpret it) defines all those curious genealogical terms such as 2nd cousin, cousin once removed, ½-cousin, double cousin and stepcousin, as well as other (non-cousin!) relationships.

The notation is designed to be compact. A family tree is usually drawn the way you see it in history books, like this:

            a
            |
      /-----+-----\
      |     |     |
      b     c     d
            |
         /--+--\
         |     |
         e     f
         |
         |
         g

Instead, we use indentation to convey the same information, like this:

      a
         b
         c
            e
               g
            f
         d

So, each couple’s offspring are listed in a vertical column indented below them, in age order where known, interspersed with other “events” (marriage, divorce and name changes). Sets of twins (triplets, ...) are separated by &. This list of events forms a “history”.
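The indentation rule can be implemented with a stack of currently open ancestors. The following Python sketch is only an illustration, not part of the notation: the names ‘Node’ and ‘parse_layout’ are hypothetical, indentation is assumed to use spaces only (no tabs), and each line is kept as raw text, so twins joined by & stay on one node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One line of a layout tree and the lines indented beneath it."""
    text: str
    children: list = field(default_factory=list)


def parse_layout(source: str) -> list:
    """Turn an indented tree into a list of root Nodes (spaces only, no tabs)."""
    roots = []
    stack = []  # (indent, node) pairs for the chain of currently open ancestors
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no structure
        indent = len(line) - len(line.lstrip(" "))
        node = Node(line.strip())
        # Close every open node that is not strictly less indented than this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        (stack[-1][1].children if stack else roots).append(node)
        stack.append((indent, node))
    return roots


tree = parse_layout("""
      a
         b
         c
            e
               g
            f
         d
""")
print([child.text for child in tree[0].children])              # ['b', 'c', 'd']
print([child.text for child in tree[0].children[1].children])  # ['e', 'f']
```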

<= signifies marriage (or equivalent) in a male history. This symbol is used only for a socially approved monogamous union between 2 people of different genders; other variants of union have their own symbols.

=> signifies marriage in a female history.

In a document containing multiple family trees, if both sides of a marriage (‘m <= f’ and ‘f => m’) are listed, children are noted on the male side, in accordance with convention. This should not be interpreted as indicating any support of the patrilineal system on my part, or indeed of any particular system of inheritance at all.

<- signifies a liaison not recognised as “marriage” by society: perhaps an “affaire du coeur” or a “fling”, or surrogate motherhood. (Obviously, it all depends on which society you’re talking about. Basically, we’re describing extra-marital sex.) This is the symbol for use in a male history.

-> signifies extra-marital sex in a female history. When <- and -> are used, the offspring are normally noted on the woman’s side, the opposite of the convention for <= and =>. This convention exists mainly for reasons of verifiability. (The other reason is one word long: ‘Zeus’.)

The symbols <=, =>, <- and -> each establish a default parent. In the history of a person, their offspring are interspersed with marriages and other liaisons. The parents of a given child are the person being described and the person most recently named in a <=, =>, <- or -> line. The symbol # signifies divorce: the parents of any subsequent children are the person being described and their partner in the most recent liaison other than marriage. The symbol ~ on its own marks the end of a liaison: the parents of subsequent children are the participants in the most recent marriage, ignoring any intervening liaisons.

So, in the tree

      a <= b
         <- c
         d
         e

the parents of d and e are a and c. (This may have further social implications for them, e.g., illegitimacy.) But in

      a <= b
         <- c
         d
         ~
         e

the parents of d are the same, but e has parents a and b (because they are married, and the liaison with c is over).

Divorce is not noted if immediately followed by remarriage. The end of a liaison is not noted if immediately followed by a new liaison, or if it would be the last item in a history. (This assumes that it is not legally possible to be married to 2 people at the same time. Extending the notation to polygamous and/or polyandrous societies requires new symbols: <+ meaning ‘takes a new wife in addition to existing wives’, +> meaning ‘takes a new husband in addition to existing husbands’, and an extension to # that allows it to be followed by the name of the spouse being divorced. If it’s necessary to distinguish the genetic parents of different children within a polygamous marriage, <- and -> can be used to set the default parent to the correct spouse before the child is noted:

      a <= b
         <+ c
         <- b
         d

Here, a marries b then also c, but the first child d is by b. <+ and +> both set the default parent.)

If a child is noted when no partner is available (for example, if the only parent is a virgin, or a divorcé with no lover, or the default parent is the same sex as the other parent), the child is miraculous. In a mundane family tree, this situation is essentially a syntax error.

= followed by a name marks a name change to the given name, or, if no name is given, a name change to (or retention of) the name given at birth. Where a wife changes her surname to her husband’s, this is not marked; if she keeps her own name, that is marked with =. Again, this is a purely conventional choice for concision, not a reflection of any belief of mine about “the right thing to do”.

When followed by a name, ~ has a different meaning: it signifies a same-sex union. We don’t distinguish between social and non-social same-sex unions. Such a union does not normally produce children, but ~ does establish a new default parent, so the notation doesn’t rule it out. (This is primarily for mythological reasons.) In real life, presumably some sort of surrogacy would be used:

      a ~ b
         <- c
         d

could be used to show the birth of child d to same-sex parents a, b with help from surrogate mother c.
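The default-parent rules above are mechanical enough to check by program. The Python sketch below is one possible reading of them, not a reference implementation. The name ‘parents_of_children’ is hypothetical; a history is given as a flat list of event lines without dates, each child line holds only names (the child’s own sub-history is omitted); # followed by a name is not supported; and counting ‘~ name’ as filling the marriage slot is an assumption. It keeps three slots: the current default parent, the most recent spouse and the most recent non-marital partner.

```python
MARRIAGE = {"<=", "=>", "<+", "+>"}
LIAISON = {"<-", "->"}


def parents_of_children(person, history):
    """Map each child in `person`'s history to its pair of parents.

    A partner of None means no partner was available: a "miraculous" child.
    """
    spouse = None    # partner in the most recent marriage still in force
    lover = None     # partner in the most recent liaison still in force
    current = None   # the default parent for the next child
    parents = {}
    for event in history:
        op, _, rest = event.strip().partition(" ")
        rest = rest.strip()
        if op in MARRIAGE:
            spouse = current = rest
        elif op in LIAISON:
            lover = current = rest
        elif op == "#":              # divorce: fall back to the latest liaison
            spouse, current = None, lover
        elif op == "~" and rest:     # same-sex union (assumed to count as a marriage)
            spouse = current = rest
        elif op == "~":              # end of liaison: fall back to the latest marriage
            lover, current = None, spouse
        elif op in {"=", ">>"}:      # name change / adopted away: no effect on parentage
            continue
        else:
            # A birth, or an adoption (<<), possibly twins joined by &.
            group = rest if op == "<<" else event
            for child in group.split("&"):
                parents[child.strip()] = (person, current)
    return parents


print(parents_of_children("a", ["<= b", "<- c", "d", "e"]))
# {'d': ('a', 'c'), 'e': ('a', 'c')}
print(parents_of_children("a", ["<= b", "<- c", "d", "~", "e"]))
# {'d': ('a', 'c'), 'e': ('a', 'b')}
```

The same function gives d the parents a and b for the polygamous example (<= b, <+ c, <- b, d) and a and c for the surrogacy example (~ b, <- c, d).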

If the same name occurs in different places, it is always assumed to refer to the same person. This is how we cross-reference from 1 family tree to another. If 2 different people have the same name, they are distinguished by a suffix preceded by the symbol ^. The special name ______ indicates a generic unknown person, not the same person wherever it occurs. But ______^1, ______^2, etc. each denote one particular person wherever they occur.
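A program that reads several trees needs exactly this identity rule to join them. A minimal Python sketch, using the hypothetical class name ‘PersonIndex’: every name maps to one identifier, except the bare unknown ______, which gets a fresh identifier at each occurrence. Names are compared as exact strings, so this sketch does not treat ^1 and ¹ as the same suffix.

```python
import itertools

UNKNOWN = "______"


class PersonIndex:
    """Assign one identifier per person across all the trees being read."""

    def __init__(self):
        self._ids = {}                     # name -> identifier
        self._counter = itertools.count()  # shared, so unknowns never collide with names

    def lookup(self, name: str) -> int:
        name = name.strip()
        if name == UNKNOWN:
            # A bare ______ is a generic unknown: a different person every time.
            return next(self._counter)
        # Any other name, including ______^1 or a suffixed name such as Ajax^2,
        # denotes the same person wherever it occurs.
        if name not in self._ids:
            self._ids[name] = next(self._counter)
        return self._ids[name]


people = PersonIndex()
print(people.lookup("Zeus") == people.lookup("Zeus"))          # True
print(people.lookup("______") == people.lookup("______"))      # False
print(people.lookup("______^1") == people.lookup("______^1"))  # True
```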

Nicknames or use-names are enclosed in “”, pronunciations in //. If a person is known by one of their names other than the first, that name is marked with *. Comments are enclosed in (). + marks an early death.
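As an illustration of how these markers can be separated from the name, here is a Python sketch using regular expressions. The function name ‘annotations’ is hypothetical. It assumes comments do not nest, nicknames use the curly quotes “ ”, and + appears as a separate word; it does not try to locate the * marker.

```python
import re

COMMENT = re.compile(r"\(([^()]*)\)")
NICKNAME = re.compile(r"“([^”]*)”")
PRONUNCIATION = re.compile(r"/([^/]*)/")


def annotations(line: str) -> dict:
    """Split the bracketed annotations out of one person line."""
    # Remove comments first, since they may contain characters such as '/'.
    comments = COMMENT.findall(line)
    rest = COMMENT.sub(" ", line)
    nicknames = NICKNAME.findall(rest)
    rest = NICKNAME.sub(" ", rest)
    pronunciations = PRONUNCIATION.findall(rest)
    rest = PRONUNCIATION.sub(" ", rest)
    return {
        "comments": comments,
        "nicknames": nicknames,
        "pronunciations": pronunciations,
        "early_death": "+" in rest.split(),
    }


print(annotations("a /ay/ “Al” + (a comment / with a slash)"))
# {'comments': ['a comment / with a slash'], 'nicknames': ['Al'], 'pronunciations': ['ay'], 'early_death': True}
```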

Formal Syntax

In the formal syntax, we use ‘next’ to separate items in a list, and ‘in’ and ‘out’ to signify nesting. There are 2 choices for their concrete representation:

with layout, as above:
next is end of line, in is increase indentation, out is decrease indentation;
or with characters:
next is ‘;’, in is ‘{’, out is ‘}’.

The example tree above has already been shown using layout. When represented with characters, it looks like this:

      a
      {
         b;
         c
         {
            e
            {
               g
            };
            f
         };
         d
      }

Here, the layout is acting only as a reading aid, and the tree could equivalently be written as

      a {b; c {e {g}; f}; d}
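The conversion between the two representations is mechanical: an increase in indentation becomes {, a decrease becomes }, and a new line at the same level becomes ;. A Python sketch of the conversion, with the hypothetical function name ‘to_characters’; it assumes space indentation and well-formed nesting.

```python
def to_characters(source: str) -> str:
    """Rewrite a layout tree in the character form: next=';', in='{', out='}'."""
    lines = [(len(l) - len(l.lstrip(" ")), l.strip())
             for l in source.splitlines() if l.strip()]

    def block(i, indent):
        """Render the siblings at `indent` starting at lines[i]; return (text, next i)."""
        items = []
        while i < len(lines) and lines[i][0] == indent:
            text = lines[i][1]
            i += 1
            if i < len(lines) and lines[i][0] > indent:  # 'in': a nested history follows
                inner, i = block(i, lines[i][0])
                text += " {" + inner + "}"
            items.append(text)
        return "; ".join(items), i

    text, _ = block(0, lines[0][0]) if lines else ("", 0)
    return text


layout = """
      a
         b
         c
            e
               g
            f
         d
"""
print(to_characters(layout))  # a {b; c {e {g}; f}; d}
```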

and the other trees that have been used as examples would appear as

      a <= b {<- c; d; e}
      a <= b {<- c; d; ~; e}
      a <= b {<+ c; <- b; d}
      a ~ b {<- c; d}

a <= b {...} should be thought of as an abbreviation for a {<= b; ...}, and similarly for other non-birth events. (To be precise, the tree ‘person, non-birth event’ is equivalent to the tree ‘person, in, non-birth event, out’; and the tree ‘person, non-birth event, in, event list, out’ is equivalent to ‘person, in, non-birth event, next, event list, out’.)
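The abbreviation can be undone by a purely textual step before the default-parent rules are applied. The following Python sketch assumes operators are separated from names by spaces; the function name ‘desugar’ and its operator table are hypothetical, and << is deliberately excluded because it introduces a child rather than qualifying the person.

```python
NON_BIRTH_OPS = {"<=", "=>", "<+", "+>", "<-", "->", "#", "~", ">>", "="}


def desugar(head: str, history: list) -> tuple:
    """Rewrite ‘person event {...}’ as ‘person {event; ...}’."""
    words = head.split()
    for k, word in enumerate(words):
        if word in NON_BIRTH_OPS:
            person = " ".join(words[:k])
            event = " ".join(words[k:])
            return person, [event] + history
    return head.strip(), history  # no inline event: nothing to move


print(desugar("a <= b", ["<- c", "d", "e"]))  # ('a', ['<= b', '<- c', 'd', 'e'])
print(desugar("Setanta = Cú Chulainn", []))  # ('Setanta', ['= Cú Chulainn'])
```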

The syntax can also represent adoption. When a parent or couple adopts a child, the child’s name is preceded by << in the history of the parent. The rules described above determine the child’s new legal parents (which would not include the spouse of the adopting parent if a liaison is the default parent). The child may also have its birth noted in the history of its natural parents; in that case, the last entry in the child’s own history there should record the adoption, using the symbol >> followed by the name of the new parent (conventionally the father, as noted above) under whose history the adoption appears. These trees are from Irish legend:

      Deichtine => Sualdam mac Roich
         -> Lugh mac Ethnenn
         Setanta
            >> Amairgin

      Amairgin <= Finnchoem
         Conall Cearnach
         << Setanta = Cú Chulainn
            <- Aoife
            Conlaoch
            <= Emer

Here, Setanta is adopted as the 2nd son of Amairgin and Finnchoem. His natural parents are Deichtine and Lugh mac Ethnenn, though Deichtine was married to Sualdam mac Roich at the time. After adoption, he changed his name to Cú Chulainn. He had 1 son by Aoife, and then married Emer.
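Because an adoption may be recorded on both sides, a tool can cross-check the two. A Python sketch, assuming the trees have already been flattened into (person, history) pairs, that one person’s history may be split across several pairs, and that adoption lines carry no dates; ‘leading_name’ and ‘check_adoptions’ are hypothetical names. It reports every >> without a matching <<, since the entry under the adoptive parent is required while the one under the natural parents is optional.

```python
OPS = {"<=", "=>", "<+", "+>", "<-", "->", "#", "~", ">>", "="}


def leading_name(text: str) -> str:
    """The name at the start of `text`, up to the first non-birth operator."""
    words = []
    for word in text.split():
        if word in OPS:
            break
        words.append(word)
    return " ".join(words)


def check_adoptions(records) -> set:
    """Return (child, adopter) pairs claimed with >> but not recorded with <<."""
    claimed, recorded = set(), set()
    for person, history in records:
        for event in history:
            op, _, rest = event.strip().partition(" ")
            if op == ">>":
                claimed.add((person, rest.strip()))
            elif op == "<<":
                for child in rest.split("&"):
                    recorded.add((leading_name(child), person))
    return claimed - recorded


# The histories from the Irish trees above that mention the adoption.
irish = [
    ("Setanta", [">> Amairgin"]),
    ("Amairgin", ["<= Finnchoem", "Conall Cearnach", "<< Setanta = Cú Chulainn"]),
]
print(check_adoptions(irish))  # set()
```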

It’s semantically necessary to keep track of 3 default parents: the most recently mentioned, the most recently mentioned social, and the most recently mentioned non-social. (TODO: put this in the 2-level grammar.)

      (Grammar)
      start:           tree.
      tree:            person, non-birth event option, history option.
      person:          entry, pronunciation option, nickname option,
                             dates option, title option, affiliation option.
      history:         in, event list, out.
      event:           birth;
                       adopts; (parent adopts a child or children)
                       non-birth event.
      non-birth event: marriage;
                       divorce;
                       liaison;
                       separation;
                       same-sex union;
                       adopted; (child is adopted by a parent)
                       name change.
      birth:           group. (a “group” is one child, or twins, triplets etc)
      adopts:          ‘<<’, group. (parent adopts the children in “group”)
      group:           tree, more option.
      more:            ‘&’, group.
      marriage:        ‘<=’, entry, date option;
                       ‘=>’, entry, date option;
                       ‘<+’, entry, date option;
                       ‘+>’, entry, date option.
                       (“date” is year of marriage)
      divorce:         ‘#’, entry option, date option.
                       (“entry” must be given if there is more than one
                       spouse. “date” is year divorced)
      liaison:         ‘<-’, entry;
                       ‘->’, entry.
      separation:      ‘~’.
      same-sex union:  ‘~’, entry, date option.
      adopted:         ‘>>’, entry, date option. (child is adopted by “entry”)
      name change:     ‘=’, entry, date option.
                       (“date” is year name was changed)
      pronunciation:   ‘/’, proper word sequence, ‘/’.
      nickname:        ‘“’, proper word sequence, ‘”’, pronunciation option.
      dates:           number; number, ‘--’, number; ‘+’.
      date:            number.
      title:           ‘,’, proper word sequence, pronunciation option,
                             dates option.
      affiliation:     ‘[’, proper word sequence, ‘]’.

      (Metarules)
      THING list::     THING; THING, next, THING list.
      THING option::   THING; EMPTY.
      THING sequence:: THING; THING, ws, THING sequence.
      EMPTY::.
      STYLE::          superscript; EMPTY.

      (Lexical syntax)
      (TODO add ws where appropriate, improve lexeme classification (strings?))
      comment:  ‘(’, anything option, ‘)’.
      space:    ‘ ’; comment.
      ws:       space, ws option.
      letter:   ‘a’; ‘à’; ‘á’; ‘â’; ‘ã’; ‘ä’; ‘å’; ‘b’; ‘c’; ‘ç’; ‘d’; ‘e’; ‘è’;
                ‘é’; ‘ê’; ‘ë’; ‘f’; ‘g’; ‘h’; ‘i’; ‘ì’; ‘í’; ‘î’; ‘ï’; ‘j’; ‘k’;
                ‘l’; ‘m’; ‘n’; ‘ñ’; ‘o’; ‘ò’; ‘ó’; ‘ô’; ‘õ’; ‘ö’; ‘p’; ‘q’; ‘r’;
                ‘s’; ‘ß’; ‘t’; ‘u’; ‘ù’; ‘ú’; ‘û’; ‘ü’; ‘v’; ‘w’; ‘x’; ‘y’; ‘ý’;
                ‘ÿ’; ‘z’; ‘æ’; ‘ð’; ‘ø’; ‘þ’; ‘-’; ‘'’;
                ‘A’; ‘À’; ‘Á’; ‘Â’; ‘Ã’; ‘Ä’; ‘Å’; ‘B’; ‘C’; ‘Ç’; ‘D’; ‘E’; ‘È’;
                ‘É’; ‘Ê’; ‘Ë’; ‘F’; ‘G’; ‘H’; ‘I’; ‘Ì’; ‘Í’; ‘Î’; ‘Ï’; ‘J’; ‘K’;
                ‘L’; ‘M’; ‘N’; ‘Ñ’; ‘O’; ‘Ò’; ‘Ó’; ‘Ô’; ‘Õ’; ‘Ö’; ‘P’; ‘Q’; ‘R’;
                ‘S’; ‘T’; ‘U’; ‘Ù’; ‘Ú’; ‘Û’; ‘Ü’; ‘V’; ‘W’; ‘X’; ‘Y’; ‘Ý’; ‘Z’;
                ‘Æ’; ‘Ð’; ‘Ø’; ‘Þ’.
      digit:    ‘0’; ‘1’; ‘2’; ‘3’; ‘4’; ‘5’; ‘6’; ‘7’; ‘8’; ‘9’.
      superscript digit: ‘¹’; ‘²’; ‘³’.
      glyph:    letter; STYLE digit; space;
                ‘!’; ‘#’; ‘$’; ‘%’; ‘&’; ‘*’; ‘+’; ‘,’; ‘.’; ‘/’; ‘:’;
                ‘<’; ‘=’; ‘>’; ‘?’; ‘@’; ‘^’; ‘_’; ‘|’; ‘~’; ‘¡’; ‘¢’; ‘£’;
                ‘¤’; ‘¥’; ‘¦’; ‘§’; ‘¨’; ‘©’; ‘ª’; ‘«’; ‘¬’; ‘®’; ‘¯’; ‘°’; ‘±’;
                ‘\’; ‘µ’; ‘¶’; ‘·’; ‘¸’; ‘º’; ‘»’; ‘¼’; ‘½’; ‘¾’; ‘¿’; ‘×’; ‘÷’.
      proper word:  letter, proper word option.
      word:         proper word; ‘______’.
      STYLE number: STYLE digit, STYLE number option.
      proper entry: proper word;
                    word, suffix;
                    word, ws, word sequence, suffix option. (A ‘proper entry’ is
                          the unique identifier of a person.)
      entry:        proper entry; ‘______’.
      suffix:       ‘^’, number; superscript number. (‘^12’ is the same as ‘¹²’)
      anything:     glyph, anything option.
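The structural part of the grammar (tree, history, event list and group) is small enough for a hand-written recursive-descent parser over the character representation. The Python sketch below covers only that part: each person, together with any inline non-birth event, is kept as opaque text, and comments are assumed to contain none of { } ; &. The names ‘Tree’, ‘Parser’, ‘parse’ and ‘show’ are hypothetical.

```python
import re
from dataclasses import dataclass, field

PUNCTUATION = {"{", "}", ";", "&"}  # in, out, next, and the twin separator
TOKEN = re.compile(r"\s*([{};&]|[^{};&]+)")


@dataclass
class Tree:
    head: str                                    # person (plus inline event) as raw text
    history: list = field(default_factory=list)  # events; each is a group: a list of Trees


def tokenize(source: str) -> list:
    return [t.strip() for t in TOKEN.findall(source) if t.strip()]


class Parser:
    def __init__(self, source: str):
        self.tokens = tokenize(source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token: str):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def tree(self) -> Tree:
        # tree: person, non-birth event option, history option.
        head = self.peek()
        if head is None or head in PUNCTUATION:
            raise SyntaxError(f"expected a person, found {head!r}")
        self.pos += 1
        history = self.history() if self.peek() == "{" else []
        return Tree(head, history)

    def history(self) -> list:
        # history: in, event list, out.   event list: event; event, next, event list.
        self.expect("{")
        events = [self.group()]
        while self.peek() == ";":
            self.pos += 1
            events.append(self.group())
        self.expect("}")
        return events

    def group(self) -> list:
        # group: tree, more option.   more: '&', group.
        members = [self.tree()]
        while self.peek() == "&":
            self.pos += 1
            members.append(self.tree())
        return members


def parse(source: str) -> Tree:
    parser = Parser(source)
    result = parser.tree()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return result


def show(tree: Tree) -> str:
    """Print a Tree back in the compact character form."""
    if not tree.history:
        return tree.head
    events = "; ".join(" & ".join(show(m) for m in group) for group in tree.history)
    return f"{tree.head} {{{events}}}"


print(show(parse("a {b; c {e {g}; f}; d}")))  # a {b; c {e {g}; f}; d}
```

Round-tripping through ‘parse’ and ‘show’ is a cheap way to check that a hand-written tree is well nested before any of the semantic rules are applied.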