HTML is a markup language. It is not intended to convey details of specific layout, though it is often used in an inappropriate way. In other words, it´s not WYSIWYG, but rather WYSINWAES (``what you see is not what anyone else sees´´ :-).
The lowest level of a page is the characters from which it is made up. Pages are written in a character set called Latin-1. This includes all the ordinary characters on your keyboard, plus lots of characters that are harder to generate. Here is a complete table: the easiest way to use the second half of the table is probably going to be for you to keep this file somewhere safe, and cut-and-paste the characters from it. If the characters don´t match their descriptions, then that´s because your mail software or editor is displaying this file using the wrong character set---you might be able to change this somewhere, in a system setup file, or in the editor options settings.
Characters Brief descriptions
---------- ----- ------------
! " # $ % & ' sp excl quot hash dollar percent amp squot
( ) * + , - . / lparen rparen star plus comma hyphen stop slash
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
8 9 : ; < = > ? 8 9 colon semi lt eq gt quest
@ A B C D E F G at A B C D E F G
H I J K L M N O H I J K L M N O
P Q R S T U V W P Q R S T U V W
X Y Z [ \ ] ^ _ X Y Z lbrack backslash rbrack circ lowline
` a b c d e f g grave a b c d e f g
h i j k l m n o h i j k l m n o
p q r s t u v w p q r s t u v w
x y z { | } ~ x y z lbrace vbar rbrace tilde
  ¡ ¢ £ ¤ ¥ ¦ § nbsp iexcl cent pound curren yen brvbar sect
¨ © ª « ¬ ® ¯ uml copy ordf laquo not shy reg macr
° ± ² ³ ´ µ ¶ · deg plusmn sup2 sup3 acute micro para middot
¸ ¹ º » ¼ ½ ¾ ¿ cedil sup1 ordm raquo frac14 frac12 frac34 iquest
À Á Â Ã Ä Å Æ Ç Agrave Aacute Acirc Atilde Auml Aring AElig Ccedil
È É Ê Ë Ì Í Î Ï Egrave Eacute Ecirc Euml Igrave Iacute Icirc Iuml
Ð Ñ Ò Ó Ô Õ Ö × ETH Ntilde Ograve Oacute Ocirc Otilde Ouml times
Ø Ù Ú Û Ü Ý Þ ß Oslash Ugrave Uacute Ucirc Uuml Yacute THORN szlig
à á â ã ä å æ ç agrave aacute acirc atilde auml aring aelig ccedil
è é ê ë ì í î ï egrave eacute ecirc euml igrave iacute icirc iuml
ð ñ ò ó ô õ ö ÷ eth ntilde ograve oacute ocirc otilde ouml divide
ø ù ú û ü ý þ ÿ oslash ugrave uacute ucirc uuml yacute thorn yuml
Also, runs of spaces and line breaks in your source file are collapsed: 2 spaces are as good as 1, and a line break is as good as a space. There are other ways to do layout.
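As a small illustration (the sentence is made up, and it borrows the p paragraph tag and the <!-- ... --> comment notation, neither of which has been introduced yet), these 2 fragments of source should come out looking the same in a browser:

```html
<!-- Version 1: one line, single spaces -->
<p>Champions never run away.</p>

<!-- Version 2: extra spaces and line breaks in the source -->
<p>Champions     never
run

away.</p>
```

Each run of spaces and line breaks inside the paragraph is treated as a single space, so the layout of the source makes no difference.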
So far so good.
The next problem is that some characters have special meanings:
| < | starts a thing called a tag |
| > | closes it |
| & | starts a thing called a character entity |
| " | is used for quoting values (inside a tag---not special outside) |
This means that if you want to put these characters in your page, you have to refer to them using their character entity names:
| < | &lt; |
| > | &gt; |
| & | &amp; |
| " | &quot; |
If one of these names is immediately followed by a letter or digit, it must be separated from it by a semicolon (and it never hurts to include the semicolon anyway). So to put `a<b´ in your page, you´d write a&lt;b. There is also a character &nbsp; (``non-breaking space´´) which looks like a space, but doesn´t allow line breaks.
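Here is a short sketch of the entity names in use. The sentences are invented for illustration, and the p tag is the paragraph tag.

```html
<!-- Displays as: If a<b & b<c then a<c. -->
<p>If a&lt;b &amp; b&lt;c then a&lt;c.</p>

<!-- &quot; is only strictly needed inside a tag, but it is harmless in text -->
<p>She said &quot;Y-Corp&quot; out loud.</p>

<!-- &nbsp; stops the browser breaking the line between the 2 words -->
<p>See page&nbsp;42.</p>
```

Writing the semicolon every time, as here, means you never have to think about what character comes next.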
You might think that a markup language designed for world-wide interchange of documents would supply all the characters in common use in written English. In fact, this is not the case: the Latin-1 table above has no proper opening and closing quotation marks (either single or double) and none of the various kinds of dashes used to write English text---there are 3: the hyphen (used to make compound words), the en dash (used in numeric ranges 1--5 and to link names of people in joint authorship: the Coxhead--Duckhawk Y-Corp page), and the em dash, or parenthetical dash. There is also the minus sign, if you are going to have to write formulæ, which looks different again. In any book you might see the world over, these would all look different, but in Latin-1 there is just 1 character, hyphen-minus (-), to stand in for all 3 (or 4). (HTML 4·0 does define named entities for some characters outside Latin-1, such as &ndash; and &mdash;, but they are not part of the table above.)
So this is why I use grave accents (`) for open quotation marks, and acute accents (´) for close quotation marks and apostrophes, and pairs of the same for double quotation marks. I think this looks a lot nicer than the single quote character (') for both. (Don´t try to do this in e-mail or a word-processor though---the character sets are (probably) different.) I also use the ``TeX´´ convention of a single hyphen-minus character (-) for a hyphen, 2 (--) for an en dash, and 3 (---) for an em dash.
When you´ve got the text in there, you´re going to need to do some markup. There are 2 levels of markup: inline and block-level. Inline markup controls running text, like font changes, size changes, and anchors or links (very important, those!). Block-level markup controls the structure of the document as a whole.
All markup (with a few exceptions) consists of tags in pairs: an opening tag, and a closing tag. Tags are enclosed in angle brackets. For example, the opening tag for emphasised text (rendered in italics by a browser) is <em>, and the closing one is </em> (they always follow this pattern), so to get the word `forbidden´ in italics you would use <em>forbidden</em>.
Here´s a shortened list of inline markup tags:
| tt | typewriter text (often used for e-mail addresses) |
| u | underline |
| s | strike-through |
| big | larger font |
| small | smaller font |
| em | emphasis |
| strong | strong emphasis |
| a | anchor |
| sub | subscript |
| sup | superscript |
These matching pairs must nest properly: it´s not fair trying to go <big><em>large italics</big></em>, because you´ve tried to ``close the brackets´´ in the wrong order.
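To make the nesting rule concrete, here is a sketch using only tags from the list above (the chemistry sentence is just filler):

```html
<!-- Wrong: big was opened first, so it must be closed last -->
<big><em>large italics</big></em>

<!-- Right: the inner tag (em) is closed before the outer one (big) -->
<big><em>large italics</em></big>

<!-- Inline tags can be combined freely as long as they nest properly -->
H<sub>2</sub>O is <strong>not</strong> <tt>H<sup>2</sup>O</tt>.
```

Browsers will often cope with the wrong version anyway, but they are not obliged to, and they need not all cope in the same way.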
There are also 2 inline tags that do not have corresponding closing tags:
| img | image |
| br | line break |
br is self-explanatory, but img requires a bit more explanation, as does a.
Some tags have attributes, and the attributes have values. The set of attributes that a tag has is fixed. The A tag has attributes
| name | names a location in the document |
| href | specifies the page to fetch when the anchor is selected |
The way to specify the value of the attribute is inside the angle brackets of the tag, in the form `attribute = value´. Here is a <a href = "http://www.doves.demon.co.uk/ycorp">link</a> to the Champions page. (I assume you know what a URL looks like. You might not know that you can specify a jump to a place within a page by appending a name after a hash mark, as in http://www.doves.demon.co.uk/ycorp#david. URLs are case-sensitive, unlike almost everything else in HTML.) You also use A tags for local links (links within the same page) by marking the destination with a name attribute: <a name = david>destination</a> and the link to it with an href and a hash mark: <a href = "#david">jump to destination</a>. Also, you can miss the host name (//www.doves.demon.co.uk) out of the href if it´s the host the page is on, and the directory (/ycorp) if it´s the same one that the page is in. Finally, if you just give a directory name (as in /ycorp), the server will usually look for a page called index.html in that directory.
If a value consists of just letters, digits, hyphens and full stops, you don´t need the quotation marks; otherwise, you do.
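Putting the anchor forms side by side, as a sketch: the first 3 are the examples from the text, and roster.html is a made-up file name standing in for any page in the same directory.

```html
<!-- Full URL: quotation marks needed because of the colon and slashes -->
<a href = "http://www.doves.demon.co.uk/ycorp">the Champions page</a>

<!-- Destination within this page: the value is just letters, so no quotation marks -->
<a name = david>destination</a>

<!-- Local link to that destination: the hash mark means quotation marks are needed -->
<a href = "#david">jump to destination</a>

<!-- Host and directory missed out: letters and a full stop only, so no quotation marks -->
<a href = roster.html>a page in the same directory</a>
```

If in doubt, put the quotation marks in: they are always allowed, even when they are not required.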
Nearly done with inline tags ... img is used to put images in a page. It has attributes
| src | URL of image to embed |
| alt | description for text-only browsers (it´s kind to include this) |
| align | top, middle, bottom, left or right |
| height | suggested height in pixels |
| width | suggested width in pixels |
If you specify a height or width that´s different from the real height or width of the image you´re including, browsers will scale it to fit. The picture of Y-Corporation on our page is done by <img width = 438 height = 617 src = ycorp.jpg alt = Y-Corporation>. The align attribute is a bit funny. If you specify top, middle or bottom, you get an ``inline image´´, laid out rather like a big character, aligned with the line of text it´s on. If you specify left or right, you get a ``floating image´´, and the text is laid out down by the side of it. I´ve used both of these in the Y-Corporation page.
br specifies a line break. It has an attribute clear with possible values left, right or both which you can use to space down to below a left or right aligned image; otherwise, it just starts a new line.
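Here is a sketch of the 2 kinds of image together with br. The first img is the one quoted above with an align attribute added; logo.gif and the surrounding sentences are made up for illustration.

```html
<!-- Floating image: the text that follows runs down its right-hand side -->
<img align = left width = 438 height = 617 src = ycorp.jpg alt = Y-Corporation>
Some text beside the picture.
<!-- Space down to below the left-aligned image before carrying on -->
<br clear = left>
This text starts underneath the picture.

<!-- Inline image: sits on the line of text like a big character -->
Our logo <img align = middle src = logo.gif alt = logo> in the middle of a sentence.
```

Without the clear attribute, the br would just start a new line that is still beside the picture.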
So, all we need now is the block-level markup. Here´s a list of tags:
| p | paragraph |
| h1 | highest-level heading |
| h2 | ... |
| h3 | ... |
| h4 | ... |
| h5 | ... |
| h6 | lowest-level heading |
| ul | unordered list |
| ol | ordered list |
| pre | preformatted text |
| dl | definition list |
| table | table |
There´s also
| hr | horizontal rule |
which doesn´t have an end tag. Most of the rest have an align attribute, which can have values left, center or right (you have to spell `center´ like that, or it doesn´t work :-( ).
Hopefully, p and h1--6 are obvious. Most text in your page will be in one of these. pre is used for preformatted text: it causes a change to a fixed-width font (just like tt), but it also causes spaces and line breaks to be preserved. This lets you line up columns in a fairly basic sort of way. It is largely superseded by the table tag, but still useful for some things (e.g., poetry, programming languages).
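A small sketch of headings, a paragraph and a pre block together (the content is invented, loosely based on the Y-Corporation page):

```html
<h1 align = center>Y-Corporation</h1>
<h2>Members</h2>
<p>An ordinary paragraph. Spaces   and line breaks
in here are collapsed as usual.</p>
<!-- Inside pre, the spaces and line breaks are kept, so the columns line up -->
<pre>
Name            Type
Brick           strength
Martial Artist  skill
</pre>
```

The columns only line up because pre uses a fixed-width font; the same spacing inside a p would simply be collapsed.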
ul and ol are basically the same (ul displays a bulleted list, ol a numbered one). You enclose each list item separately inside li tags, so a whole list might look like this:
<ol><li>Item 1</li><li>Item 2</li><li>Item 3</li></ol>
li doesn´t mean anything outside ul or ol.
dl is similar, but for ``definition lists.´´ You enclose the terms being defined in dt, and the definitions in dd:
<dl>
<dt>Brick</dt>
<dd>A character that uses strength</dd>
<dt>Martial Artist</dt>
<dd>A character that uses skill &amp; dexterity</dd>
<dt>Jack Craft</dt>
<dd>A character that just runs away</dd>
</dl>
The layout that you use in the source file is completely irrelevant, but you can use it to make the structure obvious, as in this example.
Lastly, table. A table consists of a series of rows, tr, each of which is composed of a list of either table data td or table headers th. (The th are displayed in bold or something similar.) You can also throw in a caption:
<table>
<caption>The unit matrix</caption>
<tr><td>1·0</td><td>0·0</td></tr>
<tr><td>0·0</td><td>1·0</td></tr>
</table>
There are various refinements to do with cell padding, cell spacing, border width and alignment, but now you know enough to experiment for yourself!
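As a starting point for experiments, here is a sketch of a table with a header row and a few of those refinements. The border, cellpadding and cellspacing values are arbitrary choices of mine, and the content is invented.

```html
<!-- border: width of the frame; cellpadding: space inside each cell;
     cellspacing: space between cells (all in pixels) -->
<table border = 1 cellpadding = 4 cellspacing = 0>
<caption>Character types</caption>
<tr><th>Type</th><th>Relies on</th></tr>
<tr><td>Brick</td><td align = right>strength</td></tr>
<tr><td>Martial Artist</td><td align = right>skill &amp; dexterity</td></tr>
</table>
```

Try changing one number at a time to see what each attribute does.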
All the text that makes up the document should be enclosed in body tags. Then, a head section should be added, before the body. This normally just includes the title, enclosed in title tags (no markup is allowed in the title!). Everything so far is then included in html tags, and finally a ``document type declaration´´ is added at the very top. This says which version of HTML you are using, and for the version described here (version 4·0), it should be
<!doctype html public "-//W3C//DTD HTML 4.0//EN">
So, the whole thing is like this:
<!doctype html public "-//W3C//DTD HTML 4.0//EN">
<html>
<head>
<title>...</title>
</head>
<body>
...
</body>
</html>
The body tag has attributes as follows:
| lang | language of the document---English is `en´ |
| bgcolor | background colour (note spelling!) |
| text | text colour |
| link | colour for links |
| vlink | colour for visited links |
| alink | colour for selected links |
| background | URL for an image to tile the background with |
In fact, you can miss out a lot of these extras, as I do: the <html>...</html> is not needed, because it´s obvious the whole thing is HTML, and the <head>...</head> is not needed, because title is only valid inside a head, so the browser can work it out. The body tag is useful, though, because it lets you specify the colour scheme you prefer. (If you specify any colour, specify them all, or else you can easily end up with a black-on-black page or something equally silly.)
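Here is a sketch of a complete page in this cut-down style, with every colour specified. The title, text and colour values are placeholders of my own, not taken from the real Y-Corporation page.

```html
<!doctype html public "-//W3C//DTD HTML 4.0//EN">
<title>Y-Corporation</title>
<!-- All 5 colours are given together, so none can clash with a browser default.
     The values contain a hash mark, so they need quotation marks. -->
<body lang = en bgcolor = "#ffffff" text = "#000000"
      link = "#0000cc" vlink = "#660099" alink = "#cc0000">
<h1>Y-Corporation</h1>
<p>Welcome to the page.</p>
</body>
```

One caveat, as far as I know: the colour attributes belong to the ``Transitional´´ flavour of 4·0, so a strict validator may grumble about them under the doctype shown, even though browsers accept them happily.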
Those are the parts of HTML that I use. There are a lot more to do with embedded sound and pictures, Java programmes, dynamic pages, style sheets, frames and lots more junk, but for ordinary Web publishing, that´s all you need!