Some More HTML

Recap

HTML represents the components of a document by describing them, using "tags"
A document has two main parts: a <HEAD> containing `meta-data' (information about the data or page itself) and a <BODY> containing the actual content
The layout of the HTML has no relation to the way that the web page will be set out in the browser: you could put all the HTML on one long single line - or put every word on a separate one
You should structure your source code to make it easy to understand and follow what is going on
`Physical mark-up' describes how something should look; `logical mark-up' describes what it means
You should by now have worked through the introductory topics covered in the NCSA guide, such as basic markup, lists, linking, images and tables

Absolute and Relative URLs

We have already looked at URLs, and seen that they take the form protocol://machine.domain:port/path/file#anchor. For example, one of your pages might have the URL http://www.brunel.ac.uk/~eg00xwp/page1.html. This is an example of an `absolute' URL - it unambiguously indicates a unique document on the whole planet.
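
As a rough illustration of the parts named above, Python's standard urllib.parse module can split a URL into its components. This is only a sketch: the port number 80 and the "#top" anchor are not in the example URL above and have been added here as assumptions, simply so that every part has something to show.

```python
from urllib.parse import urlsplit

# Port 80 and the "#top" anchor are added here purely for illustration.
url = "http://www.brunel.ac.uk:80/~eg00xwp/page1.html#top"
parts = urlsplit(url)

print(parts.scheme)    # the protocol
print(parts.hostname)  # the machine.domain
print(parts.port)      # the port
print(parts.path)      # the path and file
print(parts.fragment)  # the anchor
```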

Of course, giving URLs in full every time you make a link on your site is unwieldy, and it makes the links difficult to change later on. You may have noticed that when we introduced HTML linking we used a much shorter form; instead of the full <A HREF="http://www.brunel.ac.uk/~eg00xwp/page2.html"> we simply had <A HREF="page2.html">. The latter is a `relative' URL - it explains to the browser how to amend the current URL to get the new one:

if the relative URL starts with a single "/" then it is assumed that it gives the path and file name on the same machine as the current document. For example, if Xena wants to make a link to Buffy's page then she can use just <A HREF="/~eg00bvs/page1.html">.
otherwise, the relative URL is assumed to be a file to be found at the same path (i.e. in the same directory or "folder") as the current document, as with the <A HREF="page2.html"> example. As the new URL is just tacked on to the old path, we can include subdirectories (sub-folders) in relative URLs: <A HREF="images/image1.gif"> might be the same as <A HREF="http://www.brunel.ac.uk/~eg00bvs/images/image1.gif">. This image would be stored on the Unix filesystem at ~eg00bvs/webhome/images/image1.gif.
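
The two rules above can be tried out with Python's urllib.parse.urljoin, which follows the standard resolution rules. Treat this as a sketch rather than a description of what any particular browser does; the page and file names are the ones used in the text.

```python
from urllib.parse import urljoin

# URL of the page that contains the links (Xena's page1.html).
current = "http://www.brunel.ac.uk/~eg00xwp/page1.html"

# Rule 1: a leading "/" replaces the whole path on the same machine.
to_buffy = urljoin(current, "/~eg00bvs/page1.html")

# Rule 2: anything else is looked for in the current page's directory.
to_page2 = urljoin(current, "page2.html")
to_image = urljoin(current, "images/image1.gif")

print(to_buffy)  # http://www.brunel.ac.uk/~eg00bvs/page1.html
print(to_page2)  # http://www.brunel.ac.uk/~eg00xwp/page2.html
print(to_image)  # http://www.brunel.ac.uk/~eg00xwp/images/image1.gif
```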

Relative URLs don't uniquely define a single document, but instead indicate its location relative to the current page. Consider what happens if Xena "borrows" Buffy's page1.html and puts it in her own directory: <IMG SRC="image1.gif"> is now equivalent to <IMG SRC="/~eg00xwp/image1.gif"> instead of <IMG SRC="/~eg00bvs/image1.gif">. This is why if you save a single HTML file off the web you usually lose all the images. On the other hand, as your webhome directory becomes cluttered through the year you may want to tidy it up by putting various projects in separate subfolders: if you've used relative URLs you can easily move the files around (or on to a floppy) without having to go and change all the links.
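
Both effects can be seen in a short Python sketch using urllib.parse.urljoin. The subfolder name "project1" is hypothetical, chosen only to stand for a tidied-up project directory.

```python
from urllib.parse import urljoin

link = "image1.gif"  # the relative SRC used inside page1.html

# The same HTML served from Buffy's directory and from Xena's copy.
buffy_page = "http://www.brunel.ac.uk/~eg00bvs/page1.html"
xena_copy = "http://www.brunel.ac.uk/~eg00xwp/page1.html"

from_buffy = urljoin(buffy_page, link)
from_xena = urljoin(xena_copy, link)
print(from_buffy)  # http://www.brunel.ac.uk/~eg00bvs/image1.gif
print(from_xena)   # http://www.brunel.ac.uk/~eg00xwp/image1.gif

# If the page and its image move together into a subfolder, the
# unchanged relative link still points at the image beside the page.
moved_page = "http://www.brunel.ac.uk/~eg00xwp/project1/page1.html"
after_move = urljoin(moved_page, link)
print(after_move)  # http://www.brunel.ac.uk/~eg00xwp/project1/image1.gif
```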

There's no right answer to which you should use: this depends on the application (hint - for now the short form is probably easier). There's also an added complication: the BASE HREF tag.

The `BASE HREF` Tag

Wouldn't it be great if you could define the URL used to interpret relative URLs? Well, now you can:

<BASE HREF="http://www.brunel.ac.uk/~eg00xwp/">

The BASE HREF tag goes inside the <HEAD> section of an HTML page. All relative URLs are resolved using this path, rather than the actual URL of the current document. The trailing "/" is needed to avoid confusing some browsers.
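
One likely reason the trailing "/" matters is that the standard resolution rules treat whatever follows the last "/" in the base as a file name and throw it away. The sketch below uses Python's urllib.parse.urljoin, which follows those rules; browsers of the day may not all have behaved identically.

```python
from urllib.parse import urljoin

with_slash = "http://www.brunel.ac.uk/~eg00xwp/"
without_slash = "http://www.brunel.ac.uk/~eg00xwp"

# With the slash, "~eg00xwp" is kept as the directory.
good = urljoin(with_slash, "page2.html")

# Without it, "~eg00xwp" is treated as a file name and replaced.
bad = urljoin(without_slash, "page2.html")

print(good)  # http://www.brunel.ac.uk/~eg00xwp/page2.html
print(bad)   # http://www.brunel.ac.uk/page2.html
```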

Unfortunately, I find this tag tends to combine the disadvantages of both relative and absolute URLs, rather than their advantages. In practice most of the donkey work is now done by website creation packages anyway.

More on Using Images

So far we've only displayed images. We can also turn an image into a clickable link, simply by putting it inside the A HREF tag:

<A HREF="page2.html"><IMG SRC="image1.gif"></A>

In fact, everything between the <A HREF> and the </A> becomes part of the same link: you can make the entire webpage one long link if you want to.

Clickable images are often used to connect to a larger version of the same image. To do this we use the resize feature of Photoshop to create a small version (of the relevant area) of the image, called a `thumbnail', to use on the page. If we don't need to display a caption or other text with the large image we can make a link directly to the large image itself (a `stand-alone image' that has the browser window all to itself) instead of embedding it in another HTML file:

<A HREF="image1big.gif" TITLE="A BIG picture of my favourite digit"><IMG SRC="image1thumb.gif"></A>

You have already seen the WIDTH and HEIGHT attributes used with the IMG tag. You can also use these to display "thumbnails", but remember that you then make the viewer download a great big file only to see a piddly graphic. These attributes have a more important use: the browser can only lay out the text of your webpage once it knows how big any images are. If it has to wait for the image to be delivered from a slow server then the viewer will be left looking at a blank screen. Including WIDTH and HEIGHT in IMG tags lets the browser start displaying the text as soon as it arrives, leaving gaps to put the images in later. [Tables have a similar problem: the browser can't work out how to display a table until it receives all the data, so making your whole page into one large table can be a bad idea on a slow server]

The other attribute that should always be included in IMG is ALT, which lets you add a text description of your image. This is an all round hero: not only does it mean that people who can't see images can know what's going on, but it also allows the browser to display something useful while waiting for the image to arrive, and if the image can't be downloaded for some reason (a "broken link") then again the viewer at least gets some useful information, AND on a complex page it can actually help you find the link that doesn't work...

<IMG SRC="image1.gif" WIDTH="400" HEIGHT="300" ALT="A picture of my thumb">

Comments

A web browser will completely ignore anything in a stream of HTML after a <!-- until it reaches the closing -->. This lets you embed notes of how and why you did something within the HTML code itself without them appearing in the browser, e.g.

<TABLE> <CAPTION>Table 1: monthly sales figures for this year</CAPTION> <!-- ... --> ... </TABLE>

Remember that comments are only ignored by the browser, not the server - they are still sent to the viewer's computer and are visible using the View Source feature.
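
To see that a comment (anything between <!-- and -->) really does travel with the page, here is a small sketch using Python's standard html.parser module to pull the comments back out of a piece of HTML. The class name CommentFinder and the comment text are made up for the example.

```python
from html.parser import HTMLParser


class CommentFinder(HTMLParser):
    """Collects the text of every comment found in the HTML it is fed."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called once per comment, with the text between <!-- and -->.
        self.comments.append(data.strip())


# The comment text here is hypothetical.
page = "<TABLE> <CAPTION>Table 1</CAPTION> <!-- figures from the sales team --> </TABLE>"

finder = CommentFinder()
finder.feed(page)
print(finder.comments)  # ['figures from the sales team']
```

Anything a program like this can find, a curious viewer can find too.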

Character Entities

Posh words for symbols (such as a ©). Since the < and > symbols are used to denote the tags when coding HTML, if you need to actually display them you can't just stick them in the text without confusing the browser. Instead of "<" you must type &lt; and instead of a ">" you must type &gt;. As you can see, the symbol is just represented by an ampersand (the "and symbol" &) immediately followed by the symbol name and a semicolon. Since a & now indicates a symbol, &amp; must be used for an ampersand, and &quot; should be used to display quotation marks (i.e. ").

Another useful entity is the non-breaking space, &nbsp;. This is displayed on the screen as a normal space, but behaves differently: text joined by a non-breaking space won't be split by the browser at the end of a line, while a series of &nbsp;s can be used to push text apart.

There is a huge list of other entities available, both symbols (&copy;: ©) and accented characters (&Eacute;, &eacute;: É, é). A useful list is included in the HTML beginner's guide (Web edition), and continually expanding definitive lists are available on the Web. Note that character entities are CASE SENSITIVE!
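
If you ever need to convert text to and from entities in bulk, Python's standard html module does both directions. This is just a sketch to show the correspondence; the sample strings are invented.

```python
import html

# Going one way: make text safe to drop into an HTML page.
text = 'Use <B> & "quotes"'
escaped = html.escape(text)
print(escaped)  # Use &lt;B&gt; &amp; &quot;quotes&quot;

# Going the other way: turn entities back into characters.
# Note that the capital and small e-acute have differently cased names.
upper = html.unescape("&Eacute;")
lower = html.unescape("&eacute;")
copyright_sign = html.unescape("&copy;")
print(upper, lower, copyright_sign)  # É é ©
```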

Brief Introduction to Cookies

This is something of an aside, as we won't actually cover creating and reading cookies in this module. The disappointingly inedible cookies in question provide a mechanism by which the web server can associate information with a particular "client" - you may recall them being the subject of heated debate a while ago. Cookies are simply pieces of information that the server asks your browser to keep on your computer. From then on, whenever the browser requests a page from a given set of servers it automatically includes this stored data with the request. Potentially this can be used to track an individual's use of the web.

First let's look at the mechanism in slightly more detail. When your browser requests a certain page, the web server will send the page back accompanied by a request for your browser to store some data as one or more cookies. A `cookie' has three main parts: it has a name, so that the server knows what to do with it when it gets it back; it has a value or content, which is the actual data to be stored, in text form; and it has some control information, notably a domain to which this particular cookie is to be sent.

If the browser accepts the cookie (it may be set up not to, or the (human) user may refuse it in person) then it will store it somewhere on the user's computer, such as in a file called cookies.txt. If you've used the centrally provided version of Netscape Navigator (v 7), then you can see any cookies you've picked up in H:\NS7\gobbledygook\cookies.txt. Whenever the browser then requests a page from any server in the specified domain it includes the cookie (the name and data) in with that request.
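
Although we won't be writing cookies in this module, the exchange is simple enough to sketch with Python's standard http.cookies module. The cookie name "basket", its value and the domain ".brunel.ac.uk" are all made up for the example.

```python
from http.cookies import SimpleCookie

# Server side: build the header line that asks the browser to store a cookie.
outgoing = SimpleCookie()
outgoing["basket"] = "12345"                    # the name and the value
outgoing["basket"]["domain"] = ".brunel.ac.uk"  # the control information
header = outgoing.output()
print(header)

# Later request: the browser sends just the name and value back,
# and the server reads them out of the Cookie header.
incoming = SimpleCookie()
incoming.load("basket=12345")
returned_value = incoming["basket"].value
print(returned_value)
```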

Clearly cookies are a very useful mechanism - they make "shopping carts" for on-line shopping much easier to implement, for example. They also cannot access any information that isn't in the cookie file. They can however be used in ways that people find disturbing as they have implications for their privacy. Some simple rules for using cookies:

Avoid them where you don't need them
Keep them small - make them a key to your database rather than the data itself
Be wary of storing "interesting" data as plain text - consider a simple encryption scheme
Keep them to yourself - within your own domain
Remember that the browser may not accept them, or they may not apply to the current user - add a validity check
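
As a sketch of the "key to your database" and "validity check" rules, the Python below keeps the real data on the server and treats a missing or unrecognised cookie as a fresh visitor. The BASKETS dictionary, the cookie name "basket" and the function basket_for are all hypothetical stand-ins for whatever your own site would use.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side store: the cookie holds only the key.
BASKETS = {"12345": ["floppy disks", "mouse mat"]}


def basket_for(cookie_header):
    """Return the stored basket for a Cookie header, or an empty one."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get("basket")
    # Validity check: the cookie may have been refused, or be out of date.
    if morsel is None or morsel.value not in BASKETS:
        return []
    return BASKETS[morsel.value]


print(basket_for("basket=12345"))  # ['floppy disks', 'mouse mat']
print(basket_for("basket=99999"))  # []
print(basket_for(None))            # []
```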

Simple Frames

Introduction

Frames provide a mechanism for splitting up the browser window into a series of separate areas. The contents of each area, or `frame', are themselves an HTML document that can be changed independently of the others. The original idea was to allow those parts of a web site that were the same for a large number of webpages to be kept separate, so they only needed to be downloaded once, while only the material that actually changed would need to be downloaded as the browser surfed from page to page.

The main flaw with frames is precisely that the content of each frame is a separate, complete document; this means that the whole page as seen cannot be referenced with a single URL, making it impossible for the user to bookmark or e-mail to a friend, whereas if one of the sub-documents is found directly (say from a search engine) then the user will see it without any of the intended contextual, or even navigational, information. These problems can be minimised by careful site design.

The second difficulty dates right back to when they were originally introduced: a lot of sites appeared that might have looked fine using Navigator, but only gave a blank screen in any other browser (OK, so a few bothered to put up a message like "You can't see this site as it uses frames - please download Netscape"). Although this was basically due to the conceit and laziness of the site authors, it created a lot of bad feeling that rubbed off on the use of frames themselves. Of course, non-frames capable browsers are still extant (notably Lynx and some TV set-top boxes) so you'll need to cater for them, which generally implies a second, no-frames version of part of your site, and of course this also has to be kept up to date and synchronised with the frames version.

The last problem has been mostly overcome: some older browser versions didn't properly integrate the frames into the history mechanism, so that you couldn't properly undo and redo surfing through framed pages, but these have been largely fixed in current versions. Again, this is a problem that is easily exacerbated through poor design.

Basic Frames

A webpage constructed using frames simply consists of a set of HTML documents each constrained to only use a certain part of the overall browser window. The documents themselves are mostly just standard HTML, similar to that which we have been discussing so far. The mechanism by which they are allocated frames is a new type of web page, the `frameset page', which doesn't actually contain any concrete information itself but just provides formatting information to the browser:

<HTML>
<FRAMESET COLS="80,25%,*">
  <FRAME SRC="page1.html" NAME="first">
  <FRAME SRC="page2.html" NAME="second">
  <FRAME SRC="page3.html" NAME="third">
</FRAMESET>
</HTML>

This example will divide the screen into three columns, and display page1.html in the left-hand one, page2.html in the middle and page3.html on the right. The work is mostly done by the FRAMESET tag: COLS indicates that it should divide the screen into vertical columns (ROWS for horizontal rows) and there then follow three different ways of specifying the desired column width (or row depth - start from the top of the browser window). We can either specify the size in pixels, which is good for including images, or as a percentage of the available space, which allows the layout to scale as the browser window is resized. The "*" just stands for "all the remaining space". Having set up the frames we need to display something in each in turn. The operation of the FRAME tag should be obvious: SRC indicates the URL of an object to be displayed in the frame (if the object is bigger than the allowed space the browser will automatically provide scroll bars) and NAME allows us to assign a unique name to each frame (we'll see how to use this later).
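
To make the arithmetic concrete, here is a rough Python sketch of how the three kinds of width might be worked out for a given window. It is a simplification, not what any browser actually does: it ignores frame borders, and it assumes the fixed and percentage widths fit inside the window. The function name column_widths and the 800-pixel window are made up for the example.

```python
def column_widths(cols, window_width):
    """Approximate pixel widths for a COLS (or ROWS) value such as "80,25%,*"."""
    specs = [s.strip() for s in cols.split(",")]
    widths = [None] * len(specs)

    # First pass: pixel sizes and percentages can be worked out directly.
    for i, spec in enumerate(specs):
        if spec.endswith("%"):
            widths[i] = window_width * int(spec[:-1]) // 100
        elif spec != "*":
            widths[i] = int(spec)

    # Second pass: any "*" entries share whatever space is left over.
    used = sum(w for w in widths if w is not None)
    stars = widths.count(None)
    for i, w in enumerate(widths):
        if w is None:
            widths[i] = max(window_width - used, 0) // stars
    return widths


print(column_widths("80,25%,*", 800))  # [80, 200, 520]
```

If the window is resized to 400 pixels the first column stays at 80, the second shrinks to 100 and the "*" column takes the remaining 220.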

Since a FRAMESET can only use one of ROWS or COLS, we must set up more complex arrangements of frames by nesting FRAMESET tags. If we want page2.html and page3.html displayed next to each other in a thin strip below page1.html we could use

<HTML>
<FRAMESET ROWS="*,50">
  <FRAME SRC="page1.html" NAME="first">
  <FRAMESET COLS="50%,50%">
    <FRAME SRC="page2.html" NAME="second">
    <FRAME SRC="page3.html" NAME="third">
  </FRAMESET>
</FRAMESET>
</HTML>

Now that we can set up any desired static set of frames all we need is a way to manipulate the contents. This is done through a simple extension to the A HREF tag: in any of the subdocuments we can add the TARGET attribute to a hyperlink to have the new page displayed in the frame named as the TARGET. For example, if page1.html includes

<A HREF="page2.html" TARGET="third">Snap!</A>

then clicking on "Snap!" will replace page3.html with page2.html in the frame "third". As well as the frame names defined in the frameset page, there are some standard frames pre-defined (they begin with an underscore):

TARGET="_self" makes the new document load into the same frame as the current document (i.e. that containing the link) - this is the default behaviour
TARGET="_parent" makes the new document load into the whole area covered by the current FRAMESET, getting rid of any other "sub-frames"
TARGET="_top" makes the new document load into the whole browser window, removing all frames
TARGET="_blank" makes the new document load into a blank window (i.e. a new browser window). The frames remain unaltered in the old window.

Note that frame names are case sensitive.

So now we can make Netscape do all sorts of wild and wacky things, but what happens if we open the frameset page above in a non-frames capable browser? Sure enough, it's a blank screen. The NOFRAMES tag should be used to help the users of such browsers:

<HTML>
<FRAMESET ROWS="*,50">
  <FRAME SRC="page1.html" NAME="first">
  <FRAMESET COLS="50%,50%">
    <FRAME SRC="page2.html" NAME="second">
    <FRAME SRC="page3.html" NAME="third">
  </FRAMESET>
</FRAMESET>
<NOFRAMES>
  You can't see this site as it uses frames - please download Netscape... Ow! Stop hitting me!
</NOFRAMES>
</HTML>

Frames-capable browsers simply ignore anything enclosed by the NOFRAMES tags, while other browsers just display this material - in this example just a curt message. In fact the NOFRAMES tag can include the BODY tag as used in normal pages, and hence pretty well anything else you might want to do, though in practice there rarely seems to be more than a link to the no-frames version of the site.

<HTML>
<FRAMESET ROWS="*,50">
  ...
</FRAMESET>
<NOFRAMES>
  <BODY>
    <H1>Welcome to a Frameless Site</H1>
    ...
  </BODY>
</NOFRAMES>
</HTML>

References

The NCSA Beginner's Guide to HTML: http://archive.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimerAll.html
IndexDOT Html at http://www.blooberry.com/indexdot/ is a convenient reference resource to all the HTML tags.
Netscape's preliminary specification for cookies can be found at http://www.netscape.com/newsref/std/cookie_spec.html
The World-Wide Web Consortium (W3C) at http://www.w3.org/ is the definitive source of information about HTML and other Web issues such as accessibility for users with disabilities. It can be a battle finding the simple material in amongst the heavyweight technical discussion, though.
S. Spainhour and R. Eckstein: "Webmaster in a Nutshell" 3rd Edition, O'Reilly ISBN: 0 596 00357 9 (2002)
A reference text covering HTML, CSS, JavaScript and server-side issues.

J.J. Nebrensky 18/01/2006

