<HEAD>
containing
`meta-data' (information about the data or page
itself)
<BODY>
containing the actual content

We have already looked at URLs, and seen that they take the
form protocol://machine.domain:port/path/file#anchor. For example,
one of your pages might have the URL http://www.brunel.ac.uk/~eg00xwp/page1.html.
This is an example of an `absolute' URL - it unambiguously
indicates a unique document on the whole planet.
Of course, giving URLs in full every time you make a link on your site
is unwieldy, and it makes the links difficult to change later on. You may
have noticed that when we introduced HTML linking we used a much shorter
form; instead of the full <A HREF="http://www.brunel.ac.uk/~eg00xwp/page2.html">
we simply had <A HREF="page2.html">. The latter is a `relative' URL - it
explains to the browser how to amend the current URL to get the new one:
If the relative URL begins with a "/" then it is assumed that
it gives the path and file name on the same machine as the current document.
For example, if Xena wants to make a link to Buffy's page then she can
use just <A HREF="/~eg00bvs/page1.html">.

Otherwise the relative URL replaces the file name at the end of the current URL, as in the <A HREF="page2.html">
example. As the new URL is just tacked on to the old path, we can include
subdirectories (sub-folders) in relative URLs:
<A HREF="images/image1.gif"> might be the
same as <A HREF="http://www.brunel.ac.uk/~eg00bvs/images/image1.gif">.
This image would be stored on the Unix filesystem at ~eg00bvs/webhome/images/image1.gif.
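
As a rough sketch of the rules above, suppose the following links all appear in a page whose own URL is http://www.brunel.ac.uk/~eg00bvs/page1.html (an assumption made for illustration). The comments show the absolute URL each relative one should resolve to.

```html
<!-- This page is assumed to live at http://www.brunel.ac.uk/~eg00bvs/page1.html -->

<!-- No leading "/": replaces the file name at the end of the current URL -->
<A HREF="page2.html">Page 2</A>
<!-- resolves to http://www.brunel.ac.uk/~eg00bvs/page2.html -->

<!-- A subdirectory is tacked on to the current path in the same way -->
<IMG SRC="images/image1.gif" ALT="Image 1">
<!-- resolves to http://www.brunel.ac.uk/~eg00bvs/images/image1.gif -->

<!-- Leading "/": path and file name start from the top of the same machine -->
<A HREF="/~eg00xwp/page1.html">Xena's page</A>
<!-- resolves to http://www.brunel.ac.uk/~eg00xwp/page1.html -->
```

Only the last link would still point at the same document if the page were copied into a different directory.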
Relative URLs don't uniquely define a single document, but instead
indicate its location relative to the current page. Consider what happens if Xena "borrows" Buffy's
page1.html and puts it in her own directory: <IMG SRC="image1.gif"> is now equivalent to
<IMG SRC="/~eg00xwp/image1.gif"> instead of <IMG SRC="/~eg00bvs/image1.gif">. This is why if
you save a single HTML file off the web you usually lose all the images.
On the other hand, as your webhome directory becomes cluttered through
the year you may want to tidy it up by putting various projects in
separate subfolders: if you've used relative URLs you can easily move the
files around (or on to a floppy) without having to go and change all the
links.
There's no right answer to which you should use: this depends on the
application (hint - for now the short form is probably easier).
There's also an added complication: the BASE HREF tag.

BASE HREF Tag

Wouldn't it be great if you could define the URL used to interpret relative URLs? Well, now you can:
<BASE HREF="http://www.brunel.ac.uk/~eg00xwp/">
The BASE HREF tag goes inside the <HEAD> section of an HTML page. All relative URLs
are resolved using this path, rather than the actual URL of the current document.
The trailing "/" is needed: without it, browsers treat the last part of the path as a
file name and drop it when resolving a relative URL.
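
To show where the tag sits, here is a minimal sketch of a complete page using it. The page title, link text and file names are made up for illustration; the comments give the URLs the relative links should resolve to.

```html
<HTML>
<HEAD>
<TITLE>Xena's page</TITLE>
<!-- Every relative URL in this page is resolved against this address -->
<BASE HREF="http://www.brunel.ac.uk/~eg00xwp/">
</HEAD>
<BODY>
<!-- Resolves to http://www.brunel.ac.uk/~eg00xwp/page2.html wherever this file is stored -->
<A HREF="page2.html">Page 2</A>
<!-- Resolves to http://www.brunel.ac.uk/~eg00xwp/images/image1.gif -->
<IMG SRC="images/image1.gif" ALT="Image 1">
</BODY>
</HTML>
```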
Unfortunately, I find this tag tends to combine the disadvantages of both relative and absolute URLs, rather than their advantages. In practice most of the donkey work is now done by website creation packages anyway.
If you are linking to a long webpage then you may want to take people directly to the interesting part, rather than just to the top. There are two ways to identify sub-sections in an HTML document.
The traditional method is to use the A NAME tag to mark a
convenient section of text as a named `bookmark', which can act
as a destination for a hyperlink:
... and some <A NAME="anchor">interesting facts</A> include...
This will invisibly make the words "interesting facts" into a bookmark named "anchor".
In newer browsers and revisions of HTML, an increasing number of tags can
also accept the ID attribute, which has a similar effect:
<H2 ID="bookmark">Linking to Document Subsections</H2>
So far we have just set up the targets; we still have to link to them.
We do this simply by adding a "#" followed by the bookmark name
to the end of the URL, e.g.
<A HREF="page2.html#anchor">Some interesting facts</A>
The name you use for a bookmark must be a single "word" (it can
include numbers) and must be unique within a single webpage. To link to a
bookmark elsewhere in the same page, just use a relative URL with no file
name: <A HREF="#bookmark">
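
Putting the pieces together, a long page might start with a short list of links to its own sections. This is only a sketch: the bookmark names "intro" and "facts" and the headings are invented for the example.

```html
<HTML>
<HEAD><TITLE>A long page</TITLE></HEAD>
<BODY>
<!-- Links with no file name jump to bookmarks within this same page -->
<P><A HREF="#intro">Introduction</A> | <A HREF="#facts">Interesting facts</A></P>

<!-- Newer style: an ID on the heading itself -->
<H2 ID="intro">Introduction</H2>
<P>...</P>

<!-- Traditional style: an A NAME wrapped around some text -->
<H2><A NAME="facts">Interesting facts</A></H2>
<P>...</P>
</BODY>
</HTML>
```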
It's a good idea to get into the habit of adding a TITLE to your links:
<A HREF="page2.html#anchor" TITLE="Interesting tutorial (Audio file)">Some interesting facts</A>
Newer graphical browsers can display the title as a cute little pop-up when you move the mouse pointer over the link. It can also provide useful navigational information for users, especially if you provide concise, detailed information. A link title is particularly important when linking to a standalone image or sound file.
So far we've only displayed images. We can also turn an image into a
clickable link, simply by putting it inside the A HREF tag:
<A HREF="page2.html"><IMG SRC="image1.gif"></A>
In fact, everything between the <A HREF> and the </A> becomes part of the same link: you can make the
entire webpage one long link if you want to.
Clickable images are often used to connect to a larger version of the same image. To do this we use the resize feature of Photoshop to create a small version (of the relevant area) of the image, called a `thumbnail', to use on the page. If we don't need to display a caption or other text with the large image we can make a link directly to the large image itself (a `stand-alone image' that has the browser window all to itself) instead of embedding it in another HTML file:
<A HREF="image1big.gif" TITLE="A BIG picture of my favourite digit"><IMG SRC="image1thumb.gif"></A>
You have already seen the WIDTH and HEIGHT attributes used with the
IMG tag. You can also use these to display
"thumbnails", but remember that you then make the viewer
download a great big file only to see a piddly graphic. These attributes
have a more important use: the browser can only lay out the text of your
webpage once it knows how big any images are. If it has to wait for the
image to be delivered from a slow server then the viewer will be left
looking at a blank screen. Including WIDTH and HEIGHT in IMG
tags lets the browser start displaying the text as soon as it arrives,
leaving gaps to put the images in later. [Tables have a similar
problem: the browser can't work out how to display a table until it
receives all the data, so making your whole page into one large
table can be a bad idea on a slow server.]
The other attribute that should always be included in IMG is ALT,
which lets you add a text description of
your image. This is an all round hero: not only does it mean that people
who can't see images can know what's going on, but it also allows the
browser to display something useful while waiting for the image to
arrive, and if the image can't be downloaded for some reason (a "broken
link") then again the viewer at least gets some useful information,
AND on a complex page it can actually help you find the link
that doesn't work...
<IMG SRC="image1.gif" WIDTH="400" HEIGHT="300" ALT="A picture of my thumb">
A web browser will completely ignore anything in a stream of HTML after a <!-- until it sees a -->.
This lets you embed notes of how and why you did something within the HTML
code itself without it appearing in the browser, e.g.
<TABLE>
<CAPTION>Table 1: monthly sales figures for this year</CAPTION>
<!-- in reverse order so they appear to be rising -->
...
</TABLE>
Remember that comments are only ignored by the browser, not the server - they are still sent to the viewer's computer and are visible using the View Source feature.
Character entities are posh words for symbols (such as a ©). Since the < and > symbols are used to denote
the tags when coding HTML, if you need to actually display them you can't just stick them in the text without confusing the browser.
Instead of "<" you must type &lt; and instead of a ">" you must type &gt;. As you can see, the symbol is just represented by an ampersand (the "and
symbol" &) immediately followed by the symbol name and a semicolon. Since a & now indicates a symbol,
&amp; must be used for an ampersand, and &quot; should be used
to display quotation marks (i.e. ").

Another useful entity is the non-breaking space &nbsp;. This is displayed on
the screen as a normal space, but behaves differently: text joined by a non-breaking space won't be split by the browser at the end
of a line, while a series of &nbsp;s can be used to push
text apart.

There is a huge list of other entities available, both symbols (&copy;: ©)
and accented characters (&Eacute;, &eacute;: É, é).
A useful list is included in the HTML beginner's guide (Web edition), and continually expanding definitive lists are available on the Web.
Note that character entities are CASE SENSITIVE!
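
A short sketch of entities in use. The sentences themselves are invented for the example; the comment at the end shows what a browser should display for each paragraph.

```html
<!-- What you type in the HTML file -->
<P>Use &lt;B&gt; for bold &amp; &lt;I&gt; for italic.</P>
<P>She said &quot;caf&eacute;&quot;, not &quot;CAF&Eacute;&quot;.</P>
<P>&copy;&nbsp;Xena</P>

<!--
What the browser displays:
Use <B> for bold & <I> for italic.
She said "café", not "CAFÉ".
© Xena
-->
```

Note how &eacute; and &Eacute; differ only in case but give different characters.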
This is something of an aside, as we won't actually cover creating and reading cookies in this module. The disappointingly inedible cookies in question provide a mechanism by which the web server can associate information with a particular "client" - you may recall them being the subject of heated debate a while ago. Cookies are simply pieces of information that the server asks your browser to keep on your computer. From then on, whenever the browser requests a page from a given set of servers it automatically includes this stored data with the request. Potentially this can be used to track an individual's use of the web.
First let's look at the mechanism in slightly more detail. When your browser requests a certain page, the web server will send the page back accompanied by a request for your browser to store some data as one or more cookies. A `cookie' has three main parts: it has a name, so that the server knows what to do with it when it gets it back; it has a value or content, which is the actual data to be stored, in text form; and it has some control information, notably a domain to which this particular cookie is to be sent.
If the browser accepts the cookie (it may be set up not to, or the (human) user may refuse it in person) then it will store it somewhere on the user's computer, such as in a file called cookies.txt. If you've used the centrally provided version of Netscape Navigator (v 7), then you can see any cookies you've picked up in H:\NS7\gobbledygook\cookies.txt. From then on, whenever the browser requests a page from any server in the specified domain it includes the cookie (the name and data) with that request.
Clearly cookies are a very useful mechanism - they make "shopping carts" for on-line shopping much easier to implement, for example. They also cannot access any information that isn't in the cookie file. They can however be used in ways that people find disturbing as they have implications for their privacy. Some simple rules for using cookies:
Frames provide a mechanism for splitting up the browser window into a series of separate areas. The contents of each area, or `frame', are themselves an HTML document that can be changed independently of the others. The original idea was to allow those parts of a web site that were the same for a large number of webpages to be kept separate, so they only needed to be downloaded once, while only the material that actually changed would need to be downloaded as the browser surfed from page to page.
The main flaw with frames is precisely that the content of each frame is a separate, complete document. This means that the whole page as seen cannot be referenced with a single URL, making it impossible for the user to bookmark it or e-mail it to a friend. Conversely, if one of the sub-documents is found directly (say from a search engine) then the user will see it without any of the intended contextual, or even navigational, information. These problems can be minimised by careful site design.
The second difficulty dates right back to when they were originally introduced: a lot of sites appeared that might have looked fine using Navigator, but only gave a blank screen in any other browser (OK, so a few bothered to put up a message like "You can't see this site as it uses frames - please download Netscape"). Although this was basically due to the conceit and laziness of the site authors, it created a lot of bad feeling that rubbed off on the use of frames themselves. Of course, non-frames capable browsers are still extant (notably Lynx and some TV set-top boxes) so you'll need to cater for them, which generally implies a second, no-frames version of part of your site, and of course this also has to be kept up to date and synchronised with the frames version.
The last problem has been mostly overcome: some older browser versions didn't properly integrate the frames into the history mechanism, so that you couldn't properly undo and redo surfing through framed pages, but this has been largely fixed in current versions. Again, this is a problem that is easily exacerbated through poor design.
A webpage constructed using frames simply consists of a set of HTML documents each constrained to only use a certain part of the overall browser window. The documents themselves are mostly just standard HTML, similar to that which we have been discussing so far. The mechanism by which they are allocated frames is a new type of web page, the `frameset page', which doesn't actually contain any concrete information itself but just provides formatting information to the browser:
<HTML>
<FRAMESET COLS="80,25%,*">
<FRAME SRC="page1.html" NAME="first">
<FRAME SRC="page2.html" NAME="second">
<FRAME SRC="page3.html" NAME="third">
</FRAMESET>
</HTML>
This example will divide the screen into three columns, and display page1.html in
the left-hand one, page2.html in the middle and page3.html on the right.
The work is mostly done by the FRAMESET tag: COLS indicates that it should divide the screen into vertical
columns (ROWS for horizontal rows), and there then follow three different ways of specifying the desired
column width (or row depth, starting from the top of
the browser window). We can either specify the size in pixels, which is good for including images, or as a
percentage of the available space, which allows the layout to scale as the browser window is resized.
The "*" just stands for "all the remaining space".
Having set up the frames we need to display something in each in turn. The operation of the FRAME tag
should be obvious: SRC indicates the URL of an object to be displayed in the frame (if the object is
bigger than the allowed space the browser will automatically provide scroll bars) and NAME allows us to
assign a unique name to each frame (we'll see how to use this later).
Since a FRAMESET can only use one of ROWS or COLS, we must set up more complex arrangements of frames by nesting FRAMESET
tags. If we want page2.html and page3.html displayed next to each other in
a thin strip below page1.html we could use
<HTML>
<FRAMESET ROWS="*,50">
<FRAME SRC="page1.html" NAME="first">
<FRAMESET COLS="50%,50%">
<FRAME SRC="page2.html" NAME="second">
<FRAME SRC="page3.html" NAME="third">
</FRAMESET>
</FRAMESET>
</HTML>
Now that we can set up any desired static set of frames all we need is a way to manipulate the contents.
This is done through a simple extension to the A HREF tag: in any of the subdocuments we can add
the TARGET attribute to a hyperlink to have the new page displayed in the frame named as the
TARGET. For example, if page1.html includes
<A HREF="page2.html" TARGET="third">Snap!</A>
then clicking on "Snap!" will replace page3.html with page2.html in the frame "third". As well as the frame names defined in the frameset page, there are some standard frames pre-defined (they begin with an underscore):
TARGET="_self" makes the new document load into the same frame as the current document
(i.e. that containing the link) - this is the default behaviour
TARGET="_parent" makes the new document load into the whole area covered by the current
FRAMESET, getting rid of any other "sub-frames"
TARGET="_top" makes the new document load into the whole browser window, removing all frames
TARGET="_blank" makes the new document load into a blank window (i.e. a new browser window).
The frames remain unaltered in the old window.
Note that frame names are case sensitive.
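
One arrangement this makes possible is a narrow menu frame whose links all load into a larger content frame. A minimal sketch, assuming a frameset page that names its frames "menu" and "main"; these names, the file names index.html and menu.html, and the link text are all made up for illustration. The block shows two separate files one after the other.

```html
<!-- index.html: the frameset page (hypothetical file name) -->
<HTML>
<FRAMESET COLS="150,*">
<FRAME SRC="menu.html" NAME="menu">
<FRAME SRC="page1.html" NAME="main">
</FRAMESET>
</HTML>

<!-- menu.html: shown in the left-hand frame (hypothetical file name) -->
<HTML>
<BODY>
<!-- Each link replaces whatever is currently in the frame named "main" -->
<A HREF="page1.html" TARGET="main">Page 1</A><BR>
<A HREF="page2.html" TARGET="main">Page 2</A><BR>
<!-- _top throws away the frames and uses the whole browser window -->
<A HREF="http://www.brunel.ac.uk/" TARGET="_top">Brunel home page</A>
</BODY>
</HTML>
```

The menu stays put while the user browses, which is the "download the common parts once" idea described earlier.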
So now we can make Netscape do all sorts of wild and wacky things, but what happens if we open the
frameset page above in a non-frames capable browser? Sure enough, it's a blank screen. The
NOFRAMES tag should be used to help the users of such browsers:
<HTML>
<FRAMESET ROWS="*,50">
<FRAME SRC="page1.html" NAME="first">
<FRAMESET COLS="50%,50%">
<FRAME SRC="page2.html" NAME="second">
<FRAME SRC="page3.html" NAME="third">
</FRAMESET>
</FRAMESET>
<NOFRAMES>
You can't see this site as it uses frames - please download Netscape... Ow! Stop hitting me!
</NOFRAMES>
</HTML>
Frames capable browsers simply ignore anything enclosed by the NOFRAMES tags, while
other browsers just display this material - in this example just a curt message. In fact the NOFRAMES
tag can include the BODY tag as used in normal pages, and hence
pretty well anything else you might want to do, though in practice there rarely seems to be much more
than a link to the no-frames version of the site.
<HTML>
<FRAMESET ROWS="*,50">
...
</FRAMESET>
<NOFRAMES>
<BODY>
<H1>Welcome to a Frameless Site</H1>
...
</BODY>
</NOFRAMES>
</HTML>