How-To: Converting HTML to XHTML

Level: Intermediate. Published on 1 October 2007 in XHTML

Learn how to convert an HTML Web page to XHTML with this detailed example.

In our Introducing XHTML article, we took a look at how XHTML differs from regular HTML 4. In this article, you'll learn how to convert an HTML 4 Web page to fully standards-compliant XHTML 1.0 by working through a practical example.

The HTML 4 page

Take a look at the page we're going to convert. This page validates to HTML 4.01 Transitional. The source markup looks like this:


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
  <HEAD>
    <TITLE>My cat called Lucky</TITLE>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
  </HEAD>
  <BODY>

    <A NAME="top"> </A>

    <H1>My cat called Lucky</H1>

    I have a cat called Lucky. She is black & white, and nearly
    twelve years old.<P>
    
    I found her through a pet rescue service. She didn't like her
    old home because it had a big scary dog in it that used to
    frighten her. When I first got her she was very scared and
    hid under the table for a whole week! Nowadays she is still
    a bit jittery but much more relaxed.<P>

    Here is a picture of Lucky in the garden.<P>

    <IMG SRC="images/lucky-being-stroked.jpg" ALT="Lucky" WIDTH=400
    HEIGHT=300 BORDER=0>

    <BR><BR>
    
    She is very good at catching mice. She also catches birds,
    which can be a problem. Now that she has a collar and bell,
    though, she catches fewer birds.<P>

    <H2>Email Lucky!</H2>

    Use the form below to send Lucky an email. You never know -
    she might even reply, if she's not too busy!<P>

    <FORM METHOD="post" ACTION="mailform.cgi">
      Your email: <INPUT TYPE="text" NAME="email"><P>
      Your message: <TEXTAREA NAME="message" COLS=40 ROWS=8>
      </TEXTAREA><P>
      Do you have a cat?
        <INPUT TYPE="radio" NAME="haveCat" VALUE="yes" checked>Yes
        <INPUT TYPE="radio" NAME="haveCat" VALUE="no">No<P>
      <INPUT TYPE="submit" NAME="Send" VALUE="Send Email">
    </FORM>

    <P><A HREF="#top">Top of page</A>

  </BODY>
</HTML>

As you can see, it's a Web page about my cat. It's a simple page, but it contains a lot of markup that needs to be changed if the page is going to be valid XHTML 1.0.

Changing tags to lowercase

Our first task is to change all those uppercase tags to lowercase. XHTML is case-sensitive and requires that all element and attribute names be written in lowercase. Here's how our markup looks with lowercase tags:


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <title>My cat called Lucky</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>

    <a name="top"> </a>

    <h1>My cat called Lucky</h1>

    I have a cat called Lucky. She is black & white, and nearly
    twelve years old.<p>
    
    I found her through a pet rescue service. She didn't like her
    old home because it had a big scary dog in it that used to
    frighten her. When I first got her she was very scared and
    hid under the table for a whole week! Nowadays she is still
    a bit jittery but much more relaxed.<p>

    Here is a picture of Lucky in the garden.<p>

    <img src="images/lucky-being-stroked.jpg" alt="Lucky" width=400
    height=300 border=0>

    <br><br>
    
    She is very good at catching mice. She also catches birds,
    which can be a problem. Now that she has a collar and bell,
    though, she catches fewer birds.<p>

    <h2>Email Lucky!</h2>

    Use the form below to send Lucky an email. You never know -
    she might even reply, if she's not too busy!<p>

    <form method="post" action="mailform.cgi">
      Your email: <input type="text" name="email"><p>
      Your message: <textarea name="message" cols=40 rows=8>
      </textarea><p>
      Do you have a cat?
        <input type="radio" name="haveCat" value="yes" checked>Yes
        <input type="radio" name="haveCat" value="no">No<p>
      <input type="submit" name="Send" value="Send Email">
    </form>

    <p><a href="#top">Top of page</a>

  </body>
</html>

Notice that we don't need to change the values of attributes ("Lucky", "haveCat" and so on) to lowercase. Also notice that we made html lowercase in the DOCTYPE declaration at the top of the page (but left the other parts of the declaration untouched).
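
If you have many pages to convert, this step can be scripted. The following is a minimal Python sketch, not a full HTML parser: the function name lowercase_tags and both regular expressions are our own illustrative choices, and the sketch assumes the page contains no script blocks or comments with tag-like text inside them.

```python
import re

# Matches an opening or closing tag, e.g. <IMG SRC="a.jpg" WIDTH=400> or </P>.
# Declarations such as <!DOCTYPE ...> are deliberately not matched.
TAG = re.compile(r'<(/?)([A-Za-z][A-Za-z0-9]*)((?:"[^"]*"|\'[^\']*\'|[^>"\'])*)>')

# Matches an attribute name plus an optional value (quoted or unquoted).
ATTR = re.compile(
    r'([A-Za-z_:][-A-Za-z0-9_:.]*)'
    r'(\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s"\'>]+))?'
)

def lowercase_tags(markup):
    """Lowercase element and attribute names; leave values and text alone."""
    def fix_tag(m):
        slash, name, rest = m.groups()
        # Lowercase only the attribute name (group 1); keep the value as-is.
        rest = ATTR.sub(lambda a: a.group(1).lower() + (a.group(2) or ""), rest)
        return f"<{slash}{name.lower()}{rest}>"
    return TAG.sub(fix_tag, markup)

print(lowercase_tags('<IMG SRC="images/lucky-being-stroked.jpg" ALT="Lucky" WIDTH=400>'))
```

The DOCTYPE line is left alone by this sketch, so html in the declaration still needs changing by hand.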

Quoting attribute values and expanding attributes

All attribute values need to be quoted in XHTML, even if they're numeric. For example:


Incorrect: <img ... border=0>
Correct: <img ... border="0">

In addition, XHTML doesn't allow you to use attribute names without their values; such attributes need to be expanded:


Incorrect: <input type="radio" ... checked>
Correct: <input type="radio" ... checked="checked">

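So-called "minimised" attributes (attribute names without values) are not well-formed XML, so a browser that parses the page as XML, for example when it is served as application/xhtml+xml, will reject them. Some browsers, such as Safari, reportedly also refuse to recognise them whenever the document type is XHTML.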
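
Both fixes are mechanical, so they can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a robust converter: normalise_attributes and its regular expressions are hypothetical names of our own, and the sketch assumes tag and attribute names have already been lowercased and that the input contains no self-closing tags with unquoted values.

```python
import re

# Matches an opening tag and captures its name and its attribute text.
TAG = re.compile(r'<([A-Za-z][A-Za-z0-9]*)((?:"[^"]*"|\'[^\']*\'|[^>"\'])*)>')

# Attribute name, then an optional value that is quoted or unquoted.
ATTR = re.compile(
    r'([A-Za-z_:][-A-Za-z0-9_:.]*)'
    r'(?:\s*=\s*("[^"]*"|\'[^\']*\'|[^\s"\'>]+))?'
)

def normalise_attributes(markup):
    """Quote unquoted attribute values and expand minimised attributes."""
    def fix_attr(a):
        name, value = a.group(1), a.group(2)
        if value is None:             # minimised: checked -> checked="checked"
            return f'{name}="{name}"'
        if value[0] in "\"'":         # already quoted: keep the value unchanged
            return f"{name}={value}"
        return f'{name}="{value}"'    # unquoted: border=0 -> border="0"

    def fix_tag(m):
        return f"<{m.group(1)}{ATTR.sub(fix_attr, m.group(2))}>"

    return TAG.sub(fix_tag, markup)

print(normalise_attributes('<input type="radio" name="haveCat" value="yes" checked>'))
```

Text outside tags is never touched, so a sentence that happens to contain something like border=0 is left as it is.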

After going through our HTML page and correcting these issues, we're left with the following markup:


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <title>My cat called Lucky</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>

    <a name="top"> </a>

    <h1>My cat called Lucky</h1>

    I have a cat called Lucky. She is black & white, and nearly
    twelve years old.<p>
    
    I found her through a pet rescue service. She didn't like her
    old home because it had a big scary dog in it that used to
    frighten her. When I first got her she was very scared and
    hid under the table for a whole week! Nowadays she is still
    a bit jittery but much more relaxed.<p>

    Here is a picture of Lucky in the garden.<p>

    <img src="images/lucky-being-stroked.jpg" alt="Lucky" width="400"
    height="300" border="0">

    <br><br>
    
    She is very good at catching mice. She also catches birds,
    which can be a problem. Now that she has a collar and bell,
    though, she catches fewer birds.<p>

    <h2>Email Lucky!</h2>

    Use the form below to send Lucky an email. You never know -
    she might even reply, if she's not too busy!<p>

    <form method="post" action="mailform.cgi">
      Your email: <input type="text" name="email"><p>
      Your message: <textarea name="message" cols="40" rows="8">
      </textarea><p>
      Do you have a cat?
        <input type="radio" name="haveCat" value="yes" checked="checked">Yes
        <input type="radio" name="haveCat" value="no">No<p>
      <input type="submit" name="Send" value="Send Email">
    </form>

    <p><a href="#top">Top of page</a>

  </body>
</html>

Note that all these changes still leave us with a perfectly valid HTML 4.01 page. XHTML is largely backward-compatible with HTML.

Making the document well-formed

Our HTML Transitional page isn't well-formed. Every XHTML document, Strict or otherwise, must be well-formed XML, so we'll need to make a few changes to the markup's structure.

Closing open elements

For an XHTML document to be well-formed, every element in it must be closed. This means each element needs a closing tag: </p>, </b> and so on. Alternatively, if the element is empty (contains no content), you can place a slash (/) before the > at the end of the tag, for example <br />.

Although you can just write <br/> (without the space before the slash), it's a good idea to put the space in to avoid confusing some HTML browsers.

Nesting inline elements inside block elements

Documents that use a Strict DTD, whether HTML or XHTML, don't allow inline elements such as a, img and input, or bare text, to sit directly inside body or form; they must be nested inside block-level elements such as p or div. This means that we need to wrap our text, as well as any bare inline elements, in <p></p> tags.

So let's go through our HTML document and fix up all those unclosed elements and non-nested inline elements. Here's the result:


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <title>My cat called Lucky</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  </head>
  <body>

    <p><a name="top"> </a></p>

    <h1>My cat called Lucky</h1>

    <p>I have a cat called Lucky. She is black & white, and nearly
    twelve years old.</p>
    
    <p>I found her through a pet rescue service. She didn't like her
    old home because it had a big scary dog in it that used to
    frighten her. When I first got her she was very scared and
    hid under the table for a whole week! Nowadays she is still
    a bit jittery but much more relaxed.</p>

    <p>Here is a picture of Lucky in the garden.</p>

    <p><img src="images/lucky-being-stroked.jpg" alt="Lucky" width="400"
    height="300" border="0" /></p>
    
    <p>She is very good at catching mice. She also catches birds,
    which can be a problem. Now that she has a collar and bell,
    though, she catches fewer birds.</p>

    <h2>Email Lucky!</h2>

    <p>Use the form below to send Lucky an email. You never know -
    she might even reply, if she's not too busy!</p>

    <form method="post" action="mailform.cgi">
      <p>Your email: <input type="text" name="email" /></p>
      <p>Your message: <textarea name="message" cols="40" rows="8">
      </textarea></p>
      <p>Do you have a cat?
        <input type="radio" name="haveCat" value="yes" checked="checked" />Yes
        <input type="radio" name="haveCat" value="no" />No</p>
      <p><input type="submit" name="Send" value="Send Email" /></p>
    </form>

    <p><a href="#top">Top of page</a></p>

  </body>
</html>

Notice that we removed the <br /><br /> after the img element. Apart from being invalid in XHTML 1.0 Strict (br is an inline element, so it can't be placed directly in the page body without being wrapped in a block element), the line breaks were no longer necessary once we'd wrapped our img in a block-level p element.

That's better. We've closed all our elements, either by placing a closing tag after each opening tag, or by using the slash (/) shortcut to close empty elements. In addition, all inline elements are properly encased in block-level elements — in this case, p elements.
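
A quick way to confirm that a page is well-formed is to feed it to any XML parser. Here is a small sketch using Python's standard xml.etree.ElementTree module; the helper name well_formedness_error is our own. Bear in mind that it checks well-formedness only, not validity against a DTD, and that it will also report other XML errors such as a bare ampersand.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(markup):
    """Return None if the markup parses as XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None

# Every element is closed, so the parser accepts this fragment.
print(well_formedness_error('<html><body><p>Hello<br /></p></body></html>'))

# The p and br elements are never closed, so the parser reports an error.
print(well_formedness_error('<html><body>Hello<p>World<br></body></html>'))
```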

Removing presentational markup

Generally speaking, XHTML encourages you to use CSS to describe the look of your pages, rather than embedding presentation within the markup. Presentational attributes such as align and border are deprecated and aren't allowed in XHTML 1.0 Strict, so they need to be replaced with CSS equivalents. The width and height attributes are still valid on img in Strict, but they can be expressed in CSS too, which is what we'll do here. Let's change our img element accordingly, from:


    <p><img src="images/lucky-being-stroked.jpg" alt="Lucky" width="400"
    height="300" border="0" /></p>

to:


    <p><img src="images/lucky-being-stroked.jpg" alt="Lucky"
    style="width: 400px; height: 300px; border: none;" /></p>

In a real-world situation, it'd be a good idea to move the above inline CSS to a separate style sheet, and place a class or id attribute on the img element so that you can select it from within the CSS. If possible, keep all presentational aspects out of your markup.

Changing name to id and encoding ampersands

Nearly there. We just need to make a couple more minor changes to turn our markup into valid XHTML.

First of all, using the name attribute to identify fragments (sections of markup to link to within the page) is deprecated in XHTML. The id attribute should be used instead. This means that we need to rewrite our #top fragment:


    <a name="top"> </a>

as:


    <a id="top"> </a>

Using name is still OK in other situations, such as form fields. You only need to change name to id when defining fragments that you link to with <a href="#...">.

Don't forget that, unlike the name attribute, ids must be unique; you can't have more than one element with the same id in the page.
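
Duplicate ids are easy to introduce when converting several name anchors at once. The sketch below, which uses Python's standard html.parser module, lists any id value that appears more than once in a page; the class name IdCollector and the function duplicate_ids are hypothetical names chosen for this example.

```python
from collections import Counter
from html.parser import HTMLParser

class IdCollector(HTMLParser):
    """Collects the value of every id attribute seen in the markup."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)

def duplicate_ids(markup):
    """Return a sorted list of id values used more than once."""
    collector = IdCollector()
    collector.feed(markup)
    collector.close()
    return sorted(i for i, n in Counter(collector.ids).items() if n > 1)

print(duplicate_ids('<p id="top">a</p><img id="top" src="x.jpg" alt="" />'))
```

An empty list means every id in the page is unique.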

Secondly, we have a single bare ampersand in our markup. This is not allowed in XHTML; every literal ampersand must be encoded as &amp;. So we need to change:


<p>I have a cat called Lucky. She is black & white, and nearly
    twelve years old.</p>

to:


<p>I have a cat called Lucky. She is black &amp; white, and nearly
    twelve years old.</p>
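
On a larger page, bare ampersands can be found and encoded with a regular expression. The following Python sketch is one way to do it, assuming that anything that already looks like an entity or character reference (such as &amp; or &#169;) should be left alone; the names BARE_AMP and encode_ampersands are our own.

```python
import re

# An ampersand that does NOT start a named, decimal or hexadecimal reference.
BARE_AMP = re.compile(r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)')

def encode_ampersands(markup):
    """Replace each bare ampersand with &amp;, leaving references intact."""
    return BARE_AMP.sub('&amp;', markup)

print(encode_ampersands('She is black & white'))
print(encode_ampersands('<a href="page.cgi?a=1&b=2">link</a>'))
```

The second call shows that ampersands inside attribute values, such as the query string of a URL, need encoding too.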

Changing the document type

Excellent! We've changed all our markup so that it validates to XHTML 1.0 Strict. We now need to change the page's document type from HTML 4.01 Transitional to XHTML 1.0 Strict. The DOCTYPE for XHTML 1.0 Strict is:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

In addition, XHTML 1.0 requires an xmlns namespace declaration on the html element, which identifies the document's elements as belonging to the XHTML namespace:


<html xmlns="http://www.w3.org/1999/xhtml">

So our final XHTML 1.0 Strict markup looks like this:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>My cat called Lucky</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  </head>
  <body>

    <p><a id="top"> </a></p>

    <h1>My cat called Lucky</h1>

    <p>I have a cat called Lucky. She is black &amp; white, and nearly
    twelve years old.</p>
    
    <p>I found her through a pet rescue service. She didn't like her
    old home because it had a big scary dog in it that used to
    frighten her. When I first got her she was very scared and
    hid under the table for a whole week! Nowadays she is still
    a bit jittery but much more relaxed.</p>

    <p>Here is a picture of Lucky in the garden.</p>

    <p><img src="images/lucky-being-stroked.jpg" alt="Lucky"
    style="width: 400px; height: 300px; border: none;" /></p>
    
    <p>She is very good at catching mice. She also catches birds,
    which can be a problem. Now that she has a collar and bell,
    though, she catches fewer birds.</p>

    <h2>Email Lucky!</h2>

    <p>Use the form below to send Lucky an email. You never know -
    she might even reply, if she's not too busy!</p>

    <form method="post" action="mailform.cgi">
      <p>Your email: <input type="text" name="email" /></p>
      <p>Your message: <textarea name="message" cols="40" rows="8">
      </textarea></p>
      <p>Do you have a cat?
        <input type="radio" name="haveCat" value="yes" checked="checked" />Yes
        <input type="radio" name="haveCat" value="no" />No</p>
      <p><input type="submit" name="Send" value="Send Email" /></p>
    </form>

    <p><a href="#top">Top of page</a></p>

  </body>
</html>

View the finished XHTML page in all its glory!
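
As a last sanity check, you can confirm that a converted page parses as XML and that its root element is in the XHTML namespace. This is a rough sketch using Python's standard xml.etree.ElementTree module, with has_xhtml_namespace as a hypothetical helper name. It doesn't validate the page against the Strict DTD; for that, use a validator such as the W3C's.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def has_xhtml_namespace(markup):
    """True if the root element is html in the XHTML namespace.

    Raises xml.etree.ElementTree.ParseError if the markup isn't well-formed.
    """
    root = ET.fromstring(markup)
    # ElementTree writes namespaced tags as "{namespace}localname".
    return root.tag == f"{{{XHTML_NS}}}html"

page = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>My cat called Lucky</title></head>
  <body><p>I have a cat called Lucky.</p></body>
</html>'''

print(has_xhtml_namespace(page))
```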

As you can see, converting an HTML 4 page to XHTML can be fairly time-consuming, though the process is straightforward. If you're converting a lot of pages, you might find tools such as HTML Tidy helpful, as they can convert HTML to XHTML automatically.

The end

That's the end of this article. We hope you found it useful. If you're still stuck and would like further help, check out our online Help Forums, where you can get assistance from members of Elated and other webmasters.

Also, don't forget the free ELATED Extra Newsletter, where you can get more great Web-building articles and tips sent straight to your inbox!

If you would like to offer us feedback on this or any of our articles, please contact us. Have fun!
