Working together for standards The Web Standards Project


HTML Versus XHTML

WaSP asks

Which should we use, HTML or XHTML, and why?

The W3C Responds

First, a bit of history

The history of HTML at W3C starts with HTML 3.2, code-named Wilbur, which was followed by HTML 4.0 and then, about two years later, by HTML 4.01. HTML 4.01 is the last version of HTML, and is also the final W3C specification to define the semantics of markup. From HTML 3.2 to HTML 4.01, the language has improved a great deal, focusing on such issues as:

  • Separation of presentation from structure
  • Improved accessibility features
  • Improved internationalization features
  • Improved document rendering

XHTML 1.0 was created shortly after HTML 4.01 to help the transition of hypertext to a new generation of markup languages for text. XHTML 1.1 is an additional step toward a more flexible version of hypertext with the full benefits of XML architecture and integration of different technologies. Note that XHTML 1.1 has slightly improved the semantics of HTML 4.01 by including the Ruby module, which is used in particular for languages such as Japanese (read the Ruby Specification for more information). For practical purposes, the discussion here will focus on HTML 4.01 and XHTML 1.0.

The meaning of semantics

When we refer to the “semantics” of a language, we’re referring to the meaning of a given tag. HTML 4.01 and XHTML 1.0 assign the same semantics to their elements and attributes. For example, the address element has exactly the same meaning in HTML 4.01 and XHTML 1.0: they’re both used to mark up addresses. (Clarification: address is used to mark up contact information for a document.) Only bits of the syntax vary between the two languages. For example:

HTML 4.01 example

<img alt="Portrait Murakami Haruki"
   src="/images/murakami.jpg">

<p lang="fr">Je
levai la tête pour regarder les
étoiles.  Leur vue apaisa peu
à peu les battements de mon
coeur.</p>

<p><cite class="title">Chroniques
de l'oiseau à ressort</cite>
 - <cite class="author">Haruki
 Murakami</cite></p>

XHTML 1.0 example

<img alt="Portrait Murakami Haruki"
   src="/images/murakami.jpg" />

<p xml:lang="fr">Je
levai la tête pour regarder les
 étoiles. Leur vue apaisa peu
 à peu les battements de mon
 coeur.</p>

<p><cite class="title">Chroniques
de l'oiseau à ressort</cite>
 - <cite class="author">Haruki
 Murakami</cite></p>

The syntax in these examples is still very similar. The only differences are the trailing slash that closes the empty img element and the use of xml:lang in place of lang.
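
One way to see that the two examples carry the same structure is to run both through a lenient parser and compare the results. The following is a minimal sketch, assuming Python's standard html.parser module as the tool. The names StartTagCollector, start_tags and element_names are made up for this illustration, and the French paragraph is shortened for space.

```python
from html.parser import HTMLParser

# The two examples from the text, with the French paragraph shortened.
HTML_401 = """<img alt="Portrait Murakami Haruki" src="/images/murakami.jpg">
<p lang="fr">Je levai la tête pour regarder les étoiles.</p>
<p><cite class="title">Chroniques de l'oiseau à ressort</cite>
 - <cite class="author">Haruki Murakami</cite></p>"""

XHTML_10 = """<img alt="Portrait Murakami Haruki" src="/images/murakami.jpg" />
<p xml:lang="fr">Je levai la tête pour regarder les étoiles.</p>
<p><cite class="title">Chroniques de l'oiseau à ressort</cite>
 - <cite class="author">Haruki Murakami</cite></p>"""


class StartTagCollector(HTMLParser):
    """Record every start tag together with its attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        self.tags.append((tag, dict(attrs)))


def start_tags(markup):
    """Return a list of (element name, attribute dict) pairs in document order."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def element_names(markup):
    """Return only the element names, in document order."""
    return [tag for tag, _ in start_tags(markup)]


if __name__ == "__main__":
    print(element_names(HTML_401) == element_names(XHTML_10))
    print(start_tags(HTML_401)[1], start_tags(XHTML_10)[1])
```

In this sketch both snippets yield the same sequence of elements. The only attribute that differs is lang versus xml:lang on the first paragraph.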

Both languages come in three flavors: Frameset, Transitional and Strict. The “strict” version is strongly recommended by the W3C for regular documents. Using the strict versions removes problematic elements and forces a significant separation between the structure of your document and its presentation. The transitional versions allow deprecated elements, to help implementers upgrade their software or their content smoothly.

Using the right tool for the job

Is there any advantage to using HTML 4.01 over XHTML 1.0? There is no simple answer, and the benefits you gain depend on how you’re using the language in a given situation.

Switching from HTML 4.01 to XHTML 1.0 brings almost no direct benefits for the visitors of your Web site; still, there are several good reasons for Web authors to make the switch:

XHTML is easier to maintain

XML syntax rules are far more rigorous than those of HTML. As a result, XHTML makes authors work more precisely, having to address issues such as:

  • all element and attribute names must appear in lower case
  • all attribute values must be quoted
  • non-empty elements require a closing tag
  • empty elements are terminated using a space and a trailing slash
  • no attribute minimization is allowed
  • in strict XHTML, all inline elements must be contained in a block element

In HTML, mixed case, unquoted attribute values, unclosed elements and uncontained inline elements are all allowed and commonplace. The margin for error in HTML is much broader than in XHTML, where the rules are very clear. As a result, XHTML is easier to author and to maintain, since the structure is more apparent and problem syntax is easier to spot.
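
The well-formedness part of these rules can be checked mechanically with any XML parser. The following is a minimal sketch, assuming Python's xml.etree.ElementTree; the helper is_well_formed and the sample fragments are hypothetical. Note that a plain XML parser only checks well-formedness: rules such as lower-case names, or inline elements being contained in block elements, need validation against the XHTML DTD, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError:
        return False
    return True


# Follows the XHTML syntax rules listed above.
GOOD = '<p><input type="checkbox" checked="checked" /><br /></p>'

# Each of these is common in HTML but breaks one of the rules.
UNQUOTED = "<p class=intro>text</p>"                   # attribute value not quoted
UNCLOSED = "<p>first paragraph<p>second paragraph"     # missing closing tags
MINIMIZED = '<p><input type="checkbox" checked /></p>'  # minimized attribute
NO_SLASH = "<p>line<br>break</p>"                      # empty element not terminated
MIXED_CASE = "<P>text</p>"                             # XML names are case-sensitive

if __name__ == "__main__":
    for name in ("GOOD", "UNQUOTED", "UNCLOSED", "MINIMIZED", "NO_SLASH", "MIXED_CASE"):
        print(name, is_well_formed(globals()[name]))
```

Because the parser stops at the first violation, problem syntax surfaces immediately instead of being silently repaired, which is the maintenance benefit described above.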

XHTML is XSL ready

As you are probably aware by now, XHTML 1.0 is the reformulation of HTML 4.01 in XML. Therefore, XHTML documents are both hypertext documents and XML documents. A powerful technology has been developed at W3C to manipulate and transform XML documents: the Extensible Stylesheet Language Transformations (XSLT). This technology is tremendously useful for creating various new resources automatically from an XHTML document. For example:

  • You can create a table of contents for a long document
  • You can get a quick overview of a page by listing its languages and structural outline. See the Semantics extractor for this page, created by W3C QA Working Group member Dominique Hazaël-Massieux
  • You can provide a printable version of your documents by using the XSL-FO features of XSL
  • You can produce an RSS feed directly from your page. Check out the QA RSS feed to see this in action
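
The first two ideas can be sketched without an XSLT processor. The example below is not XSLT: it swaps in Python's xml.etree.ElementTree, which relies on the same property, namely that an XHTML page can be read by any XML parser. The sample document and the function names table_of_contents and languages are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Map the qualified names of h1..h6 to their heading level.
HEADINGS = {"{%s}h%d" % (XHTML_NS, n): n for n in range(1, 7)}

# A small hypothetical XHTML document used only for this sketch.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>HTML Versus XHTML</title></head>
  <body>
    <h1>HTML Versus XHTML</h1>
    <h2>First, a bit of history</h2>
    <p>Some text.</p>
    <h2>The meaning of semantics</h2>
    <p xml:lang="fr">Je levai la tête pour regarder les étoiles.</p>
    <h2>Using the right tool for the job</h2>
    <h3>XHTML is easier to maintain</h3>
  </body>
</html>"""


def table_of_contents(xhtml_text):
    """Return (level, title) pairs for every heading, in document order."""
    root = ET.fromstring(xhtml_text)
    toc = []
    for element in root.iter():
        level = HEADINGS.get(element.tag)
        if level is not None:
            title = "".join(element.itertext()).strip()
            toc.append((level, title))
    return toc


def languages(xhtml_text):
    """Return the sorted set of xml:lang values used in the document."""
    root = ET.fromstring(xhtml_text)
    found = {el.get(XML_LANG) for el in root.iter() if el.get(XML_LANG)}
    return sorted(found)


if __name__ == "__main__":
    for level, title in table_of_contents(SAMPLE):
        print("  " * (level - 1) + title)
    print("Languages:", ", ".join(languages(SAMPLE)))
```

The same extraction would fail on a typical HTML 4.01 page, because unclosed elements or unquoted attributes stop an XML parser before it reaches the headings.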

XHTML is easier to teach and to learn

The syntax rules defined by XML are far more consistent than those found in HTML and therefore easier to explain than the SGML rules on which HTML is based.

XHTML is ready for the future

When the new version of XHTML becomes a recommendation, XHTML 1.0 documents will be easily upgradable to this new version, allowing you to take advantage of its exciting new features. It’s likely that an XSLT style sheet will be available by then to help you move your XHTML 1.0 (strict) documents to XHTML 2.0.

Well okay, so what?

Yes, HTML 4.01 is as valuable as XHTML 1.0 in daily use. Still, the syntax proposed by XHTML 1.0 has several important benefits. The weight of these benefits has to be evaluated in the context of your project: use the right tool for the right job.

For a Web designer, starting to use XHTML 1.0 will be helpful in some circumstances and will certainly help you to smoothly negotiate the future. XHTML 1.0 gives a wonderful opportunity to learn about XML languages and their possibilities without having to learn new semantics because you’re working with familiar tags and attributes.

Related reading on WaSP and the W3C

Please see the WaSP article Common ideas between HTML and XHTML.

You can read more about using XSLT and XHTML together at the W3C’s Web site.

Discussion

For clarification and discussion on this topic, please address your comments and questions to the W3C Web Standards Education list.

To subscribe to the list, send an email to [email protected] with “Subject: subscribe”. You can read archived posts at http://lists.w3.org/Archives/Public/public-evangelist/.

The Web Standards Project is a grassroots coalition fighting for standards which ensure simple, affordable access to web technologies for all.


