The HTML class attribute: for styling purposes only?

Posted on 02.08.11




Introduction

Cascading Style Sheets (CSS) has a class selector that is specific to the HyperText Markup Language (HTML). The CSS rule below applies to any HTML element whose class attribute contains the value name:

*.name { color: gray; }

Without the class selector, you would have to use an attribute selector. The rule below is equivalent to the rule in the previous example:

*[class~=name] { color: gray; }
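The equivalence rests on how the ~= operator works: [class~=name] matches when the class attribute, read as a whitespace-separated list of words, contains a word exactly equal to name, which is also what the class selector .name tests for HTML elements. The following minimal Python sketch models that test on a single attribute value. The function name includes_word is hypothetical, and the sketch ignores details such as case-insensitive class matching in quirks mode.

```python
import re

# CSS whitespace: space, tab, line feed, carriage return, form feed.
_WHITESPACE = re.compile(r"[ \t\n\r\f]+")


def includes_word(attr_value: str, word: str) -> bool:
    """Sketch of the CSS [att~=word] test applied to one attribute value."""
    # A word that is empty or contains whitespace can never match.
    if word == "" or _WHITESPACE.search(word):
        return False
    # Split the value into words (dropping empty pieces) and compare whole words.
    words = [w for w in _WHITESPACE.split(attr_value) if w]
    return word in words


# Some class attribute values tested against the class "name".
for value in ["name", "person name", "fullname", "name-part", ""]:
    print(repr(value), includes_word(value, "name"))
```

Note that fullname and name-part do not match: the comparison is between whole words, not substrings.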

This raises a question: was the HTML class attribute added to HTML 4.0 in 1997 just to make it more convenient to write CSS files? To answer it, we have to dig into World Wide Web Consortium (W3C) specifications and mailing lists.

If you don’t want the details, you can skip right to my conclusion.

Early specifications

The following quote from the W3C Working Draft HTML and Style Sheets: W3C Working Draft 24-Mar-97, which doesn’t tie HTML to any particular style sheet language, states that the two elements div and span were added to HTML 4.0 to make it easier to style an HTML document:

To make it easier to apply a style to parts of a document, two new elements for use in the body of an HTML document are defined: DIV and SPAN. The first is to enclose a division (chapter, section, etc.) of a document, making it possible to give a whole section a distinctive style. The latter is used within paragraphs, similarly to EM, but in cases where none of the other HTML elements (EM, STRONG, VAR, CODE, etc.) apply.

These two quotes from the same document state that the HTML class attribute was added to support effective use of style sheets with HTML documents:

To support effective use of style sheets with HTML documents a number of common attributes are proposed. These can be used with most HTML elements. In general, all attribute names and values in this specification are case insensitive, except where noted otherwise.

A space separated list of class names. CLASS names specify that the element belongs to the corresponding named classes. These may be used by style sheets to provide class dependent renderings. Note that white space characters are permitted within class names, and that one or more contiguous white space characters should be treated as the same as a single space character (decimal 32).

This quote from Cascading Style Sheets, level 1: W3C Recommendation 17 Dec 1996 says that the class attribute was added to HTML to increase the granularity of control over elements:

To increase the granularity of control over elements, a new attribute has been added to HTML [2]: ‘CLASS’. All elements inside the ‘BODY’ element can be classed, and the class can be addressed in the style sheet

It is also interesting to read in the same document that:

CSS gives so much power to the CLASS attribute, that in many cases it doesn’t even matter what HTML element the class is set on — you can make any element emulate almost any other. Relying on this power is not recommended, since it removes the level of structure that has a universal meaning (HTML elements). A structure based on CLASS is only useful within a restricted domain, where the meaning of a class has been mutually agreed upon.

These two documents are the result of work by several people. The W3C mailing lists give us a unique opportunity to find information in the discussions that led to adding the class attribute. Below you will find lots of quotes from the www-style mailing list from 1995:

The www-style mailing list

http://lists.w3.org/Archives/Public/www-style/1995May/0012.html:

… One example of such an (unwanted, IMHO) dependency is the fact that
Hakon's style language currently presupposes the existence of an
attribute CLASS, because that's what the notation `H1.punk' means. To
fix this, I proposed a global declaration `@archform CLASS', to be
inserted at the start of the style sheet, so that applications would
now what the dot stands for. (Btw. a `.' is maybe not a good choice,
since the dot is often used a name character in SGML; that's why I
used `@' instead.) …

http://lists.w3.org/Archives/Public/www-style/1995Jul/0016.html:

… Of these, CLASS is the one that is important, since it allows people
to create new elements at will, without it having any effect on
applications that don't know the new element. For example, I can add
tags for CITY, PERSON, DATE, INSTRUCTION, EVENT, NUMBER, etc., simply
by using <TEXT CLASS=CITY>, <TEXT CLASS=PERSON>, etc. Some
applications would do special things with this information, others
would simply recognize it as legal HTML but subsequently ignore it. …

http://lists.w3.org/Archives/Public/www-style/1995Jul/0017.html:

… It only makes sense to define extra styles if they are attached to new
pseudo-elements.  Attaching arbitaray styles to a single element is a 
regression to Word for Windows style formatting, e.g. <C CLASS="UNDERLINE">
<C CLASS="LARGE">, <C CLASS="EXTRA LARGE">.  This is just Netscape 
extensions with a more verbose syntax.

I also oppose <TEXT>, however.  Any new element should be based on an old
one.

Think of it this way: what would be the easiest way to turn a Word for Windows
document into a "correct" HTML 3.0 document:

<STYLE>
massive amounts of style sheet declarations here to precisely emulate the
Word for Windows environment
</STYLE>
<TEXT class="s23dfe2as">Text<TEXT class="s2ecfe232"> text</TEXT></TEXT>

We must not allow this.  If we force them to use real HTML elements then
we can show users the output and explain to them why it is wrong:
"look it used an emphasis tag when you didn't really want emphasis.  Look
it used an address tag to enclose something that is not an address."

If we create a tag with no semantics it can be used anywehere without
ever being wrong.  We must force authors to properly tag the semantics
of their document.  We must force editor vendors to make that choice
explicit in their interfaces.

http://lists.w3.org/Archives/Public/www-style/1995Jul/0019.html:

… With generic character-level elements, the source for the database could
be written *in HTML* with key fields (i.e. Department) inside generic
elements subclassed appropriately:
 .... Proffessor X teaches the following courses in the
<STRING CLASS=Department>Mathematics</STRING> department: ...
Unless a specific stylesheet hint for STRING[CLASS=Department] existed,
this would render as normal text, which is probably the desired effect.
The database search engine, however, can now search for all professors
from a given department by looking at the HTML. This allows the source
files for the database to be in a relatively human-readable form, and
allows hand-editing without special software. …

… secondary purpose of the generic text entity is to allow special
rendering hints for certain semantic elements. For example, the
names of individual products in a searchable catalog could be rendered in
a different font …

http://lists.w3.org/Archives/Public/www-style/1995Jul/0025.html:

I'm also getting a handle on how the classes are done in the style sheet. 
This is really useful, not only to hint at the presentation, but also for
showing more precisely what a tag means.

address.street: font.color = #FFF;
address.street: back.color = #000;
address.email: font.color = #00C;

<address class=street>7125 Riverwood Dr, Columbia, MD</address>
<address class=email>Mike Batchelor <a href="mailto:mikebat@clark.net">
&lt;mikebat@clark.net&gt;</a></address>

This very nicely highlights the difference between the two kinds of
addresses, in the tags, and in the browser.

http://lists.w3.org/Archives/Public/www-style/1995Jul/0087.html:

… I next want to consider just how we intend to use the class 
attribute. I've not been on the www-html list for long, but as people 
who read my earlier posts on the subject of alternative media 
representations for table data (i.e, drawing charts instead of 
tables!), I have argued (or at least inferred) in the past that there 
are occassions when using the class attribute for some kind of purely 
presentational descriptor is almost as bad as having added a 
presentational tag to the spec. Better to use

<ADDRESS class="internet.email">Chris.Tilbury@estate.warwick.ac.uk</ADDRESS>
(to steal an oft quoted example of the use of the SGML name tokens)

than 

<ADDRESS class="purplefontgreenbackground">Chris.Tilbury@estate.warwick.ac.uk</ADDRESS>

if for no other reason than that HTML is such a massively generic 
application of SGML that people wishing to more descriptively markup 
their content (in this I include myself) will or may want to use the 
class tag for content orientated purposes rather than purely 
presentational ones. …

http://lists.w3.org/Archives/Public/www-style/1995Jul/0095.html:

                                      … if you believe that CLASS is
useless on non-stylesheet browsers, then you have misunderstood what CLASS
is for.  Style sheets are merely one application of the CLASS attribute. 
They are useful for precisely the application you have in mind, which is
to further specify the kind of data enclosed within a tag.  That
stylesheets can take advantage of this is a happy consequence of a good
idea. …

http://lists.w3.org/Archives/Public/www-style/1995Jul/0099.html:

… >I realize that CLASS is not just for stylesheets, but are we going to
>build a library of CLASS names with suggested meanings (and some suggested
>renderings)? If so, then I would use CLASS, but this hasn't been done so
>far as I know...

I think that it would be premature to standardize CLASSes until we see what
people want to do with them. …

… The CLASS and ID attributes should also be added to HTML as soon as possible
(HTML 2.1?).  A useful style sheet language can be developed which does not
depend on them, but can use them when they are present. …

…        encourage the usage of CLASS, which would contribute to its usage in
robots and other software.
        allow us to judge how people use CLASS so that we can think about
standardizing some usages.
        increase the awareness of platform portability issues.
        put the IETF and W3C back in the driver's seat with respect to the
direction of HTML and the Web. …

http://lists.w3.org/Archives/Public/www-style/1995Jul/0103.html:

… I believe that CLASSes should *never* be standardized
in the sense that you describe, i.e., such that a browser
would give elements special treatment based on CLASS
attribute values without an explicit instruction from
the author.

Authors must be free to use whatever CLASS names they
come up with without fear that Somebody Else's Browser
might do something unexpected with their document.

Domain- or organization- specific "conventional standards"
would be quite useful, but these should not be hardwired into
browsers.


> The CLASS and ID attributes should also be added to HTML as soon as possible
> (HTML 2.1?).  A useful style sheet language can be developed which does not
> depend on them, but can use them when they are present.  

Yes, definitely.  Also DIV. …

http://lists.w3.org/Archives/Public/www-style/1995Dec/0048.html:

… CLASS is a way of semantically subclassing elements.  Applying a style is
just one reason you would want to subclass an element.  Creating CLASSes
with types of "big" or "blue" or "five_point" are just as bad as creating
elements named "<BIG>" or "<FONT>".  If you absolutely must put style
information directly in your HTML document, and that style information does
not correspond to a semantic subclass, then you should use some other
attribute, such as STYLE. …

http://lists.w3.org/Archives/Public/www-style/1995Dec/0053.html:

… >> Why use STYLE as the name of the attribute instead of CLASS? Because it
>> matches the <STYLE> element in the HEAD. It makes no sense to use CLASS
>> instead, unless you have <CLASS> in the HEAD. In a word: Consistency.
>
>A good argument.  I think that the intention was to overload class to
>give semantic meaning as well as formatting control.

There was no overload.  The intention was to provide a mechanism for
subdividing classes of elements according to your needs.  The ability to
apply a particular style to a particular class is a BENEFIT of being able to
define classes, not an "alternate usage".

>However, as the consensus seemed to be away from any list of named
>classes, and since a thesaurus-type system such as Murray-Rust is using
>for CML is unlikely to be an option for all HTML documents, 

Maybe not for all, but perhaps for many.

>it seems
>that CLASS will only be used for semantic markup withing focussed
>subject areas where application conventions can be agreed.  This means
>that generic search engines, for example, are unlikely to offer
>searching based on CLASS (which was I think the original point).

If CLASS becomes widely used in certain disciplines, then specialized search
engines oriented for that discipline will support it.  Why should we care
about "generic" search engines? …

http://lists.w3.org/Archives/Public/www-style/1995Dec/0062.html:

… >Example: I have a table presenting the results of a scientific study; I
>want to call out three pairs of columns to talk about in the text, by
>presenting them in three different background colors.  There is no
>"meaning" to the colors.  The styling will not be used anywhere else in
>the document.  With a STYLE attribute I can put "STYLE={color: xxx}" on
>each of the three column groups and be done with it.  If all styles must
>go through a stylesheet, I must include a STYLE element (possibly
>otherwise unneeded) in the HEAD of the document, define three classes
>in it (with the wonderfully significant class names "red," "green," and
>"blue"), and then apply those classes to the colgroups.

Are they really just "red," "green," and "blue?"

Or are they CLASS="first_example", CLASS="second_example",
CLASS="third_example".  The latter is quite useful to someone reading your
paper through a speach synth.  The former is not.

You may have thought this through, and decided that "red," "green," and
"blue" are the most meaningful names you can come up with.  But what
percentage of the population is that knowledgable about device-independant
content presentation? Forcing them to go to the header may trigger some
thought about the issue.  "Carol, why does HTML force me to declare styles
like this?"  "Because you are supposed to use meaningful names for your
classes so that your documents will be useful to people using display
devices you have not thought about."

Similarly, HTML's limited formatting commands is a powerful pedagogic tool.
In online and offline fora all over the world the question "why can't I make
blue text" is answered every day. When it is explained, some respond: "I
don't care.  I just want to make blue text."  Some reply: "Interesting idea.
Where can I learn more about this structual markup."  Many of today's
proponents of structural markup started out this way.

When you are trying to move people to a new paradigm, you must make it
difficult to slip back into the old one.  I would guess that that is why
SmallTalk and Java don't have functions, and why ANSI C has strong type
checking.

The proposed STYLE attribute allows you to do your "red", "green", "blue"
thing and still serves this educational purpose.  It seems like a good
compromise to me. …

http://lists.w3.org/Archives/Public/www-style/1995Dec/0074.html:

… > One could also argue that where the styling is used to
>convey a specific typographic impression, giving the styling would be
>more useful to the visually impaired reader than giving an arbitrary
>name to the styling, since the reader could then visualize the
>appearance of the material.

A blind person visualize?  I hope I'm not showing ignorance here, but if
you've been blind since birth, the difference between red and blue is
probably pretty meaningless.  The difference between "first example" and
"second example" is very explicit. …

Later specifications

From the HTML 4.0 Specification: W3C Working Draft 8-July-1997:

… attribute assigns a class or set of classes to a specific instance of an element. Any number of elements may be assigned the same class name or names. They must be separated by white space characters.

The id and class attributes assign identifiers to an element instance. …

… A class name specified by class may be shared by several element instances. Class values should be chosen to distinguish the role of the element the class is associated with, e.g. note, example, warning. …

… Style sheets can use the class attribute to apply a style to a set of elements associated with this class, or to elements that occur as the children of such elements. …

… class can be used for further processing purposes, e.g. for identifying fields when extracting data from HTML pages into a database, translating HTML documents into other formats, etc.). …

From the HTML 4.0 Specification: W3C Recommendation 18-Dec-1997 and the HTML 4.01 Specification: W3C Recommendation 24 December 1999:

… This attribute assigns a class name or set of class names to an element. Any number of elements may be assigned the same class name or names. Multiple class names must be separated by white space characters. …

… class attribute, on the other hand, assigns one or more class names to an element; the element may be said to belong to these classes. A class name may be shared by several element instances. The class attribute has several roles in HTML:

  • As a style sheet selector (when an author wishes to assign style information to a set of elements).
  • For general purpose processing by user agents. …

From HTML5: A vocabulary and associated APIs for HTML and XHTML: W3C Working Draft 25 May 2011:

Every HTML element may have a class attribute specified.

The attribute, if specified, must have a value that is a set of space-separated tokens representing the various classes that the element belongs to.

The classes that an HTML element has assigned to it consists of all the classes returned when the value of the class attribute is split on spaces. (Duplicates are ignored.)

There are no additional restrictions on the tokens authors can use in the class attribute, but authors are encouraged to use values that describe the nature of the content, rather than values that describe the desired presentation of the content.
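The rule quoted above, that an element’s classes are what you get when the class attribute is split on spaces with duplicates ignored, can be sketched in a few lines of Python. This is a simplified illustration, not the specification’s own algorithm: class_set is a hypothetical name, and "spaces" is assumed to mean the HTML5 space characters (space, tab, line feed, form feed and carriage return). The example class names note, example and warning are taken from the HTML 4.0 draft quoted earlier.

```python
import re

# HTML5 "space characters": space, tab, line feed, form feed, carriage return.
_SPACES = re.compile(r"[ \t\n\f\r]+")


def class_set(class_attr: str) -> list:
    """Return an element's classes in first-seen order, duplicates removed."""
    # Splitting can yield empty strings at the ends, so filter them out.
    tokens = [t for t in _SPACES.split(class_attr) if t]
    # dict keys keep insertion order, so this drops repeats but keeps order.
    return list(dict.fromkeys(tokens))


print(class_set("  note example  note\twarning "))
```

Here the repeated note is counted once, so the element belongs to exactly three classes.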

Later HTML specifications end up saying less and less about the class attribute, and the style sheet selector role is listed before the role of general purpose processing. Perhaps something was lost in translation, or perhaps the ordering simply reflects how the class attribute has been used in practice.

Pulling it all together

HyperText Markup Language provides a means to create structured documents. The structure is created by marking up different parts of a document with HTML elements. HTML provides elements such as headings, paragraphs, lists, links, quotes and other items. These elements belong to the document domain, which is the main domain of HTML documents, although HTML’s general design and adaptations over the years have enabled it to describe a number of other types of documents. New versions of HTML are also addressing the loosely defined area referred to as Web Applications, as well as other issues raised in the past few years.

You can use HTML to write about a person. The person then becomes a subdomain of your HTML document. HTML lacks elements for marking up the different properties of a person, such as name, date of birth, job title, etc. … The class attribute allows authors to create new pseudo elements at will, without any effect on applications that don’t know about the new element. You can mark up the name of a person with classes like this:

… <span class="person">… <span class="name">Knut K. Johansen</span> …</span> …

If the meaning of a class has been mutually agreed upon, applications can do special things with this information. Properties can be indexed, searched for, saved or cross-referenced, so that information can be reused or combined. Back in 1995 the participants on the mailing list discussed standardizing classes, but thought it would be premature to do so until they could see what people would do with them. Today there are standards for class names, such as Microformats. Despite these standards, classes have ended up being used mainly as presentational descriptors. Google is a good illustration: you can’t search for types of entities like persons, cities or cars. You can only search by keywords, and you as a human have to extract the semantics from the HTML documents. Maybe history would have been different if some standards had been developed back then.
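To make the idea of an application doing special things with agreed class names concrete, here is a minimal Python sketch that uses the standard library’s html.parser module to collect the text of every element with the class name nested inside an element with the class person, as in the markup example above. PersonNameExtractor is a hypothetical helper, not part of any Microformats tool, and the sketch assumes reasonably well-formed markup in which every non-void element has an explicit end tag.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class PersonNameExtractor(HTMLParser):
    """Collect the text of class="name" elements inside class="person"."""

    def __init__(self):
        super().__init__()
        self.stack = []       # one set of class names per open element
        self.names = []       # extracted person names
        self._buffer = None   # text pieces of the name currently being read
        self._name_depth = 0  # stack depth of the open "name" element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # Split the class attribute on whitespace (None for missing values).
        classes = set((dict(attrs).get("class") or "").split())
        in_person = any("person" in open_classes for open_classes in self.stack)
        self.stack.append(classes)
        if self._buffer is None and in_person and "name" in classes:
            self._buffer = []
            self._name_depth = len(self.stack)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.stack:
            return
        # Closing the "name" element: store its collected text.
        if self._buffer is not None and len(self.stack) == self._name_depth:
            self.names.append("".join(self._buffer).strip())
            self._buffer = None
        self.stack.pop()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


sample = ('<p><span class="person">Author: '
          '<span class="name">Knut K. Johansen</span></span> and '
          '<span class="name">Not A Person</span></p>')
parser = PersonNameExtractor()
parser.feed(sample)
parser.close()
print(parser.names)
```

The second span with the class name is skipped because it is not nested inside a person.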

Conclusion

Even though the later HTML specifications say very little about the class attribute, the earlier specifications and discussions show that there is a deeper meaning to it. The fundamental idea behind the HTML class attribute is to semantically subclass HTML elements. These pseudo elements are useful for precisely the purpose the 1995 discussions had in mind: to further specify the kind of data enclosed within a tag. In other words, classes provide data about data, which is called metadata.

Style sheets are merely one application of the class attribute. That stylesheets can take advantage of this is just a happy consequence. The HTML class attribute is meant for semantics, not for styling purposes! We can only hope that we see more and more semantics added to HTML documents in the future!

Side note

There are other ways to add semantics to HTML documents. One is RDFa; more recently, HTML5 Microdata has emerged. I have previously written a blog post about how HTML5 Microdata can change the way we write CSS. Google supports RDFa, Microformats and Microdata.
