
Valid characters in attribute names in HTML/XML

This has been bugging me for a while. I write a fair bit of custom HTML and XML parsing code, and I kept wondering which characters are valid in an attribute name in an HTML tag, e.g.

<a href="..." name="...">thing</a>

So, what are the valid characters in HTML (or XML) for “href” and “name”, the attribute names in an HTML tag?

I finally found the answer here, in the XML specification's definition of a Name:

http://www.w3.org/TR/2000/REC-xml-20001006#NT-Name

In short, an HTML (or XML) attribute name follows these rules:

  • The first character is a letter, the underscore “_”, or the colon “:” (oddly!)
  • Additional (optional) characters can be: a letter, a digit, underscore, colon, period, dash, or a “CombiningChar” or “Extender” character, which I believe allows Unicode attribute names.
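
The two rules above can be sketched in Python. This is only an approximation: the function names are my own, and I'm assuming Unicode general categories are a close enough stand-in for the spec's Letter, Digit, CombiningChar and Extender classes. The spec lists exact character ranges and has a few exceptions that this sketch ignores.

```python
import unicodedata

# Assumption: these Unicode categories approximate the spec's "Letter" class.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lo", "Nl"}
# Assumption: these approximate Digit, CombiningChar and Extender.
EXTRA_CATEGORIES = {"Nd", "Mc", "Me", "Mn", "Lm"}


def is_name_start(ch):
    """First character: a letter, underscore, or colon."""
    return ch in "_:" or unicodedata.category(ch) in START_CATEGORIES


def is_name_char(ch):
    """Later characters: anything allowed first, plus digit, period, dash, etc."""
    return (
        is_name_start(ch)
        or ch in ".-"
        or unicodedata.category(ch) in EXTRA_CATEGORIES
    )


def is_xml_name(name):
    """Return True if name looks like a valid attribute name under the rules above."""
    if not name:
        return False
    return is_name_start(name[0]) and all(is_name_char(c) for c in name[1:])


if __name__ == "__main__":
    for candidate in ["href", "_0:funky", "0abc", "-x", "a b"]:
        print(repr(candidate), is_xml_name(candidate))
```

A name that starts with a digit or a dash fails on the first character, and a name containing a space fails on the later-character check.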

So, the following are all valid HTML attribute names:

:
_
_0:funky
:.:valid-_-tag-really:.:
_.:._

Note that the W3C suggests only using colon for namespaces, so you should use it sparingly.

The regular expression, therefore, for matching an HTML attribute name is as follows:

[a-zA-Z_:][-a-zA-Z0-9_:.]*

This, obviously, leaves the CombiningChar and Extender characters out of the mix, along with any letters and digits outside the ASCII range.
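
Here is a quick way to try the ASCII-only pattern in Python against the example names listed earlier. The helper name is my own invention. Note that the pattern here ends with a `*` so that the second character class can repeat zero or more times; without it, the expression would only match names exactly two characters long. I'm also using `re.fullmatch` so the whole string has to match, not just a prefix.

```python
import re

# ASCII-only approximation of an attribute name: one start character,
# followed by zero or more name characters.
ATTR_NAME = re.compile(r"[a-zA-Z_:][-a-zA-Z0-9_:.]*")


def is_ascii_attr_name(name):
    """Return True if the entire string matches the ASCII-only pattern."""
    return ATTR_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    candidates = [":", "_", "_0:funky", ":.:valid-_-tag-really:.:", "_.:._", "0abc", "-x"]
    for candidate in candidates:
        print(repr(candidate), is_ascii_attr_name(candidate))
```

The first five candidates are the valid names from the list above; the last two start with a digit and a dash, so they should be rejected.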

And, a final note: When reading the W3C’s specifications, does anyone else have difficulty finding what you’re looking for?

I swear, when I’m looking for something simple, the mountains of documents I have to wade through make it impossible to find anything with any accuracy.

My $0.02.
