Valid characters in attribute names in HTML/XML

This has been bugging me for a while: I write a fair bit of custom HTML and XML parsing code, and I've wondered which characters are valid in an attribute name in an HTML tag, e.g.

<a href="..." name="...">thing</a>

So, what are the valid characters in HTML (or XML) for “href” and “name”, the attribute names in an HTML tag?

I finally found this, here:

In short, an HTML attribute name is built like this:

  • The first character is a letter, an underscore “_”, or a colon “:” (oddly!)
  • Additional (optional) characters can be a letter, a digit, an underscore, a colon, a period, a dash, or a “CombiningChar” or “Extender” character, which I believe allows Unicode attribute names.

So, the following are all valid HTML attribute names:
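
As an illustration, here is a small Python sketch that applies the two rules above character by character to a handful of names. The names in the list (apart from “href” and “name”) and the helper follows_name_rules are made up for this example. Python's str.isalpha() and str.isdigit() are used as rough stand-ins for the spec's “letter” and “digit” classes, which is an assumption, and CombiningChar and Extender characters are not handled at all.

```python
def follows_name_rules(name: str) -> bool:
    """Check a candidate attribute name against the two rules above.

    Hypothetical helper: isalpha()/isdigit() only approximate the
    spec's letter and digit classes, and CombiningChar/Extender
    characters are ignored.
    """
    if not name:
        return False
    first, rest = name[0], name[1:]
    # Rule 1: the first character is a letter, underscore, or colon.
    if not (first.isalpha() or first in "_:"):
        return False
    # Rule 2: later characters may also be digits, periods, or dashes.
    return all(ch.isalpha() or ch.isdigit() or ch in "_:.-" for ch in rest)


# Illustrative names only; each one satisfies both rules.
examples = ["href", "name", "_private", ":odd", "xml:lang",
            "data-id", "my.attr", "col2"]

for candidate in examples:
    print(candidate, follows_name_rules(candidate))
```

Names such as “2col” or “-dash” fail the first rule, because a digit or a dash is only allowed after the first character.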


Note that the W3C suggests using the colon only for namespaces, so you should use it sparingly.

The regular expression for parsing an HTML attribute name, therefore, is as follows:
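
A minimal sketch of such an expression in Python, assuming ASCII letters and digits only. The pattern is reconstructed from the rules listed earlier, and the names ATTR_NAME and is_valid_attr_name are hypothetical, chosen just for this example.

```python
import re

# ASCII-only approximation of the rules: a letter, underscore, or colon
# first, then any run of letters, digits, underscores, colons, periods,
# or dashes. Non-ASCII letters are deliberately not covered.
ATTR_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.-]*")


def is_valid_attr_name(name: str) -> bool:
    """Return True if the whole string matches the ASCII-only pattern."""
    return ATTR_NAME.fullmatch(name) is not None


for candidate in ["href", "xml:lang", "data-id", "1abc", "-x"]:
    print(candidate, is_valid_attr_name(candidate))
```

Using fullmatch rather than match means trailing junk (a space or a quote, say) makes the name invalid instead of being silently ignored. When scanning inside a larger tag string, the same pattern could be used with ATTR_NAME.match at a given offset.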


This, obviously, leaves the CombiningChar and Extender characters out of the mix.

