HTML character references are short bits of HTML, commonly referred to as character entities or entity codes, that are used to display characters that have special meaning in HTML as well as characters that don’t appear on your keyboard.
- Characters with special meaning in HTML are called reserved characters. For example, left (<) and right (>) angle brackets are reserved in HTML to identify the opening and closing tags of elements.
- Characters that don’t appear on your keyboard include things like the copyright symbol (©) and the mathematical value pi (π).
If we want to use these types of characters in an HTML document and have them appear when rendered in a browser, we use HTML character references.
A Practical Example
Let’s say that you want to display a block of HTML in a web page and have the element tags show up on the page. You may try to do so by simply wrapping the block of HTML you want to display in <code> tags. However, what you will find is that even with the <code> tags surrounding it, the HTML will still be processed and rendered by the browser. What we can do is replace the special characters with the appropriate character references to prevent the browser from processing the code.
<!--The <code> blocks don't prevent this HTML from being rendered-->
<code>
<p>This is a list of items.</p>
<ul>
<li>List Item A</li>
<li>List Item B</li>
<li>List Item C</li>
</ul>
</code>
<!--Replace special characters with character references-->
<code>
&lt;p&gt;This is a list of items.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;List Item A&lt;/li&gt;
&lt;li&gt;List Item B&lt;/li&gt;
&lt;li&gt;List Item C&lt;/li&gt;
&lt;/ul&gt;
</code>
Let’s see how that code renders in the browser.
This is a list of items.
- List Item A
- List Item B
- List Item C
<p>This is a list of items.</p>
<ul>
<li>List Item A</li>
<li>List Item B</li>
<li>List Item C</li>
</ul>
As you can see, the <code> tags around the first block did not prevent the browser from processing the HTML. However, by replacing the reserved characters in the second block with HTML character references, we can display the block as HTML markup.
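Replacing reserved characters by hand gets tedious for longer snippets. As a minimal sketch, assuming Python is available when the page is prepared, the standard library's `html.escape` function performs the same substitution. The `snippet` variable is just an illustrative name, not something from the example above.

```python
import html

# Hypothetical input: one line of markup we want to show as text.
snippet = "<p>This is a list of items.</p>"

# html.escape replaces &, <, and > with character references
# (and, by default, double and single quotes as well).
escaped = html.escape(snippet)

print(escaped)
```

Placing the escaped string inside <code> tags would make the browser show the tags as text instead of rendering a paragraph.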
Character Entity Format
In HTML, there are three different ways to format a character entity. You can use the character name, a Unicode value (the character's code point written in hexadecimal), or a number (the same code point written in decimal). For example, an ampersand may be displayed using any of the following entities:
- Name: &amp;
- Unicode: &#x26;
- Number: &#38;
In all three cases, the format looks basically the same. Each entity begins with an ampersand (&), followed by the character name, Unicode, or number reference, and ends with a semicolon. When a number is used, it must be preceded by the pound symbol (#), and when a Unicode value is used, it must be preceded by a pound symbol and the letter x (#x).
Most people use character names rather than Unicode values or numbers when a named reference exists, since names are much easier to remember, but the Unicode and number references are equally acceptable.
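To check that the three formats described above really are interchangeable, here is a small sketch using Python's standard `html.unescape` function. It assumes the ampersand's code point, U+0026 (decimal 38). The `references` and `decoded` names are illustrative only.

```python
import html

# Named, Unicode (#x, hexadecimal), and number (#, decimal) references
# for the ampersand, whose code point is U+0026 (decimal 38).
references = ["&amp;", "&#x26;", "&#38;"]

# html.unescape converts each character reference back to its character.
decoded = [html.unescape(ref) for ref in references]

print(decoded)
```

All three entries decode to the same single character, which is why the choice between them comes down to readability.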
One subtype of character entity code merits special mention: diacritical marks. These are marks that appear directly over the preceding letter and include accent marks and tildes. Here are the three most common diacritics:
Support for diacritical mark character names is limited right now, and you will see more consistent results between browsers if you stick with the number codes until more browsers add support for the character names.
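The way a number code attaches a mark to the preceding letter can be sketched in Python. This example uses the combining grave accent (U+0300, decimal 768) purely as an illustration; the choice of mark and the variable names are assumptions, not taken from the list above.

```python
import html
import unicodedata

# The letter "a" followed by the number reference for U+0300,
# the combining grave accent (decimal 768).
decoded = html.unescape("a&#768;")

# The decoded text is two code points: the base letter, then the mark.
code_points = [hex(ord(ch)) for ch in decoded]

# NFC normalization composes the pair into one precomposed character.
composed = unicodedata.normalize("NFC", decoded)

print(code_points, composed)
```

A browser draws the mark over the letter that comes immediately before the reference, so the reference must follow the letter it modifies.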
Most Common Character Codes
Here is a quick reference table with a few of the most commonly seen HTML character references:
|Left-pointing double angle|&laquo;|«|
|Right-pointing double angle|&raquo;|»|
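If you need the number code that corresponds to a character name, Python's standard `html.entities.name2codepoint` mapping can look it up. The sketch below assumes the names `laquo` and `raquo` for the two angle quotation marks in the table. The `rows` list is an illustrative structure.

```python
from html.entities import name2codepoint

# Entity names assumed for the two characters in the table above.
names = ["laquo", "raquo"]

rows = []
for name in names:
    code_point = name2codepoint[name]  # decimal Unicode code point
    # Named reference, number reference, and the character itself.
    rows.append((f"&{name};", f"&#{code_point};", chr(code_point)))

for named, numeric, char in rows:
    print(named, numeric, char)
```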
Full List of Reserved Character Codes
A complete list of all HTML character references is maintained by the World Wide Web Consortium as part of the HTML specification.