What Is Metadata In HTML Documents?: Head Elements Explained
An HTML document head element is used to provide information about a web page to web browsers and search engines, and to link external resources to the page. It does not contain content that is visible on the web page, but that doesn’t mean that it is unimportant. Proper web page head structure is critical for search engine optimization and website accessibility.
Basic HTML Document Structure
HTML documents consist of two major parts:
- The body is where the visible content of the web page is placed. The content in the body is organized by a wide variety of semantic elements such as the header, main, and footer elements.
- The head is placed before the body and includes information about the web page and instructions for web browsers and search engine web crawlers.
In our HTML Document Guide we explain how to structure the body element and the other semantic tags used to identify and group the content of a web page. In this guide we will cover the head element and the information that you need to include in the head of an HTML document to provide the best user experience and to achieve the highest search engine ranking possible.
A Word About Content Management Systems & Website Builders
The majority of modern websites are built on a content management system (CMS) or website builder of some variety. If your website is built with one of these modern tools then the content in the page head will be generated automatically by your CMS or website builder. In that case, what can you hope to glean from this tutorial?
By better understanding what should be in the head of your website’s HTML pages you will be able to evaluate whether or not the appropriate elements are actually there. In addition, if nothing else, you’ll be able to better understand what is going on in the head element.
If you find that elements are missing from your website’s head, and you are using a mainstream CMS or website builder, in all likelihood there are dependable plugins available that you can use to add and adjust the elements in the document head.
The Optional <head> Element
In HTML5, the head element’s tags are actually optional. That doesn’t mean that HTML5 documents don’t have a head. What it means is that in HTML5 any elements that appear before the opening body tag are automatically grouped into the document head.
Some developers still prefer to use the head element since it provides a logical container for all of the elements discussed in the rest of this document. However, if you prefer to simplify your HTML just a bit, feel free to drop the head tags. Just be sure to group all of the following elements at the beginning of the document before getting into any of the web page content. Once you get into the web page content some of these tags will no longer work as intended.
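As a minimal sketch of this idea, the document below leaves out the head tags entirely. The title text and the paragraph are placeholders invented for illustration. Because the meta and title elements come before any page content, the browser should still place them in an implied head.

```html
<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<title>Example Page | Example Site</title>
<p>Visible page content starts here.</p>
</html>
```

Inspecting this page with a browser's developer tools should show both a head and a body element, even though neither tag was written out.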
The Page <title>
One element every HTML document head should contain is a page title. The page title is used as the name of the web page when it appears on a search engine results page (SERP). The title is also used as a label for the browser window or tab where the web page is loaded. Title syntax is very simple:
<title>Insert Title Here | Follow it with the Website Name if you Wish</title>
It is common for web pages other than the homepage to list the page name, followed by a separator and the name of the website. It’s important to remember to include the closing title tag. Failing to do so may cause some web browsers to ignore the rest of the content on a web page.
Defining a <link> to an External Resource
Another element that is used by virtually every modern website is the link element. This element is used to create and define a relationship between a web page and an external resource or another web page. Link elements can be used to pull in many different resources:
- Stylesheets to define the visual presentation of a web page.
- An icon file, or favicon, to be used in the browser title bar or when the web page is bookmarked.
- Attribution resources such as authorship and copyright information.
- And much more.
Here are a few of the most common ways the link element is used in modern web development.
<link>: To Load Stylesheets
Web page presentation or styling rules written in an external CSS file must be linked to a web page with a link element to notify the browser which stylesheets should be loaded. The following syntax is used to link an external stylesheet to an HTML document:
<link href="../file_path/file_name.css" rel="stylesheet" title="Style Sheet Name">
Our CSS Tutorial provides a deeper look at linking external stylesheets and an overview of basic CSS styling.
<link>: To Add a Favicon
Open a web page in your web browser and take a look at the tab at the top of the browser. Do you see a small icon next to the name of the web page? That small icon is called a favicon (short for favorite icon). If you have a shared hosting account and you haven’t replaced the default favicon you will probably see your web host’s favicon displayed next to the name of your web page.
In the past, creating a favicon was as easy as creating any 16 x 16 pixel image and converting it to the favicon format (.ico) using one of the many free favicon generator websites or a graphics program like GIMP or Adobe Illustrator. However, due to the wide variety of devices which may be used to access your site today, creating a favicon is now a much more involved process.
It is generally recommended that you now create a graphic that is 196 x 196 pixels and then save it in three different sizes:
- 32 x 32 pixels in .ico format for IE9 and prior (either use one of the online favicon generators or a graphics program like GIMP or Adobe Illustrator to save your file in .ico format).
- 180 x 180 pixels in .png format for smartphone touch icons (in the event that someone pins your bookmarked site as an icon on their smartphone).
- 196 x 196 pixels in .png format for modern browsers.
With those three images uploaded to your website root folder, use the following link syntax to associate them with your website.
<!-- For IE 9 and below ICO should be 32x32 pixels in size -->
<!--[if IE]><link rel="shortcut icon" href="path/to/favicon.ico"><![endif]-->
<!-- Touch Icons - iOS and Android 2.1+ 180x180 pixels in size. -->
<link rel="apple-touch-icon-precomposed" href="path/to/apple-touch-icon-precomposed.png">
<!-- Firefox, Chrome, Safari, IE 11+, Edge, and Opera. 196x196 pixels in size. -->
<link rel="icon" href="path/to/favicon.png">
For the full details on why this exact set of link elements is recommended, read the full explanation at Stack Overflow.
Canonical & Base URLs
Another task that can be accomplished in the document head is to tell web crawlers and browsers something about the relationship of the current web page to the overall website structure.
- When a web page can be reached using multiple URLs, the link element can be used to tell web crawlers which version of the page is the canonical or preferred URL formulation and the one that should be indexed by search engines.
- When a web page is reached using an unexpected URL structure, the base element can be used to tell the web browser the URL to use as the basis for any relative links on the page.
Identifying a Canonical URL
If you’re using a modern content management system for your website, depending on the CMS it’s possible that certain pages might be accessible using more than one URL structure. For example, if you have a post categorized in multiple ways, you might be able to access the page using multiple URLs such as:
- Post Position in Category 1: http://example.com/category-1/post/
- Post Position in Category 2: http://example.com/category-2/post/
- Post Permalink: http://example.com/post/
If all three versions can be found by a web crawler, all three versions will be indexed. However, since the content indexed at all three pages is identical, web crawlers may flag the pages as being spam and penalize their position when they appear on a SERP.
Another common problem is for your site to be accessible at all of the following URLs:
- http://www.example.com/
- http://www.example.com/index.html
- http://example.com/
- http://example.com/index.html
Believe it or not, these are actually four different websites from a web crawler’s perspective. So picking the one that you want to be canonical and telling web crawlers which one you’ve picked is pretty important.
The answer is to use the link element to tell web crawlers which version of a page to index. Using our first example above, you would probably want to use the permalink as the canonical URL. To tell web crawlers to prefer this version of the page, the following HTML would be added to the head element for every other instance of the page:
<link rel="canonical" href="http://example.com/post/" />
To learn more about how search engines interpret canonical URLs, read up on the topic at Google Search Console Help.
If your website is powered by a CMS, and it’s a CMS that creates duplicate URLs (some do, some don’t), it’s important that you find a plugin or extension that can fix the issue by adding the correct rel="canonical" rules dynamically or by eliminating duplicate content.
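The same technique applies to the four homepage URLs listed earlier. As a sketch, assuming you picked http://www.example.com/ as the preferred version (an arbitrary choice made for this illustration), the homepage head would carry the following element:

```html
<!-- Assumption: the www version without index.html is the one to index -->
<link rel="canonical" href="http://www.example.com/">
```

Since all four URLs normally serve the same file, adding the element to that one file should cover every variant at once.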
Establishing a Base URL
If you use relative URLs to link between pages of your website, yet have a website structure that allows for duplicate content, you can use the base element to tell web browsers the URL to use as the base for relative URLs. For example, let’s say that you have a page that exists at two locations:
- Product Category Location: http://example.com/product-category/product/
- Product Permalink: http://example.com/product/
Now let’s say that you use relative URLs to link to other product pages like this: <a href="../related-product/">Check out this other product!</a>. Depending on which version of the page the viewer is on, that link may point in two different directions:
- Link from Product Category Location: http://example.com/product-category/related-product/
- Link from Product Permalink: http://example.com/related-product/
If the related product is in a different product category the first link won’t work. How do you resolve this? Simply add a base URL to your page head using this syntax:
<base href="http://example.com">
When you use a relative URL on a page with that base element in the page head, the base URL will be used rather than the current page URL. Now you can build relative URLs using that URL as your base regardless of the page where the link happens to be found.
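To make the effect concrete, here is a small sketch built on the product URLs from the example above. The trailing slash on the base URL and the way the relative link is written are choices made for this illustration.

```html
<head>
  <!-- Every relative URL on this page now resolves against the site root -->
  <base href="http://example.com/">
</head>
<body>
  <!-- Resolves to http://example.com/related-product/ on both copies of the page -->
  <a href="related-product/">Check out this other product!</a>
</body>
```

Keep in mind that the base element applies to every relative URL in the document, including stylesheet, script, and image paths, so those need to be written relative to the base as well.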
Alternate URLs
If you have a webpage that is available in part, or in whole, in multiple languages the link element combined with three additional attributes can be used to tell web crawlers about the alternate versions of the web page.
For example, let’s say that the main version of your website is in English, but that you also maintain versions of your website in Spanish, French, and German. You could use the following code to tell web crawlers about the relationship between these different web pages:
<link rel="alternate" hreflang="x-default" href="http://www.example.com/">
<link rel="alternate" hreflang="es" href="http://es.example.com/">
<link rel="alternate" hreflang="fr" href="http://fr.example.com/">
<link rel="alternate" hreflang="de" href="http://de.example.com/">
For the sake of our example we’ve assumed that the website would be structured so that each translated page exists as a subdomain of the primary domain, with the language code used to create the subdomain. However, any other structure could be used. For example, you could just as easily use http://www.example.com/espanol/ rather than http://es.example.com/.
The language codes used to identify the language in the hreflang attribute must be presented in ISO 639-1 format. You can also use this same syntax to create localized versions of a web page in the same language by adding a region code in ISO 3166-1 Alpha 2 format after the language code, and separating the region and language codes with a hyphen.
When combining region and language codes, be sure to always list the language code first and then the region code, with the two codes separated by a hyphen.
For example, let’s say that we wanted to create multiple translations for visitors from the same country, such as Spanish and English translations of the same page for visitors from the United States, as well as a third page targeting English speakers from the UK. We could do this with the following syntax:
<!--The first link identifies the default domain-->
<link rel="alternate" hreflang="x-default" href="http://example.com/">
<!--This link identifies a Spanish version for visitors in the USA-->
<link rel="alternate" hreflang="es-us" href="http://www.example.com/es/">
<!--This link identifies an English version for visitors from the UK-->
<link rel="alternate" hreflang="en-gb" href="http://www.example.co.uk/">
If your web page exists in multiple languages, it’s important to use this technique to tell web browsers and web crawlers about the alternate versions of the website. Failing to do so can have a negative impact on your search engine ranking as web crawlers find the various versions but penalize your site for containing duplicate content.
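The scenario described above also mentions an English page for visitors in the United States, which the sample markup does not list explicitly. A sketch of the complete set might look like the following. The http://www.example.com/en/ address is a hypothetical URL invented for this illustration.

```html
<link rel="alternate" hreflang="x-default" href="http://example.com/">
<!-- Hypothetical URL: English version for visitors in the USA -->
<link rel="alternate" hreflang="en-us" href="http://www.example.com/en/">
<link rel="alternate" hreflang="es-us" href="http://www.example.com/es/">
<link rel="alternate" hreflang="en-gb" href="http://www.example.co.uk/">
```

Google's documentation recommends that each language version list itself along with every other version, so the same block would typically be repeated in the head of each of these pages.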
Give Credit Where Credit is Due
The link element can also be used to give information to a web browser about the author and licensing that applies to the content of the web page. The rel attribute and the values author and license are used to embed this information in the document head.
Here’s an example of how we could identify a page author and provide copyright information using the link element.
<link rel="author" href="https://plus.google.com/+ExampleProfile">
<link rel="license" href="https://creativecommons.org/licenses/by/4.0/">
What the first link element does is tell web crawlers that more information about the author is available at the URL given in its href attribute. This link may point to the author’s website, a social media profile, or an email address (though that practice is not recommended). Up until the end of August 2014, it was common for the author’s Google+ profile to be linked to using this method. That was because of something called Google Authorship that has since been discontinued.
The second element tells web crawlers that the page at href="https://creativecommons.org/licenses/by/4.0/" contains the copyright terms that apply to this page. This can come into play since some search engines offer advanced search capability that filters results based on the usage rights applied to the page. While this link could be used to point to a copyright page at your own website, in order for search engines to interpret licensing correctly it’s important to link to a license that search engines will recognize such as one of the Creative Commons licenses.
Adding Scripting
Scripting languages, of which JavaScript is the unquestioned market leader, are typically loaded in the page head element using the script tag. JavaScript can be used to do all sorts of things including:
- Enabling Google Analytics or other visitor tracking applications.
- Conditionally adding the HTML5 Shiv for website visitors using older browsers.
- Loading JavaScript libraries such as jQuery.
- And much, much more.
The basic syntax for the script tag looks like this:
<script src="../path/to/javascript_file.js">
//The src attribute above identifies a JavaScript file.
//However, JavaScript can also be added between the opening
//and closing <script> tags when no src attribute is used.
</script>
Our JavaScript Tutorial provides a detailed look at embedding JavaScript, as well as common tasks accomplished with this popular programming language.
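One detail worth spelling out: when a script element has a src attribute, browsers ignore any code placed between its tags, so external and inline scripts are normally written as separate elements. A minimal sketch, using the placeholder file path from above and a trivial inline statement:

```html
<!-- External script: the element body is left empty -->
<script src="../path/to/javascript_file.js"></script>

<!-- Inline script: no src attribute, so the code goes between the tags -->
<script>
  console.log("Inline script ran");
</script>
```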
<meta> Data
Metadata is data about data. It is information that describes some other information in a meaningful way. The meta tag is used in an HTML document to provide high level metadata about the web page: information that describes the web page in a meaningful way that can be understood by web crawlers and browsers.
What sort of metadata can we provide to web crawlers and browsers with the meta tag?
- A page description and the keywords that describe the subject of the page.
- Page authorship information.
- Instructions for specific browser actions.
- Details about the page title, description, and author to be used when the page is posted on social media or shown in SERPs.
- And much, much more.
In this section we’re going to cover a few of the most common actions handled by the meta tag in the head of an HTML document.
Establishing the charset
The charset, short for character set, is the character encoding used on the web page. In nearly all cases, you’ll be writing in UTF-8, and if you aren’t, you probably already know that.
It is important to declare the charset as early as possible in an HTML document because browsers only scan the first part of the document for the character encoding (the HTML specification requires the declaration to fall within the first 1024 bytes) and otherwise guess at which encoding should apply. This can create some security risks, so play it safe and declare the character encoding as one of the first elements within an HTML document head.
Here’s the syntax for declaring UTF-8 as the character encoding:
<meta charset="utf-8">
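Declaring the encoding early usually means making it the first child of the head, ahead of the title and any other element that contains text. A short sketch, with a placeholder title:

```html
<head>
  <!-- Declared first so it falls within the bytes the browser scans -->
  <meta charset="utf-8">
  <title>Example Page | Example Site</title>
</head>
```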
Assigning a Description, Keywords, and Authorship
The meta element combined with the name attribute can be used to create document-level metadata and assign a name to that data. The resulting data may be used by search engines as a clue to determine the contents of the page. Here’s the syntax used to create a description, assign keywords, and provide authorship information.
<meta name="description" content="A short description of the website or of the organization behind the website can be added here and will be used by some search engines and web browsers in a variety of different ways.">
<meta name="keywords" content="insert, relevant, keywords, separated, by, commas">
<meta name="author" content="Author Name">
In the past, the meta-level description and keywords were a critical component of SEO. While this isn’t really true anymore – search engines have gotten a lot smarter in the last several years – it’s still important to provide a good description and relevant keywords. They may show up in a variety of places depending on the web browser you are using and the search engines that index your website.
Giving Instructions to the Browser
The http-equiv attribute can be applied to the meta tag to force a web browser to refresh or redirect automatically after a few seconds.
Lots of websites today are built on user-generated content that is continuously updated. For example, every social media website you visit displays a site populated with content submitted by your social media connections. All of this content is continuously updated and new content is ready to be displayed on a regular basis. In the past, it was common to use http-equiv to force the web page to refresh on a periodic basis to pull down updated content. However, today that sort of manipulation of a visitor’s browser is not recommended.
Another way http-equiv is sometimes used is to redirect users from an outdated or obsolete web page to a new web page. However, this too is now frowned upon, and webmasters or developers should use a 301 server redirect for permanent redirects or a 302 server redirect for temporary redirects.
If you do find that you have a legitimate use case for the ability to force the browser to redirect or refresh, you can use the following syntax to make it happen:
<meta http-equiv="refresh" content="5; url=http://www.example.com">
What this code will do is tell the browser to redirect the website visitor to www.example.com after five seconds. If the URL is not provided, then the current page is refreshed after the time interval indicated in the content attribute (five seconds in this case).
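For comparison, the refresh-only form simply omits the URL. The 30-second interval below is an arbitrary value chosen for this sketch:

```html
<!-- Reloads the current page every 30 seconds (interval chosen for illustration) -->
<meta http-equiv="refresh" content="30">
```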
When you share a website on social media one of two things will happen:
- The social media site will scan the site and find some relevant meta tags telling it what to display.
- The social media site will scan the site but won’t find the tags it’s looking for and will either display very little information or an auto-generated title, photo, and text snippet.
As a website designer, developer, or owner the value of being able to tell social media sites exactly what you want them to say about your website should be obvious.
Open Graph is a protocol used to define how your website should appear when posted on social networks. It’s pretty straightforward, and if we’re talking about a website there are only four items you really must specify: the title, URL, description, and featured image.
Here’s the syntax for defining each of these elements.
<meta property="og:title" content="Title of blog post or page">
<meta property="og:url" content="http://www.example.com">
<meta property="og:description" content="A good description around 300 characters long.">
<meta property="og:image" content="http://www.example.com/image-name.jpg">
There’s a lot more that can be done with Open Graph, and you can learn more about the additional fields you can use to further define how your site appears on social media sites that use Open Graph by visiting the Open Graph website.
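One of those additional fields deserves a mention: the Open Graph protocol itself lists og:type among its required properties, alongside the title, image, and URL. Here is a sketch of the same set with that field added, assuming a generic page for which the type "website" is appropriate:

```html
<!-- Assumption: "website" is the generic type; articles and videos use other values -->
<meta property="og:type" content="website">
<meta property="og:title" content="Title of blog post or page">
<meta property="og:url" content="http://www.example.com">
<meta property="og:description" content="A good description around 300 characters long.">
<meta property="og:image" content="http://www.example.com/image-name.jpg">
```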
Twitter uses its own syntax, but will use the Open Graph content if it can’t find meta tags with the information it’s looking for. Since Twitter does have a shorter character limit, it’s a good idea to use the Twitter tags separately to ensure your page looks as good as possible when shared on Twitter.
The basic fields you’ll need to include are very similar to those required for Open Graph with one noteworthy difference.
<meta name="twitter:title" content="Blog post or page title">
<meta name="twitter:url" content="http://www.example.com">
<meta name="twitter:site" content="@Twitter_username" />
<meta name="twitter:description" content="A good description limited to 200 characters.">
<meta name="twitter:image" content="http://www.example.com/image-name.jpg">
When adding meta information for Twitter to use when a web page is tweeted, be sure to include the name="twitter:site" tag and the Twitter username which should be given attribution for the tweet. Once again, there’s a lot more that can be done with Twitter cards, and the Twitter Developers Documentation can help you craft the perfect Twitter Card.
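Twitter's card markup also expects a twitter:card tag that declares which card layout to use. A minimal sketch, assuming the basic "summary" card type and the same placeholder values used above:

```html
<!-- Assumption: "summary" is the card type wanted; other layouts use other values -->
<meta name="twitter:card" content="summary">
<meta name="twitter:site" content="@Twitter_username">
<meta name="twitter:title" content="Blog post or page title">
<meta name="twitter:description" content="A good description limited to 200 characters.">
<meta name="twitter:image" content="http://www.example.com/image-name.jpg">
```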
It Isn’t as Complicated as it Seems
If you’re new to the idea of using an HTML head element to talk to web browsers and search engines, you may be feeling a bit overwhelmed right now. The challenge of figuring out how to add all of these tags to every page of your site is undoubtedly daunting. However, there’s good news! In all likelihood, if you’ve built your site with a CMS or website builder, your site probably already has many of these elements. And if it doesn’t, there are plugins that can help you get these tags added to your web page.
Since most of the tags we’ve covered have SEO implications, good SEO plugins for leading CMSs will take care of the majority of the tags we’ve identified in this tutorial. Start by researching the best SEO plugins based on your content management system or website platform. Once you have an SEO plugin installed, activated, and properly configured, take a look at your document head, identify any remaining problems, and look for additional plugins or extensions that specifically address what you see going on in your document head element.