What Is Metadata In HTML Documents?: Head Elements Explained
An HTML document head element is used to provide information about a web page to web browsers and search engines, and to link external resources to the page. It does not contain content that is visible on the web page, but that doesn’t mean that it is unimportant. Proper web page head structure is critical for search engine optimization and website accessibility.
- 1 Basic HTML Document Structure
- 2 A Word About Content Management Systems & Website Builders
- 3 The Optional <head> Element
- 4 The Page <title>
- 5 Defining a <link> to an External Resource
- 6 Adding Scripting
- 7 <meta> Data
- 8 It Isn’t as Complicated as it Seems
Basic HTML Document Structure
HTML documents consist of two major parts:
- The body is where the visible content of the web page is placed. The content in the body is organized by a wide variety of semantic elements.
- The head is placed before the body and includes information about the web page and instructions for web browsers and search engine web crawlers.
In our HTML Document Guide we explain how to structure the body element and the other semantic tags used to identify and group the content of a web page. In this guide we will cover the head element and the information that you need to include in the head of an HTML document to provide the best user experience and to achieve the highest search engine ranking possible.
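To make the two-part structure concrete, here is a minimal sketch of an HTML document. The title text, heading, and paragraph are placeholders chosen for illustration.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Information for browsers and crawlers; nothing here is rendered in the page -->
    <meta charset="utf-8">
    <title>Example Page | Example Site</title>
  </head>
  <body>
    <!-- Visible content goes here -->
    <h1>Example Page</h1>
    <p>This paragraph is visible to visitors.</p>
  </body>
</html>
```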
A Word About Content Management Systems & Website Builders
The majority of modern websites are built on a content management system (CMS) or website builder of some variety. If your website is built with one of these modern tools then the content in the page head will be generated automatically by your CMS or website builder. In that case, what can you hope to glean from this tutorial?
By better understanding what should be in the head of your website’s HTML pages you will be able to evaluate whether or not the appropriate elements are actually there. In addition, if nothing else, you’ll be able to better understand what is going on in the head.
If you find there are missing elements from your website’s head, as long as you are using a mainstream CMS or website builder, in all likelihood there are dependable plugins available for you to use to add and adjust the elements in the document head.
The Optional <head> Element
Beginning with HTML5, the head element is actually optional. That doesn’t mean that HTML5 documents don’t have a head. What this means is that in HTML5 any elements that appear before the opening body tag are automatically grouped into the document head.
Some developers still prefer to use the head element since it provides a logical container for all of the elements discussed in the rest of this document. However, if you prefer to simplify your HTML just a bit, feel free to drop the head tags. Just be sure to group all of the following elements at the beginning of the document before getting into any of the web page content. Once you get into the web page content some of these tags will no longer work as intended.
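As a rough illustration of dropping the head tags, the sketch below also omits the html and body tags, which HTML5 allows as well. The file name styles.css and the text content are placeholders.

```html
<!DOCTYPE html>
<!-- No <head> or <body> tags: the browser's parser infers both -->
<meta charset="utf-8">
<title>Example Page | Example Site</title>
<link rel="stylesheet" href="styles.css">

<!-- The first piece of page content ends the implied head -->
<h1>Example Page</h1>
<p>This paragraph is visible to visitors.</p>
```

The metadata elements sit together at the top, so the parser places them in the implied head before the heading starts the body.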
The Page <title>
One element every HTML document head should contain is a page title. The page title is used as the name of the web page when it appears on a search engine results page (SERP). The title is also used as a label for the browser window or tab where the web page is loaded. Title syntax is very simple:
<title>Insert Title Here | Follow it with the Website Name if you Wish</title>
It is common for web pages other than the homepage to list the page name, followed by a separator and the name of the website. It’s important to remember to include the closing title tag. Failing to do so may cause some web browsers to ignore the rest of the content on a web page.
Defining a <link> to an External Resource
Another element that is used by virtually every modern website is the link element. This element is used to create and define a relationship between a web page and an external resource or another web page. Link elements can be used to pull in many different resources:
- Stylesheets to define the visual presentation of a web page.
- An icon file, or favicon, to be used in the browser title bar or when the web page is bookmarked.
- Attribution resources such as authorship and copyright information.
- And much more.
Here are a few of the most common ways the link element is used in modern web development.
<link>: To Load Stylesheets
Web page presentation or styling rules written in an external CSS file must be linked to a web page with a link element to notify the browser which stylesheets should be loaded. The following syntax is used to link an external stylesheet to an HTML document:
<link href="../file_path/file_name.css" rel="stylesheet" title="Style Sheet Name">
Our CSS Tutorial provides a deeper look at linking external stylesheets and an overview of basic CSS styling.
<link>: To Add a Favicon
Open a web page in your web browser and take a look at the tab at the top of the browser. Do you see a small icon next to the name of the web page? That small icon is called a favicon (short for favorite icon). If you have a shared hosting account and you haven’t replaced the default favicon you will probably see your web host’s favicon displayed next to the name of your web page.
In the past, creating a favicon was as easy as creating any 16 x 16 pixel image and converting it to favicon, .ico, format using one of the many free favicon generator websites or a graphics program like Gimp or Adobe Illustrator. However, due to the wide variety of devices which may be used to access your site today, creating a favicon is now a much more involved process.
It is generally recommended that you now create a graphic that is 196 x 196 pixels and then save it in three different sizes:
- 32 x 32 pixels in .ico format for IE9 and prior (either use one of the online favicon generators or a graphics program like Gimp or Adobe Illustrator to save your file in .ico format).
- 180 x 180 pixels in .png format for smartphone touch icons (in the event that someone pins your bookmarked site as an icon on their smartphone).
- 196 x 196 pixels in .png format for modern browsers.
With those three images uploaded to your website root folder, use the following link syntax to associate them with your website.
<!-- For IE 9 and below ICO should be 32x32 pixels in size -->
<!--[if IE]><link rel="shortcut icon" href="path/to/favicon.ico"><![endif]-->
<!-- Touch Icons - iOS and Android 2.1+ 180x180 pixels in size. -->
<link rel="apple-touch-icon-precomposed" href="path/to/apple-touch-icon-precomposed.png">
<!-- Firefox, Chrome, Safari, IE 11+, Edge, and Opera. 196x196 pixels in size. -->
<link rel="icon" href="path/to/favicon.png">
For the full details on why this exact set of link elements is recommended read the full explanation at Stack Overflow.
Canonical & Base URLs
Another task that can be accomplished in the document head is to tell web crawlers and browsers something about the relationship of the current web page to the overall website structure.
- When a web page can be reached using multiple URLs, the link element can be used to tell web crawlers which version of the page is the canonical or preferred URL formulation and the one that should be indexed by search engines.
- When a web page is reached using an unexpected URL structure, the base element can be used to tell the web browser the URL to use as the basis for any relative links on the page.
Identifying a Canonical URL
If you’re using a modern content management system for your website, depending on the CMS it’s possible that certain pages might be accessible using more than one URL structure. For example, if you have a post categorized in multiple ways, you might be able to access the page using multiple URLs such as:
- Post Position in Category 1: http://example.com/category-1/post/
- Post Position in Category 2: http://example.com/category-2/post/
- Post Permalink: http://example.com/post/
If all three versions can be found by a web crawler, all three versions will be indexed. However, since the content indexed at all three pages is identical, web crawlers may flag the pages as being spam and penalize their position when they appear on a SERP.
Another common problem is for your site to be accessible at several variations of the same address at once, typically with and without the www prefix and over both http and https.
Believe it or not, these are actually four different websites from a web crawler’s perspective. So picking the one that you want to be canonical and telling web crawlers which you’ve picked is pretty important.
The answer is to use the link element to tell web crawlers which version of a page to index. Using our first example above, you would probably want to use the permalink as the canonical URL. To tell web crawlers to prefer this version of the page, the following HTML would be added to the head element for every other instance of the page:
<link rel="canonical" href="http://example.com/post/" />
To learn more about how search engines interpret canonical URLs read up on the topic at Google Search Console Help.
If your website is powered by a CMS, and it’s a CMS that creates duplicate URLs (some do, some don’t), it’s important that you find a plugin or extension that can fix the issue by adding the correct rel="canonical" rules dynamically or by eliminating duplicate content.
Establishing a Base URL
If you use relative URLs to link between pages of your website, yet have a website structure that allows for duplicate content, you can use the base element to tell web browsers the URL to use as the base for relative URLs. For example, let’s say that you have a page that exists at two locations:
- Product Category Location: http://example.com/product-category/product/
- Product Permalink: http://example.com/product/
Now let’s say that you use relative URLs to link to other product pages like this: <a href="../related-product/">Check out this other product!</a>. Depending on which version of the page the viewer is on, that link may point in two different directions:
- Link from Product Category Location: http://example.com/product-category/related-product/
- Link from Product Permalink: http://example.com/related-product/
If the related product is in a different product category the first link won’t work. How do you resolve this? Simply add a base URL to your page head using this syntax:
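A minimal sketch of the base element, assuming the product permalink from the example above is the URL you want relative links to resolve against:

```html
<!-- Assumed base: the product permalink from the example above -->
<base href="http://example.com/product/">
```

With this base in place, ../related-product/ resolves to http://example.com/related-product/ whether the visitor arrived through the category location or the permalink. Keep in mind that the base applies to every relative URL on the page, including images and stylesheets.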
When you use a relative URL on a page with that base element in the page head the base URL will be used rather than the current page URL. Now you can build relative URLs using that URL as your base regardless of the page where the link happens to be found.
If you have a webpage that is available in part, or in whole, in multiple languages the link element combined with three additional attributes can be used to tell web crawlers about the alternate versions of the web page.
For example, let’s say that the main version of your website is in English, but that you also maintain versions of your website in Spanish, French, and German. You could use the following code to tell web crawlers about the relationship between these different web pages:
<link rel="alternate" hreflang="x-default" href="http://www.example.com/">
<link rel="alternate" hreflang="es" href="http://es.example.com/">
<link rel="alternate" hreflang="fr" href="http://fr.example.com/">
<link rel="alternate" hreflang="de" href="http://de.example.com/">
For the sake of our example we’ve assumed that the website would be structured so that each translated page exists as a subdomain of the primary domain, with the language code used to create the subdomain. However, any other structure could be used. For example, you could just as easily use http://www.example.com/espanol/ rather than http://es.example.com/.
The language codes used to identify the language in the hreflang attribute must be presented in ISO 639-1 format. You can also use this same syntax to create localized versions of a web page in the same language by adding a region code in ISO 3166-1 Alpha 2 format after the language code, and separating the region and language codes with a hyphen.
When combining region and language codes, be sure to always list the language code first and then the country code, with the two codes separated by a hyphen.
For example, let’s say that we wanted to create multiple translations for visitors from the same country, such as Spanish and English translations of the same page for visitors from the United States, as well as a third page targeting English speakers from the UK. We could do this with the following syntax:
<!--The first link identifies the default domain-->
<link rel="alternate" hreflang="x-default" href="http://example.com/">
<!--This link identifies a Spanish version for visitors in the USA-->
<link rel="alternate" hreflang="es-us" href="http://www.example.com/es/">
<!--This link identifies an English version for visitors from the UK-->
<link rel="alternate" hreflang="en-gb" href="http://www.example.co.uk/">
If your web page exists in multiple languages, it’s important to use this technique to tell web browsers and web crawlers about the alternate versions of the website. Failing to do so can have a negative impact on your search engine ranking as web crawlers find the various versions but penalize your site for containing duplicate content.
Give Credit Where Credit is Due
The link element can also be used to give information to a web browser about the author and licensing that applies to the content of the web page. The rel attribute and the values author and license are used to embed this information in the document head.
Here’s an example of how we could identify a page author and provide copyright information using the link element:
<link rel="author" href="https://plus.google.com/+ExampleProfile">
<link rel="license" href="https://creativecommons.org/licenses/by/4.0/">
What the first link element does is tell web crawlers that more information about the author is available at the URL given in its href attribute. This link may point to the author’s website, a social media profile, or an email address (though that practice is not recommended). Up until the end of August 2014, it was common for the author’s Google+ profile to be linked to using this method. That was because of something called Google Authorship that has since been discontinued.
The second element tells web crawlers that the page at href="https://creativecommons.org/licenses/by/4.0/" contains the copyright terms that apply to this page. This can come into play since some search engines offer advanced search capability that filters results based on the usage rights applied to the page. While this link could be used to point to a copyright page at your own website, in order for search engines to interpret licensing correctly it’s important to link to a license that search engines will recognize such as one of the Creative Commons licenses.
Adding Scripting
Scripts can be added to the document head element using the script tag. Common uses include:
- Enabling Google Analytics or other visitor tracking applications.
- Conditionally adding the HTML5 Shiv for website visitors using older browsers.
- And much, much more.
The basic syntax for the script tag looks like this:
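A minimal sketch of both common forms, an external file and an inline script. The file path and the logged message are placeholders.

```html
<!-- Load an external script file (the path is a placeholder) -->
<script src="path/to/script.js"></script>

<!-- Or embed a short script inline -->
<script>
  console.log("Document head parsed");
</script>
```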
<meta> Data
Metadata is data about data. It is information that describes some other information in a meaningful way. The meta tag is used in an HTML document to provide high level metadata about the web page: information that describes the web page in a meaningful way that can be understood by web crawlers and browsers.
What sort of metadata can we provide to web crawlers and browsers with the meta tag?
- A page description and the keywords that describe the subject of the page.
- Page authorship information.
- Instructions for specific browser actions.
- Details about the page title, description, and author to be used when the page is posted on social media or shown in SERPs.
- And much, much more.
In this section we’re going to cover a few of the most common actions handled by the meta tag in the head of an HTML document.
charset, short for character set, is the character encoding used on the web page. In nearly all cases, you’ll be writing in UTF-8, and if you aren’t, you probably already know that.
It is important to declare the charset as early as possible in an HTML document because browsers only scan the first 1024 bytes for a character encoding declaration and will otherwise guess at which encoding should apply. This can create some security risks, so play it safe and declare the character encoding as one of the first elements within an HTML document head.
Here’s the syntax for declaring UTF-8 as the character encoding:
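The standard HTML5 form of the declaration is a single meta element, shown here as it would appear at the top of the head:

```html
<!-- Place this first in the head so the browser sees it before any text content -->
<meta charset="utf-8">
```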
Assigning a Description, Keywords, and Authorship
The meta element combined with the name attribute can be used to create document-level metadata and assign a name to that data. The resulting data may be used by search engines as a clue to determine the contents of the page. Here’s the syntax used to create a description, assign keywords, and provide authorship information.
<meta name="description" content="A short description of the website or of the organization behind the website can be added here and will be used by some search engines and web browsers in a variety of different ways.">
<meta name="keywords" content="insert, relevant, keywords, separated, by, commas">
<meta name="author" content="Author Name">
In the past, the meta-level description and keywords were a critical component of SEO. While this isn’t really true anymore – search engines have gotten a lot smarter in the last several years – it’s still important to provide a good description and relevant keywords. They may show up in a variety of places depending on the web browser you are using and the search engines that index your website.
Giving Instructions to the Browser
The http-equiv attribute can be applied to the meta tag to force a web browser to refresh or redirect automatically after a few seconds.
Lots of websites today are built on user-generated content that is continuously updated. For example, every social media website you visit displays a site populated with content submitted by your social media connections. All of this content is continuously updated and new content is ready to be displayed on a regular basis. In the past, it was common to use http-equiv to force the web page to refresh on a periodic basis to pull down updated content. However, today that sort of manipulation of a visitor’s browser is not recommended.
Another way http-equiv is sometimes used is to redirect users from an outdated or obsolete web page to a new web page. However, this too is now frowned upon, and webmasters or developers should use a 301 server redirect for permanent redirects or a 302 server redirect for temporary redirects.
If you do find that you have a legitimate use case for the ability to force the browser to redirect or refresh you can use the following syntax to make it happen:
<meta http-equiv="refresh" content="5; url=http://www.example.com">
What this code will do is tell the browser to redirect the website visitor to www.example.com after five seconds. If the URL is not provided, then the current page is refreshed after the time interval indicated in the content attribute (five seconds in this case).
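For comparison, here is a sketch of the refresh-only form with no URL. The 30-second interval is an arbitrary value chosen for illustration.

```html
<!-- No url given: the current page reloads every 30 seconds -->
<meta http-equiv="refresh" content="30">
```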
When you share a website on social media one of two things will happen:
- The social media site will scan the site and find some relevant meta tags telling it what to display.
- The social media site will scan the site but won’t find the tags it’s looking for and will either display very little information or an auto-generated title, photo, and text snippet.
As a website designer, developer, or owner the value of being able to tell social media sites exactly what you want them to say about your website should be obvious.
Open Graph is a protocol used to define how your website should appear when posted on social networks. It’s pretty straightforward, and if we’re talking about a website there are only four items you really must specify: the title, URL, description, and featured image.
Here’s the syntax for defining each of these elements.
<meta property="og:title" content="Title of blog post or page">
<meta property="og:url" content="http://www.example.com">
<meta property="og:description" content="A good description around 300 characters long.">
<meta property="og:image" content="http://www.example.com/image-name.jpg">
There’s a lot more that can be done with Open Graph, and you can learn more about the additional fields you can use to further define how your site appears on social media sites that use Open Graph by visiting the Open Graph website.
Twitter uses its own syntax, but will use the Open Graph content if it can’t find meta tags with the information it’s looking for. Since Twitter does have a shorter character limit, it’s a good idea to use the Twitter tags separately to ensure your page looks as good as possible when shared on Twitter.
The basic fields you’ll need to include are very similar to those required for Open Graph with one noteworthy difference.
<meta name="twitter:title" content="Blog post or page title">
<meta name="twitter:url" content="http://www.example.com">
<meta name="twitter:site" content="@Twitter_username" />
<meta name="twitter:description" content="A good description limited to 200 characters.">
<meta name="twitter:image" content="http://www.example.com/image-name.jpg">
When providing meta information for Twitter to use when a web page is tweeted, be sure to include the name="twitter:site" tag and the Twitter username which should be given attribution for the tweet. Once again, there’s a lot more that can be done with Twitter Cards and the Twitter Developers Documentation can help you craft the perfect Twitter Card.
It Isn’t as Complicated as it Seems
If you’re new to the idea of using an HTML head element to talk to web browsers and search engines, you may be feeling a bit overwhelmed right now. The challenge of figuring out how to add all of these tags to every page of your site is undoubtedly daunting. However, there’s good news! In all likelihood, if you’ve built your site with a CMS or website builder, your site probably already has many of these elements. And if it doesn’t, there are plugins that can help you get these tags added to your web page.
Since most of the tags we’ve covered have SEO implications, good SEO plugins for leading CMSs will take care of the majority of the tags we’ve identified in this tutorial. Start by researching the best SEO plugins based on your content management system or website platform. Once you have an SEO plugin installed, activated, and properly configured take a look at your document head, identify additional problems, and look for additional plugins or extensions that specifically address what you see going on in your document head.
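If you want a reference to compare against what your CMS or plugin generates, the sketch below pulls the main elements from this guide into a single head. Every URL, path, and text value is a placeholder, and a real page may need only some of these tags.

```html
<head>
  <!-- Character encoding first, so it falls within the first bytes of the file -->
  <meta charset="utf-8">
  <title>Example Post | Example Site</title>
  <meta name="description" content="A short description of the page.">
  <meta name="author" content="Author Name">

  <!-- Preferred URL for web crawlers -->
  <link rel="canonical" href="http://example.com/post/">

  <!-- Stylesheet and favicon (paths are placeholders) -->
  <link rel="stylesheet" href="path/to/styles.css">
  <link rel="icon" href="path/to/favicon.png">

  <!-- Open Graph and Twitter metadata for social sharing -->
  <meta property="og:title" content="Example Post">
  <meta property="og:url" content="http://example.com/post/">
  <meta property="og:description" content="A short description of the page.">
  <meta property="og:image" content="http://example.com/image-name.jpg">
  <meta name="twitter:site" content="@Twitter_username">

  <!-- External script (path is a placeholder) -->
  <script src="path/to/script.js"></script>
</head>
```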