Search Trends: Are Compound Queries the Start of the Shift to Data-Driven Search?
Posted by Tom-Anthony
The Web is an ever-diminishing aspect of our online lives. We increasingly use apps, wearables, smart assistants (Google Now, Siri, Cortana), smart watches, and smart TVs for searches, and none of these are returning 10 blue links. In fact, we usually don’t end up on a website at all.
Apps are the natural successor, and an increasing amount of time spent optimising search is going to be spent focusing on apps. However, whilst app search is going to be very important, I don’t think it is where the trend stops.
This post is about where I think the trends take us—towards what I am calling “Data-Driven Search”. Along the way I am going to highlight another phenomenon: “Compound Queries”. I believe these changes will dramatically alter the way search and SEO work over the next 1-3 years, and it is important we begin now to think about how that future could look.
App indexing is just the beginning
With App Indexing Google is moving beyond the bounds of the web-search paradigm which made them famous. On Android, we are now seeing blue links which are not to web pages but are deep links to open specific pages within apps:
This is interesting in and of itself, but it is also part of a larger pattern which began with things like the answer box and knowledge graph. With these, we saw that Google was shifting away from sending you somewhere else and was starting to provide the answer you were looking for right there in the SERPs. App Indexing is the next step, which moves Google from simply providing answers to enabling actions: allowing you to do things.
App Indexing is going to be around for a while—but here I want to focus on this trend towards providing answers and enabling actions.
Notable technology trends
Google’s mission is to build the “ultimate assistant”—something that anticipates your needs and facilitates fulfilling them. Google Now is just the beginning of what they are dreaming of.
So many of the projects and technologies that Google, and their competitors, are working on are converging with the trend towards “answers and actions”, and I think this is going to lead to a really interesting evolution in searches—namely what I am calling “Data-Driven Search”.
Let’s look at some of the contributing technologies.
Compound queries: query revisions & chained queries
There is a lot of talk about conversational search at the moment, and it is fascinating for many reasons, but in this instance I am mostly interested in two specific facets:
- Query revision
- Chained queries
The current model for multiple queries looks like this:
You do one query (e.g. “recipe books”) and then, after looking at the results of that search, you have a better sense of exactly what it is you are looking for and so you refine your query and run another search (e.g. “vegetarian recipe books”). Notice that you do two distinct searches, with the second one almost completely separate from the first.
Conversational search is moving us towards a new model which looks more like this, which I’m calling the Compound Query model:
In this instance, after evaluating the results I got, I don’t make a new query but instead a Query Revision which relates back to that initial query. After searching “recipe books”, I might follow up with “just show me the vegetarian ones”. You can already do this with conversational search:
Example of a “Query Revision”—one type of Compound Query
Currently, we only see this intent revision model working in conversational search, but I expect we will see it migrate into desktop search as well. There will be a new generation of searchers who won’t have been “trained” to search in the unnatural and stilted keyword-oriented way that we have. They’ll be used to conversational search on their phones and will apply the same patterns on desktop machines. I suspect we’ll also see other changes to desktop-based search which will merge in other aspects of how conversational search results are presented. There are also other companies working on radical new interfaces, such as Scinet by Etsimo (their interface is quite radical, but the problems it solves and addresses are ones Google will likely also be working on).
So many SEO paradigms don’t begin to apply in this scenario; things like keyword research and rankings are not compatible with a query model that has multiple phases.
This new query model has a second application, namely Chained Queries, where you perform an initial query, and then on receiving a response you perform a second query on the same topic (the classic example is “How tall is Justin Bieber?” followed by “How old is he?”—the second query is dependent upon the first):
Example of a Chained Query—the second type of Compound Query
It might be that in the case of chained queries, the later queries could be converted to standalone queries, such that they don’t muddy the SEO waters quite as much as queries that have revisions. However, I’m not sure that this necessarily holds true, because every query in a chain adds context that makes it much easier for Google to accurately determine your intent in later queries.
If you are not convinced, consider that in examples like the Justin Bieber one above, it is usually clear from the formulation that the query is explicitly chained. However, there are chained queries where it is not necessarily clear that the current query is chained to the previous one. To illustrate this, I’ve borrowed an example which Behshad Behzadi, Director of Conversational Search at Google, showed at SMX Munich last month:
Example of a “hidden” Chained Query—it is not explicit that the last search refers to the previous one.
If you didn’t see the first search for “pictures of mario” before the second and third examples, it might not be immediately obvious that the second “pictures of mario” query has taken into account the previous search. There are bound to be far more subtle examples than this.
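To make the two kinds of Compound Query a little more concrete, here is a minimal Python sketch of a search session that carries context from one query to the next. It is purely illustrative: the `SearchSession` class, its method names, and the naive pronoun substitution are my own assumptions for the sake of the example, and say nothing about how Google actually resolves these queries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchSession:
    """Hypothetical toy model of a session that remembers earlier queries."""
    base_query: str = ""
    refinements: List[str] = field(default_factory=list)
    last_entity: Optional[str] = None

    def search(self, query: str, entity: Optional[str] = None) -> str:
        # A brand-new query resets any earlier refinements.
        self.base_query = query
        self.refinements = []
        if entity is not None:
            self.last_entity = entity
        return query

    def revise(self, refinement: str) -> str:
        # Query Revision: narrow the previous query instead of replacing it.
        self.refinements.append(refinement)
        return " ".join(self.refinements + [self.base_query])

    def chain(self, follow_up: str,
              pronouns=("he", "she", "it", "they")) -> str:
        # Chained Query: swap pronouns for the entity remembered earlier.
        if self.last_entity is None:
            return follow_up
        resolved = []
        for word in follow_up.split():
            core = word.rstrip("?.!,")   # the word without trailing punctuation
            tail = word[len(core):]      # the punctuation we stripped
            if core.lower() in pronouns:
                resolved.append(self.last_entity + tail)
            else:
                resolved.append(word)
        return " ".join(resolved)


session = SearchSession()
session.search("recipe books")
print(session.revise("vegetarian"))

session.search("How tall is Justin Bieber?", entity="Justin Bieber")
print(session.chain("How old is he?"))
```

The point of the sketch is that the second query in each pair is meaningless on its own; what actually gets answered depends on state held over from the first, which is exactly why per-keyword thinking struggles here.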
New interfaces
The days of all Google searches coming solely via a desktop-based web browser are already long since dead, but mobile users using voice search are just the start of the change—there is an ongoing divergence of interfaces. I’m focusing here on the output interfaces—i.e., how we consume the results from a search on a specific device.
The primary device category that springs to mind is that of wearables and smart watches, which have a variety of ways in which they communicate with their users:
- Compact screens—devices like the Apple Watch and Microsoft Band have compact form factor screens, which allow for visual results, but not in the same format as days gone by—a list of web links won’t be helpful.
- Audio—with Siri, Google Now, and Cortana all becoming available via wearable interfaces (that pair to smart phones) users can also consume results as voice.
- Vibrations—the Apple Watch can give users directions using vibrations to signal left and right turns without needing to look or listen to the device. Getting directions already covers a number of searches, but you could imagine this also being useful for various yes/no queries (e.g. “is my train on time?”).
Each of these methods is incompatible with the old “title & snippet” method that made up the 10 blue links, but furthermore they are also all different from one another.
What is clear is that there is going to need to be an increase in the forms in which search engines can respond to an identical query, with responses being adaptive to the way in which the user will consume their result.
We will also see queries where the query may be “handed off” to another device: imagine me doing a search for a location on my phone and then using my watch to give me directions. Apple already has “Handoff”, which does this in various contexts, and I expect we’ll see the concept taken further.
This is related to Google increasingly providing us with encapsulated answers, rather than links to websites—especially true on wearables and smart devices. The interesting phenomenon here is that these answers don’t specify a specific layout, like a webpage does. The data and the layout are separated.
Which leads us to…
Cards
Made popular by Google Now, cards are prevalent in both iOS and Android, as well as on social platforms. They are a growing facet of the mobile experience:
Cards provide small units of information in an accessible chunk, often with a link to dig deeper by flipping a card over or by linking through to an app.
Cards exactly fit into the paradigm above—they are more concerned with the data you will see and less so about the way in which you will see it. The same cards look different in different places.
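The separation of data from layout can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `TrainStatusCard` payload and the three renderer functions are hypothetical names, loosely modelled on the compact screen, audio, and vibration interfaces described earlier, and do not correspond to any real card format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainStatusCard:
    """Hypothetical card payload: it carries data only, never layout."""
    service: str
    on_time: bool
    delay_minutes: int = 0


def render_watch(card: TrainStatusCard) -> str:
    # Compact screen: two short lines of text.
    status = "On time" if card.on_time else f"+{card.delay_minutes} min"
    return f"{card.service}\n{status}"


def render_voice(card: TrainStatusCard) -> str:
    # Audio: a full sentence to be read aloud.
    if card.on_time:
        return f"Your {card.service} train is on time."
    return f"Your {card.service} train is running {card.delay_minutes} minutes late."


def render_haptic(card: TrainStatusCard) -> List[str]:
    # Vibration: one pulse for "yes, on time", two pulses for "no".
    return ["pulse"] if card.on_time else ["pulse", "pulse"]


card = TrainStatusCard(service="08:15 to Leeds", on_time=False, delay_minutes=7)
print(render_watch(card))
print(render_voice(card))
print(render_haptic(card))
```

The same payload answers “is my train on time?” in three different forms, and the party supplying the data never has to know which one the user ends up with.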
Furthermore, we are entering a point where you can now do more and more from a card, rather than it leading you into an app to do more. You can respond to messages, reply to tweets, like and re-share, and all sorts of other things from cards, without opening an app; I highly recommend this blog post which explores this phenomenon.
It seems likely we’ll see Google Now (and mobile search as it becomes more like Google Now) allowing you to do more and more right from cards themselves. Many of these things will be actions facilitated by other parties (by way of APIs or schema.org actions). In this way Google will become a “junction box” sitting between us and third parties who provide services; they’ll find an API/service provider and return us a snippet of data showing us options and then enable us to pass back data representing our response to the relevant API.
Shared screens
The next piece of the puzzle is “shared screens”, which covers several things. This starts with Google Chromecast, which has popularised the ability to “throw” things from one screen to another. At home, any guests I have over who join my wifi are able to “throw” a YouTube video from their mobile phone to my TV via the Chromecast. The same is true for people in the meeting rooms at Distilled offices and in a variety of other public spaces.
I can natively throw a variety of things: photos, YouTube videos, movies on Netflix, etc. How long until that includes searches? How long until I can throw the results of a search on an iPad on to the TV to show my wife the holiday options I’m looking at? Sure, we can do that by sharing the whole screen now, but how long until, like photos or YouTube videos, the search results I throw to the TV take on a new layout that is suitable for that larger screen?
You can immediately see how this links back to the concept of cards and interfaces outlined above; I’m moving data from screen to screen, and between devices that provide different interfaces.
These concepts are all very related to the concept of “fluid mobility” that Microsoft recently presented in their Productivity Future Vision released in February this year.
An evolution of this is if we reach the point that some people have envisioned, whereby many office workers, who don’t require huge computational power, no longer have computers at their desks. Instead their desks just house dumb terminals: a display, keyboard and mouse which connect to the phone in their pockets, which provides the processing power.
In this scenario, it becomes even more usual for people to be switching interfaces “mid task” (including searches)—you do a search at your desk at work (powered by your phone), then continue to review the results on the train home on the phone itself before browsing further on your TV at home.
Email structured markup
This deserves a quick mention—it is another data point in the trend of “enabling action”. It doesn’t seem to be common knowledge that you can use structured markup and schema.org markup in emails, which works in both Gmail and Google Inbox.
Editor’s note: Stay tuned for more on this in tomorrow’s post!
The main concepts they introduce are “highlights” and “actions”—sound familiar? You can define actions that become buttons in emails allowing people to confirm, save, review, RSVP, etc. with a single click right in the email.
Currently, you have to apply to Google for them to whitelist emails you send out in order for them to mark the emails up, but I expect we’ll see this rolling out more and more. It may not seem directly search-related but if you’re building the “ultimate personal assistant”, then merging products like Google Now and Google Inbox would be a good place to start.
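For a feel of what an email action looks like, here is a small Python sketch that builds a JSON-LD block for a one-click confirm button. Treat it as an approximation: the `@type` and property names (`EmailMessage`, `potentialAction`, `ConfirmAction`, `HttpActionHandler`) follow Google’s published email markup examples as I understand them, while the `confirm_action_markup` helper and the example.com URL are purely hypothetical. Check the current documentation before relying on any of it.

```python
import json


def confirm_action_markup(name: str, handler_url: str, description: str) -> str:
    """Build a JSON-LD <script> block describing a one-click confirm action.

    Hypothetical helper; property names are assumed from Google's
    email markup examples and should be verified against the docs.
    """
    payload = {
        "@context": "http://schema.org",
        "@type": "EmailMessage",
        "description": description,
        "potentialAction": {
            "@type": "ConfirmAction",
            "name": name,  # the text shown on the button
            "handler": {
                "@type": "HttpActionHandler",
                "url": handler_url,  # endpoint called when the button is clicked
            },
        },
    }
    body = json.dumps(payload, indent=2)
    # The block is placed inside the HTML of the email itself.
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(confirm_action_markup(
    name="Confirm booking",
    handler_url="https://example.com/confirm?id=123",  # placeholder URL
    description="Please confirm your table for Friday",
))
```

Notice that the email carries no instructions about how the button should look; it only declares that an action exists and where to send the response, which is the same data-over-layout pattern as cards.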
The rise of data-driven search
There is a common theme running through all of the above technologies and trends, namely data:
- We are increasingly requesting from Search Engines snippets of data, rather than links to strictly formatted web content
- We are increasingly being provided the option for direct action without going to an app/website/whatever by providing a snippet of data with our response/request
I think in the next 2 years small payloads of data will be the new currency of Google. Web search won’t go away anytime soon, but large parts of it will be subsumed into the data-driven paradigm. Projects like Knowledge Vault, which aims to dislodge the Freebase/Wikipedia-powered (i.e. manually curated) Knowledge Graph by pulling facts directly from the text of all pages on the web, will mean that mining the web for parcels of data becomes feasible at scale. This will mean that Google knows where to look for specific bits of data and can extract and return this data directly to the user.
How all this might change the way users and search engines interact:
- The move towards compound queries will mean it becomes more natural for people to use Google to “interact” with data in an iterative process; Google won’t just send us to a set of data somewhere else but will help us sift through it all.
- Shared screens will mean that search results will need to be increasingly device agnostic. The next generation of technologies such as Apple Handoff and Google Chromecast will mean we increasingly pass results between devices where they may take on a new layout.
- Cards will be one part of making that possible by ensuring that results can be rendered in various formats. Users will become more and more accustomed to interacting with sets of cards.
- The focus on actions will mean that Google plugs directly into APIs such that they can connect users with third party backends and enable that right there in their interface.
What we should be doing
I don’t have a good answer to this—which is exactly why we need to talk about it more.
Firstly, what is obvious is that lots of the old facets of technical SEO are already breaking down. For example, as I mentioned above, things like keyword research and rankings don’t fit well with the conversational search model where compound queries are prevalent. This will only become more and more the case as we go further down the rabbit hole. We need to educate clients and work out what new metrics help us establish how Google perceives us.
Secondly, I can’t escape the feeling that APIs are not only going to increase further in importance, but also become more “mainstream”. Think how over the years ownership of company websites started in the technical departments and migrated to marketing teams—I think we could see a similar pattern with more core teams being involved in APIs. If Google wants to connect to APIs to retrieve data and help users do things, then more teams within a business are going to want to weigh in on what it can do.
APIs might seem out of reach and unnecessary for many businesses (exactly as websites used to…), but structured markup and schema.org are like a “lite API”, enabling programmatic access to your data and even now to actions available via your website. This will provide a nice stepping stone where needed (and might even be sufficient).
Lastly, if this vision of things does play out, then much of our search behaviour could be imagined to be a sophisticated take on faceted navigation—we do an initial search and then sift through and refine the data we get back to drill down to the exact morsels we were looking for. I could envision “Query Revision” queries where the initial search happens within Google’s index (“science fiction books”) but subsequent searches happen in someone else’s, for example Amazon’s, “index” (‘show me just those with 5 stars and more than 10 reviews that were released in the last 5 years’).
If that is the case, then what I will be doing is ensuring that Distilled’s clients have thorough and accurate “indexes” with plenty of supplementary information that users could find useful. A few years ago we started worrying about ensuring our clients’ websites have plenty of unique content, and this would see us worrying about ensuring they have a thorough “index” for their product/service. We should be doing that already, but suddenly it isn’t going to be just a conversion factor, but a ranking factor too (following the same trend as many other signals, in that regard).
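The “sophisticated faceted navigation” idea can be sketched as a chain of filters over a structured product index. The Python below is a toy under stated assumptions: the `Book` record, the `refine` helper, the placeholder titles, and the fixed “today” date are all made up for illustration, and real indexes would obviously be far richer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Book:
    """Hypothetical record in a retailer's product "index"."""
    title: str
    genre: str
    stars: float
    reviews: int
    released: date


def refine(results: Iterable[Book], *predicates: Callable[[Book], bool]) -> List[Book]:
    # Keep only the items that satisfy every predicate.
    return [item for item in results if all(p(item) for p in predicates)]


# Placeholder data, not real products.
catalogue = [
    Book("Example Novel A", "science fiction", 5.0, 42, date(2013, 6, 1)),
    Book("Example Novel B", "science fiction", 5.0, 3, date(2014, 1, 15)),
    Book("Example Novel C", "science fiction", 4.0, 120, date(2012, 9, 9)),
    Book("Example Novel D", "science fiction", 5.0, 88, date(2005, 3, 3)),
    Book("Example Cookbook", "cookery", 5.0, 50, date(2014, 5, 5)),
]

today = date(2015, 4, 1)  # fixed so the example is repeatable
cutoff = date(today.year - 5, today.month, today.day)

# Initial search: "science fiction books".
initial = refine(catalogue, lambda b: b.genre == "science fiction")

# Revision: "just those with 5 stars, more than 10 reviews, from the last 5 years".
revised = refine(
    initial,
    lambda b: b.stars == 5,
    lambda b: b.reviews > 10,
    lambda b: b.released >= cutoff,
)

print([b.title for b in revised])
```

The revision can only be answered if the index actually holds star ratings, review counts, and release dates in a structured form, which is the practical argument for making that supplementary information thorough and accurate.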
Discussion
Please jump in the comments, or tweet me at @TomAnthonySEO, with your thoughts. I am sure many of the details for how I have envisioned this may not be perfectly accurate, but directionally I’m confident and I want to hear from others with their ideas.
Continue reading →Search Trends: Are Compound Queries the Start of the Shift to Data-Driven Search?
Posted by Tom-Anthony
The Web is an ever-diminishing aspect of our online lives. We increasingly use apps, wearables, smart assistants (Google Now, Siri, Cortana), smart watches, and smart TVs for searches, and none of these are returning 10 blue links. In fact, we usually don’t end up on a website at all.
Apps are the natural successor, and an increasing amount of time spent optimising search is going to be spent focusing on apps. However, whilst app search is going to be very important, I don’t think it is where the trend stops.
This post is about where I think the trends take us—towards what I am calling “Data-Driven Search”. Along the way I am going to highlight another phenomenon: “Compound Queries”. I believe these changes will dramatically alter the way search and SEO work over the next 1-3 years, and it is important we begin now to think about how that future could look.
App indexing is just the beginning
With App Indexing Google is moving beyond the bounds of the web-search paradigm which made them famous. On Android, we are now seeing blue links which are not to web pages but are deep links to open specific pages within apps:
This is interesting in and of itself, but it is also part of a larger pattern which began with things like the answer box and knowledge graph. With these, we saw that Google was shifting away from sending you somewhere else but was starting to provide the answer you were looking for right there in the SERPs. App Indexing is the next step, which moves Google from simply providing answers to enabling actions—allow you to
do things.
App Indexing is going to be around for a while—but here I want to focus on this trend towards providing answers and enabling actions.
Notable technology trends
Google’s mission is to build the “ultimate assistant”—something that anticipates your needs and facilitates fulfilling them. Google Now is just the beginning of what they are dreaming of.
So many of the projects and technologies that Google, and their competitors, are working on are converging with the trend towards “answers and actions”, and I think this is going to lead to a really interesting evolution in searches—namely what I am calling “Data-Driven Search”.
Let’s look at some of the contributing technologies.
Compound queries: query revisions & chained queries
There is a lot of talk about conversational search at the moment, and it is fascinating for many reasons, but in this instance I am mostly interested in two specific facets:
- Query revision
- Chained queries
The current model for multiple queries looks like this:
You do one query (e.g. “recipe books”) and then, after looking at the results of that search, you have a better sense of exactly what it is you are looking for and so you refine your query and run another search (e.g. “vegetarian recipe books”). Notice that you do two distinct searches—with the second one mostly completely separate from the first.
Conversational search is moving us towards a new model which looks more like this, which I’m calling the Compound Query model:
In this instance, after evaluating the results I got, I don’t make a new query but instead a Query Revision which relates back to that initial query. After searching “recipe books”, I might follow up with “just show me the vegetarian ones”. You can already do this with conversational search:
Example of a “Query Revision”—one type of Compound Query
Currently, we only see this intent revision model working in conversational search, but I expect we will see it migrate into desktop search as well. There will be a new generation of searchers who won’t have been “trained” to search in the unnatural and stilted keyword-oriented that we have. They’ll be used to conversational search on their phones and will apply the same patterns on desktop machines. I suspect we’ll also see other changes to desktop-based search which will merge in other aspects of how conversational search results are presented. There are also other companies working on radical new interfaces, such as Scinet by Etsimo (their interface is quite radical, but the problems it solves and addresses are ones Google will likely also be working on).
So many SEO paradigms don’t begin to apply in this scenario; things like keyword research and rankings are not compatible with a query model that has multiple phases.
This new query model has a second application, namely Chained Queries, where you perform an initial query, and then on receiving a response you perform a second query on the same topic (the classic example is “How tall is Justin Bieber?” followed by “How old is he?”—the second query is dependent upon the first):
Example of a Chained Query—the second type of Compound Query
It might be that in the case of chained queries, the latter queries could be converted to be standalone queries, such that they don’t muddy the SEO waters quite as much as as queries that have revisions. However, I’m not sure that this necessarily stands true, because every query in a chain adds context that makes it much easier for Google to accurately determine your intent in later queries.
If you are not convinced, consider that in the example above, as is often the case in examples (such as the Justin Bieber example), it is usually clear from the formulation that this is explicitly a chained query. However—there are chained queries where it is not necessarily clear that the current query is chained to the previous. To illustrate this, I’ve borrowed an example which Behshad Behzadi, Director of Conversational Search at Google, showed at SMX Munich last month:
Example of a “hidden” Chained Query—it is not explicit that the last search refers to the previous one.
If you didn’t see the first search for “pictures of mario” before the second and third examples, it might not be immediately obvious that the second “pictures of mario” query has taken into account the previous search. There are bound to be far more subtle examples than this.
New interfaces
The days of all Google searches coming solely via a desktop-based web browser are already long since dead, but mobile users using voice search are just the start of the change—there is an ongoing divergence of interfaces. I’m focusing here on the output interfaces—i.e., how we consume the results from a search on a specific device.
The primary device category that springs to mind is that of wearables and smart watches, which have a variety of ways in which they communicate with their users:
- Compact screens—devices like the Apple Watch and Microsoft Band have compact form factor screens, which allow for visual results, but not in the same format as days gone by—a list of web links won’t be helpful.
- Audio—with Siri, Google Now, and Cortana all becoming available via wearable interfaces (that pair to smart phones) users can also consume results as voice.
- Vibrations—the Apple Watch can give users directions using vibrations to signal left and right turns without needing to look or listen to the device. Getting directions already covers a number of searches, but you could imagine this also being useful for various yes/no queries (e.g. “is my train on time?”).
Each of these methods is incompatible with the old “title & snippet” method that made up the 10 blue links, but furthermore they are also all different from one another.
What is clear is that there is going to need to be an increase in the forms in which search engines can respond to an identical query, with responses being adaptive to the way in which the user will consume their result.
We will also see queries where the query may be “handed off” to another device: imagine me doing a search for a location on my phone and then using my watch to give me direction. Apple already has “Handover”which does this in various contexts, and I expect we’ll see the concept taken further.
This is related to Google increasingly providing us with encapsulated answers, rather than links to websites—especially true on wearables and smart devices. The interesting phenomenon here is that these answers don’t specify a specific layout, like a webpage does. The data and the layout are separated.
Which leads us to…
Cards
Made popular by Google Now, cards are prevalent in both iOS and Android, as well as on social platforms. They are a growing facet of the mobile experience:
Cards provide small units of information in an accessible chunk, often with a link to dig deeper by flipping a card over or by linking through to an app.
Cards exactly fit into the paradigm above—they are more concerned with the data you will see and less so about the way in which you will see it. The same cards look different in different places.
Furthermore, we are entering a point where you can now do more and more from a card, rather than it leading you into an app to do more. You can response to messages, reply to tweets, like and re-share, and all sorts of things all from cards, without opening an app; I highly recommend this blog post which explores this phenomenon.
It seems likely we’ll see Google Now (and mobile search as it becomes more like Google Now) allowing you to do more and more right from cards themselves—many of these things will be actions facilitated by other parties (by way of APIs of schema.org actions). In this way Google will become a “junction box” sitting between us and third parties who provide services; they’ll find an API/service provider and return us a snippet of data showing us options and then enable us to pass back data representing our response to the relevant API.
Shared screens
The next piece of the puzzle is “shared screens”, which covers several things. This starts with Google Chromecast, which has popularised the ability to “throw” things from one screen to another. At home, any guests I have over who join my wifi are able to “throw” a YouTube video from their mobile phone to my TV via the Chromecast. The same is true for people in the meeting rooms at Distilled offices and in a variety of other public spaces.
I can natively throw a variety of things: photos, YouTube videos, movies on Netflix etc., etc. How long until that includes searches? How long until I can throw the results of a search on an iPad on to the TV to show my wife the holiday options I’m looking at? Sure we can do that by sharing the whole screen now, but how long until, like photos of YouTube videos, the search results I throw to the TV take on a new layout that is suitable for that larger screen?
You can immediately see how this links back to the concept of cards and interfaces outlined above; I’m moving data from screen to screen, and between devices that provide different interfaces.
These concepts are all very related to the concept of “fluid mobility” that Microsoft recently presented in their Productivity Future Vision released in February this year.
An evolution of this is if we reach the point that some people have envisioned, whereby many offices workers, who don’t require huge computational power, no longer have computers at their desks. Instead their desks just house dumb terminals: a display, keyboard and mouse which connect to the phone in their pockets which provides the processing power.
In this scenario, it becomes even more usual for people to be switching interfaces “mid task” (including searches)—you do a search at your desk at work (powered by your phone), then continue to review the results on the train home on the phone itself before browsing further on your TV at home.
Email structured markup
This deserves a quick mention—it is another data point in the trend of “enabling action”. It doesn’t seem to be common knowledge that you can use structured markup and schema.org markup in emails, which works in both Gmail and Google Inbox.
Editor’s note: Stay tuned for more on this in tomorrow’s post!
The main concepts they introduce are “highlights” and “actions”—sound familiar? You can define actions that become buttons in emails allowing people to confirm, save, review, RSVP, etc. with a single click right in the email.
Currently, you have to apply to Google for them to whitelist emails you send out in order for them to mark the emails up, but I expect we’ll see this rolling out more and more. It may not seem directly search-related but if you’re building the “ultimate personal assistant”, then merging products like Google Now and Google Inbox would be a good place to start.
The rise of data-driven search
There is a common theme running through all of the above technologies and trends, namely data:
- We are increasingly requesting from Search Engines snippets of data, rather than links to strictly formatted web content
- We are increasingly being provided the option for direct action without going to an app/website/whatever by providing a snippet of data with our response/request
I think in the next 2 years small payloads of data will be the new currency of Google. Web search won’t go away anytime soon, but large parts of it will be subsumed into the data driven paradigm. Projects like Knowledge Vault, which aims to dislodge the Freebase/Wikipedia (i.e. manually curated) powered Knowledge Graph by pulling facts directly from the text of all pages on the web, will mean mining the web for parcels of data become feasible at scale. This will mean that Google knows where to look for specific bits of data and can extract and return this data directly to the user.
How all this might change the way users and search engines interact:
- The move towards compound queries will mean it becomes more natural for people to use Google to “interact” with data in an iterative process; Google won’t just send us to a set of data somewhere else but will help us sift through it all.
- Shared screens will mean that search results will need to be increasingly device agnostic. The next generation of technologies such as Apple Handover and Google Chromecast will mean we increasingly pass results between devices where they may take on a new layout.
- Cards will be one part of making that possible by ensuring that results can rendered in various formats. Users will become more and more accustomed to interacting with sets of cards.
- The focus on actions will mean that Google plugs directly into APIs such that they can connect users with third party backends and enable that right there in their interface.
What we should be doing
I don’t have a good answer to this—which is exactly why we need to talk about it more.
Firstly, what is obvious is that lots of the old facets of technical SEO are already breaking down. For example, as I mentioned above, things like keyword research and rankings don’t fit well with the conversational search model where compound queries are prevalent. This will only become more and more the case as we go further down the rabbit hole. We need to educate clients and work out what new metrics help us establish how Google perceives us.
Secondly, I can’t escape the feeling that APIs are not only going to increase further in importance, but also become more “mainstream”. Think how over the years ownership of company websites started in the technical departments and migrated to marketing teams—I think we could see a similar pattern with more core teams being involved in APIs. If Google wants to connect to APIs to retrieve data and help users do things, then more teams within a business are going to want to weigh in on what it can do.
APIs might seem out of reach and unnecessary for many businesses (exactly as websites used to…), but structured markup and schema.org are like a “lite API”: they enable programmatic access to your data, and even now to actions available via your website. This can provide a nice stepping stone where needed (and might even be sufficient).
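As a rough illustration of the “lite API” idea, here is a small Python sketch that builds a schema.org JSON-LD snippet for a book. The book details and the `build_book_jsonld` helper are made up for this example; the property names (`aggregateRating`, `ratingValue`, `reviewCount`, `datePublished`) come from the schema.org vocabulary.

```python
import json


def build_book_jsonld(name, author, date_published, rating, review_count):
    """Return a schema.org Book description as a JSON-LD string.

    All values passed in are illustrative; a real site would pull them
    from its own product database.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": name,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }
    return json.dumps(data, indent=2)


# Hypothetical book, wrapped in the script tag used to embed JSON-LD in a page.
snippet = build_book_jsonld("Example Sci-Fi Novel", "A. N. Author", "2014-06-01", 4.8, 27)
print('<script type="application/ld+json">\n' + snippet + "\n</script>")
```

Data exposed like this (rating, review count, release date) is the kind of thing a refinement query could filter on without anyone building a full API.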
Lastly, if this vision of things does play out, then much of our search behaviour could be imagined to be a sophisticated take on faceted navigation—we do an initial search and then sift through and refine the data we get back to drill down to the exact morsels we were looking for. I could envision “Query Revision” queries where the initial search happens within Google’s index (“science fiction books”) but subsequent searches happen in someone else’s, for example Amazon’s, “index” (‘show me just those with 5 stars and more than 10 reviews that were released in the last 5 years’).
If that is the case, then what I will be doing is ensuring that Distilled’s clients have thorough and accurate “indexes” with plenty of supplementary information that users could find useful. A few years ago we started worrying about ensuring our clients’ websites have plenty of unique content, and this would see us worrying about ensuring they have a thorough “index” for their product/service. We should be doing that already, but suddenly it isn’t going to be just a conversion factor, but a ranking factor too (following the same trend as many other signals, in that regard).
Discussion
Please jump in the comments, or tweet me at @TomAnthonySEO, with your thoughts. I am sure many of the details for how I have envisioned this may not be perfectly accurate, but directionally I’m confident and I want to hear from others with their ideas.
The 3 Most Common SEO Problems on Listings Sites
Posted by Dom-Woodman
Listings sites have a very specific set of search problems that you don’t run into everywhere else. By day I’m one of Distilled’s analysts, but by night I run a job listings site, teflSearch. So, for my first Moz Blog post I thought I’d cover the three search problems with listings sites that I spent far too long agonising about.
Quick clarification time: What is a listings site (i.e. will this post be useful for you)?
The classic listings site is Craigslist, but plenty of other sites act like listing sites:
- Job sites like Monster
- E-commerce sites like Amazon
- Matching sites like Spareroom
1. Generating quality landing pages
The landing pages on listings sites are incredibly important. These pages are usually the primary drivers of converting traffic, and they’re usually generated automatically (or are occasionally custom category pages).
For example, if I search “Jobs in Manchester”, you can see nearly every result is an automatically generated landing page or category page.
There are three common ways to generate these pages (occasionally a combination of more than one is used):
- Faceted pages: These are generated by facets—groups of preset filters that let you filter the current search results. They usually sit on the left-hand side of the page.
- Category pages: These pages are listings which have already had a filter applied and can’t be changed. They’re usually custom pages.
- Free-text search pages: These pages are generated by a free-text search box.
Those definitions are still a bit general; let’s clear them up with some examples:
Amazon uses a combination of categories and facets. If you click on browse by department you can see all the category pages. Then on each category page you can see a faceted search. Amazon is so large that it needs both.
Indeed generates its landing pages through free text search. For example, if we search for “IT jobs in manchester” it will generate: IT jobs in manchester.
teflSearch generates landing pages using just facets. The jobs in China landing page is simply a facet of the main search page.
Each method has its own search problems when used for generating landing pages, so let’s tackle them one by one.
Aside
Facets and free text search will typically generate pages with parameters e.g. a search for “dogs” would produce:
But to make the URL user friendly, sites will often alter the URLs to display them as folders.
These are still just ordinary free text searches and facets; only the URLs have been made user friendly. (They’re a lot easier to work with in robots.txt too!)
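To make the two URL styles concrete, here is a small Python sketch that turns a parameter-style search URL into a folder-style one. The domain, the `/search` path and the `q` parameter are invented for illustration and aren’t taken from any of the sites mentioned here.

```python
from urllib.parse import urlsplit, parse_qsl, quote


def to_folder_url(url):
    """Rewrite e.g. /search?q=dogs as /search/dogs/ (a hypothetical scheme)."""
    parts = urlsplit(url)
    # Keep parameters in the order they appear; each value becomes a folder.
    values = [quote(value.lower(), safe="") for _, value in parse_qsl(parts.query)]
    if not values:
        return url  # nothing to rewrite
    path = parts.path.rstrip("/") + "/" + "/".join(values) + "/"
    return f"{parts.scheme}://{parts.netloc}{path}"


print(to_folder_url("https://www.example.com/search?q=dogs"))
# https://www.example.com/search/dogs/
```

Both URLs would serve the same search results; only the shape of the address changes.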
Free search (& category) problems
If you’ve decided the base of your search will be a free text search, then you’ll have two major goals:
- Goal 1: Helping search engines find your landing pages
- Goal 2: Giving them link equity.
Solution
Search engines won’t use search boxes and so the solution to both problems is to provide links to the valuable landing pages so search engines can find them.
There are plenty of ways to do this, but two of the most common are:
- Category links alongside a search: Photobucket uses a free text search to generate pages, but if we look at an example search for photos of dogs, we can see the categories which define the landing pages along the right-hand side. (This is also an example of URL friendly searches!)
- Putting the main landing pages in a top-level menu: Indeed also uses free text to generate landing pages, and they have a browse jobs section which contains the URL structure to allow search engines to find all the valuable landing pages.
Breadcrumbs are also often used in addition to the two methods above, and in both examples you’ll find breadcrumbs that reinforce that hierarchy.
Category (& facet) problems
Categories, because they tend to be custom pages, don’t actually have many search disadvantages. Instead it’s the other attributes that make them more or less desirable. You can create them for the purposes you want and so you typically won’t have too many problems.
However, if you also use a faceted search in each category (like Amazon) to generate additional landing pages, then you’ll run into all the problems described in the next section.
At first facets seem great: an easy way to generate multiple strong, relevant landing pages without doing much at all. The problems appear because people don’t put limits on facets.
Let’s take the job page on teflSearch. We can see it has 18 facets, each with many options. Some of these options will generate useful landing pages:
The China facet in countries will generate “Jobs in China”, and that’s a useful landing page.
On the other hand, the “Conditional Bonus” facet will generate “Jobs with a conditional bonus,” and that’s not so great.
We can also see that the options within a single facet aren’t always useful. As of writing, I have a single job available in Serbia. That’s not a useful search result, and the poor user engagement combined with the tiny amount of content will be a strong signal to Google that it’s thin content. Depending on the scale of your site it’s very easy to generate a mass of poor-quality landing pages.
Facets generate other problems too. The primary one is that they can create a huge amount of duplicate content and pages for search engines to get lost in. This is caused by two things: the first is the sheer number of possibilities they generate, and the second is that selecting facets in different orders creates identical pages with different URLs.
We end up with four goals for our facet-generated landing pages:
- Goal 1: Make sure our searchable landing pages are actually worth landing on, and that we’re not handing a mass of low-value pages to the search engines.
- Goal 2: Make sure we don’t generate multiple copies of our automatically generated landing pages.
- Goal 3: Make sure search engines don’t get caught in the metaphorical plastic six-pack rings of our facets.
- Goal 4: Make sure our landing pages have strong internal linking.
The first goal needs to be set internally; you’re always going to be the best judge of the number of results that need to be present on a page in order for it to be useful to a user. I’d argue you can rarely ever go below three, but it depends both on your business and on how much content fluctuates on your site, as the useful landing pages might also change over time.
We can solve the next three problems as a group. There are several possible solutions depending on what skills and resources you have access to; here are two of them:
Category/facet solution 1: Blocking the majority of facets and providing external links
- Easiest method
- Good if your valuable category pages rarely change and you don’t have too many of them.
- Can be problematic if your valuable facet pages change a lot
Nofollow all your facet links, and noindex and block category pages which aren’t valuable or are deeper than x facet/folder levels into your search using robots.txt.
You set x by looking at where your useful facet pages exist that have search volume. So, for example, if you have three facets for televisions (manufacturer, size, and resolution) and even combinations of all three have multiple results and search volume, then you could set x to three and index everything up to three levels.
On the other hand, if people are searching for three levels (e.g. “Samsung 42″ Full HD TV”) but you only have one or two results for three-level facets, then you’d be better off indexing two levels and letting the product pages themselves pick up long-tail traffic for the third level.
(If you have valuable facet pages that exist deeper than one facet or folder into your search, then this creates some duplicate content problems, dealt with in the aside “Indexing more than one level of facets” below.)
The immediate problem with this set-up, however, is that in one stroke we’ve removed most of the internal links to our category pages, and by no-following all the facet links, search engines won’t be able to find our valuable category pages.
In order to re-create the linking, you can add a top-level drop-down menu to your site containing the most valuable category pages, add category links elsewhere on the page, or create a separate part of the site with links to the valuable category pages.
You can see the top-level drop-down menu on teflSearch (it’s the search jobs menu); the other two examples are demonstrated by Photobucket and Indeed respectively in the previous section.
The big advantage of this method is how quick it is to implement: it doesn’t require any fiddly internal logic, and adding an extra menu option is usually minimal effort.
Category/facet solution 2: Creating internal logic to work with the facets
- Requires new internal logic
- Works for large numbers of category pages with value that can change rapidly
There are four parts to the second solution:
- Select valuable facet categories and allow those links to be followed. No-follow the rest.
- No-index all pages that return a number of items below the threshold for a useful landing page
- No-follow all facets on pages with a search depth greater than x.
- Block all facet pages deeper than x levels in robots.txt
As with the last solution, x is set by looking at where your useful facet pages exist that have search volume (full explanation in the first solution), and if you’re indexing more than one level you’ll need to check out the aside below to see how to deal with the duplicate content it generates.
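The four parts above could be sketched in Python roughly like this. It is only a minimal outline under assumed settings: the threshold, the depth x, the facet names and the `FacetPage` structure are all hypothetical, and your own platform will have its own way of representing a search page.

```python
from dataclasses import dataclass

# Assumed settings; tune these for your own site.
MIN_RESULTS = 3                        # threshold for a useful landing page
MAX_DEPTH = 2                          # "x": deepest facet level worth indexing
VALUABLE_FACETS = {"country", "city"}  # hypothetical facet names


@dataclass
class FacetPage:
    facets: tuple      # facets applied so far, e.g. ("country",)
    result_count: int  # number of listings the page returns


def robots_meta(page):
    """Robots meta content for a facet page (part 2: no-index thin pages)."""
    index = "index" if page.result_count >= MIN_RESULTS else "noindex"
    return f"{index}, follow"


def link_rel(page, target_facet):
    """rel value for a facet link shown on `page` (parts 1 and 3)."""
    too_deep = len(page.facets) > MAX_DEPTH
    if too_deep or target_facet not in VALUABLE_FACETS:
        return "nofollow"
    return ""


def blocked_in_robots_txt(page):
    """True if the page sits deeper than x and should be disallowed (part 4)."""
    return len(page.facets) > MAX_DEPTH
```

With these settings, a “Jobs in China” page with 40 results would be indexable with followable country and city links, while a single-result Serbia page would come back as `noindex, follow`.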
Aside: Indexing more than one level of facets
If you want more than one level of facets to be indexable, then this will create certain problems.
Suppose you have a facet for size:
- Televisions: Size: 46″, 44″, 42″
And want to add a brand facet:
- Televisions: Brand: Samsung, Panasonic, Sony
This will create duplicate content because the search engines will be able to follow your facets in both orders, generating:
- Television – 46″ – Samsung
- Television – Samsung – 46″
You’ll have to either rel canonical your duplicate pages with another rule or set up your facets so they create a single unique URL.
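One way to get a single unique URL is to always write the facets in a fixed order, regardless of the order in which the user clicked them. Here is a minimal Python sketch of that idea; the facet names, the URL layout and the example.com domain are assumptions for illustration only.

```python
# Assumed fixed facet order; any order works as long as it never changes.
FACET_ORDER = ["brand", "size"]


def canonical_path(category, selected):
    """Build one URL path for a set of facets, whatever order they were clicked in.

    `selected` maps facet name -> chosen value, e.g. {"size": "46", "brand": "samsung"}.
    """
    segments = [selected[name] for name in FACET_ORDER if name in selected]
    return "/" + "/".join([category] + segments) + "/"


def canonical_link_tag(base_url, category, selected):
    """Return the rel=canonical tag to place in the <head> of every variant."""
    return f'<link rel="canonical" href="{base_url}{canonical_path(category, selected)}">'


# Both click orders resolve to the same path.
print(canonical_path("televisions", {"size": "46", "brand": "samsung"}))
print(canonical_path("televisions", {"brand": "samsung", "size": "46"}))
# /televisions/samsung/46/ (both times)
```

If you can’t change how the URLs are generated, the same function can feed the rel canonical tag on each duplicate instead.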
You also need to be aware that each followable facet you add will multiply with every other followable facet, and it’s very easy to generate a mass of pages for search engines to get stuck in. Depending on your setup you might need to block more paths in robots.txt or set up more logic to prevent them being followed.
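To get a feel for how quickly this multiplies, here is a quick back-of-the-envelope calculation in Python. It only counts pages where every facet has been applied (partial selections add more on top), and the resolution facet with four options is a made-up addition to the television example.

```python
from math import factorial, prod


def facet_page_counts(options_per_facet):
    """Count pages where one option from every followable facet is applied.

    Returns (unique_pages, urls_if_click_order_changes_the_url).
    """
    unique_pages = prod(options_per_facet)
    # Each unique page can be reached by applying the facets in any order.
    urls = unique_pages * factorial(len(options_per_facet))
    return unique_pages, urls


# Size (3 options) x brand (3 options), as in the television example.
print(facet_page_counts([3, 3]))     # (9, 18)
# Add a hypothetical resolution facet with 4 options.
print(facet_page_counts([3, 3, 4]))  # (36, 216)
```

Going from two facets to three takes the number of crawlable URLs from 18 to 216, which is why the order problem and the depth limit matter so much.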
Letting search engines index more than one level of facets adds a lot of possible problems; make sure you’re keeping track of them.
2. User-generated content cannibalization
This is a common problem for listings sites (assuming they allow user-generated content). If you’re reading this as an e-commerce site that only lists its own products, you can skip this one.
As we covered in the first area, category pages on listings sites are usually the landing pages aiming for the valuable search terms, but as your users start generating pages they can often create titles and content that cannibalise your landing pages.
Suppose you’re a job site with a category page for PHP Jobs in Greater Manchester. If a recruiter then creates a job advert for PHP Jobs in Greater Manchester for the 4 positions they currently have, you’ve got a duplicate content problem.
This is less of a problem when your site is large and your categories are mature, as it will be obvious to any search engine which are your high-value category pages. At the start, though, when you’re lacking authority and individual listings might contain more relevant content than your own search pages, this can be a problem.
Solution 1: Create structured titles
Set the <title> differently than the on-page title. Depending on the variables you have available, you can set the title tag programmatically, without changing the page title, using other information given by the user.
For example, on our imaginary job site, suppose the recruiter also provided the following information in other fields:
- The no. of positions: 4
- The primary area: PHP Developer
- The name of the recruiting company: ABC Recruitment
- Location: Manchester
We could set the <title> pattern to be: *No of positions* *The primary area* with *recruiter name* in *Location* which would give us:
4 PHP Developers with ABC Recruitment in Manchester
Setting a <title> tag allows you to target long-tail traffic by constructing detailed descriptive titles. In our above example, imagine the recruiter had specified “Castlefield, Manchester” as the location.
All of a sudden, you’ve got a perfect opportunity to pick up long-tail traffic for people searching in Castlefield in Manchester.
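The title pattern above is simple enough to sketch in a few lines of Python. The function and argument names are my own, and the naive “add an s” plural is a simplification you’d want to improve on a real site.

```python
def build_title(positions, primary_area, recruiter, location):
    """Build a <title> string from structured listing fields.

    Follows the pattern: no. of positions, primary area, recruiter, location.
    """
    # Naive plural: only appends "s" when there is more than one position.
    role = primary_area if positions == 1 else primary_area + "s"
    return f"{positions} {role} with {recruiter} in {location}"


print(build_title(4, "PHP Developer", "ABC Recruitment", "Manchester"))
# 4 PHP Developers with ABC Recruitment in Manchester
print(build_title(4, "PHP Developer", "ABC Recruitment", "Castlefield, Manchester"))
# 4 PHP Developers with ABC Recruitment in Castlefield, Manchester
```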
On the downside, you lose the ability to pick up long-tail traffic where your users have chosen keywords you wouldn’t have used.
For example, suppose Manchester has a jobs program called “Green Highway.” A job advert title containing “Green Highway” might pick up valuable long-tail traffic. Being able to discover this, however, and find a way to fit it into a dynamic title is very hard.
Solution 2: Use regex to noindex the offending pages
Perform a regex (or string contains) search on your listing titles and no-index the ones which cannibalise your main category pages.
If it’s not possible to construct titles with variables, or your users provide a lot of additional long-tail traffic with their own titles, then this is a great option. On the downside, you miss out on possible structured long-tail traffic that you might’ve been able to aim for.
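Here is a minimal Python sketch of what that check might look like for the imaginary job site. The skill and location lists are made up, and the pattern assumes category pages are titled “<skill> jobs in <location>”; you would build yours from your own category data.

```python
import re

# Hypothetical category data: these lists are made up for the example.
SKILLS = ["php", "python", "it"]
LOCATIONS = ["manchester", "greater manchester", "london"]

# Matches titles that are nothing more than "<skill> jobs in <location>".
CATEGORY_TITLE = re.compile(
    r"^\s*(?:%s)\s+jobs\s+in\s+(?:%s)\s*$"
    % ("|".join(map(re.escape, SKILLS)), "|".join(map(re.escape, LOCATIONS))),
    re.IGNORECASE,
)


def should_noindex(listing_title):
    """True if a user-written title duplicates a category page title."""
    return CATEGORY_TITLE.match(listing_title) is not None


print(should_noindex("PHP Jobs in Greater Manchester"))                # True
print(should_noindex("Senior PHP role on the Green Highway program"))  # False
```

Because the pattern is anchored at both ends, listings with extra long-tail keywords in the title stay indexable; only the ones that exactly mirror a category page get no-indexed.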
Solution 3: De-index all your listings
It may seem rash, but if you’re a large site with a huge number of very similar or low-content listings, you might want to consider this. There is no common standard: some sites like Indeed choose to no-index all their job adverts, whereas other sites like Craigslist index all their individual listings because they’ll drive long-tail traffic.
Don’t de-index them all lightly!
3. Constantly expiring content
Our third and final problem is that user-generated content doesn’t last forever. Particularly on listings sites, it’s constantly expiring and changing.
For most use cases I’d recommend 301’ing expired content to a relevant category page, with a message triggered by the redirect notifying the user of why they’ve been redirected. It typically comes out as the best combination of search and UX.
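As a rough sketch of that recommendation, the Python function below decides how to respond to a request for a listing. It isn’t tied to any web framework; the `expires` and `category_path` keys and the `expired=1` flag are names I’ve made up for the example.

```python
from datetime import date


def respond_to_listing(listing, today):
    """Return (status, headers) for a request to a listing page.

    `listing` is a dict with hypothetical keys: "expires" (a date) and
    "category_path" (the most relevant category page for the listing).
    """
    if listing["expires"] < today:
        # Permanent redirect; the query flag lets the category page
        # show a "this advert has expired" message to the user.
        location = listing["category_path"] + "?expired=1"
        return 301, {"Location": location}
    return 200, {}


advert = {"expires": date(2015, 1, 31), "category_path": "/jobs/php/manchester/"}
print(respond_to_listing(advert, date(2015, 3, 1)))
# (301, {'Location': '/jobs/php/manchester/?expired=1'})
```

If you use a flag like this, you would probably also want the flagged URL to canonicalise to the plain category page so the flag itself doesn’t create a duplicate.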
For more information or advice on how to deal with the edge cases, there’s a previous Moz blog post on how to deal with expired content which I think does an excellent job of covering this area.
Summary
In summary, if you’re working with listings sites, all three of the following need to be kept in mind:
- How are the landing pages generated? If they’re generated using free text or facets, have the potential problems been solved?
- Is user generated content cannibalising the main landing pages?
- How has constantly expiring content been dealt with?
Good luck listing, and if you’ve come across any other tricky problems or solutions working on listings sites, let’s chat about them in the comments below!