Posted by gfiorelli1
Sometimes I think that we SEOs could be wonderful characters for a Woody Allen movie: We are stressed, nervous, paranoid, and prone to sudden changes of mood…okay, maybe I am exaggerating a little bit, but that’s how we tend to (over)react whenever Google announces something.
Cases like this webmaster, who desperately believes he was penalized by Hummingbird, are not uncommon.
One thing that doesn’t help is the lack of clarity coming from Google, which not only never mentions Hummingbird in any official document (for example, in its 15th anniversary post), but has also shied away from giving details of this epochal update in the “off-the-record” declarations of Amit Singhal. In fact, in some ways those statements contributed to the confusion.
When Google announces an update, especially one like Hummingbird, the best thing to do is to resist trying to understand immediately what it really is based on intuition alone. It is better to wait until the dust settles, recover the original documents, examine those related to them (and any variants), take the time to see the update in action, calmly investigate, and only then try to find the most plausible answers.
This method is not scientific (and therefore its answers can’t be defined as “surely correct”); it is philological, and when it comes to Google and its updates, I consider it a great method to use.
The original documents are the press story about the event at which Google announced Hummingbird, and the FAQ that Danny Sullivan published immediately after the event, which makes direct reference to what Amit Singhal said.
Related documents are the patents that probably underlie Hummingbird, and the observations that experts like Bill Slawski, Ammon Johns, Rand Fishkin, Aaron Bradley and others have derived.
This post is the result of my study of those documents and field observations.
Why did Amit Singhal mix apples with oranges?
When announcing Hummingbird, Amit Singhal said that the Google algorithm had not been updated so deeply since Caffeine in 2010.
The problem is that Caffeine wasn’t an algorithmic change; it was an infrastructural change.
Caffeine’s purpose, in fact, was to optimize the indexation of the billions of Internet documents Google crawls, presenting a richer, bigger, and fresher pool of results to the users.
Instead, Hummingbird’s objective is not a newer optimization of the indexation process, but to better understand the users’ intent when searching, thereby offering the most relevant results to them.
Nevertheless, we can affirm that Hummingbird is also an infrastructural update, as it governs the more than 200 elements that make up Google’s algorithm.
The (maybe unconscious) association Amit Singhal created between Caffeine and Hummingbird should tell us:
- That Hummingbird would not be here if Caffeine had not been deployed in 2010, and hence it should be considered an evolution of Google Search, and not a revolution.
- Moreover, that Hummingbird should be considered Google’s most ambitious attempt to solve all the algorithmic issues that Caffeine caused.
Let me explain this last point.
By doing away with the so-called “Sandbox,” Caffeine caused the SERPs to be flooded with poor-quality results.
Google reacted by creating “patches” like Panda, Penguin, and the exact-match domain (EMD) updates, among others.
But these updates, so effective in what we define as middle- and head-tail queries, were not so effective for a type of query that—mainly because of the fast adoption of mobile search by the users—more and more people have begun to use: conversational long tail queries, or those that Amit Singhal has defined as “verbose queries.”
The evolution of natural language recognition by Google, the improved ability to disambiguate entities and concepts through technology inherited from Metaweb and improved with Knowledge Graph, and the huge improvements made in the SERPs’ personalized customization have given Google the theoretical and practical tools not only for solving the problem of long-tail queries, but also for giving a fresh start to the evolution of Google Search.
That is the backstory that explains what Amit Singhal said about Hummingbird, paraphrased here by Danny Sullivan:
[Hummingbird] Gave us an opportunity […] to take synonyms and knowledge graph and other things Google has been doing to understand meaning to rethink how we can use the power of all these things to combine meaning and predict how to match your query to the document in terms of what the query is really wanting and are the connections available in the documents. and not just random coincidence that could be the case in early search engines.
How does Hummingbird work?
“To take synonyms and knowledge graph and other things…”
Google has been working with synonyms for a long time. If we look at the timeline Google itself shared in its 15th anniversary post, it has used them since 2002, even though we can also tell that disambiguation (meant as orthographic analysis of the queries) has been applied since 2001.
Last year Vanessa Fox wrote “Is Google’s Synonym Matching Increasing?…” on Search Engine Land.
Reading that post and seeing the examples presented, it is clear that synonyms were already used by Google—in connection with the user intent underlying the query—in order to broaden the query and rewrite it to offer the best results to the users.
That same post, though, shows us why only using a thesaurus of synonyms or relying on the knowledge of the highly ranked queries was not enough to assure relevant SERPs (see how Vanessa points out how Google doesn’t consider “dogs” pets in the query “pet adoption,” but does consider “cats”).
Amit Singhal, in this old patent, was also aware that relying only on synonyms was not a perfect solution, because two words may or may not be synonyms depending on the context in which they are used (e.g., “coche” and “automóvil” both mean “car” in Spanish, but “carro” means “car” only in Latin American Spanish; in Spain it means “wagon”).
Therefore, in order to deliver the best results possible using semantic search, what Google needed to understand better, easier, and faster was context. Hummingbird is how Google solved that need.
Synonyms remain essential; Amit Singhal confirmed that in the post-event talk with Danny Sullivan. How they are used now has been described by Bill Slawski in this post, where he dissects the Synonym identification based on co-occurring terms patent.
That patent, then, is also based on the concept of “search entities,” which I described in my last post here on Moz, when talking about personalized search.
Speaking literally, words are not “things” themselves but the verbal representation of things, and search entities are how Google objectifies words into concepts. An object may have a relationship with others that may change depending on the context in which they are used together. In this sense, words are treated like people, cities, books, and all the other named entities usually related to the Knowledge Graph.
The mechanisms Google uses to identify search entities are especially important in disambiguating the different potential meanings of a word, and thereby refining the information retrieval according to a “probability score.”
This technique is not so different from what the Knowledge Graph does when disambiguating, for instance, Saint Peter the Apostle from Saint Peter the Basilica or Saint Peter the city in Minnesota.
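To make the idea of a context-driven “probability score” more tangible, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the candidate entities, the context terms attached to them, and the scoring rule (term overlap with add-one smoothing) are invented toy values, not anything taken from Google’s patents or systems.

```python
# Toy illustration only: the entities, their context terms, and the scoring
# rule below are assumptions, not Google's actual data or method.
ENTITY_CONTEXT = {
    "Saint Peter (apostle)": {"apostle", "jesus", "disciple", "gospel"},
    "St. Peter's Basilica": {"basilica", "vatican", "rome", "dome", "visit"},
    "Saint Peter, Minnesota": {"minnesota", "city", "weather", "college"},
}


def disambiguate(query):
    """Return a probability-like score for each candidate entity."""
    terms = set(query.lower().split())
    # Count the query terms shared with each entity's context terms.
    # Add-one smoothing keeps every candidate above zero.
    raw = {
        entity: len(terms & context) + 1
        for entity, context in ENTITY_CONTEXT.items()
    }
    total = sum(raw.values())
    return {entity: score / total for entity, score in raw.items()}


def best_entity(query):
    """Pick the candidate entity with the highest score."""
    scores = disambiguate(query)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for q in ("saint peter rome vatican", "saint peter minnesota weather"):
        print(q, "->", best_entity(q))
```

The same words (“saint peter”) resolve to different entities depending only on the terms that surround them, which is the point of the paragraph above.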
Finally, there is a third concept playing an explicit role in what could be the “Hummingbird patent:” co-occurrences.
Integrating these three elements, Google now is (in theory) able:
- To better understand the intent of a query;
- To broaden the pool of web documents that may answer that query;
- To simplify how it delivers information, because if query A, query B, and query C substantively mean the same thing, Google doesn’t need to propose three different SERPs, but just one;
- To offer a better search experience, because by expanding the query and better understanding the relationships between search entities (also based on direct/indirect personalization elements), Google can now offer results that have a higher probability of satisfying the needs of the user;
- As a consequence, Google may present better SERPs also in terms of better ads, because in 99% of the cases, verbose queries were not presenting ads in their SERPs before Hummingbird.
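The third point in the list above (several differently worded queries collapsing into one) can be sketched in a few lines of Python. This is only a toy under stated assumptions: the `FILLER` and `SYNONYMS` tables and the function names are hypothetical, and real query rewriting is certainly far more sophisticated than dropping filler words and swapping synonyms.

```python
# Toy illustration only: FILLER, SYNONYMS, and both functions are
# hypothetical and hand-made for this example.
FILLER = {"where", "can", "i", "a", "to", "the", "what", "is"}
SYNONYMS = {"automobile": "car", "auto": "car", "purchase": "buy"}


def canonical_form(query):
    """Reduce a verbose query to an order-independent set of core terms."""
    tokens = [t.strip("?.,!") for t in query.lower().split()]
    # Drop filler words and map each remaining term to its canonical synonym.
    kept = {SYNONYMS.get(t, t) for t in tokens if t and t not in FILLER}
    return tuple(sorted(kept))


def group_queries(queries):
    """Group queries that share the same canonical form."""
    groups = {}
    for q in queries:
        groups.setdefault(canonical_form(q), []).append(q)
    return groups


if __name__ == "__main__":
    sample = [
        "Where can I buy a cheap car?",
        "cheap automobile to purchase",
        "buy a cheap auto",
        "rent a cheap car",
    ]
    for key, members in group_queries(sample).items():
        print(key, "<-", members)
```

In this sketch the first three queries end up in the same group and would need only one set of results, while “rent a cheap car” stays separate because its intent differs.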
Maybe Hummingbird could have solved Fred Astaire and Ginger Rogers’ speaking issues…
90% of the queries affected, seriously?
Many SEOs have questioned the claim that Hummingbird affected 90% of all queries, for the simple reason that they didn’t notice any change in traffic and rankings.
Apart from the fact that the SERPs were in constant turmoil between the end of August and the first half of September, during which time Hummingbird first saw the light (though it could just be a coincidence, quite an opportune one indeed), the typical query that Hummingbird targets is the conversational one (e.g., “What is the best pizzeria to eat at close to Piazza del Popolo and via del Corso?”), a query that usually is not tracked by us SEOs (well, apart from Dr. Pete, maybe).
Moreover, Hummingbird is about queries, not keywords (much less long-tail ones), as was so well explained by Ammon Johns in his post “Hummingbird – The opposite of long-tail search.” For that reason, tracking long-tail rankings as a metric of the impact of Hummingbird is totally wrong.
Finally, Hummingbird has not meant the extinction of all the classic ranking factors, but is instead a new framework set upon them. If a site was both authoritative and relevant for a query, it still will be ranking as well as it was before Hummingbird.
So, which sites got hit? Probably those sites that were relying just on very long tail keyword-optimized pages, but had no or very low authority. Therefore, as Rand said in his latest Whiteboard Friday, now it is far more convenient to create better linkable/shareable content, which also semantically relates to long-tail keywords, than it is to create thousands of long tail-based pages with poor or no quality or utility.
If Hummingbird is a shift to semantic SEO, does that mean that using Schema.org will make my site rank better?
One of the myths that spread very fast when Hummingbird was announced was that it is heavily using structured data as a main factor.
Although it is true that for some months now Google has stressed the importance of structured data (for example, dedicating a section to it in Google Webmaster Tools), considering Schema.org the magic solution is not correct. It is an example of how we SEOs sometimes confuse the means with the end.
What we need to do is offer Google easily understandable context for the topics around which we have created a page, and structured data are helpful in this respect. By themselves, however, they are not enough. As mentioned before, if a page is not considered authoritative (thanks to external links and mentions), it most likely will not have enough strength for ranking well, especially now that long-tail queries are simplified by Hummingbird.
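As a concrete picture of what “offering context” through structured data can look like, here is a small Python sketch that builds a Schema.org `Article` description and serializes it as JSON-LD. JSON-LD is my choice for the example and only one of the syntaxes in which Schema.org vocabulary can be written; the helper name `article_jsonld` and all the values passed to it are hypothetical placeholders.

```python
import json


def article_jsonld(headline, author, date_published, topics):
    """Build a Schema.org Article description as a JSON-LD string.

    The function name and the sample values used below are illustrative
    assumptions; only the Schema.org types and properties are real.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        # "about" states which topics the page covers.
        "about": [{"@type": "Thing", "name": topic} for topic in topics],
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    print(article_jsonld(
        headline="Example headline",
        author="Jane Doe",
        date_published="2013-10-01",
        topics=["Semantic search", "Search entities"],
    ))
```

Markup like this only describes the page. It does nothing for the authority of the page, which is why structured data alone is not enough.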
Is Hummingbird related to the increased presence of the Knowledge Graph and Answer Cards?
Many people came up with the idea that Hummingbird is the translation of the Knowledge Graph to the classic Google Search, and that it has a direct connection with the proliferation of the Answer Cards. This theory led to some very angry posts ranting against the “scraper” nature of Google.
This is most likely due to the fact that Hummingbird was announced alongside new features of Knowledge Graph, but there is no evident relationship between Hummingbird and Knowledge Graph.
What many have thought of as causation (Hummingbird causing more Knowledge Graph and Answer Cards, hence being the same thing) is most probably simple correlation.
Hummingbird substantially simplified verbose queries into less verbose ones, the latter of which are sometimes complemented with the constantly expanding Knowledge Graph. For that reason, we see a greater number of SERPs presenting Knowledge Graph elements and Answer Cards.
That said, the philosophy behind Hummingbird and the Knowledge Graph is the same: moving from strings to things.
Is Hummingbird strongly based on the Knowledge Base?
The Knowledge Base is potent and pervasive in how Google works, but reducing Hummingbird to just the Knowledge Base would be simplistic.
As we saw, Hummingbird relies on several elements, the Knowledge Base being one of them, especially in all queries with personalization (which should be considered a pervasive layer that affects the algorithm).
If Hummingbird relied heavily on the Knowledge Base, without complementing it with other factors, we could fall back into the issues that Amit Singhal was struggling with in the earlier patent about synonyms.
Does Hummingbird mean the end of the link graph?
No. PageRank and link-related elements of the algorithm are still alive and kicking. I would also dare to say that links are even more important now.
In fact, without the authority a good link profile grants to a site, a web page will have even more difficulty ranking now (see what I wrote just above about the fate of low-authority pages).
What is even more important now is the context in which the link is present. We already learned this with Penguin, but Hummingbird reaffirms that inbound links from topically irrelevant contexts are bad links.
That said, Google still has to improve on the link front, as Danny Sullivan said well in this tweet:
Links are the fossil fuel of search relevancy signals. Polluted. Not getting better. And yet, that’s what Google Hummingbird drinks most.
— Danny Sullivan (@dannysullivan) October 18, 2013
At the same time, though (again because of context and entity recognition), brand co-occurrences and co-citations assume an even more important role with Hummingbird.
Is Hummingbird related to 100% (not provided)?
The fact that Hummingbird and 100% (not provided) were rolled out at almost the same time seems to be more than just a coincidence.
If Hummingbird is more about search entities, better information retrieval, and query expansion—an update where keywords by themselves have lost part of the omnipresent value they had—then relying on keyword data alone is not enough anymore.
We should stop focusing only on keyword optimization and start thinking about topical optimization.
This obliges us to think about great content, and not just about “content.” Things like “SEO copywriting” will end up being the same as “amazing copywriting.”
For that, as SEOs, we should start understanding how search entities work, and not simply become human thesauruses of synonyms.
If Hummingbird is a giant step toward Semantic SEO, then as SEOs, our job “is not about optimizing for strings, or for things, but for the connections between things,” as Aaron Bradley brilliantly says in this post and deck for SMX East.
What must we do to be Hummingbird-friendly?
Let me ask you a few questions, and try to answer them sincerely:
- When creating/optimizing a site, are you doing it with a clear audience in mind?
- When performing on-page optimization for your site, are you following at least these SEO best practices?
  - Using a clear and not overly complex information architecture;
  - Avoiding canonicalization issues;
  - Avoiding thin-content issues;
  - Creating a semantic content model;
  - Topically optimizing the content of the site on a page-by-page basis, using natural and semantically rich language and with a landing page-centric strategy in mind;
  - Creating useful content in several formats that you yourself would like to share with your friends and link to;
  - Implementing Schema.org, Open Graph, and semantic markup.
- Are your link-building objectives:
  - Better brand visibility?
  - Gaining referral traffic?
  - Enhancing the sense of thought leadership of your brand?
  - Links from topically related sites and/or topically related sections of a more generalist site (e.g., a news site)?
- As an SEO, is social media offering you these advantages?
  - Wider brand visibility;
  - Social echo;
  - Increased mentions/links in the form of derivatives, co-occurrences, and co-citations on others’ websites;
  - Organic traffic and brand ambassadors’ growth.
If you answered yes to all these questions, you don’t have to do anything but keep up the good work, refine it, and be creative and engaging. You were likely already seeing your site ranking well and gaining traffic thanks to the more holistic vision of SEO you have.
If you answered no to a few of them, then you just have to correct the things you’re doing wrong and follow the so-called SEO best practices (and the 2013 Moz Ranking Factors are a good list of best practices).
If you sincerely answered no to many of them, then you were having problems even before Hummingbird was unleashed, and things won’t get better with it if you don’t radically change your mindset.
Hummingbird is not asking us to rethink SEO or to reinvent the wheel. It is simply asking us to not do crappy SEO… but that is something we should know already, shouldn’t we?