4 Lessons From a Year of MozCast Data
Posted by BenMorel86
This post was originally in YouMoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of Moz, Inc.
We all know that over the past year, there have been some big updates to Google’s algorithms, and we have felt what it has been like to be in the middle of those updates. I wanted to take a big step back and analyse the cumulative effects of Google’s updates. To do that, I asked four questions and analysed a year of MozCast data to find the answers.
Looking back over the last year – or more precisely the last 15 months through 1st September 2013 – I aimed to answer four questions I felt are really important to SEOs and inbound marketers. These questions were:
- Are there really more turbulent days in the SERPs than we should expect, or are all SEOs British at heart and enjoy complaining about the weather?
- If it’s warmer today than yesterday, will it cool down tomorrow or get even warmer?
- It sometimes feels like big domains are taking over the SERPs; is this true, or just me being paranoid?
- What effects have Google’s spam-fighting had on exact and partial domain matches in SERPs?
Before We Start
First, thanks to Dr. Pete for sending me the dataset, and for checking this post over before submission to make sure all the maths made sense.
Second, as has been discussed many times before on Moz, there is a big caveat whenever we talk about statistics: correlation does not imply causation. It is important not to reverse engineer a cause from an effect and get things muddled up. In addition, Dr. Pete had a big caveat about this particular dataset:
“One major warning – I don’t always correct metrics data past 90 days, so sometimes there are issues with that data on the past. Notably, there was a problem with how we counted YouTube results in November/December, so some metrics like “Big 10” and diversity were out of whack during those months. In the case of temperatures, we actively correct bad data, but we didn’t catch this problem early enough…
All that’s to say that I can’t actually verify that any given piece of past data is completely accurate, outside of the temperatures (and a couple of those days have been adjusted). So, proceed with caution.”
So, with that warning, let’s have a look at the data and see if we can start to answer those questions.
Analysis: MozCast gives us a metric for turbulence straight away: temperature. That makes this one of the easier questions to answer. All we need to do is to take the temperature’s mean, standard deviation, skew (to see whether the graph is symmetric or not), and kurtosis (to see how “fat” the tails of the curve are). Do that, and we get the following:
Mean | 68.10°F |
Standard Deviation | 10.68°F |
Skew | 1.31 |
Kurtosis | 2.60 |
What does all this mean? Well:
- A normal day should feel pretty mild (to the Brits out there, 68°F is 20°C). If temperatures were normally distributed, the standard deviation tells us that about 95% of all days (those within two standard deviations of the mean) should be between 46°F and 90°F (8°C and 32°C), which is a nicely temperate range.
- However, the positive skew means that the distribution is lopsided: the tail stretches much further on the warm side of 68°F than on the cool side.
- On top of this, the positive kurtosis means we actually experience more days above 90°F than we would expect.
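For anyone who wants to reproduce this kind of summary, here is a minimal sketch of the four statistics, assuming the daily temperatures are available as a plain list of numbers. The function name and the sample values are made up for illustration, and the population (divide-by-n) formulas used here may differ slightly from whatever tool produced the table above. Kurtosis is reported as excess kurtosis, so a normal distribution scores 0.

```python
import math


def summary_stats(values):
    """Return (mean, standard deviation, skew, excess kurtosis).

    Uses population moments (divide by n), which is an assumption;
    spreadsheet functions often apply small-sample corrections instead.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    skew = m3 / m2 ** 1.5            # > 0 means a longer right-hand tail
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # > 0 means fatter tails than normal
    return mean, std, skew, excess_kurtosis


# Hypothetical temperatures in °F, not real MozCast data.
sample_temps = [62.0, 65.5, 70.1, 68.3, 101.2, 66.0, 71.4, 64.8]
mean, std, skew, kurt = summary_stats(sample_temps)
print(f"mean={mean:.2f} std={std:.2f} skew={skew:.2f} kurtosis={kurt:.2f}")
```

A single very hot day in an otherwise mild sample is enough to push both the skew and the excess kurtosis above zero, which is the pattern the bullets above describe.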
You can see all of this in the graph below, with its big, fat tail to the right of the mean.
Graph showing the frequency of recorded temperatures (columns) and how a normal distribution of temperatures would look (line).
As you can see from the graph, there have definitely been more warm days than we would expect, and more days of extreme heat. In fact, while the normal distribution tells us we should see temperatures over 100°F (38°C) about once a year, we have actually seen 14 of them. That’s two full weeks of the year! Most of those were in June of this year (the 10th, 14th, 18th, 19th, 26th, 28th, and 29th to be precise, coinciding with the multi-week update that Dr. Pete wrote about).
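The comparison against the normal curve can be checked with a few lines of arithmetic. This is only a sketch: it plugs the mean and standard deviation from the table into the normal tail formula, and the function name is hypothetical. The result comes out at less than one day a year, so 14 such days is far more than a bell curve would predict.

```python
import math

MEAN_F = 68.10   # mean temperature from the table above
STD_F = 10.68    # standard deviation from the table above


def expected_days_above(threshold, days=365):
    """Expected number of days above `threshold` if temperatures were normal."""
    z = (threshold - MEAN_F) / STD_F
    # Upper-tail probability of a standard normal variable.
    tail_probability = 0.5 * math.erfc(z / math.sqrt(2.0))
    return tail_probability * days


print(f"Expected days over 100°F per year: {expected_days_above(100.0):.2f}")
```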
And it looks like we’ve had it especially bad over the last few months. If we take data up to the end of May the average is only 66°F (19°C), so the average temperature over the last three months has actually been a toasty 73°F (23°C).
Answer: The short answer to the question is “pretty turbulent, especially recently”. The high temperatures this summer indicate a lot of turbulence, while the big fat tail on the temperature graph tells us that it has regularly been warmer than we might expect throughout the last 15 months. We have had a number of days of unusually high turbulence, and there are no truly calm days. So, it looks like SEOs haven’t just been griping about the unpredictable SERPs they’ve had to deal with, they’ve been right.
Analysis: The real value of knowing about the weather is in being able to make predictions with that knowledge. So, if today’s MozCast is warmer than yesterday’s, it would be useful to know whether it will be warmer again tomorrow or colder.
To find out, I turned to something called the Hurst exponent, H. If you want the full explanation, which involves autocorrelations, rescaled ranges, and partial time series, then head over to Wikipedia. If not, all you need to know is that:
- If H<0.5 then the data is anti-persistent (an up-swing today means that there is likely to be a down-swing tomorrow)
- If H>0.5 the data is persistent (an increase is likely to be followed by another increase)
- If H=0.5 then today’s data has no effect on tomorrow’s
The closer H is to 0 or 1, the longer the influence of a single day persists through the data.
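To make the idea concrete, here is a rough sketch of one common way to estimate H, the rescaled-range (R/S) method. It is not necessarily the calculation used for this post: the window sizes, function names, and synthetic test series are all assumptions, and R/S estimates are known to be noisy and biased slightly upwards on short series.

```python
import math
import random


def rescaled_range(chunk):
    """Return R/S for one window, or None if the window has no variation."""
    n = len(chunk)
    mean = sum(chunk) / n
    running = 0.0
    lowest = highest = 0.0
    squares = 0.0
    for value in chunk:
        deviation = value - mean
        running += deviation          # cumulative deviation from the mean
        lowest = min(lowest, running)
        highest = max(highest, running)
        squares += deviation * deviation
    std = math.sqrt(squares / n)
    if std == 0.0:
        return None
    return (highest - lowest) / std


def hurst_exponent(series, min_window=8):
    """Estimate H as the slope of log(R/S) against log(window size)."""
    n = len(series)
    points = []
    window = min_window
    while window <= n // 2:
        values = []
        for start in range(0, n - window + 1, window):
            rs = rescaled_range(series[start:start + window])
            if rs is not None:
                values.append(rs)
        if values:
            points.append((math.log(window), math.log(sum(values) / len(values))))
        window *= 2
    if len(points) < 2:
        raise ValueError("series too short for a rescaled-range estimate")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    # Ordinary least-squares slope through the (log size, log R/S) points.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator


# Synthetic data only: independent noise versus a strongly persistent series.
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
walk = []
total = 0.0
for step in noise:
    total += step
    walk.append(total)

print(f"H for independent noise: {hurst_exponent(noise):.2f}")
print(f"H for a random walk:     {hurst_exponent(walk):.2f}")
```

The independent noise should land somewhere near 0.5, while the random walk, where each value builds on the last, should score much closer to 1.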
A series of independent values drawn from a normal distribution – like the red bell curve in the graph above – has a Hurst exponent of H=0.5. Since we know the distribution of temperatures, with its definite lean and fat tails, is not normal, we can guess that its Hurst exponent probably won’t be 0.5. So, is the data persistent or anti-persistent?
Well, as of 4th September that answer is persistent: H=0.68. But if you’d asked on 16th July – just after Google’s Multi-week Update but before The Day The Knowledge Graph Exploded – the answer would have been “H=0.48, so neither”: it seems that one effect of that multi-week update was to reduce the long-term predictability of search result changes. But back in May, before that update, the answer would again have been “H=0.65, so the data is persistent”.
Answer: With the current data, I am pretty confident in saying that if the last few days have got steadily warmer, it’s likely to get warmer again tomorrow. If Google launches another major algorithm change, we might have to revisit that conclusion. The good news is that the apparent persistence of temperature changes should give us a few days’ warning of that algo change.
Analysis: We’ve all felt at some point like Wikipedia and About.com have taken over the SERPs. That we’re never going to beat Target or Tesco despite the fact that they never seem to produce any interesting content. Again, MozCast supplies us with a couple of ready-made metrics to analyse whether or not this is true: Big 10 and Domain Diversity.
First, domain diversity. Plotting each day’s domain diversity for the last 15 months gives you the graph below (I’ve taken a five-day moving average to reduce noise and make trends clearer).
Trends in domain diversity, showing a clear drop in the number of domains in the SERPs used for the MozCast.
As you can see, domain diversity has dropped quite a lot. It dropped 16% from 57% in June 2012 to 48% in August 2013. There were a couple of big dips in domain diversity – 6th May 2012, 29th September 2012, and 31st January 2013 – but really this seems like a definite trend, not the result of a few jumps.
Meanwhile, if we plot the proportion of the SERPs being taken over by the Big 10 we see a big increase over the same period, from 14.3% to 15.4%. That’s an increase of 8%.
Trends in the five-day moving average of the proportion of SERPs used in the MozCast dataset taken up by the daily Big 10 domains.
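The smoothing and the percentage figures in this section are simple to reproduce. The sketch below assumes a trailing five-day window (the post does not say whether the average is trailing or centred) and uses hypothetical function names; the start and end values are the ones quoted above.

```python
def moving_average(values, window=5):
    """Trailing moving average; the first result covers values[0:window]."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window:i]) / window
        for i in range(window, len(values) + 1)
    ]


def relative_change(start, end):
    """Change from start to end as a fraction of the starting value."""
    return (end - start) / start


# Domain diversity fell from 57% to 48%: a relative drop of roughly 16%.
print(f"Domain diversity: {relative_change(57.0, 48.0):+.1%}")
# Big 10 share rose from 14.3% to 15.4%: a relative rise of roughly 8%.
print(f"Big 10 share:     {relative_change(14.3, 15.4):+.1%}")
```

Note that both figures are relative changes: diversity fell by nine percentage points, which is about 16% of its starting value.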
Answer: The diversity of domains is almost certainly going down, and big domains are taking over at least a portion of the space those smaller domains leave behind. Whether this is a good or bad thing almost certainly depends on personal opinion: somebody who owns one of the domains that have disappeared from the listings would probably say it’s a bad thing, while Mr. Cutts would probably say that a lot of the domains that have gone were spammy or full of thin content, so it’s a good thing. Either way, it highlights the importance of building a brand.
Analysis: Keyword-matched domains are a rather interesting subject. Looking purely at the trends, the proportion of listings with exact (EMD) and partial (PMD) matched domains is definitely going down. A few updates in particular have had an effect: One huge jolt in December 2012 had a particular and long-lasting effect, knocking 10% of EMDs and 10% of PMDs out of the listings; Matt Cutts himself announced the bump in September 2012; and that multi-week update that caused the temperature highs in June also bumped down the influence of PMDs.
Trends in the five day moving averages of Exact and Partial Matched Domain (EMD and PMD) influence in the SERPs used in the MozCast dataset.
Not surprisingly, there is a strong correlation (0.86) between changes in the proportion of EMDs and PMDs in the SERPs. What is more interesting is that there is also a correlation (0.63) between their 10-day volatilities, the standard deviation of all their values over the last 10 days. This implies that when one metric sees a big swing it is likely that the other will see a big swing in the same direction – mostly down, according to the graph. This supports the statements Google have made about various updates tackling low-quality keyword-matched domains.
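For readers who want to run the same kind of comparison on their own data, here is a minimal sketch of the two calculations described above: the correlation between day-to-day changes, and the correlation between 10-day volatilities. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the post does not specify which form was used.

```python
import math


def daily_changes(values):
    """Day-over-day differences: values[i + 1] - values[i]."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def rolling_std(values, window=10):
    """Population standard deviation of each trailing `window`-day span."""
    result = []
    for end in range(window, len(values) + 1):
        span = values[end - window:end]
        mean = sum(span) / window
        result.append(math.sqrt(sum((v - mean) ** 2 for v in span) / window))
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def emd_pmd_correlations(emd, pmd, window=10):
    """Return (correlation of changes, correlation of rolling volatilities)."""
    change_corr = pearson(daily_changes(emd), daily_changes(pmd))
    volatility_corr = pearson(rolling_std(emd, window), rolling_std(pmd, window))
    return change_corr, volatility_corr
```

Calling `emd_pmd_correlations(emd_series, pmd_series)` with two equal-length lists of daily percentages would return the two coefficients discussed in the paragraph above.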
Something else rather interesting that is linked to our previous question is the very strong correlation between the proportion of PMDs in the SERPs and domain diversity. This is a whopping 0.94, meaning that a move up or down in domain diversity is almost always accompanied by a swing the same way for the proportion of SERP space occupied by PMDs, and vice versa.
All of this would seem to indicate that keyword matching domains is becoming less important in the search engines’ eyes. But hold your conclusions-drawing horses: this year’s Moz ranking factors study tells us that “In our data collected in early June (before the June 25 update), we found EMD correlations to be relatively high at 0.17… just about on par with the value from our 2011 study”. So, how can the correlation stay the same but the number of results go down? Well, I would tend to agree with Matt Peters’ hypothesis in that post that it could be due to “Google removing lower quality EMDs”. There is also the fact that keyword matches do tend to have some relevance to searches: if I’m looking for pizzas and I see benspizzzas.com in the listings I’m quite likely to think “they sound like they do pizzas – I’ll take a look at them”. So domain matches are still relevant to search queries, as long as they are supported by relevant content.
To see how the correlation can hold steady while the number of results drops, remember that the ranking factors report looks at how well sites rank once they already rank. If only a few websites with EMDs rank but they rank very highly, the correlation between rankings and domain matching might be the same as if a larger number of websites rank way down the list. So if lower quality EMDs have been removed from the rankings – as Dr. Matt and Dr. Pete speculate – but the ones remaining rank higher than they used to, the correlation coefficient we measure could be the same today as it was in 2011.
Answer: The number of exact and partial matches is definitely going down, but domain matches are still relevant to search queries – as long as they are supported by relevant content. We know about this relevance because brands constantly put their major services into their names: look at SEOmoz (before it changed), or British Gas, or HSBC (the Hongkong and Shanghai Banking Corporation). Brands do this because it means their customers can instantly see what they do – and the same goes for domains.
So, if you plan on creating useful, interesting content for your industry then go ahead and buy a domain with a keyword or two in it. You could even buy the exact match domain, even if that doesn’t match your brand (although this might give people trust issues, which is a whole different story). But if you don’t plan on creating that content, buying a keyword-matched domain looks unlikely to help you, and you could even be in for a rockier ride in the future than if you stick to your branded domain.
Whew, that was a long post. So what conclusions can we draw from all of this?
Well, in short:
- Although the “average” day is relatively uneventful, there are more hot, stormy days than we would hope for
- Keyword-matched domains, whether exact or partial, have seen a huge decline in influence over the last 15 months – and if you own one, you’ve probably seen some big drops in a short space of time
- The SERPs are less diverse than they were a year ago, and the big brands have extended their influence
- When EMD/PMD influence drops, SERP diversity also drops. Could the two be connected?
- If today is warmer than yesterday, it’s likely that tomorrow will be warmer still
What are your thoughts on the past year? Does this analysis answer any questions you had – or make you want to ask more? Let me know in the comments below (if it does make you ask more questions I’ll try to do some more digging and answer them).
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!
New: The MozCast Feature Graph – Tracking Google’s Landscape
Posted by Dr-Pete
Over the last year-and-a-half of tracking Google’s daily “weather”, it’s become painfully clear to me that there’s much more to future-proofing your SEO than just the core algorithm. From Knowledge Graph to In-depth articles, Google is launching new features faster than ever, and pages with nothing but ten blue links will soon be a memory.
So, we started working on a way to track how features change over time, and today I’m happy to announce the launch of the MozCast Feature Graph. It looks a little something like this:
Three tools in one
The Feature Graph is really three tools in one. The top graph shows a 30-day history of four major groups of features: Ads, Local, Knowledge Graph, and Verticals. The legend is color-coded to the bars at the bottom, which show the current density of each feature and the day-over-day change for that feature. So, for example, “Adwords (Top)” in the graph above shows that 77.9% of the queries tracked by MozCast displayed ads at the top the last time we checked them.
The third tool is my favorite, and the one that probably delayed this project the most. I’ve attempted to put some of the power of the raw data into your hands, and we’ve created a mini laboratory to find and preview SERPs.
The SERP mini-lab
Let’s say you’re looking for a SERP that has a Knowledge Graph entry, image results, and shopping results. Just check the boxes next to those three features. As you add each feature, you’ll see the “Matched Queries” box populate with a list of search terms:
Click on any of those queries, and you’ll be taken to the corresponding Google search (parameterized to match the original capture as closely as possible). For example, if I click on “vespa”, I get the following:
You can see the paid product placements and Knowledge Graph on the right, as well as the image results after the third organic listing. Note that these links are to live SERPs on Google.com – in some cases, the page may be slightly different from the one we visited the night before. This is especially true of AdWords placements, which can vary considerably from visit to visit.
When you select a feature or set of features, you don’t just get sample queries – the 30-day graph at the top changes to match your search:
The lines on the graph now show the trends for each of the individual features you’ve selected. You can mouse over any point for the exact percentage on that day.
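The matching logic behind the mini-lab can be pictured with a small sketch. This is not the tool's actual code: the data structure, feature labels, and function names below are all hypothetical, and the sample queries are invented apart from “vespa”, which is the example used above.

```python
def matched_queries(serp_features, selected):
    """Return the queries whose SERP shows every selected feature."""
    wanted = set(selected)
    return sorted(
        query for query, features in serp_features.items()
        if wanted <= set(features)
    )


def feature_density(serp_features, feature):
    """Percentage of tracked queries whose SERP shows the given feature."""
    hits = sum(1 for features in serp_features.values() if feature in features)
    return 100.0 * hits / len(serp_features)


# Hypothetical capture: query -> set of features seen on its SERP.
capture = {
    "vespa": {"knowledge_graph", "images", "shopping"},
    "pizza near me": {"local_pack", "adwords_top"},
    "mona lisa": {"knowledge_graph", "images"},
    "running shoes": {"shopping", "adwords_top", "images"},
}

print(matched_queries(capture, ["knowledge_graph", "images", "shopping"]))
print(f"{feature_density(capture, 'images'):.1f}% of queries show images")
```

Each extra checkbox can only narrow the list of matched queries, because a query has to show every selected feature to stay in it.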
Bonus feature: new ads
There’s one feature that works a bit differently than the rest. We’ve started tracking the prevalence of Google’s new AdWords format, which is in large-scale testing but not fully live yet. The “New Ad Format” feature tracks the percentage of ads using the new format across the queries that displayed ads (not the entire query set). Please note that the new ad format is only rolled out for some users, so the search/preview function won’t work properly (you may see the old ads). I’ve added this feature simply to track the roll-out over time.
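The important detail here is the denominator: the percentage is taken over queries that showed ads, not over the full query set. A minimal sketch of that calculation, assuming each query is recorded with two hypothetical flags (`has_ads` and `new_format`), which is a simplification of however the real tracking works:

```python
def new_ad_format_share(queries):
    """Percentage of ad-showing queries that displayed the new ad format."""
    with_ads = [q for q in queries if q["has_ads"]]
    if not with_ads:
        return 0.0  # no ads seen at all, so nothing to measure
    new_count = sum(1 for q in with_ads if q["new_format"])
    return 100.0 * new_count / len(with_ads)


# Hypothetical sample: two of four queries show ads, one uses the new format.
sample = [
    {"query": "a", "has_ads": True, "new_format": True},
    {"query": "b", "has_ads": True, "new_format": False},
    {"query": "c", "has_ads": False, "new_format": False},
    {"query": "d", "has_ads": False, "new_format": False},
]
print(f"New ad format: {new_ad_format_share(sample):.1f}% of queries with ads")
```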
Some technical notes
The Feature Graph is powered by the MozCast 10K, a set of 10,000 queries across 20 industry categories. Half of the MozCast 10K is delocalized and half is locally targeted (1,000 keywords each to 5 major cities). Local SEO features are measured only from the local data (5,000 total queries). All results are depersonalized.
A few thank-yous
I’d like to thank the inbound engineering team (Casey, Devin, and Shelly) for their help making this a reality, and our design leads, Daan and Derric, for hashing out a few ideas with me. Special thanks to Devin, who had the thankless job of translating my old-school PHP into something Moz-friendly that won’t break 50 times/day.
Have fun with it
The Google SERP Feature Graph is live as of last night. This data has powered quite a few insights and blog posts over the past few months, and I’m excited to release it to the public. My hope is that people will use the tool to surface new SERP combinations and make their own discoveries. Let me know what you find.
Editor note: We had a non-launch-related outage of MozCast around 12:30am PST on 12/10/13, in case you saw errors then. Service was completely restored at 1:20am PST, and the new features are working. Enjoy.