Big, Fast, and Strong: Setting the Standard for Backlink Index Comparisons

It’s all wrong

It always was. Most of us knew it. But with limited resources, we just couldn’t really compare the quality, size, and speed of link indexes very well. Frankly, most backlink index comparisons would barely pass for a high school science fair project, much less a rigorous peer review.

My most earnest attempt at determining the quality of a link index was back in 2015, before I joined Moz as Principal Search Scientist. But I knew at the time that I was missing a huge key to any study of this sort that hopes to call itself scientific, authoritative or, frankly, true: a random, uniform sample of the web.

But let me start with a quick request. Please take the time to read this through. If you can’t today, schedule some time later. Your businesses depend on the data you bring in, and this article will allow you to stop taking data quality on faith alone. If you have questions about some of the technical aspects, I will respond in the comments, or you can reach me on Twitter at @rjonesx. I desperately want our industry to finally get this right and to hold ourselves as data providers to rigorous quality standards.

Getting it right

One of the greatest things Moz offers is a leadership team that has given me the freedom to do what it takes to “get things right.” I first encountered this when Moz agreed to spend an enormous amount of money on clickstream data so we could make our keyword tool search volume better (a huge, multi-year financial risk with the hope of improving literally one metric in our industry). Two years later, Ahrefs and SEMRush now use the same methodology because it’s just the right way to do it.

About 6 months into this multi-year project to replace our link index with the huge Link Explorer, I was tasked with the open-ended question of “how do we know if our link index is good?” I had been thinking about this question ever since that article was published in 2015, and I knew I wasn’t going to go forward with anything other than a system that begins with a truly “random sample of the web.” Once again, Moz asked me to do what it takes to “get this right,” and they let me run with it.

What’s the big deal with random?

It’s really hard to overstate how important a good random sample is. Let me digress for a second. Let’s say you look at a survey that says 90% of Americans believe that the Earth is flat. That would be a terrifying statistic. But later you find out the survey was taken at a Flat-Earther convention and the 10% who disagreed were employees of the convention center. This would make total sense. The problem is that the people surveyed weren’t a random sample of Americans; the sample was biased because it was taken at a Flat-Earther convention.

Now, imagine the same thing for the web. Let’s say an agency wants to run a test to determine which link index is better, so they look at a few hundred sites for comparison. Where did they get the sites? Past clients? Then they are probably biased towards SEO-friendly sites and not reflective of the web as a whole. Clickstream data? Then they would be biased towards popular sites and pages — once again, not reflective of the web as a whole!

Starting with a bad sample guarantees bad results.

It gets even worse, though. Indexes like Moz report our total statistics (number of links or number of domains in our index). However, this can be terribly misleading. Imagine a restaurant which claimed to have the largest wine selection in the world with over 1,000,000 bottles. They could make that claim, but it wouldn’t be useful if they actually had 1,000,000 of the same type, or only Cabernet, or half-bottles. It’s easy to mislead when you just throw out big numbers. Instead, it would be much better to have a random selection of wines from the world and measure if that restaurant has it in stock, and how many. Only then would you have a good measure of their inventory. The same is true for measuring link indexes — this is the theory behind my methodology.

Unfortunately, it turns out getting a random sample of the web is really hard. The first intuition most of us at Moz had was to just take a random sample of the URLs in our own index. Of course we couldn’t — that would bias the sample towards our own index, so we scrapped that idea. The next thought was: “We know all these URLs from the SERPs we collect — perhaps we could use those.” But we knew they’d be biased towards higher-quality pages. Most URLs don’t rank for anything — scratch that idea. It was time to take a deeper look.

I fired up Google Scholar to see if any other organizations had attempted this process and found literally one paper, which Google produced back in June of 2000, called “On Near-Uniform URL Sampling.” I hastily whipped out my credit card to buy the paper after reading just the first sentence of the abstract: “We consider the problem of sampling URLs uniformly at random from the Web.” This was exactly what I needed.

Why not Common Crawl?

Many of the more technical SEOs reading this might ask why we didn’t simply select random URLs from a third-party index of the web like the fantastic Common Crawl data set. There were several reasons why we considered this methodology but chose to pass on it (despite it being far easier to implement).

- We can’t be certain of Common Crawl’s long-term availability. Top million lists (which we used as part of the seeding process) are available from multiple sources, which means if Quantcast goes away we can use other providers.
- We have contributed crawl sets in the past to Common Crawl and want to be certain there is no implicit or explicit bias in favor of Moz’s index, no matter how marginal.
- The Common Crawl data set is quite large and would be harder to work with for many who are attempting to create their own random lists of URLs. We wanted our process to be reproducible.

How to get a random sample of the web

The process of getting to a “random sample of the web” is fairly tedious, but the general gist of it is this. First, we start with a well-understood biased set of URLs. We then attempt to remove or balance this bias out, making the best pseudo-random URL list we can. Finally, we use a random crawl of the web starting with those pseudo-random URLs to produce a final list of URLs that approaches true randomness. Here are the complete details.

1. The starting point: Getting seed URLs

The first big problem with getting a random sample of the web is that there is no true random starting point. Think about it. Unlike a bag of marbles where you could just reach in and blindly grab one at random, if you don’t already know about a URL, you can’t pick it at random. You could try to just brute-force create random URLs by shoving letters and slashes after each other, but we know language doesn’t work that way, so the URLs would be very different from what we tend to find on the web. Unfortunately, everyone is forced to start with some pseudo-random process.

We had to make a choice. It was a tough one. Do we start with a known strong bias that doesn’t favor Moz, or do we start with a known weaker bias that does? We could use a random selection from our own index for the starting point of this process, which would be pseudo-random but could potentially favor Moz, or we could start with a smaller, public index like the Quantcast Top Million which would be strongly biased towards good sites.

We decided to go with the latter as the starting point because Quantcast data is:

- Reproducible: We weren’t going to make “random URL selection” part of the Moz API, so we needed something others in the industry could start with as well. Quantcast Top Million is free to everyone.
- Not biased towards Moz: We would prefer to err on the side of caution, even if it meant more work removing bias.
- Well-known bias: The bias inherent in the Quantcast Top 1,000,000 was easily understood: these are important sites, and we need to remove that bias.
- Quantcast bias is natural: Any link graph itself already shares some of the Quantcast bias (powerful sites are more likely to be well-linked).

With that in mind, we randomly selected 10,000 domains from the Quantcast Top Million and began the process of removing bias.

2. Selecting based on size of domain rather than importance

Since we knew the Quantcast Top Million was ranked by traffic and we wanted to mitigate that bias, we introduced a new bias based on the size of the site. For each of the 10,000 sites, we identified the number of pages on the site according to Google using the “site:” command and also grabbed the top 100 pages from the domain. Now we could balance the “importance bias” against a “size bias,” which is more reflective of the number of URLs on the web. This was the first step in mitigating the known bias of only high-quality sites in the Quantcast Top Million.

3. Selecting pseudo-random starting points on each domain

The next step was randomly selecting domains from that 10,000 with a bias towards larger sites. When the system selects a site, it then randomly selects from the top 100 pages we gathered from that site via Google. This helps mitigate the importance bias a little more. We aren’t always starting with the homepage. While these pages do tend to be important pages on the site, we know they aren’t always the MOST important page, which tends to be the homepage. This was the second step in mitigating the known bias. Lower-quality pages on larger sites were balancing out the bias intrinsic to the Quantcast data.
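Steps 2 and 3 can be sketched roughly as follows. This is only an illustration, not the code Moz ran: the seed data is invented, and weighting each domain in direct proportion to its page count is an assumption (the post only says the selection is biased towards larger sites).

```python
import random

# Hypothetical seed data: domain -> (page count from a "site:" query, top pages).
# All names and numbers below are invented for illustration.
SEEDS = {
    "bigsite.example": (1_000_000, ["https://bigsite.example/",
                                    "https://bigsite.example/docs",
                                    "https://bigsite.example/blog/post-1"]),
    "midsite.example": (10_000, ["https://midsite.example/",
                                 "https://midsite.example/about"]),
    "tinysite.example": (100, ["https://tinysite.example/"]),
}


def pick_start_url(seeds, rng):
    """Pick a domain weighted by page count, then one of its top pages uniformly."""
    domains = list(seeds)
    weights = [seeds[domain][0] for domain in domains]  # assumed: weight = page count
    domain = rng.choices(domains, weights=weights, k=1)[0]
    return rng.choice(seeds[domain][1])


rng = random.Random(42)  # fixed seed so the sketch is repeatable
starts = [pick_start_url(SEEDS, rng) for _ in range(10_000)]
```

With these made-up numbers, almost every starting URL comes from the largest site, and it is often not the homepage, which is the effect the two steps are aiming for.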

4. Crawl, crawl, crawl

And here is where we make our biggest change. We actually crawl the web starting with this set of pseudo-random URLs to produce the actual set of random URLs. The idea here is to take all the randomization we have built into the pseudo-random URL set and let the crawlers randomly click on links to produce the truly random URL set. The crawler will select a random link from our pseudo-random crawlset and then start a process of randomly clicking links, each time with a 10% chance of stopping and a 90% chance of continuing. Wherever the crawler ends, the final URL is dropped into our list of random URLs. It is this final set of URLs that we use to run our metrics. We generate around 140,000 unique URLs through this process monthly to produce our test data set.
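The random walk itself is simple to sketch. The version below runs on a tiny, made-up link graph instead of the live web, and two details are assumptions rather than anything stated above: the stop check happens before each click, and the walk also ends if it reaches a page with no outgoing links.

```python
import random

# Toy link graph standing in for the web: page -> pages it links to (invented).
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],  # a page with no outgoing links
}


def random_walk(start, links, rng, stop_probability=0.10):
    """Follow random links from `start`; return the page where the walk ends."""
    current = start
    while True:
        # 10% chance of stopping, 90% chance of continuing, checked before each click.
        if rng.random() < stop_probability:
            return current
        outlinks = links.get(current, [])
        if not outlinks:
            return current  # assumed behaviour: a dead end also ends the walk
        current = rng.choice(outlinks)


rng = random.Random(7)  # fixed seed so the sketch is repeatable
sample = [random_walk("a", LINKS, rng) for _ in range(1_000)]
```

Each entry in `sample` plays the role of one URL dropped into the final random list.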

Phew, now what? Defining metrics

Once we have the random set of URLs, we can start really comparing link indexes and measuring their quality, quantity, and speed. Luckily, in their quest to “get this right,” Moz gave me generous paid access to competitor APIs. We began by testing Moz, Majestic, Ahrefs, and SEMRush, but eventually dropped SEMRush after their partnership with Majestic.

So, what questions can we answer now that we have a random sample of the web? This is the exact wishlist I sent out in an email to leaders on the link project at Moz:

Size:
- What is the likelihood a randomly selected URL is in our index vs. competitors?
- What is the likelihood a randomly selected domain is in our index vs. competitors?
- What is the likelihood an index reports the highest number of backlinks for a URL?
- What is the likelihood an index reports the highest number of root linking domains for a URL?
- What is the likelihood an index reports the highest number of backlinks for a domain?
- What is the likelihood an index reports the highest number of root linking domains for a domain?

Speed:
- What is the likelihood that the latest article from a randomly selected feed is in our index vs. our competitors?
- What is the average age of a randomly selected URL in our index vs. competitors?
- What is the likelihood that the best backlink for a randomly selected URL is still present on the web?
- What is the likelihood that the best backlink for a randomly selected domain is still present on the web?

Quality:
- What is the likelihood that a randomly selected page’s index status (included or not included in index) in Google is the same as ours vs. competitors?
- What is the likelihood that a randomly selected page’s index status in Google SERPs is the same as ours vs. competitors?
- What is the likelihood that a randomly selected domain’s index status in Google is the same as ours vs. competitors?
- What is the likelihood that a randomly selected domain’s index status in Google SERPs is the same as ours vs. competitors?
- How closely does our index compare with Google’s expressed as “a proportional ratio of pages per domain vs our competitors”?
- How well do our URL metrics correlate with US Google rankings vs. our competitors?

Reality vs. theory

Unfortunately, like all things in life, I had to make some cutbacks. It turns out that the APIs provided by Moz, Majestic, Ahrefs, and SEMRush differ in some important ways: in cost structure, feature sets, and optimizations. For the sake of politeness, I am only going to mention the name of the provider when it is Moz that was lacking. Let’s look at each of the proposed metrics and see which ones we could keep and which we had to put aside…

Size: We were able to monitor all 6 of the size metrics!

Speed:
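- What is the likelihood that the latest article from a randomly selected feed is in our index vs. our competitors?
  We were able to include this FastCrawl metric.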
- What is the average age of a randomly selected URL in our index vs. competitors?
  Getting the age of a URL or domain is not possible in all APIs, so we had to drop this metric.
- What is the likelihood that the best backlink for a randomly selected URL is still present on the web?
  Unfortunately, doing this at scale was not possible because one API is cost prohibitive for top link sorts and another was extremely slow for large sites. We hope to run a set of live-link metrics independently from our daily metrics collection in the next few months.
- What is the likelihood that the best backlink for a randomly selected domain is still present on the web?
  Once again, doing this at scale was not possible because one API is cost prohibitive for top link sorts and another was extremely slow for large sites. We hope to run a set of live-link metrics independently from our daily metrics collection in the next few months.

Quality:
- What is the likelihood that a randomly selected page’s index status (included or not included in index) in Google is the same as ours vs. competitors?
  We were able to keep this metric.
- What is the likelihood that a randomly selected page’s index status in Google SERPs is the same as ours vs. competitors?
  Chose not to pursue due to internal API needs, looking to add soon.
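- What is the likelihood that a randomly selected domain’s index status in Google is the same as ours vs. competitors?
  We were able to keep this metric.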
- What is the likelihood that a randomly selected domain’s index status in Google SERPs is the same as ours vs. competitors?
  Chose not to pursue due to internal API needs at the beginning of project, looking to add soon.
- How closely does our index compare with Google’s expressed as a proportional ratio of pages per domain vs our competitors?
  Chose not to pursue due to internal API needs. Looking to add soon.
- How well do our URL metrics correlate with US Google rankings vs. our competitors?
  Chose not to pursue due to known fluctuations in DA/PA as we radically change the link graph. The metric would be meaningless until the index became stable.

Ultimately, I wasn’t able to get everything I wanted, but I was left with 9 solid, well-defined metrics.

On the subject of live links:

In the interest of being TAGFEE, I will openly admit that I think our index has more deleted links than others like the Ahrefs Live Index. As of writing, we have about 30 trillion links in our index, 25 trillion we believe to be live, but we know that some proportion are likely not. While I believe we have the most live links, I don’t believe we have the highest proportion of live links in an index. That honor probably does not go to Moz. I can’t be certain because we can’t test it fully and regularly, but in the interest of transparency and fairness, I felt obligated to mention this. I might, however, devote a later post to just testing this one metric for a month and describe the proper methodology to do this fairly, as it is a deceptively tricky metric to measure. For example, if a link is retrieved from a chain of redirects, it is hard to tell if that link is still live unless you know the original link target. We weren’t going to track any metric if we couldn’t “get it right,” so we had to put live links as a metric on hold for now.

Caveats

Don’t read any more before reading this section. If you ask a question in the comments that shows you didn’t read the Caveats section, I’m just going to say “read the Caveats section.” So here goes…

- This is a comparison of data that comes back via APIs, not within the tools themselves. Many competitors offer live, fresh, historical, etc. types of indexes which can differ in important ways. This is just a comparison of API data using default settings.
- Some metrics are hard to estimate, especially ones like “whether a link is in the index,” because no API (not even Moz’s) has a call that just tells you whether they have seen the link before. We do our best, but any errors here are on the API provider. I think we (Moz, Majestic, and Ahrefs) should all consider adding an endpoint like this.
- Links are counted differently. Whether duplicate links on a page are counted, whether redirects are counted, whether canonicals are counted (which Ahrefs just changed recently), etc. all affect these metrics. Because of this, we can’t be certain that everything is apples-to-apples. We just report the data at face value.
- Consequently, the most important takeaway in all of these graphs and metrics is direction. How are the indexes moving relative to one another? Is one catching up, is another falling behind? These are the questions best answered.
- The metrics are adversarial. For each random URL or domain, a link index (Moz, Majestic, or Ahrefs) gets 1 point for being the biggest, for tying with the biggest, or for being “correct.” They get 0 points if they aren’t the winner. This means that the graphs won’t add up to 100 and it also tends to exaggerate the differences between the indexes.
- Finally, I’m going to show everything, warts and all, even when it was my fault. I’ll point out why some things look weird on graphs and what we fixed. This was a huge learning experience and I am grateful for the help I received from the support teams at Majestic and Ahrefs, who responded honestly and openly to my questions as a customer.
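To make the “adversarial” caveat concrete, here is a minimal sketch of winner-takes-the-point scoring. The function names and the counts are invented for illustration; only the rule itself (1 point for the biggest or tied-biggest, 0 otherwise) comes from the description above.

```python
def score_round(counts):
    """Give 1 point to every index reporting the highest count (ties included)."""
    best = max(counts.values())
    return {name: int(value == best) for name, value in counts.items()}


def win_rates(rounds):
    """Percentage of rounds in which each index earned a point."""
    totals = {}
    for counts in rounds:
        for name, point in score_round(counts).items():
            totals[name] = totals.get(name, 0) + point
    return {name: 100 * total / len(rounds) for name, total in totals.items()}


# Invented backlink counts for four random URLs.
rounds = [
    {"Moz": 120, "Majestic": 100, "Ahrefs": 120},  # Moz and Ahrefs tie for biggest
    {"Moz": 10, "Majestic": 11, "Ahrefs": 9},
    {"Moz": 50, "Majestic": 49, "Ahrefs": 48},     # a one-link lead earns the full point
    {"Moz": 5, "Majestic": 5, "Ahrefs": 5},        # three-way tie
]
rates = win_rates(rounds)
```

Here `rates` comes out as 75% for Moz and 50% each for Majestic and Ahrefs, which sums to 175 rather than 100, and the third round shows how a tiny lead is scored the same as a huge one.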

The metrics dashboard

The Dashboard for All Metrics

We’ve been tracking these 9 core metrics (albeit with improvements) since November of 2017. With a close eye on quality, size, and speed, we have methodically built an amazing backlink index, not driven by broad counts but instead by intricately defined and measured metrics. Let’s go through each of those metrics now.

Size matters

It does. Let’s admit it. The diminutive size of the Mozscape index has been a limitation for years. Maybe someday we will write a long post about all the efforts Moz has made to grow the index and what problems stood in our way, but that’s a post for a different day. The truth is, as much as quality matters, size is huge for a number of specific use-cases for a link index. Do you want to find all your bad links? Bigger is better. Do you want to find a lot of link opportunities? Bigger is better. So we came up with a number of metrics to help us determine where we were relative to our competitors. Here is each of our size metrics.

Index Has URL

What is the likelihood a randomly selected URL is in our index vs. competitors?

This is one of my favorite metrics because I think it’s a pure reflection of index size. It answers the simple question of “if we grabbed a random URL on the web, what’s the likelihood an index knows about it?” You can see my learning curve in the graph (I was misreporting the Ahrefs API due to an error on my part), but once that was corrected, we had a nice reflection of the indexes. Let me restate this: these are comparisons of APIs, not of the web tools themselves. If I recall correctly, you can get more data out of running reports in Majestic, for example. However, I do think this demonstrates that Moz’s new Link Explorer is a strong contender, if not the largest, as we have led in this category every day except one. As of writing this post, Moz is winning.
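For readers who want to reproduce a metric like this, the estimate is just the share of sampled URLs that an index reports knowing about. The sketch below also attaches a normal-approximation margin of error; that error bar is my own addition for illustration and is not part of the dashboard described here. The sample data is invented.

```python
import math


def coverage_estimate(found_flags, z=1.96):
    """Return (share of sampled URLs found, approximate 95% margin of error)."""
    n = len(found_flags)
    share = sum(found_flags) / n
    # Normal approximation to the binomial; reasonable for large samples.
    margin = z * math.sqrt(share * (1 - share) / n)
    return share, margin


# Invented result: an index knew 80 of 100 randomly sampled URLs.
flags = [True] * 80 + [False] * 20
share, margin = coverage_estimate(flags)
```

The margin shrinks with the square root of the sample size, which is one reason a large random sample matters when the indexes are close.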

Index Has Domain

What is the likelihood a randomly selected domain is in our index vs competitors?

When I said I would show “warts and all,” I meant it. Determining whether a domain is in an index isn’t as simple as you would think. For example, perhaps a domain has pages in the index, but not the homepage. Well, it took me a while to figure this one out, but by February of this year I had it down.

The scale of this graph is important to note as well. The variation is between 99.4 and 100% between Moz, Majestic, and Ahrefs over the last few months. This indicates just how close the link indexes are in terms of knowing about root domains. Majestic has historically tended to win this metric with near 100% coverage, but you would have to select 100 random domains to find one that Moz or Ahrefs doesn’t have information on. However, Moz’s continued growth has allowed us to catch up. While the indexes are super close, as of writing this post, Moz is winning.

Backlinks Per URL

Which index has the highest backlink count for a randomly selected URL?

This is a difficult metric to really pin down. Unfortunately, it isn’t easy to determine what backlinks should count and what shouldn’t. For example, imagine a URL has one page linking to it, but that page includes that link 100 times. Is that 100 backlinks or one? Well, it turns out that the different link indexes probably measure these types of scenarios differently and getting an exact definition out of each is like pulling teeth because the definition is so complicated and there are so many edge cases. At any rate, I think this is a great example of where we can show the importance of direction. Whatever the metrics actually are, Moz and Majestic are catching up to Ahrefs, which has been the leader for some time. As of writing this post, Ahrefs is winning.

Root Linking Domains Per URL

Which index reports the highest RLD count for a randomly selected URL?

Simple, right? No, even this metric has its nuances. What is a root linking domain? Do subdomains count if they are on subdomain sites like Blogspot or Wordpress.com? If so, how many sites are there on the web which should be treated this way? We used a machine learned methodology based on surveys, SERP data, and unique link data to determine our list, but each competitor does it differently. Thus, for this metric, direction really matters. As you can see, Moz has been steadily catching up and as of writing today, Moz is finally winning.
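To show why the definition matters, here is a deliberately naive sketch of counting root linking domains. The list of subdomain-hosting sites is a two-entry stand-in (not the machine-learned list mentioned above), and treating the last two labels of a hostname as the root domain is an assumption that breaks for suffixes such as .co.uk.

```python
from urllib.parse import urlsplit

# Hypothetical, tiny list of hosts where every subdomain is a separate site.
SUBDOMAIN_HOSTS = {"blogspot.com", "wordpress.com"}


def linking_root(url):
    """Reduce a backlink URL to the 'root' it should be counted under."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    root = ".".join(labels[-2:])  # naive: last two labels form the root domain
    if root in SUBDOMAIN_HOSTS and len(labels) >= 3:
        return ".".join(labels[-3:])  # count alice.blogspot.com as its own root
    return root


def count_rlds(backlink_urls):
    """Number of distinct root linking domains among the backlink URLs."""
    return len({linking_root(url) for url in backlink_urls})


backlinks = [
    "https://alice.blogspot.com/post",
    "https://bob.blogspot.com/another-post",
    "https://www.example.com/a",
    "https://blog.example.com/b",
]
rld_count = count_rlds(backlinks)
```

With the list in place, the four links count as three root linking domains; with an empty `SUBDOMAIN_HOSTS`, the same links would count as two. Differences like this are why direction is more trustworthy than the raw numbers.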

Backlinks Per Domain

Which index reports the highest backlink count for a randomly selected domain?

This metric was not kind to me, as I found a terrible mistake early on. (For the other techies reading this, I was storing backlink counts as INT(11) rather than BIGINT, which caused lots of ties for big domains: whenever a count exceeded the column’s maximum value, the database stored that same maximum number.) Nevertheless, Majestic has been stealing the show on this metric for a little while, although the story is deeper than that. Their dominance is such an outlier that it needs to be explained.

One of the hardest decisions a company has to make regarding its backlink index is how to handle spam. On one hand, spam is expensive to the index and probably ignored by Google. On the other hand, it is important for users to know if they have received tons of spammy links. I don’t think there is a correct answer to this question; each index just has to choose. A close examination shows that Majestic is winning (and continuing to increase its advantage) because of a particularly nefarious Wikipedia-clone spam network. Any site with any backlinks from Wikipedia is getting tons of links from this network, which is causing its backlink counts to increase rapidly. If you are worried about these types of links, you need to go take a look at Majestic and look for links ending in primarily .space or .pro, including sites like tennis-fdfdbc09.pro, troll-warlord-64fa73ba.pro, and badminton-026a50d5.space. As of my last tests, there are over 16,000 such domains in this spam network within Majestic’s index. Majestic is winning this metric, but for purposes other than finding spam networks, it might not be the right choice.
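As a footnote on the storage mistake mentioned at the start of this section, the sketch below imitates what happens when large counts are written to a signed 32-bit column that clips out-of-range values to its maximum. That clipping behaviour is an assumption about the database configuration, and the counts are invented.

```python
INT_MAX = 2**31 - 1  # 2,147,483,647, the largest signed 32-bit integer


def store_as_int(value):
    """Imitate a column that clips too-large values to its maximum (assumed behaviour)."""
    return min(value, INT_MAX)


# Invented backlink counts for one very large domain.
true_counts = {"Moz": 3_000_000_000, "Majestic": 5_000_000_000, "Ahrefs": 2_500_000_000}
stored = {name: store_as_int(count) for name, count in true_counts.items()}

# After clipping, every index appears to report the same number: a false tie.
is_false_tie = len(set(stored.values())) == 1
```

Under winner-takes-the-point scoring, a false tie like this hands every index a point for the domain, which is why the early part of the graph is unreliable.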

Linking Root Domains Per Domain

Which index reports the highest LRD count for a randomly selected domain?

OK, this one took me a while to get just right. In the middle of this graph, I corrected an important error where I was looking at domains only for the root domain on Ahrefs rather than the root domain and all subdomains. This was unfair to Ahrefs until I finally got everything corrected in February. Since then, Moz has been aggressively growing its index, Majestic has picked up LRD counts through the previously discussed network but steadied out, and Ahrefs has remained relatively steady in size. Because of the “adversarial” nature of these metrics, it gives the false appearance that Ahrefs is dropping dramatically. They aren’t. They are still huge, and so is Majestic. The real takeaway is directional: Moz is growing dramatically relative to their networks. As of writing this post, Moz is winning.

Speed

Being the “first to know” is important in almost any industry, and with link indexes it is no different. You want to know as soon as possible when a link goes up or goes down and how good that link is so you can respond if necessary. Here is our current speed metric.

FastCrawl

What is the likelihood the latest post from a randomly selected set of RSS feeds is indexed?

Unlike the other metrics discussed, the sampling here is a little bit different. Instead of using the randomization above, we make a random selection from a million+ known RSS feeds, find the latest post in each, and check whether it has been included in the various indexes of Moz and competitors. While there are a few errors in this graph, I think there is only one clear takeaway. Ahrefs is right about their crawlers. They are fast and they are everywhere. While Moz has increased our coverage dramatically and quickly, it has barely put a dent in this FastCrawl metric.

Now you may ask, if Ahrefs is so much faster at crawling, how can Moz catch up? Well, there are a couple of answers, but probably the biggest is that new URLs only represent a fraction of the web. Most URLs aren’t new. Let’s say two indexes (one new, one old) have a bunch of URLs they’re considering crawling. Both might prioritize URLs on important domains that they’ve never seen before. For the larger, older index, that will be a smaller percentage of that group because they have been crawling fast for a long time. So, during the course of the day, a higher percentage of the old index’s crawl will be dedicated to re-crawling pages it already knows about. The new index can dedicate more of its crawl potential to new URLs.

It does, however, put the pressure on Moz now to improve crawl infrastructure as we catch up to and overcome Ahrefs in some size metrics. As of this post, Ahrefs is winning the FastCrawl metric.

Quality

OK, now we’re talking my language. This is the most important stuff, in my opinion. What’s the point of making a link graph to help people with SEO if it isn’t similar to Google’s? While we had to cut some of the metrics temporarily, we did get a few in that are really important and worth taking a look at.

Domain Index Matches

What is the likelihood a random domain shares the same index status in Google and a link index?

Domain Index Matches seeks to determine when a domain shares the same index status with Google as it does in one of the competing link indexes. If Google ignores a domain, we want to ignore a domain. If Google indexes a domain, we want to index a domain. If we have a domain Google doesn’t, or vice versa, that is bad.
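In code terms, the metric is an agreement rate between two yes/no answers per domain. The sketch below uses invented data and a hypothetical helper name; how each status is actually looked up (in Google and in each API) is outside its scope.

```python
def index_match_rate(google_status, index_status):
    """Share of domains whose indexed/not-indexed status agrees in both sources."""
    domains = google_status.keys() & index_status.keys()
    matches = sum(google_status[d] == index_status[d] for d in domains)
    return matches / len(domains)


# Invented statuses: True means "in the index", False means "not in the index".
google = {"a.example": True, "b.example": False, "c.example": True, "d.example": False}
link_index = {"a.example": True, "b.example": True, "c.example": True, "d.example": False}

rate = index_match_rate(google, link_index)
```

Note that the mismatch on `b.example` counts against the link index even though it knows more than Google does: having a domain Google ignores is scored the same as missing one Google has.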

This graph is a little harder to read because of the scale (the first few days of tracking were failures), but what we actually see is a statistically insignificant difference between Moz and our competitors. We can make it look more competitive than it really is if we just calculate wins and losses, but we have to take into account an error in the way we determined Ahrefs index status up until around February. To do this, I show wins/losses for all time vs. wins/losses over the last few months.

As you can see, Moz wins the “all time,” but Majestic has been winning more over the last few months. Nevertheless, these are quite insignificant, often being the difference between one or two domain index statuses out of 100. Just like the Index Has Domain metric we discussed above, nearly every link index has nearly every domain, and looking at the long-term day-by-day graph shows just how incredibly close they are. However, if we are keeping score, as of today (and the majority of the last week), Moz is winning this metric.

Domain URL Matches

What is the likelihood a random URL shares the same index status in Google as in a link index?

This one is the most important quality metric, in my honest opinion. Let me explain this one a little more. It’s one thing to say that your index is really big and has lots of URLs, but does it look like Google’s? Do you crawl the web like Google? Do you ignore URLs Google ignores while crawling URLs that Google crawls? This is a really important question and sets the foundation for a backlink index that is capable of producing good relational metrics like PA and DA.

This is one of the metrics where Moz just really shines. Once we corrected for an error in the way we were checking Ahrefs, we could accurately determine whether our index was more or less like Google’s than our competitors. Since the beginning of tracking, Moz Link Explorer has never been anything but #1. In fact, we only had 3 ties with Ahrefs and never lost to Majestic. We have custom-tailored our crawl to be as much like Google as possible, and it has paid off. We ignore the types of URLs Google hates, and seek out the URLs Google loves. We believe this will pay huge dividends in the long run for our customers as we expand our feature set based on an already high-quality, huge index.

The Link Index Olympics

Alright, so we’ve just spent a lot of time delving into these individual metrics, so I think it’s probably worth it to put these things into an easy-to-understand context. Let’s pretend for a moment that this is the Link Index Olympics, and no matter how much you win or lose by, it determines whether you receive a gold, silver, or bronze medal. I’m writing this on Wednesday, April 25th. Let’s see how things play out if the Olympics happened today:

As you can see, Moz takes the gold in six of the nine metrics we measure, along with two silvers and one bronze. Moreover, we’re continuing to grow and improve our index daily. As most of the above graphs indicate, we tend to be improving relative to our competitors, so I hope that by the time of publication in a week or so our scores will be even better. But the reality is that based on the metrics above, our link index quality, quantity, and speed are excellent. I’m not going to say our index is the best. I don’t think that’s something anyone can really even know, and it is highly dependent upon the specific use case. But I can say this: it is damn good. In fact, Moz has won or tied for the “gold” 27 out of the last 30 days.

What’s next?

We are going for gold. All gold. All the time. There’s a ton of great stuff on the horizon. Look forward to regular additions of features to Link Explorer based on the data we already have, faster crawling, and improved metrics all around (PA, DA, Spam Score, and potentially some new ones in the works!) There’s way too much to list here. We’ve come a long way but we know we have a ton more to do. These are exciting times!

A bit about DA and PA

Domain Authority and Page Authority are powered by our link index. Since we’re moving from an old, much smaller index to a larger, much faster index, you may see small or large changes to DA and PA depending on what we’ve crawled in this new index that the old Mozscape index missed. Your best bet is just to compare yourselves to your competitors. Moreover, as our index grows, we have to constantly adjust the model to address the size and shape of our index, so both DA and PA will remain in beta a little while. They are absolutely ready for primetime, but that doesn’t mean we don’t intend to continue to improve them over the next few months as our index growth stabilizes. Thanks!

Quick takeaways

Congratulations on getting through this post. Here are some key takeaways:

The new Moz Link Explorer is powered by an industry-leading link graph and we have the data to prove it.
Tell your data providers to put their math where their mouth is. You deserve honest, well-defined metrics, and it is completely right of you to demand it from your data providers.
Doing things right requires that we sweat the details. I cannot begin to praise our leadership, SMEs, designers, and engineers who have asked tough questions, dug in, and solved tough problems, refusing to build anything but the best. This link index proves that Moz can solve the hardest problem in SEO: indexing the web. If we can do that, you can only expect great things ahead.

Thanks for taking the time to read! I look forward to answering questions in the comments or you can reach me on Twitter at @rjonesx.

Also, I would like to thank the non-Mozzers who offered peer reviews and critiques of this post in advance — they do not necessarily endorse any of the conclusions, but provided valuable feedback. In particular, I would like to thank Patrick Stox of IBM, JR Oakes of Adapt Partners, Alexander Darwin of HomeAgency, Paul Shapiro of Catalyst SEM, the person I most trust in SEO, Tony Spencer, and a handful of others who wished to remain anonymous.

Faster, Fresher, Better: Announcing Link Explorer, Moz’s New Link Building Tool

Posted by SarahBird

More link data. Fresher link data. Faster link data.

Today, I’m delighted to share that after eons of hard work, blood, sweat, tears, and love, Moz is taking a major step forward on our commitment to provide the best SEO tools money can buy.

We’ve rebuilt our link technology from the ground up and the data is now broadly available throughout Moz tools. It’s bigger, fresher, and much, much faster than our legacy link tech. And we’re just getting started! The best way to quickly understand the potential power of our revolutionary new link tech is to play with the beta of our Link Explorer.

Introducing Link Explorer, the newest addition to the Moz toolset!

We’ve heard your frustrations with Open Site Explorer and we know that you want more from Moz and your link building tools. OSE has done more than put in its time. Groundbreaking when it launched in 2008, it’s worked long and hard to bring link data to the masses. It deserves the honor of a graceful retirement.

OSE represents our past; the new Link Explorer is our fast, innovative, ambitious future.

Here are some of my favorite things about the Link Explorer beta:

It’s 20x larger and 30x fresher than OSE (RIP)
Despite its huge index size, the app is lightning fast! I can’t stand waiting so this might be my number-one fav improvement.
We’re introducing Link Tracking Lists to make managing your link building efforts a breeze. Sometimes the simple things make the biggest difference, like when they started making vans with doors on each side. You’ll never go back.
Link Explorer includes historic data, a painful gap in OSE. Studying your gained/lost linking domains is fast and easy.
The new UX surfaces competitive insights much more quickly
Increasing the size and freshness of the index improved the quality of Domain Authority and Spam Score. Voilà.

All this, and we’re only in beta.

Dive into your link data now!

Here’s a deeper dive into my favorites:

#1: The sheer size, quality, and speed of it all

We’re committed to data quality. Here are some ways that shows up in the Moz tools:

When we collect rankings, we evaluate the natural first page of rankings to ensure that the placement and content of featured snippets and other SERP features are correctly situated (they can be misplaced when rankings are collected in 50- or 100-page batches). This is more expensive, but we think the tradeoff is worth it.
We were the first to build a hybrid search volume model using clickstream data. We still believe our model is the most accurate.
Our SERP corpus, which powers Keywords by Site, is completely refreshed every two weeks. We actively update up to 15 million of the keywords each month to remove keywords that are no longer being searched and replace them with trending keywords and terms. This helps keep our keyword data set fresh and relevant.

The new Link Explorer index extends this commitment to data quality. OSE wasn’t cutting it and we’re thrilled to unleash this new tech.

Link Explorer is over 20x larger and 30x fresher than our legacy link index. Bonus points: the underlying technology is very cost-efficient, making it much less expensive for us to scale over time. This frees up resources to focus on feature delivery. BOOM!

One of my top pet peeves is waiting. I feel physical pain while waiting in lines and for apps to load. I can’t stand growing old waiting for a page to load (amirite?).

The new Link Explorer app is delightfully, impossibly fast. It’s like magic. That’s how link research should be. Magical.

#2: Historical data showing discovered and lost linking domains

If you’re a visual person, this report gives you an immediate idea of how your link building efforts are going. A spike you weren’t expecting could be a sign of spam network monkey business. Deep-dive effortlessly on the links you lost and gained so you can spend your valuable time doing thoughtful, human outreach.
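The idea behind a discovered-and-lost report can be sketched as a set difference between two snapshots of linking domains. This is only an illustration in Python, not how Link Explorer computes the report; the function name `diff_linking_domains` and the example domains are hypothetical.

```python
def diff_linking_domains(previous, current):
    """Return (discovered, lost) linking domains between two snapshots."""
    previous, current = set(previous), set(current)
    discovered = sorted(current - previous)  # in the new snapshot only
    lost = sorted(previous - current)        # in the old snapshot only
    return discovered, lost

# Hypothetical snapshots for illustration only.
discovered, lost = diff_linking_domains(
    ["example.com", "example.org"],
    ["example.org", "example.net"],
)
```

Run over a series of dated snapshots, the sizes of the two lists give the kind of spike you would want to investigate.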

#3: Link Tracking Lists

Folks, this is a big one. Throw out (at least one of… ha. ha.) those unwieldy spreadsheets and get on board with Link Tracking Lists, because these are the future. Have you been chasing a link from a particular site? Wondering if your outreach emails have borne fruit yet? Want to know if you’ve successfully placed a link, and how you’re linking? Link Tracking Lists cut out a huge time-suck when it comes to checking back on which of your target sites have actually linked back to you.

Why announce the beta today?

We’re sharing this now for a few reasons:

The new Link Explorer data and app have been available in beta to a limited audience. Even with a quiet, narrow release, the SEO community has been talking about it and asking good questions about our plans. Now that the Link Explorer beta is in broad release throughout all of Moz products and the broader Moz audience can play with it, we’re expecting even more curiosity and excitement.
If you’re relying on our legacy link technology, this is further notice to shift your applications and reporting to the new-and-improved tech. OSE will be retired soon! We’re making it easier for API customers to get the new data by providing a translation layer for the legacy API.
We want and need your feedback. We are committed to building the very best link building tool on the planet. You can expect us to invest heavily here. We need your help to guide our efforts and help us make the most impactful tradeoffs. This is your invitation to shape our roadmap.

Today’s release of our new Link Explorer technology is a revolution in Moz tools, not an evolution. We’ve made a major leap forward in our link index technology that delivers a ton of immediate value to Moz customers and the broader Moz Community.

Even though there are impactful improvements around the corner, this ambitious beta stands on its own two feet. OSE wasn’t cutting it and we’re proud of this new, fledgling tech.

What’s on the horizon for Link Explorer?

We’ve got even more features coming in the weeks and months ahead. Please let us know if we’re on the right track.

Link Building Assistant: a way to quickly identify new link acquisition opportunities
A more accurate and useful Link Intersect feature
Link Alerts to notify you when you get a link from a URL you were tracking in a list
Changes to how we count redirects: Currently we don’t count links to a redirect as links to the target of the redirect (that’s a lot of redirects), but we have this planned for the future.
Significantly scaling up our crawling to further improve freshness and size
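The redirect item above is easiest to see with a small example. The Python sketch below contrasts the two counting rules: counting links where they point, and crediting them to the final target of a redirect. It is an assumption-laden illustration, not Moz’s implementation; `resolve`, `count_links`, the `max_hops` limit, and the example URLs are all hypothetical.

```python
from collections import Counter

def resolve(url, redirects, max_hops=10):
    """Follow a redirect map to its final target, guarding against loops."""
    seen = set()
    while url in redirects and url not in seen and len(seen) < max_hops:
        seen.add(url)
        url = redirects[url]
    return url

def count_links(link_targets, redirects, follow_redirects):
    """Count inbound links per URL, optionally crediting redirect targets."""
    if follow_redirects:
        link_targets = [resolve(t, redirects) for t in link_targets]
    return Counter(link_targets)

# Hypothetical data: two links point at a redirecting URL, one at its target.
redirects = {"http://example.com/old": "http://example.com/new"}
links = [
    "http://example.com/old",
    "http://example.com/old",
    "http://example.com/new",
]
current = count_links(links, redirects, follow_redirects=False)
planned = count_links(links, redirects, follow_redirects=True)
```

Under the first rule the target page is credited with one link; under the second it is credited with all three.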

Go forth, and explore:

Try the new Link Explorer!

Tomorrow Russ Jones will be sharing a post that discusses the importance of quality metrics when it comes to a link index, and don’t miss our pinned Q&A post answering questions about Domain Authority and Page Authority changes or our FAQ in the Help Hub.

We’ll be releasing early and often. Watch this space, and don’t hold back your feedback. Help us shape the future of Links at Moz. We’re listening!

Content for Answers: The Inverted Pyramid – Whiteboard Friday

Posted by Dr-Pete

If you’ve been searching for a quick hack to write content for featured snippets, this isn’t the article for you. But if you’re looking for lasting results and a smart tactic to increase your chances of winning a snippet, you’re definitely in the right place.

Borrowed from journalism, the inverted pyramid method of writing can help you craft intentional, compelling, rich content that will help you rank for multiple queries and win more than one snippet at a time. Learn how in this Whiteboard Friday starring the one and only Dr. Pete!

Video Transcription

Hey, Moz fans, Dr. Pete here. I’m the Marketing Scientist at Moz and visiting you from not-so-sunny Chicago in the Seattle office. We’ve talked a lot in the last couple years in my blog posts and such about featured snippets.

So these are answers that kind of cross with organic. So it’s an answer box, but you get the attribution and the link. Britney has done some great Whiteboard Fridays, the last couple, about how you do research for featured snippets and how you look for good questions to answer. But I want to talk about something that we don’t cover very much, which is how to write content for answers.

The inverted pyramid style of content writing

It’s tough, because I’m a content marketer and I don’t like to think that there’s a trick to content. I’m afraid to give people the kind of tricks that would have them run off and write lousy, thin content. But there is a technique that works that I think has been very effective for featured snippets for writing for questions and answers. It comes from the world of journalism, which gives me a little more faith in its credibility. So I want to talk to you about that today. That’s called the inverted pyramid.

Content for Answers

1. Start with the lead

It looks something like this. When you write a story as a journalist, you start with the lead. You lead with the lead. So if we have a story like “Penguins Rob a Bank,” which would be a strange story, we want to put that right out front. That’s interesting. Penguins rob a bank, that’s all you need to know. The thing about it is, and this is true back to print, especially when we had to buy each newspaper. We weren’t subscribers. But definitely on the web, you have to get people’s attention quickly. You have to draw them in. You have to have that headline.

2. Go into the details

So leading with the lead is all about pulling them in to see if they’re interested and grabbing their attention. The inverted pyramid, then you get into the smaller pieces. Then you get to the details. You might talk about how many penguins were there and what bank did they rob and how much money did they take.

3. Move to the context

Then you’re going to move to the context. That might be the history of penguin crime in America and penguin ties to the mafia and what does this say about penguin culture and what are we going to do about this. So then it gets into kind of the speculation and the value add that you as an expert might have.

How does this apply to answering questions for SEO?

So how does this apply to answering questions in an SEO context?

Lead with the answer, get into the details and data, then address the sub-questions.

Well, what you can do is lead with the answer. If somebody’s asked you a question, you have that snippet, go straight to the summary of the answer. Tell them what they want to know and then get into the details and get into the data. Add those things that give you credibility and that show your expertise. Then you can talk about context.

But I think what’s interesting with answers — and I’ll talk about this in a minute — is getting into these sub-questions, talking about if you have a very big, broad question, that’s going to divide up into a lot of follow-ups. People who are interested are going to want to know about those follow-ups. So go ahead and answer those.

If I win a featured snippet, will people click on my answer? Should I give everything away?

So I think there’s a fear we have. What if we answer the question and Google puts it in that box? Here’s the question and that’s the query. It shows the answer. Are people going to click? What’s going to happen? Should we be giving everything away? Yes, I think, and there are a couple reasons.

Questions that can be very easily answered should be avoided

First, I want you to be careful. Britney has gotten into some of this. This is a separate topic on its own. You don’t always want to answer questions that can be very easily answered. We’ve already seen that with the Knowledge Graph. Google says something like time and date or a fact about a person, anything that can come from that Knowledge Graph. “How tall was Abraham Lincoln?” That’s answered and done, and they’re already replacing those answers.

Answer how-to questions and questions with rich context instead

So you want to answer the kinds of things, the how-to questions and the why questions that have a rich enough context to get people interested. In those cases, I don’t think you have to be afraid to give that away, and I’m going to tell you why. This is more of a UX perspective. If somebody asks this question and they see that little teaser of your answer and it’s credible, they’re going to click through.

“Giving away” the answer builds your credibility and earns more qualified visitors

So here you’ve got the penguin. He’s flushed with cash. He’s looking for money to spend. We’re not going to worry about the ethics of how he got his money. You don’t know. It’s okay. Then he’s going to click through to your link. You know you have your branding and hopefully it looks professional, Pyramid Inc., and he sees that question again and he sees that answer again.

Giving the searcher a “scent trail” builds trust

If you’re afraid that that’s repetitive, I think the good thing about that is this gives him what we call a scent trail. He can see that, “You know what? Yes, this is the page I meant to click on. This is relevant. I’m in the right place.” Then you get to the details, and then you get to the data and you give this trail of credibility that gives them more to go after and shows your expertise.

People who want an easy answer aren’t the kind of visitors that convert

I think the good thing about that is we’re so afraid to give something away because then somebody might not click. But the kind of people who just wanted that answer and clicked, they’re not the kind of people that are going to convert. They’re not qualified leads. So these people that see this and see it as credible and want to go read more, they’re the qualified leads. They’re the kind of people that are going to give you that money.

So I don’t think we should be afraid of this. Don’t give away the easy answers. I think if you’re in the easy answer business, you’re in trouble right now anyway, to be honest. That’s a tough topic. But give them something that guides them to the path of your answer and gives them more information.

How does this tactic work in the real world?

Thin content isn’t credible.

So I’m going to talk about how that looks in a more real context. My fear is this. Don’t take this and run off and say write a bunch of pages that are just a question and a paragraph and a ton of thin content and answering hundreds and hundreds of questions. I think that can really look thin to Google. So you don’t want pages that are like question, answer, buy my stuff. It doesn’t look credible. You’re not going to convert. I think those pages are going to look thin to Google, and you’re going to end up spinning out many, many hundreds of them. I’ve seen people do that.

Use the inverted pyramid to build richer content and lead to your CTA

What I’d like to see you do is craft this kind of question page. This is something that takes a fair amount of time and effort. You have that question. You lead with that answer. You’re at the top of the pyramid. Get into the details. Get into the things that people who are really interested in this would want to know and let them build up to that. Then get into data. If you have original data, if you have something you can contribute that no one else can, that’s great.

Then go ahead and answer those sub-questions, because the people who are really interested in that question will have follow-ups. If you’re the person who can answer that follow-up, that makes for a very, very credible piece of content, and not just something that can rank for this snippet, but something that really is useful for anybody who finds it in any way.

So I think this is great content to have. Then if you want some kind of call to action, like a “Learn More,” that’s contextual, I think this is a page that will attract qualified leads and convert.

Moz’s example: What is a Title Tag?

So I want to give you an example. This is something we’ve used a lot on Moz in the Learning Center. So, obviously, we have the Moz blog, but we also have these permanent pages that answer kind of the big questions that people always have. So we have one on the title tag, obviously a big topic in SEO.

Here’s what this page looks like. So we go right to the question: What is a title tag? We give the answer: A title tag is an HTML element that does this and this and is useful for SEO, etc. Right there in the paragraph. That’s in the featured snippet. That’s okay. If that’s all someone wants to know and they see that Moz answered that, great, no problem.

But naturally, the people who ask that question, they really want to know: What does this do? What’s it good for? How does it help my SEO? How do I write one? So we dug in and we ended up combining three or four pieces of content into one large piece of content, and we get into some pretty rich things. So we have a preview tool that’s been popular. We give a code sample. We show how it might look in HTML. It gives it kind of a visual richness. Then we start to get into these sub-questions. Why are title tags important? How do I write a good title tag?

One page can gain the ability to rank for hundreds of questions and phrases

What’s interesting, because I think sometimes people want to split up all the questions because they’re afraid that they have to have one question per page, is that when I looked the other day, I think this was ranking in our 40 million keyword set for over 200 phrases, over 200 questions. So it’s ranking for things like “what is a title tag,” but it’s also ranking for things like “how do I write a good title tag.” So you don’t have to be afraid of that. If this is a rich, solid piece of content that people are going to, you’re going to rank for these sub-questions, in many cases, and you’re going to get featured snippets for those as well.

Then, when people have gotten through all of this, we can give them something like, “Hey, Moz has some of these tools. We can help you write richer title tags. We can check your title tags. Why don’t you try a free 30-day trial?” Obviously, we’re experimenting with that, and you don’t want to push too hard, but this becomes a very rich piece of content. We can answer multiple questions, and you actually have multiple opportunities to get featured snippets.

So I think this inverted pyramid technique is legitimate. I think it can help you write good content that’s a win-win. It’s good for SEO. It’s good for your visitors, and it will hopefully help you land some featured snippets.

So I’d love to hear about what kind of questions you’re writing content for, how you can break that up, how you can answer that, and I’d love to discuss that with you. So we’ll see you in the comments. Thank you.

Video transcription by Speechpad.com

Win a Ticket + Lodging to MozCon 2018!

Posted by ErinMcCaul

Have you been wanting to come to MozCon but just can’t swing the budget? Want to take a selfie with Roger, meet like-minded friends at our afterparties, and learn from leading industry experts? I’m thrilled to announce that you can do it all by winning a free ticket to join us at MozCon this July!

Those front-row seats look awfully cushy.

I’m one of the behind-the-scenes house elves who helps make MozCon happen, and I’m here to tell you everything you need to know about entering to win!

To enter, just submit a unique piece of content telling us why we should send you to MozCon by Sunday May 6th at 5pm PDT. Make sure your entry is both original and creative — the Moz staff will review all submissions and vote on the winner! If you’re chosen, we’ll pick up the tab for your registration and accommodations at the Grand Hyatt. You’ll also have a reserved VIP seat in our front row, and an invite to mix and mingle at our pre-event MozCon speakers’ dinner!

Without further ado, here’s the scoop:

Step 1: Create!

Create a unique, compelling piece of content telling us why you want to come to MozCon. Past ideas have included:

Drawings
Videos (must be one minute or less)
Blog posts
Original songs
Books
Slide decks
Anything else you can cook up!

Don’t feel limited by these examples. Is this the year we’ll see a Lego Roger stop-motion film, a MozCon-inspired show tune, or Roger-themed sugar cookies? The sky’s the limit, my friends! (But think hard about trying your hand at those cookies.)

Step 2: Submit!

Once you’re ready to throw your hat in the ring, tweet us a link @Moz and use the hashtag #MozConVIP by Sunday May 6th at 5pm PDT. Make sure to follow the instructions, and include your name and email address somewhere easily visible within your content. To keep things fair, there will be no exceptions to the rules. We need to be able to contact you if you’re our lucky winner!

Let’s recap:

The submission deadline is Sunday May 6th at 5pm PDT.
Mozzers will vote on all the entries based on the creativity and uniqueness of the content
We’ll announce the winning entry from @Moz via Twitter on Friday, May 11. You must be able to attend MozCon, July 9–11 2018, in Seattle. Prizes are non-transferable.
All submissions must adhere to the MozCon Code of Conduct
Contest is void where prohibited by law.
The value of the prize will be reported for tax purposes as required by law; the winner will receive an IRS form 1099 at the end of the calendar year and a copy of such form will be filed with the IRS. The winner is solely responsible for reporting and paying any and all applicable taxes related to the prizes and paying any expenses associated with any prize which are not specifically provided for in the official rules.

Our lucky winner will receive:

A free ticket to MozCon 2018, including optional VIP front-row seating and an invitation to our speakers’ dinner (valued at $1,500+)
Accommodations with a suite upgrade at the Grand Hyatt from July 8–12, 2018 (valued at $1,300+)

Alright, that’s a wrap. I can’t wait to see what you folks come up with! Happy creating!

How We Got a 32% Organic Traffic Boost from 4 On-Page SEO Changes [Case Study]

Posted by WallStreetOasis.com

My name is Patrick Curtis, and I’m the founder and CEO of Wall Street Oasis, an online community focused on careers in finance founded in 2006 with over 2 million visits per month.

User-generated content and long-tail organic traffic are what have built our business and community over the last 12+ years. But what happens if you wake up one day and realize that your growth has suddenly stopped? This is what happened to us back in November 2012.

In this case study, I’ll highlight two of our main SEO problems as a large forum with over 200,000 URLs, then describe two solutions that finally helped us regain our growth trajectory — almost five years later.

Two main problems

1. Algorithm change impacts

Ever since November 2012, Google’s algo changes have seemed to hurt many online forums like ours. Even though our traffic didn’t decline, our growth dropped to the single-digit percentages. No matter what we tried, we couldn’t break through our “plateau of pain” (I call it that because it was a painful ~5 years trying).

Plateau of pain: no double-digit growth from late 2012 onward

2. Quality of user-generated content

Related to the first problem, 99% of our content is user-generated (UGC) which means the quality is mixed (to put it kindly). Like most forum-based sites, some of our members create incredible pieces of content, but a meaningful percentage of our content is also admittedly thin and/or low-quality.

How could we deal with over 200,000 pieces of content efficiently and try to optimize them without going bankrupt? How could we “clean the cruft” when there was just so much of it?

Fighting back: Two solutions (and one statistical analysis to show how it worked)

1. “Merge and Purge” project

Our goal was to consolidate weaker “children” URLs into stronger “master” URLs to utilize some of the valuable content Google was ignoring and to make the user experience better.

For example, instead of having ~20 discussions on a specific topic (each with an average of around two to three comments) across twelve years, we would consolidate many of those discussions into the strongest two or three URLs (each with around 20–30 comments), leading to a much better user experience with less need to search and jump around the site.

Changes included taking the original post and comments from a “child” URL and merging them into the “master” URL, unpublishing the child URL, removing the child from sitemap, and adding a 301 redirect to the master.
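As a rough illustration of the bookkeeping side of these steps (not the actual tooling we used), the Python sketch below turns a child-to-master mapping into 301 redirect rules and a sitemap with the merged children removed. The function name `plan_merge`, the rule format, and the example paths are all hypothetical; merging the post and comments themselves is left out.

```python
def plan_merge(child_to_master, sitemap_urls):
    """Return (redirect rules, pruned sitemap) for a set of child-to-master merges."""
    # A master that is itself being merged away would create a redirect chain.
    chained = set(child_to_master) & set(child_to_master.values())
    if chained:
        raise ValueError(f"masters also listed as children: {sorted(chained)}")
    redirects = [
        {"from": child, "to": master, "status": 301}
        for child, master in sorted(child_to_master.items())
    ]
    # Merged children no longer belong in the sitemap.
    sitemap = [url for url in sitemap_urls if url not in child_to_master]
    return redirects, sitemap

# Hypothetical paths for illustration only.
merges = {
    "/forums/why-banking-2": "/forums/why-investment-banking",
    "/forums/why-ib-question": "/forums/why-investment-banking",
}
sitemap = [
    "/forums/why-investment-banking",
    "/forums/why-banking-2",
    "/forums/other-topic",
]
redirects, new_sitemap = plan_merge(merges, sitemap)
```

The chain check is a design choice in this sketch: it refuses any plan in which a master is also scheduled to be merged elsewhere, so every redirect points straight at a page that stays published.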

Below is an example of how it looked when we merged a child into our popular Why Investment Banking discussion. We highlighted the original child post as a Related Topic with a blue border and included the original post date to help avoid confusion:

Highlighting a related topic child post

This was a massive project that involved some complex Excel sorting, but after 18 months and about $50,000 invested (27,418 children merged into 8,515 masters to date), the user experience, site architecture, and organization is much better.

Initial analysis suggests that the percentage gain from merging weak children URLs into stronger masters has given us a boost of ~10–15% in organic search traffic.

2. The Content Optimization Team

The goal of this initiative was to take the top landing pages that already existed on Wall Street Oasis and make sure that they were both higher quality and optimized for SEO. What does that mean, exactly, and how did we execute it?

We needed a dedicated team that had some baseline industry knowledge. To that end, we formed a team of five interns from the community, because they were familiar with the common topics.

We looked at the top ~200 URLs over the previous 90 days (by organic landing page traffic) and listed them out in a spreadsheet:

Spreadsheet of organic traffic to URLs

We held five main hypotheses of what we believed would boost organic traffic before we started this project:

Longer content with subtitles: Increasing the length of the content and adding relevant H2 and H3 subtitles to give the reader more detailed and useful information in an organized fashion.
Changing the H1 so that it matched more high-volume keywords using Moz’s Keyword Explorer.
Changing the URL so that it also was a better match to high-volume and relevant keywords.
Adding a relevant image or graphic to help break up large “walls of text” and enrich the content.
Adding a relevant video similar to the graphic, but also to help increase time on page and enrich the content around the topic.

We tracked all five of these changes across all 200 URLs (see image above). After a statistical analysis, we learned that four of them helped our organic search traffic and one actually hurt.

Summary of results from our statistical analysis

Increasing the length of the articles and adding relevant subtitles (H2s, H3s, and H4s) to help organize the content gives an average boost to organic traffic of 14%
Improving the title or H1 of the URLs yields a 9% increase on average
Changing the URL decreased traffic on average by 38% (this was a smaller sample size — we stopped doing this early on for obvious reasons)
Including a relevant video increases the organic traffic by 4% on average, while putting an image up increases it by 5% on average.

Overall, the boost to organic traffic — should we continue to make these four changes (and avoid changing the URL) — is 32% on average.

Key takeaway:

Over half of that gain (~18%) comes from changes that require a minimal investment of time. For teams trying to optimize on-page SEO across a large number of pages, we recommend focusing on the top landing pages first and easy wins before deciding if further investment is warranted.
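The arithmetic behind these figures can be checked in a few lines of Python. Two assumptions are made here that the analysis above does not spell out: that the 32% is a simple sum of the four average lifts, and that the “minimal investment” changes are the H1, image, and video changes (which is the combination that adds up to 18%).

```python
# Average organic-traffic lifts from the summary of results (percent).
lifts = {
    "longer content with subtitles": 14,
    "improved title or H1": 9,
    "relevant image": 5,
    "relevant video": 4,
}
total = sum(lifts.values())

# Assumed grouping of the low-effort changes; not stated explicitly above.
low_effort = ("improved title or H1", "relevant image", "relevant video")
low_effort_total = sum(lifts[name] for name in low_effort)

# For comparison: the same lifts compounded multiplicatively instead of summed.
compounded = 1.0
for pct in lifts.values():
    compounded *= 1 + pct / 100
compounded_pct = (compounded - 1) * 100
```

Summing gives the 32% and 18% quoted above. If the effects were independent and compounded instead, the combined lift would be closer to 36%, so the additive figure is the more conservative reading.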

We hope this case study of our on-page SEO efforts was interesting, and I’m happy to answer any questions you have in the comments!
