About frans

Website:
frans has written 4625 articles so far, you can find them below.

Big Data, Big Problems: 4 Major Link Indexes Compared

Posted by russangular

Given this blog’s readership, chances are good you will spend some time this week looking at backlinks in one of the growing number of link data tools. We know backlinks continue to be one of, if not the most important parts of Google’s ranking algorithm. We tend to take these link data sets at face value, though, in part because they are all we have. But when your rankings are on the line, is there a better way to get at which data set is the best? How should we go about assessing these different link indexes like Moz, Majestic, Ahrefs and SEMrush for quality? Historically, there have been 4 common approaches to this question of index quality…

  • Breadth: We might choose to look at the number of linking root domains any given service reports. We know that referring domains correlates strongly with search rankings, so it makes sense to judge a link index by how many unique domains it has discovered and indexed.
  • Depth: We also might choose to look at how deep the web has been crawled, looking more at the total number of URLs in the index, rather than the diversity of referring domains.
  • Link Overlap: A more sophisticated approach might count the number of links an index has in common with Google Webmaster Tools.
  • Freshness: Finally, we might choose to look at the freshness of the index. What percentage of links in the index are still live?

There are a number of really good studies (some newer than others) using these techniques that are worth checking out when you get a chance:

  • BuiltVisible analysis of Moz, Majestic, GWT, Ahrefs and Search Metrics
  • SEOBook comparison of Moz, Majestic, Ahrefs, and Ayima
  • MatthewWoodward study of Ahrefs, Majestic, Moz, Raven and SEO Spyglass
  • Marketing Signals analysis of Moz, Majestic, Ahrefs, and GWT
  • RankAbove comparison of Moz, Majestic, Ahrefs and Link Research Tools
  • StoneTemple study of Moz and Majestic

While these are all excellent at addressing the methodologies above, there is a particular limitation with all of them. They miss one of the most important metrics we need to determine the value of a link index: proportional representation to Google’s link graph . So here at Angular Marketing, we decided to take a closer look.

Proportional representation to Google Search Console data

So, why is it important to determine proportional representation? Many of the most important and valued metrics we use are built on proportional models. PageRank, MozRank, CitationFlow and Ahrefs Rank are proportional in nature. The score of any one URL in the data set is relative to the other URLs in the data set. If the data set is biased, the results are biased.

A Visualization

Link graphs are biased by their crawl prioritization. Because there is no full representation of the Internet, every link graph, even Google’s, is a biased sample of the web. Imagine for a second that the picture below is of the web. Each dot represents a page on the Internet, and the dots surrounded by green represent a fictitious index by Google of certain sections of the web.

Of course, Google isn’t the only organization that crawls the web. Other organizations like Moz, Majestic, Ahrefs, and SEMrush have their own crawl prioritizations which result in different link indexes.

In the example above, you can see different link providers trying to index the web like Google. Link data provider 1 (purple) does a good job of building a model that is similar to Google. It isn’t very big, but it is proportional. Link data provider 2 (blue) has a much larger index, and likely has more links in common with Google that link data provider 1, but it is highly disproportional. So, how would we go about measuring this proportionality? And which data set is the most proportional to Google?

Methodology

The first step is to determine a measurement of relativity for analysis. Google doesn’t give us very much information about their link graph. All we have is what is in Google Search Console. The best source we can use is referring domain counts. In particular, we want to look at what we call referring domain link pairs. A referring domain link pair would be something like ask.com->mlb.com: 9,444 which means that ask.com links to mlb.com 9,444 times.

Steps

  1. Determine the root linking domain pairs and values to 100+ sites in Google Search Console
  2. Determine the same for Ahrefs, Moz, Majestic Fresh, Majestic Historic, SEMrush
  3. Compare the referring domain link pairs of each data set to Google, assuming a Poisson Distribution
  4. Run simulations of each data set’s performance against each other (ie: Moz vs Maj, Ahrefs vs SEMrush, Moz vs SEMrush, et al.)
  5. Analyze the results

Results

When placed head-to-head, there seem to be some clear winners at first glance. In head-to-head, Moz edges out Ahrefs, but across the board, Moz and Ahrefs fare quite evenly. Moz, Ahrefs and SEMrush seem to be far better than Majestic Fresh and Majestic Historic. Is that really the case? And why?

It turns out there is an inversely proportional relationship between index size and proportional relevancy. This might seem counterintuitive, shouldn’t the bigger indexes be closer to Google? Not Exactly.

What does this mean?

Each organization has to create a crawl prioritization strategy. When you discover millions of links, you have to prioritize which ones you might crawl next. Google has a crawl prioritization, so does Moz, Majestic, Ahrefs and SEMrush. There are lots of different things you might choose to prioritize…

  • You might prioritize link discovery. If you want to build a very large index, you could prioritize crawling pages on sites that have historically provided new links.
  • You might prioritize content uniqueness. If you want to build a search engine, you might prioritize finding pages that are unlike any you have seen before. You could choose to crawl domains that historically provide unique data and little duplicate content.
  • You might prioritize content freshness. If you want to keep your search engine recent, you might prioritize crawling pages that change frequently.
  • You might prioritize content value, crawling the most important URLs first based on the number of inbound links to that page.

Chances are, an organization’s crawl priority will blend some of these features, but it’s difficult to design one exactly like Google. Imagine for a moment that instead of crawling the web, you want to climb a tree. You have to come up with a tree climbing strategy.

  • You decide to climb the longest branch you see at each intersection.
  • One friend of yours decides to climb the first new branch he reaches, regardless of how long it is.
  • Your other friend decides to climb the first new branch she reaches only if she sees another branch coming off of it.

Despite having different climb strategies, everyone chooses the same first branch, and everyone chooses the same second branch. There are only so many different options early on.

But as the climbers go further and further along, their choices eventually produce differing results. This is exactly the same for web crawlers like Google, Moz, Majestic, Ahrefs and SEMrush. The bigger the crawl, the more the crawl prioritization will cause disparities. This is not a deficiency; this is just the nature of the beast. However, we aren’t completely lost. Once we know how index size is related to disparity, we can make some inferences about how similar a crawl priority may be to Google.

Unfortunately, we have to be careful in our conclusions. We only have a few data points with which to work, so it is very difficult to be certain regarding this part of the analysis. In particular, it seems strange that Majestic would get better relative to its index size as it grows, unless Google holds on to old data (which might be an important discovery in and of itself). It is most likely that at this point we can’t make this level of conclusion.

So what do we do?

Let’s say you have a list of domains or URLs for which you would like to know their relative values. Your process might look something like this…

  • Check Open Site Explorer to see if all URLs are in their index. If so, you are looking metrics most likely to be proportional to Google’s link graph.
  • If any of the links do not occur in the index, move to Ahrefs and use their Ahrefs ranking if all you need is a single PageRank-like metric.
  • If any of the links are missing from Ahrefs’s index, or you need something related to trust, move on to Majestic Fresh.
  • Finally, use Majestic Historic for (by leaps and bounds) the largest coverage available.

It is important to point out that the likelihood that all the URLs you want to check are in a single index increases as the accuracy of the metric decreases. Considering the size of Majestic’s data, you can’t ignore them because you are less likely to get null value answers from their data than the others. If anything rings true, it is that once again it makes sense to get data from as many sources as possible. You won’t get the most proportional data without Moz, the broadest data without Majestic, or everything in-between without Ahrefs.

What about SEMrush? They are making progress, but they don’t publish any relative statistics that would be useful in this particular case. Maybe we can hope to see more from them soon given their already promising index!

Recommendations for the link graphing industry

All we hear about these days is big data; we almost never hear about good data. I know that the teams at Moz, Majestic, Ahrefs, SEMrush and others are interested in mimicking Google, but I would love to see some organization stand up against the allure of more data in favor of better data—data more like Google’s. It could begin with testing various crawl strategies to see if they produce a result more similar to that of data shared in Google Search Console. Having the most Google-like data is certainly a crown worth winning.

Credits

Thanks to Diana Carter at Angular for assistance with data acquisition and Andrew Cron with statistical analysis. Thanks also to the representatives from Moz, Majestic, Ahrefs, and SEMrush for answering questions about their indices.


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Continue reading →

Help Us Improve the Moz Blog: 2015 Reader Survey

Posted by Trevor-Klein

In late 2013, we asked you all about your experience with the Moz Blog. It was the first time we’d collected direct feedback from our readers in more than three years—an eternity in the marketing industry. With the pace of change in our line of work (not to mention your schedules and reading habits) we didn’t want to wait that long again, so we’re taking this opportunity to ask you how well we’re keeping up.

Our mission is to help you all become better marketers, and to do that, we need to know more about you. What challenges do you all face? What are your pain points? Your day-to-day frustrations? If you could learn more about one or two (or three) topics, what would those be?

If you’ll help us out by taking this five-minute survey, we can make sure we’re offering the most useful and valuable content we possibly can. When we’re done looking through the responses, we’ll follow up with a post about what we learned.

Thanks, everyone; we’re excited to see what you have to say!

Can’t see the survey? Click here to take it in a new tab.


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Continue reading →

How to Rid Your Website of Six Common Google Analytics Headaches

Posted by amandaecking

This post was originally in YouMoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of Moz, Inc.

I’ve been in and out of Google Analytics (GA) for the past five or so years agency-side. I’ve seen three different code libraries, dozens of new different features and reports roll out, IP addresses stop being reported, and keywords not-so-subtly phased out of the free platform.

Analytics has been a focus of mine for the past year or so—mainly, making sure clients get their data right. Right now, our new focus is closed loop tracking, but that’s a topic for another day. If you’re using Google Analytics, and only Google Analytics for the majority of your website stats, or it’s your primary vehicle for analysis, you need to make sure it’s accurate.

Not having data pulling in or reporting properly is like building a house on a shaky foundation: It doesn’t end well. Usually there are tears.

For some reason, a lot of people, including many of my clients, assume everything is tracking properly in Google Analytics… because Google. But it’s not Google who sets up your analytics. People do that. And people are prone to make mistakes.

I’m going to go through six scenarios where issues are commonly encountered with Google Analytics.

I’ll outline the remedy for each issue, and in the process, show you how to move forward with a diagnosis or resolution.

1. Self-referrals

This is probably one of the areas we’re all familiar with. If you’re seeing a lot of traffic from your own domain, there’s likely a problem somewhere—or you need to extend the default session length in Google Analytics. (For example, if you have a lot of long videos or music clips and don’t use event tracking; a website like TEDx or SoundCloud would be a good equivalent.)

Typically one of the first things I’ll do to help diagnose the problem is include an advanced filter to show the full referrer string. You do this by creating a filter, as shown below:

Filter Type: Custom filter > Advanced
Field A: Hostname
Extract A: (.*)
Field B: Request URI
Extract B: (.*)
Output To: Request URI
Constructor: $A1$B1

You’ll then start seeing the subdomains pulling in. Experience has shown me that if you have a separate subdomain hosted in another location (say, if you work with a separate company and they host and run your mobile site or your shopping cart), it gets treated by Google Analytics as a separate domain. Thus, you ‘ll need to implement cross domain tracking. This way, you can narrow down whether or not it’s one particular subdomain that’s creating the self-referrals.

In this example below, we can see all the revenue is being reported to the booking engine (which ended up being cross domain issues) and their own site is the fourth largest traffic source:

self-referrals-2.png

I’ll also a good idea to check the browser and device reports to start narrowing down whether the issue is specific to a particular element. If it’s not, keep digging. Look at pages pulling the self-referrals and go through the code with a fine-tooth comb, drilling down as much as you can.

2. Unusually low bounce rate

If you have a crazy-low bounce rate, it could be too good to be true. Unfortunately. An unusually low bounce rate could (and probably does) mean that at least on some pages of your website have the same Google Analytics tracking code installed twice.

Take a look at your source code, or use Google Tag Assistant (though it does have known bugs) to see if you’ve got GA tracking code installed twice.

While I tell clients having Google Analytics installed on the same page can lead to double the pageviews, I’ve not actually encountered that—I usually just say it to scare them into removing the duplicate implementation more quickly. Don’t tell on me.

3. Iframes anywhere

I’ve heard directly from Google engineers and Google Analytics evangelists that Google Analytics does not play well with iframes, and that it will never will play nice with this dinosaur technology.

If you track the iframe, you inflate your pageviews, plus you still aren’t tracking everything with 100% clarity.

If you don’t track across iframes, you lose the source/medium attribution and everything becomes a self-referral.

Damned if you do; damned if you don’t.

My advice: Stop using iframes. They’re Netscape-era technology anyway, with rainbow marquees and Comic Sans on top. Interestingly, and unfortunately, a number of booking engines (for hotels) and third-party carts (for ecommerce) still use iframes.

If you have any clients in those verticals, or if you’re in the vertical yourself, check with your provider to see if they use iframes. Or you can check for yourself, by right-clicking as close as you can to the actual booking element:

iframe-booking.png

There is no neat and tidy way to address iframes with Google Analytics, and usually iframes are not the only complicated element of setup you’ll encounter. I spent eight months dealing with a website on a subfolder, which used iframes and had a cross domain booking system, and the best visibility I was able to get was about 80% on a good day.

Typically, I’d approach diagnosing iframes (if, for some reason, I had absolutely no access to viewing a website or talking to the techs) similarly to diagnosing self-referrals, as self-referrals are one of the biggest symptoms of iframe use.

4. Massive traffic jumps

Massive jumps in traffic don’t typically just happen. (Unless, maybe, you’re Geraldine.) There’s always an explanation—a new campaign launched, you just turned on paid ads for the first time, you’re using content amplification platforms, you’re getting a ton of referrals from that recent press in The New York Times. And if you think it just happened, it’s probably a technical glitch.

I’ve seen everything from inflated pageviews result from including tracking on iframes and unnecessary implementation of virtual pageviews, to not realizing the tracking code was installed on other microsites for the same property. Oops.

Usually I’ve seen this happen when the tracking code was somewhere it shouldn’t be, so if you’re investigating a situation of this nature, first confirm the Google Analytics code is only in the places it needs to be.Tools like Google Tag Assistant and Screaming Frog can be your BFFs in helping you figure this out.

Also, I suggest bribing the IT department with sugar (or booze) to see if they’ve changed anything lately.

5. Cross-domain tracking

I wish cross-domain tracking with Google Analytics out of the box didn’t require any additional setup. But it does.

If you don’t have it set up properly, things break down quickly, and can be quite difficult to untangle.

The older the GA library you’re using, the harder it is. The easiest setup, by far, is Google Tag Manager with Universal Analytics. Hard-coded universal analytics is a bit more difficult because you have to implement autoLink manually and decorate forms, if you’re using them (and you probably are). Beyond that, rather than try and deal with it, I say update your Google Analytics code. Then we can talk.

Where I’ve seen the most murkiness with tracking is when parts of cross domain tracking are implemented, but not all. For some reason, if allowLinker isn’t included, or you forget to decorate all the forms, the cookies aren’t passed between domains.

The absolute first place I would start with this would be confirming the cookies are all passing properly at all the right points, forms, links, and smoke signals. I’ll usually use a combination of the Real Time report in Google Analytics, Google Tag Assistant, and GA debug to start testing this. Any debug tool you use will mean you’re playing in the console, so get friendly with it.

6. Internal use of UTM strings

I’ve saved the best for last. Internal use of campaign tagging. We may think, oh, I use Google to tag my campaigns externally, and we’ve got this new promotion on site which we’re using a banner ad for. That’s a campaign. Why don’t I tag it with a UTM string?

Step away from the keyboard now. Please.

When you tag internal links with UTM strings, you override the original source/medium. So that visitor who came in through your paid ad and then who clicks on the campaign banner has now been manually tagged. You lose the ability to track that they came through on the ad the moment they click on the tagged internal link. Their source and medium is now your internal campaign, not that paid ad you’re spending gobs of money on and have to justify to your manager. See the problem?

I’ve seen at least three pretty spectacular instances of this in the past year, and a number of smaller instances of it. Annie Cushing also talks about the evils of internal UTM tags and the odd prevalence of it. (Oh, and if you haven’t explored her blog, and the amazing spreadsheets she shares, please do.)

One clothing company I worked with tagged all of their homepage offers with UTM strings, which resulted in the loss of visibility for one-third of their audience: One million visits over the course of a year, and $2.1 million in lost revenue.

Let me say that again. One million visits, and $2.1 million. That couldn’t be attributed to an external source/campaign/spend.

Another client I audited included campaign tagging on nearly every navigational element on their website. It still gives me nightmares.

If you want to see if you have any internal UTM strings, head straight to the Campaigns report in Acquisition in Google Analytics, and look for anything like “home” or “navigation” or any language you may use internally to refer to your website structure.

And if you want to see how users are moving through your website, go to the Flow reports. Or if you really, really, really want to know how many people click on that sidebar link, use event tracking. But please, for the love of all things holy (and to keep us analytics lovers from throwing our computers across the room), stop using UTM tagging on your internal links.

Now breathe and smile

Odds are, your Google Analytics setup is fine. If you are seeing any of these issues, though, you have somewhere to start in diagnosing and addressing the data.

We’ve looked at six of the most common points of friction I’ve encountered with Google Analytics and how to start investigating them: self-referrals, bounce rate, iframes, traffic jumps, cross domain tracking and internal campaign tagging.

What common data integrity issues have you encountered with Google Analytics? What are your favorite tools to investigate?


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Continue reading →