How Do I Successfully Run SEO Tests On My Website?

Posted by randfish

By now, most of us have gotten around to doing testing of some sort on our websites, but testing specifically for SEO can be extremely difficult and requires extra vigilance. In today’s Whiteboard Friday, Rand explains three major things we need to think about when performing these tests, and offers up several ideas for experiments we all can run!

For reference, here’s a still of this week’s whiteboard!

Video transcription

Howdy Moz fans, and welcome to another edition of Whiteboard Friday. Today we’re going to talk about running some SEO tests. So it is the case that many of us in the SEO world, and in the web marketing world overall, love to run tests on our websites. Of course, there’s great software, like Unbounce or Optimizely, if you want to run conversion-style tests, tests that kind of determine whether users perform better through conversion Funnel X or Funnel Y, or if Title X or Title Y convinces more people to buy. But SEO tests are particularly insidiously challenging because there are so many components and variables that can go into a test.

So let’s say, for example, that I’ve got this recipes website, and I have my page on spaghetti carbonara, which is one of my favorite spaghetti dishes. Geraldine, my wife, makes a phenomenal carbonara as she grew up Italian.

So here we go. We’ve got our ingredients list. There’s a photo. There are the steps. What if I’m thinking to myself: Gosh, you know, people on this webpage might want to check out other pasta recipes from here. I wonder if, by linking to other pasta recipes, I can get more people exposed to that and get them visiting that, but not just that. I wonder if I can send some extra link juice and link value, link equity over to these other pasta recipe pages that might not be as popular as my spaghetti carbonara page. Can I help them rank better by linking to them from this module here? Should I be putting this on lots of my pages?

So we might think: Well, I could easily put this on here and figure out user metrics. But how do I determine whether it’s good or bad for SEO? That is a really, really challenging problem. It is.

So I wanted to take some time and not say this is a foolproof methodology, but rather here are some big things to think about that those of us who have done this a lot in the SEO world, run a lot of experiments, have seen challenges around and have found some solutions. This isn’t going to be perfect. It’s not a checklist to tell you everything you need to know about, and I’m sure there will be some additional items in the comments. But try these things at the very least.

#1 Experiments need control groups

What’s unfortunate about the SEO world is we can’t just do something like, “Hey, let’s add this to all of our recipe pages and see how it does over the next month.” You don’t know whether that’s going to help. Even if you could prove to yourself that the user and usage metrics look a little bit better, you’re not sure about the ranking impact.

Well, gosh, you roll it out on every page, and you say, “Hey, over the last month things have gotten better. Now we know for sure that adding modules that interlink between our web pages is always better. Let’s do that across lots of things.” That’s not an accurate conclusion to come to, and that’s why we need a control group.

It could be the case in the SEO world that maybe these pages are getting more links all of a sudden, maybe your domain authority has risen, maybe some of your competitors have done some bad stuff, and they have fallen in the rankings. It’s just too hard to say. There are just too many inputs going in, and that’s why this control group is so essential.

So I might take a group of pages and say those pages get the module, and at the same time these other pages don’t get the module. Now if we have something like a rising domain authority or a bunch of competitors falling out, it will be fine, because we’ll still see how the group with the module performed against the group without the module. If they both rise in the rankings, we can say reasonably that, “Well, this didn’t appear to do enough to change the ranking. So if the user metrics are good, let’s keep it, and if the user metrics are bad, let’s not keep it, because ranking wise it seems like a wash.” But if we observe differences in these two groups, assuming that there are no other differences in those groups, we can be reasonably assured that it was this that helped them rank better.
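The comparison described above can be sketched in a few lines of Python. This is only an illustration, not a method from the video: the URLs and positions are made up, and the function names `mean_rank_change` and `module_effect` are hypothetical. It subtracts the control group's average rank change from the test group's, so that anything affecting both groups (rising domain authority, competitors dropping out) cancels out.

```python
from statistics import mean

def mean_rank_change(before, after):
    """Average change in position; negative means pages moved up (12 -> 9 is -3)."""
    return mean(after[url] - before[url] for url in before)

def module_effect(test_before, test_after, control_before, control_after):
    """Test group's rank change minus the control group's rank change."""
    return (mean_rank_change(test_before, test_after)
            - mean_rank_change(control_before, control_after))

# Hypothetical positions, recorded before and after the module went live.
test_before = {"/carbonara": 12, "/cacio-e-pepe": 15}
test_after = {"/carbonara": 9, "/cacio-e-pepe": 11}
control_before = {"/amatriciana": 14, "/puttanesca": 18}
control_after = {"/amatriciana": 13, "/puttanesca": 17}

effect = module_effect(test_before, test_after, control_before, control_after)
print(effect)  # negative: the test group gained more positions than the control
```

If both groups move by about the same amount, the result sits near zero, which is the "ranking wise it seems like a wash" case.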

Now, be careful here. If I were doing this experiment, what I would want to make sure is that I didn’t add the module to all of my recipe pages or to some types of recipe pages and not others. I would want to make sure that it is all pasta recipe pages, pages with people, visitors, metrics that are as close as possible to each other so that the control and the test group are as similar as possible.
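One common way to keep the two groups comparable is to start from a single set of similar pages and split it at random. The video does not prescribe random assignment, so treat this as an assumption; the page list and the `split_into_groups` helper are hypothetical.

```python
import random

def split_into_groups(pages, seed=0):
    """Shuffle similar pages and cut the list in half: (test, control)."""
    shuffled = list(pages)
    # A fixed seed keeps the split reproducible when the test is re-run.
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical: every page here is a pasta recipe with similar traffic.
pasta_pages = [
    "/carbonara", "/cacio-e-pepe", "/amatriciana",
    "/puttanesca", "/pesto-genovese", "/aglio-e-olio",
]

test_group, control_group = split_into_groups(pasta_pages)
print("gets the module:", test_group)
print("stays as is:   ", control_group)
```

Splitting inside one recipe type, rather than across the whole site, is what keeps the groups "as similar as possible."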

By the way, when you’re doing this, you also want to find a suitable target. What I mean by suitable target is I really like things for SEO experiments where I’m paying attention to the rankings in particular, I like search results with very low outside activity. Meaning, let’s say spaghetti carbonara was one of them, I would watch the search results for spaghetti carbonara for a couple of weeks, and if I saw a lot of movement, my page bouncing around in the rankings, other people’s pages bouncing around in the rankings, I wouldn’t use it. I’d go to a much less active SERP where churn in the search results and movement in the search result was likely to be very low. That’s where I love running experiments.

I’d also look for low competition that tends to go with low churn, and I’d try and find pages where I rank between number 8 and number 30. You might say, “Well, why do you care about ranking in number 8 to 30?” Well, I don’t like ranking way at the tail end of the search results because any little thing, if you’re ranking page 5 and result number 62 or something, hey man, any little thing could move you up 10 positions, 20 positions. Churn and movement that far back in the search results is much, much higher.

It’s also the case that I don’t love ranking number 1, 2, 3, 4, or 5, because it can be really hard to move results. Without a ton of external links with anchor text, blah, blah, blah, I’m not going to move 1 or 2 positions from number 3 or 4. This is why I like something between 8 and 30. That’s what I mean by finding a suitable target, and your selection may vary on this.
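Here is a rough sketch of that target filter. The 8 to 30 range comes from the video; the churn measure (highest minus lowest position over the watch period) and the `max_churn=2` cutoff are assumptions picked for illustration, as is the sample data.

```python
def is_suitable_target(daily_positions, min_rank=8, max_rank=30, max_churn=2):
    """True if the page currently ranks in the range and the SERP was calm.

    daily_positions: the page's position for one query, one entry per day,
    collected over a couple of weeks of watching.
    """
    current = daily_positions[-1]
    churn = max(daily_positions) - min(daily_positions)
    return min_rank <= current <= max_rank and churn <= max_churn

# Hypothetical watch data for four queries.
candidates = {
    "spaghetti carbonara": [15, 22, 9, 18],      # bouncing around: skip
    "cacio e pepe recipe": [12, 12, 13, 12],     # calm, mid-pack: use
    "easy pasta": [3, 3, 3, 3],                  # top 5, hard to move: skip
    "pasta with breadcrumbs": [62, 55, 70, 64],  # too far back: skip
}

targets = [query for query, positions in candidates.items()
           if is_suitable_target(positions)]
print(targets)
```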

#2 Every test should be repeated multiple times

Every test should be repeatable and repeated multiple times. Preferably, if you can, you actually want to turn the test on and off for the same kinds of pages. This gets tough. Now, the reason you want to do the multiple tests is because you want to be assured, confident that it was what you changed that did it.

So after checking this on pasta recipes and seeing that, hey, my recipe site is getting better, the metrics look good, the rankings are rising a few results each time that I put this on different pages, I feel confident that we can move ahead with this, great. Now run it on your dinner recipe pages or your risotto recipe pages, something similar to pasta. Run it on your salad recipe pages. If you repeat it on your risotto pages and your salad pages, and you’re getting the same results each time, now you can feel pretty confident that probably it was the case that this module was, in fact, the impacting factor that moved the needle. If you just do it once on one set of results, it’s much harder to say that with any kind of confidence.
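A minimal way to check "the same results each time" is to record one effect per section and see whether they all point the same way. The numbers below are invented, the `consistent_improvement` helper is hypothetical, and each effect is assumed to be the test group's average rank change minus the control group's, where negative means the test pages moved up.

```python
def consistent_improvement(effects_by_section, min_sections=2):
    """True only if the test was repeated and every repeat showed a gain.

    Each effect is (test rank change) - (control rank change);
    negative means the test pages moved up relative to the control.
    """
    if len(effects_by_section) < min_sections:
        return False  # a single run is not enough to trust
    return all(effect < 0 for effect in effects_by_section.values())

# Hypothetical results from three separate runs.
repeated = {"pasta": -2.5, "risotto": -1.8, "salad": -2.1}
mixed = {"pasta": -2.5, "risotto": 0.4, "salad": -0.3}
single = {"pasta": -2.5}

print(consistent_improvement(repeated))  # every section improved
print(consistent_improvement(mixed))     # risotto went the other way
print(consistent_improvement(single))    # only one run
```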

With the turning on and off bit, so what I would want to do, let’s say we have my group of pasta recipe pages that get the module. What if I take it off there? Will I see them fall back down in the results? The answer is kind of, well, hopefully I would because then I could be more sure that this was happening. In SEO though, this is really hard. In fact, the search engines make this kind of frustratingly impossible for a lot of link-based stuff, and the reason is something called the ghost effect.

I will probably do a Whiteboard Friday in the future on the ghost link effect. We’ve been testing this quite a bit with a project that I’m involved in called IMEC Lab and here at Moz as well. We’ve seen people over the years report this. Essentially, you point a link to a page, and you see that page go up in the rankings, which makes sense. The link is helping it rank better. You remove the link and the page takes weeks, sometimes even months to fall back down. Google knows the link is gone. They’ve re-indexed that page. Why isn’t the page that it helped rank falling right back down?

The answer is this ghost effect.

So ghost effects seem to be a real thing that Google really does around links that used to point somewhere, and so it makes testing with link-based stuff really hard. That’s why you want to do the test multiple times. That’s why you want to do the control group and to test it in multiple different sections as opposed to relying on, “Well, I turned it on and it did this. I turned it off, and it went back to the original. So I know that that happened.” Ghost effect will prevent you from observing that.

#3 Rankings have to be part of the test, but they can’t be the only part

In fact, I would argue that if they are the only part of your test, you might do some things that could actually mess you up in the long run and mess you up with your users.

So the two other things that I really look at are, number one, how do users perform, the user experience. That can be everything from my browse rate to my visits and traffic sources: are those staying relatively similar to the patterns that I’ve seen in the past? Number two is traffic performance. I want to see that stay relatively stable or improve. If both of those things are improving, hey, maybe you have a real winner on your hands. With a lot of these tests, that could happen. You might see that more people are clicking on those. Maybe more people are liking different pasta recipes. Linking to those, that’s helping you all across the board. Wonderful, wonderful.

Then I’m looking at rankings as well. Weirdly, even though I’m a hardcore SEO guy and I love SEO, I think rankings are the least important of these three. If I see something perform well for my users and I see my traffic improving, it doesn’t matter too much what I see going on with my rankings. In fact, usually it’s only when there’s not much delta in these two, and the rankings performance is the only indicator that things are getting better that I would care about that deeply. When you’re watching rankings performance, always, of course, watch logged out, non-personalized, non-geo-biased results. It’s relatively easy with something like recipes, but could be very hard with something that’s in the local world or that has local indicators in it.
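The priority order among the three signals can be written down as a small decision rule. This is one loose reading of the advice rather than a formula from the video: the `verdict` function, the 2% tolerance, and the return labels are all assumptions made for illustration.

```python
def verdict(user_delta, traffic_delta, rank_effect, tolerance=0.02):
    """Rank the three signals: users and traffic first, rankings last.

    user_delta, traffic_delta: fractional change of the test group versus
    the control group (0.05 means 5% better).
    rank_effect: change in positions versus control; negative = moved up.
    """
    # Hurting users or traffic outweighs any ranking gain.
    if user_delta < -tolerance or traffic_delta < -tolerance:
        return "revert"
    # Both improving: a winner, whatever the rankings did.
    if user_delta > tolerance and traffic_delta > tolerance:
        return "keep"
    # Not much delta in those two: only now do rankings decide.
    if rank_effect < 0:
        return "keep"
    return "no clear benefit"

print(verdict(0.06, 0.04, 0.0))    # users and traffic both up
print(verdict(-0.05, 0.00, -3.0))  # rankings rose but users suffered
print(verdict(0.00, 0.01, -2.0))   # flat metrics, rankings break the tie
```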

Now for some example tests!

Okay. Now you’ve got these one, two, three. What are some interesting tests that you might actually want to run? Well, these are some of the ones that we’ve run here, or I’ve seen other companies run and observed interesting results. So making titles more narrative versus more keyword driven. I’ve seen this test performed. I’ve actually seen this positively and negatively performed. I’ve seen people who did the more narrative sort of click-baity style titles, and that hurt their rankings, hurt their traffic, and I’ve seen people improve with it as well.

Adding or removing links or blocks of links, it might surprise you to learn that I’ve seen people remove blocks of links like this and perform better. They find that user metrics stay the same or even improve a little, because people aren’t dragged off to other sections, or maybe it helps make the content stand out better and the search engines seem to like it too. I think in particular, Google might be looking at some of that Panda-style stuff and saying, “Man, this chunky block of unrelated links or of links that no one is clicking or of links that look keyword stuffed, we don’t like that.” In particular, I’ve seen people remove links from their footer and get better rankings.

Adding or removing social or comment buttons and share counts. So I’ve seen folks have share on Twitter, share on Facebook. Here’s how many likes it has. I’ve had people say, “Comment on this”, “Add a comment or don’t add a comment”, and those have actually moved the needle. The comment one is particularly fascinating. I’ve seen people remove comments and perform better, I think oftentimes because zero comments is sort of a negative psychological indicator for a lot of folks. So people don’t share things that have zero shares or zero comments. But if they see the button to share something socially and no comments, because comments aren’t allowed, sometimes that actually improves things. So interesting.

I’ve seen adding descriptive content, images, and videos help and hurt rankings at times. Sometimes people take a big block of text, they think I need more good unique content to rank this page. They shove it on the page, and the performance stays the same or goes down. I’ve seen people say, “Hey, we need more good, unique content on this page.” They write something really compelling, put it on there, and it helps the rankings results.

This is why these tests exist. This is why we have these kinds of testing principles in the SEO world. With that, hopefully, you’ll run some fantastic tests of your own, learn something amazing, improve the performance of your site and rankings.

And we’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com
