Local SEO Spam Tactics Are Working: How You Can Fight Back

Posted by Casey_Meraz

For years, I’ve been saying that if you have a problem with spammers in local results, you can just wait it out. I mean, if Google cared about removing spam and punishing repeat spammers, we’d see them removed fast and often, right?

While there are instances where spam has been removed, these fixes don’t seem to be fast, permanent, or even very common. In fact, they seem few and far between. So today I’m changing my tune a bit to call more attention to the spam tactics people employ that violate Google My Business terms and yet continue to win in the SERPs.

The problems are rampant and blatant. I’ve heard and seen many instances of legitimate businesses changing their names just to rank better and faster for their keywords.

Another problem is that Google is shutting down MapMaker at the end of March. Edits will still be allowed, but they’ll need to be made through Google Maps.

If Google is serious about rewarding brands in local search, they need to encourage it through their local search algorithms.

For some people, it’s gotten so bad that they’re actually suing Google. On January 13, 2017, for instance, a group of fourteen locksmiths sued Google, Yahoo, and Bing over fake spam listings, as reported by Joy Hawkins.

While some changes — like the Possum update — seemed to have a positive impact overall, root problems (such as multiple business listings) and many other issues still exist in the local search ecosystem.

And there are other technically non-spammy ways that users are also manipulating Google results. Let’s look at a couple of these examples.

It’s not all spam: businesses are going to great lengths to stay within the GMB guidelines while still manipulating results.

Let’s look at an example of a personal injury attorney in the Denver market. Recently, I came across these results when doing a search for trial attorneys:

Look at the #2 result listing, entitled “Denver Trial Lawyers.” I originally thought this was spam and wanted to report it, but I had to do my due diligence first.

To start, I needed to verify that the listing was actually spam by looking at the official business name. I pulled up their website and, to my surprise, the business name in the logo is actually “Denver Trial Lawyers.”

This intrigued me, so I decided to see if they were using a deceptive logo to advertise the business name or if this was the actual business name.

I checked out the Colorado Secretary of State’s website and did a little digging around. After a few minutes I found the legally registered trade name through their online search portal. The formation date of this entity was 7/31/2008, so they appear to have been planning on using the name for some time.

I also reviewed their MapMaker listing history to see when this change was made and whether it reflected the trade name registration. I saw that on October 10, 2016 the business updated their MapMaker listing to reflect the new business name.

After all of this, I decided to take this one step further and called the business. When I did, the auto-attendant answered with “Thank you for calling Denver Trial Lawyers,” indicating that this is their legitimate business name.

I guess that, according to the Google My Business Guidelines, this can be considered OK. They state:

“Your name should reflect your business’ real-world name, as used consistently on your storefront, website, stationery, and as known to customers. Accurately representing your business name helps customers find your business online.”

But what does that mean for everyone else?

Recently, Gyi Tsakalakis also shared this beautiful screenshot on Twitter of a SERP with three businesses using their keywords in the business name:

It seems they’re becoming more and more prominent because people see they’re working.

To play devil’s advocate, there are also businesses that legitimately sport less-than-creative names, so where do you draw the line? (Note: I’ve been following some of the above businesses for years; I can confirm they’ve changed their business names to include keywords.)

Here’s another example

If you look closely, you’ll find more keyword- and location-stuffed business names popping up every day.

Here’s an interesting case of a business (also located in Denver) that might have been trying to take advantage of Near Me searches, as pointed out by Matt Lacuesta:

Do you think this business wanted to rank for Near Me searches in Denver? Maybe it’s just a coincidence. It’s funny, nonetheless.

How are people actively manipulating local results?

While there are many ways to manipulate a Google My Business result, today we’re going to focus on several tactics and identify the steps you can take to help fight back.

Tactic #1: Spammy business names

Probably the biggest problem in Google’s algorithm is the amount of weight they put into a business name. At a high level, it makes sense that they would treat this with a lot of authority. After all, if I’m looking for a brand name, I want to find that specific brand when I’m doing a search.

The problem is that people quickly figured out that Google gives a massive priority to businesses with keywords or locations in their business names.

In the example below, I did a search for “Fresno Personal Injury Lawyers” and was given an exact match result, as you can see in the #2 position:

However, when I clicked through to the website, I found it was for a firm with a different name. In this case, they blatantly spammed their listing and have been floating by with nice rankings for quite some time.

I reported their listing a couple of times and nothing was done until I was able to escalate this. It’s important to note that the account I used to edit this listing didn’t have a lot of authority. Once an authoritative account approved my edit, it went live.

The spam listing in question had both the keyword and the location in the business name.

We reported this listing using the process outlined below, but sadly the business owner noticed and changed it back within hours.

How can you fight back against spammy business names?

Figuring out how to fight back against people manipulating results is now your job as an SEO. In the past, some in the industry gave the acronym “SEO” a bad name due to the manipulative practices they performed. Now it’s our job to earn a better name by helping to police these issues.

Since Google MapMaker is now disappearing, you’ll need to make edits in Google Maps directly. This is also a bit of a problem, as there’s no room to leave comments for evidence.

Here are the steps you should take to report a listing with incorrect information:

  1. Make sure you’re signed into Google
  2. Locate the business on maps.google.com
  3. Once the business is located, open it up and look for the “Suggest an edit” option
  4. Once you select it, you’ll be able to choose the field you want to change
  5. Make the necessary change and then hit submit! (Don’t worry — I didn’t actually submit the change while researching this.)

Now, don’t expect anything to happen right away. It can take time for changes to take place. Also, the trust level of your profile seems to play a big role in how Google evaluates these changes. Getting approval from someone with a high level of trust can make your edits go live quickly.

Make sure you check out all of these great tips from Joy Hawkins on The Ultimate Guide to Fighting Spam on Google Maps, as well.

Tactic #2: Fake business listings

Another issue that we see commonly with maps spam is fake business listings. These listings are completely false businesses that black-hat SEOs build just to rank and get more leads.

Typically we see a lot of these in the locksmith niche — it’s full of people creating fake listings. This is one of the reasons Google started doing advanced verification for locksmiths and plumbers. You can read more about that on Mike Blumenthal’s blog.

Joy Hawkins pointed out a handy tip for identifying these listings on her blog, saying:

“Many spammers who create tons of fake listings answer their phone with something generic like ‘Hello, locksmith’ or ‘Hello, service.'”

I did a quick search in Denver for a plumber and it wasn’t long before I found a listing with an exact match name. Using Joy’s tips, I called the number and it was disconnected. This seemed like an illegitimate listing to me.

Thankfully, in this case, the business wasn’t ranking highly in the search results:

When you run into these types of listings, you’ll want to take a similar approach as we did above and report the issue.

Tactic #3: Review spam

Review spam can come in many different forms. It’s clear that Google’s putting a lot of attention into reviews by adding sorting features and making stars more prominent. I think Google knows they can do a better job with their reviews overall, and I hope we see them take it a little bit more seriously.

Let’s look at a few different ways that review spam appears in search results.

Self-reviews & competitor shaming

Pretty much every business knows they need reviews, but they have trouble getting them. One way people get them is to leave them on their own business.

Recently, we saw a pretty blatant example where someone left a positive five-star review for a law firm and then five other one-star reviews for all of their competitors. You can see this below:

Although it’s clearly unethical to leave these types of reviews, it happens every day. According to Google’s review and photo policies, they want you to:

“Make sure that the reviews and photos on your business listing, or those that you leave at a business you’ve visited, are honest representations of the customer experience. Those that aren’t may be removed.”

While I’d say that this does violate the policies, figuring out which rule applies best is a little tricky. It appears to be a conflict of interest, as defined by Google’s review guidelines below:

"Conflict of interest: Reviews are most valuable when they are honest and unbiased. If you own or work at a place, please don’t review your own business or employer. Don’t offer or accept money, products, or services to write reviews for a business or to write negative reviews about a competitor. If you're a business owner, don't set up review stations or kiosks at your place of business just to ask for reviews written at your place of business."

In this particular case, a member of our staff, Dillon Brickhouse, reached out to Google to see what they would say.

Unfortunately, Google told Dillon that since there was no text in the review, nothing could be done. They refused to edit the review.

And, of course, this is not an isolated case. Tim Capper recently wrote an article — “Are Google My Business Guidelines & Spam Algos Working?” — in which he identified similar situations and nothing had been done.

How can you fight against review spam?

Although there will still be cases where spammy reviews are ignored until Google steps up their game, there is something you can try in order to get bad reviews removed. In fact, Google published the exact steps on their review guidelines page here.

You can view the steps and flag a review for removal using the method below:

  1. Navigate to Google Maps.
  2. Search for your business using its name or address.
  3. Select your business from the search results.
  4. In the panel on the left, scroll to the “Review summary” section.
  5. Under the average rating, click [number of] reviews.
  6. Scroll to the review you’d like to flag and click the flag icon.
  7. Complete the form in the window that appears and click Submit.

What can you do if the basics don’t work?

There are a ton of different ways to spam local listings. What can you do if you’ve reported the issue and nothing changes?

While edits may take up to six weeks to go live, the next step involves you getting more public about the issue. The key to the success of this approach is documentation. Take screenshots, record dates, and keep a file for each issue you’re fighting. That way you can address it head-on when you finally get the appropriate exposure.
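
If you’re tracking more than a handful of listings, even a tiny script can keep that evidence file consistent. Here’s a minimal Python sketch (the file name and fields are just my own suggestions, not part of any official process) that appends one dated row per report to a CSV log:

```python
import csv
from datetime import date
from pathlib import Path

LOG_FILE = Path("local-spam-reports.csv")  # hypothetical file name
FIELDS = ["date_reported", "business_name", "maps_url", "issue", "screenshot_file", "status"]

def log_report(business_name, maps_url, issue, screenshot_file, status="reported"):
    """Append one dated record per spam report so there's a paper trail for escalation."""
    is_new = not LOG_FILE.exists()
    with LOG_FILE.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date_reported": date.today().isoformat(),
            "business_name": business_name,
            "maps_url": maps_url,
            "issue": issue,
            "screenshot_file": screenshot_file,
            "status": status,
        })

# Example usage (all values are made up)
log_report("Denver Example Lawyers", "https://maps.google.com/?cid=12345",
           "keyword-stuffed business name", "screenshots/denver-example-2017-03-01.png")
```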

Depending on whether or not the listing is verified, you’ll want to try posting in different forums:

Verified listings

If the listing you’re having trouble with is a verified listing, you’ll want to make a public post about it in the Google My Business Community forum. When posting, make sure to provide all corresponding evidence, screenshots, etc. to make the case very clear to the moderators. There’s a Spam and Policy section on the forum where you can do this.

Unverified listings

However, some spam listings are not verified listings. In these cases, Joy Hawkins recommends that you engage with the Local Guides Connect Forum here.

Key takeaways

Sadly, there’s not a lot we can do outside of the basics of reporting results, but hopefully being more proactive about it and making some noise will encourage Google to take steps in the right direction.

  1. Start being more proactive about reporting listings and reviews that ignore the guidelines. Be sure to take screenshots and keep records as evidence.
  2. If the listings still aren’t being fixed after some time, escalate them to the Google My Business Community forum.
  3. Read Joy Hawkins’ post on The Ultimate Guide to Fighting Spam on Google Maps from start to finish.
  4. Don’t spam local results. Seriously. It’s annoying. Continually follow and stay up-to-date on the Google My Business guidelines.
  5. Lastly, don’t think the edit you made is the final say or that it’ll stay around forever. The reality is that they could come back. During testing for this post, the listing for “Doug Allen Personal Injury Attorney Colorado Springs” came back within hours based on an owner edit.

In the future, I’m personally looking forward to seeing some major changes from Google with regards to how they rank local results and how they monitor reviews. I would love to see local penalties become as serious as manual penalties.

How do you think Google can fight this better? What are your suggestions? Let me know in the comments below.



Structuring URLs for Easy Data Gathering and Maximum Efficiency

Posted by Dom-Woodman

Imagine you work for an e-commerce company.

Wouldn’t it be useful to know the total organic sessions and conversions to all of your products? Every week?

If you have access to some analytics for an e-commerce company, try and generate that report now. Give it 5 minutes.

Done?

Or did that quick question turn out to be deceptively complicated? Did you fall into a rabbit hole of scraping and estimations?

Not being able to easily answer that question — and others like it — is costing you thousands every year.

Let’s jump back a step

Every online business, whether it’s a property portal or an e-commerce store, will likely have spent hours and hours agonizing over decisions about how their website should look, feel, and be constructed.

The biggest decision is usually this: What will we build our website with? And from there, there are hundreds of decisions, all the way down to what categories should we have on our blog?

Each of these decisions will generate future costs and opportunities, shaping how the business operates.

Somewhere in this process, a URL structure will be decided on. Hopefully it will be logical, but the context in which it’s created is different from how it ends up being used.

As a business grows, the desire for more information and better analytics grows. We hire data analysts and pay agencies thousands of dollars to go out, gather this data, and wrangle it into a useful format so that smart business decisions can be made.

It’s too late. You’ve already wasted £1000s a year.

It’s already too late; by this point, you’ve created hours and hours of extra work for the people who have to analyze your data, and thousands will be wasted.

All because no one structured the URLs with data gathering in mind.

How about an example?

Let’s go back to the problem we talked about at the start, but go through the whole story. An e-commerce company goes to an agency and asks them to get total organic sessions to all of their product pages. They want to measure performance over time.

Now this company was very diligent when they made their site. They’d read Moz and hired an SEO agency when they designed their website, so they’d followed this piece of advice: products need to sit at the root. (E.g. mysite.com/white-t-shirt.)

Apparently a lot of websites have taken this piece of advice, because with minimal searching you can find plenty of sites whose ranking product pages do sit at the root: Appleyard Flowers, Game, Tesco Direct.

At one level it makes sense: a product might be in multiple categories (LCD & 42” TVs, for example), so you want to avoid duplicate content. Plus, if you changed the categories, you wouldn’t want to have to redirect all the products.

But from a data gathering point of view, this is awful. Why? There is now no way in Google Analytics to select all the products unless we had the foresight to set up something earlier, like a custom dimension or content grouping. There is nothing that separates the product URLs from any other URL we might have at the root.

How could our hypothetical data analyst get the data at this point?

They might have to crawl all the pages on the site so they can pick them out with an HTML footprint (a particular piece of HTML on a page that identifies the template), or get an internal list from whoever owns the data in the organization. Once they’ve got all the product URLs, they’ll then have to match this data to the Google Analytics in Excel, probably with a VLOOKUP or, if the data set is too large, a database.
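
If you do go down that road, the footprint check itself can at least be scripted. Below is a rough Python sketch that assumes, purely hypothetically, that the product template exposes og:type="product"; in practice you’d swap in whatever bit of HTML uniquely identifies your own template:

```python
import requests
from bs4 import BeautifulSoup

def is_product_page(url):
    """Check a hypothetical HTML footprint that identifies the product template."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    og_type = soup.find("meta", property="og:type")
    return bool(og_type and og_type.get("content") == "product")

crawled_urls = ["https://example.com/white-t-shirt", "https://example.com/about-us"]
product_urls = [u for u in crawled_urls if is_product_page(u)]
print(product_urls)
```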

Shoot. This is starting to sound quite expensive.

And of course, if you want to do this analysis regularly, that list will constantly change as the range of products being sold changes, so it will need to be a scheduled crawl or an automated report. If we go the crawling route, we could do this, but running it on a schedule isn’t possible with Screaming Frog, so we’re either spending regular time on Screaming Frog or paying for a cloud crawler that we can schedule. If we go the other route, we’ll need a dev to build us an internal automated report, once we can get the resource internally.

Wow, now this is really expensive: a couple days’ worth of dev time, or a recurring job for your SEO consultant or data analyst each week.

This could’ve been a couple of clicks on a default report.

If we have the foresight to put all the products in a folder called /products/, this entire lengthy process becomes one step:

Load the landing pages report in Google Analytics and filter for URLs beginning with /products/.
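
For what it’s worth, the same one-step filter is easy to reproduce outside the GA interface as well. Here’s a minimal pandas sketch against a hypothetical landing-pages export (the file and column names are assumptions):

```python
import pandas as pd

# Assumes a GA landing-pages export with "Landing Page" and "Sessions" columns;
# adjust the names to match your own export.
df = pd.read_csv("landing_pages_export.csv")
products = df[df["Landing Page"].str.startswith("/products/")]
print(products["Sessions"].sum())
```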

Congratulations — you’ve just cut a couple days off your agency fee, saved valuable dev time, or gained the ability to fire your second data analyst because your first is now so damn efficient (sorry, second analysts).

As a data analyst or SEO consultant, you continually bump into these kinds of issues, which suck up time and turn quick tasks into endless chores.

What is unique about a URL?

For most analytics services, it’s the main piece of information you can use to identify the page. Google Analytics, Google Search Console, and log files only have access to the URL most of the time, and in some cases that’s all you’ll get — you can never change this.

The vast majority of site analyses require working with templates and generalizing across groups of similar pages. You need to work with templates, and you need to be able to do this by URL.

It’s crucial.

There’s a Jeff Bezos saying that’s appropriate here:

“There are two types of decisions. Type 1 decisions are not reversible, and you have to be very careful making them. Type 2 decisions are like walking through a door — if you don’t like the decision, you can always go back.”

Setting URLs is very much a Type 1 decision. As anyone in SEO knows, you really don’t want to be constantly changing URLs; it causes a lot of problems, so when they’re being set up we need to take our time.

How should you set up your URLs?

How do you pick good URL patterns?

First, let’s define a good pattern. A good pattern is something which we can use to easily select a template of URLs, ideally using contains rather than any complicated regex.

This usually means we’re talking about adding folders, because they’re easiest to find with just a contains filter, e.g. /products/, /blog/, etc.

We also want to keep things human-readable when possible, so we need to bear that in mind when choosing our folders.

So where should we add folders to our URLs?

I always ask the following two questions:

  • Will I need to group the pages in this template together?
    • If a set of pages needs grouping I need to put them in the same folder, so we can identify this by URL.
  • Are there crucial sub-groupings for this set of pages? If there are, are they mutually exclusive and how often might they change?
    • If there are common groupings I may want to make, then I should consider putting this in the URL, unless those data groupings are liable to change.

Let’s look at a couple examples.

Firstly, back to our product example: let’s suppose we’re setting up product URLs for a fashion e-commerce store.

Will I need to group the products together? Yes, almost certainly. There clearly needs to be a way of grouping in the URL. We should put them in a /products/ folder.

Within this template, how might I need to group these URLs together? The most plausible grouping for products is the product category. Let’s take a black midi dress.

What about putting “little black dress” or “midi” as a category? Well, are they mutually exclusive? Our dress could fit in the “little black dress” category and the “midi dress” category, so that’s probably not something we should add as a folder in the URL.

What about moving up a level and using “dress” as a category? Now that is far more suitable, if we could reasonably split all our products into:

  • Dresses
  • Tops
  • Skirts
  • Trousers
  • Jeans

And if we were happy with having jeans and trousers separate then this might indeed be an excellent fit that would allow us to easily measure the performance of each top-level category. These also seem relatively unlikely to change and, as long as we’re happy having this type of hierarchy at the top (as opposed to, say, “season,” for example), it makes a lot of sense.

What are some common URL patterns people should use?

Product pages

We’ve banged on about this enough and gone through the example above. Stick your products in a /products/ folder.

Articles

Applying the same rules we talked about to articles, two things jump out. The first is top-level categorization.

For example, adding in the following folders would allow you to easily measure the top-level performance of articles:

  • Travel
  • Sports
  • News

You should, of course, be keeping them all in a /blog/ or /guides/ etc. folder too, because you won’t want to group just by category.

Here’s an example of all 3:

  • A bad blog article URL: example.com/this-is-an-article-name/
  • A better blog article URL: example.com/blog/this-is-an-article-name/
  • An even better blog article URL: example.com/blog/sports/this-is-an-article-name

The second, which obeys all our rules, is author grouping, which may be well-suited to editorial sites with a large number of authors they want performance stats on.

Location grouping

Many types of websites often have category pages per location. For example:

  • Cars for sale in Manchester – /for-sale/vehicles/manchester
  • Cars for sale in Birmingham – /for-sale/vehicles/birmingham

However, there are many different levels of location granularity. For example, here are 4 different URLs, each a more specific location within the one above it (sorry to all our non-UK readers — just run with me here):

  • Cars for sale in Suffolk – /for-sale/vehicles/suffolk
  • Cars for sale in Ipswich – /for-sale/vehicles/ipswich
  • Cars for sale in Ipswich center – /for-sale/vehicles/ipswich-center
  • Cars for sale on Lancaster road – /for-sale/vehicles/lancaster-road

Obviously every site will have different levels of location granularity, but a grouping that’s often missing is the level of granularity itself in the URL. For example:

  • Cars for sale in Suffolk – /for-sale/vehicles/county/suffolk
  • Cars for sale in Ipswich – /for-sale/vehicles/town/ipswich
  • Cars for sale in Ipswich center – /for-sale/vehicles/area/ipswich-center
  • Cars for sale on Lancaster road – /for-sale/vehicles/street/lancaster-road

This could even just be numbers (although this is less ideal because it breaks our second rule):

  • Cars for sale in Suffolk – /for-sale/vehicles/04/suffolk
  • Cars for sale in Ipswich – /for-sale/vehicles/03/ipswich
  • Cars for sale in Ipswich center – /for-sale/vehicles/02/ipswich-center
  • Cars for sale on Lancaster road – /for-sale/vehicles/01/lancaster-road

This makes it very easy to assess and measure the performance of each layer so you can understand if it’s necessary, or if perhaps you’ve aggregated too much.
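
As a quick illustration, here’s a short pandas sketch that rolls a landing-pages export up by granularity level once that level lives in the URL. The file name, column names, and exact path layout are assumptions based on the examples above:

```python
import pandas as pd

# Assumes URLs follow /for-sale/vehicles/<granularity>/<location> and a GA export
# with "Landing Page" and "Sessions" columns.
df = pd.read_csv("landing_pages_export.csv")
parts = df["Landing Page"].str.strip("/").str.split("/")
df["granularity"] = parts.str[2]   # county / town / area / street
df["location"] = parts.str[3]

print(df.groupby("granularity")["Sessions"].sum().sort_values(ascending=False))
```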

What other good (or bad) examples of this has the community come across? Let’s hear it!



Your Daily SEO Fix: Keywords, Concepts, Page Optimization, and Happy NAPs

Posted by FeliciaCrawford

Howdy, readers! We’re back with our last round of videos for this go of the Daily SEO Fix series. To recap, here are the other topics we’ve covered previously:

Today we’ll be delving into more keyword and concept research, quick wins for on-page optimization, and a neat way to stay abreast of duplicates and inaccuracies in your local listings. We use Moz Pro, the MozBar, and Moz Local in this week’s fixes.


Fix #1: Grouping and analyzing keywords by label to judge how well you’re targeting a concept

The idea of “concepts over keywords” has been around for a little while now, but tracking rankings for a concept isn’t quite as straightforward as it is for keywords. In this fix, Kristina shows you how to label groups of keywords to track and sort their rankings in Moz Pro so you can easily see how you’re ranking for grouped terms, chopping and analyzing the data as you see fit.


Fix #2: Adding alternate NAP details to uncover and clean up duplicate or inaccurate listings

If you work in local SEO, you know how important it is for listings to have an accurate NAP (name, address, phone number). When those details change for a business, it can wreak absolute havoc and confuse potential searchers. Jordan walks you through adding alternate NAP details in Moz Local to make sure you uncover and clean up old and/or duplicate listings, making closure requests a breeze. (This Whiteboard Friday is an excellent explanation of why that’s really important; I like it so much that I link to it in the resources below, too. 😉)

Remember, you can always use the free Check Listing tool to see how your local listings and NAP are popping up on search engines:

Is my NAP accurate?


Fix #3: Research keywords and concepts to fuel content suggestions — on the fly

You’re already spying on your competitors’ sites; you might as well do some keyword research at the same time, right? Chiaryn walks you through how to use MozBar to get keyword and content suggestions and discover how highly ranking competitor sites are using those terms. (Plus a cameo from Lettie Pickles, star of our 2015 Happy Holidays post!)


Fix #4: Discover whether your pages are well-optimized as you browse — then fix them with these suggestions

A fine accompaniment to your on-the-go keyword research is on-the-go on-page optimization. (Try saying that five times fast.) Janisha gives you the low-down on how to check whether a page is well-optimized for a keyword and identify which fixes you should make (and how to prioritize them) using the SEO tool bar.


Further reading & fond farewells

I’ve got a whole passel of links if you’re interested in reading more educational content around these topics. And by “reading,” I mean “watching,” because I really stacked the deck with Whiteboard Fridays this time. Here you are:

And of course, if you need a better handle on all this SEO stuff and reading blog posts just doesn’t cut the mustard, we now offer classes that cover all the essentials.

My sincere thanks to all of you tuning in to check out our Daily SEO Fix video series over the past couple of weeks — it’s been fun writing to you and hearing from you in the comments! Be sure to keep those ideas and questions comin’ — we’re listening.



How to Do a Content Audit [Updated for 2017]

Posted by Everett

This guide provides instructions on how to do a content audit using examples and screenshots from Screaming Frog, URL Profiler, Google Analytics (GA), and Excel, as those seem to be the most widely used and versatile tools for performing content audits.

{Expand for more background}

It’s been almost three years since the original “How to do a Content Audit – Step-by-Step” tutorial was published here on Moz, and it’s due for a refresh. This version includes updates covering JavaScript rendering, crawling dynamic mobile sites, and more.

It also provides less detail than the first in terms of prescribing every step in the process. This is because our internal processes change often, as do the tools. I’ve also seen many other processes out there that I would consider good approaches. Rather than forcing a specific process and publishing something that may be obsolete in six months, this tutorial aims to allow for a variety of processes and tools by focusing more on the basic concepts and less on the specifics of each step.

We have a DeepCrawl account at Inflow, and a specific process for that tool, as well as several others. Tapping directly into various APIs may be preferable to using a middleware product like URL Profiler if one has development resources. There are also custom in-house tools out there, some of which incorporate historic log file data and can efficiently crawl websites like the New York Times and eBay. Whether you use GA or Adobe Sitecatalyst, Excel, or a SQL database, the underlying process of conducting a content audit shouldn’t change much.


What is a content audit?

A content audit for the purpose of SEO includes a full inventory of all indexable content on a domain, which is then analyzed using performance metrics from a variety of sources to determine which content to keep as-is, which to improve, and which to remove or consolidate.

What is the purpose of a content audit?

A content audit can have many purposes and desired outcomes. In terms of SEO, they are often used to determine the following:

  • How to escape a content-related search engine ranking filter or penalty
  • Content that requires copywriting/editing for improved quality
  • Content that needs to be updated and made more current
  • Content that should be consolidated due to overlapping topics
  • Content that should be removed from the site
  • The best way to prioritize the editing or removal of content
  • Content gap opportunities
  • Which content is ranking for which keywords
  • Which content should be ranking for which keywords
  • The strongest pages on a domain and how to leverage them
  • Undiscovered content marketing opportunities
  • Due diligence when buying/selling websites or onboarding new clients

While each of these desired outcomes and insights are valuable results of a content audit, I would define the overall “purpose” of one as:

The purpose of a content audit for SEO is to improve the perceived trust and quality of a domain, while optimizing crawl budget and the flow of PageRank (PR) and other ranking signals throughout the site.

Often, but not always, a big part of achieving these goals involves the removal of low-quality content from search engine indexes. I’ve been told people hate this word, but I prefer the “pruning” analogy to describe the concept.

How & why “pruning” works

{Expand for more on pruning}

Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove. Optimizing crawl budget and the flow of PR is self-explanatory to most SEOs. But how does a content audit improve the perceived trust and quality of a domain? By removing low-quality content from the index (pruning) and improving some of the content remaining in the index, the likelihood that someone arrives on your site through organic search and has a poor user experience (indicated to Google in a variety of ways) is lowered. Thus, the quality of the domain improves. I’ve explained the concept here and here.

Others have since shared some likely theories of their own, including a larger focus on the redistribution of PR.

Case study after case study has shown the concept of “pruning” (removing low-quality content from search engine indexes) to be effective, especially on very large websites with hundreds of thousands (or even millions) of indexable URLs. So why do content audits work? Lots of reasons. But really…

Does it matter?

¯\_(ツ)_/¯


How to do a content audit

Just like anything in SEO, from technical and on-page changes to site migrations, things can go horribly wrong when content audits aren’t conducted properly. The most common example would be removing URLs that have external links because link metrics weren’t analyzed as part of the audit. Another common mistake is confusing removal from search engine indexes with removal from the website.

Content audits start with taking an inventory of all content available for indexation by search engines. This content is then analyzed against a variety of metrics and given one of three “Action” determinations. The “Details” of each Action are then expanded upon.

The combinations of the “Action” of WHAT to do and the “Details” of HOW (and sometimes why) to do it are as varied as the strategies, sites, and tactics themselves. Below are a few hypothetical examples:

You now have a basic overview of how to perform a content audit. More specific instructions can be found below.

The process can be roughly split into three distinct phases:

  1. Inventory & audit
  2. Analysis & recommendations
  3. Summary & reporting

The inventory & audit phase

Taking an inventory of all content, and related metrics, begins with crawling the site.

One difference between crawling for content audits and technical audits:

Technical SEO audit crawls are concerned with all crawlable content (among other things).

Content audit crawls for the purpose of SEO are concerned with all indexable content.

{Expand for more on crawlable vs. indexable content}

The URL in the example below should be considered non-indexable. Even if it isn’t blocked in the robots.txt file, with a robots meta tag, or an X-Robots-Tag header response — even if it is frequently crawled by Google and shows up as a URL in Google Analytics and Search Console — the rel="canonical" tag shown below essentially acts like a 301 redirect, telling Google not to display the non-canonical URL in search results and to apply all ranking calculations to the canonical version. In other words, not to “index” it.

I’m not sure “index” is the best word, though. To “display” or “return” in the SERPs is a better way of describing it, as Google surely records canonicalized URL variants somewhere, and advanced site: queries seem to show them in a way that is consistent with the “supplemental index” of yesteryear. But that’s another post, more suitably written by a brighter mind like Bill Slawski.

A URL with a query string that canonicalizes to a version without the query string can be considered “not indexable.”

A content audit can safely ignore these types of situations, which could mean drastically reducing the amount of time and memory taken up by a crawl.

Technical SEO audits, on the other hand, should be concerned with every URL a crawler can find. Non-indexable URLs can reveal a lot of technical issues, from spider traps (e.g. never-ending empty pagination, infinite loops via redirect or canonical tag) to crawl budget optimization (e.g. How many facets/filters deep to allow crawling? 5? 6? 7?) and more.

It is for this reason that trying to combine a technical SEO audit with a content audit often turns into a giant mess, though an efficient idea in theory. When dealing with a lot of data, I find it easier to focus on one or the other: all crawlable URLs, or all indexable URLs.

Orphaned pages (i.e., with no internal links / navigation path) sometimes don’t turn up in technical SEO audits if the crawler had no way to find them. Content audits should discover any indexable content, whether it is linked to internally or not. Side note: A good tech audit would do this, too.

Identifying URLs that should be indexed but are not is something that typically happens during technical SEO audits.

However, if you’re having trouble getting deep pages indexed when they should be, content audits may help determine how to optimize crawl budget and herd bots more efficiently into those important, deep pages. Also, many times Google chooses not to display/index a URL in the SERPs due to poor content quality (i.e., thin or duplicate).

All of this is changing rapidly, though. URLs as the unique identifier in Google’s index are probably going away. Yes, we’ll still have URLs, but not everything requires them. So far, the words “content” and “URL” have been mostly interchangeable. But some URLs contain an entire application’s worth of content. How to do a content audit in that world is something we’ll have to figure out soon, but only after Google figures out how to organize the web’s information in that same world. From the looks of things, we still have a year or two.

Until then, the process below should handle most situations.

Step 1: Crawl all indexable URLs

A good place to start on most websites is a full Screaming Frog crawl. However, some indexable content might be missed this way. It is not recommended that you rely on a crawler as the source for all indexable URLs.

In addition to the crawler, collect URLs from Google Analytics, Google Webmaster Tools, XML Sitemaps, and, if possible, from an internal database, such as an export of all product and category URLs on an eCommerce website. These can then be crawled in “list mode” separately, then added to your main list of URLs and deduplicated to produce a more comprehensive list of indexable URLs.

Some URLs found via GA, XML sitemaps, and other non-crawl sources may not actually be “indexable.” These should be excluded. One strategy that works here is to combine and deduplicate all of the URL “lists,” and then perform a crawl in list mode. Once crawled, remove all URLs with robots meta or X-Robots noindex tags, as well as any URL returning error codes and those that are blocked by the robots.txt file, etc. At this point, you can safely add these URLs to the file containing indexable URLs from the crawl. Once again, deduplicate the list.
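
If you’d rather script the combine-and-deduplicate step than juggle it in Excel, something like the short pandas sketch below works. The file and column names are placeholders, and remember that GA exports paths rather than absolute URLs:

```python
import pandas as pd

# Hypothetical exports: a Screaming Frog "Internal All" CSV plus URL lists pulled
# from GA, Search Console, and the XML sitemap. Column names are assumptions, and
# GA paths may need the domain prepended before they're comparable to crawl URLs.
sources = {
    "crawl": ("internal_all.csv", "Address"),
    "ga": ("ga_landing_pages.csv", "URL"),
    "gsc": ("gsc_pages.csv", "URL"),
    "sitemap": ("sitemap_urls.csv", "URL"),
}

frames = []
for name, (path, col) in sources.items():
    df = pd.read_csv(path, usecols=[col]).rename(columns={col: "url"})
    df["source"] = name
    frames.append(df)

all_urls = pd.concat(frames, ignore_index=True)
all_urls["url"] = all_urls["url"].str.strip().str.rstrip("/")
deduped = all_urls.drop_duplicates(subset="url")
deduped[["url"]].to_csv("urls_for_list_mode_crawl.csv", index=False)
print(f"{len(all_urls)} URLs collected, {len(deduped)} after deduplication")
```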

Crawling roadblocks & new technologies

Crawling very large websites

First and foremost, you do not need to crawl every URL on the site. Be concerned with indexable content. This is not a technical SEO audit.

{Expand for more about crawling very large websites}

Avoid crawling unnecessary URLs

Some of the things you can avoid crawling and adding to the content audit in many cases include:

  • Noindexed or robots.txt-blocked URLs
  • 4XX and 5XX errors
  • Redirecting URLs and those that canonicalize to a different URL
  • Images, CSS, JavaScript, and SWF files

Segment the site into crawlable chunks

You can often get Screaming Frog to completely crawl a single directory at a time if the site is too large to crawl all at once.

Filter out URL patterns you plan to remove from the index

Let’s say you’re auditing a domain on WordPress and you notice early in the crawl that /tag/ pages are indexable. A quick site:domain.com inurl:tag search on Google tells you there are about 10 million of them. A quick look at Google Analytics confirms that URLs in the /tag/ directory are not responsible for very much revenue from organic search. It would be safe to say that the “Action” on these URLs should be “Remove” and the “Details” should read something like this: Remove /tag/ URLs from the index with a robots noindex,follow meta tag. More advice on this strategy can be found here.
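
If you’re pre-filtering a combined URL list before crawling it in list mode (or before fetching metrics), excluding those patterns takes only a few lines. A sketch, with a purely hypothetical pattern list:

```python
import re
import pandas as pd

EXCLUDE_PATTERNS = ["/tag/", "?replytocom="]  # hypothetical; adjust per site
pattern = "|".join(re.escape(p) for p in EXCLUDE_PATTERNS)

urls = pd.read_csv("urls_for_list_mode_crawl.csv")
keep = urls[~urls["url"].str.contains(pattern)]
keep.to_csv("urls_to_audit.csv", index=False)
print(f"Dropped {len(urls) - len(keep)} URLs matching excluded patterns")
```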

Upgrade your machine

Install additional RAM on your computer, which is used by Screaming Frog to hold data during the crawl. This has the added benefit of improving Excel performance, which can also be a major roadblock.

You can also install Screaming Frog on Amazon Web Services (AWS), as described in this post on iPullRank.

Tune up your tools

Screaming Frog provides several ways for SEOs to get more out of the crawler. This includes adjusting the speed, max threads, search depth, query strings, timeouts, retries, and the amount of RAM available to the program. Leave at least 3GB off limits to the spider to avoid catastrophic freezing of the entire machine and loss of data. You can learn more about tuning up Screaming Frog here and here.

Try other tools

I’m convinced that there’s a ton of wasted bandwidth on most content audit projects due to strategists releasing a crawler and allowing it to chew through an entire domain, whether the URLs are indexable or not. People run Screaming Frog without saving the crawl intermittently, without adding more RAM availability, without filtering out the nonsense, or using any of the crawl customization features available to them.

That said, sometimes SF just doesn’t get the job done. We also have a process specific to DeepCrawl, and have used Botify, as well as other tools. They each have their pros and cons. I still prefer Screaming Frog for crawling and URL Profiler for fetching metrics in most cases.


Crawling dynamic mobile sites

This refers to a specific type of mobile setup in which there are two codebases — one for mobile and one for desktop — but only one URL. Thus, the content of a single URL may vary significantly depending on which type of device is visiting that URL. In such cases, you will essentially be performing two separate content audits. Proceed as usual for the desktop version. Below are instructions for crawling the mobile version.

{Expand for more on crawling dynamic websites}

Crawling a dynamic mobile site for a content audit will require changing the User-Agent of the crawler, as shown here under Screaming Frog’s “Configure —> HTTP Header” menu:

The important thing to remember when working on mobile dynamic websites is that you’re only taking an inventory of indexable URLs on one version of the site or the other. Once the two inventories are taken, you can then compare them to uncover any unintentional issues.

Some examples of what this process can find in a technical SEO audit include situations in which titles, descriptions, canonical tags, robots meta, rel next/prev, and other important elements do not match between the two versions of the page. It’s vital that the mobile and desktop version of each page have parity when it comes to these essentials.

It’s easy for the mobile version of a historically desktop-first website to end up providing conflicting instructions to search engines because it’s not often “automatically changed” when the desktop version changes. A good example here is a website I recently looked at with about 20 million URLs, all of which had the following title tag when loaded by a mobile user (including Google): BRAND NAME – MOBILE SITE. Imagine the consequences of that once a mobile-first algorithm truly rolls out.
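
One way to spot-check this kind of mismatch at scale, at least for dynamically served pages (client-rendered differences won’t show up this way), is to request the same URL with a desktop and a mobile User-Agent and compare the key elements. A rough Python sketch with illustrative User-Agent strings:

```python
import requests
from bs4 import BeautifulSoup

DESKTOP_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"               # illustrative
MOBILE_UA = "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0 like Mac OS X)"   # illustrative

def key_elements(url, user_agent):
    """Fetch a page with the given User-Agent and pull out title and canonical."""
    html = requests.get(url, headers={"User-Agent": user_agent}, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string.strip() if soup.title and soup.title.string else None
    canonical = soup.find("link", rel="canonical")
    return {"title": title, "canonical": canonical.get("href") if canonical else None}

url = "https://example.com/some-page"
desktop, mobile = key_elements(url, DESKTOP_UA), key_elements(url, MOBILE_UA)
for field in desktop:
    if desktop[field] != mobile[field]:
        print(f"MISMATCH on {field}: desktop={desktop[field]!r} mobile={mobile[field]!r}")
```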


Crawling and rendering JavaScript

One of the many technical issues SEOs have been increasingly dealing with over the last couple of years is the proliferation of websites built on JavaScript frameworks and libraries like React.js, Ember.js, and Angular.js.

{Expand for more on crawling Javascript websites}

Most crawlers have made a lot of progress lately when it comes to crawling and rendering JavaScript content. Now, it’s as easy as changing a few settings, as shown below with Screaming Frog.

When crawling URLs with #!, use the “Old AJAX Crawling Scheme.” Otherwise, select “JavaScript” from the “Rendering” tab when configuring your Screaming Frog SEO Spider to crawl JavaScript websites.

How do you know if you’re dealing with a JavaScript website?

First of all, most websites these days are going to be using some sort of JavaScript technology, though more often than not (so far) these will be rendered by the “client” (i.e., by your browser). An example would be the .js file that controls the behavior of a form or interactive tool.

What we’re discussing here is when JavaScript builds the page itself and needs to be executed before the main content is rendered.

JavaScript libraries and frameworks are used to develop single-page web apps and highly interactive websites. Below are a few different things that should alert you to this challenge:

  1. The URLs contain #! (hashbangs). For example: example.com/page#!key=value (AJAX)
  2. Content-rich pages with only a few lines of code (and no iframes) when viewing the source code.
  3. What looks like server-side code in the meta tags instead of the actual content of the tag. For example:

You can also use the BuiltWith Technology Profiler or the Library Detector plugins for Chrome, which show the JavaScript libraries being used on a page from the address bar.

Not all websites built primarily with JavaScript require special attention to crawl settings. Some websites use pre-rendering services like Brombone or Prerender.io to serve the crawler a fully rendered version of the page. Others use isomorphic JavaScript to accomplish the same thing.
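
If you want a rough programmatic flag before configuring the crawler, something like the sketch below can help. It’s only a heuristic (the thresholds and framework hints are my own), and by design it won’t flag pre-rendered or isomorphic setups, since those already return full HTML:

```python
import requests
from bs4 import BeautifulSoup

# Telltale hooks that client-side frameworks tend to leave in the raw HTML.
FRAMEWORK_HINTS = ["ng-app", "data-reactroot", "data-reactid", "ember-application", "#!"]

def looks_client_rendered(url):
    """Flag pages whose raw HTML is large but contains very little extractable text."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    visible_text = soup.get_text(" ", strip=True)
    hints = [h for h in FRAMEWORK_HINTS if h in html]
    sparse = len(html) > 20000 and len(visible_text) < 500  # arbitrary thresholds
    return sparse or bool(hints), hints

flag, hints = looks_client_rendered("https://example.com/")
print("Possibly client-rendered:", flag, "| hints found:", hints)
```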


Step 2: Gather additional metrics

Most crawlers will give you the URL and various on-page metrics and data, such as the titles, descriptions, meta tags, and word count. In addition to these, you’ll want to know about internal and external links, traffic, content uniqueness, and much more in order to make fully informed recommendations during the analysis portion of the content audit project.

Your process may vary, but we generally try to pull in everything we need using as few sources as possible. URL Profiler is a great resource for this purpose, as it works well with Screaming Frog and integrates easily with all of the APIs we need.

Once the Screaming Frog scan is complete (only crawling indexable content), export the “Internal All” file, which can then be used as the seed list in URL Profiler (combined with any additional indexable URLs found outside of the crawl via GSC, GA, and elsewhere).

This is what my URL Profiler settings look like for a typical content audit for a small- or medium-sized site. Also, under “Accounts” I have connected via API keys to Moz and SEMrush.

Once URL Profiler is finished, you should end up with something like this:

Screaming Frog and URL Profiler: Between these two tools and the APIs they connect with, you may not need anything else at all in order to see the metrics below for every indexable URL on the domain.

The risk of getting analytics data from a third-party tool

We’ve noticed odd data mismatches and sampled data when using the method above on large, high-traffic websites. Our internal process involves exporting these reports directly from Google Analytics, sometimes incorporating Analytics Canvas to get the full, unsampled data from GA. Then VLookups are used in the spreadsheet to combine the data, with URL being the unique identifier.
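
If you’d rather do the VLOOKUP step in code, a pandas merge on the URL (or path) does the same job. The file and column names below are assumptions:

```python
import pandas as pd
from urllib.parse import urlparse

# Hypothetical file and column names. GA reports landing pages as paths while the
# crawl export uses absolute URLs, so reduce both to a path before joining.
crawl = pd.read_csv("combined_data.csv")             # Screaming Frog / URL Profiler export
ga = pd.read_csv("ga_landing_pages_unsampled.csv")   # unsampled GA export

crawl["path"] = crawl["Address"].map(lambda u: urlparse(u).path.rstrip("/") or "/")
ga["path"] = ga["Landing Page"].str.rstrip("/").replace("", "/")

audit = crawl.merge(ga, on="path", how="left")       # left join keeps every crawled URL
audit["Sessions"] = audit["Sessions"].fillna(0)      # assumes a "Sessions" column in the export
audit.to_csv("content_audit_working_file.csv", index=False)
```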

Metrics to pull for each URL:

  • Indexed or not?
    • If crawlers are set up properly, all URLs should be “indexable.”
    • A non-indexed URL is often a sign of an uncrawled or low-quality page.
  • Content uniqueness
    • Copyscape, Siteliner, and now URL Profiler can provide this data.
  • Traffic from organic search
    • Typically 90 days
    • Keep a consistent timeframe across all metrics.
  • Revenue and/or conversions
    • You could view this by “total,” or by segmenting to show only revenue from organic search on a per-page basis.
  • Publish date
    • If you can get this into Google Analytics as a custom dimension prior to fetching the GA data, it will help you discover stale content.
  • Internal links
    • Content audits provide the perfect opportunity to tighten up your internal linking strategy by ensuring the most important pages have the most internal links.
  • External links
  • Landing pages resulting in low time-on-site
    • Take this one with a grain of salt. If visitors found what they want because the content was good, that’s not a bad metric. A better proxy for this would be scroll depth, but that would probably require setting up a scroll-tracking “event.”
  • Landing pages resulting in Low Pages-Per-Visit
    • Just like with Time-On-Site, sometimes visitors find what they’re looking for on a single page. This is often true for high-quality content.
  • Response code
    • Typically, only URLs that return a 200 (OK) response code are indexable. You may not require this metric in the final data if that’s the case on your domain.
  • Canonical tag
    • Typically only URLs with a self-referencing rel=“canonical” tag should be considered “indexable.” You may not require this metric in the final data if that’s the case on your domain.
  • Page speed and mobile-friendliness

Before you begin analyzing the data, be sure to drastically improve your mental health and the performance of your machine by taking the opportunity to get rid of any data you don’t need. Here are a few things you might consider deleting right away (after making a copy of the full data set, of course).


Things you don’t need when analyzing the data

{Expand for more on removing unnecessary data}

URL Profiler and Screaming Frog tabs
Just keep the “combined data” tab and immediately cut the amount of data in the spreadsheet by about half.

Content Type
Filtering by Content Type (e.g., text/html, image, PDF, CSS, JavaScript) and removing any URL that is of no concern in your content audit is a good way to speed up the process.

Technically speaking, images can be indexable content. However, I prefer to deal with them separately for now.

Filtering unnecessary file types out like I’ve done in the screenshot above improves focus, but doesn’t improve performance very much. A better option would be to first select the file types you don’t want, apply the filter, delete the rows you don’t want, and then go back to the filter options and “(Select All).”

Once you have only the content types you want, it may now be possible to simply delete the entire Content Type column.

Status Code and Status
You only need one or the other. I prefer to keep the Code, and delete the Status column.

Length and Pixels
You only need one or the other. I prefer to keep the Pixels, and delete the Length column. This applies to all Title and Meta Description columns.

Meta Keywords
Delete the column. If those cells have content, consider removing that tag from the site.

DNS Safe URL, Path, Domain, Root, and TLD
You should really only be working on a single top-level domain. Content audits for subdomains should probably be done separately. Thus, these columns can be deleted in most cases.

Duplicate Columns
You should have two columns for the URL (The “Address” in column A from URL Profiler, and the “URL” column from Screaming Frog). Similarly, there may also be two columns each for HTTP Status and Status Code. It depends on the settings selected in both tools, but there are sure to be some overlaps, which can be removed to reduce the file size, enhance focus, and speed up the process.

Blank Columns
Keep the filter tool active and go through each column. Those with only blank cells can be deleted. The example below shows that column BK (Robots HTTP Header) can be removed from the spreadsheet.

[You can save a lot of headspace by hiding or removing blank columns.]

Single-Value Columns
If the column contains only one value, it can usually be removed. The screenshot below shows our non-secure site does not have any HTTPS URLs, as expected. I can now remove the column. Also, I guess it’s probably time I get that HTTPS migration project scheduled.
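
If you’re preparing the sheet in pandas before it ever reaches Excel, the blank-column and single-value-column cleanup can be automated in a couple of lines. A sketch, assuming the combined data has already been saved to a hypothetical working file:

```python
import pandas as pd

df = pd.read_csv("content_audit_working_file.csv")   # hypothetical working file

df = df.dropna(axis=1, how="all")                    # columns that are entirely blank
single_value = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
df = df.drop(columns=single_value)                   # review this list before dropping for real

df.to_csv("content_audit_working_file_trimmed.csv", index=False)
print("Removed single-value columns:", single_value)
```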

Hopefully by now you’ve made a significant dent in reducing the overall size of the file and time it takes to apply formatting and formula changes to the spreadsheet. It’s time to start diving into the data.

The analysis & recommendations phase

Here’s where the fun really begins. In a large organization, it’s tempting to have a junior SEO do all of the data-gathering up to this point. I find it useful to perform the crawl myself, as the process can be highly informative.

Step 3: Put it all into a dashboard

Even after removing unnecessary data, performance could still be a major issue, especially if working in Google Sheets. I prefer to do all of this in Excel, and only upload into Google Sheets once it’s ready for the client. If Excel is running slow, consider splitting up the URLs by directory or some other factor in order to work with multiple, smaller spreadsheets.

Creating a dashboard can be as easy as adding two columns to the spreadsheet. The first new column, “Action,” should be limited to three options, as shown below. This makes filtering and sorting data much easier. The “Details” column can contain freeform text to provide more detailed instructions for implementation.

Use Data Validation and a drop-down selector to limit Action options.
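
If your dashboard lives in Excel, the Action drop-down can also be added programmatically with openpyxl. The sketch below uses placeholder file names and Action labels, so adjust both to match your own process (this post also uses “Consolidate,” for example):

```python
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
from openpyxl.worksheet.datavalidation import DataValidation

wb = load_workbook("content_audit_working_file.xlsx")   # hypothetical file name
ws = wb.active

# Add the two dashboard columns after the existing data.
action_col = ws.max_column + 1
ws.cell(row=1, column=action_col, value="Action")
ws.cell(row=1, column=action_col + 1, value="Details")

# Drop-down for Action. The labels are placeholders; match them to your own process.
dv = DataValidation(type="list", formula1='"Keep As-Is,Improve,Remove"', allow_blank=True)
ws.add_data_validation(dv)
col = get_column_letter(action_col)
dv.add(f"{col}2:{col}{ws.max_row}")

wb.save("content_audit_dashboard.xlsx")
```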

Step 4: Work the content audit dashboard

All of the data you need should now be right in front of you. This step can’t be turned into a repeatable process for every content audit. From here on the actual step-by-step process becomes much more open to interpretation and your own experience. You may do some of them and not others. You may do them a little differently. That’s all fine, as long as you’re working toward the goal of determining what to do, if anything, for each piece of content on the website.

A good place to start would be to look for any content-related issues that might cause an algorithmic filter or manual penalty to be applied, thereby dragging down your rankings.

Causes of content-related penalties

These typically fall under three major categories: quality, duplication, and relevancy. Each category can be further broken down into a variety of issues, which are detailed below.

{Expand to learn more about quality, duplication, and relevancy issues}
  • Typical low-quality content
    • Poor grammar, written primarily for search engines (includes keyword stuffing), unhelpful, inaccurate…
  • Completely irrelevant content
    • OK in small amounts, but often entire blogs are full of it.
    • A typical example would be a “linkbait” piece circa 2010.
  • Thin/short content
    • Glossed over the topic, too few words, or all image-based content.
  • Curated content with no added value
    • Comprised almost entirely of bits and pieces of content that exists elsewhere.
  • Misleading optimization
    • Titles or keywords targeting queries for which content doesn’t answer or deserve to rank.
    • Generally not providing the information the visitor was expecting to find.
  • Duplicate content
    • Internally duplicated on other pages (e.g., categories, product variants, archives, technical issues, etc.).
    • Externally duplicated (e.g., manufacturer product descriptions, product descriptions duplicated in feeds used for other channels like Amazon, shopping comparison sites and eBay, plagiarized content, etc.)
  • Stub pages (e.g., “No content is here yet, but if you sign in and leave some user-generated-content, then we’ll have content here for the next guy.” By the way, want our newsletter? Click an AD!)
  • Indexable internal search results
  • Too many indexable blog tag or blog category pages
  • And so on and so forth…

It helps to sort the data in various ways to see what’s going on. Below are a few different things to look for if you’re having trouble getting started.


Sort by duplicate content risk

URL Profiler now has a native duplicate content checker. Other options are Copyscape (for external duplicate content) and Siteliner (for internal duplicate content).
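
Those tools are the right choice at any real scale, but for a quick sanity check on a small set of pages, a rough pairwise comparison can surface obvious internal near-duplicates. A minimal sketch, assuming you've already extracted each page's body text into a dict keyed by URL (this is an illustration, not a substitute for Copyscape or Siteliner):

    from difflib import SequenceMatcher
    from itertools import combinations

    def find_near_duplicates(page_text, threshold=0.9):
        # page_text is a hypothetical {url: body_text} mapping
        # Pairwise comparison is O(n^2), so only practical for small samples
        pairs = []
        for (url_a, text_a), (url_b, text_b) in combinations(page_text.items(), 2):
            ratio = SequenceMatcher(None, text_a, text_b).ratio()
            if ratio >= threshold:
                pairs.append((url_a, url_b, round(ratio, 2)))
        return pairs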

  • Which of these pages should be rewritten?
    • Rewrite key/important pages, such as categories, home page, top products
    • Rewrite pages with good link and social metrics
    • Rewrite pages with good traffic
    • After selecting “Improve” in the Action column, elaborate in the Details column:
      • “Improve these pages by writing unique, useful content to improve the Copyscape risk score.”
  • Which of these pages should be removed/pruned?
    • Remove guest posts that were published elsewhere
    • Remove anything the client plagiarized
    • Remove content that isn’t worth rewriting, such as:
      • No external links, no social shares, and very few or no entrances/visits
    • After selecting “Remove” from the Action column, elaborate in the Details column:
      • “Prune from site to remove duplicate content. This URL has no links or shares and very little traffic. We recommend allowing the URL to return a 404 or 410 response code. Remove all internal links, including from the sitemap.”
  • Which of these pages should be consolidated into others?
    • Presumably none, since the content is already externally duplicated.
  • Which of these pages should be left “As-Is”?
    • Important pages which have had their content stolen

Sort by entrances or visits, filtering out any URLs you’ve already assigned an Action (a filtering sketch follows this list)

  • Which of these pages should be marked as “Improve”?
    • Pages with high visits/entrances but low conversion, time-on-site, pageviews per session, etc.
    • Key pages that require improvement determined after a manual review of the page.
  • Which of these pages should be marked as “Consolidate”?
    • When you have overlapping topics that don’t provide much unique value of their own, but could make a great resource when combined.
      • Mark the page in the set with the best metrics as “Improve” and in the Details column, outline which pages are going to be consolidated into it. This is the canonical page.
      • Mark the pages that are to be consolidated into the canonical page as “Consolidate” and provide further instructions in the Details column, such as:
        • Use portions of this content to round out /canonicalpage/ and then 301 redirect this page into /canonicalpage/
        • Update all internal links.
    • Campaign-based or seasonal pages that could be consolidated into a single “Evergreen” landing page (e.g., Best Sellers of 2012 and Best Sellers of 2013 —> Best Sellers).
  • Which of these pages should be marked as “Remove”?
    • Pages with poor link, traffic, and social metrics related to low-quality content that isn’t worth updating
      • Typically these will be allowed to 404/410.
    • Irrelevant content
      • Whether the URL gets redirected or simply removed will depend on its link equity and traffic.
    • Out-of-date content that isn’t worth updating or consolidating
      • Whether the URL gets redirected or simply removed will depend on its link equity and traffic.
  • Which of these pages should be marked as “Leave As-Is”?
    • Pages with good traffic, conversions, time on site, etc. that also have good content.
      • These may or may not have any decent external links.
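
As a starting point for that sort, a filter like the one below can flag likely "Remove" candidates before the manual review. The column names and thresholds are assumptions; adjust them to match your own export:

    import pandas as pd

    df = pd.read_csv("content-audit-dashboard.csv")  # hypothetical file name

    # Only look at URLs that haven't been assigned an Action yet
    pending = df[df["Action"].isna()]

    # Flag likely "Remove" candidates: little traffic, no links, no shares
    candidates = pending[
        (pending["Entrances"] < 10)
        & (pending["External Links"] == 0)
        & (pending["Social Shares"] == 0)
    ]
    print(candidates[["URL", "Entrances"]].head(20))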

Taking the hatchet to bloated websites

For big sites, it’s best to use a hatchet-based approach as much as possible, and finish up with a scalpel in the end. Otherwise, you’ll spend way too much time on the project, which eats into the ROI.

This is not a process that can be documented step-by-step. For the purpose of illustration, however, below are a few different examples of hatchet approaches and when to consider using them.


Parameter-based URLs that shouldn’t be indexed

  • Defer to the technical audit, if applicable. Otherwise, use your best judgment:
    • e.g., /?sort=color, &size=small
  • Assuming the tech audit didn’t suggest otherwise, these pages could all be handled in one fell swoop. Below is an example Action and Details for such a page, followed by a small helper for generating the canonical targets:
    • Action = Remove
    • Details = Rel canonical to the base page without the parameter
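
If you want the Details column to spell out the exact canonical target for each parameter-based URL, a small helper using only the standard library could generate it (a sketch; example.com is a placeholder):

    from urllib.parse import urlsplit, urlunsplit

    def strip_parameters(url):
        # Drop the query string and fragment, keeping scheme, host, and path
        scheme, netloc, path, _query, _fragment = urlsplit(url)
        return urlunsplit((scheme, netloc, path, "", ""))

    print(strip_parameters("https://example.com/widgets/?sort=color&size=small"))
    # -> https://example.com/widgets/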

Internal search results

  • Defer to the technical audit if applicable. Otherwise, use your best judgment:
    • e.g., /search/keyword-phrase/
  • Assuming the tech audit didn’t suggest otherwise:
    • Action = Remove
    • Details = Apply a noindex meta tag. Once they are removed from the index, disallow /search/ in the robots.txt file.

Blog tag pages

  • Defer to the technical audit if applicable. Otherwise:
    • e.g., /blog/tag/green-widgets/, /blog/tag/blue-widgets/
  • Assuming the tech audit didn’t suggest otherwise:
    • Action = Remove
    • Details = Apply a noindex meta tag. Once they are removed from the index, disallow /blog/tag/ in the robots.txt file.

E-commerce product pages with manufacturer descriptions

  • In cases where the “Page Type” is known (i.e., it’s in the URL or was provided in a CMS export) and the Risk Score indicates duplication (a bulk-assignment sketch follows this list):
    • e.g., /product/product-name/
  • Assuming the tech audit didn’t suggest otherwise:
    • Action = Improve
    • Details = Rewrite to improve product description and avoid duplicate content
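
This kind of bulk assignment is easy to script once Page Type is in the dashboard. A sketch, assuming "Page Type" and "Risk Score" columns exist and that a risk score of 70 or higher signals likely duplication (the column names and the threshold are assumptions):

    import pandas as pd

    df = pd.read_csv("content-audit-dashboard.csv")  # hypothetical file name

    # Hatchet approach: every product page with a high duplication risk score
    # gets the same Action and Details in one pass
    mask = (df["Page Type"] == "Product") & (df["Risk Score"] >= 70)
    df.loc[mask, "Action"] = "Improve"
    df.loc[mask, "Details"] = "Rewrite to improve product description and avoid duplicate content"

    df.to_csv("content-audit-dashboard.csv", index=False)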

E-commerce category pages with no static content

  • In cases where the “Page Type” is known:
    • e.g., /category/category-name/ or /category/cat1/cat2/
  • Assuming NONE of the category pages have content:
    • Action = Improve
    • Details = Write 2–3 sentences of unique, useful content that explains choices, next steps, or benefits to the visitor looking to choose a product from the category.

Out-of-date blog posts, articles, and other landing pages

  • In cases where the title tag includes a date, or…
  • In cases where the URL indicates the publishing date (a quick way to flag these is sketched after this list):
    • Action = Improve
    • Details = Update the post to make it more current, if applicable. Otherwise, change Action to “Remove” and customize the Strategy based on links and traffic (i.e., 301 or 404).
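
To flag these in bulk, a couple of simple regular expressions against the URL and title columns will catch most date patterns. This is a rough flag only; each page still deserves a manual look before assigning an Action:

    import re

    DATE_IN_URL = re.compile(r"/20\d{2}/\d{2}/")        # e.g., /2013/07/
    YEAR_IN_TITLE = re.compile(r"\b20[0-2]\d\b")        # e.g., "Best Sellers of 2012"

    def looks_out_of_date(url, title):
        return bool(DATE_IN_URL.search(url) or YEAR_IN_TITLE.search(title or ""))

    print(looks_out_of_date("https://example.com/blog/2013/07/old-post/", "Old Post"))  # True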

Content marked for improvement should include more specific instructions in the “Details” column, such as:

  • Update the old content to make it more relevant
  • Add more useful content to “beef up” this thin page
  • Incorporate content from overlapping URLs/pages
  • Rewrite to avoid internal duplication
  • Rewrite to avoid external duplication
  • Reduce image sizes to speed up page load
  • Create a “responsive” template for this page to fit on mobile devices
  • Etc.

Content marked for removal should include specific instructions in the “Details” column, such as the examples below (a pre-flight check for the noindex-then-disallow sequence is sketched after this list):

  • Consolidate this content into the following URL/page marked as “Improve”
    • Then redirect the URL
  • Remove this page from the site and allow the URL to return a 410 or 404 HTTP status code. This content has had zero visits within the last 360 days, and has no external links. Then remove or update internal links to this page.
  • Remove this page from the site and 301 redirect the URL to the following URL marked as “Improve”… Do not incorporate the content into the new page. It is low-quality.
  • Remove this archive page from search engine indexes with a robots noindex meta tag. Continue to allow the page to be accessed by visitors and crawled by search engines.
  • Remove this internal search result page from the search engine index with a robots noindex meta tag. Once removed from the index (about 15–30 days later), add the following line to the #BlockedDirectories section of the robots.txt file: Disallow: /search/.
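
Before adding the Disallow line, it's worth spot-checking that the noindex tag actually made it onto the pages; otherwise the robots.txt block will prevent crawlers from ever seeing the noindex. A minimal spot-check sketch, assuming the requests and beautifulsoup4 packages and hypothetical example URLs:

    import requests
    from bs4 import BeautifulSoup

    # Hypothetical sample of internal search result URLs to spot-check
    urls = [
        "https://example.com/search/green-widgets/",
        "https://example.com/search/blue-widgets/",
    ]

    for url in urls:
        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.text, "html.parser")
        robots_tag = soup.find("meta", attrs={"name": "robots"})
        content = robots_tag.get("content", "").lower() if robots_tag else ""
        print(url, "->", "noindex present" if "noindex" in content else "noindex MISSING")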

As you can see from the many examples above, sorting by “Page Type” can be quite handy when applying the same Action and Details to an entire section of the website.

After all of the tool set-up, data gathering, data cleanup, and analysis across dozens of metrics, what matters in the end is the Action to take and the Details that go with it.

URL, Action, and Details: These three columns will be used by someone to implement your recommendations. Be clear and concise in your instructions, and don’t make decisions without reviewing all of the wonderful data-points you’ve collected.

Here is a sample content audit spreadsheet to use as a template, or for ideas. It includes a few extra tabs specific to the way we used to do content audits at Inflow.

WARNING!

As Razvan Gavrilas pointed out in his post on Cognitive SEO from 2015, without doing the research above you risk pruning valuable content from search engine indexes. Be bold, but make highly informed decisions:

“Content audits allow SEOs to make informed decisions on which content to keep indexed ‘as-is,’ which content to improve, and which to remove.”

The reporting phase

The content audit dashboard is exactly what we need internally: a spreadsheet crammed with data that can be sliced and diced in so many useful ways that we can always go back to it for more insight and ideas. Some clients appreciate that as well, but most are going to find the greater benefit in our final content audit report, which includes a high-level overview of our recommendations.

Counting actions from Column B

It is useful to count the quantity of each Action along with total organic search traffic and/or revenue for each URL. This will help you (and the client) identify important metrics, such as total organic traffic for pages marked to be pruned. It will also make the final report much easier to build.
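
If the dashboard lives in a DataFrame at this point, the counts and totals can come from a single groupby; the column names are assumptions, so match them to your own sheet (in Excel, COUNTIF and SUMIF formulas get you the same numbers):

    import pandas as pd

    df = pd.read_csv("content-audit-dashboard.csv")  # hypothetical file name

    # Count the pages under each Action and total the traffic (and revenue) behind them
    summary = df.groupby("Action").agg(
        Pages=("URL", "count"),
        Organic_Entrances=("Entrances", "sum"),
        Revenue=("Revenue", "sum"),
    )
    print(summary)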

Step 5: Writing up the report

Your analysis and recommendations should be delivered at the same time as the audit dashboard. The write-up summarizes the findings, recommendations, and next steps from the audit, and should start with an executive summary.

Here is a real example of an executive summary from one of Inflow’s content audit strategies:

As a result of our comprehensive content audit, we are recommending the following, which will be covered in more detail below:

Removal of about 624 pages from Google’s index by deletion or consolidation:

  • 203 Pages were marked for Removal with a 404 error (no redirect needed)
  • 110 Pages were marked for Removal with a 301 redirect to another page
  • 311 Pages were marked for Consolidation of content into other pages
    • Followed by a redirect to the page into which they were consolidated

Rewriting or improving of 668 pages

  • 605 Product Pages are to be rewritten due to use of manufacturer product descriptions (duplicate content); these are prioritized from first to last within the Content Audit.
  • 63 “Other” pages to be rewritten due to low-quality or duplicate content.

Keeping 226 pages as-is

  • No rewriting or improvements needed

These changes reflect an immediate need to “improve or remove” content in order to avoid an obvious content-based penalty from Google (e.g., Panda) due to thin, low-quality, and duplicate content, especially concerning Representative and Dealers pages, with some added risk from Style pages.

The content strategy should end with recommended next steps, including action items for the consultant and the client. Below is a real example from one of our documents.

We recommend the following three projects in order of their urgency and/or potential ROI for the site:

Project 1: Remove or consolidate all pages marked as “Remove”. Detailed instructions for each URL can be found in the “Details” column of the Content Audit Dashboard.

Project 2: Copywriting to improve/rewrite content on Style pages. Ensure unique, robust content and proper keyword targeting.

Project 3: Improve/rewrite all remaining pages marked as “Improve” in the Content Audit Dashboard. Detailed instructions for each URL can be found in the “Details” column.

Content audit resources & further reading

Understanding Mobile-First Indexing and the Long-Term Impact on SEO by Cindy Krum
This thought-provoking post raises the question: How will we perform content inventories without URLs? It helps to know Google is dealing with the exact same problem on a much, much larger scale.

Here is a spreadsheet template to help you calculate revenue and traffic changes before and after updating content.

Expanding the Horizons of eCommerce Content Strategy by Dan Kern of Inflow
An epic post about content strategies for eCommerce businesses, which includes several good examples of content on different types of pages targeted toward various stages in the buying cycle.

The Content Inventory is Your Friend by Kristina Halvorson on BrainTraffic
Praise for the life-changing powers of a good content audit inventory.

Everything You Need to Perform Content Audits
Inflow’s eCommerce content audit toolkit: http://spot.goinflow.com/ecommerce-content-audit-toolkit

