Archives for seo

Finding and Building Citations Like an Agency

Posted by Casey Meraz

This post was originally in YouMoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

So you want to rank locally? If you have already worked hard to add a few citations, complete your on-site local optimization, acquire customer reviews, and build some locally relevant links, well, now it’s time to shift your focus. According to David Mihm, citations make up roughly 25% of the overall local ranking factors.

Why It’s Time to Change Your Thinking…

I’ve mentioned before that it’s time to stop chasing links, and for local SEO it’s time to stop chasing citations! What do I mean by that? If your whole purpose for creating citations is to improve your local rankings, then you are probably relying too much on Google. What would happen if those rankings were to suddenly go away? If you view the process of building your brand in the local ecosystem as a laborious task that needs to get done so that you can rank, then you aren’t seeing the big picture.

Each of the citation sites that you’re trying to get listed on was created with goals far beyond just helping businesses rank for Google’s local results. In most cases, they were created to provide a good customer experience and send potential shoppers to worthy vendors. Each of these sites gets its own traffic, and setting up your business listing on them gives potential customers another place to find you.

Below is a quick example from a fairly low traffic attorney site. In a one month period, they are getting traffic from other websites where their citations also reside.

Local Marketing Sources

It’s time to change your mindset and get motivated to start building citations for the right purpose. If you do that, the rankings you long for will come with it. Now, here’s how we find and get our business listed in these citations in an organized and speedy fashion at my firm.

Make Sure Your Information (N.A.P.) is Accurate.

Having accurate information that is consistent across your website, Google Plus page, and local ecosystem citations is the most important part of building and fixing your business’s citations. Your business Name, Address, and Phone Number (referred to as N.A.P. format) are essential for local rankings. Make sure this information is 100% consistent before moving forward!

Below is an example of the appropriate NAP format for a Law Firm:

The Reeves Law Group
515 S Flower St
Los Angeles, CA 90071
(213) 271-9318

You will notice that most directories display information like the example above. Some will allow you to add a link to your website, but some will not. In this case, the link is not the important information. The accurate listing of the business in the NAP format is. 
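To make the consistency check concrete, here is a minimal Python sketch that compares a listing against your reference NAP. The function names, the dictionary keys, and the abbreviation map are all assumptions made for illustration; they do not come from any directory's API. The sketch deliberately treats "Street" and "St" as equivalent so that only substantive differences are flagged. If you want strict character-for-character consistency, compare the raw strings instead.

```python
import re

# Hypothetical abbreviation map; extend it with the variants you actually see.
STREET_ABBREVIATIONS = {
    "street": "st", "avenue": "ave", "boulevard": "blvd", "suite": "ste",
    "north": "n", "south": "s", "east": "e", "west": "w",
}

def normalize_phone(phone: str) -> str:
    """Keep digits only and drop a leading US country code."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits

def normalize_text(value: str) -> str:
    """Lowercase, replace punctuation with spaces, abbreviate street words."""
    words = re.sub(r"[^\w\s]", " ", value.lower()).split()
    return " ".join(STREET_ABBREVIATIONS.get(word, word) for word in words)

def nap_mismatches(reference: dict, listing: dict) -> list:
    """Return the NAP fields on which a listing disagrees with the reference."""
    mismatches = []
    for field in ("name", "address"):
        if normalize_text(reference[field]) != normalize_text(listing[field]):
            mismatches.append(field)
    if normalize_phone(reference["phone"]) != normalize_phone(listing["phone"]):
        mismatches.append("phone")
    return mismatches

reference = {
    "name": "The Reeves Law Group",
    "address": "515 S Flower St Los Angeles, CA 90071",
    "phone": "(213) 271-9318",
}
# A listing as it might appear on a directory (formatting differs, data does not).
listing = {
    "name": "The Reeves Law Group",
    "address": "515 South Flower Street, Los Angeles, CA 90071",
    "phone": "213-271-9318",
}
print(nap_mismatches(reference, listing))
```

An empty list means the listing agrees with your reference after normalization; any field names returned are the ones to fix first.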

We’ve established that having accurate and consistent listing information is critical, so how do we do it?

The Easy Way May Not Be the Best Way

One easy way to get listed consistently on multiple directories is by using a service like Yext. While that can be a great option, depending on your situation, make sure you know what you are getting into. Yext, for example, will easily publish to dozens of directories with the information you submit. Some will start showing instantly, and some will come up within a few days with very little work. But at over $475 a year (yes, annually) for the retail version, you might think twice about it.

If you are not looking to purchase services like Yext…

Here are Three Fundamental Steps to a Great Alternative Approach:

Prepare Your Information

I always like to start by creating a quick Google Doc with the client’s NAP information at the top. This allows me to easily copy and paste the fields if I need them while I’m building citations. It also allows me to keep the data consistent across the board. Typically, I ensure my Google Plus page is 100% accurate with my business information, and then copy and paste the information from Google Places. I will also use this same Google Doc for tracking my citation sources in one easy to use place. 

Feel free to download this free Local Citation Building Template.

In case you decide not to use the spreadsheet I created, here are the fields it includes for some of the most common information that citation sources ask for:

  • Your Name – Your actual name or the name of business owner
  • Email Address – The Email Address that will be checked by the business
  • Company Name – The company’s exact name as it appears correctly on Google Plus
  • Address – The company’s exact address as it appears correctly on Google Plus
  • Suite or Floor Number – Only use if there is a Suite or Floor number
  • City – The company’s exact city name as it appears correctly on Google Plus
  • State – The state the company resides in
  • Zip – The zip code of the company
  • Phone Number – The LOCAL phone number of the exact business location
  • Landing Page For Location – The landing page for that office or physical location

I also added some advanced fields that I see on some submission sites. Here are some examples:

  • 800 Number – The 800 Number of the Business
  • Logo URL – The URL of the company’s logo hosted on your website
  • Facebook URL – The Facebook URL of the company
  • Twitter Handle – The company’s Twitter Handle
  • Places Page Link – A Link to their G+ Local Page or Google Places Page
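If you would rather generate the sheet yourself, the field lists above can be turned into a CSV file that opens in any spreadsheet program. This is only a sketch: the column names are copied from the lists above, while the function name `build_nap_sheet` and the one-row layout are my own assumptions, not the layout of the downloadable template.

```python
import csv
import io

# Column names taken from the two field lists above.
BASIC_FIELDS = [
    "Your Name", "Email Address", "Company Name", "Address",
    "Suite or Floor Number", "City", "State", "Zip", "Phone Number",
    "Landing Page For Location",
]
ADVANCED_FIELDS = [
    "800 Number", "Logo URL", "Facebook URL", "Twitter Handle",
    "Places Page Link",
]

def build_nap_sheet(record: dict) -> str:
    """Return CSV text with a header row and one row of client data.

    Fields missing from the record are left blank; unknown keys raise
    ValueError so that typos in field names are caught early.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=BASIC_FIELDS + ADVANCED_FIELDS)
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()

sheet = build_nap_sheet({
    "Company Name": "The Reeves Law Group",
    "Address": "515 S Flower St",
    "City": "Los Angeles",
    "State": "CA",
    "Zip": "90071",
    "Phone Number": "(213) 271-9318",
})
print(sheet)
```

Keeping the NAP in one machine-readable row makes it easy to copy and paste from, and it gives you a single source of truth to compare listings against later.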

Below is an example of the header from my Local Citation Building Template.

The Citation Building Spreadsheet NAP Information

Citation Building Can Be a Bit Tedious, So Here’s an Easier Way…

Typing can be a bit tedious

If you’re like me and you have the attention span of a lemming, then you need some reinforcements. But when dealing with something that’s so important, how do you prevent data corruption and ensure accuracy at the same time? 

My answer is Roboform, and it costs between $9.95 and $39.95. To be clear, I am not affiliated with it in any way, shape, or form; it’s just the program that I found works best for me. So, I will share how I use it.

Roboform allows me to input the information about a location and have it autofill on many of the submission sites. It’s not perfect and it requires a manual review, but spending a couple of minutes setting this up is worth its weight in gold. Not only will it ensure it outputs what you put into it, but it will also store the information and you can share the data with your team. It will also integrate into your browser where you can use a drop down and select the auto fill information you want. Basically it just saves a ton of time.

How to use Roboform for Citation Building 

Once you’ve downloaded the program from Roboform.com and installed it, you can open it up and go to File > New > Identity to create a new identity. You will end up creating and naming a new Identity for each different business location you have. You can then click the edit button and spend a few minutes and fill out all of the information you want to your heart’s desire. If you’re just building citations through Roboform, then you can stick to the Person, Business and Address sections and only fill out the fields I have listed in my spreadsheet.

Start off with the Person section and fill out the following fields that are circled below including:

  • First Name – The first name you want to display on the listing. Typically, it is the same as the person registering the account. 
  • Last Name – The last name you want to display on the listing. Typically, it is the same as the person registering the account.
  • Phone – The Business Phone Number for that location (Your NAP)
  • Email – The email address that is going to register the account and be the contact email. Use this if they’re going to be the same email.

Roboform Person Fields You Need To Fill Out

Next Move On To the Business Section

On this page, I typically only use the company name and website. The company name will be the actual company name in your NAP format and the website will be the landing page of that physical location. Sometimes these are truncated to just the domain, but it’s always better to try and get the link you receive to go to the actual landing page for that location.  

Roboform Business Category Field To Use

Lastly, You Can Move On to the Address Section

In this section you will add your address from the NAP format.

Using the address section in Roboform

And that takes care of that part!

Now you are set up to start finding citations and knocking them out! We will use Roboform to auto fill the fields instead of typing them each time. They will still require manual review, but it will save a lot of time!

Now, Let’s Get Listed on Some Local Directories, AKA: Build Some Citations

The goal of doing all of this citation work is to make sure we end up with good data. Before you add your listing to each of these websites, check to make sure you’re not already listed. Spamming the web is not cool, even if it is unintentional. So follow this quick three-step process called CHECK, FIX, ADD.

  1. Check to see if the listing is there
  2. If the listing is there, make sure the NAP is 100% accurate. If not, fix it!
  3. If the listing does not exist, add it
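The three steps above can be sketched as a small decision function. This is a minimal illustration, not part of any tool mentioned in this post: the function name `next_action`, the dictionary keys, and the extra "OK" outcome (for a listing that is already correct) are assumptions I added. It uses an exact comparison because the goal is a NAP that is 100% accurate.

```python
from typing import Optional

NAP_FIELDS = ("name", "address", "phone")

def next_action(reference: dict, existing: Optional[dict]) -> str:
    """Apply CHECK, FIX, ADD to a single citation source.

    `existing` is the listing found on the site, or None if the CHECK
    step found nothing.
    """
    if existing is None:
        return "ADD"   # Step 3: the listing does not exist, so add it.
    if all(existing.get(field) == reference[field] for field in NAP_FIELDS):
        return "OK"    # Listed and accurate; just record it in the tracker.
    return "FIX"       # Step 2: listed, but the NAP is not 100% accurate.

reference = {
    "name": "The Reeves Law Group",
    "address": "515 S Flower St Los Angeles, CA 90071",
    "phone": "(213) 271-9318",
}
# Hypothetical findings from two citation sources.
found_with_old_phone = dict(reference, phone="(213) 555-0100")
print(next_action(reference, None))
print(next_action(reference, found_with_old_phone))
```

Recording the returned action next to each source in your tracking sheet gives you a simple to-do list for the fixes and additions that remain.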

If you are using the free Local Citation Building Template I created, you will see a list where you can easily add the information along with notes about your new citation sources. I highly suggest keeping track of this information. Remember that you’re not just doing this to impress the search engines. You want to have access to this information in the future. What if you decide to move one day and didn’t have this?

A Screenshot from the spreadsheet:

A screenshot from my free citation building spreadsheet

At my company, I also give this information to our clients in the unlikely case that they felt we were doing a bad job and wanted to fire us. 

Make Sure You Have the Top Citations

Whether your business is brand new or old and established, I suggest you start off by adding a new listing or correcting your incorrect listing at the Top Citation Sources suggested by Getlisted.org. They worked hard to put together this list of citation sources they believe carry the most weight in different industries and geographic areas. They provide two great resources to act as a starting point:

View the Top Citation Sources by City

View the Top Citation Sources by Category

Just like with every citation source you come across, make sure to add them to your tracking spreadsheet. 

Next, Don’t Re-Invent The Wheel. Find Your Top Competitors

Do you already know who your top competition is? Check them out and see who ranks consistently for the keywords you want to rank for. 

Finding citations a year or two ago was a bit harder than it is today. These days you have some easy and affordable options to see where your competing businesses are listed. In this article I will discuss an easy way using Whitespark’s Local Citation Finder and another method for searching for them manually through Google. As with any data collection, I always recommend using multiple sources to ensure greater accuracy. 

Method #1: Using Whitespark to find your competitors’ citations

Start by navigating to the “Your Projects” tab.

Click on the projects tab

Step 1: Create a new project. To keep things organized, I will typically create a new project by using the “+ Create new Project” button under the “Your Project” tab. It will ask you for your business Name and Phone Number; enter them and hit the Create Project button.

Create a New Project Button in Whitespark

Step 2: Find Citation Sources by Keyword – Use the option to “Search By Keyphrase” and enter the keyword information you want to rank for.

Whitespark Interface for Searching

Step 3: Wait For the Results – After starting the search, wait for a few minutes for it to compile the results. In my experience, it’s typically pretty fast. You will also get a confirmation email when the process is complete.

Whitespark Search Results

Step 4: See What Came Out and Start Getting Citations – After it’s complete, click back on your “projects” link to see a list of your projects. Select the pink Citation Sources link to see what results came up for your listing. One of the best things about Whitespark is that they have also compiled site submission URLs in their data.

For some listings, you can easily just click the link “Submit Your Business”. You can then just use the RoboForm drop down to autofill the information making citation building simple! You may not want to bring Whitespark home to mom because she’s so easy.

Method #2: Conduct a NAP Search in Google

You can also conduct the searches you want in a search engine, and come up with your top competitors. This is also a great way to do it because you can compare results to see which citations your competitors have.

To do this, simply pull up Google and enter your competitor’s NAP information. Below I entered a company name, their address, and phone number that I found from their Google Plus Local page.

Way to search competitors citation sources in google using NAP
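If you run this search for several competitors, building the query by hand gets repetitive. Below is a small sketch that assembles the search URL from a NAP. The competitor details are made up for illustration, and wrapping each part in quotation marks (to ask for exact phrases) is my own assumption about how to phrase the query. Open the resulting URL in your browser and review the results manually.

```python
from urllib.parse import urlencode

def nap_search_url(name: str, address: str, phone: str) -> str:
    """Build a Google search URL that looks for a business's exact NAP."""
    # Each part is quoted so the search engine treats it as an exact phrase.
    query = f'"{name}" "{address}" "{phone}"'
    return "https://www.google.com/search?" + urlencode({"q": query})

# Hypothetical competitor NAP, used only as an example.
url = nap_search_url(
    "Example Law Firm",
    "100 Main St Los Angeles, CA 90012",
    "(213) 555-0100",
)
print(url)
```

If the exact-phrase search returns too few results, try dropping the quotes around the address, since directories often format addresses differently.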

 

With this information, I can now visit each one of these sources, and add my business to the same sources if they allow a submission. You will find some sites do not allow submissions, or are owned by the business themselves. Whitespark has a cool option to mark these as useless which makes their data very clean and accurate.  

Be Very Careful If You Outsource Citation Building

If you don’t have the time and are considering outsourcing citation building please be careful, and have some serious QA. If your people are not being meticulous with your data, you’re going to have a lot of data confusion on your hands, and spend twice the amount of time trying to fix it. On the other hand, some companies like Whitespark offer these services a la carte as well. 

Want to Learn More?

If you want to learn more advanced citation building after you have exhausted these resources, I suggest you read my write up of some tips from David Mihm’s presentation from Local U Advanced Baltimore. Better yet, if you have a chance make sure you attend the next Local U Advanced session. 


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


How to Build a Content Marketing Strategy

Posted by Stephanie Chang

Link building has fundamentally changed. Many types of link building activities that have previously been effective are now either short-term strategies or no longer considered best SEO practice. As a result, companies and clients alike are seeking to understand how certain forms of link building can be translated into longer-term content marketing campaigns. The purpose of this post is to help you develop a framework on how to start building a content marketing strategy for your or your client’s site.

Why should you care about content marketing?

According to a Content Marketing Institute (CMI) 2013 Survey, 86% of B2C (business to consumer) companies are planning to keep or increase their current content marketing spending this year. 54% of B2B (business to business) companies are planning to increase their content marketing spending in 2013. Knowing that the demand for content marketing is increasing, it’s worth investing resources to start researching and learning more about the opportunities content marketing can bring to a site. 

B2C Content Marketing Spending in 2013

B2B Content Marketing Spending in 2013

The growth of content marketing is also a concept that Fred Wilson of Union Square Ventures agrees with. Content marketing continues to see growth because it is the future of online marketing. He likes to think of content marketing as “moving the message from a banner to your brand and changing the engagement from a view to a conversation.”

Furthermore, Google’s algorithm is continuously changing, which pretty much guarantees that the quick-win strategies that may have worked in the past will no longer work in the future. For instance, Google has announced that it will no longer announce or confirm Panda updates, because Panda will be integrated into the search engine’s existing algorithm (i.e. Panda is here to stay indefinitely). We’ve also recently seen the dangers of garnering links from paid advertorials (even on respected, high domain authority websites), a tactic Google considers “buying links.”

Now is definitely the time to develop a new type of strategy to garner links and traffic. 

Inspirational examples of phenomenal content

Below are some examples of companies that have created phenomenal pieces of content. Hopefully this provides ample motivation to take your site/client’s site to the next level!

1. Kickstarter: Best of 2012: An inspirational take on 2012.

Kickstarter

2. BuzzFeed lists: Heartwarming content that is easily shareable.

BuzzFeed List

3. Indeed Job Trends: Data-driven content that is direct and to the point.

Indeed Job Trends

4. Shopify’s Pinterest infographic and their new E-commerce University: Content that is effectively targeted towards their demographic and developing their brand as the E-commerce authority on the web.

Shopify Infographic

Ecommerce University

5. Airbnb Neighborhood Guides: A visually stimulating take on neighborhood guides, which differentiates them from competitors’ guides.

Neighborhood Guides

6. HBOWatch’s April Fool’s Day joke: Content with a clear understanding of target audience as determined by the high engagement metrics. It gained 1129 comments!

HBOWatch

7. Epic Meal Time: Videos targeted towards a male demographic. Topic examples include fast food lasagna and whiskey syrup bacon pancakes.

Whisky Syrup Pancakes


The content marketing strategy framework

I’ve been fortunate enough to work closely with Distilled’s Head of Outreach, Adria Saracino, who’s been absolutely instrumental in defining the below content marketing strategy framework for a number of my clients (and has, subsequently, inspired my passion for content marketing). Adria has also written a great piece on how to get buy in from your company to invest in content marketing.

Adria Saracino

Below is the content strategy framework that Adria and I have implemented together for our clients. We’ve learned that this process isn’t a quick win and that our most successful content marketing strategies have relied on dedicating at least 3 months to just research – market research, site audits, content audits, customer surveys, and customer interviews to name just a few. In addition, I’ll also showcase a few specific examples of how we’ve built out each step of the content strategy process. 

Step 1: Researching the company

The first step in developing a content strategy framework is understanding the company. Before we even commence the strategy, we ask our clients questions to identify the following:

  • The company’s business model
    • How does the company bring in revenue?
    • What products bring in the most revenue? Why do these products bring in the most revenue (high profit margin, high demand, branding considerations)?
    • How is the sales team structured? What metrics are they measured on? 
  • The existing customer base
    • Who are the company’s existing customers?
    • How does the company currently attract customers? 
    • If the company’s marketing team has already done a market research survey, ask to see the results.
  • Marketing considerations
    • Understanding the existing content process
      • What are the editorial guidelines (if there are any)? What is the internal process to get content approved?
      • Who decides what type of content to produce?
      • What types of content does the team currently produce?
      • What are the company’s brand considerations?

Step 2: Data collection (and lots of it)

I believe in utilizing the data that we have available to make informed decisions. This applies specifically to content; the more we understand about the site and the customers, the more we are able to make informed and strategic decisions about the type(s) of content we want to produce. In order to do this, it’s important to gather relevant data. This data can come from a variety of sources, including the following:

  • Competitor analysis
    • What types of content are your competitors putting together? 
    • How are users engaging with the content?
    • Comparing/contrasting SEO metrics (DA, PA, external links, etc.)
  • Keyword research
    • What keywords bring traffic to the site (excluding not provided)?
    • What are the landing pages for those keywords?
    • What type of metrics does the keyword research and landing page combination currently bring to the site?
  • Market research and customer surveys
    • The surveys may vary depending on whether the company is b2b or b2c.
    • Traditionally, some of the survey questions we’ve asked b2b clients include:
      • Demographic-related questions like occupation, industry, job title, age, and gender.
      • How long have you been a customer?
      • How likely are you to recommend our services, products, etc.?
      • Specific product/service-related questions
    • The survey questions we’ve asked b2c clients are very similar, but often contain more demographic questions like: highest level of education obtained, marital status, number of kids, household salary range, and occupation.
      • We also include specific product questions, like:
        • How often do you purchase our product?
        • Why do you purchase the product?

*Important Note* Be sure to test out your survey on individuals unrelated to the survey before releasing it. This helps ensure that there are no ambiguous questions and that no questions have been framed in a way that would lead to biased answers.

SurveyMonkey has also produced a variety of survey templates to at least help you gain some understanding of the type of questions you might want to ask your target audience depending on your goals for the survey.  

Survey Examples

Having these sample surveys is an excellent content strategy technique that SurveyMonkey has employed. 

Not only are the survey questions themselves important, but the email you send out in conjunction with the survey is a big factor in your survey’s success. Ideally, the more responses you collect, the more likely the results are to be statistically significant. As a result, you want to make sure that the email template catches the audience’s attention and also creates an incentive for them to fill out your survey.

Below is an actual survey template that we’ve used for a client, which has generated 917 responses or approximately 50% of the client’s email list.

Survey Template
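To get a feel for what a response count like this means statistically, here is a rough sketch that estimates a margin of error. The only figures taken from this post are the 917 responses and the approximate 50% response rate; the implied list size is derived from them. The formula is the standard margin of error for a proportion with a finite population correction, and it assumes respondents behave like a simple random sample of the list. A voluntary survey with a prize incentive only approximates that, so treat the result as a best case.

```python
import math

def margin_of_error(responses: int, population: int,
                    z: float = 1.96, p: float = 0.5) -> float:
    """Margin of error for a proportion, with finite population correction.

    z=1.96 corresponds to a 95% confidence level and p=0.5 is the most
    conservative assumption about how answers are split.
    """
    standard_error = math.sqrt(p * (1 - p) / responses)
    correction = math.sqrt((population - responses) / (population - 1))
    return z * standard_error * correction

responses = 917
response_rate = 0.50                          # "approximately 50%" of the list
list_size = round(responses / response_rate)  # implied size of the email list

moe = margin_of_error(responses, list_size)
print(f"Implied list size: {list_size}")
print(f"Margin of error at 95% confidence: {moe:.1%}")
```

The same function lets you work backwards: try smaller response counts to see how quickly the margin of error grows, which helps when deciding how much incentive the survey email needs.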

  • Phone Interviews with Existing Customers
    • As you can see from the survey template above, individuals voluntarily opt for phone interviews because there is a guaranteed prize incentive. 
    • Questions asked in the phone interview are much more detailed (allowing us to eventually use this information for target audience persona development). Fundamentally, the type of questions you ask in the interview must help you:
      • Identify the person’s day-to-day responsibilities, likes/dislikes, frustrations/pressures, needs, concerns, and function they play in the purchasing process.
        • Function they play in the purchasing process is based on the following roles:
          • Initiator: identifies the need to purchase the product
          • Influencer: exerts influence on the individuals who can make the decision to purchase the product
          • Decision-maker: decides whether or not to purchase the product
          • Buyer: selects who to buy from and the agreements that come alongside that
          • User: utilizes the product
          • Gatekeeper: has access or supplies information to both the decision maker and/or the influencer

Persona Development

Step 3: Preparation and assessment

Now that new data has been collected from various channels, it’s important to assess/analyze the data that has just been collected and see how it correlates with the data that you already have on-hand. During this stage, it’s also critical to take a step back and make sure that the goals for the content have been clearly defined. 

  • Create a benchmark audit using analytics
    • This provides an opportunity to compare/contrast results before and after the creation of the content 
    • Important analytics to include are:
      • Traffic
      • Pageviews
      • Pages per visit
      • Average time on site
      • Entrances/exits
      • Conversion rate
      • Bounce rate
      • Linking root domains
      • Page authority
      • Rankings
  • Putting together a content audit
    • The purpose of the content audit is to evaluate how previous content on the site has performed, as well as organize the existing content on the site to determine additional opportunities.
    • For one of my clients, Adria and I analyzed the top 500 landing pages on the client’s site and took a look at the content from three distinct lenses:
      • Analytics metrics: engagement (bounce rate, time on site) and number of visits (to identify potential keyword opportunities)
      • SEO metrics: linking root domains, page authority, etc.
      • Content perspective: is this useful for a user? What type of user would it attract?
        • We individually analyze each content page and determine where it sits on the content funnel.
          • Awareness: Content created for this part of the funnel is designed to target an audience that hasn’t even begun to consider the company’s product/services.
          • Trigger: Content created for this part of the funnel is when a user has become aware of the product/service and has started thinking about the possibility of needing it.
          • Search: User has decided to research the product/service in more depth.
          • Consideration: User has decided to convert, but hasn’t decided which brand to choose.
          • Buy: User decides to convert to the company’s product/service.
          • Stay: Content targeted towards retaining clients, ensuring they remain a loyal customer/brand advocate.

Content Funnel

The purpose of labeling what stage of the funnel each piece of content is associated with is to ultimately assess the distribution of content on a site and determine if there are any gaps. For instance, this particular site had 180 unique content pages and the distribution of the site’s content looked like this:

Content Distribution
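Once every page has been labeled with a funnel stage, tallying the distribution is straightforward. The sketch below is only an illustration: the page URLs and their labels are invented, and the 10% threshold used to flag thin stages is an arbitrary assumption, not a rule from our process.

```python
from collections import Counter

# Funnel stages in order, as described above.
FUNNEL_STAGES = ["Awareness", "Trigger", "Search", "Consideration", "Buy", "Stay"]

def funnel_distribution(page_stages: dict) -> dict:
    """Return the share of pages that sit at each funnel stage."""
    total = len(page_stages)
    if total == 0:
        return {stage: 0.0 for stage in FUNNEL_STAGES}
    counts = Counter(page_stages.values())
    return {stage: counts.get(stage, 0) / total for stage in FUNNEL_STAGES}

def underserved_stages(distribution: dict, threshold: float = 0.10) -> list:
    """List the stages whose share of content falls below the threshold."""
    return [stage for stage, share in distribution.items() if share < threshold]

# Hypothetical audit results: page URL mapped to the stage assigned in the audit.
page_stages = {
    "/blog/industry-trends": "Awareness",
    "/guides/how-to-choose": "Search",
    "/compare/us-vs-them": "Consideration",
    "/compare/pricing-explained": "Consideration",
    "/products/widget-a": "Buy",
    "/products/widget-b": "Buy",
    "/products/widget-c": "Buy",
    "/products/widget-d": "Buy",
    "/products/widget-e": "Buy",
    "/help/getting-started": "Stay",
}

distribution = funnel_distribution(page_stages)
for stage in FUNNEL_STAGES:
    print(f"{stage:<14}{distribution[stage]:.0%}")
print("Possible gaps:", underserved_stages(distribution))
```

Treat the flagged stages as prompts for discussion rather than as a verdict, since the right amount of content at each stage depends on the site's goals and keyword research.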

In this specific case, it is apparent that a majority of the site’s content sits at the bottom of the funnel. As a result, we recommended to the client that they create more content that targets higher up the funnel. However, it is also important to bear in mind that a site is not necessarily looking for an even distribution of content at each stage of the funnel. The amount needed is determined by various factors, like keyword research and an iterative approach in which content is built that targets a specific stage of the funnel. Afterwards, these pieces of content are analyzed to determine if they proved value based on the site’s pre-determined content goals and KPIs. This closely ties into our next point, which is:

  • Clarify the goals for this content strategy. Goals should be general like:
    • Increase in conversions
    • Increase in organic traffic to the site
    • Increase in audience engagement
    • Increase in brand awareness
  • However, goals/metrics should also be specifically correlated to where that content sits in the content funnel:
    • ​This great article by Jay Baer explains it in more depth:
      • Consumption metrics: How many views/downloads did your content receive? 
      • Sharing metrics: How often does your content get shared? (Tweets, Likes…etc)
      • Lead generation metrics: How often do the consumers turn into leads?
      • Sales metrics: How often do the consumers turn into sales? 
    • Ideally, the consumption metrics would be correlated to content higher up in the funnel and the sales metrics correlated to content located further down the funnel. See diagram below:

Metrics and Content Funnel

  • Develop persona buckets
    • In order to achieve this, combine all the data that was derived from the content audit, customer surveys, and customer interviews. Once you’ve done so, segment individuals into different categories, like this: 

Persona Buckets

Image Courtesy of Kissmetrics

  • Solidify the editorial process for the company
    • Who needs to be included in the content development and implementation phase? When do they need to be included? 
    • Have a clear understanding of the dependencies (i.e. how long does it typically take to get sign off from relevant departments?)
    • Determine the site’s style guide/tone of voice/engagement standards
  • Define the content strategy
    • What types of content will be produced on the site? 
    • Where does this content sit in the funnel?
    • Where would they sit on the site? In a separate category or an existing category?
    • What keywords would the content target?

Going through this detailed, research-intensive process allows a company to clearly see the opportunities at hand from a high-level perspective. When we go through this process, we identify ways not only to improve the company’s organizational structure but also to standardize how content and pages are released onto the site (static URLs, keyword targeting, content tone of voice/length). It’s also through this process that we’ve been able to engage/integrate multiple departments and define ways to work together seamlessly.

Furthermore, we also gain a concrete understanding of the big opportunities for the site. It’s impossible to go through this much research and not be able to discern multiple opportunities related to CRO, information architecture, keyword targeting, and analytics, to name a few. 

Step 4: Prospecting

This phase of the process is identifying individuals/sites who would be interested in the type of content the company will produce and engaging them at multiple points with the goal to develop relationships with key influencers.

  • Identify and reach out to influencers
  • Keep on top of industry news
  • Keep on top of the content that competitors are creating

Step 5: Create and promote the content

In this step, the goal is now to create the pieces of content while following both the internal protocols and sign-off processes that were established in step three of the process. Ensure that editorial standards are being followed and assess whether the content being created is actually phenomenal.

  • Create the content and consistently reassess to make sure it is meeting the following checklist:
    • Is the content credible?
    • Is the content informative?
    • Is the content easy to understand? 
    • Is the content useful?
    • Is the content exceptional?
  • Promote and outreach the content to key influencers

Step 6: Assess content performance

After the content has been released and promoted, it’s time to assess how the content has performed and any other learnings that can be taken away from the process, including:

  • How has the piece performed?
  • What learnings were taken away from it? Any changes that need to be made to the process? 
  • What data have we received from the piece of content?

The long-term vision is that the content is able to fulfill the original goals of the content marketing strategy. Over time, each piece of content should become easier to produce, as learnings are developed and iterated on each time. Although the process appears very resource-intensive in the beginning, the goal is that, over time, producing effective and meaningful content becomes a core part of the company.


In conclusion, the most valuable benefit of having a content strategy for your site is that, from a business standpoint, your site is no longer creating content for “content’s sake” or to build “link bait.” Moving forward, the site now has a framework for creating content that serves multiple purposes: to engage with current and future customers; to establish brand awareness and authority within the industry; and to consequently garner more traffic, conversions, and links to your site.

Furthermore, by integrating multiple individuals into the development of a site’s content strategy, it automatically provides the groundwork of integrating SEO seamlessly into the other online marketing activities of the site, such as CRO, social media, and PR. 


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


How Do You Know If Your Data is Accurate? A case study using search volume, CTR, and rankings

Posted by Matt Peters

Big Data and analytics have been called the “next big thing,” and they can certainly make a strong case with the explosion of easily accessible, high-quality data available today. In the inbound marketing world, we have access to backlinks and anchor text, traffic and click stream data, search volume and click-through rate (CTR), social media metrics, and many more. There is huge value in this data, if we can unlock it.

But there’s a problem: real-world data is messy, and processing it can be tricky. How do we know if our data is accurate, or if we can trust our final conclusions? If we want to use this data to find a better way to do marketing, we have to be careful about accuracy.

There are no hard and fast rules when it comes to data analysis. There are some best practices, but even these can get a little murky. The most important thing to do is to put on your detective cap and dive into the data. The more familiar you are with the data, the easier it is to spot something that seems strange. More than likely, your findings will be quality issues that need to be improved.

Throughout this post, we will use a data set from Google Webmaster Tools of keyword search referrals as a case study. Here’s a snippet of the data:

We also put all of our keyword analysis code on Github so you can run our analysis on your own site’s data.

The rest of this post discusses six best practices and suggestions for ensuring your data and results are accurate. Enjoy!


1. Separate data from analysis, and make analysis repeatable

It is best practice to separate the data from the process that analyzes the data. This also makes it possible to repeat the analysis on different data, either by you or by someone else. For this reason, most data scientists don’t use Excel, since it couples the data with the analysis and makes it difficult to repeat. Instead, they often use a high-level, statistically oriented scripting language like R, Matlab/Octave, or SAS, or a general-purpose language like Python.

At Moz, the data science team uses Python. Our Big Data team also uses it heavily, which makes it easy to integrate our algorithms with their production code.
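As a rough illustration of what this separation can look like in Python, here is a minimal sketch. The function names, the column name `clicks`, and the two tiny data sets are invented for the example and are not taken from our actual code:

```python
import csv
import io
from statistics import mean


def load_rows(source):
    """Data step: read keyword rows from any file-like object with CSV text."""
    return list(csv.DictReader(source))


def mean_clicks(rows):
    """Analysis step: it knows nothing about where the rows came from."""
    return mean(int(row["clicks"]) for row in rows)


# Two made-up data sets; the same analysis runs on both without changes.
site_a = io.StringIO("keyword,clicks\nseomoz,120\nauthor rank,30\n")
site_b = io.StringIO("keyword,clicks\nexample,10\nanother example,20\n")

print(mean_clicks(load_rows(site_a)))
print(mean_clicks(load_rows(site_b)))
```

Because the analysis only ever sees a list of rows, swapping in an export from a different site means changing the file you open and nothing else.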

2. If possible, check your data against another source

In many cases this step may be impossible, but if you can, it’s the best way to make sure your data is accurate. In Moz’s case, we were able to check the Google Webmaster Tools data against data from Google Analytics.

Some pieces to focus on when you’re comparing data include total aggregate counts, counts in sub-categories, and averages. In our case, we checked the total search visits and also spot-checked the number of visits for a few different keywords.
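A comparison like this can be sketched in a few lines. Everything below is an assumption made for illustration: the visit counts are invented, the 10% tolerance is arbitrary, and `compare_sources` is a hypothetical helper rather than part of our published code:

```python
def relative_difference(a, b):
    """Relative gap between two counts, as a fraction of the larger one."""
    if a == b:
        return 0.0
    return abs(a - b) / max(abs(a), abs(b))


def compare_sources(source_a, source_b, tolerance=0.10):
    """Return keys whose counts disagree by more than `tolerance`.

    Keys missing from one source are treated as a count of zero.
    """
    mismatches = {}
    for key in sorted(set(source_a) | set(source_b)):
        a, b = source_a.get(key, 0), source_b.get(key, 0)
        if relative_difference(a, b) > tolerance:
            mismatches[key] = (a, b)
    return mismatches


# Invented visit counts, for illustration only.
webmaster_tools = {"seomoz": 1000, "page authority": 200, "author rank": 50}
analytics = {"seomoz": 960, "page authority": 205, "author rank": 90}

# Spot check individual keywords, then compare the aggregate totals.
print(compare_sources(webmaster_tools, analytics))
total_gap = relative_difference(
    sum(webmaster_tools.values()), sum(analytics.values())
)
print(f"total visits differ by {total_gap:.1%}")
```

Note that in this made-up data the totals agree closely even though one keyword is far off, which is why it is worth checking both aggregates and sub-categories.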

3. Get down and dirty with the data

This is the fun part where we get to play with the data and do some exploratory data analysis. A good place to start is by looking at the raw data to see what jumps out. In the case of the Google Webmaster Tools data, I noticed that they don’t always give the search volume in long-tail cases with only a few searches. Instead, the data has “<10” or “-” in place of numbers. These entries need to be handled carefully, since they will result in missing values.
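One way to handle these placeholders is to convert them to an explicit missing value at read time, so they can never be mistaken for real counts. The sketch below is an assumption about how that might look (`parse_count` is a hypothetical helper, and using `None` as the missing marker is just one reasonable choice):

```python
def parse_count(raw):
    """Convert a raw cell to an int, or None when the value was withheld."""
    text = raw.strip().replace(",", "")  # tolerate "1,200" style numbers
    if text in {"", "-"} or text.startswith("<"):
        return None  # "<10" and "-" carry no usable number
    return int(text)


cells = ["1,200", "45", "<10", "-"]
counts = [parse_count(cell) for cell in cells]
print(counts)

# Later steps must decide explicitly what to do with the missing values.
known = [count for count in counts if count is not None]
print(f"{len(counts) - len(known)} of {len(counts)} values are missing")
```

Whether to drop the missing rows or fill them with an estimate depends on the question being asked, so it helps to keep that decision separate from the parsing.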

This is also the time to put on your detective cap and start asking questions about the data. We looked at some keywords like “seomoz” and “page authority” that are branded, and some like “author rank” and “schema testing tool” that are not. After checking out the data, I asked myself, “Hmmm, I wonder if there is any difference in click-through rate between branded and non-branded keywords, or in average search position?”

Usually by this point I’m amped to start answering hard questions, but I try to resist the temptation to jump off the deep end until I run a few more sanity checks. Univariate analysis is a great tool to help you check yourself before going too far, especially since most software packages provide an easy way to do it and it often produces the first interesting results. The idea is to get a picture of what each variable “looks like” by plotting a histogram and calculating things like the mean.

The above chart shows an example of univariate analysis on our data. In each panel, we have plotted the distribution of one of the four variables in our data: Impressions, Average Position, Clicks, and CTR. We also included the mean of each distribution in the title. Immediately, we can see a few interesting comparisons. 

First, almost all of our keywords are “long-tail” with less than 100 searches/month. However, much of our traffic is also made up of a few high-volume keywords (>1000 searches/month). The average position is concentrated in the top 10 as expected (since results off the first page send very little traffic). This is also a good check of our data. If we had seen a significant number of keywords sending traffic at ranks lower than #10, we would need to investigate further. Finally, the CTR in the lower right is interesting. Most of the keywords have a CTR of less than 40%, but we do have a few high-volume keywords with much higher CTR.
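If you don’t have a plotting package handy, even a text histogram gives you the same kind of picture. The sketch below uses only the Python standard library; the impression counts are invented and the bin width is an arbitrary choice:

```python
from collections import Counter
from statistics import mean


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    return Counter(int(v // bin_width) * bin_width for v in values)


def summarize(name, values, bin_width):
    """Print the mean and a text histogram for one variable."""
    print(f"{name} (mean = {mean(values):.1f})")
    for edge, count in sorted(histogram(values, bin_width).items()):
        print(f"  {edge:>6} to {edge + bin_width:<6} {'#' * count}")


# Invented monthly impressions: mostly long-tail, one high-volume keyword.
impressions = [12, 35, 40, 70, 90, 95, 150, 1200]
summarize("Impressions", impressions, bin_width=100)
```

In this toy data the single high-volume keyword pulls the mean far above what a typical keyword sees, which is exactly the kind of thing a histogram makes obvious and a lone average hides.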

By now, I usually feel pretty comfortable with the data and can jump in. At this point, I’ve found that asking specific questions is often the most productive way to answer bigger questions, but everyone works differently, so you’ll need to find what works best for you. In the case of the Google Webmaster Tools data, I’m curious about the impact of branded vs non-branded keywords.

One way to examine this is to segment the data and then repeat the univariate analysis for each segment. Here’s the plot for impressions:

We can see that, overall, branded keywords have a higher search volume than non-branded words (means of 380 and 160, respectively). It gets more interesting if we look at average position and CTR:

We see a huge difference in Average Position and CTR between the branded and non-branded words. Most of our traffic from branded words is in the top two or three positions, with non-branded queries sending traffic throughout the top 10. The CTR is also significantly different with a few branded keywords having very high CTR (60%+).
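Segmenting can be sketched as a labeling function plus a per-segment summary. The brand term list, the rows, and the helper names below are all invented for illustration; a real brand list would be specific to your site, and simple substring matching like this can mislabel keywords:

```python
from statistics import mean

# Hypothetical brand terms, based on the examples mentioned in this post.
BRAND_TERMS = ("seomoz", "page authority")


def is_branded(keyword):
    """True when the keyword contains any brand term as a substring."""
    lowered = keyword.lower()
    return any(term in lowered for term in BRAND_TERMS)


def segment_means(rows, field):
    """Mean of `field` for the branded and non-branded segments."""
    segments = {"branded": [], "non-branded": []}
    for row in rows:
        label = "branded" if is_branded(row["keyword"]) else "non-branded"
        segments[label].append(row[field])
    return {label: mean(vals) for label, vals in segments.items() if vals}


# Invented rows, for illustration only.
rows = [
    {"keyword": "seomoz", "position": 1.0, "ctr": 0.65},
    {"keyword": "page authority", "position": 2.0, "ctr": 0.45},
    {"keyword": "author rank", "position": 6.0, "ctr": 0.10},
    {"keyword": "schema testing tool", "position": 8.0, "ctr": 0.05},
]

for field in ("position", "ctr"):
    print(field, segment_means(rows, field))
```

The same `segment_means` call works for any numeric column, so repeating the univariate analysis per segment is a one-line change.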

We might also wonder about how the CTR changes with the search position. We expect that lower-ranking keywords will have a lower CTR. Can we see this in the data?

Indeed, the CTR drops off rapidly after the top five. There is an interesting bump up at position 15, but this is a data-sparse region, so this may not be a real signal.
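One way to keep yourself honest about sparse regions is to carry the sample size along with each result. In the sketch below, the rows are invented, `ctr_by_position` is a hypothetical helper, and the 100-impression threshold for calling a bucket sparse is an arbitrary assumption:

```python
from collections import defaultdict


def ctr_by_position(rows, min_impressions=100):
    """Aggregate CTR per rounded position, flagging sparse buckets."""
    totals = defaultdict(lambda: [0, 0])  # position -> [clicks, impressions]
    for row in rows:
        bucket = totals[round(row["position"])]
        bucket[0] += row["clicks"]
        bucket[1] += row["impressions"]
    return {
        pos: {"ctr": clicks / imps, "sparse": imps < min_impressions}
        for pos, (clicks, imps) in sorted(totals.items())
        if imps > 0  # skip buckets with no impressions to avoid dividing by zero
    }


# Invented rows: position 15 has a high CTR but almost no data behind it.
rows = [
    {"position": 1.2, "clicks": 300, "impressions": 1000},
    {"position": 1.4, "clicks": 100, "impressions": 1000},
    {"position": 5.1, "clicks": 20, "impressions": 500},
    {"position": 14.8, "clicks": 3, "impressions": 10},
]

for pos, stats in ctr_by_position(rows).items():
    flag = " (sparse)" if stats["sparse"] else ""
    print(f"position {pos}: CTR {stats['ctr']:.0%}{flag}")
```

With the flag in place, a bump like the one at position 15 is immediately labeled as resting on very few impressions rather than being read as a trend.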

4. Unit test your code (where it makes sense)

This is a software development best practice, but it can get a little sticky in the data science world and often requires judgement on your part. Unit testing everything is a great way to catch many problems, but it will really slow you down. It’s a good idea to unit test code that you think will be used again, that has a general purpose outside the specific project, or that has complicated enough logic that it would be easy to get wrong. It’s often not worthwhile to test code quickly written to check an idea.

In the case of the Google Webmaster Tools data, we decided to test the process that reads the data and fills missing values because the logic is somewhat complicated, but didn’t test our code to generate the plots since it was relatively simple. We used a small, synthetic data set to write the tests since it is easy to manage. Check out some of our tests here.
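To give a flavor of what such a test looks like, here is a small self-contained sketch using Python’s built-in `unittest` module. It is not our actual test code: `read_impressions`, the column name, and the choice to fill withheld values with a fixed number are all assumptions made for the example:

```python
import csv
import io
import unittest


def read_impressions(source, fill_value=0):
    """Read the 'impressions' column, replacing withheld values with fill_value."""
    values = []
    for row in csv.DictReader(source):
        raw = row["impressions"].strip()
        withheld = raw in {"", "-"} or raw.startswith("<")
        values.append(fill_value if withheld else int(raw))
    return values


class ReadImpressionsTest(unittest.TestCase):
    # A small synthetic data set that covers each case we care about.
    SYNTHETIC = "keyword,impressions\na,120\nb,<10\nc,-\n"

    def test_withheld_values_are_filled(self):
        values = read_impressions(io.StringIO(self.SYNTHETIC), fill_value=5)
        self.assertEqual(values, [120, 5, 5])

    def test_default_fill_is_zero(self):
        values = read_impressions(io.StringIO(self.SYNTHETIC))
        self.assertEqual(values, [120, 0, 0])


# Run the tests explicitly so the script keeps going afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReadImpressionsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The synthetic data is only three rows long, but it exercises a normal value and both placeholder formats, which is where the logic is most likely to go wrong.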

5. Document your process

This step can be annoying, but you will thank yourself a few months later when you need to revisit it. Documentation also communicates your thoughts to others who can check and validate your logic.

In our case, this blog post documents our process, and we provide some additional documentation in the README in the code.

6. Get feedback from others

Peer review is one of the cornerstones of the academic world, and other people’s insight is almost always beneficial to improving your analysis. Don’t hesitate to ask your team for feedback; most of the time, they’ll be happy to give it! 


Do you have any other helpful testing tips? What has worked for you and your team? I’d love to hear your thoughts in the comments below!



How I Wish Amazon Reviews Worked

Posted by Dr. Pete

This is not a post about SEO. It is, however, a post about the future of search. This surprised even me – when I started writing this piece, it really was just an idea about building a better review. I realized, though, that finding relevant reviews is a useful microcosm of the broader challenge search engines face. Specifically, I want to talk about three S’s – Social, Sentiment, and Semantics, and how each of these pieces fits the search puzzle. Along the way, I might just try to build a better mousetrap.

The Core Problem

Product reviews are great, but on a site as big and popular as Amazon.com, filtering reviews isn’t much easier than filtering Google search results. Here’s the review section for the Kindle Fire:

Kindle Fire on Amazon - 10,859 reviews

That’s right – 10,859 reviews to sort through. Even if I just decide to look at the 5 stars and 1 stars, that’s still 7,208 reviews. If I could click and skim each one of those 7,208 in about 5 seconds, I’ve got roughly 10 hours of enjoyment ahead of me (if I don’t eat or take bathroom breaks). So, how can we make this system better?

(1) The Social Graph

These days our first answer is usually: “SOCIAL!” Social is sexy, and it will solve all our problems with its sexy sexiness. The problem is that we tend to oversimplify. Here’s how we think about Search + Social, in our perfect world:

Search/Social Intersection = Sexy

Unfortunately, it’s not quite so magical. There are two big problems, whether we’re talking about product reviews or organic search results. The first problem is a delicate one. Some of the people that you associate with are – how shall I put it – stupid.

Ok, maybe stupid is a bit harsh, but just because you’re connected to someone doesn’t mean you have a lot in common or share the same tastes. So, we really want to weed out some of the intersection, like Crazy Cousin Larry…

Search/Social Intersection minus Crazy Cousin Larry

It’s surprisingly hard to figure out who actually sits at the Crazy-Larry table. Computationally, this is a huge challenge. There’s a bigger problem, though. In most cases, especially once we start weeding people out, the picture actually looks more like this:

Real Search/Social Intersection - Very Small

Even with relatively large social circles, the actual overlap of your network and any given search result or product is often so small as to be useless. We can extend our circles to 2nd- and 3rd-degree relationships, but then relevance quickly suffers.

To be fair to Amazon, they’ve found one solution – they elicit user feedback of the reviews themselves as a proxy social signal:

20,396 people found this review helpful

This approach certainly helps, but it mostly weeds out the lowest-quality offerings. Reviews of reviews help control quality, but they don’t do much to help us find the most relevant information.

(2) Sentiment Analysis

Reviews are a simple form of sentiment analysis – they help us determine if people view a product positively or negatively. More advanced sentiment analysis uses natural-language processing (NLP) to try to extract the emotional tone of the text.

You may be wondering why we need more advanced sentiment analysis when someone has already told us how they feel on a 1-5 scale. Welcome to what I call “The Cupholder Problem”, something I’ve experienced frequently as a parent trying to buy high-end products on Amazon. Consider this fictional review which is all-too-based in reality:

The Cupholder Problem (fake review)

I’m exaggerating, of course, but the core problem is that reviews are entirely subjective, and sometimes just one feature or problem can ruin a product for someone. Once that text is reduced to a single data point (one star), though, the rest of the information in the content is lost.

Sentiment analysis probably wouldn’t have a dramatic impact on Amazon reviews, but it’s a hot topic in search in general because it can help extract emotional data that’s sometimes lost in a summary (whether it’s a snippet or a star rating). It might be nice to see Amazon institute some kind of sentiment correction process, warning people if the tone of their review doesn’t seem to match the star rating.

(3) Semantic Search

This is where things get interesting (and I promise I’ll get back to sentiment so that the previous section has a point). The phrase “semantic search” has been abused, unfortunately, but the core idea is to get at the meaning and conceptual frameworks behind information. Google Knowledge Graph is probably the most visible, recent attempt to build a system that extracts concepts and even answers, instead of just a list of relevant documents.

How does this help our review problem? Let’s look at the “Thirsty” example again. It’s not a dishonest review or even useless – the problem is that I fundamentally don’t care about cupholders. There are certain features that matter a lot to me (safety, weight, durability), others that I’m only marginally sensitive to (price, color), and some that I don’t care about at all (beverage dispensing capability).

So, what if we could use a relatively simple form of semantic analysis to extract the salient features from reviews for any given product? We might end up with something like this:

Sample Review w/ Feature Extraction

Pardon the uninspired UI, but even the addition of a few relevant features could help customers drill down to what really matters to them, and this could be done with relatively simple semantic analysis. This basic idea also illustrates some of the direction I think search is heading.  Semantic search isn’t just about retrieving concepts; it’s also about understanding the context of our questions.
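To show just how simple a first pass could be, here is a toy sketch in Python that tags a review with the features it mentions using plain keyword matching. The feature vocabulary and the sample review are invented, and a real system would need far more than a hand-written word list:

```python
import re

# Hypothetical feature vocabulary; every term here is made up for the example.
FEATURE_TERMS = {
    "safety": {"safe", "safety", "harness"},
    "weight": {"weight", "heavy", "light", "lightweight"},
    "durability": {"durable", "durability", "sturdy", "broke"},
    "cupholder": {"cupholder", "cupholders"},
}


def extract_features(review_text):
    """Return the sorted feature names whose terms appear in the review."""
    words = set(re.findall(r"[a-z]+", review_text.lower()))
    return sorted(
        name for name, terms in FEATURE_TERMS.items() if words & terms
    )


review = "Sturdy frame and feels very safe, but there are no cupholders!"
print(extract_features(review))
```

Once each review carries a list of features, letting me hide everything tagged “cupholder” is just a filter.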

Here’s an interesting example from Google Australia (Google.com.au). Search for “Broncos colors” and you’ll get this answer widget (hat tip to Brian Whalley for spotting these):

Denver Broncos Colors (Google.com.au)

It’s hardly a thing of beauty, but it gets the job done and probably answers the query for 80-90% of searches. This alone is an example of search returning concepts and not just documents, but it gets even more interesting. Now search for “Broncos colours”, using the British spelling (still in Google.com.au). You should get this answer:

Brisbane Broncos Colors

The combination of Google.com.au and the Queen’s English now has Google assuming that you meant Australia’s own Brisbane Broncos. This is just one tiny taste of the beginning of search using concepts to both deliver answers and better understand the questions.

(4) Semantics + Sentiment

Let’s bring this back around to my original idea. What if we could combine semantic analysis (feature extraction) and sentiment in Amazon reviews? We could easily envision a system like this:

Reviews with Feature Extraction + Sentiment

I’ve made one small addition – a positive or negative (+/-) sentiment choice next to each feature. Maybe I only want to see products where people spoke highly of the value, or rule out the ones where they bashed the safety. Even a few simple combinations could completely change the way you digest this information.
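Here is an equally toy sketch of the combined idea: score each sentence with a tiny sentiment word list, then attach that score to any feature mentioned in the same sentence. The word lists, reviews, and function names are all invented, and real sentiment analysis is considerably harder than counting positive and negative words:

```python
import re

# Hypothetical vocabularies, made up for this example.
FEATURE_TERMS = {
    "value": {"price", "value", "worth"},
    "safety": {"safe", "safety", "unsafe"},
}
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "unsafe", "flimsy"}


def feature_sentiment(review_text):
    """Map each mentioned feature to '+', '-', or '?' using its own sentence.

    If a feature appears in several sentences, the last one wins.
    """
    result = {}
    for sentence in re.split(r"[.!?]+", review_text.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        for feature, terms in FEATURE_TERMS.items():
            if words & terms:
                result[feature] = "+" if score > 0 else "-" if score < 0 else "?"
    return result


def matches(review_text, feature, wanted):
    """True when the review expresses the wanted sentiment about the feature."""
    return feature_sentiment(review_text).get(feature) == wanted


reviews = [
    "Great value for the price. The buckle feels unsafe.",
    "Terrible value. Safety is excellent.",
]

# Show only the reviews that speak highly of the value.
print([review for review in reviews if matches(review, "value", "+")])
```

Even this crude version separates the two reviews cleanly, although both would collapse to a similar middling star rating.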

The Tip of the Penguin

This isn’t the tip of the iceberg – it’s the flea on the wart on the end of the penguin’s nose on the tip of the iceberg. We still think of Knowledge Graph and other semantic search efforts as little more than toys, but they’re building a framework that will revolutionize the way we extract information from the internet over the next five years. I hope this thought exercise has given you a glimpse into how powerful even a few sources of information can be, and why they’re more powerful together than alone. Social doesn’t hold all of the answers, but it is one more essential piece of a richer puzzle.

I’d also like to thank you for humoring my Amazon reviews insanity. To be fair to Amazon, they’ve invested a lot into building better systems, and I’m sure they have fascinating ideas in the pipe. If they’d like to use any of these ideas, I’m happy to sell them for the very reasonable price of ONE MILL-I-ON DOLLARS.



Evolution of the Local Algorithm – Whiteboard Friday

Posted by David Mihm

Remember the days when doing SEO for a local business was no different than doing SEO for any other business? We’ve come a long way since the early 2000’s, and local SEO has evolved tremendously since the beginning of online search. There are still many questions to be answered when it comes to the ever-changing landscape of local SEO: what are the factors Google is using to rank local businesses? Where should owners focus their energy? What’s the new hot thing for local rankings?

Our local expert, David Mihm, is here to shed some light on all of your burning local SEO questions. In today’s Whiteboard Friday, David discusses what factors affect ranking of local businesses, and how local algorithms within Google have evolved throughout the years. 



Video Transcription

“Hey everybody. David Mihm, the Director of Local Search Strategy for SEOmoz, here doing my very first Whiteboard Friday since joining the company, and for the very first one I thought I would start with one of the most common questions that I get asked about local search, which is:  What are the factors Google is using to rank local businesses? Where should I be focusing my energy? And also kind of how has that changed over time? What’s the new hot thing for local rankings?

So I thought I’d take you guys through kind of a brief history, from my perspective, of how the local algorithm has evolved at Google. So with the help of my handy dandy graph that I’ve sort of started to kick things off. Back in the late 1990’s, 2000-ish, when Google first came out, many of you who have been practicing SEO for that long kind of remember, hey back in those days doing SEO for a local business was no different than doing SEO for any other kind of business. Right?

You needed title tags telling what you did, where you did it, where you were located, and you needed links pointing at your site with those keywords embedded in those links, preferably from locally relevant websites. But really at that time any link that had good anchor text with location or product and service information, that’s how you ranked in those 10 blue link type search results.

Fast forward a little bit to January of 2008, many of you guys remember at that point Google introduced these 10 packs of local businesses right there in the main search results. So if you did a search for something like Portland injury lawyer, you’d see a map with 10 injury lawyers’ business listings rather than website information.

So that was really the first point at which we saw this concept of citation start to play a role in local rankings. So Google said, “Okay, well we know that there are 22 million businesses out there in the U.S. Less than half of them even have a website at this stage, so we have no way to gauge what the title tags are on a non-existent website, and it’s not possible to send a link to a business without a website.” Right?

So Google introduced this concept of citations where they sort of tracked mentions of a business across the web. So just someone mentioning the business name with its address, with its phone number, somewhere out there on the web would count essentially as a vote for that business, just like the way links count for votes on websites. So we started to see that play a pretty big role in these rankings for 10 packs soon after they were introduced.

Again, fast forward a little bit to March 2009. We started to see these 10 packs being introduced for generic queries, queries without geo-modifiers. So instead of typing in “Portland injury lawyer,” if you typed in something like “injury lawyer,” Google associated that as being a phrase with local intent. You were looking to hire somebody in your particular market, and they showed this for a ton of different phrases, things like restaurants, pizza, bakeries, things where they knew you were looking for a business in your area.

It was really about at this time that we started to see reviews play a little bit larger role. So what people were saying about you on some of these primary websites that businesses were getting cited on, places like Yelp, City Search, Urban Spoon, these types of sites the reviews that users were leaving really seemed to start to play more of a role in rankings.

And keep in mind that it’s not like all of a sudden the importance of title tags and links went away. It’s not like the importance of citations went away. But Google sort of layered on this additional ranking factor of user reviews, and not only user reviews at third party websites, like Yelp, City Search, the ones I mentioned, but also reviews left directly at Google Places. I’ll switch sides here for just a second.

That really started to come in to importance in April 2011, when Google rolled its Hot Pot product right into Google Places. So Google launched this Hot Pot product, essentially a precursor of Google+, where Google would surface businesses that your friends had rated higher in the search results. They launched that in about November of 2010. Just about six months after that, they integrated it right into Google Places, and again this was when we started to see especially reviews left directly at Google Places really start to play a more important role.

And then everybody remembers June 2012 or actually late May 2012, when Marissa Mayer announced Google+ Local prior to leaving to take the job at Yahoo. So right there in the search results we started to see Google+ information getting surfaced. So the number of circles that an author of a website was in and the number of circles that a local business had in its following, those types of things started to play a role. They still don’t seem to be quite as important as some of these other more traditional factors – title tags and links, user reviews, and citation information. But we do think going forward here I’ve got this sort of . . . to represent current time and some time in the future. We do think, most of us in the local search community, that Google will start to incorporate a few more of these Google+ signals into the local rankings.

And just to speculate a little bit, because I love to speculate, going forward I also think we’re going to see Google potentially integrating some offline information into the local rankings. So what do I mean by that? As we get more and more comfortable, we as a society get more and more comfortable with things like Foursquare check-ins or Facebook check-ins, using our phones to make mobile payments, using Google Wallet, or companies like Square or LevelUp, these types of things, loyalty programs, Google has acquired a company several years ago that focused on digital loyalty cards, these types of offline signals about how we’re actually engaging with businesses in the real world, I think there’s no reason that they wouldn’t try to incorporate those into their local rankings going forward.

So keep in mind through all of this Google’s goal has been to identify what the most popular businesses are in a given category, in a given community, and what better way to gauge popularity than the number of people actually buying something at a business or actually visiting a business and checking in.

So that’s why I kind of speculate that we will start to see offline signals maybe playing a role in the future, but for right now I kind of see title tags and links, reviews, citation information, all being about equal in importance, and going forward again I think we’ll start to see Google+ play a little bit more of a role as well as potentially these offline signals.

So that’s it for me from this week, and I hope you enjoyed this brief tour of the evolution of the local algorithm at Google.”

Video transcription by Speechpad.com

