Archives for: seo

Behind the Scenes of Fresh Web Explorer

Posted by dan.lecocq

Fresh Web Explorer is conceptually simple — it’s really just a giant feed reader. Well, just a few million of what we think are the most important feeds on the web.

At a high level, it’s arranged as a pipeline, beginning with crawling the feeds themselves and ending with inserting the crawled data into our index. In between, we filter out URLs that we’ve already seen in the last few months, and then crawl and do a certain amount of processing. Of course, this wouldn’t be much of an article if it ended here, with the simplicity. So, onwards!

The smallest atom of work the pipeline deals with is a job. Jobs are pulled off of various queues by a fleet of workers, processed, and then handed off to other workers. Different stages take different amounts of time and are best suited to certain types of machines, so it makes sense to use queues this way. Because of the volume of data that must move through the system, it's impractical to pass the data along with each job. In fact, workers are frequently uploading to and downloading from S3 (Amazon's Simple Storage Service) and just passing around references to the data stored there.
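To make that pattern concrete, here's a minimal sketch of a worker that passes S3 references instead of payloads. It's illustrative only (not the actual Fresh Web Explorer code); the bucket name, key layout, and use of boto3 are my own assumptions:

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "fwe-pipeline"  # hypothetical bucket name

def process_job(job_data):
    """A worker receives a small job payload and fetches the real data from S3."""
    # The job itself only carries a reference, e.g. {"s3_key": "crawl/batch-1234.json"}
    key = job_data["s3_key"]
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    pages = json.loads(body)

    results = transform(pages)  # stage-specific work (parsing, filtering, ...)

    # Upload the output; the next stage's job payload only needs this key.
    out_key = key.replace("crawl/", "processed/")
    s3.put_object(Bucket=BUCKET, Key=out_key, Body=json.dumps(results))
    return {"s3_key": out_key}

def transform(pages):
    # Placeholder for the real per-stage processing.
    return pages
```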

The queueing system itself is one we talked about several months ago called "qless." Fresh Web Explorer is actually one of the two projects for which qless was written (campaign crawl is the other), though it has since been adopted by other projects, from our data science team's work to others that are as yet unannounced. Here's an example of what part of our crawl queue looks like:

In each of the following sections, I'll talk about some of the hidden challenges behind many of these seemingly innocuous stages of the pipeline, as well as the particular ways in which we've tackled them. To kick this process off, we begin with the primordial soup out of which this crawl emerges: the schedule of our feeds to crawl.


Scheduling

Like you might expect on the web, a few domains are responsible for most of the feeds that we crawl. Domains like Feedburner and Blogspot come to mind, in particular. This becomes problematic in terms of balancing politeness with crawling in a reasonable timeframe. For some context, our goal is to crawl every feed in our index roughly every four hours, and yet some of these domains have hundreds of thousands of feeds. To make matters worse, this is a distributed crawl on several workers, and coordination between workers is severely detrimental to performance.

With job queues in general, it's important to strike a balance between too many jobs and jobs that take too long. Jobs sometimes fail and must be retried, but if a job represents too much work, a retry wastes a lot of that work. Yet if there are too many jobs, the queueing system gets bogged down simply maintaining the state of the queues.

To allow crawlers to crawl independently without coordinating page fetches with one another, we pack as many URLs from one domain as we can into a single job, subject to the constraint that the job can be crawled in a reasonable amount of time (on the order of minutes, not hours). For large domains, fortunately, the intuition is that if they're sufficiently popular on the web, they can handle larger amounts of traffic. So we pack all of these URLs into a handful of slightly larger-than-normal jobs to limit the parallelism, and so long as each worker obeys politeness rules, we get a close global approximation of politeness.
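As a rough illustration of that packing logic (my own sketch, not the production scheduler), you could group feed URLs by domain and cap each job at however many fetches fit into a few minutes at a polite crawl delay:

```python
from collections import defaultdict
from urllib.parse import urlparse

CRAWL_DELAY = 2          # seconds between requests to one domain (politeness)
MAX_JOB_SECONDS = 300    # keep each job to a few minutes of work
URLS_PER_JOB = MAX_JOB_SECONDS // CRAWL_DELAY

def pack_jobs(feed_urls):
    """Group URLs by domain, then split each domain into jobs of bounded size."""
    by_domain = defaultdict(list)
    for url in feed_urls:
        by_domain[urlparse(url).netloc].append(url)

    jobs = []
    for domain, urls in by_domain.items():
        # Large domains end up in a handful of bigger jobs rather than many
        # small ones, limiting how many workers hit them in parallel.
        for i in range(0, len(urls), URLS_PER_JOB):
            jobs.append({"domain": domain, "urls": urls[i:i + URLS_PER_JOB]})
    return jobs
```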

Deduping URLs

Suffice it to say, we’re reluctant to recrawl URLs repeatedly. To that end, one of the stages of this pipeline is to keep track of and remove all the URLs that we’ve seen in the last few months. We intentionally kept the feed crawling stage simple and filter-free, and it just passes _every_ url it sees to the deduplication stage. As a result, we need to process hundreds of millions of URLs in a streaming fashion and filter as needed.

As you can imagine, simply storing a list of all the URLs we’ve seen (even normalized) would consume a lot of storage, and checking would be relatively slow. Even using an index would likely not be fast enough, or small enough, to fit on a few machines. Enter the bloom filter. Bloom filters are probabilistic data structures that allow you to relatively compactly store information about objects in a set (say, the set of URLs we’ve seen in the last week or month). You can’t ask a bloom filter to list out all the members of the set, but it does allow you to add and query specific members.

Fortunately, we don't need to know all the URLs we've seen; we just need to answer the question: have we seen _this_ URL or _that_ one? Bloom filters have a couple of downsides: 1) they don't support deletions, and 2) they have a small false positive rate. The false positive rate can be controlled by allocating more space in memory, and we've limited ours to 1 in 100,000. In practice it often turns out to be lower than that limit, but it's the highest rate we're comfortable with. To get around the inability to remove items from the set, we resort to other tricks.

We actually maintain several bloom filters: one for the current month, another for the previous month, and so on. We only add URLs to the current month's filter, but when filtering URLs out, we check each of the filters for the last _k_ months. To distribute these operations across a number of workers, we use an in-memory (but disk-backed) database called Redis and our own Python bindings for an in-Redis bloom filter, pyreBloom. This lets us filter tens of thousands of URLs per second and keep pace.
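Here's a sketch of that monthly rotation, assuming a pyreBloom-style interface (a Redis-backed filter keyed by name, with contains/extend methods). The exact constructor and method signatures may differ from the real bindings, and the capacity numbers are made up:

```python
from datetime import date
from pyreBloom import pyreBloom   # assumed pyreBloom-like interface

CAPACITY = 100_000_000   # expected URLs per month (illustrative)
ERROR = 0.00001          # 1-in-100,000 false positive rate
MONTHS = 3               # how far back we check before recrawling

def _month_keys(n):
    """Redis keys like 'seen:2013-03' for the current month and the n-1 before it."""
    y, m = date.today().year, date.today().month
    keys = []
    for _ in range(n):
        keys.append("seen:%04d-%02d" % (y, m))
        y, m = (y, m - 1) if m > 1 else (y - 1, 12)
    return keys

def unseen(urls):
    """Return URLs not found in any recent month's filter, adding them to the current one."""
    filters = [pyreBloom(key, CAPACITY, ERROR) for key in _month_keys(MONTHS)]
    fresh = [u for u in urls if not any(f.contains(u) for f in filters)]
    if fresh:
        filters[0].extend(fresh)   # only the current month's filter ever grows
    return fresh
```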

Crawling

We’ve gone through several iterations of a Python-based crawler, and we’ve learned a number of lessons in the process. This subject is enough to merit its own article, so if you’re interested, keep an eye on the dev blog for an article on the subject.

The gist of it is that we need a way to efficiently fetch URLs from many sources in parallel. In practice for Fresh Web Explorer, this is hundreds or thousands of hosts at any one time, but at peak it’s been on the order of tens of thousands. Your first instinct might be to reach for threads (and it’s not a bad instinct), but it comes with a lot of inefficiencies at the expense of conceptual simplicity.

There are relatively well-known mechanisms for the ever-popular asynchronous I/O. Depending on the circles in which you travel, you may have encountered some of them: Node.js, Twisted, Tornado, libev, libevent, etc. At their root, these all rely on two main kernel interfaces: kqueue or epoll (depending on your system). The trouble is that these expose a callback interface, which can make it quite difficult to keep code concise and straightforward. A callback is a function you've written that you give to a library to run when it's done with its processing. It's something along the lines of saying, 'fetch this page, and when you're done, run this function with the result.' While this doesn't always lead to convoluted code, it can all too easily lead to so-called 'callback hell.'

To our rescue comes threading's lesser-known cousin, the coroutine, incarnated in gevent. We've tried a number of approaches, and in particular we've been burned by the aptly-named Twisted. Gevent has been the sword that cut the Gordian knot of crawling. Of course, it's not a panacea, and we've written a lot of code to make common crawling tasks easy: tasks like URL parsing and normalization, and robots.txt parsing. In fact, the Python bindings for qless even have a gevent-compatible mode, so we can keep our job code simple and still make full use of gevent's power.
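For a small taste of what that looks like, here's a minimal gevent-based fetcher (a sketch, not our crawler): monkey-patching makes the blocking requests calls cooperative, and a pool bounds the per-worker concurrency.

```python
import gevent.monkey
gevent.monkey.patch_all()   # make socket/ssl calls cooperative

import requests
from gevent.pool import Pool

def fetch(url):
    try:
        response = requests.get(url, timeout=10)
        return url, response.status_code, response.text
    except requests.RequestException as exc:
        return url, None, str(exc)

def crawl(urls, concurrency=200):
    """Fetch many URLs concurrently with coroutines instead of threads."""
    pool = Pool(concurrency)
    # imap_unordered yields results as they complete, keeping memory flat.
    for url, status, body in pool.imap_unordered(fetch, urls):
        yield url, status, body

if __name__ == "__main__":
    for url, status, _ in crawl(["http://example.com/feed"] * 5):
        print(status, url)
```

In a real crawler you would add per-domain politeness delays and robots.txt checks around the fetch call; the point here is only that the code reads like ordinary blocking code while the I/O happens concurrently.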

A few crawlers are actually all it takes to maintain a steady state for us, but we've had periods where we wanted to accelerate crawling (to work through backlogs, or to recrawl when experimenting). As an example of the kind of power coroutines offer, here are some of our crawl rates for various status codes, scaled down to 10%. This graph is from a time when we were using 10 modestly-sized machines; while maintaining politeness, they sustained about 1,250 URLs/second including parsing, which amounts to about 108 million URLs a day at a cost of about $1 per million. Of course, this step alone is just a portion of the work that goes into making Fresh Web Explorer.
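A quick sanity check on those numbers:

```python
urls_per_second = 1250
urls_per_day = urls_per_second * 60 * 60 * 24      # 108,000,000 URLs per day
cost_per_day = urls_per_day / 1_000_000 * 1.00     # ~$108/day at $1 per million URLs
print(urls_per_day, cost_per_day)
```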

Dechroming

There's a small amount of processing associated with our crawling: parsing the page, looking at some headers, and so on. But the most interesting part of this stage is the dechroming: trying to remove all the non-content markup in a page, from sidebars to headers to ads. It's a difficult task, and no solution will be perfect. Despite that, through numerous hours and great effort (the vast majority of which has been provided by our data scientist, Dr. Matt Peters), we have a reasonable approach.

Dechroming is an area of active research in certain fields, and there are certainly some promising approaches. Many of the earlier approaches (including that of blogscape from our tools section, Fresh Web Explorer’s predecessor) relied on finding many examples from a given site, and then using that information to try to find the common groups of elements. This has the obvious downside of needing to be able to quickly and easily access other examples from any given site at any given time. Not only this, but it’s quite sensitive to changes to website markup and changes in chrome.

Most current research focuses instead on differentiating chrome from content with a single page example. We actually began our work by implementing a couple of algorithms described in papers. Perhaps the easiest to understand conceptually is one in which you build a distribution of the amount of text per block (blocks don't necessarily have a 1:1 correspondence with HTML tags) and then find the clumps within it. The intuition is that the main content tends to come in larger sequential blocks of text than, say, comments or sidebars. In the end, our approach ended up being a combination of several techniques, and you can find out more about it in our "dragnet" repo.
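To make that intuition concrete, here's a toy version of the text-density idea (emphatically not the dragnet implementation): measure the text in each block-level element and keep the largest contiguous clump of text-heavy blocks. The tag list and threshold are arbitrary choices for illustration.

```python
import lxml.html

BLOCK_TAGS = {"p", "td", "li", "pre", "blockquote"}
MIN_TEXT = 80   # characters; shorter blocks are treated as likely chrome

def dechrome(html):
    """Very rough content extraction: keep the longest run of text-heavy blocks."""
    doc = lxml.html.fromstring(html)
    blocks = [el.text_content().strip()
              for el in doc.iter()
              if el.tag in BLOCK_TAGS]

    best, current = [], []
    for text in blocks:
        if len(text) >= MIN_TEXT:
            current.append(text)
        else:
            if sum(map(len, current)) > sum(map(len, best)):
                best = current
            current = []
    if sum(map(len, current)) > sum(map(len, best)):
        best = current
    return "\n\n".join(best)
```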


All told

Fresh Web Explorer has been in the works for a long while — perhaps longer than I’d care to admit. It has been rife with obstacles overcome (both operational and algorithmic) and lessons learned. These lessons will be carried forward in subsequent iterations and future projects. There are many changes we’d like to make given this hindsight and of course we will. Refactoring and maintaining code is often more time-consuming than writing the original!

The feedback from our community has generally been positive so far, which is encouraging. Obviously we hope this is something that will not only be useful, but also enjoyable for our customers. The less-than-positive feedback has highlighted some issues we're aware of, most of which are high on our priority list, and it leaves us raring to go and make it better.

On many points here there are many equally valid approaches. While time and space don’t permit us to present a complete picture, we’ve tried to pull out the most important parts. If there are particular questions you have about other aspects of this project or why we chose to tackle an issue one way or another, please comment! We’re happy to field any thoughts you might have on the subject 🙂



The Guide to US Census Data for Local SEO

Posted by OptimizePrime

This post was originally in YouMoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

As tax time nears in the United States, it’s hard not to wonder what exactly all that money is being spent on. Instead of getting into politics, I’d rather describe something our taxes do pay for and how it can help you plan an effective local SEO strategy.

During the daily grind, we can become accustomed to exclusive data available to us only through analytics platforms, Webmaster tools accounts, and other resources requiring a username, a password, and a mother’s maiden name. This private-access mentality makes it easy to overlook that which is freely available to everyone – including our own Census. I’m Harris Schachter (you might know me better as OptimizePrime) and I’d like to show you not what you can do for your country, but what your country can do for you.

Uncle Sam's Data

All of the information and images presented from the US Census are free to reproduce unless otherwise noted.

Using Census Data When Planning Local Strategy

During the planning phase of a local strategy, you need to identify which specific localities will serve you best, whether it be local content, social media, community engagement (geographic community & company community), on-site optimization, off-site citation building, link building, or anything else that goes into local SEO.

By using census data, these viable hyper-local markets can be identified before you even publish a single tweet. You can plan micro-campaigns designed to match each of the various cities, counties, towns, or even city blocks in your selected location. This type of analysis is particularly important when considering where to open a new brick-and-mortar establishment.

Demographic data can guide everything from the language and reading level of your content, to the methods by which it should be distributed. Distinct personas for each of the geographic components can be made to help you visualize the potential customers within them. Once armed with this information, local strategies (including everything you are going to learn from GetListed+SEOmoz) can be applied with laser precision.

You can spend hours on Census.gov exploring the myriad databases and tables. It can be overwhelming, so I’ll just demonstrate three of the most useful resources. If you’re an international reader, let this guide serve to motivate you to seek out what is available through your government.

Since it's been cold lately, and Richmond has enough plaid and square glasses to rival Seattle, I'll use hipster snow boots as my example of a locally marketable product, targeting the 20-24 age group. I'll look for viable hyper-local markets in the Richmond area, since that is where I live and where I do most of my local SEO at Dynamic Web Solutions. I'll go through each of the three Census tools using this scenario.

1. Interactive Population Map

First is the Interactive Population Map. With this interactive map, you’re able to utilize population data at the most granular views. Currently the data is for 2010, but if you suspect a large population shift since the last data collection you can use proportions instead of volumes to make your observations. The image below shows counties, but you can view data at the following levels (from widest to most specific): national, Indian reservations, congressional districts, counties/municipalities, subdivisions, places, census tracts, census block groups, and census blocks (basically city blocks).

You can segment population data by age, race, ethnicity, and housing status, and compare these features to those of nearby locations.

How to use the Interactive Population Map:

  1. Head over to the map. Enter your location into the Find field, place your area of interest within the cross hairs, and use the on-screen controls to adjust the view and detail level.
  2. Choose any of the segmentation tabs, select a location, and click Compare.
  3. You can compare up to 4 locations to examine their demographics side by side. Once in the compare screen, you can flip between the tabs to view populations by age, race, ethnicity, or housing status for each of your chosen locations.

In my example, I chose Richmond City and the nearby counties of Henrico, Chesterfield, and Hanover. Since my hipster snow boots business isn’t concerned with any specific ethnicity, race, or housing status, I’ll flip over to age since I am primarily focused on the 20-24 age group.

From the table, I can see the city of Richmond has more people in my target demographic (20-24) than the three neighboring counties. Interesting.

2. County and Business Demographics Interactive Map

Next up is the County and Business Demographics Interactive Map (or CBD Map). This is similar to the interactive population map, but provides more robust information in addition to population, race, ethnicity, age/sex, and housing status. This map layers in three business demographics: industries, business patterns, and raw counts of establishments per industry.

Industries are the general market classifications, such as Accommodation and Food Services, Construction, Manufacturing, Health Care, Real Estate, etc. Business patterns contain data on annual payroll, and employee counts (within a location or industry).

The CBD Map is limited to the county level, but the additional information makes it an essential tool to decide where to focus your marketing efforts. This map can display the number of establishments in each industry, in each location. The capability for local competitive analysis is priceless.

How to use the CBD Map:

  1. Head over to the map. Enter your city, state or zip code into the Find field. It should automatically switch to the County view on the left (under Geographic Levels). Choose any of the top demographic tabs – any one will do for now.
  2. Select a location and click “Compare” at the bottom of the window.
  3. In the new window that appears, click “Add Topic” to choose your areas of interest.
  4. Once you have your topic areas chosen, go back to the map and select up to 4 more locations.

Going back to our cooler-than-snow snow boots business, I chose Retail Trade from Industries, 20-24 from Age/Sex, and Total Establishments from business patterns. In addition to Richmond City, I again picked the neighboring counties of Henrico, Chesterfield, and Hanover.

The goal while using the CBD map is to identify areas with large shares of your target demographic, but low business counts for your industry. This is a good indicator of areas with many potential customers, but low competition for them. Using the table, I can do some quick math to rank the four locations along these criteria. The comparison metric to use in this instance is number of (20-24 year old) people per retail trade establishment.

Richmond has 33, Chesterfield has 19, Henrico has 14, and Hanover has 15. Richmond has the greatest number of potential customers per establishment, suggesting comparatively low competition for retail store customers. Interesting.
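If you'd rather script the comparison than do the math by hand, the ratio is trivial to compute. This is just an illustrative sketch; the inputs are whatever target-demographic population and establishment counts you read off the CBD map:

```python
def rank_by_customers_per_store(counts):
    """counts maps location -> (target-demographic population, establishments)."""
    ratios = {loc: round(people / stores) for loc, (people, stores) in counts.items()}
    # Higher is better: more potential customers per competing establishment.
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)

# Feed in the 20-24 population and retail trade establishment counts for each
# county; the figures in this post work out to roughly Richmond 33,
# Chesterfield 19, Hanover 15, and Henrico 14 people per establishment.
```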

3. US Economic Census

The final data table is the Economic Census within the American FactFinder collection. This is the most powerful database of the three, and also the most complicated. It contains everything the interactive maps offer, but at a much more granular level. Specifically, industries can be broken down further by individual product or service, and by how many establishments offer them in any given area. This resource also contains a search bar – a familiar face in an unfamiliar environment.

The FactFinder database is rather complex, so I'll dive right into how to use it. Because this one is so detailed, the accessibility and recency of the data are highly variable, so you may have more or less than what I've found.

How to use the Economic Census:

Step 1. Visit the American FactFinder database. Don’t be tempted to use the search bar just yet.

Step 2: Program

  1. Choose the Topics tab.
  2. Select “Economic Census.”

Step 3: Location

  1. Choose your location by selecting the Geographics tab. Use the “Geographic type” dropdown, and pick your level of detail.
  2. Select State from the next dropdown. I’ve selected County and Virginia respectively.
  3. Pick your actual locations from the next dropdown.
  4. Use the “Add to your selections” button to select your criteria.
  5. You’ll see the chosen options in the left sidebar under “Your Selections.”

Step 4: Industry

  1. Select your industry by finding the North American Industrial Classification System (NAICS) number under Industry Codes.
  2. Do this by using the search bar to find your business. This one is much more detailed than the industry selections of the Interactive Map, so try a few queries until you get a solid match.
  3. For my example, I first searched for "boots" with no luck. I then tried "shoes" and found code 4482 for "shoe stores." Check off the applicable industry code and click "Add."
  4. Close this window to reveal your search results.

Step 5: Database Results

  1. First, review your selections in the left sidebar.
  2. Check off the source most applicable to you, and make sure it is the most recent version.
  3. Select View.

Step 6: Data!

Finally, we've got the goods. First of all, I should warn you not to use your browser's back button – all of your selections will be lost and the process starts over again. Instead, take note of the "Return to Advanced Search" button. Use this if you want to go back to the search options.

Check out the data columns. Specifically, the most important are: geographic area, number of establishments, and sales.

Data Collected From the Economic Census

Due to the sheer number of search options, every research endeavor will be different. My results had three of the four locations in a data table, and the most recent data is from 2007. Immediately, we can see Chesterfield had 31 shoe stores, Henrico had 47, and the city of Richmond had 32. This is another good indicator for the city of Richmond, since it shows a relatively low number of shoe stores, and we already know it has the greatest volume of our target demographic.

Now let’s look at the sales column for the total sales each location’s shoe stores generated. Using our county population data from earlier, we can calculate how much the average person spends on shoes (customer value) in each location. Keep in mind the population numbers are from 2010 while the sales figures here are from 2007, but hey, we’re just making estimations.

Sales, Population, and Businesses

Divide the sales figures by the total county population. I found the average person in Richmond to be worth $101, in Henrico $174, and in Chesterfield $85. Of the three locations, the average Henrico resident spends the most on shoes. But what about our target demographics of 20-24 year olds?

For this calculation, we'll take the percentage of the total population that our target demographic represents and apply it to each location's sales. Although this assumes the different age groups purchase shoes at the same rate, it gives us an estimated percentage of sales contributed by our target demographic.

From the first analysis, we found the 20-24 year old group made up 13% of Richmond’s population, 6% of Henrico’s population, and 6% of Chesterfield’s population. After applying these percentages to total shoe sales, we find our target demographic spending $2.7M in Richmond, $3.2M in Henrico, and $1.6M in Chesterfield.

At this point, it might seem wiser to go after Henrico County, since the target demographic spends the most on shoes there, in total. Given the sheer amount of money spent on shoes in that county, I might consider a separate strategy to attract Henrico’s business.

However, keep in mind Henrico has 47 shoe stores, while Richmond only has 32, and Chesterfield has 31. Taking this competitive information into account, we can compute the sales generated by the target demographic for each store in each location. The data translates into $84k in sales per Richmond shoe store, $68k per Henrico store, and $52k per Chesterfield store. This suggests individual shoe stores in Richmond generate more sales from our target demographic than they do in the other two nearby counties. In-ter-est-ing.
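For anyone who wants to reproduce that back-of-the-envelope math, here's the final step in a few lines of Python, using the target-demographic sales and store counts quoted above (a sketch, not an official calculator):

```python
# Target-demographic shoe spending (Economic Census sales multiplied by each
# county's 20-24 population share) and 2007 shoe store counts from above.
target_sales = {"Richmond": 2_700_000, "Henrico": 3_200_000, "Chesterfield": 1_600_000}
shoe_stores  = {"Richmond": 32,        "Henrico": 47,        "Chesterfield": 31}

for county in target_sales:
    per_store = target_sales[county] / shoe_stores[county]
    print("%s: ~$%dk in target-demo sales per store" % (county, round(per_store / 1000)))

# Richmond ~$84k, Henrico ~$68k, Chesterfield ~$52k: each Richmond store
# captures the most spending from the 20-24 crowd.
```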

Analysis Results

After three rounds of analysis, Richmond looks like the ideal place to set up a shoe store (especially one that sells supa-fly snow boots to young adults).

So, what have we learned from all this? From the data available, I’ve found:

  1. Richmond has a greater volume of people in the target demographic than neighboring counties.
  2. Richmond has more potential customers within the target demographic per retail store than neighboring counties.
  3. A shoe store in Richmond generates more sales from the target demographic than a shoe store in a neighboring county.

Apply the Insights

Now that you’ve identified the most viable business locations, it’s time to incorporate these findings into local strategies. Go after these promising localities by gaining relevancy and ranking through a variety of methods, including:

  • Hosting events in the chosen location to establish an audience.
  • Building inbound links from sites which rank well in/for the target area (and be sure to diversify these links).
  • Doing competitive analysis for the most visible websites in the locations uncovered by the analysis. Go through their backlink profiles for relevant links and try to attain them too.
  • Encouraging customers to leave reviews, with specific attention to people in the targeted areas. Include the reviewer’s location in the review itself to gain more trust and influence among the potential customers.
  • Engaging with prospects in the identified locations through social media. Find them through various tools like Followerwonk’s Twitter bio search and get the conversation going.
  • Creating content specific to the viable locations. Dedicate a section of your blog for things to do and see in the area, why you like doing business there, interview citizens, government officials, or well known residents. Publishing content about the area can gain you exposure well before your visitors are even looking for your products or services. Once aware of your business, they’ll likely keep you in mind at some point down the road.
  • Optimizing your content with traditional on-site methodologies for the locations uncovered in the analysis (but don’t overdo it).
  • Developing press releases specifically for the target locations and distributing them to online sources like chambers of commerce, colleges and universities, local newspapers, free publications, etc.
  • Considering mobile users and making sure your site delivers a satisfactory experience for people in the targeted areas. Local and mobile go hand in hand.
  • Finally (and possibly the most effective in the long-run) is to consider opening a physical store within the location.
    • Claim all profiles and listings from data aggregators using consistent NAP citations.
    • Use consistent NAP citations on the website itself.
    • Consider including the name of the location in a brand name.
    • Utilize rich snippets to take full advantage of your new location in the SERP
    • Complete your Google+ Local page with proper categorization, and mentions of the location within the business description.
    • Modify social media profiles to include this new location.

I encourage you to explore Census.gov, and subscribe to the Census RSS feed to make sure you don’t miss any of their interesting publications. They recently released mobile apps for the true geeks out there (I recommend the iPad app “America’s Economy”). Also be sure to check out the data visualization gallery to learn something new or just to get some data vis inspiration.

So the next time a Census taker knocks on your door, answer it! You never know what type of product or business you’ll be working with in the future, but chances are good that you’ll have data for it.



Testing: Moving Our Industry Forward

Posted by Geoff Kenyon

Over the past few years, our industry has changed dramatically. We have seen some of the biggest removals of spam from the search results, and a growing number of people are starting to focus on building quality links rather than just building links. Companies are starting to really invest in content, and sites are building better category pages and are improving their product descriptions. Content marketing is now a “thing.” A big thing.

However, while all these changes are great, it seems as if we have stopped testing and simply adopt new ideas as they come. While I know there are many exceptions to this generalization, I see the trend too often. Most SEOs work off of best practices, and while this is good (who can argue with good page titles, headlines, copy, crawlable paths to content, and good links?), we need to continue to refine these areas for the best results.

A great example of this sort of refinement is ranking factors research. A few years back, SEOmoz did some testing around H1s vs. H2s and found that the H1 doesn't provide an added benefit. Whether or not you agree with that conclusion, this example shows how factors can (potentially) change over time.

Over the last few years, Google has rolled out updates that have had a significant impact on search: the canonical tag, hreflang, and rich snippets/support for schema, just to name a few. While there have been tests on these updates, we need to continue to test and keep our knowledge current. Google is continually testing new things, and we need to rely on testing to keep up. For example, back when Search Quality Updates were a thing, Google would "share" names and descriptions of updates and tests to the search algorithm. Frequently, there were 30-40 updates a month that they were rolling out or testing.

As you already know, this means there is huge potential for a high number of changes to the algorithm. We need to be testing (new things and old) to make sure we’re staying current. 

Share your results

In addition to all the updates we are aware of, there is a lot that Google isn't telling us. This is what makes testing and sharing even more important. Barry Schwartz pointed out on Search Engine Roundtable that Google left some important items out of their August/September update. Further, there are updates that Google will deny. If it weren't for people carefully watching and analyzing the SERPs and then sharing their tools (like Dr. Pete's MozCast), we would probably be largely unaware of much of this activity.

If we don't share our observations after testing, we face two problems. First, we can't confirm and verify what we see (and believe), and second, we can't move our industry forward. While the SEO industry is evolving and SEO is gaining more widespread acceptance, it is still seen by many as a mystery and a dark art. By sharing our tests and results, we educate the industry as a whole and raise not only the bar, but also our collective reputation. If we can retire bad practices and low-value tactics, we bring more credibility to the industry.

Share your failures

We all want to conduct awesome, breakthrough tests; it's really exciting to learn new stuff. However, we have a tendency to share only our successes rather than our failures. No one really wants to share failure, and it's natural to want to "save face" when your test doesn't go according to plan. But the fact remains that a test that "fails" isn't a failure.

There is so much we can learn from a test that doesn't go as expected (and sometimes we don't know what will happen). Further, sharing the "failed" results can lead to more ideas. Last week, I posted about 302s passing link equity. I began that test because my first test failed: I was trying to see if a page that was 302'd to another page would retain its rankings. It didn't work, and the page I was testing dropped out of the SERPs, but it was replaced by the page on the receiving end of the redirect. This result led me to test 302s against 301s. On top of that, there was a really good comment from Kane Jamison about further tests to run to gain a better understanding. If I hadn't shared my "failed" results, I would have never learned from my mistakes and gained knowledge where I least expected it.

Below are a few other tests I’ve run over the years that ended up with “failed” results. I hope you can learn as much from them as I did.

Keyword research with Adwords

For this test, I needed to provide a comparison of head vs. long-tail search volume related to tires. I had heard, at one point, that you could use Adwords impression data for keyword research. I decided to give it a try. I whipped up a rock-solid domain and set up a broad match Adwords campaign.

Tires^4!

(People even signed up!)

It didn't work. While we got a lot of impressions, we couldn't access the data: all of the impression data we wanted was lumped into a category called "Other Search Terms."

Lesson learned: Adwords impression data isn’t great for keyword discovery, at least in the capacity that we tried to use it.

Keywords in H2 tags

A few years back, I wanted to see if there was any advantage to placing an H2 tag around keywords in the content. The keywords were styled to look the same as the normal text; the only difference was the H2 tag. I rolled this out on about 10,000 pages and watched the results for a few months.

What did I find? Nothing. Exactly the same as the control group. Still, lesson learned. 

Link title element

This failed test is actually one of Paddy Moogan’s. He wanted to test the link title element to see if that passed any value. He set the title to ‘k34343fkadljn3lj’ and then checked to see if the site improved its ranking for that term.

There was no improvement.

Later, he found out that Craig’s site was actually down, so it probably wouldn’t be ranking regardless of how it was linked to. This brings up a really important point in testing: double check everything, even the small points. It can be really frustrating to run a test and then realize it was all for nothing. 

Your “failed” tests

We’ve all been there, so it’s time to share your story. What have you recently tested that didn’t turn out exactly how you planned? If we can all learn from the mistakes of others, we’re in a better place. Drop a line in the comments and let us all know!



6 Ways to Use Fresh Links & Mentions to Improve Your Marketing Efforts – Whiteboard Friday

Posted by randfish

This week, we announced the release of our newest tool, Fresh Web Explorer. We’re so excited to give marketers incredibly recent data in a tool to keep track of their mentions and links in a scalable way.

In today’s Whiteboard Friday, Rand walks us through improving our marketing through fresh links and mentions, and he explains how you can use Fresh Web Explorer to achieve the best results. 

Excited about Fresh Web Explorer? Have questions you’d like answered? Leave your thoughts in the comments below!



Video Transcription

“Howdy SEOmoz fans, and welcome to another edition of Whiteboard Friday. This week, as you may know, we’ve been very excited to release Fresh Web Explorer. It’s one of our latest tools. We’ve been working on it for a long time. A lot of work and effort goes into that project. Huge congrats and thank you to Dan Lecocq and Tamara Hubble and to the entire team who has been working on that project. Kelsey and Carin and everyone.

So I wanted to take some time and talk through the value that marketers can get from Fresh Web Explorer, and not just from Fresh Web Explorer, because I realize it's one of a set of tools, but also from things like doing regular Google last-24-hours searches to look for brand mentions and links, or using other tools like Radian6, uberVU, or Raven Tools' fresh links and fresh mentions sections. You can do a lot of these things with any of those tools.

I’m going to focus on Fresh Web Explorer for this part, but you can extrapolate out some ways to use this stuff in other tools too.

So number one, one of the most obvious ones is trying to find opportunities for your brand, for your site to get coverage and press, and that often will lead to links that can help with SEO, lead to co-occurrence citations of your brand name next to industry terms, which can help with SEO, could help with local for those of you who are doing local and have local businesses mentioned. It certainly can help with branding and brand growth, and a lot of times helps with direct traffic too.

So, when I perform a search inside Fresh Web Explorer, I’m getting a list of the URLs and the domains that they’re on, along with a feed authority score, and I can see then that I can get all sorts of information. I can plug in my competitors and see links, who’s pointing to my competitor’s sites. Perhaps those are opportunities for me to get a press mention or a link. I can see links to industry sites. So, for example, it may not be a competitor, but anyone who’s doing coverage in my space is probably interesting for me to potentially reach out to build a relationship with.

Mentions of industry terms. If I find, you know whatever it is, print magazines that are on the web, or blogs, or forums, or news sites, feeds that are coming from places that are indicative of, wow, they’re talking about a lot of things that are relevant to my industry, relevant to my brand and to what our company’s doing, that’s probably an opportunity for a potential press mention.

Mentions of competitors brands. If a press outlet is covering, or a blog or whoever, is covering one of your competitors, chances are good that you have an opportunity to get coverage from that source as well, particularly if they try to be editorially balanced.

Mentions of industry brands. It could be that you're in an industry where you're not necessarily competitive with someone, but you still want to find the people who are relevant to your brand. So for example, for us this could include a brand like Gnip or a brand like HubSpot. We're not competitive with these brands, SEOmoz is not, but they are industry brands, and places that cover Gnip and HubSpot may indeed cover Moz as well.

Number two, I can find some content opportunities, opportunities to create content based on what I’m discovering from Fresh Web Explorer. So I plugged in “HTC One,” the new phone from HTC, and I’m looking at maybe I can curate and aggregate some of the best of the content that’s been produced around the HTC One. I can aggregate reviews, get really interesting information about what’s coming out about the phone. I might even be able to discover information to share with my audience.

So, for example, we focus on SEO topics and on local topics. If we expect the HTC One to be big and we want to cover several different phones and how that’s affecting the mobile search space, we can look at their default search providers, what sorts of things they do in terms of voice search versus web search, whether they have special contracts and deals with any providers to be tracking that data and who that might be going to, all those kinds of things, and we can relate it back to what we’re doing in our industry.

You can also use Fresh Web Explorer to find the best time to share this type of information. So, for example, the HTC One comes out and maybe you're working for a mobile review site and you're like, "Oh, you know what? This has already been covered to death. Let's do something else this week, or let's cover some other stuff. Maybe we'll hit up the HTC One." Or, "Boy, you know what? This is just starting to get hot. Now is a great time to share. We can get on Techmeme and get the link from there. We can be mentioned in some of the other press coverage. We still have a chance, a shot to cover this new technology, new trend early on in its life cycle."

Number three, we can track fresh brand and link growth versus our competitors. So a lot of the time one of the things that marketers are asking themselves, especially in the inbound field is, “How am I doing against my competition?” So I might be Fitbit, which is a Foundry cousin of ours. They’re also funded by Foundry Group. They compete with the Nike FuelBand, and they might be curious about who’s getting more press this week. We released a new version of the Fitbit, or we’re about to, or whatever it is, and let’s see how we’re doing against the Nike FuelBand. Then when we have our press release, our launch, let’s see how that compares to the coverage we’re getting. Where are they getting covered that we are not getting covered? Where are we getting coverage where they are not?

We can then use things like the CSV Export feature, which is in the top right-hand corner of the Fresh Web Explorer, and we can look at CSV Export to do things like, “Oh, I want to filter out these types of sites. Or I only want a report on the high feed authority sites versus the low feed authority one. So I want to see only the places where my coverage is high.”

A note on feed authority though. Be very careful here because remember that a great page on a great site might be discovered through a low quality feed. It could be that a relatively junky feed is linking to some high quality stuff. We’ll discover it and report on the feed authority of the source where we discovered it. So you may want to try using metrics like page authority and domain authority to figure out where are you being mentioned and is that a high quality site, not just feed authority.

All right. Number four. Find fresh sources that link to or mention two or more of your competitors, but don't mention you. Now, this has been a classic tool. We've had a tool in our library at Moz, which is similar to SEO Book's HubFinder. Ours is called the Link Intersect tool, and what you can do here is plug in something like a couple of ice cream brands and see what comes up. So "Full Tilt" and "Molly Moons" ice cream, and I actually want to put quotes around those brand names so that I don't get mentions every time someone mentions the Moon and the name Molly; that would pop in there, and that wouldn't be ideal. Then minus D'Ambrosio, which is the best Seattle ice cream shop obviously. It's a gelateria. It's fantastic. Side note, it's possible that it may be owned by my cousin-in-law, but shh, let's not tell anybody.

Okay, and then if I’m Marco over at D’Ambrosio Gelato, I can see where are Full Tilt and Molly Moons getting mentioned that aren’t mentioning me. If it’s, “Hey, there was an article in The Stranger about ice cream and they didn’t cover us.” And, “Hey the Capitol Hill blog didn’t cover us.” Maybe they don’t know that we also have a Capitol Hill location. We should get in there and talk to those folks. We should mention, maybe leave a comment, maybe just tweet at the author of the post, whatever it is and tell them, “Hey, next time you cover ice cream, you should also write about us.”

Number five. Compare sources' coverage. So this is actually a bit of a teaser, and I apologize for that. The site colon operator will not be available at launch. So when you're watching this video, you probably can't use the site colon operator to see different sources and to run a search like "CRO site:seomoz." However, it will be coming soon.

When it is, you’ll be able to compare, hey is SEOmoz or is HubSpot more active in covering the CRO topic? Are there different sources out there that maybe don’t have coverage of a topic and I could go and pitch them for a guest post? I could find those content opportunities. I could know if a topic is saturated or if it hasn’t been covered enough. Maybe I find sites or blogs that might be interested in covering a topic that I would like them to write about. I can see who’s covered and who hasn’t using this site colon operator to figure out the source and the level of coverage that they might have or not.

The last one, number six, is really about reporting. Fresh Web Explorer is going to show you these great sort of trends about how is a particular term or phrase or link doing, links to a site, mentions of a brand name, mentions of a phrase or an industry term, whatever it is. So I can plug in things like my brand, SD, which is our link operator for just seeing links to anything on the sub-domain. I can plug in my sub-domain, and then I can see, here’s how that’s gone over the past 7 days or 30 days. I can screen shot that and put it in a report. I can download using the export functionality. I can download the CSV and then filter or scrub.

A lot of times, for example, PR companies, companies that help you with your press will do this type of work. They’ll assemble this kind of reporting. In fact, at Moz we use a firm called Barokas here in Seattle. Every week they send us a report of here are all the places that you were mentioned, and here are places that mentioned industry terms and that kind of stuff, which is really nice, but you’re oftentimes paying a lot of money to get that reporting. You can actually do that yourself if you don’t have a PR company that you’re already using for this type of work. Of course, if you are a PR company, this might be an option for you to do that type of reporting.

These six only scratch the surface of what you can do with Fresh Web Explorer, and I don't doubt that there are hundreds of uses for the data inside Fresh Web Explorer that I haven't thought of yet. I really look forward to seeing some cool, creative uses from you guys out there, and I hope that you are enjoying the product. If you would like, please give us feedback. I know the team would love to hear from you on this, and they're constantly working and iterating and updating and adding in things like the site colon operator. So very cool.

Thank you very much, and we will join you again next week for another edition of Whiteboard Friday. Take care.”

 

Video transcription by Speechpad.com



The Google AdWords Landscape (Infographic)

Posted by Dr. Pete

We tend to think of AdWords as the domain of PPC specialists, but it’s becoming clearer and clearer that Google’s SERP advertising has a huge impact on the position and effectiveness of organic results. So, I wanted to ask a simple question – what does the AdWords “landscape” actually look like in 2013? In other words, where are the ads, how many are there, what combinations occur in the “wild”, and how often do they show up? I’ll dive into some details below, but the answer looks something like this (click the image for a full-sized view)…

The Google AdWords Landscape


The Methodology

We collected data from 10,000 page-one Google SERPs via Google.com on a weekday during normal business hours. Personalization was turned off, and the crawler emulated a logged-out Chrome browser. We parsed the major ad blocks (which have consistent DOM markers) and the links within those blocks. Keywords and categories were pulled from AdWords’ keyword tools, with 500 keywords coming from each of 20 categories.

A Few Caveats

Naturally, keywords pulled from the AdWords’ research tools are more likely to have commercial intent than the “average” keyword (if such a thing exists), so these percentages may not be indicative of the entire world of search queries. We did run these numbers at other time periods and on other days, and the results were fairly consistent.

These statistics were computed by unique queries, not by query volume. The results seem to be very similar, though. For example, we found ads on 85.2% of the queries crawled – if we weight those queries by Google’s “global” volume, we get ad penetration of 84.5%. The correlation between the presence of ads and query volume was virtually non-existent (r=-0.018). The correlation between the presence of ads and Google’s competition metric was high (r=0.874). This is probably not surprising, since “competition” is essentially defined by how many advertisers are vying for any given query.
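For the curious, the weighting and correlation figures come from straightforward calculations like the ones sketched below. The six rows of data here are invented for illustration; our real dataset had 10,000 queries.

```python
import numpy as np

# One row per crawled query: did page one show ads, Google's reported global
# volume, and Google's competition score. (These six rows are made up.)
has_ads     = np.array([1, 1, 0, 1, 0, 1])
volume      = np.array([90500, 1300, 720, 14800, 260, 5400])
competition = np.array([0.91, 0.78, 0.12, 0.85, 0.05, 0.66])

penetration = has_ads.mean()                        # share of unique queries with ads
weighted    = np.average(has_ads, weights=volume)   # same share, weighted by query volume
r_volume    = np.corrcoef(has_ads, volume)[0, 1]    # correlation: ads vs. volume
r_comp      = np.corrcoef(has_ads, competition)[0, 1]  # correlation: ads vs. competition

print(penetration, weighted, r_volume, r_comp)
```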

The Changing Landscape

This is only a snapshot of a rapidly changing picture. For example, paid shopping results are still relatively new, but we discovered them on almost 20% of the queries we crawled. Unlike the traditional AdWords blocks, paid shopping can appear in multiple positions and forms, including the larger, upper-right format previously reserved for Knowledge Graph.

Even traditional top ads are evolving, with ads showing extensions, expanded site-links, lead generation forms, etc.  Expect Google to experiment with new formats on the top and right, and to blend advertising into the Knowledge Graph area to increase CTR. This changing landscape will impact the efforts of people in both paid and organic search, so keep your eyes open, and don’t assume that this is something only the PPC team has to worry about.

I just wanted to thank Dawn Shepard for all her help putting together the infographic. I know it was probably a bit painful to hear “Make it kind of boring!”

