A Checklist for Native Advertising: How to Comply with the FTC’s New Rules

Posted by willcritchlow

The FTC recently published their updated rules (and more accessible “guidance”) on what constitutes a “misleading” native advert [PDF]. I’ve read them. I only fell asleep twice. Then, near the end, a couple of bombshells.

But first, the background.

Native ads and the FTC

For those who haven’t been following the trends closely, native advertising is a form of digital advertising whereby adverts are included “in-stream,” interspersed with regular editorial content.

On social platforms, this takes the form of “promoted” posts — including stories or videos in your Facebook stream or tweets in your Twitter stream from brands you didn’t explicitly follow or “Like.” See, for example, this Coursera ad in my Facebook stream:

Native ads are particularly interesting on mobile, where the smaller screens and personal nature tend to make anything that isn’t in-stream intrusive or irrelevant.

For publishers, native advertising looks more like brand content presented as whole pages rather than as banners or advertising around the edges of the regular content. It can take the form of creative content promoting the brand, brand-funded “advertorial,” or anything in between. See, for example, this well-labelled advertorial on Autocar:

You might notice that this is actually a lot like offline magazine advertising — where “whole page takeovers” are common, presented in the “stream” of pages that you turn as you browse. And as with the digital version, they can be glossy creative or advertorial that looks more like editorial content.

The big way that the digital world differs, however, is in the way that you find content. Most people turn pages of a magazine sequentially, whereas a lot of visitors to many web pages come from search engines, social media, email, etc. — essentially, anywhere but the “previous page” on the website (whatever that would even mean).

It’s this difference that has led the FTC to add additional regulations on top of the usual ones banning misleading advertising — the new rules are designed to prevent consumers from being misled if they fail to realize that a particular piece is sponsored.

For the most part, if you understood the spirit of the previous rules, the new rules will come as no surprise — the newest parts mainly relate to ensuring that consumers are fully aware of why they are seeing a particular advert (i.e. because the advertiser paid for its inclusion) and are clear on the difference between the advert and the editorial / unpaid content on the publisher’s site.

At a high level, it seems very reasonable to me — the FTC wants to see clear disclosures, and will assess confusion and harm in the context of consumers’ expectations and the ways in which confusion would cause consumers to behave differently, taking the rest of the publisher’s site into account:

The Commission will find an advertisement deceptive if the ad misleads reasonable consumers as to its nature or source, including that a party other than the sponsoring advertiser is its source. Misleading representations of this kind are likely to affect consumers’ decisions or conduct regarding the advertised product or the advertisement, including by causing consumers to give greater credence to advertising claims or to interact with advertising content with which they otherwise would not have interacted.

And, crucially:

The FTC considers misleadingly formatted ads to be deceptive regardless of whether the underlying product claims that are conveyed to consumers are truthful.

They summarize the position as:

From the FTC’s perspective, the watchword is transparency. An advertisement or promotional message shouldn’t suggest or imply to consumers that it’s anything other than an ad.

Subjectivity

I was interested to see the FTC say that:

Some native ads may be so clearly commercial in nature that they are unlikely to mislead consumers even without a specific disclosure.

While I think this would be risky to rely upon without specific precedents, it nicely shows more of the FTC’s intent, which seems very reasonable throughout this briefing.

Unfortunately, the subjectiveness cuts both ways, as another section says:

“…the format of [an] advertisement may so exactly duplicate a news or feature article as to render the caption ‘ADVERTISEMENT’ meaningless and incapable of curing the deception.”

It’s not easy to turn this into actionable advice, and I think it’s most useful as a warning that the whole thing is very subjective, and there is a lot of leeway to take action if the spirit of the regulations is breached.

The controversial and unexpected parts

It wasn’t until quite far through the document that I came to the pieces I found surprising. The warning bells started sounding for me when I saw the FTC start drawing on the (very sensible) general principle that brands shouldn’t be able to open the door using misleading or deceptive practices (even if they subsequently come clean). Last year, the FTC took action against this offline advert under these rules:

The Commission ruled that the price advertised in big red font was a “deceptive door opener” because:

To get the advertised deal, buyers needed to qualify for a full house of separate rebate offers. In other words, they had to be active duty members of the military and had to be recent college grads and had to trade in a car.

Bringing it back to web advertising, the Commission says:

Under FTC law, advertisers cannot use “deceptive door openers” to induce consumers to view advertising content. Thus, advertisers are responsible for ensuring that native ads are identifiable as advertising before consumers arrive at the main advertising page. [Emphasis mine]

If you understand how the web works, and how people find content on publishers’ sites these days, this will probably be starting to seem at odds with the way a lot of native advertising works right now. And your instincts are absolutely right. The Commission is going exactly where you think they might be. They title this set of new rules “Disclosures should remain when native ads are republished by others.”

Social media

In the guidelines document, the Commission includes a whole bunch of examples of infringing and non-infringing behavior. Example 15 in their list is:

The … article published in Fitness Life, “Running Gear Up: Mistakes to Avoid,” … includes buttons so that readers can post a link to the article from their personal social media streams. When posted, the link appears in a format that closely resembles the format of links for regular Fitness Life articles posted to social media. In this situation, the ad’s format would likely mislead consumers to believe the ad is a regular article published in Fitness Life. Advertisers should ensure that the format of any link for posting in social media does not mislead consumers about its commercial nature. [Emphasis mine]

Now, it’s obviously really hard to ensure anything about how people post your content to social media. In the extreme case, where a user uses a URL shortener, doesn’t use your title or your Open Graph information, and writes their own caption, there could be literally nothing in the social media post that is in the control of the advertiser or the publisher. Reading this within the context of the reasonableness of the rest of the FTC advice, however, I believe that this will boil down to flagging commercial content in the main places that show up in social posts.

Organic search

Controlling the people who share your content on social media is one challenge, but the FTC also comments on the need to control the robots that display your content in organic search results, saying:

The advertiser should ensure that any link or other visual elements, for example, webpage snippets, images, or graphics, intended to appear in non-paid search results effectively disclose its commercial nature.

…and they also clarify that this includes in the URL:

URL links … should include a disclosure at the beginning of the native ad’s URL.

Very sensibly, it’s not just advertisers who need to ensure that they abide by the rules. I find it very interesting, though, that the one party noticeably absent from the FTC’s list is the publishers:

In appropriate circumstances, the FTC has taken action against other parties who helped create deceptive advertising content — for example, ad agencies and operators of affiliate advertising networks.

Historically, the FTC has maintained that it has the authority to regulate media companies over issues relating to misleading advertising, but has generally focused on the advertisers; for example, when talking about taking special measures to target a glut of misleading weight loss adverts:

“…the FTC said it does not plan to pursue media outlets directly, ‘but instead wants to continue to work with them to identify and reject ads with obviously bogus claims’ using voluntary guidelines.”

In the case of native advertising, I am very surprised not to see more of the rules and guidelines targeted at publishers. Many of the new rules refer to platform and technical considerations, and elements of the publishers’ CMS systems which are likely to be system-wide and largely outside the control of the individual advertisers. Looking at well-implemented native ads released prior to these new guidelines (like this one from the Telegraph, which is clearly and prominently disclosed), we see that major publishers have not been routinely including disclosures in the URL up to now.

In addition, individual native ads could remain live and ranking in organic search for a long time, yet the publisher could undergo redesigns / platform changes that change things like URL structures. I doubt we’d see this pinned on individual advertisers, but it is an interesting wrinkle.

Checklist for compliant native advertising

Clearly I’m not a lawyer, and this isn’t legal advice, but from my reading of the new rules, advertisers should already have been expecting to:

  • Ensure that your advert complies with all the normal rules so that it is not misleading in content, including avoiding “deceptive door openers”
  • “Clearly and prominently disclose the paid nature” of native adverts — it is safest to use language like “Advertisement” or “Paid Advertisement” — and this guide covers the details.

In addition, following the release of these new rules, advertisers should also work through this checklist (a minimal markup sketch follows it):

  • The URL includes a disclosure near the beginning (e.g. example.com/advertisement/<slug>)
  • The title includes a disclosure near the beginning
  • The meta description includes a disclosure
  • All structured data on the page intended to appear in social sharing contains disclosures:
    • Open Graph data
    • Twitter cards
    • Social sharing buttons’ pre-filled sharing messages
    • Given the constraints on space in tweets especially, I would suggest this could be shorter than in other places — a simple [ad] probably suffices
  • Links to the native advert from elsewhere on the publisher’s site include disclosures in both links and images
  • Embedded media have disclosures included within them — for example, via video and image overlays
  • Search engines can crawl the native advert (if it’s blocked in robots.txt, the title tag and meta description disclosures wouldn’t show in organic search)
  • Outbound links are nofollow (this is a Google requirement rather than an FTC one, but it seemed sensible to include it here)
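
To make the markup side of this checklist concrete, here is a minimal sketch of what a compliant advertorial page might contain. It is a hypothetical illustration rather than markup from any real publisher: the domain, slug, and brand are placeholders, and the article title borrows the “Running Gear Up” example from the FTC’s own guidance quoted earlier.

    <!-- Hypothetical advertorial at https://example.com/advertisement/running-gear-mistakes -->
    <head>
      <!-- Disclosure near the beginning of the title tag -->
      <title>Advertisement: Running Gear Up: Mistakes to Avoid | Fitness Life</title>

      <!-- Disclosure in the meta description -->
      <meta name="description" content="Advertisement: sponsored content from Acme Shoes on common running gear mistakes.">

      <!-- Open Graph data carries the disclosure into social shares -->
      <meta property="og:title" content="Advertisement: Running Gear Up: Mistakes to Avoid">
      <meta property="og:url" content="https://example.com/advertisement/running-gear-mistakes">

      <!-- Twitter cards: space is tight, so a short [ad] label -->
      <meta name="twitter:title" content="[ad] Running Gear Up: Mistakes to Avoid">
    </head>

    <!-- Outbound links are nofollowed (the Google requirement noted in the checklist) -->
    <a href="https://acme-shoes.example/" rel="nofollow">Acme Shoes</a>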

And it seems sensible to me that advertisers would require publishers to commit to an ongoing obligation to ensure that their CMS / sharing buttons / site structure maintains compliance with the FTC regulations for as long as the native advertising remains live.

Conclusion

There are elements of the new requirements that seem onerous, particularly the disclosure early in the URL, which could be a technically complicated change depending on the publisher’s platform. I have looked around at a bunch of major publisher platforms, and I haven’t found a high-profile example that discloses all advertorials in its URLs.

It also seems likely that full compliance with all these requirements will reduce the effectiveness of native advertising by limiting its distribution in search and social.

On balance, however, the FTC’s approach is based on sound principles of avoiding misleading situations. Knowing how little people actually read, and how blind we all are to banners, I’m inclined to agree that this kind of approach, where the commercial relationship is disclosed at every turn, probably is the only way to avoid wide-scale misunderstandings by users.

I’d be very interested to hear others’ thoughts. In particular, I’d love to hear from:

  • Brands: Whether this makes you less likely to invest in native advertising
  • Publishers: Whether these technical changes are likely to be onerous and if it feels like a threat to selling native ads

I look forward to hearing your thoughts in the comments.



Can SEOs Stop Worrying About Keywords and Just Focus on Topics? – Whiteboard Friday

Posted by randfish

Should you ditch keyword targeting entirely? There’s been a lot of discussion around the idea of focusing on broad topics and concepts to satisfy searcher intent, but it’s a big step to take and could potentially hurt your rankings. In today’s Whiteboard Friday, Rand discusses old-school keyword targeting and new-school concept targeting, outlining a plan of action you can follow to get the best of both worlds.

Can We Abandon Keyword Research & On-Page Targeting in Favor of a Broader Topic/Concept Focus in Our SEO Efforts?

Click on the whiteboard image above to open a high resolution version in a new tab!

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week, we’re going to talk about a topic that I’ve been seeing come up in the SEO world for probably a good 6 to 12 months now. I think ever since Hummingbird came out, there has been a little bit of discussion. Then, over the last year, it’s really picked up around this idea that, “Hey, maybe we shouldn’t be optimizing for researching and targeting keywords or keyword phrases anymore. Maybe we should be going more towards topics and ideas and broad concepts.”

I think there’s some merit to the idea, and then there are folks who are taking it way too far, moving away from keywords entirely and costing themselves a great deal of search opportunity and search engine traffic. So I’m going to try and describe these two approaches today, kind of the old-school world and this very new-school world of concept- and topic-based targeting, and then describe a third way to combine them and improve on both models.

Classic keyword research & on-page targeting

In our classic keyword research, on-page targeting model, we sort of have our SEO going, “Yeah. Which one of these should I target?”

She’s thinking about the best times to fly. She’s writing a travel website, “Best Times to Fly,” and there’s a bunch of keywords. She’s checking the volume and maybe some other metrics around “best flight times,” “best days to fly,” “cheapest days to fly,” “least crowded flights,” “optimal flight dates,” “busiest days to fly.” Okay, a bunch of different keywords.

So, maybe our SEO friend here is thinking, “All right. She’s going to maybe go make a page for each of these keywords.” Maybe not all of them at first. But she’s going to decide, “Hey, you know what? I’m going after ‘optimal flight dates,’ ‘lowest airport traffic days,’ and ‘cheapest days to fly.’ I’m going to make three different pages. Yeah, the content is really similar. It’s serving a very similar purpose. But that doesn’t matter. I want to have the best possible keyword targeting that I can for each of these individual ones.”

“So maybe I can’t invest as much effort in the content and the research into it, because I have to make these three different pages. But you know what? I’ll knock out these three. I’ll do the rest of them, and then I’ll iterate and add some more keywords.”

That’s pretty old-school SEO, very, very classic model.

New school topic- & concept-based targeting

Newer school, a little bit of this concept and topic targeting, we get into this world where folks go, “You know what? I’m going to think bigger than keywords.”

“I’m going to kind of ignore keywords. I don’t need to worry about them. I don’t need to think about them. Whatever the volumes are, they are. If I do a good job of targeting searchers’ intent and concepts, Google will do a good job recognizing my content and figuring out the keywords that it maps to. I don’t have to stress about that. So instead, I’m going to think about I want to help people who need to choose the right days to buy flights.”

“So I’m thinking about days of the week, and maybe I’ll do some brainstorming and a bunch of user research. Maybe I’ll use some topic association tools to try and broaden my perspective on what those intents could be. So days of the week, the right months, the airline differences, maybe airport by airport differences, best weeks. Maybe I want to think about it by different country, price versus flexibility, when can people use miles, free miles to fly versus when can’t they.”

“All right. Now, I’ve come up with this, the ultimate guide to smart flight planning. I’ve got great content on there. I have this graph where you can actually select a different country or different airline and see the dates or the weeks of the year, or the days of the week when you can get cheapest flights. This is just an awesome, awesome piece of content, and it serves a lot of these needs really nicely.” It’s not going to rank for crap.

I don’t mean to be rude. It’s not the case that Google can never map this to these types of keywords. But if a lot of people are searching for “best days of the week to fly” and you have “The Ultimate Guide to Smart Flight Planning,” you might do a phenomenal job of helping people with that search intent. Google is not going to do a great job of ranking you for that phrase, and it’s not Google’s fault entirely. A lot of this has to do with how the Web talks about content.

A great piece of content like this comes out. Maybe lots of blogs pick it up. News sites pick it up. You write about it. People are linking to it. How are they describing it? Well, they’re describing it as a guide to smart flight planning. So those are the terms and phrases people associate with it, which are not the same terms and phrases that someone would associate with an equally good guide that leveraged the keywords intelligently.

A smarter hybrid

So my recommendation is to combine these two things. In a smart combination of these techniques, we can get great results on both sides of the aisle: great concept and topic modeling that can serve a bunch of different searcher needs and target many different keywords in a given searcher intent model, done in a way that targets keywords intelligently in our titles, in our headlines, our sub-headlines, and the content on the page, so that we can actually capture the search volume and rank for the keywords that send us traffic on an ongoing basis.

So I take my keyword research ideas and my tool results from all the exercises I did over here. I take my topic and concept brainstorm, maybe some of my topic tool results, my user research results. I take these and put them together in a list of concepts and needs that our content is going to answer grouped by combinable keyword targets — I’ll show you what I mean — with the right metrics.

So I might say my keyword groups are there’s one intent around “best days of the week,” and then there’s another intent around “best times of the year.” Yes, there’s overlap between them. There might be people who are looking for kind of both at the same time. But they actually are pretty separate in their intent. “Best days of the week,” that’s really someone who knows that they’re going to fly at some point and they want to know, “Should I be booking on a Tuesday, Wednesday, Thursday, or a Monday, or a Sunday?”

Then, there’s “best times of the year,” someone who’s a little more flexible with their travel planning, and they’re trying to think maybe a year ahead, “Should I buy in the spring, the fall, the summer? What’s the best time to go?”

So you know what? We’re going to take all the keyword phrases that we discovered over here. We’re going to group them by these concept intents. Like “best days of the week” could include the keywords “best days of the week to fly,” “optimal day of week to fly,” “weekday versus weekend best for flights,” “cheapest day of the week to fly.”

“Best times of the year,” that keyword group could include words and phrases like “best weeks of the year to fly,” “cheapest travel weeks,” “lowest cost months to fly,” “off-season flight dates,” “optimal dates to book flights.”

These aren’t just keyword matches. They’re concept and topic matches, but taken to the keyword level so that we actually know things like the volume, the difficulty, the click-through rate opportunity for these, the importance that they may have or the conversion rate that we think they’re going to have.

Then, we can group these together and decide, “Hey, you know what? The volume for all of these is higher. But these ones are more important to us. They have lower difficulty. Maybe they have higher click-through rate opportunity. So we’re going to target ‘best times of the year.’ That’s going to be the content we create. Now, I’m going to wrap my keywords together into ‘the best weeks and months to book flights in 2016.'”

That’s just as compelling a title as “The Ultimate Guide to Smart Flight Planning,” but maybe a tiny bit less. You could quibble. But I’m sure you could come up with one, and it uses our keywords intelligently. Now I’ve got sub-headings that are “sort by the cheapest,” “the least crowded,” “the most flexible,” “by airline,” “by location.” Great. I’ve hit all my topic areas and all my keyword areas at the same time, all in one piece of content.

This kind of model, where we combine the best of these two worlds, I think is the way of the future. I don’t think it pays to stick to your old-school keyword targeting methodology, nor do I think it pays to ignore keyword targeting and keyword research entirely. I think we’ve got to merge these practices and come up with something smart.
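
To make that grouping step concrete in code, here is a minimal sketch (not from the video; every keyword, volume, and difficulty score below is invented) of rolling keyword-level metrics up into the two concept groups described above:

    # Group keywords by searcher intent, then compare the groups on
    # aggregate volume and average difficulty. All numbers are made up.
    keywords = [
        # (keyword, intent group, monthly volume, difficulty 0-100)
        ("best days of the week to fly",            "best days of the week",  2400, 38),
        ("cheapest day of the week to fly",         "best days of the week",  1900, 35),
        ("weekday versus weekend best for flights", "best days of the week",   320, 30),
        ("best weeks of the year to fly",           "best times of the year", 1300, 28),
        ("cheapest travel weeks",                   "best times of the year",  880, 25),
        ("off-season flight dates",                 "best times of the year",  590, 22),
    ]

    groups = {}
    for kw, group, volume, difficulty in keywords:
        g = groups.setdefault(group, {"keywords": [], "volume": 0, "difficulties": []})
        g["keywords"].append(kw)
        g["volume"] += volume
        g["difficulties"].append(difficulty)

    # Pick the group with the best trade-off, then target all of its
    # keywords in a single piece of content (title, sub-headings, body).
    for group, g in groups.items():
        avg_difficulty = sum(g["difficulties"]) / len(g["difficulties"])
        print(group, "- total volume:", g["volume"], "- avg difficulty:", round(avg_difficulty))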

All right everyone. I look forward to your comments, and we’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com



The Machine Learning Revolution: How it Works and its Impact on SEO

Posted by EricEnge

Machine learning is already a very big deal. It’s here, and it’s in use in far more businesses than you might suspect. A few months back, I decided to take a deep dive into this topic to learn more about it. In today’s post, I’ll dive into a certain amount of technical detail about how it works, but I also plan to discuss its practical impact on SEO and digital marketing.

For reference, check out Rand Fishkin’s presentation about how we’ve entered into a two-algorithm world. Rand addresses the impact of machine learning on search and SEO in detail in that presentation. I’ll talk more about that later.

For fun, I’ll also include a tool that allows you to predict your chances of getting a retweet based on a number of things: your Followerwonk Social Authority, whether you include images, hashtags, and several other similar factors. I call this tool the Twitter Engagement Predictor (TEP). To build the TEP, I created and trained a neural network. The tool will accept input from you, and then use the neural network to predict your chances of getting an RT.

The TEP leverages the data from a study I published in December 2014 on Twitter engagement, where we reviewed information from 1.9M original tweets (as opposed to RTs and favorites) to see what factors most improved the chances of getting a retweet.

My machine learning journey

I got my first meaningful glimpse of machine learning back in 2011 when I interviewed Google’s Peter Norvig, and he told me how Google had used it to teach Google Translate.

Basically, they looked at all the language translations they could find across the web and learned from them. This is a very intense and complicated example of machine learning, and Google had deployed it by 2011. Suffice it to say that all the major market players — such as Google, Apple, Microsoft, and Facebook — already leverage machine learning in many interesting ways.

Back in November, when I decided I wanted to learn more about the topic, I started doing a variety of searches for articles to read online. It wasn’t long before I stumbled upon this great course on machine learning on Coursera. It’s taught by Andrew Ng of Stanford University, and it provides an awesome, in-depth look at the basics of machine learning.

Warning: This course is long (19 total sections with an average of more than one hour of video each). It also requires an understanding of calculus to get through the math. In the course, you’ll be immersed in math from start to finish. But the point is this: If you have the math background, and the determination, you can take a free online course to get started with this stuff.

In addition, Ng walks you through many programming examples using a language called Octave. You can then take what you’ve learned and create your own machine learning programs. This is exactly what I have done in the example program included below.

Basic concepts of machine learning

First of all, let me be clear: this process didn’t make me a leading expert on this topic. However, I’ve learned enough to provide you with a serviceable intro to some key concepts. You can break machine learning into two classes: supervised and unsupervised. First, I’ll take a look at supervised machine learning.

Supervised machine learning

At its most basic level, you can think of supervised machine learning as creating a series of equations to fit a known set of data. Let’s say you want an algorithm to predict housing prices (an example that Ng uses frequently in the Coursera classes). You might get some data that looks like this (note that the data is totally made up):

In this example, we have (fictitious) historical data that indicates the price of a house based on its size. As you can see, the price tends to go up as house size goes up, but the data does not fit into a straight line. However, you can calculate a straight line that fits the data pretty well, and that line might look like this:

This line can then be used to predict the pricing for new houses. We treat the size of the house as the “input” to the algorithm and the predicted price as the “output.” For example, if you have a house that is 2,600 square feet, the price looks like it would be about $xxxK.
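
As a rough illustration of that fitting step (the numbers here are invented, just like the chart’s), a straight-line fit and prediction might look like this in Python:

    import numpy as np

    # Invented training data: house size (sq ft) and sale price (dollars)
    sizes = np.array([1100, 1400, 1800, 2100, 2500, 3000])
    prices = np.array([199000, 245000, 319000, 360000, 415000, 490000])

    # Least-squares fit of price = slope * size + intercept
    slope, intercept = np.polyfit(sizes, prices, deg=1)

    # Use the fitted line to predict the price of a new 2,600 sq ft house
    print(round(slope * 2600 + intercept))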

However, this model turns out to be a bit simplistic. There are other factors that can play into housing prices, such as the total number of rooms, the number of bedrooms, the number of bathrooms, and the lot size. Based on this, you could build a slightly more complicated model, with a table of data similar to this one:

Already you can see that a simple straight line will not do, as you’ll have to assign weights to each factor to come up with a housing price prediction. Perhaps the biggest factors are house size and lot size, but rooms, bedrooms, and bathrooms all deserve some weight as well (all of these would be considered new “inputs”).

Even now, we’re still being quite simplistic. Another huge factor in housing prices is location. Pricing in Seattle, WA is different than it is in Galveston, TX. Once you attempt to build this algorithm on a national scale, using location as an additional input, you can see that it starts to become a very complex problem.

You can use machine learning techniques to solve any of these three versions of the problem. In each case, you’d assemble a large data set of examples, which can be called training examples, and run a set of programs to design an algorithm to fit the data. This allows you to submit new inputs and use the algorithm to predict the output (the price, in this case). Using training examples like this is what’s referred to as “supervised machine learning.”

Classification problems

This is a special class of problems where the goal is to predict specific outcomes. For example, imagine we want to predict the chances that a newborn baby will grow to be at least 6 feet tall. You could imagine that inputs might be as follows:

The output of this algorithm might be a 0 if the person was going to be shorter than 6 feet tall, or a 1 if they were going to be 6 feet or taller. What makes it a classification problem is that you are putting the input items into one specific class or another. For the height prediction problem as I described it, we are not trying to guess the precise height, but to make a simple over/under 6 feet prediction.

Some examples of more complex classification problems are handwriting recognition (recognizing characters) and identifying spam email.
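
As a small sketch of how such an over/under classifier could be built (the input features and every number below are invented for illustration), logistic regression in scikit-learn works like this:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Invented inputs: [mother's height (in), father's height (in), birth weight (lb)]
    X = np.array([
        [62, 68, 7.1],
        [70, 75, 8.9],
        [64, 70, 7.5],
        [69, 74, 8.2],
        [60, 66, 6.8],
        [68, 76, 9.0],
    ])
    # Labels: 1 = grew to be at least 6 feet tall, 0 = shorter
    y = np.array([0, 1, 0, 1, 0, 1])

    model = LogisticRegression().fit(X, y)

    # Classify a new baby: the predicted class (0 or 1) and its probability
    new_baby = np.array([[67, 73, 8.4]])
    print(model.predict(new_baby), model.predict_proba(new_baby))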

Unsupervised machine learning

Unsupervised machine learning is used in situations where you don’t have training examples. Basically, you want to try and determine how to recognize groups of objects with similar properties. For example, you may have data that looks like this:

The algorithm will then attempt to analyze this data and find out how to group them together based on common characteristics. Perhaps in this example, all of the red “x” points in the following chart share similar attributes:

However, the algorithm may have trouble recognizing outlier points, and may group the data more like this:

What the algorithm has done is find natural groupings within the data, but unlike supervised learning, it had to determine the features that define each group. One industry example of unsupervised learning is Google News. For example, look at the following screen shot:

You can see that the main news story is about Iran holding 10 US sailors, but there are also related news stories shown from Reuters and Bloomberg (circled in red). The grouping of these related stories is an unsupervised machine learning problem, where the algorithm learns to group these items together.
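
A standard algorithm for this kind of grouping is k-means clustering. Here is a tiny sketch on invented, unlabeled data, where the algorithm has to discover the groups itself:

    import numpy as np
    from sklearn.cluster import KMeans

    # Invented, unlabeled points; we never tell the algorithm what the groups are
    points = np.array([
        [1.0, 1.2], [0.8, 1.0], [1.1, 0.9],   # one natural grouping
        [5.0, 5.2], [5.3, 4.9], [4.8, 5.1],   # another
        [9.0, 1.0],                           # an outlier it may mis-group
    ])

    # Ask for two clusters and see how each point gets assigned
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
    print(kmeans.labels_)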

Other industry examples of applied machine learning

A great example of a machine learning algo is the Author Extraction algorithm that Moz has built into their Moz Content tool. You can read more about that algorithm here. The referenced article outlines in detail the unique challenges that Moz faced in solving that problem, as well as how they went about solving it.

As for Stone Temple Consulting’s Twitter Engagement Predictor, this is built on a neural network. A sample screen for this program can be seen here:

The program makes a binary prediction as to whether you’ll get a retweet or not, and then provides you with a percentage probability for that prediction being true.

For those who are interested in the gory details, the neural network configuration I used was six input units, fifteen hidden units, and two output units. The algorithm used one million training examples and two hundred training iterations. The training process required just under 45 billion calculations.
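
I built the network itself in Octave, following the course examples. As a rough Python equivalent of the 6-15-2 configuration just described (the feature encoding below is assumed for illustration, not the TEP’s actual encoding), scikit-learn’s MLPClassifier can be set up like this:

    from sklearn.neural_network import MLPClassifier

    # Six input features per tweet. This encoding is assumed for illustration, e.g.
    # [Social Authority bucket, has image, has URL, @mentions, hashtags, length bucket]
    X_train = [
        [0, 0, 0, 0, 2, 0],
        [5, 1, 1, 1, 0, 3],
        # ... one row per tweet (the real model trained on one million examples)
    ]
    y_train = [0, 1]  # 1 = got a retweet, 0 = didn't

    # One hidden layer of fifteen units, trained for up to 200 iterations,
    # mirroring the configuration described above
    model = MLPClassifier(hidden_layer_sizes=(15,), max_iter=200)
    model.fit(X_train, y_train)

    # Probability of [no retweet, retweet] for a new tweet
    print(model.predict_proba([[2, 1, 0, 0, 1, 1]]))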

One thing that made this exercise interesting is that there are many conflicting data points in the raw data. Here’s an example of what I mean:

What this shows is the data for people with Followerwonk Social Authority between 0 and 9, and a tweet with no images, no URLs, no @mentions of other users, two hashtags, and between zero and 40 characters. We had 1156 examples of such tweets that did not get a retweet, and 17 that did.

The most desirable outcome for the resulting algorithm is to predict that these tweets will not get a retweet, even though that prediction would be wrong 1.4% of the time (17 times out of 1173). Note that the resulting neural network assesses the probability of getting a retweet at 2.1%.

I did a calculation to tabulate how many of these cases existed. I found that we had 102,045 individual training examples where it was desirable to make the “wrong” prediction, or just slightly over 10% of all our training data. What this means is that the best the neural network will be able to do is make the right prediction just under 90% of the time.

I also ran two other sets of data (470K and 473K samples in size) through the trained network to see the accuracy level of the TEP. I found that it was 81% accurate in its absolute (yes/no) prediction of the chance of getting a retweet. Bearing in mind that those also had approximately 10% of the samples where making the wrong prediction is the right thing to do, that’s not bad! And, of course, that’s why I show the percentage probability of a retweet, rather than a simple yes/no response.

Try the predictor yourself and let me know what you think! (You can discover your Social Authority by heading to Followerwonk and following these quick steps.) Mind you, this was simply an exercise for me to learn how to build out a neural network, so I recognize the limited utility of what the tool does — no need to give me that feedback ;->.

Examples of algorithms Google might have or create

So now that we know a bit more about what machine learning is about, let’s dive into things that Google may be using machine learning for already:

Penguin

One approach to implementing Penguin would be to identify a set of link characteristics that could potentially be an indicator of a bad link, such as these:

  1. External link sitting in a footer
  2. External link in a right side bar
  3. Proximity to text such as “Sponsored” (and/or related phrases)
  4. Proximity to an image with the word “Sponsored” (and/or related phrases) in it
  5. Grouped with other links with low relevance to each other
  6. Rich anchor text not relevant to page content
  7. External link in navigation
  8. Implemented with no user visible indication that it’s a link (i.e. no line under it)
  9. From a bad class of sites (from an article directory, from a country where you don’t do business, etc.)
  10. …and many other factors

Note that any one of these things isn’t necessarily inherently bad for an individual link, but the algorithm might start to flag sites if a significant portion of all of the links pointing to a given site have some combination of these attributes.

What I outlined above would be a supervised machine learning approach where you train the algorithm with known bad and good links (or sites) that have been identified over the years. Once the algo is trained, you would then run other link examples through it to calculate the probability that each one is a bad link. Based on the percentage of links (and/or total PageRank) coming from bad links, you could then make a decision to lower the site’s rankings, or not.

Another approach to this same problem would be to start with a database of known good links and bad links, and then have the algorithm automatically determine the characteristics (or features) of those links. These features would probably include factors that humans may not have considered on their own.
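
Purely to illustrate the shape of such an approach (this is speculative, and in no way Google’s actual implementation), a sketch might encode each link’s characteristics as binary features, train a classifier on known good and bad links, and then look at the mix of scores across a site’s link profile:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # One row per link; invented binary features along the lines listed above:
    # [in footer, in sidebar, near "Sponsored", irrelevant rich anchor, bad site class]
    X_train = np.array([
        [1, 0, 1, 1, 1],
        [0, 0, 0, 0, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 0, 1, 0],
        [0, 1, 1, 1, 1],
        [0, 0, 1, 0, 0],
    ])
    y_train = np.array([1, 0, 1, 0, 1, 0])  # 1 = known bad link, 0 = known good

    clf = LogisticRegression().fit(X_train, y_train)

    # Score every link pointing at one site, then look at the overall mix
    site_links = np.array([[1, 0, 1, 1, 0], [0, 0, 0, 0, 0], [0, 1, 1, 1, 1]])
    bad_probability = clf.predict_proba(site_links)[:, 1]
    print("fraction of links flagged bad:", (bad_probability > 0.5).mean())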

Panda

Now that you’ve seen the Penguin example, this one should be a bit easier to think about. Here are some things that might be features of sites with poor-quality content:

  1. Small number of words on the page compared to competing pages
  2. Low use of synonyms
  3. Overuse of main keyword of the page (from the title tag)
  4. Large blocks of text isolated at the bottom of the page
  5. Lots of links to unrelated pages
  6. Pages with content scraped from other sites
  7. …and many other factors

Once again, you could start with a known set of good sites and bad sites (from a content perspective) and design an algorithm to determine the common characteristics of those sites.

As with the Penguin discussion above, I’m in no way representing that these are all parts of Panda — they’re just meant to illustrate the overall concept of how it might work.

How machine learning impacts SEO

The key to understanding the impact of machine learning on SEO is understanding what Google (and other search engines) want to use it for. A key insight is that there’s a strong correlation between Google providing high-quality search results and the revenue they get from their ads.

Back in 2009, Bing and Google performed some tests that showed how even introducing small delays into their search results significantly impacted user satisfaction. In addition, those results showed that with lower satisfaction came fewer clicks and lower revenues:

The reason behind this is simple. Google has other sources of competition, and this goes well beyond Bing. Texting friends for their input is one form of competition. So are Facebook, Apple/Siri, and Amazon. Alternative sources of information and answers exist for users, and they are working to improve the quality of what they offer every day. So must Google.

I’ve already suggested that machine learning may be a part of Panda and Penguin, and it may well be a part of the “Search Quality” algorithm. And there are likely many more of these types of algorithms to come.

So what does this mean?

Given that higher user satisfaction is of critical importance to Google, it means that content quality and user satisfaction with the content of your pages must now be treated by you as an SEO ranking factor. You’re going to need to measure it, and steadily improve it over time. Some questions to ask yourself include:

  1. Does your page meet the intent of a large percentage of visitors to it? If a user is interested in a product on your page, do they need help in selecting it? Learning how to use it?
  2. What about related intents? If someone comes to your site looking for a specific product, what other related products could they be looking for?
  3. What gaps exist in the content on the page?
  4. Is your page a higher-quality experience than that of your competitors?
  5. What’s your strategy for measuring page performance and improving it over time?

There are many ways that Google can measure how good your page is, and use that to impact rankings. Here are some of them:

  1. When users arrive on your page after clicking on a SERP listing, how long do they stay? How does that compare to competing pages?
  2. What is the relative rate of CTR on your SERP listing vs. competition?
  3. What volume of brand searches does your business get?
  4. If you have a page for a given product, do you offer thinner or richer content than competing pages?
  5. When users click back to the search results after visiting your page, do they behave like their task was fulfilled? Or do they click on other results or enter followup searches?

For more on how content quality and user satisfaction have become core SEO factors, please check out the following:

  1. Rand’s presentation on a two-algorithm world
  2. My article on Term Frequency Analysis
  3. My article on Inverse Document Frequency
  4. My article on Content Effectiveness Optimization

Summary

Machine learning is becoming highly prevalent. The barrier to learning basic algorithms is largely gone. All the major players in the tech industry are leveraging it in some manner. Here’s a little bit on what Facebook is doing, and machine learning hiring at Apple. Others are offering platforms to make implementing machine learning easier, such as Microsoft and Amazon.

For people involved in SEO and digital marketing, you can expect that these major players are going to get better and better at leveraging these algorithms to help them meet their goals. That’s why it will be of critical importance to tune your strategies to align with the goals of those organizations.

In the case of SEO, machine learning will steadily increase the importance of content quality and user experience over time. For you, that makes it time to get on board and make these factors a key part of your overall SEO strategy.



Moz Content Launches New Tiers and Google Analytics Integration

Posted by JayLeary

When we launched Moz Content at the end of November, we limited the subscription to a single tier. At the time we wanted to get the product out in the wild and highlight the importance of content auditing, competitive research, and data-driven content strategy. To all of those early adopters who signed up as Strategists over the last few months, a big thanks.

Since then, we’ve made a long list of improvements to both Audit performance/stability and the size of our Content Search index. Along with those updates we also added two new subscriptions for larger sites and agencies handling multiple clients: Teams and Agencies. The new tiers not only increase page and Audit limits, but also enable Google Analytics integration with Tracked Audits.

More data for Tracked Audits

For those that aren’t familiar, Tracked Audits let you trend and monitor the performance of your site’s content over time. On the backend, Moz Content re-audits your site every week in order to discover new pages and update site metrics. This allows you to compare, say, the average shares per article across your entire site from week-to-week or month-to-month.

To date, Moz Content Audits have focused on links and shares. We did that intentionally, since a major goal for the product was to enable the analysis of any site on the web, including competitors. We also wanted to give agencies the freedom to prospect or audit clients without painful integrations or code snippets.

That said (and I’m guessing you’ll agree), figuring out your content ROI usually requires a deeper look at site performance.

With Moz Content, we stopped short of a full, conversion-focused analysis suite with custom tracking code and the like. Instead we’re focused on a product that delivers content analysis and insights to our entire community of online marketers — not just the select few that can afford it. Besides, there are plenty of big-ticket tools out there filling the niche, great products like Newscred, Kapost, SimpleReach, and Idio.

So while we’re not jumping into the “enterprise” ring (just yet), we did want to give data-driven marketers a leg up in their analysis. The good news: if you’re a Teams or Agencies subscriber, you now have the added option of integrating Google Analytics with Tracked Audits.

It’s easy to get started, and if you’re a Moz Pro subscriber you’re probably familiar with the authentication flow. Just go to any Tracked Audit you have GA data for, scroll down to the middle of the report, and click “Connect Google Analytics”:


Once you’ve connected a profile, Moz Content immediately pulls in key metrics for each URL in the Inventory:


You’ll probably notice we’re only displaying page views in the inventory. While we didn’t have the real estate to include all of the metrics we’re collecting in the interface, we’ve added them to the CSV export:


Once you dump the data to Excel, you’ll see the following metrics for the Organic and Paid segments, as well as a rollup of all referrers (a short analysis sketch follows the list):

  • Unique Page Views
  • All Page Views
  • Time on Page
  • Bounce Rate
  • Page Value
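
If you want to slice that export yourself, a short pandas sketch along these lines works. The file name and column headers below are placeholders, so match them to the actual headers in your CSV:

    import pandas as pd

    # File and column names are placeholders; check them against your export
    df = pd.read_csv("moz_content_export.csv")

    # Total unique page views for the Organic segment across the inventory
    print(df["Organic Unique Page Views"].sum())

    # Surface the ten pages where visitors stick around the longest
    top_pages = df.sort_values("Time on Page", ascending=False).head(10)
    print(top_pages[["URL", "Time on Page", "Bounce Rate", "Page Value"]])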

After we’ve collected multiple audits, Moz Content also starts trending aggregated metrics so you can get a sense of performance over time:


We’re hoping this added reporting gets you a step closer to that all-important ROI analysis. Some of you will already have a sense of how much a page view is worth, or the impact of a unique page view on a specific conversion. And for the Google Analytics pros out there, properly configuring Page Value will give you a direct indicator of a page’s effectiveness.

This is just the beginning, and we’d love to hear about other ways we could make the GA data more useful. Please reach out and let us know what you think by clicking the round, blue Help button in the lower-right corner of the Moz Content app.

More to come!

We’re pushing out code every day to improve the app experience and build on the current features. We’re also growing our Content Search index with new sources of popular, trending content. If you haven’t tried it for a while, we encourage you to take a second look. Head over to http://moz.com/content and start a search or enter a domain to preview an Audit (be sure to sign into your community account to access both the Dashboard and increased limits).

On the feature front, we’re pushing to integrate Twitter data into both the Audit and Content Search. As I’m sure you’ve noticed, this has been missing in our reporting to date. While we won’t have exact share counts for individual articles (see Twitter’s decision to deprecate the share count endpoint), we’re confident we can provide related information that’ll be useful for your Twitter analysis.

As we develop this and other features, we’re always on the lookout for feedback from our community. As always, feel free to reach out to our Help team with any issues or feedback about the product. And if you’ve used Moz Content and are interested in beta testing the latest, shoot us an email at mozcontent+testing@moz.com and we’ll add you to the list.



New and Improved Local Search Expert Quiz: What’s Up with Local SEO in 2016?

Posted by Isla_McKetta

Think you’re up on the latest developments in local SEO?

One year ago we asked you to test your local SEO knowledge with the Local Search Expert Quiz. Because the SERPs are changing so fast and (according to our latest Industry Survey) over 42% of online marketers report spending more time on local search in the past 12 months, we’ve created an updated version.

Written by local search expert Miriam Ellis, the quiz contains 40 questions designed to test both your general local SEO knowledge and your industry awareness. Bonus? The quiz takes less than 10 minutes to complete.

Ready to get started? When you are finished, we’ll automatically score your quiz.

Rating your score

Although the Local Search Expert Quiz is just for fun, we’ve established the following guidelines for bragging rights:

  • 0–14 Newbie: Time to study up on your citation data!
  • 15–23 Beginner: Good job, but you’re not quite in the 3-pack yet.
  • 24–29 Intermediate: You’re getting close to the centroid!
  • 30–34 Pro: Let’s tackle multi-location!
  • 35–40 Guru: We all bow down to your local awesomeness.

Resources to improve your performance

Didn’t get the score you hoped for? Brush up on your local SEO knowledge with this collection of free learning resources:

  1. The Moz Local Learning Center
  2. Glossary of Local Search Terms and Definitions
  3. Guidelines for Representing Your Business on Google
  4. Local Search Ranking Factors
  5. Blumenthal’s Blog
  6. Local SEO Guide
  7. Whitespark Blog

You can also learn the latest local search tips and tricks by signing up for the MozCon Local one-day conference, subscribing to the Moz Local Top 7 newsletter, or reading local SEO posts on the Moz Blog.

Don’t forget to brag about your local search expertise in the comments below!

