
So You Want to Build a Chat Bot – Here’s How (Complete with Code!)

Posted by R0bin_L0rd

You’re busy and (depending on effective keyword targeting) you’ve come here looking for something to shave months off the process of learning to produce your own chat bot. If you’re convinced you need this and just want the how-to, skip to “What my bot does.” If you want the background on why you should be building for platforms like Google Home, Alexa, and Facebook Messenger, read on.

Why should I read this?

Do you remember when it wasn’t necessary to have a website? When most boards would scoff at the value of running a Facebook page? Now Gartner is telling us that customers will manage 85% of their relationship with brands without interacting with a human by 2020 and publications like Forbes are saying that chat bots are the cause.

The situation now is the same as every time a new platform develops: if you don’t have something your customers can access, you’re giving that medium to your competition. At the moment, an automated presence on Google Home or Slack may not be central to your strategy, but those who claim ground now could dominate it in the future.

The problem is time. Sure, it’d be ideal to be everywhere all the time, to have your brand active on every platform. But it would also be ideal to catch at least four hours’ sleep a night or stop covering our keyboards with three-day-old chili con carne as we eat a hasty lunch in between building two of the Next Big Things. This is where you’re fortunate in two ways:

  1. When we develop chat applications, we don’t have to worry about things like a beautiful user interface because it’s all speech or text. That’s not to say you don’t need to worry about user experience, as there are rules (and an art) to designing a good conversational back-and-forth. Amazon is actually offering some hefty prizes for outstanding examples.
  2. I’ve spent the last six months working through the steps from complete ignorance to creating a distributable chat bot and I’m giving you all my workings. In this post I break down each of the levels of complexity, from no-code back-and-forth to managing user credentials and sessions that stretch over days or months. I’m also including full code that you can adapt and pull apart as needed. I’ve commented each portion of the code explaining what it does and linking to resources where necessary.

I’ve written more about the value of Interactive Personal Assistants on the Distilled blog, so this post won’t spend any longer focusing on why you should develop chat bots. Instead, I’ll share everything I’ve learned.

What my built-from-scratch bot does

Ever since I started investigating chat bots, I was particularly interested in finding out the answer to one question: What does it take for someone with little-to-no programming experience to create one of these chat applications from scratch? Fortunately, I have direct access to someone with little-to-no experience (before February, I had no idea what Python was). And so I set about designing my own bot with the following hard conditions:

  1. It had to have some kind of real-world application. It didn’t have to be critical to a business, but it did have to bear basic user needs in mind.
  2. It had to be easily distributable across the immediate intended users, and to have reasonable scope to distribute further (modifications at most, rather than a complete rewrite).
  3. It had to be flexible enough that you, the reader, can take some free code and make your own chat bot.
  4. It had to be possible to adapt the skeleton of the process for much more complex business cases.
  5. It had to be free to run, but could have the option of paying to scale up or make life easier.
  6. It had to send messages confirming when important steps had been completed.

The resulting program is “Vietnambot,” a program that communicates with Slack, the API.AI linguistic processing platform, and Google Sheets, using real-time and asynchronous processing and its own database for storing user credentials.

If that meant nothing to you, don’t worry — I’ll define those things in a bit, and the code I’m providing is obsessively commented with explanation. The thing to remember is it does all of this to write down food orders for our favorite Vietnamese restaurant in a shared Google Sheet, probably saving tens of seconds of Distilled company time every year.

It’s deliberately mundane, but it’s designed to be a template for far more complex interactions. The idea is that whether you want to write a no-code-needed back-and-forth just through API.AI; a simple Python program that receives information, does a thing, and sends a response; or something that breaks out of the limitations of linguistic processing platforms to perform complex interactions in user sessions that can last days, this post should give you some of the puzzle pieces and point you to others.

What is API.AI and what’s it used for?

API.AI is a linguistic processing interface. It can receive text, or speech converted to text, and perform much of the comprehension for you. You can see my Distilled post for more details, but essentially, it takes the phrase “My name is Robin and I want noodles today” and splits it up into components like:

  • Intent: food_request
  • Action: process_food
  • Name: Robin
  • Food: noodles
  • Time: today

This setup means you have some hope of responding to the hundreds of thousands of ways your users could find to say the same thing. It’s your choice whether API.AI receives a message and responds to the user right away, or whether it receives a message from a user, categorizes it and sends it to your application, then waits for your application to respond before sending your application’s response back to the user who made the original request. In its simplest form, the platform has a bunch of one-click integrations and requires absolutely no code.
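
As a rough sketch of what that looks like from your application's side, the snippet below pulls those components out of the JSON that API.AI posts to a webhook. The field layout (result.action, result.parameters, result.metadata.intentName) is assumed from the API.AI webhook format at the time of writing, and the parameter names name, food, and time are whatever you chose when setting up your intent, so treat them as placeholders. The function summarize_request is made up for this example.

```python
import json

# A trimmed-down example of the JSON body API.AI might POST to your app.
# The layout is an assumption based on the API.AI webhook format; the
# parameter names are whatever you defined in your own intent.
SAMPLE_REQUEST = json.loads("""
{
    "result": {
        "resolvedQuery": "My name is Robin and I want noodles today",
        "action": "process_food",
        "parameters": {"name": "Robin", "food": "noodles", "time": "today"},
        "metadata": {"intentName": "food_request"}
    }
}
""")


def summarize_request(req):
    """Pull the intent, action, and parameters out of a parsed request."""
    result = req.get("result", {})
    return {
        "intent": result.get("metadata", {}).get("intentName"),
        "action": result.get("action"),
        "parameters": result.get("parameters", {}),
    }


summary = summarize_request(SAMPLE_REQUEST)
print(summary["intent"], summary["action"], summary["parameters"])
```

Using .get() throughout means a request with missing pieces gives you None values to check for, rather than crashing your app halfway through a conversation.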

I’ve listed the possible levels of complexity below, but it’s worth bearing in mind some hard limitations which apply to most of these services. They cannot remember anything outside of a user session, which will automatically end after about 30 minutes. They have to do everything through what are called POST and GET requests (something you can ignore unless you’re using code). And if you do choose to have the service ask your application for information before it responds to the user, your application has to do everything and respond within five seconds.

What are the other things?

Slack: A text-based messaging platform designed for work (or for distracting people from work).

Google Sheets: We all know this, but just in case, it’s Excel online.

Asynchronous processing: Most of the time, one program can do one thing at a time. Even if it asks another program to do something, it normally just stops and waits for the response. Asynchronous processing is how we ask a question and continue without waiting for the answer, possibly retrieving that answer at a later time.
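
To make that concrete, here is a minimal sketch using only Python's standard library. It uses a background thread inside one program, which is simpler than the separate worker process described later in this post, and slow_lookup and handle_message are invented names standing in for whatever slow job and message handler you have.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_lookup(order):
    """Stand-in for a slow job, such as writing to a spreadsheet."""
    time.sleep(0.2)
    return "saved order for " + order


def handle_message(executor, order):
    # Hand the slow job to a background thread and carry on immediately.
    future = executor.submit(slow_lookup, order)
    reply = "Got it, working on your " + order + " order"
    return reply, future


with ThreadPoolExecutor(max_workers=1) as executor:
    reply, future = handle_message(executor, "noodles")
    print(reply)               # available straight away
    outcome = future.result()  # collect the answer later, once we need it
    print(outcome)
```

The reply is ready before the slow job has finished; the program only stops to wait at the point where it asks for future.result().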

Database: Again, it’s likely you know this, but if not: it’s Excel that our code will use (different from the Google Sheet).

Heroku: A platform for running code online. (Important to note: I don’t work for Heroku and haven’t been paid by them. I couldn’t say that it’s the best platform, but it can be free and, as of now, it’s the one I’m most familiar with).

How easy is it?

This graph isn’t terribly scientific and it’s from the perspective of someone who’s learning much of this for the first time, so here’s an approximate breakdown:

Label | Functionality | Time it took me
1 | You set up the conversation purely through API.AI or similar, no external code needed. For instance, answering set questions about contact details or opening times | Half an hour to distributable prototype
2 | A program that receives information from API.AI and uses that information to update the correct cells in a Google Sheet (but can’t remember user names and can’t use the slower Google Sheets integrations) | A few weeks to distributable prototype
3 | A program that remembers user names once they’ve been set and writes them to Google Sheets. Is limited to five seconds processing time by API.AI, so can’t use the slower Google Sheets integrations and may not work reliably when the app has to boot up from sleep because that takes a few seconds of your allocation* | A few weeks on top of the last prototype
4 | A program that remembers user details and manages the connection between API.AI and our chosen platform (in this case, Slack) so it can break out of the five-second processing window. | A few weeks more on top of the last prototype (not including the time needed to rewrite existing structures to work with this)

*On the Heroku free plan, when your app hasn’t been used for 30 minutes it goes to sleep. This means that the first time it’s activated it takes a little while to start your process, which can be a problem if you have a short window in which to act. You could get around this by (mis)using a free “uptime monitoring service” which sends a request every so often to keep your app awake. If you choose this method, in order to avoid using all of the Heroku free hours allocation by the end of the month, you’ll need to register your card (no charge, it just gets you extra hours) and only run this application on the account. Alternatively, there are any number of companies happy to take your money to keep your app alive.

For the rest of this post, I’m going to break down each of those key steps and either give an overview of how you could achieve it, or point you in the direction of where you can find that. The code I’m giving you is Python, but as long as you can receive and respond to GET and POST requests, you can do it in pretty much whatever format you wish.


1. Design your conversation

Conversational flow is an art form in itself. Jonathan Seal, strategy director at Mando and member of British Interactive Media Association’s AI thinktank, has given some great talks on the topic. Paul Pangaro has also spoken about conversation as more than interface in multiple mediums.

Your first step is to create a flow chart of the conversation. Write out your ideal conversation, then write out the most likely ways a person might go off track and how you’d deal with them. Then go online, find existing chat bots and do everything you can to break them. Write out the most difficult, obtuse, and nonsensical responses you can. Interact with them like you’re six glasses of wine in and trying to order a lemon engraving kit, interact with them as though you’ve found charges on your card for a lemon engraver you definitely didn’t buy and you are livid, interact with them like you’re a bored teenager. At every point, write down what you tried to do to break them and what the response was, then apply that to your flow. Then get someone else to try to break your flow. Give them no information whatsoever apart from the responses you’ve written down (not even what the bot is designed for), refuse to answer any input you don’t have written down, and see how it goes. David Low, principal evangelist for Amazon Alexa, often describes the value of printing out a script and testing the back-and-forth for a conversation. As well as helping to avoid gaps, it’ll also show you where you’re dumping a huge amount of information on the user.

While “best practices” are still developing for chat bots, a common theme is that it’s not a good idea to pretend your bot is a person. Be upfront that it’s a bot — users will find out anyway. Likewise, it’s incredibly frustrating to open a chat and have no idea what to say. On text platforms, start with a welcome message making it clear you’re a bot and giving examples of things you can do. On platforms like Google Home and Amazon Alexa users will expect a program, but the “things I can do” bit is still important enough that your bot won’t be approved without this opening phase.

I’ve included a sample conversational flow for Vietnambot at the end of this post as one way to approach it, although if you have ideas for alternative conversational structures I’d be interested in reading them in the comments.

A final piece of advice on conversations: The trick here is to find organic ways of controlling the possible inputs and preparing for unexpected inputs. That being said, the Alexa evangelist team provide an example of terrible user experience in which a bank’s app said: “If you want to continue, say nine.” Quite often questions, rather than instructions, are the key.

2. Create a conversation in API.AI

API.AI has quite a lot of documentation explaining how to create programs here, so I won’t go over individual steps.

Key things to understand:

You create agents; each is basically a different program. Agents recognize intents, which are simply ways of triggering a specific response. If someone says the right things at the right time, they meet criteria you have set, fall into an intent, and get a pre-set response.

The right things to say are included in the “User says” section (screenshot below). You set either exact phrases or lists of options as the necessary input. For instance, a user could write “Of course, I’m [any name]” or “Of course, I’m [any temperature].” You could set up one intent for name-is which matches “Of course, I’m [given-name]” and another intent for temperature which matches “Of course, I’m [temperature],” and depending on whether your user writes a name or temperature in that final block you could activate either the “name-is” or “temperature-is” intent.

The “right time” is defined by contexts. Contexts help define whether an intent will be activated, but are also created by certain intents. I’ve included a screenshot below of an example interaction. In this example, the user says that they would like to go on holiday. This activates a holiday intent and sets the holiday context you can see in input contexts below. After that, our service will have automatically responded with the question “where would you like to go?” When our user says “The” and then any location, it activates our holiday location intent because it matches both the context and what the user says. If, on the other hand, the user had initially said “I want to go to the theater,” that might have activated the theater intent which would set a theater context. Then, when we ask “what area of theaters are you interested in?” and the user says “The [location]” or even just “[location],” we will take them down a completely different path of suggesting theaters rather than hotels in Rome.

The way you can create conversations without ever using external code is by using these contexts. A user might say “What times are you open?”; you could set an open-time-inquiry context. In your response, you could give the times and ask if they want the phone number to contact you. You would then make a yes/no intent which matches the context you have set, so if your user says “Yes” you respond with the number. This could be set up within an hour but gets exponentially more complex when you need to respond to specific parts of the message. For instance, if you have different shop locations and want to give the right phone number without having to write out every possible location they could say in API.AI, you’ll need to integrate with external code (see section three).
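
If it helps to see the logic laid out, the opening-times example can be modeled in a few lines of Python. This is only a toy imitation of what API.AI does for you without code, not how the platform is implemented; the opening hours, phone number, and the intent name phone-yes are invented placeholders.

```python
# Each intent: the phrases it matches, the context it needs (if any),
# the context it sets (if any), and its pre-set response.
INTENTS = [
    {
        "name": "open-time-inquiry",
        "user_says": {"what times are you open?"},
        "needs_context": None,
        "sets_context": "open-time-inquiry",
        "response": "We're open 9 to 5. Would you like our phone number?",
    },
    {
        "name": "phone-yes",
        "user_says": {"yes"},
        "needs_context": "open-time-inquiry",
        "sets_context": None,
        "response": "You can call us on 555-0100.",
    },
]


def respond(message, active_context):
    """Return (response, new_context) for a message in a given context."""
    text = message.strip().lower()
    for intent in INTENTS:
        phrase_matches = text in intent["user_says"]
        context_matches = intent["needs_context"] in (None, active_context)
        if phrase_matches and context_matches:
            return intent["response"], intent["sets_context"]
    return "Sorry, I didn't catch that.", active_context


reply1, context = respond("What times are you open?", None)
reply2, _ = respond("Yes", context)   # context is set, so "yes" means something
reply3, _ = respond("Yes", None)      # no context, so "yes" matches nothing
```

The same word, “Yes,” gets the phone number in one case and a polite shrug in the other, purely because of the context set by the previous intent.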

Now, there will be times when your users don’t say what you’re expecting. Excluding contexts, there are three very important ways to deal with that:

  1. Almost like keyword research — plan out as many possible variations of saying the same thing as possible, and put them all into the intent
  2. Test, test, test, test, test, test, test, test, test, test, test, test, test, test, test (when launched, every chat bot will have problems. Keep testing, keep updating, keep improving.)
  3. Fallback intents

Fallback intents don’t have a user says section, but can be boxed in by contexts. They match anything that has the right context but doesn’t match any of your user says. It could be tempting to use fallback intents as a catch-all. Reasoning along the lines of “This is the only thing they’ll say, so we’ll just treat it the same” is understandable, but it opens up a massive hole in the process. Fallback intents are designed to be a conversational safety net. They operate exactly the same as in a normal conversation. If a person asked what you want in your tea and you responded “I don’t want tea” and that person made a cup of tea, wrote the words “I don’t want tea” on a piece of paper, and put it in, that is not a person you’d want to interact with again. If we are using fallback intents to do anything, we need to preface it with a check. If we had to resort to it in the example above, saying “I think you asked me to add I don’t want tea to your tea. Is that right?” is clunky and robotic, but it’s a big step forward, and you can travel the rest of the way by perfecting other parts of your conversation.
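
If your fallback intent is handled by your own code, that “check first” rule might look something like the sketch below. The function name fallback_reply and the context name awaiting-food are hypothetical; the point is only that unmatched input gets repeated back as a question rather than acted on.

```python
def fallback_reply(raw_text, active_context):
    """Build a confirmation question instead of acting on unmatched input."""
    if active_context == "awaiting-food":
        # Repeat what we think we heard and ask before doing anything.
        return 'I think you asked me to order "{}". Is that right?'.format(raw_text.strip())
    # Outside a known context, admit confusion and offer an example.
    return 'Sorry, I did not understand that. Try something like "I want noodles".'


print(fallback_reply("I don't want tea", "awaiting-food"))
```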

3. Integrating with external code

I used Heroku to build my app. Using this excellent weather webhook example you can actually deploy a bot to Heroku within minutes. I found this example particularly useful as something I could pick apart to make my own call and response program. The weather webhook takes the information and calls a Yahoo app, but ignoring that specific functionality you essentially need the following if you’re working in Python:

# Start
import json

from flask import Flask, make_response, request

app = Flask(__name__)


def processRequest(req):
    # Do your thing here and decide what the response should be.
    # You'll need to write this yourself (the weather webhook example above is a good one to copy from).
    return {
        "speech": "speech we want to send back",
        "displayText": "display text we want to send back, usually matches speech",
        "source": "your app name",
    }


@app.route("/webhook", methods=["POST"])
def webhook():
    # Read the JSON that API.AI sent us
    req = request.get_json(silent=True, force=True)
    print("Request:")
    print(json.dumps(req, indent=4))

    # Process to do your thing and decide what the response should be
    res = processRequest(req)

    # Make our response readable by API.AI and send it back to the service
    response = make_response(json.dumps(res))
    response.headers["Content-Type"] = "application/json"
    return response
# End

As long as you can receive and respond to requests like that (or in the equivalent for languages other than Python), your app and API.AI should both understand each other perfectly; what you do in the interim to change the world or make your response is entirely up to you. The main code I have included is a little different from this because it’s also designed to be the step in-between Slack and API.AI. However, I have heavily commented sections like process_food and the database interaction processes, with both explanation and reading sources. Those comments should help you make it your own. If you want to repurpose my program to work within that five-second window, I would forget about the file called app.py and aim to copy whole processes from tasks.py, paste them into a program based on the weather webhook example above, and go from there.

Initially I’d recommend trying GSpread to make some changes to a test spreadsheet. That way you’ll get visible feedback on how well your application is running (you’ll need to go through the authorization steps as they are explained here).
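
As a starting point for that kind of test, here is a small sketch of writing an order to a sheet. col_values and update_cell are methods a GSpread worksheet provides; FakeWorksheet and write_order are made up for this example so that the logic can be tried without any authorization. Once you have a real worksheet object from GSpread, the idea is that you pass it in where the fake one is used here.

```python
class FakeWorksheet:
    """Minimal stand-in that mimics two GSpread worksheet methods."""

    def __init__(self):
        self.cells = {}  # maps (row, col) to value; rows and columns start at 1

    def col_values(self, col):
        rows = [r for (r, c) in self.cells if c == col]
        return [self.cells.get((r, col), "") for r in range(1, max(rows, default=0) + 1)]

    def update_cell(self, row, col, value):
        self.cells[(row, col)] = value


def write_order(worksheet, name, food):
    """Put name and food on the user's existing row, or on the first empty row."""
    names = worksheet.col_values(1)
    if name in names:
        row = names.index(name) + 1   # sheet rows start at 1, list indexes at 0
    else:
        row = len(names) + 1
    worksheet.update_cell(row, 1, name)
    worksheet.update_cell(row, 2, food)
    return row


sheet = FakeWorksheet()
first_row = write_order(sheet, "Robin", "noodles")
second_row = write_order(sheet, "Dominic", "pho")
updated_row = write_order(sheet, "Robin", "summer rolls")  # overwrites row 1
```

Each update_cell call on a real sheet is a separate request over the internet, which is part of why spreadsheet work can eat into a five-second allowance.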

4. Using a database

Databases are pretty easy to set up in Heroku. I chose the Postgres add-on (you just need to authenticate your account with a card; it won’t charge you anything and then you just click to install). In the import section of my code I’ve included links to useful resources which helped me figure out how to get the database up and running — for example, this blog post.

I used the Python library Psycopg2 to interact with the database. To steal some examples of using it in code, have a look at the section entitled “synchronous functions” in either the app.py or tasks.py files. Open_db_connection and close_db_connection do exactly what they say on the tin (open and close the connection with the database). You tell check_database to check a specific column for a specific user and it gives you the value, while update_columns adds a value to specified columns for a certain user record. Where things haven’t worked straightaway, I’ve included links to the pages where I found my solution. One thing to bear in mind is that I’ve used a way of including columns as a variable, which Psycopg2 recommends quite strongly against. I’ve gotten away with it so far because I’m always writing out the specific column names elsewhere — I’m just using that method as a short cut.
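
One hedged way to make the column-name shortcut safer is to check the name against a fixed list before it goes anywhere near a query. The sketch below is not the code from my app: it uses Python's built-in sqlite3 module as a stand-in so it runs without a Postgres database, and the table layout and function bodies are invented for illustration. With Psycopg2 the value placeholders are %s rather than ?, and the library also offers psycopg2.sql.Identifier for composing column names.

```python
import sqlite3

# Only these column names may ever be placed into a query string.
ALLOWED_COLUMNS = {"user_name", "food"}


def update_column(conn, user_id, column, value):
    """Set one approved column for one user."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column: " + repr(column))
    # The column name has been checked against the list above; the values
    # still travel as query parameters, never via string formatting.
    conn.execute(
        "UPDATE users SET {} = ? WHERE user_id = ?".format(column),
        (value, user_id),
    )
    conn.commit()


def check_database(conn, user_id, column):
    """Return the value in one approved column for one user, or None."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column: " + repr(column))
    row = conn.execute(
        "SELECT {} FROM users WHERE user_id = ?".format(column), (user_id,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, user_name TEXT, food TEXT)")
conn.execute("INSERT INTO users (user_id) VALUES (?)", ("U123",))
update_column(conn, "U123", "user_name", "Robin")
print(check_database(conn, "U123", "user_name"))
```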

5. Processing outside of API.AI’s five-second window

It needs to be said that this step complicates things by no small amount. It also makes it harder to integrate with different applications. Rather than flicking a switch to roll out through API.AI, you have to write the code that interprets authentication and user-specific messages for each platform you’re integrating with. What’s more, spoken-only platforms like Google Home and Amazon Alexa don’t allow for this kind of circumvention of the rules — you have to sit within that 5–8 second window, so this method removes those options. The only reasons you should need to take the integration away from API.AI are:

  • You want to use it to work with a platform that it doesn’t have an integration with. It currently has 14 integrations including Facebook Messenger, Twitter, Slack, and Google Home. It also allows exporting your conversations in an Amazon Alexa-understandable format (Amazon has its own similar interface and a bunch of instructions on how to build a skill; here is an example).
  • You are processing masses of information. I’m talking really large amounts. Some flight comparison sites have had problems fitting within the timeout limit of these platforms, but if you aren’t trying to process every detail for every flight for the next 12 months and it’s taking more than five seconds, it’s probably going to be easier to make your code more efficient than work outside the window. Even if you are, those same flight comparison sites solved the problem by creating a process that regularly checks their full data set and creates a smaller pool of information that’s more quickly accessible.
  • You need to send multiple follow-up messages to your user. When using the API.AI integration it’s pretty much call-and-response; you don’t always get access to things like authorization tokens, which are what some messaging platforms require before you can automatically send messages to one of their users.
  • You’re working with another program that can be quite slow, or there are technical limitations to your setup. This one applies to Vietnambot: I used the GSpread library in my application, which is fantastic but can be slow to pull out bigger chunks of data. What’s more, Heroku can take a little while to start up if you’re not paying.

I could have paid or cut out some of the functionality to avoid needing to manage this part of the process, but that would have failed to meet number 4 in our original conditions: It had to be possible to adapt the skeleton of the process for much more complex business cases. If you decide you’d rather use my program within that five-second window, skip back to section 3 of this post. Otherwise, keep reading.

When we break out of the five-second API.AI window, we have to do a couple of things. First thing is to flip the process on its head.

What we were doing before:

User sends message -> API.AI -> our process -> API.AI -> user

What we need to do now:

User sends message -> our process -> API.AI -> our process -> user

Instead of API.AI waiting while we do our processing, we do some processing, wait for API.AI to categorize the message from us, do a bit more processing, then message the user.

The way this applies to Vietnambot is:

  1. User says “I want [food]”
  2. Slack sends a message to my app on Heroku
  3. My app sends a “swift and confident” 200 response to Slack to prevent it from resending the message. To send the response, my process has to shut down, so before it does that, it activates a secondary process using “tasks.”
  4. The secondary process takes the query text and sends it to API.AI, then gets back the response.
  5. The secondary process checks our database for a user name. If we don’t have one saved, it sends another request to API.AI, putting it in the “we don’t have a name” context, and sends a message to our user asking for their name. That way, when our user responds with their name, API.AI is already primed to interpret it correctly because we’ve set the right context (see section 2 of this post). API.AI tells us that the latest message is a user name and we save it. When we have both the user name and food (whether we’ve just got it from the database or just saved it to the database), Vietnambot adds the order to our sheet, calculates whether we’ve reached the order minimum for that day, and sends a final success message.
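
The decision-making in that last step can be sketched as one small function. This is a simplified illustration rather than the code in my app: next_step, the state dictionary, and the action names are all hypothetical, and the real process also talks to Slack, API.AI, and the database along the way.

```python
def next_step(state, parsed):
    """Update what we know with API.AI's reading of a message, then pick an action.

    state holds what we know so far: {"name": ..., "food": ...} (either may be None).
    parsed is a hypothetical summary of API.AI's answer, e.g. {"food": "pho"}.
    """
    state = dict(state)  # work on a copy and leave the caller's dict alone
    for key in ("name", "food"):
        if parsed.get(key):
            state[key] = parsed[key]
    if state.get("name") is None:
        return state, "ask_for_name"      # and set the "need a name" context in API.AI
    if state.get("food") is None:
        return state, "ask_for_food"
    return state, "write_order_to_sheet"  # then send the success message


state, action1 = next_step({"name": None, "food": None}, {"food": "pho"})
state, action2 = next_step(state, {"name": "Robin"})
```

Keeping the “what do we do next?” question in one place like this makes it easier to test the conversation without Slack or API.AI being involved at all.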

6. Integrating with Slack

This won’t be the same as integrating with other messaging services, but it could give some insight into what might be required elsewhere. Slack has two authorization processes; we’ll call one “challenge” and the other “authentication.”

Slack includes instructions for an app lifecycle here, but API.AI actually has excellent instructions for how to set up your app; as a first step, create a simple back-and-forth conversation in API.AI (not your full product), go to integrations, switch on Slack, and run through the steps to set it up. Once that is up and working, you’ll need to change the OAuth URL and the Events URL to be the URL for your app.

Thanks to GitHub user karishay, my app code includes a process for responding to the challenge process (which will tell Slack you’re set up to receive events) and for running through the authentication process, using our established database to save important user tokens. There’s also the option to save them to a Google Sheet if you haven’t got the database established yet. However, be wary of this as anything other than a first step: user tokens give an app a lot of power and have to be guarded carefully.
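
For a sense of what the challenge step involves, here is a minimal sketch, separate from the code in my app. When you register an Events URL, Slack posts a JSON body with a type of url_verification and a challenge value, and expects that value to be sent back. The function name handle_slack_event is made up, and a real handler would also check that the request genuinely came from Slack.

```python
import json


def handle_slack_event(body):
    """Return (status_code, response_text) for a raw Slack Events request body."""
    payload = json.loads(body)
    if payload.get("type") == "url_verification":
        # Slack is checking that we control this URL: echo the challenge back.
        return 200, payload.get("challenge", "")
    # For real events, acknowledge quickly and do the slow work elsewhere.
    return 200, ""


status, text = handle_slack_event('{"type": "url_verification", "challenge": "abc123"}')
```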

7. Asynchronous processing

We are running our app using Flask, which is basically a whole bunch of code we can call upon to deal with things like receiving requests for information over the internet. In order to create a secondary worker process I’ve used Redis and Celery. Redis is our “message broker”; it makes a list of everything we want our secondary process to do. Celery runs through that list and makes our worker process do those tasks in sequence. Redis is a note left on the fridge telling you to do your washing and take out the bins, while Celery is the housemate that bangs on your bedroom door, note in hand, and makes you do each thing. I’m sure our worker process doesn’t like Celery very much, but it’s really useful for us.

You can find instructions for adding Redis to your app in Heroku here and you can find advice on setting up Celery in Heroku here. Miguel Grinberg’s Using Celery with Flask blog post is also an excellent resource, but using the exact setup he gives results in a clash with our database, so it’s easier to stick with the Heroku version.

Up until this point, we’ve been calling functions in our main app — anything of the form function_name(argument_1, argument_2, argument_3). Now, by putting “tasks.” in front of our function, we’re saying “don’t do this now — hand it to the secondary process.” That’s because we’ve done a few things:

  • We’ve created tasks.py which is the secondary process. Basically it’s just one big, long function that our main code tells to run.
  • In tasks.py we’ve included Celery in our imports and set our app as celery.Celery(), meaning that when we use “app” later we’re essentially saying “this is part of our Celery jobs list” or rather “tasks.py will only do anything when its flatmate Celery comes banging on the door”
  • For every time our main process asks for an asynchronous function by writing tasks.any_function_name(), we have created that function in our secondary program just as we would if it were in the same file. However, in our secondary program we’ve prefaced it with “@app.task”, another way of saying “Do wash_the_dishes when Celery comes banging the door yelling wash_the_dishes(dishes, water, heat, resentment)”.
  • In our “Procfile” (included as a file in my code) we have listed our worker process as --app=tasks.app

All this adds up to the following process:

  1. Main program runs until it hits an asynchronous function
  2. Main program fires off a message to Redis, which has a list of work to be done. The main process doesn’t wait; it just runs through everything after it and in our case even shuts down
  3. The Celery part of our worker program goes to Redis and checks for the latest update. It checks what function has been called (because our worker functions are named the same as when our main process called them), gives our worker all the information to start doing that thing, and tells it to get going
  4. Our worker process starts the action it has been told to do, then shuts down.
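
The four steps above can be imitated with nothing but Python's standard library, which may help if the moving parts feel abstract. To be clear, this is a stand-in and not Redis or Celery: a queue.Queue plays the part of the Redis job list, a background thread plays the Celery worker, and the little task decorator only mimics the idea behind “@app.task”. In Celery itself, a decorated task is normally queued by calling .delay() on it.

```python
import queue
import threading

job_list = queue.Queue()   # plays the role of Redis: a list of work to be done
results = []               # somewhere for the worker to record what it did
TASKS = {}                 # maps task names to functions, like a task registry


def task(func):
    """Register a function so the worker can find it by name."""
    TASKS[func.__name__] = func
    return func


@task
def process_food(user, food):
    results.append("{} ordered {}".format(user, food))


def worker():
    """Plays the role of Celery: take each job off the list and run it."""
    while True:
        job = job_list.get()
        if job is None:      # sentinel value telling the worker to stop
            break
        name, args = job
        TASKS[name](*args)


# Main program: put the job on the list by name and carry on without waiting.
job_list.put(("process_food", ("Robin", "noodles")))
job_list.put(None)

thread = threading.Thread(target=worker)
thread.start()
thread.join()
print(results)
```

Notice that the main program only ever passes along a function name and some arguments; the worker is the one that owns the function and does the work.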

As with the other topics mentioned here, I’ve included all of this in the code I’ve supplied, along with many of the sources used to gather the information — so feel free to use the processes I have. Also feel free to improve on them; as I said, the value of this investigation was that I am not a coder. Any suggestions for tweaks or improvements to the code are very much welcome.


Conclusion

As I mentioned in the introduction to this post, there’s huge opportunity for individuals and organizations to gain ground by creating conversational interactions for the general public. For the vast majority of cases you could be up and running in a few hours to a few days, depending on how complex you want your interactions to be and how comfortable you are with coding languages. There are some stumbling blocks out there, but hopefully this post and my obsessively annotated code can act as templates and signposts to help get you on your way.

Grab my code at GitHub


Bonus #1: The conversational flow for my chat bot

This is by no means necessarily the best or only way to approach this interaction. This is designed to be as streamlined an interaction as possible, but we’re also working within the restrictions of the platform and the time investment necessary to produce this. Common wisdom is to create the flow of your conversation and then keep testing to perfect, so consider this example layout a step in that process. I’d also recommend putting one of these flow charts together before starting — otherwise you could find yourself having to redo a bunch of work to accommodate a better back-and-forth.

Bonus #2: General things I learned putting this together

As I mentioned above, this has been a project of going from complete ignorance of coding to slightly less ignorance. I am not a professional coder, but I found the following things I picked up to be hugely useful while I was starting out.

  1. Comment everything. You’ll probably see my code is bordering on excessive commenting (anything after a # is a comment). While normally I’m sure someone wouldn’t want to include a bunch of Stack Overflow links in their code, I found notes about what things portions of code were trying to do, and where I got the reasoning from, hugely helpful as I tried to wrap my head around it all.
  2. Print everything. In Python, everything within “print()” will be printed out in the app logs (see the commands tip for reading them in Heroku). While printing each action can mean you fill up a logging window terribly quickly (I started using the Heroku add-on LogDNA towards the end and it’s a huge step up in terms of ease of reading and length of history), often the times my app was falling over was because one specific function wasn’t getting what it needed, or because of another stupid typo. Having a semi-constant stream of actions and outputs logged meant I could find the fault much more quickly. My next step would probably be to introduce a way of easily switching on and off the less necessary print functions.
  3. The following commands: Heroku’s how-to documentation for creating an app and adding code is pretty great, but I found myself using these all the time so thought I’d share (all of the below are typed into the command line; open it by typing cmd on Windows or by running Terminal on a Mac):
    1. “cd "[file location]"” – move into the folder your code is in
    2. “git init” – create a git repository in that folder
    3. “git add .” – stage all of the code in your folder so git includes it in the next commit
    4. “git commit -m "[description of what you’re doing]"” – save the staged changes in your git repository
    5. “heroku git:remote -a [the name of your app]” – select your app as where to put the code
    6. “git push heroku master” – send your code to the app you selected
    7. “heroku ps” – find out whether your app is running or crashed
    8. “heroku logs” – apologize to your other half for going totally unresponsive for the last ten minutes and start the process of working through your printouts to see what has gone wrong
  4. POST requests will always wait for a response. Seems really basic — initially I thought that by just sending a POST request and not telling my application to wait for a response I’d be able to basically hot-potato work around and not worry about having to finish what I was doing. That’s not how it works in general, and it’s more of a symbol of my naivete in programming than anything else.
  5. If something is really difficult, it’s very likely you’re doing it wrong. While I made sure to do pretty much all of the actual work myself (to avoid simply farming it out to the very talented individuals at Distilled), I was lucky enough to get some really valuable advice. The piece of advice above was from Dominic Woodman, and I should have listened to it more. The times when I made least progress were when I was trying to use things the way they shouldn’t be used. Even when I broke through those walls, I later found that someone didn’t want me to use it that way because it would completely fail at a later point. Tactical retreat is an option. (At this point, I should mention he wasn’t the only one to give invaluable advice; Austin, Tom, and Duncan of the Distilled R&D team were a huge help.)
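
One way to get the on/off switch mentioned in tip 2 is Python's standard logging module, which lets less important messages be silenced by changing a single level. This is only a minimal sketch, not the code from the bot: the logger name "chatbot", the LOG_LEVEL environment variable, and the handle_message function are all hypothetical.

```python
import logging
import os


def build_logger(level_name=None):
    # LOG_LEVEL is a hypothetical environment variable; fall back to INFO.
    level_name = level_name or os.environ.get("LOG_LEVEL", "INFO")
    logger = logging.getLogger("chatbot")
    logger.setLevel(getattr(logging, level_name.upper(), logging.INFO))
    if not logger.handlers:
        # StreamHandler writes to stderr, which ends up in the app logs.
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def handle_message(logger, text):
    # Hypothetical handler: the debug line is the "less necessary" output.
    logger.debug("raw message received: %r", text)
    reply = text.strip().lower()
    logger.info("replying with: %r", reply)
    return reply


logger = build_logger()
handle_message(logger, "  Hello  ")
```

With the level set to DEBUG every message is logged; with INFO or WARNING the noisier lines are dropped without touching the code that emits them.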

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

The Anatomy of a $97 Million Page: A CRO Case Study

Posted by jkuria

In this post, we share a CRO case study from Protalus, one of the fastest-growing footwear companies in the world. They make an insole that corrects the misalignment suffered by roughly 85% of the population. Misalignment is the cause of most back, knee, and foot pain. Back pain alone is estimated to be worth $100 billion a year.


Summary

  • We (with Protalus’ team) increased direct sales by 91% in about 6 months through one-click upsells and CRO.
  • Based on the direct sales increase, current run-rate revenue, the “Virtuous Cycle of CRO”-fueled growth rate, and revenue multiple for their industry, we estimate this will add about $97 million to the company’s valuation over the next 12–18 months*.
  • A concrete example of the Virtuous Cycle of CRO: Before we increased the conversion rate and average order value, Google Adwords was not a viable channel. Now it is, opening a whole new floodgate of profitable sales! Ditto for at least two other channels. In part due to our work, Protalus’ annual run-rate revenue has grown by 1,212% in less than a year.

* Protalus’ core product is differentiated, patent protected, and high margin. They also have a strong brand and raving fans. In the Shoes & Apparel category, they’re most similar to Lululemon Athletica, which has a 4x plus revenue multiple. While Nike and Under Armour engage in a bloody price war and margin-eroding celebrity endorsements, Lululemon commands significantly higher prices than its peers, without big-name backers! Business gurus Warren Buffett and Charlie Munger often say that the true test of a defensive moat around a business is “Can you raise prices without hurting sales?” Protalus has this in spades. They’ve raised prices several times while simultaneously increasing units sold — from $39 to $49 to $59 to $69 to $79 to $99 to $119.


One-click upsells: A 21% sales boost

When we do engagements, the first order of business is to uncover low-hanging-fruit growth opportunities. This accomplishes two things:

  1. It helps the client get an immediate ROI on the engagement
  2. It earns us goodwill and credibility within the company. We then have wide latitude to run the big, bold experiments that produce huge conversion lifts

In Protalus’ case, we determined they were not doing post-purchase one-click upsells. Adding these immediately boosted sales by 21%. Here’s how we did it:

  • On their main sales landing page, Protalus has an offer where you get $30 off on the second pair of insoles, as well as free expedited shipping for both. About 30% of customers were taking this offer.
  • For those who didn’t, right after they purchased but BEFORE they got to the “Thank You” page, we presented the offer again, which led to the 21% sales increase.

Done correctly, one-click upsells easily boost sales, as customers do not have to re-enter credit card details. Here’s the best way to do them: The Little Secret that Made McDonalds a $106 Billion Behemoth.

Below is the final upsell page that got the 21% sales increase:

Screenshot of the final upsell page

We tested our way to it. The key effective elements are:

1. Including “free upgrade to expedited shipping” in the headline: 145% lift

The original page had it lower in the body copy:

Google Experiments screenshot showing 145% lift

2. Adding celebrity testimonials: 60% lift

Google Experiments screenshot showing a 60% lift

Elisabeth Howard’s (Ms. Senior America) unsolicited endorsement is especially effective because about 60% of Protalus’ customers are female and almost one-third are retired. We uncovered these gems by reviewing all 11,000 (at the time) customers’ testimonials.

3. Explaining the reasons why other customers bought additional insoles.

See the three bulleted reasons on the first screenshot (convenience, different models, purchasing for loved ones).


Radical re-design and long-form page: A 58% conversion lift

With the upsells producing positive ROI for the client, we turned to re-designing the main sales page. The new page produced a cumulative lift of 58%, attained in two steps.

[Step 1] 35% lift: Long-form content-rich page

Optimizely screenshot shows 35% lift at 99% statistical significance

Note that even after reaching 99% statistical significance, the lift fluctuated between 33% and 37%, so we’ll claim 35%.

[Step 2] 17% lift: Performance improvements

The new page was quite a bit longer, so its “fully loaded” time increased a lot — especially on mobile devices with poor connections. A combination of lazy loading, lossless image shrinking, CSS sprites, and other ninja tactics led to a further 17% lift.

These optimizations reduced the page load time by 40% and shrank the page size by a factor of 4!

The total cumulative lift was therefore 58% (1.35 x 1.17 = 1.58).

With the earlier 21% sales gain from one-click upsells, that’s a 91% sales increase (1.21 x 1.35 x 1.17 = 1.91).
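
The arithmetic above compounds the lifts rather than adding them, which is why 35% and 17% give 58% and not 52%. A minimal sketch of that calculation follows; the combined_lift helper is hypothetical, and it assumes (as the figures above do) that each lift applies on top of the previous one.

```python
def combined_lift(*lifts):
    """Compound successive lifts given as fractions, e.g. 0.35 for 35%."""
    factor = 1.0
    for lift in lifts:
        factor *= 1.0 + lift
    return factor - 1.0


page_lift = combined_lift(0.35, 0.17)         # redesign, then performance work
total_lift = combined_lift(0.21, 0.35, 0.17)  # upsells plus both page steps

print(f"page redesign: {page_lift:.0%}")  # 58%
print(f"overall sales: {total_lift:.0%}")  # 91%
```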


Dissecting the anatomy of the winning page

To determine what vital few elements to change, we surveyed the non-converting visitors. Much of the work in A/B testing is the tedious research required to understand non-converting visitors.

“Give me six hours to chop a tree and I’ll spend the first four sharpening the axe.” – Abraham Lincoln

All CRO practitioners would do well to learn from good, ol’ honest Abe! We used Mouseflow’s feedback feature to survey bouncing visitors from the main landing page and the check-out page. The top objection themes were:

  1. Price is too high/product too expensive
  2. Not sure it will work (because others didn’t work before)
  3. Not sure it will work for my specific condition
  4. Difficulty in using website

We then came up with specific counter-objections for each. A landing page is “salesmanship in digital print,” so many of the techniques that work in face-to-face selling also apply.

On a landing page, though, you must overcorrect because you lack the back-and-forth conversation of a live selling situation. Below is the list of key elements on the winning page.

1. Price is too high/product is too expensive

This was by far the biggest objection, cited by over 50% of all respondents. Thus, we spent a disproportionate amount of effort and page real estate on it.

Protalus’ insoles cost $79, whereas Dr. Scholls (the 100-year-old brand) cost less than $10. When asked what other products they considered, customers frequently said Dr. Scholls.

Coupled with this, nearly one-third of customers are retired and living on a fixed income.

“I ain’t gonna pay no stinkin’ $79! They cost more than my shoes,” one visitor remarked.

To overcome the price objection, we did a couple of things.

Articulated the core value proposition and attacked the price from the top

When prospects complain about price, it simply means that they do not understand or appreciate the product’s value proposition. They are seeing this:

The product’s cost exceeds the perceived value

To effectively deal with price, you must tilt the scale so that it looks like this instead:

The perceived value exceeds cost

While the sub-$10 Dr. Scholls was the reference point for many, we also learned that some customers had tried custom orthotics ($600 to $3,000) and Protalus’ insoles compared favorably.

We therefore decided our core value proposition would be:

“Avoid paying $600 for custom orthotics. Protalus insoles are almost as effective but cost 87% less.”

…forcing the $600 reference point, instead of the $10 for Dr. Scholls. In the conversion rate heuristic we use, the value proposition is the single biggest lever.

We explained all this from a “neutral” educational standpoint (rather than a salesy one) in three steps:

1. First, we use “market data” to explain the cause of most pain and establish that custom orthotics are more effective than over-the-counter insoles. Market data is always more compelling than product data, so you should lead with it.

[Screenshot]

2. Next, like a good trial lawyer, we show why Protalus insoles are similar to custom orthotics but cost 87% less:

[Screenshot]

3. Finally, we deal with the “elephant in the room” and explain how Protalus insoles are fundamentally different from Dr. Scholls:

[Screenshot]

We also used several verbatim customer testimonials to reinforce this point:

[Screenshot]

[Screenshot]

Whenever possible, let others do your bragging!

Attacked price from the bottom

Here, we used a technique known as “break the price down to the ridiculous.” $79 is just 44 cents per day, less than a K-cup of coffee — which most people consume once or twice a day! This makes the price more palatable.

[Screenshot]

Used the quality argument

The quality technique is from Zig Ziglar’s Sales Training. You say to a prospect:

“Many years ago, our company/founder/founding team made a basic decision. We decided it would be easier to use the highest quality materials and explain price one time than it would be to apologize for low quality forever. When you use the product/service, you’ll be glad we made that decision.”

It’s especially effective if the company has a well-known “maker” founder (like Yvon Chouinard at Patagonia). It doesn’t work as well for MBAs or suits, much as we need them!

Protalus’ founder Chris Buck designed the insoles and has a cult-like following, so it works for him.

Dire outcomes of not taking action

Here we talked about the dire outcomes if you do not get the insoles; for example, surgery, doctors’ bills, and lost productivity at work! Many customers work on their feet all day (nurses, steelworkers, etc.) so this last point is highly relevant.

[Screenshot]

Microsoft employed this technique successfully against Linux in the early 2000s. While Linux was free, the “Total Cost of Ownership” for not getting Windows was much higher when you considered support, frequent bugs, less accountability, fewer feature updates, and so on.

2. Not sure the product will work

For this objection, we did the following:

Used Dr. Romansky

We prominently featured Dr. Romansky, Protalus’ resident podiatrist. A consultant to the US Men’s and Women’s soccer teams and the Philadelphia Phillies baseball team, he has serious credibility.

[Screenshot]

The “educational” part of the landing page (above the fold) is done in “his voice.” Before, only his name appeared on a rarely visited page. This is an example of a “hidden wealth” opportunity!

Used celebrity testimonials on the main landing page

Back in 1997, a sports writer asked Phil Knight (Nike’s founder): “Is there no better way for you to spend $100 million?”

You see, Knight had just paid that staggering sum to a young Tiger Woods — and it seemed extravagant!

Knight’s answer? An emphatic “No!” That $100 million would generate several billion dollars in sales for Nike over the next decade!

Celebrity testimonials work. Period.

Since our celebrity endorsements increased the one-click upsell take-rate by 60%, we also used them on the main page:

[Screenshot]

[Screenshot]

Used expert reviews

We solicited and included expert reviews from industry and medical professionals. Below are two of the four we used:

[Screenshot]

[Screenshot]

These also helped address the price concern because some site visitors had expressed discomfort paying so much for an over-the-counter product without doctor recommendation.

3. Not sure the product will work for me

This is different from “Not sure the product will work” and needs to be treated separately. If there’s one thing we’ve learned over the years, it is that everyone thinks their situation is one-in-a-million unique!

We listed all the conditions that Protalus insoles address, as well as those they do not.

[Screenshot]

In addition, we clearly stated that the product does not work for 15% of the population.

By conspicuously admitting this (NOT just in the fine print section!) you are more credible. This is expressed in the Prospect’s Protest as:

“First tell me what your product CANNOT do and I might believe you when you tell me what it can do!”

4. Difficulty in using the site

Several visitors reported difficulty using the site, so we used Mouseflow’s powerful features to detect and fix usability issues.

Interestingly, the visitor session recordings confirmed that price was a big issue as we could clearly see prospects navigate to the price, stare incredulously, and then leave!

Accentuate the customers’ reasons for buying

Most of the opportunity in CRO is in the non-converting visitors (often over 90%), but understanding converting ones can yield crucial insights.*

For Protalus, the top reasons for buying were:

  • Desperation/too much leg, knee, or back pain/willing to try anything (This is the 4M, for “motivation,” in the strategic formula we use)
  • The testimonials were persuasive
  • Video was convincing

On the last point, the Mouseflow heatmaps showed that those who watched the video bought at a much higher rate, yet few watched it.

We therefore placed the video higher above the fold, used an arrow to draw attention, and inserted a sub-headline:

[Screenshot]

A million-dollar question we ask buyers is:

“Was there any reason you ALMOST DID NOT buy?”

Devised by Cambridge-educated Dr. Karl Blanks, who coined the term “conversion rate optimization” in 2006, this question earned him a knighthood from the Queen of England! Thanks, Sir Karl!

It’s a great question because its answer is usually the reason many others didn’t buy. For every person who almost didn’t buy for reason X, I guarantee at least three others did not buy!

Given the low response rates when surveying non-converting visitors, this question helps get additional intelligence. In our case, price came up again.

*Sometimes the customers’ reasons for buying will surprise you. One of our past clients is in the e-cigarette/vaping business and a common reason cited by men for vaping was “to quit smoking because of my young daughter.” They almost never said “child” or “son”! Armed with this knowledge, we converted a whole new segment of smokers who had not considered vaping.

Speed testimonials

One of the most frequently asked questions was “How soon can I expect relief?” While Protalus addressed this in their Q&A section, we included conspicuous “speed testimonials” on the main page:

[Screenshot]

For someone in excruciating pain, the promise of fast relief is persuasive!

Patent protection exclusivity & social proof

[Screenshot]

Many of Protalus’ site visitors are older and still prefer to buy in physical stores, as we learned from our survey. They may like the product, but then think “I’ll buy them at the store.” We clarified that the product is only available on Protalus’ site.

Mentioning the patent-protection added exclusivity, one of the two required elements for a compelling value proposition.

At its core, landing page optimization isn’t about optimizing pages. A page just happens to be the medium used to optimize thought sequences in the prospect’s mind.

Dr. Flint likes to say, “The geography of the page determines the chronology of thought sequences in the prospect’s mind.” As shown above, we repeated the social proof elements at the point of purchase.

Tying it all together

After systematically addressing each objection and adding various appeal elements, we strung them all together into the cohesive long-form page below.

We start with a powerful headline and Elisabeth’s story because it’s both intriguing and relevant to Protalus’ audience, which skews female and over 55. The only goal of a headline is to get visitors to read what comes next — NOT to sell.

The product’s price is not mentioned until we have told a compelling story, educated visitors and engaged them emotionally.

Note that the winning page is several times longer than the control. There is a mistaken belief that you “just need to get to the point” because people won’t read long pages. In fact, a previous consultant told Protalus that their sales were low because the “buy button” wasn’t high enough on the page. 🙂

Nothing could be further from the truth. For a high-priced product, you must articulate a compelling value proposition before you sell!

But also note the page is “as long as necessary, but as short as possible.” Buy buttons are sprinkled liberally after the initial third of the page so that those who are convinced needn’t “sit through the entire presentation.”


Acknowledgement

We’d like to thank team Protalus for giving us wide latitude to conduct bold experiments and for allowing us to publish this. Their entrepreneurial culture has been refreshing. We are most grateful to Don Vasquez, their forward-thinking CMO (and minority owner), for trusting the process and standing by us when the first test caused some revenue loss.

Thanks to Hayk Saakian, Nick Jordan, Yin-so Chen, and Jon Powell for reading drafts of this piece.


Free CRO audit

I can’t stress this enough: CRO is hard work. We spent countless hours on market research, studied visitor behavior, and reviewed tens of thousands of customer comments before we ran a single A/B test. We also solicited additional testimonials from industry experts and doctors. There is no magical silver bullet — just lots of little lead ones!

Results like this don’t happen by accident. If you are unhappy with your current conversion rate for sales, leads or app downloads, first, we encourage you to review the tried-and-true strategic formula. Next, we would like to offer Moz readers a free CRO audit. We’ll also throw in a free SEO (Search Engine Optimization) review. While we specialize in CRO, we’ve partnered with one of the best SEO firms due to client demand. Lastly, we are hiring. Review the roles and reasons why you should come work for us!


Understanding and Harnessing the Flow of Link Equity to Maximize SEO Ranking Opportunity – Whiteboard Friday

Posted by randfish

How does the flow of link equity work these days, and how can you harness its potential to help improve your rankings? Whether you’re in need of a refresher or you’ve always wanted a firmer grasp of the concept, this week’s Whiteboard Friday is required watching. Rand covers the basic principles of link equity, outlines common flow issues your site might be encountering, and provides a series of action items to ensure your site is riding the right currents.

Link equity flow

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to chat about understanding and harnessing link equity flow, primarily internal link equity flow, so that you can get better rankings and execute on your SEO. A big thank you to William Chou, @WChouWMX on Twitter, for suggesting this topic. If you have a topic or something that you would like to see on Whiteboard Friday, tweet at me. We’ll add it to the list.

Principles of link equity

So some principles of link equity first to be aware of before we dive into some examples.

1. External links generally give more ranking value and potential ranking boosts than internal links.

That is not to say, though, that internal links provide no link equity, and in fact, many pages that earn few or no external links can still rank well if a domain itself is well linked to and that page is on that site and has links from other good, important pages on the domain. But if a page is orphaned or if a domain has no links at all, it’s extremely difficult to rank.

2. Well-linked-to pages, both internal and external, pass more link equity than those that are poorly linked to.

I think this makes intuitive sense to all of us who have understood the concept of PageRank over the years. Basically, if a page accrues many links, especially from other important pages, that page’s ability to pass its link equity to other pages, to give a boost in ranking ability is stronger than if a page is very poorly linked to or not linked to at all.

3. Pages with fewer links tend to pass more equity to their targets than pages with more links.

Again, going off the old concept of PageRank, if you have a page with hundreds or thousands of links on it, each of those receives a much more fractional, smaller amount of the link equity that could be passed to it than if you have a page with only a few links on it. This is not universally… well, I just want to say this doesn’t scale perfectly. So it’s not the case that if you were to trim down your high link earning pages to having only one link and point to this particular page on your site, then you suddenly get tremendously more benefit than if you had your normal navigation on that page and you link to your homepage and About page and products page. That’s not really the case. But if you had a page that had hundreds of links in a row and you instead made that page have only a few links to the most important, most valuable places, you’ll get more equity out of that, more rank boosting ability.
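
The dilution described in principle 3 can be illustrated with the classic simplified PageRank model, in which the equity a page can pass is divided evenly among its outgoing links. This is a toy sketch for intuition only: the equity_per_link function is hypothetical, the 0.85 damping factor is the value used in the original PageRank paper, and, as noted above, real-world behavior does not scale this neatly.

```python
def equity_per_link(page_equity, outgoing_links, damping=0.85):
    """Share of a page's equity passed through each outgoing link (toy model)."""
    if outgoing_links <= 0:
        return 0.0
    return damping * page_equity / outgoing_links


# The same page passes far less per link as the number of links grows.
for link_count in (5, 50, 500):
    print(link_count, round(equity_per_link(1.0, link_count), 4))
```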

4. Hacks and tricks like “nofollow” are often ineffective at shaping the flow of link equity.

Using rel="nofollow" or embedding a remotely executable JavaScript file that makes it so that browsers can see the links and visitors can, but Google is unlikely to see or follow those links, to shape the flow of your link equity is generally (a) a poor use of your time, because it doesn’t affect things that much. The old-school PageRank algorithm is not that hugely important anymore. And (b) Google is often pretty good at interpreting and discounting these things. So it tends to not be worth your time at all.

5. Redirects and canonicalization lose a small amount of link equity. Non-ideal ones like 302s, JS redirects, etc. may lose more than 301, rel=canonical, etc.

So if I have a 301 or a rel=canonical from one page to another, those will lose or cost you a small, a very small amount of link equity. But more potentially costly would be using non-ideal types of redirects or canonicalization methods, like a JavaScript-based redirect or a 302 or a 307 instead of a 301. If you’re going to do a redirect or if you’re going to do canonicalization, 301s or rel=canonicals are the way to go.
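
As a rough illustration of principle 5, a list of a site's redirects can be scanned for the non-ideal types mentioned above. This is a sketch under assumptions: the redirects list, its URLs, and the flag_non_ideal helper are hypothetical, and the PREFERRED set simply encodes the advice in this transcript that 301s are the way to go.

```python
# Redirect types treated as ideal, per the advice above.
PREFERRED = {"301"}


def flag_non_ideal(redirects):
    """Return the redirects whose type is not in PREFERRED."""
    return [r for r in redirects if r["type"] not in PREFERRED]


# Hypothetical redirect map exported from a crawl or server config.
redirects = [
    {"source": "/old-page", "target": "/new-page", "type": "301"},
    {"source": "/promo", "target": "/sale", "type": "302"},
    {"source": "/legacy", "target": "/home", "type": "js"},
]

for r in flag_non_ideal(redirects):
    print(f"{r['source']} -> {r['target']} uses {r['type']}; consider a 301")
```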

So keeping in mind these principles, let’s talk through three of the most common link equity flow issues that we see websites facing.

Common link equity flow issues

A. A few pages on a large site get all the external links:

You have a relatively large site, let’s say thousands to tens of thousands, maybe even hundreds of thousands of pages, and only a few of those pages are earning any substantial quantity of external links. I have highlighted those in pink. So these pages are pointing to these pink ones. But on this website you have other pages, pages like these purple ones, where you essentially are wanting to earn link equity, because you know that you need to rank for these terms and pages that these purple ones are targeting, but they’re not getting the external links that these pink pages are. In these cases, it’s important to try a few things.
  1. We want to identify the most important non-link earning pages, these purple ones. We’ve got to figure out what these actually are. What are the pages that you wish would rank that are not yet ranking for their terms and phrases that they’re targeting?
  2. We want to optimize our internal links from these pink pages to these purple ones. So in an ideal world, we would say, “Aha, these pages are very strong. They’ve earned a lot of link equity.” You could use Open Site Explorer and look at Top Pages, or Ahrefs or any of our other competitors and look at your pages, the ones that have earned the most links and the most link equity. Then you could say, “Hey, can I find some relevance between these two or some user stories where someone who reaches this page needs something over here, and thus I’m going to create a link to and from there?” That’s a great way to pass equity.
  3. Retrofitting and republishing. So what I mean by this is essentially I’m going to take these pages, these purple ones that I want to be earning links, that are not doing well yet, and consider reworking their content, taking the lessons that I have learned from the pink pages, the ones that have earned link equity, that have earned external links and saying, “What did these guys do right that we haven’t done right on these guys, and what could we do to fix that situation?” Then I’m going to republish and restart a marketing, a link building campaign to try and get those links.

B. Only the homepage of a smaller site gets any external links.

This time we’re dealing with a small site, a very, very small site, 5 pages, 10 pages, maybe even up to 50 pages, but generally a very small site. Often a lot of small businesses, a lot of local businesses have this type of presence, and only the homepage gets any link equity at all. So what do we do in those cases? There’s not a whole lot to spread around. The homepage can only link to so many places. We have to serve users first. If we don’t, we’re definitely going to fall in the search engine rankings.

So in this case, where the pink link earner is the homepage, there are two things we can do:
  1. Make sure that the homepage is targeting and serves the most critical keyword targets. So we have some keyword targets that we know we want to go after. If there’s one phrase in particular that’s very important, rather than having the homepage target our brand, we could consider having the homepage target that specific query. Many times small businesses and small websites will make this mistake where they say, “Oh, our most important keyword, we’ll make that this page. We’ll try and rank it. We’ll link to it from the homepage.” That is generally not nearly as effective as making a homepage target that searcher intent. If it can fit with the user journey as well, that’s one of the best ways you can go.
  2. Consider some new pages for content, like essentially saying, “Hey, I recognize that these other pages, maybe they’re About and my Terms of Service and some of my products and services and whatnot, and they’re just not that link-worthy. They don’t deserve links. They’re not the type of pages that would naturally earn links.” So we might need to consider what are two or three types of pages or pages that we could produce, pieces of content that could earn those links, and think about it this way. You know who the people who are already linking to you are. It’s these folks. I have just made up some domains here. But the folks who are already linking to your homepage, those are likely to be the kinds of people who will link to your internal pages as well. So I would think about them as link targets and say, “What would I be pretty confident that they would link to, if only they knew that it existed on our website?” That’s going to give you a lot of success. Then I would check out some of our link building sections here on Whiteboard Friday and across the Moz Blog for more tips.

C. Mid-long tail KW-targeting pages are hidden or minimized by the site’s nav/IA.

So this is essentially where I have a large site, and I have pages that are targeting keywords that don’t get a ton of volume, but they’re still important. They could really boost the value that we get from our website, because they’re hyper-targeted to good customers for us. In this case, one of the challenges is they’re hidden by your information architecture. So your top-level navigation and maybe even your secondary-level navigation just doesn’t link to them. So they’re just buried deep down in the website, under a whole bunch of other stuff. In these cases, there are some really good solutions.

  1. Find semantic and user intent relationships. So semantic is these words appeared on those pages. Let’s say one of these pages here is targeting the word “toothpaste,” for example, and I find that, oh, you know what, this page over here, which is well linked to in our navigation, mentions the word “toothpaste,” but it doesn’t link over here yet. I’m going to go create those links. That’s a semantic relationship. A user intent relationship would be, hey, this page over here talks about oral health. Well, oral health and toothpaste are actually pretty relevant. Let me make sure that I can create that user journey, because I know that people who’ve read about oral health on our website probably also later want to read about toothpaste, at least some of them. So let’s make that relationship also happen between those two pages. That would be a user intent type of relationship. You’re going to find those between your highly linked to external pages and your well-linked-to internal pages and these long tail pages that you’re trying to target. Then you’re going to create those new links.
  2. Try and leverage the top-level category pages that you already have. If you have a top-level navigation and it links to whatever it is — home, products, services, About Us, Contact, the usual types of things — it’s those pages that are extremely well linked to already internally where you can add in content links to those long-tail pages and potentially benefit.
  3. Consider new top-level or second-level pages. If you’re having trouble adding them to these pages, they already have too many links, there’s no user story that makes good sense here, it’s too weird to jam them in, maybe engineering or your web dev team thinks that that’s ridiculous to try and jam those in there, consider creating new top-level pages. So essentially saying, “Hey, I want to add a page to our top-level navigation that is called whatever it is, Additional Resources or Resources for the Curious or whatever.” In this case in my oral health and dentistry example, potentially I want an oral health page that is linked to from the top-level navigation. Then you get to use that new top-level page to link down and flow the link equity to all these different pages that you care about and currently are getting buried in your navigation system.
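
To make the semantic-relationship step a bit more concrete, here is a minimal sketch of how you might hunt for those missing links in Python. It assumes you already have the HTML of your pages in a dictionary keyed by URL path; the `pages` data, the `PageScan` class, and the `link_opportunities` function are all hypothetical names made up for illustration, and the keyword match is a naive substring check rather than anything clever.

```python
from html.parser import HTMLParser


class PageScan(HTMLParser):
    """Collect the visible text and the link targets of one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = set()

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor on the page.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def link_opportunities(pages, keyword, target_url):
    """Return URLs of pages that mention `keyword` but do not yet link to `target_url`."""
    found = []
    for url, html in pages.items():
        if url == target_url:
            continue
        scan = PageScan()
        scan.feed(html)
        text = " ".join(scan.text_parts).lower()
        if keyword.lower() in text and target_url not in scan.links:
            found.append(url)
    return sorted(found)


# Hypothetical pages keyed by URL path (made-up content for illustration).
pages = {
    "/oral-health": "<p>Brushing with toothpaste twice a day helps.</p>",
    "/products": '<p>See our <a href="/toothpaste">toothpaste</a> range.</p>',
    "/about": "<p>We are a family dental practice.</p>",
    "/toothpaste": "<p>All about toothpaste.</p>",
}
print(link_opportunities(pages, "toothpaste", "/toothpaste"))
```

With this sample data, only the oral health page is flagged: it mentions the keyword but has no link to the long-tail page yet. User intent relationships still need a human to judge, so treat a list like this as a starting point rather than a to-do list.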
All right, everyone. Hope you’ve enjoyed this edition of Whiteboard Friday. Give us your tips in the comments for how you’ve seen link equity flow, and the benefits or drawbacks you’ve seen from trying to control and optimize that flow. We’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Pros and Cons of HTTPS Services: Traditional vs Let’s Encrypt vs Cloudflare

Posted by jrridley

If you have a website property verified in Google Search Console, and the website is not HTTPS-secured, you’ve likely seen some form of the following message in your dashboard recently:

After months of talk and speculation, Google has finally started to move forward with its plan to secure the web by enforcing HTTPS. Although HTTPS had previously only been a concern for e-commerce sites or sites with login functionality, this latest update affects significantly more sites. The vast majority of websites have a contact page (or something similar) that contains a contact or subscription form. Those forms almost always contain text input fields like the ones Google warns about in the message above. The “NOT SECURE” warning has already been appearing on insecure sites that collect payment information or passwords. It looks like this in a user’s URL bar:

Now that this warning will be displaying for a much larger percentage of the web, webmasters can’t put off an HTTPS implementation any longer. Unfortunately, Google’s advice to webmasters for solving this problem is about as vague and unhelpful as you might imagine:

Thanks, Google.

Implementing HTTPS is not a simple process. The Washington Post published a blog post outlining their 10-month HTTPS migration back in 2015, and numerous sites (including Moz) have reported experiencing major traffic fluctuations following their migrations. The time and resources required to migrate to HTTPS are no minor investment; we’re talking about a substantial website overhaul. In spite of these obstacles, Google has shown little sympathy for the plight of webmasters:

@rchtjn Well, turning the website off saves money too.
— John ☆.o(≧▽≦)o.☆ (@JohnMu) December 18, 2015

Google’s singular focus in this area is to provide a better user experience to web visitors by improving Internet security. On its surface, there’s nothing wrong with this movement. However, Google’s blatant disregard for the complexities this creates for webmasters leaves a less-than-pleasant taste in my mouth, despite their good intentions.

Luckily, there’s a bit of a silver lining to these HTTPS concerns. Over the last few years, we’ve worked with a number of different clients to implement HTTPS on their sites using a variety of different methods. Each experience was unique and presented its own set of challenges and obstacles. In a previous post, I wrote about the steps to take before, during, and after a migration based on our experience. In this post, my focus is instead on highlighting the pros and cons of various HTTPS services, including non-traditional implementations.

Here are the three methods we’ve worked with for our clients:

  1. Traditional HTTPS implementation
  2. Let’s Encrypt
  3. Cloudflare


Method 1: Traditional HTTPS implementation

A traditional HTTPS implementation starts with purchasing an SSL certificate from a trusted provider, like Digicert or GeoTrust (hint: if a site selling SSL certificates is not HTTPS-secured, don’t buy from them!). After that, you’ll need to verify the certificate with the Certificate Authority you purchased it from through a Certificate Signing Request (CSR); this just proves that you do manage the site you claim to be managing. At this point, your SSL certificate will be validated, but you’ll still have to implement it across your site. Namecheap has a great article about installing SSL certificates depending on your server type. Once that SSL certificate has been installed, your site will be secured, and you can take additional steps to enable HSTS or forced HTTPS rewrites at this point.
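
For readers who haven’t set up HSTS or forced rewrites before, here is a rough sketch in Python of what those two final steps boil down to: a permanent redirect to the `https://` version of a URL, and a `Strict-Transport-Security` response header. The function names are hypothetical, and in practice you would usually configure both in your web server rather than in application code, so treat this as an illustration of the values involved, not a drop-in implementation.

```python
def https_redirect_target(host, path_and_query="/"):
    """Build the Location value for a permanent (301) HTTP-to-HTTPS redirect."""
    if not path_and_query.startswith("/"):
        path_and_query = "/" + path_and_query
    return f"https://{host}{path_and_query}"


def hsts_header(max_age_seconds=31536000, include_subdomains=False, preload=False):
    """Build a Strict-Transport-Security header as a (name, value) pair."""
    directives = [f"max-age={max_age_seconds}"]
    if include_subdomains:
        directives.append("includeSubDomains")
    if preload:
        directives.append("preload")
    return ("Strict-Transport-Security", "; ".join(directives))


# Example values only; example.com is a placeholder domain.
print(https_redirect_target("example.com", "/contact?ref=nav"))
print(hsts_header(include_subdomains=True))
```

The one-year `max-age` default here is an assumption for the example. HSTS tells browsers to refuse plain HTTP for that long, so it’s worth starting with a short `max-age` until you’re sure every page and subdomain works over HTTPS.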

Pros

  1. Complete security. With a fully validated SSL certificate installed on your own server, traffic is encrypted along the whole path between your server and the site visitor, with no unencrypted hop in between.
  2. Customizable. One of the features of a full SSL implementation is that you can purchase an Extended Validation (EV) SSL certificate. This not only provides your green padlock in the browser bar, but also includes your company name to provide further assurance to visitors that your site is safe and secure.
  3. Easier to implement across multiple subdomains. If you have multiple subdomains, what you’ll likely need for your HTTPS implementation is either a separate SSL certificate for each subdomain or a wildcard certificate for all variations of your domain. A traditional SSL service is often the easiest way to set up a wildcard certificate if you need to secure several variations.
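
If you’re weighing a wildcard certificate against separate certificates, it helps to know exactly what a wildcard covers. The sketch below is a simplified version of the usual matching rule, where a leading `*.` stands in for exactly one label. The function name is hypothetical, and it deliberately ignores edge cases such as partial wildcards and internationalized names.

```python
def wildcard_covers(cert_name, hostname):
    """Return True if a certificate name covers the hostname.

    A leading "*." matches exactly one label, so "*.example.com" covers
    "www.example.com" but not "example.com" or "a.b.example.com".
    """
    cert_labels = cert_name.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(cert_labels) != len(host_labels):
        return False
    if cert_labels[0] == "*":
        # Compare everything after the wildcard label.
        return cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels


# Placeholder hostnames for illustration.
for host in ["www.example.com", "blog.example.com", "example.com", "a.b.example.com"]:
    print(host, wildcard_covers("*.example.com", host))
```

The practical takeaway is that a wildcard certificate does not cover the bare domain or deeper subdomains on its own, which is why these certificates are commonly issued with the bare domain listed as an additional name.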

Cons

  1. Expensive. Though basic SSL certificates may be available for as little as $150, depending on the complexity of your site, these costs can quickly increase to several thousand dollars if you need more advanced security features, a better CDN, etc. This also doesn’t include the cost of having developers implement the SSL certificate, which can be substantial as well.
  2. Time to implement. As mentioned above, it took the Washington Post 10 months to complete their HTTPS migration. Other companies have reported similar timeframes, especially for larger, more complex websites. It’s very hard to know in advance what kinds of issues you’ll have to resolve with your site configuration, what kinds of mixed content you may run into, etc., so plan lots of extra time to address these issues if you go with a standard implementation.
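
Mixed content is one of the few issues you can look for before the migration starts. As a rough sketch, the Python below scans a page’s HTML for images, scripts, stylesheets, and similar resources that are still requested over `http://`. The class and function names are hypothetical, the list of tags is an assumption and far from exhaustive, and it won’t catch resources loaded from CSS or JavaScript.

```python
from html.parser import HTMLParser

# Tags whose src attribute loads a resource into the page (not exhaustive).
SRC_TAGS = {"img", "script", "iframe", "audio", "video", "source"}


class MixedContentFinder(HTMLParser):
    """Collect (tag, url) pairs for resources requested over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link":
            # Only stylesheets are fetched as part of rendering the page.
            if "stylesheet" not in (attrs.get("rel") or "").lower():
                return
            value = attrs.get("href")
        elif tag in SRC_TAGS:
            value = attrs.get("src")
        else:
            return
        if value and value.lower().startswith("http://"):
            self.insecure.append((tag, value))


def find_mixed_content(html):
    finder = MixedContentFinder()
    finder.feed(html)
    return finder.insecure


# Made-up page fragment; example.com is a placeholder domain.
sample = (
    '<img src="http://example.com/a.png">'
    '<script src="https://example.com/a.js"></script>'
    '<a href="http://example.com/">ordinary link</a>'
    '<link rel="stylesheet" href="http://example.com/s.css">'
)
print(find_mixed_content(sample))
```

Ordinary `<a>` links to HTTP pages are left out on purpose, since they don’t load anything into the page and so don’t trigger mixed content warnings.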


Method 2: Let’s Encrypt

Let’s Encrypt is a free nonprofit service provided by the Internet Security Research Group to promote web security by providing free SSL certificates. Implementing Let’s Encrypt is very similar to a traditional HTTPS implementation: You still need to validate your domain with the Certificate Authority, install the SSL certificate on your server, then enable HSTS or forced HTTPS rewrites. However, implementing Let’s Encrypt is often much simpler with the help of services like Certbot, which will provide the implementation code needed for your particular software and server configuration.

Pros

  1. Free. The cost is zero, zippo, nada. No fine print or hidden details.
  2. Ease of implementation. Let’s Encrypt SSL is often much simpler to implement on your site than a traditional HTTPS implementation. Although not quite as simple as Cloudflare (see below), this ease of implementation can solve a lot of technical hurdles for people looking to install an SSL certificate.
  3. Complete security. As with a traditional HTTPS implementation, the entire connection between site visitor and site server is encrypted.

Cons

  1. Compatibility issues. Let’s Encrypt is known to be incompatible with a few different platforms, though the ones it is incompatible with are not likely to be a major source of traffic to your site (Blackberry, Nintendo 3DS, etc.).
  2. 90-day certificates. While traditional SSL certificates are often valid for a year or more, Let’s Encrypt certificates are only valid for 90 days, and they recommend renewing every 60 days. If you forget to renew in time, the certificate expires and browsers will show visitors a security warning instead of your site.
  3. Limited customization. Let’s Encrypt will only offer Domain Validation certificates, meaning that you can’t purchase a certificate to get that EV green bar SSL certificate. Also, Let’s Encrypt does not currently offer wildcard certificates to secure all of your subdomains, though they’ve announced this will be rolling out in January 2018.
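
The 90-day lifetime is much less of a worry if something checks it for you. Here is a minimal sketch of that check in Python, using only the two numbers mentioned above (90 days of validity, renewal recommended at 60 days). The function name and the example issue date are made up for illustration; in practice a tool like Certbot is normally scheduled to handle renewal for you.

```python
from datetime import datetime, timedelta, timezone

CERT_LIFETIME = timedelta(days=90)  # validity period stated above
RENEW_AFTER = timedelta(days=60)    # recommended renewal point stated above


def renewal_status(issued_at, now):
    """Classify a certificate as "ok", "renew", or "expired" at time `now`."""
    age = now - issued_at
    if age >= CERT_LIFETIME:
        return "expired"
    if age >= RENEW_AFTER:
        return "renew"
    return "ok"


# Hypothetical issue date for illustration.
issued = datetime(2017, 9, 1, tzinfo=timezone.utc)
for days in (10, 65, 95):
    print(days, renewal_status(issued, issued + timedelta(days=days)))
```

Renewing at 60 days leaves a 30-day buffer, so a failed renewal has time to be noticed and fixed before visitors ever see a warning.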

Method 3: Cloudflare

This is one of my favorite HTTPS implementations, simply because of how easy it is to enable. Cloudflare offers a Flexible SSL service, which removes almost all of the hassle of implementing an SSL certificate directly on your site. Instead, Cloudflare will host a cached version of your site on their servers and secure the connection to the site visitors through their own SSL protection. You can see what this looks like in the picture below:

In doing so, Cloudflare makes this process about as simple as you can ask for. All you have to do is update your DNS records to point to Cloudflare’s nameservers. Boom, done. And as with Let’s Encrypt, the process is entirely free.

Pros

  1. Free. The cost is zero, zippo, nada. No fine print or hidden details. Cloudflare does offer more advanced features if you upgrade to one of their paid plans, but the base SSL service comes completely free.
  2. Easiest implementation. As I mentioned above, all that’s required for implementing Cloudflare’s SSL service is creating an account and updating your DNS records. There’s no update to the server configuration and no time spent trying to resolve additional configuration issues. Additionally, implementing HSTS and forced HTTPS rewrites can be done directly through the Cloudflare dashboard, so there’s really almost no work involved on your end.
  3. PageSpeed optimizations. In addition to SSL security, Cloudflare’s HTTPS implementation also provides several additional services that can preserve PageSpeed scores and page load times. While a traditional HTTPS implementation (or Let’s Encrypt) can often have negative consequences for your site’s page load times, Cloudflare offers the ability to auto-minify JS, CSS, and HTML; Accelerated Mobile Pages (AMP); and a Rocket loader for faster JS load times. All of these features (along with Cloudflare serving a cached version of your site to visitors) will help prevent any increase in page load times on your site.

Cons

  1. Incomplete encryption. As you can see in the picture above, Cloudflare encrypts the connection between the visitor and the cached version of your site on Cloudflare, but it doesn’t encrypt the connection between Cloudflare and your own server. While this means that site visitors can feel secure while visiting your site, there is still the chance that traffic on that second leg will be intercepted. While you can upgrade to a full SSL implementation that does encrypt that leg, that is not part of the free service.
  2. Security concerns. Earlier this year, Cloudflare disclosed a serious bug (widely known as Cloudbleed) that leaked sensitive user information from sites using its service. While it appears they have resolved the issue and tightened security since then, it’s still important to be aware of this development.
  3. Lack of customization. Like with Let’s Encrypt, Cloudflare’s free SSL service doesn’t provide any kind of EV green bar SSL for your site. While you can upgrade to full SSL which does provide this functionality, the service is no longer free at that point.

Which type of HTTPS implementation is best?

It really depends on your site. Smaller sites that just need enough security that Google won’t punish the site in Chrome can likely use Cloudflare. The same goes for agencies providing HTTPS recommendations to clients where you don’t have development control of the site. On the other hand, major e-commerce or publication sites are going to want a fully customized HTTPS implementation through traditional means (or via Let’s Encrypt’s wildcard certificate, when that happens next year). Ultimately, you’ll have to decide which implementation makes the most sense for your situation.



Announcing 5 NEW Feature Upgrades to Moz Pro’s Site Crawl, Including Pixel-Length Title Data

Posted by Dr-Pete

While Moz is hard at work on some major new product features (we’re hoping for two more big launches in 2017), we’re also working hard to iterate on recent advances. I’m happy to announce that, based on your thoughtful feedback, and our own ever-growing wish lists, we’ve recently launched five upgrades to Site Crawl.

1. Mark Issues as Fixed

It’s fine to ignore issues that don’t matter to your site or business, but many of you asked for a way to audit fixes or just let us know that you’ve made a fix prior to our next data update. So, from any issues page, you can now select items and “Mark as fixed” (screens below edited for content).

Fixed items will immediately be highlighted and, like Ignored issues, can be easily restored…

Unlike the “Ignore” feature, we’ll also monitor these issues for you and warn you if they reappear. In a perfect world, you’d fix an issue once and be done, but we all know that real web development just doesn’t work out that way.

2. View/Ignore/Fix More Issues

When we launched the “Ignore” feature, many of you were very happy (it was, frankly, long overdue), until you realized you could only ignore issues in chunks of 25 at a time. We have heard you loud and clear (seriously, Carl, stop calling) and have taken two steps. First, you can now view, ignore, and fix issues 100 at a time. This is the default – no action or extra clicks required.

3. Ignore Issues by Type

Second, you can now ignore entire issue types. Let’s say, for example, that Moz.com intentionally has 33,000 Meta Noindex tags. We really don’t need to be reminded of that every week. So, once we make sure none of those are unintentional, we can go to the top of the issue page and click “Ignore Issue Type”:

Look for this in the upper-right of any individual issue page. Just like individual issues, you can easily track all of your ignored issues and start paying attention to them again at any time. We just want to help you clear out the noise so that you can focus on what really matters to you.

4. Pixel-length Title Data

For years now, we’ve known that Google cuts display titles by pixel length. We’ve provided research on this subject and have built our popular title tag checker around pixel length, but providing this data at product scale proved to be challenging. I’m happy to say that we’ve finally overcome those challenges, and “Pixel Length” has replaced “Character Length” in our title tag diagnostics.

Google currently uses a 600-pixel container, but you may notice that you receive warnings below that length. Because Google needs room to add the “…”, among other considerations, our research has shown that the true cut-off point that Google uses is closer to 570 pixels. Site Crawl reflects our latest research on the subject.

As with other issues, you can export the full data to CSV, to sort and filter as desired:

Looks like we’ve got some work to do when it comes to brevity. Long title tags aren’t always a bad thing, but this data will help you much better understand how and when Google may be cutting off your display titles in SERPs and decide whether you want to address it in specific cases.
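
If you’d rather filter the export in a script than in a spreadsheet, something like the following Python sketch would do it. The column names (`URL`, `Title`, `Pixel Length`) and the sample rows are assumptions made up for illustration, so check them against the headers in your own export; the 570-pixel threshold is the approximate cut-off mentioned above.

```python
import csv
import io

CUTOFF_PX = 570  # approximate cut-off cited above


def titles_over_cutoff(csv_text, cutoff=CUTOFF_PX):
    """Return (url, pixel_length) pairs above the cut-off, longest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    over = [
        (row["URL"], int(row["Pixel Length"]))
        for row in rows
        if int(row["Pixel Length"]) > cutoff
    ]
    return sorted(over, key=lambda pair: pair[1], reverse=True)


# Made-up rows with hypothetical column names.
sample = """URL,Title,Pixel Length
https://example.com/a,Short title,310
https://example.com/b,A much longer title that keeps going,612
https://example.com/c,Another long title for the example,585
"""
for url, width in titles_over_cutoff(sample):
    print(width, url)
```

Sorting longest first puts the titles most likely to be truncated at the top, which is usually where you’ll want to start.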

5. Full Issue List Export

When we rebuilt Site Crawl, we were thrilled to provide data and exports on all pages crawled. Unfortunately, we took away the export of all issues (choosing to divide those up into major issue types). Some of you had clearly come to rely on the all issues export, and so we’ve re-added that functionality. You can find it next to “All Issues” on the main “Site Crawl Overview” page:

We hope you’ll try out all of the new features and report back as we continue to improve on our Site Crawl engine and UI over the coming year. We’d love to hear what’s working for you and what kind of results you’re seeing as you fix your most pressing technical SEO issues.

Find and fix site issues now

