Test Fatigue – Why it Happens

First of all, a big thank-you for all of the great comments on my previous post about Test Fatigue. If you haven’t read that post, or you don’t know what I mean by Test Fatigue, please go ahead and read it now. I’ll wait.

Now, to the point – why do we often see the lift from a challenger in a split test decrease after it seems to be going strong and steady?

Statistical significance is for the winner, not the lift.
First and foremost, most split testing tools (I’ve only used Test&Target and Google Website Optimizer extensively) will provide a confidence level for your results. If the control has a conversion rate of 4% and the challenger a conversion rate of 6% (a 50% lift) with a 97% confidence level, the tool is NOT telling you that there is a 97% chance that there will be a 50% lift. The confidence level refers only to the confidence that the challenger will outperform the control. It says nothing about how large the lift will be.
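To make that distinction concrete, here is a minimal Python sketch. The visitor counts (850 per version) are my own assumption, chosen only so that a 4% vs. 6% result lands near a 97% confidence level. The normal approximation and the log-ratio interval are one common textbook approach, not necessarily what Test&Target or Google Website Optimizer compute internally.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def compare_versions(conv_a, n_a, conv_b, n_b, interval=0.95):
    """Return (confidence B beats A, observed lift, lift_low, lift_high).

    conv_* are conversion counts, n_* are visitor counts (hypothetical inputs).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b

    # One-sided confidence that B's true rate exceeds A's (normal approximation).
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    confidence_b_wins = NormalDist().cdf((p_b - p_a) / se_diff)

    # Approximate interval for the relative lift, built on the log of the
    # ratio of the two conversion rates.
    se_log_ratio = sqrt((1 - p_a) / conv_a + (1 - p_b) / conv_b)
    z = NormalDist().inv_cdf(0.5 + interval / 2)
    log_ratio = log(p_b / p_a)
    lift_low = exp(log_ratio - z * se_log_ratio) - 1
    lift_high = exp(log_ratio + z * se_log_ratio) - 1

    return confidence_b_wins, p_b / p_a - 1, lift_low, lift_high


# Assumed counts: 34/850 = 4% for the control, 51/850 = 6% for the challenger.
confidence, lift, lift_low, lift_high = compare_versions(34, 850, 51, 850)
print(f"Confidence challenger beats control: {confidence:.1%}")
print(f"Observed lift: {lift:.0%}")
print(f"Plausible range for the lift: {lift_low:.0%} to {lift_high:.0%}")
```

With these assumed counts, the confidence that the challenger wins comes out at roughly 97%, yet the plausible range for the lift itself stretches from about zero to well over 100%. The winner is fairly clear; the size of the win is not.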

You don’t have enough data and there are many variables outside of your control.
We tend to think that in a split test all variables other than the visitor being presented with the control vs. the challenger are identical. In reality there are many external variables outside of our control, some of which we aren’t even aware of. All things being equal, we often see fluctuations in conversion rates even when we don’t make any changes in our site. Meta Brown provided some excellent points in her comments in my previous post.
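A quick way to get a feel for those fluctuations is to simulate an "A/A test", where both versions are identical. The sketch below is only an illustration: the 4% conversion rate, the 1,000 visitors per version and the 200 repetitions are assumptions I picked, and it models nothing but random sampling noise (real external variables would add even more variation).

```python
import random


def simulate_aa_lifts(true_rate, visitors_per_version, runs, seed=0):
    """Simulate repeated A/A tests and return the observed relative lifts.

    Both versions share the same true conversion rate, so any "lift" is noise.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    lifts = []
    for _ in range(runs):
        conv_a = sum(rng.random() < true_rate for _ in range(visitors_per_version))
        conv_b = sum(rng.random() < true_rate for _ in range(visitors_per_version))
        if conv_a == 0:
            continue  # lift is undefined without any control conversions
        # Equal traffic per version, so the ratio of counts is the ratio of rates.
        lifts.append(conv_b / conv_a - 1)
    return lifts


lifts = simulate_aa_lifts(true_rate=0.04, visitors_per_version=1000, runs=200)
print(f"Smallest observed lift: {min(lifts):+.0%}")
print(f"Largest observed lift: {max(lifts):+.0%}")
```

Even though nothing differs between the two versions, the observed "lift" swings in both directions from run to run.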

Results aren’t always reproducible. Learn to live with it.
Lisa Seaman pointed out an excellent article from the New Yorker magazine about this very same phenomenon in other sciences. This is a must read for anyone doing any type of testing in any field. Read it. Now: The Truth Wears Off

What was especially eye opening for me was this part of the article (on page 5). Here is a shortened version of it:

In the late nineteen-nineties, John Crabbe, a neuroscientist at the Oregon Health and Science University, conducted an experiment that showed how unknowable chance events can skew tests of replicability. He performed a series of experiments on mouse behavior in three different science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he conducted the experiments, he tried to standardize every variable he could think of.

The premise of this test of replicability, of course, is that each of the labs should have generated the same pattern of results. “If any set of experiments should have passed the test, it should have been ours,” Crabbe says. “But that’s not the way it turned out.” In one experiment, Crabbe injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on average, six hundred centimetres more than they normally did; in Albany they moved seven hundred and one additional centimetres. But in the Edmonton lab they moved more than five thousand additional centimetres. Similar deviations were observed in a test of anxiety. Furthermore, these inconsistencies didn’t follow any detectable pattern. In Portland one strain of mouse proved most anxious, while in Albany another strain won that distinction.

The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise.

So there you have it. While I know you really want a silver bullet that will make your positive results always stay the same, reality isn’t so simple.

They say that conversion optimization is part art and part science, but I think we have to accept that it’s also part noise :)

Ophir

Test Fatigue – Conversion Optimization’s Dirty Little Secret

I’m going to expose to you a phenomenon that’s fairly common when split testing, but no one seems to be talking about it (other than veteran split testers) and I don’t think it’s ever been blogged about (please add a comment if I’m wrong).

It has to do with the question:
“Will the lift I see during my split test continue over time?”

Let’s start by looking at a scenario commonly used by practically everyone in the business of split testing.

Your web site is currently generating \$400k a month in sales, which has been steady for the past few months. You hire a conversion optimization company, which runs a split test on your checkout page.

After running the test for 3-4 weeks, the challenger version provides a 10% lift in conversion and RPV (revenue per visitor) at a 99% statistical confidence level. The conversion optimization company turns off the test and you hard code the winning challenger.

First of all – Wooohoo!!! (Seriously, that’s an excellent win.)

A 10% lift from \$400k a month is an extra \$40k a month. Annualized that amounts to an extra \$480k a year. So your potential increased yearly revenue from using the winning checkout page is almost half a million dollars. Sounds pretty good to me.

Here’s the problem.

All things being equal, by using the winning version of the checkout page and not your old checkout page, there is a good chance you won’t be making an extra \$480k in the next 12 months.

Don’t get me wrong. You will indeed be making more money with the winning checkout page than with the old one, but in all likelihood, it will be less than simply annualizing the lift from during the test itself.

The culprit is what I like to call “Test Fatigue” (a term I think I just coined).

Here’s what often happens if, instead of stopping your split test after 3-4 weeks, you let it run for an entire year. It is a phenomenon that I’ve often, but not always, seen with very long-running split tests: after a while (this might be 3 weeks or 3 months) the performance of the winning version and the control (original) version starts to converge.

They usually won’t totally converge, but that 10% lift which was going strong for a while with full statistical confidence is now a 9% lift or an 8% lift or a 5% lift or maybe even less.
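Here is a small Python sketch of what that does to the yearly projection. The \$400k monthly baseline and the 10% starting lift come from the scenario above; the month-by-month fading schedule is entirely made up for illustration and is not data from a real test.

```python
def extra_revenue(monthly_revenue, monthly_lifts):
    """Sum the extra revenue earned over a list of per-month lifts."""
    return sum(monthly_revenue * lift for lift in monthly_lifts)


MONTHLY_REVENUE = 400_000  # baseline from the scenario above

# Naive projection: the 10% lift seen during the test holds all year.
constant_lift = [0.10] * 12

# Hypothetical fading schedule (an assumption, not measured data):
# the lift drifts from 10% down to 5% over the year.
fading_lift = [0.10, 0.10, 0.10, 0.09, 0.09, 0.08,
               0.08, 0.07, 0.06, 0.05, 0.05, 0.05]

annualized = extra_revenue(MONTHLY_REVENUE, constant_lift)
with_fatigue = extra_revenue(MONTHLY_REVENUE, fading_lift)

print(f"Annualizing the test lift: {annualized:,.0f}")
print(f"With the fading schedule:  {with_fatigue:,.0f}")
```

Under this made-up schedule the winning page still earns more than the old one, but the extra revenue is \$368k rather than the \$480k you get by annualizing the test result.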

As I mentioned before, this doesn’t always happen and the time frame varies, but it is a very real phenomenon.

Why does this happen?

Please read my next post, Why Test Fatigue Happens, where I offer some explanations.

Also, I’d love to hear if you have also seen this phenomenon with your own tests and what your personal theories are as to why it happens.

Thanks
Ophir

New Job at Adobe

Just a quick note to announce that I am now an Optimization Manager at Adobe.

What does that mean to you?

While analytics and optimization go hand in hand, going forward the emphasis will be more on the optimization side of things.

Also, now that I’m primarily using Test&Target as a split testing tool, I will be able to pass on any cool tips or ideas that I come up with on Test&Target.

If you have any Test&Target-related questions, feel free to ask me directly on this blog.

– Ophir

3 Easy Ways to Improve Your Conversion Rates

I’ve read my share of articles on “101 things to test to improve conversion rates”.

While most of the suggestions are usually sound, I find that these lists are often overwhelming and you don’t know where to start.

So here’s how to start with a simple but often overlooked problem – your links / link visibility.

Specifically, do your links look like links?  Do visitors know what will happen after they click on a link?

This goes back to one of my main mantras in conversion rate optimization – Don’t make me think.

Visitors don’t read web pages, they skim. So that they can skim easily, you should make these two points very obvious:

1. What elements on a page are a link?
2. What will happen when I click on that link?

While the answers to the above questions are obvious to you, the site creator, they aren’t always obvious to a first-time site visitor.

Here’s how you can actually fix any issues your links might have.

First of all, print out your homepage (or other page you want to test). Take the printout to someone who has never seen your site before and, if possible, someone who is similar to your target audience.

Now ask them to circle the links on the page with a pen or highlighter. For extra credit, use two pens: a blue one for elements they’re pretty sure are a link and a red one for elements they think are a link but aren’t sure.

This alone should reveal any major issues where visitors aren’t sure what actions they can take on the page.

Next, ask them to mark any links where they aren’t 100% sure what will happen once they click on the link.

For example, a link labeled “HOT” might be confusing where “Most Popular Items” would not be.

Lastly, people know a link is a link based on two different criteria.

1. What it says
2. What it looks like

When viewing a page, what a link looks like will be the first thing a visitor notices. Is it a different color? Is it underlined? And so on.

In order to make sure visitors can find links based purely on what they look like, we’ll use the “Greek Link Test”. The idea is to translate all of a page’s text to Greek and then see if people know what’s a link and what isn’t.

First, go to Google Translate (http://translate.google.com/), choose English to Greek, and enter the URL of your page.

For example, here’s what my blog looks like in Greek: http://goo.gl/EKwya

Now print the page (now in Greek) and do the same exercise as before. Ask someone who is not familiar with the site to mark all of the links on the page.

What’s Next?

Now that you’ve identified problematic links on your page, you have two options.

Your best option is to actually split test the problematic links against ones that look more like links. This will tell you conclusively what effect improving link visibility has. The first metrics you should look at are bounce rate (or exit rate), page views per visit and time on site per visit. You should also look at the conversion rates for your site’s main goals, but it will probably take longer to get statistically significant data.

Please note that if time on page goes down, this is NOT a bad thing. Sometimes increasing link visibility makes it easier for visitors to find what they’re looking for and they stay less time on a page.

Even if you can’t split test the links, I would still suggest trying to improve them by making them stand out more visually or by improving the link text itself. Then repeat the above exercises and see if there is any improvement.

What are your thoughts?

Guest Post on Cause and Effect

I just wrote a guest post about cause, effect and split testing (and a bit about measuring the value of content).

Enjoy
Ophir

Welcome to Analytics Impact

First of all, thanks for visiting Analytics Impact. It’s my job to make sure you have a pleasant stay and get some real value from your visit.

If you have any specific questions, please feel free to ask.

I’ve been posting online since 1996 (the term blog didn’t exist then) and have been blogging about web analytics, conversion rate optimization, SEO, SEM and other fun stuff since 2005, though until now I did so on my personal blog.

I decided it was time to separate my personal ramblings from my professional insights, hence this blog was born.