I’m going to show you a phenomenon that’s fairly common in split testing, but that no one seems to be talking about (other than veteran split testers), and I don’t think it’s ever been blogged about (please add a comment if I’m wrong).
It has to do with the question:
“Will the lift I see during my split test continue over time?”
Let’s start by looking at a scenario commonly used by practically everyone in the business of split testing.
Your web site is currently generating $400k a month in sales, and that figure has been steady for the past few months. You hire a conversion optimization company, which runs a split test on your checkout page.
After running the test for 3-4 weeks, the challenger version shows a 10% lift in conversion rate and RPV at a 99% statistical confidence level. The conversion optimization company turns off the test and you hard-code the winning challenger.
First of all – Wooohoo!!! (Seriously, that’s an excellent win.)
A 10% lift from $400k a month is an extra $40k a month. Annualized that amounts to an extra $480k a year. So your potential increased yearly revenue from using the winning checkout page is almost half a million dollars. Sounds pretty good to me.
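The arithmetic above is simple enough to sketch in a few lines of Python (using the hypothetical figures from this scenario):

```python
# Back-of-the-envelope annualization of a split-test lift,
# using the hypothetical numbers from the scenario above.
monthly_revenue = 400_000   # current monthly sales in dollars
lift = 0.10                 # 10% lift measured during the test

extra_monthly = monthly_revenue * lift
extra_yearly = extra_monthly * 12

print(f"Extra per month: ${extra_monthly:,.0f}")  # $40,000
print(f"Annualized:      ${extra_yearly:,.0f}")   # $480,000
```

This is exactly the projection most testing vendors (and their clients) make after a winning test, and it is where the trouble starts.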
Here’s the problem.
All things being equal, by using the winning version of the checkout page and not your old checkout page, there is a good chance you won’t be making an extra $480k in the next 12 months.
Don’t get me wrong. You will indeed be making more money with the winning checkout page than with the old one, but in all likelihood it will be less than what you’d get by simply annualizing the lift observed during the test itself.
The culprit is what I like to call “Test Fatigue” (a term I think I just coined).
Here’s what often happens if, instead of stopping your split test after 3-4 weeks, you let it run for an entire year. There is a phenomenon I’ve often (but not always) seen with very long-running split tests: after a while (this might be 3 weeks or 3 months), the performance of the winning version and the control (original) version start to converge.
They usually won’t totally converge, but that 10% lift which was going strong for a while with full statistical confidence is now a 9% lift or an 8% lift or a 5% lift or maybe even less.
As I mentioned before, this doesn’t always happen and the time frame can vary, but it is a very real phenomenon.
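To see what Test Fatigue does to the annualized projection, here is a rough sketch. The 5%-per-month decay rate below is purely an invented illustration, not measured data; the point is only that any steady fading of the lift pulls the yearly gain well below the naive $480k figure:

```python
# Hypothetical illustration of "Test Fatigue": the measured 10% lift
# fades over the year instead of holding steady. The monthly decay
# rate is an assumption for illustration only, not measured data.
monthly_revenue = 400_000
initial_lift = 0.10
monthly_decay = 0.95  # assume the lift shrinks 5% each month

# Naive projection: the test-period lift holds for 12 months.
naive_annual_gain = monthly_revenue * initial_lift * 12  # $480,000

# Projection with a fading lift.
actual_gain = 0.0
lift = initial_lift
for month in range(12):
    actual_gain += monthly_revenue * lift
    lift *= monthly_decay  # lift fades a little each month

print(f"Naive annualized gain: ${naive_annual_gain:,.0f}")
print(f"Gain with fading lift: ${actual_gain:,.0f}")
```

Under these made-up assumptions the fading lift still earns you hundreds of thousands of extra dollars, just meaningfully less than the naive annualization promised.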
Why does this happen?
Please read my next post, Why Test Fatigue Happens, where I provide some explanations.
Also, I’d love to hear if you have also seen this phenomenon with your own tests and what your personal theories are as to why it happens.