Are you making this common split testing mistake?

I was reading a simple case study today.

They were testing two different versions of a banner that was advertising a webinar.
One of the banners had an image of the presenter, while the other did not.
The banner without the image of the presenter won (by over 50%).

One of the comments was something along the lines of:

I guess this audience prefers banners without an image of a person.


If you don’t immediately realize the mistake the commenter made, don’t feel bad. It’s a very common mistake.

Beyond the fact that a specific banner (which did not have an image of the presenter) won over a different specific banner (which did have an image of the presenter), you really can’t be sure of anything.

The losing banner might have won with:

  • An image of a different person
  • A different image of the same person
  • The same image of the same person in a different position or size on the banner.
  • The same image of the same person in the same position and size but with different elements on the banner changed.

The point is:

Don’t jump to generalized conclusions based on the outcome of a specific experiment.


Should You Test or Target?

Recently I’ve been hearing more and more online buzz about the benefits of delivering targeted content to your visitors. In simple terms, this means a customized message based on information you know about the visitor (as opposed to a generic message which all visitors see).

A simple example would be adding a message for international visitors that your site ships to their country. Something more complex would be a 20% discount on ink cartridges for customers that purchased a printer in the past year but have not purchased any ink in the past 90 days (and of course the message would include the name of the printer they already purchased).
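As a sketch of how the printer/ink targeting rule above might look in code — the customer fields, dates, and message copy here are all hypothetical, invented for illustration:

```python
from datetime import date, timedelta

def ink_discount_message(customer, today):
    """Return a targeted discount message, or None if the customer doesn't qualify.

    Qualifies if they bought a printer within the past year but have not
    bought ink in the past 90 days.
    """
    printer = customer.get("printer_purchase")          # (model, purchase_date) or None
    last_ink = customer.get("last_ink_purchase_date")   # date or None

    if printer is None:
        return None
    model, purchased = printer
    if (today - purchased) > timedelta(days=365):
        return None  # printer bought more than a year ago
    if last_ink is not None and (today - last_ink) <= timedelta(days=90):
        return None  # bought ink recently, no discount needed
    return f"Save 20% on ink for your {model}!"

today = date(2011, 6, 1)
customer = {
    "printer_purchase": ("LaserJet 100", date(2011, 1, 15)),
    "last_ink_purchase_date": None,
}
print(ink_discount_message(customer, today))  # -> Save 20% on ink for your LaserJet 100!
```

Note the message includes the printer model the customer already owns, which is the whole point of targeting.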

Serving up targeted content is indeed a valuable tool which I have used for many of our clients (I work for Adobe), though I invite you to take a step back and look at the greater question:

What content on my website will bring me the best results?

Intuitively it makes sense that targeted content will resonate better with visitors, and ultimately get more sales (or leads, etc).

On the other hand, you can simply test changes on your site which will affect everyone in order to try to improve your conversion rates.

Both are valid methods for optimizing your site and in an ideal world your company would be doing both.

In reality though, you have limited resources to improve your online marketing efforts and you’ll need to prioritize how much targeting you’ll do and how much user experience (common content) testing you’ll do.

Based on my personal experience, most websites still have huge room for improvement by simply optimizing the user experience through split testing. I’ve discussed this with a few other conversion rate professionals who agree. Just look at the case studies out there and you’ll see dozens of examples of how making relatively simple changes to your website can increase conversion rates by double digits.

In other words, you should initially focus on improving the common user experience and then test and test and test and then test some more. Only then does it make the most sense to start targeting (and of course test to see what targeted message performs best).

If your site sucks, it will still suck with targeted messaging.

I will add, though, that some targeting opportunities are very low-hanging fruit, and I would implement them without even testing. For example, whenever you know what a visitor clicked on to reach your site (search, display, email, etc.), make sure the main message on the landing page matches the message they clicked on to get there.

I’d love to hear your targeting successes and failures (and I’ll even provide feedback if you want).


What makes a world class conversion optimization organization?

I’ve been thinking about what makes a world class conversion optimization organization for the past couple of days and have come up with what I think are the top 6 criteria.
I wasn’t shooting for 6 but it seems to cover all bases. I’d LOVE to hear your thoughts.

  1. Optimization is embedded in planning, process and corporate culture at all levels.
  2. Optimization efforts are prioritized based on maximum increase of revenue/goals.
  3. Optimization is executed for the entire end-to-end user experience across all lines of business.
  4. Optimization is based on analytical data, previous learnings and best practices.
  5. User experience is targeted to individual visitor or group.
  6. Optimization process itself is efficient (optimized).

In a bit more detail:

1. Optimization is embedded in planning, process and corporate culture at all levels.
This means two things:
– There is full buy in from the executive team and every employee is on-board and understands that optimization is a commitment, not an add-on.
– All relevant internal processes take into account the opportunity to optimize. Testing is part of the standard process and budget.

2. Optimization efforts are prioritized based on maximum increase of revenue/goals.
What to test (both in terms of where on the site and which page elements) is based on where it makes the most business sense (based on numbers and research), NOT internal politics or personal opinion.

3. Optimization is executed for the entire end-to-end user experience across all site sections.
End-to-end means looking at both off-site (paid search, display, email, etc.) and on-site opportunities, as well as making sure the “funnel” starts before they land on your site (i.e., does the messaging in your paid ad match the experience on the landing page?).
Across all site sections applies to sites which have multiple competing goals or categories. For example product sales vs. consulting services. The goal is to maximize overall company revenue even if a large lift in one area causes a small decline in another. This also means cross section targeting.

4. Optimization is based on analytical data, non analytical user data (think personas), previous learnings and best practices.
This includes:
– Figuring out where and what to test (what the numbers are telling us)
– Visual site/page review (what is the user experience?)
– What do we know about our visitors (who are they? what makes them tick? what are they truly looking for?)
– What did we learn from previous tests? (Layout X performed better than layout Y on the shirts page).
– Are we just guessing to create challenger experiences, or applying best practices (while still keeping an open mind)?

5. User experience is targeted to individual visitor or group.
Serving up the same experience to all visitors will only get you so far (even if it’s optimized). I call this “lowest common denominator” optimization. Are you taking advantage of CRM-type data (what did they buy in the past?) and anonymous data (traffic source, search terms, geo-targeting, visit number, etc.)?

6. The optimization process itself is efficient (optimized).
It takes a while for the optimization process to run smoothly for all tests. Like anything new, it takes time for all of the parts to be in sync.

I can’t help thinking a 7th bullet point would make a nicer headline (7 always sounds sexier than 6).
Any thoughts on what to add?

Thanks in advance,

New Features You Need on Apparel Product Pages

A couple of weeks ago I was looking to buy a new spring jacket. While there are plenty of options online, I ultimately placed an order based on two features on the product detail page:

  • A video of the product
  • The height and weight of the model as well as the size they are wearing

A couple of examples to see this in action:

The Saks page has the model height and product size on the page:

Saks page with model height and dress size


while the Altec page only has it in the video:

Product video with model height & weight and product size


I’ve been saying to myself for years that these really should be must-have features for any apparel product detail pages. Just having a picture of the product doesn’t cut it anymore, especially if your competitors are doing it.

This is also an excellent opportunity for clothing manufacturers. Creating a video for every product would be cost prohibitive for some smaller online retailers, who could instead use the assets created by the manufacturer.

Are you aware of other examples of sites providing model height & weight alongside the size the model is wearing?

More importantly, has anyone tested this? :)

Let me know,

Test Fatigue – Why it Happens

First of all super thanks to all of the great comments on my previous post about Test Fatigue. If you didn’t read my previous post or you don’t know what I mean by Test Fatigue, then please go ahead and read it now. I’ll wait.

Now, to the point – why do we often see the lift from a challenger in a split test decrease after it seems to be going strong and steady?

Statistical significance is for the winner, not the lift.
First and foremost, most split testing tools (I’ve only used Test&Target and Google Website Optimizer extensively) will provide a confidence level for your results. If the control has a conversion rate of 4% and the challenger a conversion rate of 6% (a 50% lift) with a 97% confidence level, the tool is NOT telling you that there is a 97% chance that there will be a 50% lift. The confidence level is referring to the confidence that the challenger will outperform the control.
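As a rough sketch of the kind of calculation a testing tool makes (the exact method varies by tool; this uses a one-sided two-proportion z-test), here’s the idea in Python. Note what the number means: the probability that the challenger beats the control at all, not the probability that the observed lift will hold.

```python
import math

def confidence_challenger_wins(conversions_a, n_a, conversions_b, n_b):
    """One-sided two-proportion z-test: confidence that B's true rate exceeds A's."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    p_pool = (conversions_a + conversions_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # One-sided probability via the standard normal CDF
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Control: 4% of 2,000 visitors; challenger: 6% of 2,000 visitors (a 50% observed lift)
conf = confidence_challenger_wins(80, 2000, 120, 2000)
print(f"{conf:.1%}")  # high confidence the challenger is better -- says nothing about the lift size
```

The tool is very confident the challenger is the winner, but that confidence figure makes no promise that the 50% lift itself will repeat.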

You don’t have enough data and there are many variables outside of your control.
We tend to think that in a split test all variables other than the visitor being presented with the control vs. the challenger are identical. In reality there are many external variables outside of our control, some of which we aren’t even aware of. All things being equal, we often see fluctuations in conversion rates even when we don’t make any changes in our site. Meta Brown provided some excellent points in her comments in my previous post.
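A quick illustrative simulation (synthetic data, not from a real test) shows how much an observed conversion rate can wander purely from sampling noise, even when the true rate never changes:

```python
import random

def observed_rate(true_rate, visitors, rng):
    """Simulate one week of traffic and return the observed conversion rate."""
    conversions = sum(rng.random() < true_rate for _ in range(visitors))
    return conversions / visitors

rng = random.Random(42)  # fixed seed so the run is reproducible
for week in range(1, 7):
    # True rate is a constant 5%, yet the weekly numbers bounce around
    print(f"Week {week}: {observed_rate(0.05, 2000, rng):.2%}")
```

Nothing about the site changed between “weeks,” yet no two weeks show the same rate. Real traffic adds seasonality, campaigns, and other external variables on top of this.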

Results aren’t always reproducible. Learn to live with it.
Lisa Seaman pointed out an excellent article from the New Yorker magazine about this very same phenomenon in other sciences. This is a must read for anyone doing any type of testing in any field. Read it. Now: The Truth Wears Off

What was especially eye opening for me was this part of the article (on page 5). Here is a shortened version of it:

In the late nineteen-nineties, John Crabbe, a neuroscientist at the Oregon Health and Science University, conducted an experiment that showed how unknowable chance events can skew tests of replicability. He performed a series of experiments on mouse behavior in three different science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he conducted the experiments, he tried to standardize every variable he could think of.

The premise of this test of replicability, of course, is that each of the labs should have generated the same pattern of results. “If any set of experiments should have passed the test, it should have been ours,” Crabbe says. “But that’s not the way it turned out.” In one experiment, Crabbe injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on average, six hundred centimetres more than they normally did; in Albany they moved seven hundred and one additional centimetres. But in the Edmonton lab they moved more than five thousand additional centimetres. Similar deviations were observed in a test of anxiety. Furthermore, these inconsistencies didn’t follow any detectable pattern. In Portland one strain of mouse proved most anxious, while in Albany another strain won that distinction.

The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise.

So there you have it. While I know you really want a silver bullet that will make your positive results always stay the same, reality isn’t so simple.

They say that conversion optimization is part art and part science, but I think we have to accept that it’s also part noise :)


Test Fatigue – Conversion Optimization’s Dirty Little Secret

I’m going to expose to you a phenomenon that’s fairly common when split testing, but no one seems to be talking about it (other than veteran split testers) and I don’t think it’s ever been blogged about (please add a comment if I’m wrong).

It has to do with the question:
“Will the lift I see during my split test continue over time?”

Let’s start by looking at a scenario commonly used by practically everyone in the business of split testing.

Your web site is currently generating $400k a month in sales, which has been steady for the past few months. You hire a conversion optimization company, which does a split test on your checkout page.

After running the test for 3-4 weeks, the challenger version provides a 10% lift in conversion and RPV at a 99% statistical confidence level. The conversion optimization company turns off the test and you hard code the winning challenger.

First of all – Wooohoo!!! (Seriously, that’s an excellent win.)

A 10% lift from $400k a month is an extra $40k a month. Annualized that amounts to an extra $480k a year. So your potential increased yearly revenue from using the winning checkout page is almost half a million dollars. Sounds pretty good to me.
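Spelled out as a quick calculation:

```python
# The arithmetic above, spelled out
monthly_revenue = 400_000  # current monthly sales
lift = 0.10                # 10% lift from the winning challenger

extra_per_month = monthly_revenue * lift
extra_per_year = extra_per_month * 12
print(f"${extra_per_month:,.0f}/month, ${extra_per_year:,.0f}/year")
```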

Here’s the problem.

All things being equal, by using the winning version of the checkout page and not your old checkout page, there is a good chance you won’t be making an extra $480k in the next 12 months.

Don’t get me wrong. You will indeed be making more money with the winning checkout page than with the old one, but in all likelihood, it will be less than simply annualizing the lift from during the test itself.

The culprit is what I like to call “Test Fatigue” (a term I think I just coined).

Here’s what often happens if, instead of stopping your split test after 3-4 weeks, you let it run for an entire year. There is a phenomenon that I’ve often, but not always, seen with very long-running split tests: after a while (this might be 3 weeks or 3 months), the performance of the winning version and the control (original) version start to converge.

They usually won’t totally converge, but that 10% lift which was going strong for a while with full statistical confidence is now a 9% lift or an 8% lift or a 5% lift or maybe even less.

As I mentioned before this doesn’t always happen and the time frame can change, but this is a very real phenomenon.

Why does this happen?

Please read my next post – Why Test Fatigue Happens – where I provide some explanations.

Also, I’d love to hear if you have also seen this phenomenon with your own tests and what your personal theories are as to why it happens.


How can I help you with conversion optimization?

I just realized it’s been almost six months since I last posted on this blog. While I have plenty of ideas for posts, I figured it might be best to ask you – my readers (all three of you) how I can help you. Specifically there are two major ideas I’ve had in my head for a while and I’m debating between which one to write about next.

The first idea is a technical overview of how the web works, going into detail on web analytics and split testing. Everything a non-techie needs to know in order to gain a better understanding of what the data really means from a technical perspective, as well as how technical decisions impact business decisions.

The second idea is making conversion rate optimization more of a science and less of an art. I’ve read just about every book out there that deals with site and page optimization. I’ve also conducted countless split tests and have analyzed more sites than I can remember. What I’ve found is that there seems to be a major gap in the process where what to do next and how to do it becomes more of an art and less of a science.

Plenty of smart marketers can see a web page and know intuitively that it won’t convert well. Often it’s even easy to identify specific elements which are “broken” and need to be fixed, but more often than not (at least for me), it’s usually not so simple to explain the internal thought process of converting an OK page into a great one. This is something I’d like to address.

So, my loyal readers, please let me know what I should write about. Even if it’s something other than the two topics I’m thinking about let me know.


The 3 Levels of Conversion Rate Optimization Maturity

If you’re reading this article, I hope you realize that split testing is no longer optional if you want to increase the performance of your web site. So, what is split testing? It’s simply presenting different versions of content to different visitors and measuring which version of your content gets the most desired results. Here are what I consider to be the three levels of conversion rate optimization maturity with an emphasis on split testing.

1 – Lowest Common Denominator

The first level of conversion rate optimization maturity is what I like to call “Lowest Common Denominator Split Testing”. This means treating all of your visitors the same. You simply split test all of your traffic together and see what performs best.

Not so long ago, if you were doing any split testing, you were ahead of the game since most of your competitors weren’t. That’s not the case anymore. I’m willing to bet the vast majority of the top 100 eCommerce sites are already doing some form of split testing.

The problem with lowest common denominator split testing is that your visitors are not all the same. While you’ve found what works best for the group as a whole, you are not taking advantage of obvious differences in terms of why they came to your site and what will get them to take action. This brings us to the second level of maturity.

2 – Segmentation

The second level of conversion rate optimization maturity takes advantage of “standard” information you know about your visitors. For example, where did they come from – was it search (organic or paid), direct traffic (they typed in your URL directly), a referral (a link from another site), or maybe an internal email? Have they been to your site before (new or returning users)? Where are they located? What browser are they using?

This is the type of information you’ll usually find in a web analytics tool. Most of it is what’s available to you at the time of the visit itself.

This type of targeting is also known as segmentation. Basically, instead of putting everyone in a single bucket, you can now segment your visitors into several buckets. Instead of split testing all of your site traffic together you can measure the difference in behavior for each segment and more importantly serve up different tests to different segments.
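To make the idea concrete, here’s a minimal sketch of routing visitors into segments and serving each segment its own test. The segment names, visitor fields, and headline copy are all invented for illustration:

```python
def segment_of(visitor):
    """Bucket a visitor using 'standard' information available at visit time."""
    if visitor.get("referrer_type") == "paid_search":
        return "paid_search"
    if visitor.get("is_returning"):
        return "returning"
    return "new_direct"

# Each segment gets its own control/challenger pair
TESTS = {
    "paid_search": ["Headline matching the ad", "Generic headline"],
    "returning":   ["Welcome back offer",       "Generic headline"],
    "new_direct":  ["Brand story headline",     "Generic headline"],
}

def choose_variant(visitor, bucket):
    """bucket is a deterministic 0/1 assignment, e.g. derived from a visitor ID hash."""
    return TESTS[segment_of(visitor)][bucket]

print(choose_variant({"referrer_type": "paid_search"}, 0))  # -> Headline matching the ad
```

The key point is that results are then measured per segment, so a headline that wins for paid search traffic doesn’t have to win for returning visitors too.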

The analogy I like to use is that of a salesperson who greets someone who just walked into the store. A good salesperson will try to put that visitor into a “segment” based on traits such as gender, age, or apparent income, and propose products that person will most likely be interested in. If you’re not segmenting, it’s like having a blind and deaf salesperson.

Using segments together with split testing is way better than just split testing on its own, but you’re still treating everyone in each segment the same. What if you could actually treat every visitor as an individual? This brings us to the next level.

3 – Profiling

The final level of conversion rate optimization looks at each visitor as an individual. While segmenting on its own takes advantage of what you know about a visitor at the time of the visit, profiling also takes advantage of everything a visitor previously did as well as everything all of your other visitors have done.

Profiling gives each visitor a history that you can fully take advantage of. If a visitor bought shoes on their last visit, show them a banner for socks. You can even track what type of shoes they purchased in order to know what type of socks to offer.

Going back to the salesperson analogy, using segmentation on its own is like never having the same salesperson at your store. Every visit for every visitor has a different salesperson. Profiling is like having one single super-salesperson that remembers everything every visitor ever did.

Profiling can also be automated. If most people who purchased products A & B also bought product C, then automatically show product C to anyone who purchases products A & B. While profiling on its own is very powerful, the ultimate in optimization is profiling together with split testing.
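A minimal sketch of that automated rule, using made-up order data (a real system would use proper association-rule mining at scale; the product names and threshold here are hypothetical):

```python
from collections import Counter

# Made-up past orders; each order is the set of products purchased together
orders = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
]

def recommend(basket, orders, min_support=0.5):
    """Recommend items that appear in at least min_support of past orders
    containing everything in the current basket."""
    matching = [o for o in orders if basket <= o]
    if not matching:
        return []
    counts = Counter(item for o in matching for item in o - basket)
    return [item for item, c in counts.items() if c / len(matching) >= min_support]

print(recommend({"A", "B"}, orders))  # -> ['C']
```

Two of the three past orders containing both A and B also contained C, so C clears the support threshold and gets shown.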

Netflix and Amazon are two examples of companies that are already doing profiling (and split testing). Wouldn’t you like to be like them?

As always, please leave questions and comments in the comments section.

Thank you

The Future of Split Testing and Conversion Rate Optimization

I’ve been fortunate enough to see and experience firsthand the evolution of the Internet, from before the web until today.

I’ll spare you a lengthy history lesson explaining how we’ve gone from brochureware sites to where we are today, but I do want to share some thoughts and perspective on where I think things are going.

When marketers started to understand the potential of dynamic web sites, there were two terms everyone was throwing around:

Personalization & Customization.

Fast forward to today (2011). The user experience is still exactly the same for all visitors (other than on a handful of sites).

For the most part, web site Personalization has failed. Sure, it sounds good in theory, but trying to tailor the web site experience at the individual level is extremely difficult. It is difficult from a technological perspective, but mostly because it’s hard to create an optimal user experience based on data from a single individual.

There is no doubt in my mind that in the future (and to some extent today) the user experience when visiting a web site will be created dynamically based on what gets the best results, but based on “anonymous” information which is common to large groups of visitors, and not based on a single person.

This reminds me of the concept of Psychohistory from the science fiction series “Foundation” by Isaac Asimov.
Wikipedia explains it better than I can:

The premise of the series is that mathematician Hari Seldon spent his life developing a branch of mathematics known as psychohistory, a concept of mathematical sociology (analogous to mathematical physics). Using the law of mass action, it can predict the future, but only on a large scale; it is error-prone on a small scale. It works on the principle that the behaviour of a mass of people is predictable if the quantity of this mass is very large. The larger the number, the more predictable is the future.

I also like to think of this in terms of what usually happens at (successful) brick and mortar stores.

When you walk into a store, the salesperson probably doesn’t know you personally, but will probably try to help you based on certain public traits such as gender, age, if you’re by yourself or with someone else, etc.

Which brings me back to what actually prompted me to write this article in the first place :)

While I’ve been split testing since 2005 in order to improve conversion rates, the majority of the time it’s still about what works best for the site as a whole, as opposed to split testing together with segmentation (which is what we really want).

Until recently, there haven’t been many options out there to achieve this level of targeting and testing (at least not priced for small to mid sized businesses) but over the past few months, I’ve been starting to see more and more startups trying to bring this level of sophistication to the masses.

While I haven’t had a chance to use any of these services first hand, there is no doubt in my mind that businesses that truly embrace this level of targeting and split testing will eventually lead the pack and leave most one-size-fits-all web sites in the dust.