Selection Bias

Last modified: July 10, 2019

Selection Bias is the tendency to analyze groups that are not representative of the population you are interested in measuring. There are two types of selection bias:

  • Selection
  • Self-Selection

Selection

You choose a group of people to analyze in a way that is not representative of the population. These groups are usually chosen because of convenience.

Figure: A small group wants to face left, but its members are the only ones represented.

Example:

You ask three of your friends if your new feature is valuable. They are easy to ask, but are they actually representative of your customer base?

Common Biased Selections:

  • Engaged customers
  • Latest cohort of users
  • People in a particular geographic region

Self-Selection

The group of people who opt in to be analyzed have characteristics that are not representative of the whole population.

Figure: The problem of self-selection

Example:

You send out a survey to all of your customers to gauge their satisfaction with your product. While this seems like it would provide good feedback, you are likely to hear mostly from people who are very opinionated, very angry, or trying to waste time at work.

Common Biased Self-Selectors:

  • Very negative people
  • Very positive people
  • Early adopters
  • Power users

Selection Bias in Business

Let’s say you want to introduce a premium feature in your BI tool. You send out an email to your most active users asking if they are interested in trying it out. Several people respond to the email, and you begin giving them access to the feature.

This seems rational: they are the most engaged, they deserve a sneak peek, and they might have great insight about the feature.

Why might this selection of people cause our analysis of the feature to be wrong?

  • They might try every new feature regardless of whether it provides value to them.
  • They might see this as a way to get in touch with someone at the company.
  • They might want to share their ideas for features.

While these motivations aren’t necessarily bad, their feedback can be misleading.

How to fix:

Be deliberate about reaching out to a representative sample of people to test new features. Use qualifying questions to understand more about them and to give you an opportunity to select a balanced sample.
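
One way to make "a balanced sample" concrete is to invite the same fraction of people at random from each customer segment. The sketch below is only an illustration under stated assumptions: the `segment` field, the segment names, and the 10% fraction are hypothetical, and in practice the segments could come from your qualifying questions.

```python
import random
from collections import defaultdict

def stratified_sample(customers, key, fraction, seed=0):
    """Draw the same fraction at random from every segment."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_segment = defaultdict(list)
    for customer in customers:
        by_segment[customer[key]].append(customer)

    sample = []
    for members in by_segment.values():
        # Take at least one person per segment so small groups are not dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical customer list: 20 power users and 80 typical users.
customers = (
    [{"id": i, "segment": "power"} for i in range(20)]
    + [{"id": i, "segment": "typical"} for i in range(20, 100)]
)

invited = stratified_sample(customers, key="segment", fraction=0.1)
```

With these made-up numbers, the invite list holds 2 power users and 8 typical users, which mirrors the 20/80 split of the customer base instead of over-representing the most engaged group.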

If you do send out a large email and most of the people who respond are early-adopter types, you may want to weight their feedback down to the portion of your customers they actually represent. That way their feedback is balanced more evenly against the few typical customers who tried out the tool.
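
A rough sketch of this kind of reweighting is shown below. It averages the feedback within each segment and then combines those averages using each segment's share of the whole customer base. The segment names, the scores, and the 20/80 population split are all hypothetical, and the function assumes every segment that responded has a known share.

```python
def weighted_score(responses, population_share):
    """Average scores per segment, then combine the averages
    using each segment's share of all customers."""
    totals, counts = {}, {}
    for segment, score in responses:
        totals[segment] = totals.get(segment, 0) + score
        counts[segment] = counts.get(segment, 0) + 1
    return sum(
        population_share[segment] * totals[segment] / counts[segment]
        for segment in totals
    )

# Hypothetical feedback: 8 early adopters rated the feature 9,
# while only 2 typical customers responded and rated it 4.
responses = [("early_adopter", 9)] * 8 + [("typical", 4)] * 2

# Assumed makeup of the full customer base.
population_share = {"early_adopter": 0.2, "typical": 0.8}

raw_average = sum(score for _, score in responses) / len(responses)
adjusted = weighted_score(responses, population_share)
```

With these made-up numbers, the raw average is 8.0, while the weighted score comes out to about 5.0, because the typical customers count for 80% of the result even though only two of them answered. A segment with very few respondents still produces a noisy average, so weighting does not replace collecting more responses from that group.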

Summary

  • Selection Bias
    • Make sure the group of people you test something on is representative of the population you want to impact. Do this by randomizing your sample.
  • Self-Selection Bias
    • Make sure the people who voluntarily participate in something you are analyzing are representative of the population you want to analyze. Do this by having qualifying questions.

Written by: Matt David
Reviewed by: Mike Yi, Matthew Layne, Blake Barnhill
