What to Look Out for When Reading Data

We all know that data can empower decision-making, giving you confidence that you’re making the right call. Businesses are collecting more and more data to uncover new business opportunities, predict future trends, and monitor team performance for bottlenecks or issues. However, having more data doesn’t automatically improve your insights: there are several common biases and fallacies that even expert data practitioners fall for.

What is a Cognitive Bias/Fallacy?

Things are not always as they appear. Our minds are susceptible to cognitive biases: errors in reasoning and evaluating information that affect the decisions and judgments people make. These errors are analogous to cognitive ‘illusions’ that affect many people. They arise from the way the human brain categorizes information and takes shortcuts to reach conclusions, which can lead to irrational or incorrect judgments.

As a data practitioner, you need to be aware of the different ways biases can manifest. Let’s dive into some biases that you need to remember when reading data.

List of Biases that Affect Readers of Data

Availability Heuristic

The availability heuristic is a shortcut your mind takes when it lacks the time or resources to make a considered decision. It is influenced by how easy certain things are to recall. Faced with the need for an immediate decision, the availability heuristic lets people arrive at a conclusion quickly. Such conclusions often come from previous experiences that stand out vividly in your mind. The availability heuristic is also affected by recent events, since they tend to be fresh in your memory.

How it can affect your decision: When faced with a choice, we often lack the time or resources to investigate in greater depth. Recent events or news could ‘explain’ a trend in the data where there isn’t one, leading you to jump to an incorrect conclusion.

How to overcome: Be aware when your mind is taking shortcuts. Take your time to analyze the data before jumping to a conclusion.

Base Rate Fallacy

Also known as base rate neglect, this fallacy occurs when the mind places too little weight on the base rate (the underlying probability of an event) when judging how likely something is.

Take this example: Isabelle is a Broadway fan and loves to paint in her own time. She grew up in a family of musicians and has a love for Tchaikovsky. With this story in mind, which of the following is more likely for Isabelle?

  • Isabelle is a professional ballet dancer.
  • Isabelle is a nurse.

Many will jump to the conclusion that Isabelle is more likely a dancer, due to her love of the arts. However, there are only about 10,000 professional dancers employed in the United States. In contrast, there are over 3 million nurses in the US. In this example, there is a tendency to connect the similarity between her characteristics and a profession without accounting for the underlying statistics.

How it can affect your decision: As shown in the example above, failing to consider the underlying statistics can lead you to favor the far less likely explanation.

How to overcome: Where possible, have the corresponding numbers ready for calculation to arrive at more accurate judgments. In the example above, having the numbers of professional dancers and nurses readily available in a dashboard can combat base rate neglect. If they are not available, talk to your data visualization creator to add the base rates.
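
To make the arithmetic concrete, here is a minimal sketch in Python. The employment counts are the rough figures from the example above; the probabilities that someone in each profession matches Isabelle’s description are made-up assumptions for illustration:

```python
# Rough US employment figures from the example above.
n_dancers = 10_000
n_nurses = 3_000_000

# Hypothetical assumption: a dancer is far more likely than a nurse
# to match Isabelle's artistic description.
p_match_given_dancer = 0.50
p_match_given_nurse = 0.02

# Expected number of people in each profession matching the description.
matching_dancers = n_dancers * p_match_given_dancer   # 5,000
matching_nurses = n_nurses * p_match_given_nurse      # 60,000

# Bayes' rule: even a 25x higher match rate can't beat a 300x base rate.
p_dancer = matching_dancers / (matching_dancers + matching_nurses)
print(f"P(dancer | description) = {p_dancer:.1%}")    # ~7.7%
```

Even under the generous assumption that a dancer is 25 times more likely to match the description, the base rates make ‘nurse’ the better bet by a wide margin.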

False Causality/Correlation vs. Causation Error

This is when you falsely assume that because two events appear related, one must have caused the other. When reading data, you need to be careful about the difference between correlation and causation. Correlation means two variables tend to move together, while causation means a change in one variable produces a change in the other. Correlation does not automatically imply causation, as illustrated by the classic ice cream example below. This fallacy is similar to the logical fallacy post hoc ergo propter hoc (‘after this, therefore because of this’), sometimes referred to as the post hoc fallacy.

In that example, as the number of ice creams sold increases, so does the number of shark attacks. Before you decide that selling ice cream results in more shark attacks, keep in mind that these variables are closely correlated, but one is not necessarily causing the other. One explanation for the correlation might be that in the summer months more people buy ice cream and more people are on the beach, which increases the risk of shark attacks.

How it can affect your decision: You may unknowingly attribute particular variables as causing certain effects on other variables, leading to erroneous decisions.

How to overcome: As the ice cream example shows, when reading charts and dashboards, first identify which variables are closely correlated. Then explore other factors, such as a shared hidden cause, that could explain why these variables move together.
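
Here is a minimal sketch of the idea using synthetic data, with temperature as a hypothetical hidden confounder driving both series:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical monthly data: temperature is the hidden confounder.
temperature = rng.uniform(10, 35, size=120)             # degrees Celsius
ice_cream_sales = 50 * temperature + rng.normal(0, 100, 120)
shark_attacks = 0.3 * temperature + rng.normal(0, 1, 120)

# The two series are strongly correlated...
r = np.corrcoef(ice_cream_sales, shark_attacks)[0, 1]
print(f"correlation(ice cream, shark attacks) = {r:.2f}")

# ...but controlling for temperature removes most of the relationship.
# Partial correlation via residuals from a linear fit on temperature.
def residuals(y, x):
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

r_partial = np.corrcoef(residuals(ice_cream_sales, temperature),
                        residuals(shark_attacks, temperature))[0, 1]
print(f"partial correlation given temperature = {r_partial:.2f}")
```

The raw correlation is high, but once temperature is accounted for, the residual relationship between ice cream sales and shark attacks largely disappears.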

McNamara Fallacy: The Danger of Summary Metrics

The McNamara Fallacy is the error of relying solely on metrics in complex situations and losing sight of the bigger picture. It is also known as the quantitative fallacy.

How it can affect your decision: The summary on your dashboard might show that all is well and lull you into a sense of safety, when in fact other factors that the dashboard doesn’t measure are creeping up. For example, you notice an increase in total revenue, but your company might not actually be growing: the extra revenue might just be late accounts paying their arrears. You might mistake a one-off event for a trend.

How to overcome: Be aware that summary metrics, though powerful, are not showing the whole picture. Be sure to occasionally poke into the data and talk to people in your company to make sure you get the real story.
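
As a toy illustration, here is a hypothetical revenue breakdown in Python showing how a headline total can hide a one-off event:

```python
# Hypothetical monthly revenue broken down by source.
revenue = [
    {"month": "Jan", "recurring": 100_000, "one_off": 0},
    {"month": "Feb", "recurring": 101_000, "one_off": 0},
    {"month": "Mar", "recurring": 99_000, "one_off": 35_000},  # arrears paid
]

for row in revenue:
    total = row["recurring"] + row["one_off"]
    print(f"{row['month']}: total={total:>8,} "
          f"(recurring={row['recurring']:,}, one-off={row['one_off']:,})")

# The March headline jumps ~33%, but recurring revenue is flat:
# the summary metric alone would suggest growth that is not there.
```

The headline total looks great in March, but the breakdown shows the underlying business hasn’t moved, which is exactly what a single summary metric would hide.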

Negativity Bias

Negative information stands out more than positive information. For example, a visit to a restaurant with great food, atmosphere, and easy parking can be overshadowed by a waiter messing up the order. The negative experience sticks out much more than everything else.

How it can affect your decision: Negativity bias can accentuate the feeling that things are not going well and cause you to make irrational decisions that overweight potential negative outcomes.

How to overcome: As with other biases, awareness goes a long way. Write out the good and the bad side-by-side in a list to compare how things really stand.

Gambler’s Fallacy

Mistakenly believing that because something has happened more frequently than usual, it’s now less likely to happen in the future (and vice versa).

For example, say you’ve flipped a fair coin (even weight on both sides) five times and it came up heads every time. You might feel that the coin is due to land on tails. In reality, the next flip is still equally likely to be heads or tails: 50/50.

Source: http://factsongambling.com/gamblers-fallacy/

How it can affect your decision: This fallacy can lead to poor decisions based on past events using your ‘gut feeling’ that ‘feels right’. However, as we’ve shown in the example above, future odds should be calculated on their own, not based on past odds.

How to overcome: We’re hardwired to recognize patterns, so be aware that this fallacy affects us all. Calculate the odds of each event if they are independent events. Also, make sure that your ‘gut feeling’ is not inhibiting you from rational evaluation.
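
A quick simulation makes the point. This sketch flips a fair coin and checks what happens on the sixth flip whenever the first five were all heads:

```python
import random

random.seed(0)
trials = 1_000_000
streaks = 0      # runs of five heads in a row
next_heads = 0   # how often the sixth flip is also heads

for _ in range(trials):
    flips = [random.random() < 0.5 for _ in range(6)]
    if all(flips[:5]):          # first five flips were all heads
        streaks += 1
        next_heads += flips[5]  # record the sixth flip

print(f"streaks of 5 heads: {streaks}")
print(f"P(heads after 5 heads) = {next_heads / streaks:.3f}")  # ~0.5
```

No matter how long the streak, the estimated probability stays near 0.5, because each flip is independent of the ones before it.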

Automation Bias

Automation bias is an over-reliance on automated aids and decision support systems. This bias often arises when an algorithm is trained on past data that is itself biased. Biased data carries systemic or human biases that compromise its integrity. One example is selection bias, where a human cherry-picks particular data points. To learn more about how generated data can be biased, have a look at our tutorial “What to look out for when generating data visualizations”.

Biased data exists in many industries and systems, including health care, recruitment, insurance quotes, and more. When biased data goes into an algorithm, biased results come out, creating a vicious cycle: these intelligent systems proceed to learn human prejudices from the biased data.

How it can affect your decision: Over-reliance on the algorithms puts a sense of complacency in your thinking and may lull you into thinking less critically. This would mean your decisions are vulnerable to errors in the algorithm.

How to overcome: Knowing that computers and algorithms are fallible, regularly have your team double-check for issues in the algorithm or the data.
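
To see how ‘biased data in, biased decisions out’ plays out, here is a deliberately simple, hypothetical sketch of a scoring model built from cherry-picked hiring history:

```python
# Hypothetical past hiring decisions; group B was historically
# cherry-picked against by human reviewers.
past_hires = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", False), ("B", False), ("B", True), ("B", False),
]

def hire_rate(group):
    outcomes = [hired for g, hired in past_hires if g == group]
    return sum(outcomes) / len(outcomes)

# A naive "model" that scores candidates by their group's historical
# hire rate simply reproduces the bias baked into its training data.
for group in ("A", "B"):
    print(f"group {group}: predicted score = {hire_rate(group):.2f}")
```

The model looks objective, but it has merely encoded the selection bias in its training data; a reviewer who trusts its scores uncritically perpetuates that bias.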

Survivorship Bias

Drawing conclusions from an incomplete set of data because that data has ‘survived’ some selection criteria.

Source: https://xkcd.com/1827/

How it can affect your decision: ‘Survived’ data can skew your perception of how risky (or safe) a particular outcome is, which may lead you to choose unnecessarily precarious options. Survivorship bias also kicks in when you study only the competitors that succeeded and fail to learn from those that went bankrupt.

How to overcome: Understand the context of the data so you can gather multiple data inputs, covering both the survivors and those that have ‘lost’.
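
Here is a small sketch of how filtering out the ‘losers’ inflates a summary statistic; the growth rates are synthetic:

```python
import random

random.seed(1)

# Hypothetical: 1,000 companies with random annual growth rates.
growth = [random.gauss(0.0, 0.3) for _ in range(1_000)]

# Suppose only companies growing above -5% "survive" to be studied.
survivors = [g for g in growth if g > -0.05]

avg_all = sum(growth) / len(growth)
avg_survivors = sum(survivors) / len(survivors)

print(f"average growth, all companies:  {avg_all:+.1%}")        # ~0%
print(f"average growth, survivors only: {avg_survivors:+.1%}")  # higher
```

Averaging only the survivors makes the typical company look healthier than the full population actually is.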

Hindsight Bias

This is the bias where you believe, after the fact, that you ‘could have’ predicted the correct outcome.

How it can affect your decision: In hindsight, you might credit specific variables with causing the outcome and over-attribute their importance. This can leave you overconfident in a poor decision.

How to overcome: Be aware of hindsight bias; the world might be more complex than you make it out to be.

Wrap up

Now that you know some of the common biases, you can recognize the situations in which they manifest and take appropriate measures against making irrational decisions. For those who create data visualizations and reports, have a look at our article “What to look out for when creating data visualizations”.

Resources

Pronin, E., Gilovich, T., & Ross, L. (2004). Objectivity in the Eye of the Beholder: Divergent Perceptions of Bias in Self versus Others. Psychological Review, 111(3), 781–799.

Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus and Giroux.

Angwin, J., Larson, J., & Mattu, S. (2016, May 23). Machine Bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

http://nobaproject.com/modules/judgment-and-decision-making

https://www.investopedia.com/terms/b/base-rate-fallacy.asp

http://www.dangreller.com/jumping-to-conclusions-base-rate-neglect/

https://www.fs.blog/2011/08/mental-model-availability-bias/

https://www.psychologytoday.com/us/articles/200306/our-brains-negative-bias

http://factsongambling.com/gamblers-fallacy/


About Jonathan Kurniawan

Hi! I'm Jonathan Kurniawan. I have 4 years of experience working as a software engineer at Dolby on various products. I'm currently pursuing my MBA at Hult International Business School and received my Bachelor of Computer Science from the University of New South Wales, Australia. I'm excited to share my knowledge at the Data School by Chartio.