The Scatter Plot Chart

What is a Scatter Plot Chart?

A scatter plot is a data visualization used to compare two quantitative variables. Scatter plots are simple depictions of data, but they convey a great deal of information, and a number of extensions make even deeper analysis of your data possible. A scatter plot is drawn on a Cartesian plane (x- and y-axes): the values of one variable lie along the x-axis and the values of the other along the y-axis, so each point marks one observation's pair of values. In some cases, the variable on the x-axis is called the explanatory variable, and the variable on the y-axis is called the response variable.

Additionally, a third dimension of data can be added to a scatter plot through the color of the data points. This is usually a categorical variable that sorts the data points into separate groups, which supports further analysis by comparing categories. For example, imagine a scatter plot that depicts the petal width and sepal width of flowers. We could color each data point according to the species of flower it belongs to.
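In code, adding color by category usually means splitting the points into one series per category before plotting. The sketch below assumes each record is an `(x, y, category)` tuple; the function names and the color palette are hypothetical choices for illustration, not part of Chartio or any particular charting tool.

```python
from collections import defaultdict

# Hypothetical palette: any list of distinct color names would work.
PALETTE = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]


def split_by_category(points):
    """Group (x, y, category) records into one (xs, ys) series per category."""
    # Each new category gets its own fresh pair of lists.
    series = defaultdict(lambda: ([], []))
    for x, y, category in points:
        xs, ys = series[category]
        xs.append(x)
        ys.append(y)
    return dict(series)  # keys keep first-seen order


def assign_colors(categories, palette=PALETTE):
    """Give each category a stable color in first-seen order, cycling if needed."""
    return {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}
```

Each resulting series could then be drawn in its own color, for example with one call to matplotlib's `Axes.scatter` per category.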

When to Use a Scatter Plot Chart

The primary use case for a scatter plot is to determine the relationship between two variables. When reading a scatter plot, we want to deduce if the variable on one axis has any connection with the variable on the other axis.

Also, when the data points on a scatter plot are broken down into categories (shown as different colored points), we can use the plot to determine whether each category forms a “cluster”, which suggests that the points in that category are related to each other.
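Whether categories form separate clusters can also be checked roughly with numbers. A minimal sketch, assuming each category's points are given as a list of `(x, y)` pairs: compute each category's centroid and the average distance of its points to that centroid (its spread), then treat two categories as separated when their centroids are farther apart than their combined spreads. That threshold is a hypothetical heuristic chosen for illustration; more rigorous measures, such as the silhouette score, exist.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Mean x and mean y of a non-empty list of (x, y) points."""
    xs, ys = zip(*points)
    return (fmean(xs), fmean(ys))


def cluster_summary(groups):
    """Map each category to (centroid, mean distance of its points to the centroid)."""
    summary = {}
    for category, points in groups.items():
        c = centroid(points)
        spread = fmean(dist(p, c) for p in points)
        summary[category] = (c, spread)
    return summary


def looks_separated(groups):
    """Hypothetical heuristic: every pair of centroids is farther apart
    than the sum of the two categories' spreads."""
    summary = cluster_summary(groups)
    cats = list(summary)
    for i, a in enumerate(cats):
        for b in cats[i + 1:]:
            (ca, sa), (cb, sb) = summary[a], summary[b]
            if dist(ca, cb) <= sa + sb:
                return False
    return True
```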

Using the information above, let’s take a look at an example of a scatter plot we built using Chartio:

Chart made using Chartio

In this example, we are observing the petal width and petal length of multiple Iris flower species. The petal length is plotted on the x-axis and the petal width on the y-axis. Each species is denoted by a different color, and the species form distinct clusters. We can also see a positive correlation between petal length and petal width: as petal length increases, petal width also increases.
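A visual impression of a positive correlation can be confirmed with the Pearson correlation coefficient, r, which ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship). The sketch below computes r in plain Python; the function name `pearson_r` is our own, and Python 3.10 and later also ship an equivalent `statistics.correlation`.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, each left unnormalized; the n's cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)
```

Note that r only measures linear association, so it complements the chart rather than replacing it: a curved pattern can show clearly on a scatter plot while r stays near 0.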

When NOT to Use a Scatter Plot Chart

Determining when to use a scatter plot is usually simple, but there are some situations where a scatter plot is not the ideal visualization. First, if there is no clear relationship between the two variables, a scatter plot adds little value. Seeing that no relationship exists can be useful in itself, but there is little point in keeping that visualization on hand. Second, a scatter plot with too many data points can be messy, making it difficult to tell whether a relationship exists. In this case, it can help to filter your data by some other dimension and check again whether a relationship appears.

Let’s take a look at an example where the scatter plot chart is NOT a useful visualization:

Chart made using Chartio

In this example, we’re interested in looking at heights and weights for a sample of men across the United States. Ultimately we’re trying to answer this question: Is a person’s weight related to their height? The x-axis represents weight, measured in pounds, and the y-axis represents height, measured in centimeters.

So why is a scatter plot a poor choice of visualization for this data? There are a lot of data points in this chart, possibly too many. The chart is not so crowded that individual points become hard to distinguish, but the points don’t show any clear pattern. They are so evenly dispersed that it’s difficult to gain information from the chart. So is a person’s weight related to their height? From this visualization we might assume not, but it’s hard to know for certain.

Comparison of Like Chart Types

The scatter plot is a useful data visualization for comparing two quantitative variables. The bubble chart is another type of visualization that can show the relationship between variables. The comparison below summarizes the use case, pros, and cons of the scatter plot and the bubble chart:

Scatter Plot

  • Use: Compare two quantitative variables
  • Pros: Shows larger quantities of data; easy to see clustering and correlation
  • Cons: Useless if no clustering or trend exists

Bubble Chart

  • Use: Compare three quantitative variables
  • Pros: Adds another dimension (size of bubble) to the scatter plot for more comparison
  • Cons: Bubble size can make the chart crowded
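When a third variable is mapped to bubble size, a common choice is to make each bubble's area, rather than its radius, proportional to the value: area grows with the square of the radius, so scaling the radius would make large values look disproportionately big. A minimal sketch, assuming non-negative values and a hypothetical maximum area of 400 square points:

```python
def bubble_areas(values, max_area=400.0):
    """Scale non-negative values to marker areas, with area proportional to value.

    max_area is a hypothetical default: the area given to the largest value.
    """
    if not values:
        return []
    if any(v < 0 for v in values):
        raise ValueError("bubble sizes need non-negative values")
    largest = max(values)
    if largest == 0:
        return [0.0 for _ in values]
    return [max_area * v / largest for v in values]
```

In matplotlib, `Axes.scatter` takes marker size through its `s` parameter, measured in points squared, so areas computed this way could be passed there directly.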

About Patrick Gibson