What is a scatter plot chart?
A scatter plot is a data visualization used to compare two quantitative variables. Scatter plots are basic depictions of data, but they convey a great amount of information. They also have a number of extensions that support further analysis of your data.
A scatter plot is depicted on a Cartesian plane. One variable has values which lie on the x-axis, and the other on the y-axis. In some cases, variables on the x-axis are referred to as explanatory variables and variables on the y-axis as response variables. Here is an example of a scatter plot chart.
The scatter plot above shows the petal width and sepal width of iris flowers. Petal Width is across the x-axis and Sepal Width is along the y-axis.
When to use a scatter plot chart
Scatter plot charts compare two quantitative variables. When using a scatter plot, we want to see how the two variables relate. This relationship is known as correlation. Variables can have high, low, or no correlation. For example, flower petal length and petal width have a high correlation: the longer a flower petal is, the wider it tends to be.
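To put a number on "high" or "low" correlation, one common choice is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is only an illustration: the function name pearson_r is made up for this example, and the six measurements are a small sample of iris-like values, not the full dataset used in this article.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Illustrative sample values only (assumption, not the article's full data)
petal_length = [1.4, 1.5, 4.5, 4.7, 5.1, 6.0]
petal_width = [0.2, 0.2, 1.5, 1.4, 1.9, 2.5]

r = pearson_r(petal_length, petal_width)
print(f"Pearson r = {r:.2f}")  # values near 1 indicate a strong positive relationship
```

A value near 0 would suggest no linear relationship, and a value near -1 would mean one variable tends to fall as the other rises.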
How to create a scatter plot chart
In Chartio, creating a scatter plot is simple using Interactive Mode. For this example, we will use a basic dataset on Iris plant species.
- Use the Data Sources tab to connect your data source by clicking +Add a Data Source. Chartio offers a wide selection of data sources to connect to. In our example, we are using a CSV file. More information on preparing a more advanced CSV file to upload can be found here.
- Use the Explore tab in the upper left-hand corner to begin building our plot.
Now we are in the Data Explorer and ready to build our chart. Our data source is located on the left side of the page. The middle of the page is where we build our query to select the data we want. Finally, the right side of the page is our chart preview. Notice there is a tab for Interactive Mode and SQL Mode. In our case, we will use Chartio’s Interactive Mode, where we can drag and drop our data fields with no SQL knowledge.
- Choose two Measures in our data table and drag them across to the query builder. For this example, let’s select Petal Length and Petal Width to drag into the measures section of the query builder.
- Ensure that we have the correct Aggregation by clicking each measure.
When we drag our measures into the query builder, Chartio will predict what aggregation to choose based on your data. In our case, we want our data Unaggregated. By clicking each measure, we see the image above. Under the Aggregation tab, select unaggregated. Do this for each measure.
- Select Run Query at the bottom of the query builder section. At this point, nothing has appeared in the chart preview even though we moved our measures into the query builder. We won’t have an output until we run the query.
- Select the Scatter Icon from the chart preview window. The query has run and we have our output. Note that the default chart type for the data we have is a Table Chart. At the bottom of the chart preview is the selection of available charts. On the bottom row, select the Scatter option. Now, we have our scatter plot chart below:
- Edit the chart title, axis labels, and other options as desired. Under the Settings tab above the chart, there are a number of edits we can make to customize the scatter plot chart.
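The aggregation step above matters because a scatter plot needs one point per row of data. The sketch below illustrates the idea in plain Python; it is not how Chartio works internally, and the rows and field names are assumed sample values chosen for this example.

```python
# Illustrative rows only (assumption): each dict is one measured flower
rows = [
    {"petal_length": 1.4, "petal_width": 0.2},
    {"petal_length": 1.5, "petal_width": 0.2},
    {"petal_length": 4.5, "petal_width": 1.5},
    {"petal_length": 4.7, "petal_width": 1.4},
    {"petal_length": 5.1, "petal_width": 1.9},
    {"petal_length": 6.0, "petal_width": 2.5},
]

# Unaggregated: every row becomes its own (x, y) point on the chart
points = [(row["petal_length"], row["petal_width"]) for row in rows]

# Aggregated (here, a sum): all rows collapse into a single (x, y) pair,
# which leaves nothing to compare in a scatter plot
summed = (
    sum(row["petal_length"] for row in rows),
    sum(row["petal_width"] for row in rows),
)

print(f"unaggregated: {len(points)} points")
print(f"aggregated:   1 point at {summed}")
```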
Best practices for creating a scatter plot chart
After observing the relationship between the variables, we can add to our existing scatter plots to include additional insights. Many scatter plots include a line of best fit, or trendline, which fits a straight line to the data. The scatter plot below includes its trendline, although we must be careful when estimating values from the trendline. Trendlines are rarely perfect, and it is not safe to use them to extrapolate beyond the observed data.
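A trendline is often computed with ordinary least squares. The sketch below is a minimal version under that assumption; it does not claim to be the method any particular charting tool uses. The function name fit_line and the sample values are made up for illustration. It also shows why extrapolation is risky: outside the observed range the line can predict impossible values.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative sample values only (assumption, not the article's full data)
petal_length = [1.4, 1.5, 4.5, 4.7, 5.1, 6.0]
petal_width = [0.2, 0.2, 1.5, 1.4, 1.9, 2.5]

slope, intercept = fit_line(petal_length, petal_width)

# Inside the observed range (1.4 to 6.0) the estimate is plausible
inside = slope * 3.0 + intercept
# Far below the observed range, the same line predicts a negative width
outside = slope * 0.5 + intercept

print(f"width ~ {slope:.2f} * length + {intercept:.2f}")
print(f"estimate at length 3.0: {inside:.2f}")
print(f"estimate at length 0.5: {outside:.2f}")
```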
Scatter plots can be further manipulated by changing the color, size, and/or shape of the data points. For example, the scatter plot used earlier relates the petal length and width of iris flowers. Suppose we also knew the species of each of the iris flowers. We can group the species by using different colored dots and compare not only the individual samples but the samples as a group. Grouping by species shows a clear clustering of the data which we would not be able to explain with the previous scatter plot.
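Grouping by a category amounts to splitting the points by that category before drawing them, so each group can get its own color. The sketch below shows the splitting step and a per-group average point, which makes the clustering visible numerically. The species labels and measurements are illustrative assumptions, not the article's dataset.

```python
from collections import defaultdict

# Illustrative samples only (assumption): (species, petal_length, petal_width)
samples = [
    ("setosa", 1.4, 0.2),
    ("setosa", 1.5, 0.2),
    ("versicolor", 4.5, 1.5),
    ("versicolor", 4.7, 1.4),
    ("virginica", 5.1, 1.9),
    ("virginica", 6.0, 2.5),
]

# Split the points by species; each group would be drawn in its own color
groups = defaultdict(list)
for species, length, width in samples:
    groups[species].append((length, width))

# The average point of each group summarizes where its cluster sits
centroids = {
    species: (
        sum(x for x, _ in pts) / len(pts),
        sum(y for _, y in pts) / len(pts),
    )
    for species, pts in groups.items()
}

for species, (cx, cy) in centroids.items():
    print(f"{species}: {len(groups[species])} points, center ({cx:.2f}, {cy:.2f})")
```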
Other general practices and ways to expand your scatter plots:
- Make sure your axis bounds make sense so you can discover trends. When values on an axis span several orders of magnitude, consider using a logarithmic scale so the points are not bunched together at one end of the axis.
- Changing the size of the data point can add another measure to analyze.
- Axis labels are critical in order to not confuse the variables.
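The effect of a logarithmic scale can be sketched by comparing where values land along an axis before and after taking a logarithm. The values and the helper axis_positions below are hypothetical, chosen only to show the effect; a logarithmic scale requires strictly positive values.

```python
import math

# Hypothetical values spanning several orders of magnitude (assumption)
values = [1, 10, 100, 1_000, 10_000]

def axis_positions(vals):
    """Map values to positions between 0 and 1 along an axis."""
    lo, hi = min(vals), max(vals)
    return [(v - lo) / (hi - lo) for v in vals]

# Linear scale: most points are squeezed near the low end of the axis
linear = axis_positions(values)

# Logarithmic scale: the same points are spread evenly along the axis
logged = axis_positions([math.log10(v) for v in values])

for v, lin, log in zip(values, linear, logged):
    print(f"{v:>6}: linear {lin:.4f}, log {log:.4f}")
```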