How SQL COUNT Aggregation Works

This is the most common type of Aggregation in SQL, it counts the number of rows in a column or table. COUNT(*) tells SQL to count the number of rows of the whole table. COUNT(some column) tells SQL to count the number of non-null rows in that column.

SELECT COUNT(*)
FROM facebook

SQL goes through all the rows of the table and creates a table with total count of rows.

With more complex data, SQL can answer more complex questions such as how many rows are there per state.

The COUNT aggregation can answer this, but SQL needs to be told what to group by in the query. SQL also needs the column it is grouping by in the SELECT statement so that the final table can show the data in its proper groups.


SELECT State, COUNT(*)
FROM facebook
GROUP BY State

First SQL will group together data by the column or columns listed in the group by statement in this case state. Then SQL will perform the same COUNT operation as before but check which group to assign the count to.

The data and question can be even more complicated, in this example there is a city column in the table as well. The question could be how many rows are there for each State and City.

The COUNT aggregation will get us our answer as long as it is grouped by state and city. Remember SQL needs what we are grouping by in the SELECT statement as well so that these groups will show in the final table.


SELECT State, City, COUNT(*)
FROM facebook
GROUP BY State, City

SUM and AVERAGE Aggregation

After COUNT, for all other types of aggregation will need you to define the column you are aggregating.

WrongRight
SELECT SUM(*)
FROM facebook
SELECT SUM(# of friends)
FROM facebook

How would it SUM all the columns in the facebook table? The name column has text and cannot summed. Also there are multiple columns so we cannot SUM the row as a whole like the way we could COUNT the row as a whole. SQL needs to know which column the aggregation should be on, in this case the only numeric column is # of friends.

The Average aggregation operates similarly to SUM. The data will be more complex for this example.


SELECT State, AVG(# of friends)
From facebook
Group by State

To get an average SQL needs the sum of each group and then divide it by the count of rows in each group.

NULLs

There are a few scenarios to be aware of when aggregating data that may make you misinterpret the results of your query.

Some cells will not have a value in it. This type of cell is considered null. Null is different than 0 or space “ ”.

[Null]0

The COUNT(*) aggregation will count all rows including Null values. However the COUNT(some column) will count all rows without Null values. Since other aggregations require you specify a column they will exclude Nulls in their calculations.

COUNT(*) counts Nulls                                   COUNT(# of friends) does not count Nulls

Matt David

About Matt David

Hi! I'm Matt David. I have over 7 years industry experience using data and currently work at Udacity as Product Lead for the School of Data Science. Data has become a prerequisite skill set for more and more non-data jobs and I am passionate about making data concepts more easily understood to increase data literacy for everybody.