Fetching the raw data is nice and all, but now we’re going to start actually doing some aggregations and transformations to it! The first and probably most commonly used aggregation function we are going to learn is COUNT. The COUNT function takes whatever you give it and returns the count of how many there are.
The following SQL will count how many albums are in our database. Put another way, we’re going to query for a count of the number of rows are in the ablums table. Play around yourself and find how many are in the artists and tracks tables as well.
COUNT of specific columns
As the COUNT goes over the data results it only increments its tally if the data is not NULL (NULL means empty in SQL jargon). Doing a
COUNT(*) will always return the full count of the number of rows that exist in the table as the splat (
*) represents each column and there’s no way that all the columns in the row would be NULL. If you specify a specific column however, you’re returning the COUNT of the number of rows where that column is not NULL. So doing a count of the composer column:
Will be less than the COUNT(*)
because the composer column has some NULL values (aka. it’s empty sometimes).
A commonly used clause with the count function is DISTINCT. The DISTINCT clause changes the count to only tally the number of unique values in the data. Above we fetched how many tracks had composers listed. If we actually wanted to see how many unique composers were in our tracks table we could use the COUNT with the DISTINCT clause as shown here:
Can you modify the query above to find how many different genre_ids are the tracks table?
A quick aside here: Notice that the column headers on the above datasets weren’t all that clear. SQL does a okay job of finding a name for what you’re fetching but often, especially as we start making more complex functions, you’ll want to use your own alias for the data. You can do so with the AS key word following your selections:
Be sure to use double quotes (“) around your Aliases as double quotes are used for column titles.
The following is a list of the most commonly used functions in SQL. They work similar to COUNT but perform different calculations on the data.
|MAX||returns the largest (maximum) number in a sets|
|COUNT||returns a count of the # of values in a set|
|COUNT DISTINCT||returns a count of the # of unique (distinct) values in a set|
|EVERY||returns true if all data inside is true (same as bool_and)|
|AVG||returns the average (mean) of the set of numbers|
|SUM||returns the sum of all the values in the set|
The following example gives the range and average prices of the tracks using the MIN, MAX and AVG functions.
Can you modify the above query to return how much it would cost to buy one of every track in the database?
We only covered the most commonly used aggregation functions here. If you’d like to see more checkout the full list of PostgreSQL Functions