Syntax Conventions

Last modified: September 08, 2020

Syntax conventions allows people to easily scan through your query and understand what metrics are being measured and analyzed.

All Caps SQL Commands

Show example queries with ALL CAPS, First Letter Caps, and lowercase SQL commands.

For readability, all SQL commands should be written in uppercase letters. This allows the reader to identify the keywords in the SQL statement and easily determine what the query is executing. Avoid writing first letter caps and lowercase queries as they are difficult to scan quickly and separate into different clauses.

Select 
    created_at, 
    gender, 
    age_at_registration, 
    province
from 
    users

SELECT 
	created_at, 
	gender, 
	age_at_registration, 
	province
FROM 
	users

New Lines and Indenting

Lay out each column or logical statement on new lines so that it easy to identify what is being queried and filtered. When they are placed in the same line as the SQL command they may not even fit within your SQL editor window. Keep Aliases on the same line to make it clear what they are referring to.

Keep the SQL keywords on lines of their own and indent the other parts of the query. This makes it easy to read, edits and spot any unexpected errors.

SELECT u.created_at, u.age_at_registration,u.gender,AVG(p.amount) AS average_user_payment_amount
FROM users AS u
INNER JOIN payments AS p ON u.user_id = p.user_id
WHERE p.has_refund = FALSE

SELECT
	u.created_at,
	u.age_at_registration,
	u.gender,
	AVG(p.amount) AS average_user_payment_amount
FROM
	users AS u
	INNER JOIN payments AS p ON u.user_id = p.user_id
WHERE
	p.has_refund = FALSE

When writing simple queries for up to 3 three (3) columns, you can write the columns in line with the SQL commands.

SELECT created_at, age_at_registration, current_age
FROM customers

GROUP BY Numbers and Names

GROUP BY allows you to separate data into groups, which can be aggregated independently of one another. It is generally recommended to use column names to be clear as to what you are grouping by but when you are grouping many columns or long titled columns you may substitute numbers that correspond with the select statement order.

SELECT
	date_of_customer_registration,
	number_of_user_logins,
	number_of_user_payments,
	COUNT(*) AS total_user_engagement
FROM
	user_metrics
GROUP BY
	date_of_customer_registration,
	number_of_user_logins,
	number_of_user_payments
ORDER BY
	total_user_engagement DESC

Due to the length of the column names, rewriting them for the GROUP BY statement can become tedious and time consuming. Instead, the GROUP BY could number the columns in order (1, 2, 3).

✅ ?

SELECT
	date_of_customer_registration,
	number_of_user_logins,
	number_of_user_payments,
	COUNT(*) AS total_user_engagement
FROM
	user_metrics
GROUP BY
	1, 2, 3
ORDER BY
	total_user_engagement DES

At a quick glance the numbers become difficult to determine the metrics under analyses. Another risk with this method is if you update any column in the SELECT statement, and you used numbers in the GROUP BY it will not produce any errors so you may assume nothing is wrong. The numbers reference whatever columns are in the SELECT. You will need to check to see if the numbers are still corresponding to the columns you are expecting in the SELECT statement to have an accurate query.

Happy syntax-ing!

Written by: Katherine Chiodo
Reviewed by: Matt David

Next – Third Normal Form

Get new data chapters sent right to your Inbox