Syntax Conventions
Last modified: July 28, 2020
Syntax conventions allows people to easily scan through your query and understand what metrics are being measured and analyzed.
All Caps SQL Commands
Show example queries with ALL CAPS, First Letter Caps, and lowercase SQL commands.
For readability, all SQL commands should be written in uppercase letters. This allows the reader to identify the keywords in the SQL statement and easily determine what the query is executing. Avoid writing first letter caps and lowercase queries as they are difficult to scan quickly and separate into different clauses.
❌
Select
created_at,
gender,
age_at_registration,
province
from
users
✅
SELECT
created_at,
gender,
age_at_registration,
province
FROM
users
New Lines and Indenting
Lay out each column or logical statement on new lines so that it easy to identify what is being queried and filtered. When they are placed in the same line as the SQL command they may not even fit within your SQL editor window. Keep Aliases on the same line to make it clear what they are referring to.
Keep the SQL keywords on lines of their own and indent the other parts of the query. This makes it easy to read, edits and spot any unexpected errors.
❌
SELECT u.created_at, u.age_at_registration,u.gender,AVG(p.amount) AS average_user_payment_amount
FROM users AS u
INNER JOIN payments AS p ON u.user_id = p.user_id
WHERE p.has_refund = FALSE
✅
SELECT
u.created_at,
u.age_at_registration,
u.gender,
AVG(p.amount) AS average_user_payment_amount
FROM
users AS u
INNER JOIN payments AS p ON u.user_id = p.user_id
WHERE
p.has_refund = FALSE
When writing simple queries for up to 3 three (3) columns, you can write the columns in line with the SQL commands.
✅
SELECT created_at, age_at_registration, current_age
FROM customers
GROUP BY Numbers and Names
GROUP BY allows you to separate data into groups, which can be aggregated independently of one another. It is generally recommended to use column names to be clear as to what you are grouping by but when you are grouping many columns or long titled columns you may substitute numbers that correspond with the select statement order.
✅
SELECT
date_of_customer_registration,
number_of_user_logins,
number_of_user_payments,
COUNT(*) AS total_user_engagement
FROM
user_metrics
GROUP BY
date_of_customer_registration,
number_of_user_logins,
number_of_user_payments
ORDER BY
total_user_engagement DESC
Due to the length of the column names, rewriting them for the GROUP BY statement can become tedious and time consuming. Instead, the GROUP BY could number the columns in order (1, 2, 3).
✅ ?
SELECT
date_of_customer_registration,
number_of_user_logins,
number_of_user_payments,
COUNT(*) AS total_user_engagement
FROM
user_metrics
GROUP BY
1, 2, 3
ORDER BY
total_user_engagement DES
At a quick glance the numbers become difficult to determine the metrics under analyses. Another risk with this method is if you update any column in the SELECT statement, and you used numbers in the GROUP BY it will not produce any errors so you may assume nothing is wrong. The numbers reference whatever columns are in the SELECT. You will need to check to see if the numbers are still corresponding to the columns you are expecting in the SELECT statement to have an accurate query.
Happy syntax-ing!
Written by:
Katherine Chiodo
Reviewed by:
Matt David