Multicolumn indexes (also known as composite indexes) are similar to standard indexes. They both store a sorted “table” of pointers to the main table. Multicolumn indexes however can store additional sorted pointers to other columns.
Standard indexes on a column can lead to substantial decreases in query execution times as shown in this article on optimizing queries. Multi-column indexes can achieve even greater decreases in query time due to its ability to move through the data quicker.
CREATE INDEX [index name] ON [Table name]([column1, column2, column3,...]);
Multicolumn indexes can:
- be created on up to 32 columns
- be used for partial indexing
- only use: b-tree, GIN, BRIN, and GiST structures
What is a multicolumn index?
Multicolumn indexes are indexes that store data on up to 32 columns. When creating a multicolumn index, the column order is very important. This is due to the structure that multicolumn indexes possess. Multicolumn indexes are structured to have a hierarchical structure. Take for example this table:
A traditional index on this table would look like this:
The index points back to the table and is sorted by year. Adding a second column to the index looks like this:
Now the index has pointers to a secondary reference table that is sorted by make. Adding a third column to the index causes the index to look like this:
In a three column index we can see that the main index stores pointers to both the original table and the reference table on make, which in turn has pointers to the reference table on model.
When the multicolumn index is accessed, the main portion of the index (the index on the first column) is accessed first. Each entry in the main index has a reference to the row‘s location in the main table. The main index also has a pointer to the secondary index where the related make is stored. The secondary index in term has a pointer to the tertiary index. Because of this pointer ordering, in order to access the secondary index, it has to be done through the main index. This means that this multicolumn index can be used for queries that filter by just year, year and make, or year, make, and model. However, the multicolumn index cannot be used for queries just on the make or model of the car because the pointers are inaccessible.
Multicolumn indexes work similarly to traditional indexes. You can see in the gifs below how using a multicolumn index compares to using both a sequential table scan and a traditional index scan for the following query:
SELECT * FROM myTable WHERE year=2017 AND make='ACURA' AND model='TL';
- Scans every row for correct entry or entries
- Can filter out wrong years using the index, but must scan all rows with the proper year.
- Can filter by all 3 columns allowing for much fewer steps on large data sets
From these gifs you can see how multicolumn indexes work and how they could be useful, especially on large data sets for improving query speeds and optimizing.
Multicolumn indexes are so useful because, when looking at the performance of a normal index versus a multicolumn index, there is little to no difference when sorting by just the first column. For an example look at the following query plans:
These two query plans show that there is little to no difference in the execution time between the standard and multicolumn indexes.
Multicolumn indexes are very useful, however, when filtering by multiple columns. This can be seen by the following:
Create standard index:
CREATE INDEX standard_index_vehicle_year ON traffic_data(year);
Create multicolumn index:
CREATE INDEX mult_col_idx_vehicle ON traffic_data(year, make, model);
Query run in Images:
EXPLAIN ANALYZE SELECT * FROM traffic_data WHERE year='2001' AND make='CHEVROLET' AND model='TAHOE';
The table above shows the execution times of each index on the given query. It shows clearly that, in the right situation a multicolumn index can be exactly what is needed.
- Multicolumn indexes:
- Can use b-tree, BRIN, GiST, and GIN structures
- Can be made on up to 32 columns
- Can be used for Partial Indexing
- Perform comparably to traditional indexes on their single column
- Perform much better once additional Columns are added to the query.
- Column order is very important.
- The second column in a multicolumn index can never be accessed without accessing the first column as well.