Multicolumn Indexes

Last modified: July 19, 2019

Multicolumn indexes (also known as composite indexes) are similar to standard indexes. They both store a sorted “table” of pointers to the main table. Multicolumn indexes however can store additional sorted pointers to other columns.

Standard indexes on a column can lead to substantial decreases in query execution times as shown in this article on optimizing queries. Multi-column indexes can achieve even greater decreases in query time due to its ability to move through the data quicker.

Syntax

CREATE INDEX [index name]
ON [Table name]([column1, column2, column3,...]);

Multicolumn indexes can:

What is a multicolumn index?

Multicolumn indexes are indexes that store data on up to 32 columns. When creating a multicolumn index, the column order is very important. This is due to the structure that multicolumn indexes possess. Multicolumn indexes are structured to have a hierarchical structure. Take for example this table:

Table that will be used as an example

A traditional index on this table would look like this: Image of a Traditional Index on the table

The index points back to the table and is sorted by year. Adding a second column to the index looks like this: Image of a two column index

Now the index has pointers to a secondary reference table that is sorted by make. Adding a third column to the index causes the index to look like this: Image of a three column index

In a three column index we can see that the main index stores pointers to both the original table and the reference table on make, which in turn has pointers to the reference table on model.

When the multicolumn index is accessed, the main portion of the index (the index on the first column) is accessed first. Each entry in the main index has a reference to the row‘s location in the main table. The main index also has a pointer to the secondary index where the related make is stored. The secondary index in term has a pointer to the tertiary index. Because of this pointer ordering, in order to access the secondary index, it has to be done through the main index. This means that this multicolumn index can be used for queries that filter by just year, year and make, or year, make, and model. However, the multicolumn index cannot be used for queries just on the make or model of the car because the pointers are inaccessible.

Multicolumn indexes work similarly to traditional indexes. You can see in the gifs below how using a multicolumn index compares to using both a sequential table scan and a traditional index scan for the following query:

SELECT * FROM myTable
WHERE year=2017
AND make='ACURA'
AND model='TL';

Table Scan

gif of a table scan

  • Scans every row for correct entry or entries

Traditional Index

gif of a traditional index scan

  • Can filter out wrong years using the index, but must scan all rows with the proper year.

Multicolumn Index

gif of a multicolumn index scan

  • Can filter by all 3 columns allowing for much fewer steps on large data sets

From these gifs you can see how multicolumn indexes work and how they could be useful, especially on large data sets for improving query speeds and optimizing.

Performance

Multicolumn indexes are so useful because, when looking at the performance of a normal index versus a multicolumn index, there is little to no difference when sorting by just the first column. For an example look at the following query plans: image comparing query plans and run times between traditional and multicolumn index scans on a single filter query

These two query plans show that there is little to no difference in the execution time between the standard and multicolumn indexes.

Multicolumn indexes are very useful, however, when filtering by multiple columns. This can be seen by the following:

Create standard index:

CREATE INDEX standard_index_vehicle_year ON traffic_data(year);

Create multicolumn index:

CREATE INDEX mult_col_idx_vehicle ON traffic_data(year, make, model);

Query run in Images:

EXPLAIN ANALYZE SELECT * FROM traffic_data
WHERE year='2001' AND make='CHEVROLET' AND model='TAHOE';

Comparison between all 3 scans on a query with three filters

The table above shows the execution times of each index on the given query. It shows clearly that, in the right situation a multicolumn index can be exactly what is needed.

Summary

  • Multicolumn indexes:
    • Can use b-tree, BRIN, GiST, and GIN structures
    • Can be made on up to 32 columns
    • Can be used for Partial Indexing
  • Perform comparably to traditional indexes on their single column
  • Perform much better once additional Columns are added to the query.
  • Column order is very important.
    • The second column in a multicolumn index can never be accessed without accessing the first column as well.

References

http://www.postgresqltutorial.com/postgresql-indexes/postgresql-multicolumn-indexes/

https://medium.com/pgmustard/multi-column-indexes-4d17bac764c5

https://www.bennadel.com/blog/3467-the-not-so-dark-art-of-designing-database-indexes-reflections-from-an-average-software-engineer.htm

Written by: Matthew Layne
Reviewed by: Blake Barnhill , Matt David

Next – Start Modeling Data

Get new data chapters sent right to your Inbox