What is a Data Mart?
A data mart is a filtered (and sometimes aggregated) subsection of a data warehouse, built to make it easier for a particular group to query data. It provides a smaller schema that contains only the tables relevant to that group.
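To make "filtered and sometimes aggregated" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the `marketing_web_revenue` view are hypothetical names chosen for illustration. A real warehouse would use its own SQL dialect and tooling.

```python
import sqlite3

# Hypothetical warehouse table; all names and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, channel TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "EU", "web", 120.0),
        (2, "EU", "retail", 80.0),
        (3, "US", "web", 200.0),
        (4, "US", "web", 50.0),
    ],
)

# The "mart": a view that filters to web orders and aggregates by region,
# so the marketing team queries one small object instead of the raw table.
conn.execute(
    """
    CREATE VIEW marketing_web_revenue AS
    SELECT region, COUNT(*) AS order_count, SUM(amount) AS revenue
    FROM orders
    WHERE channel = 'web'
    GROUP BY region
    """
)

web_revenue = conn.execute(
    "SELECT region, order_count, revenue FROM marketing_web_revenue ORDER BY region"
).fetchall()
print(web_revenue)
```

The marketing team only needs to know about `marketing_web_revenue`. The filtering and aggregation logic lives in one place, upstream of their queries.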
Five reasons to build a Data Mart
- Relevance to use cases. Limiting the schema to the tables that you need allows you to parse the schema easily.
- Accessible to a variety of people and teams. Data marts allow you to expose more people to data without overwhelming them.
- Customized architecture for different use cases. Aggregations, metric calculations, and PII can all be handled individually for teams.
- Maintainable with less time and effort. Having the data monitored by team leads makes it easier to identify data issues.
- Separated levels of data access. Easily protect sensitive data by limiting what teams can see in their data marts.
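The points above about handling PII per team and separating levels of access can be sketched as per-team views that expose only an allowed set of columns. This is an illustrative sketch, not a security mechanism. The helper `create_mart_view`, the `customers` table, and the team names are all hypothetical, and SQLite itself has no user permissions.

```python
import sqlite3

def create_mart_view(conn, team, table, columns):
    """Create a view named <team>_<table> exposing only the listed columns."""
    names = [team, table, *columns]
    # Names are interpolated into SQL, so accept plain identifiers only.
    if not all(name.isidentifier() for name in names):
        raise ValueError("names must be plain identifiers")
    view_name = f"{team}_{table}"
    conn.execute(
        f"CREATE VIEW {view_name} AS SELECT {', '.join(columns)} FROM {table}"
    )
    return view_name

# Hypothetical warehouse table containing one PII column (email).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, country TEXT, plan TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'a@example.com', 'DE', 'pro')")

# Analytics gets no PII; support keeps the email column it needs.
analytics_view = create_mart_view(conn, "analytics", "customers", ["id", "country", "plan"])
support_view = create_mart_view(conn, "support", "customers", ["id", "email", "plan"])

analytics_columns = [
    col[0] for col in conn.execute(f"SELECT * FROM {analytics_view}").description
]
print(analytics_columns)
```

In a production warehouse you would pair views like these with that platform's own access controls, so that each team can read its mart but not the underlying tables.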
This section of the Data Governance book will explain why you should create data marts, and how to implement them so that you get all the benefits they can deliver to your business. Before we dive in, let’s look at the data issues you are facing with a data warehouse.
The Problem with Data Warehouses
As an organization scales, the amount of data it tracks and the number of people who want to access that data scale too. The result is more people, each with less context, looking at more and more of the schema.
We want to go from a complex schema:
To a siloed schema, where each department has the data they need:
So while going from lake to warehouse was mostly about cleaning up tables, going from warehouse to marts is about cleaning up schemas. Different departments need different parts of the data warehouse schema.
How Data Marts are different from Data Warehouses
Use modeling to create separate schemas, each containing only the tables that a given team or individual needs. These will be your company’s Data Marts. Structurally, the tables should be the same as in the warehouse, because the data should already have been cleaned at the data warehouse stage.
Data Marts are not very different from your data warehouse, because the heavy lifting has already been done. They make it a bit easier for people within departments to navigate the schema, and they provide extra oversight of the data for that department.
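As a rough sketch of separate schemas per department, the example below uses SQLite's `ATTACH DATABASE` to stand in for schemas and copies each warehouse table unchanged into the marts that need it. The `MART_TABLES` mapping, the table names, and the helpers `build_marts` and `list_tables` are hypothetical. In a real warehouse you would more likely create schemas and views with that platform's own DDL.

```python
import sqlite3

# Hypothetical mapping of department marts to the warehouse tables they need.
MART_TABLES = {
    "sales_mart": ["orders", "customers"],
    "support_mart": ["customers", "tickets"],
}

# isolation_level=None keeps the connection in autocommit mode, because
# SQLite cannot ATTACH a database inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE tickets (id INTEGER, customer_id INTEGER, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 10, 99.0)")

def build_marts(conn, mart_tables):
    """Attach one in-memory database per mart and copy its tables as-is."""
    for mart, tables in mart_tables.items():
        conn.execute(f"ATTACH DATABASE ':memory:' AS {mart}")
        for table in tables:
            # Same structure as the warehouse table; only the schema differs.
            conn.execute(f"CREATE TABLE {mart}.{table} AS SELECT * FROM main.{table}")

def list_tables(conn, schema):
    """Return the table names visible in one schema, sorted by name."""
    rows = conn.execute(
        f"SELECT name FROM {schema}.sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]

build_marts(conn, MART_TABLES)
print(list_tables(conn, "sales_mart"))
print(list_tables(conn, "support_mart"))
```

Each department sees only its own short list of tables, while the table definitions themselves stay identical to the warehouse versions.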
A Data Mart makes your data:
- Relevant to your job and use cases
- Accessible to a variety of people and teams
- Customized in architecture for different use cases
- Maintainable by team leads
- Separated to protect sensitive information
A Data Lake is a pile of products in your building.
A Data Warehouse is those products sorted, shelved, and tagged.
A Data Mart is those products shipped out to relevant stores for sale.