What is a Data Mart?
A Data Mart is a filtered (and sometimes aggregated) subsection of a Data Warehouse that makes it easier for a particular group to query data. It provides a smaller schema containing only the tables relevant to that group.
This stage is right for you if:
- You want to democratize data and enable others in your company to explore and understand it themselves
- You’re prepared to teach and enable business users in your company, ideally using the many resources of The Data School
- You have projects that need the source of truth in different formats for easier use
- Having truly informed employees is important to your company’s competitive success
You’ve outgrown this stage if:
- You can’t, really! You can make any number of marts, and even add levels within your marts if you’d like. Implementing this stage results in a complete, well-architected, and governed stack that will continually evolve to support an informed, competitive company.
Five reasons to build a Data Mart
- Relevance to use cases. Limiting the schema to the tables you need allows you to navigate it easily.
- Accessible to a variety of people and teams. Data marts allow you to expose more people to data without overwhelming them.
- Customized architecture for different use cases. Aggregations, metric calculations, and PII can all be handled separately for each team.
- Maintainable with less time and effort. Having the data monitored by team leads makes it easier to identify data issues.
- Separated levels of data access. Easily protect sensitive data by limiting what teams can see in their data marts.
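As a rough illustration of handling aggregation and PII per team, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables, their columns, and the `marketing_revenue_by_region` view are all hypothetical; a real warehouse would use its own tables and access controls.

```python
import sqlite3


def build_marketing_mart(conn: sqlite3.Connection) -> None:
    """Create a hypothetical marketing mart view over warehouse tables.

    The view pre-aggregates revenue per region and leaves out the PII
    column (email), so the marketing team never needs to query it.
    """
    conn.execute(
        """
        CREATE VIEW marketing_revenue_by_region AS
        SELECT c.region          AS region,
               COUNT(o.order_id) AS order_count,
               SUM(o.amount)     AS total_revenue
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.region
        """
    )


# Hypothetical warehouse tables, assumed to be cleaned at the warehouse stage.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'a@example.com', 'EU'), (2, 'b@example.com', 'US');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
    """
)
build_marketing_mart(conn)

# Each row is (region, order_count, total_revenue); no email column is exposed.
for row in conn.execute("SELECT * FROM marketing_revenue_by_region ORDER BY region"):
    print(row)
```

SQLite has no per-user permissions, so the view only shapes what the team sees. On a warehouse that supports grants (PostgreSQL, for example), you would also grant the team `SELECT` on the view while withholding access to the underlying tables.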
This section of the Data Governance book explains why you should create data marts, and how to implement them so that you get all the benefits they can deliver to your business. Before we dive in further, let’s look at the data issues you are facing with a Data Warehouse.
The Problem with Data Warehouses
As an organization scales the amount of data it is tracking, the number of people who want to access it scales too. This results in more people who have little context for large portions of the schema.
We want to go from a complex schema to a siloed schema, where each department has the data they need.
So while going from Lake to Warehouse was mostly about cleaning up tables, going from Warehouse to Marts is about cleaning up schemas. Different departments need different parts of the Data Warehouse schema.
How Data Marts are different from Data Warehouses
Use modeling to create separate schemas, each exposing the relevant tables to the appropriate team or individual. These will be your company’s Data Marts.
The table structures should stay the same, since the data should already have been cleaned at the Data Warehouse stage. Data Marts are not very different from your Data Warehouse because the heavy lifting has already been done. They make it easier for people within a department to navigate the schema and give that department extra insight into its data.
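One way to model this is to expose each department's mart as a set of views that pass warehouse tables through unchanged. The sketch below again uses Python's built-in `sqlite3` module. Because SQLite has no PostgreSQL-style schemas, it uses a hypothetical `<department>__<table>` naming prefix as a stand-in for a separate schema. The department names, table names, and the `build_marts` and `mart_schema` helpers are all illustrative assumptions.

```python
import sqlite3

# Hypothetical mapping of each department to the warehouse tables it needs.
MART_TABLES = {
    "sales": ["customers", "orders"],
    "finance": ["orders", "invoices"],
}


def build_marts(conn: sqlite3.Connection, mart_tables: dict[str, list[str]]) -> None:
    """Create one pass-through view per (department, table) pair.

    SELECT * keeps the warehouse table structure, so each department gets
    a smaller schema without re-cleaning any data.
    """
    for department, tables in mart_tables.items():
        for table in tables:
            # Identifiers come from the trusted mapping above, not user input.
            conn.execute(
                f'CREATE VIEW "{department}__{table}" AS SELECT * FROM "{table}"'
            )


def mart_schema(conn: sqlite3.Connection, department: str) -> list[str]:
    """Return the table names visible in one department's mart."""
    prefix = f"{department}__"
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name"
        )
    ]
    return [name[len(prefix):] for name in names if name.startswith(prefix)]


# Hypothetical warehouse; web_events is not part of any mart.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, order_id INTEGER, paid INTEGER);
    CREATE TABLE web_events (event_id INTEGER PRIMARY KEY, page TEXT);
    """
)
build_marts(conn, MART_TABLES)

print(mart_schema(conn, "sales"))    # ['customers', 'orders']
print(mart_schema(conn, "finance"))  # ['invoices', 'orders']
```

Keeping the mapping in one place makes it easy to review who sees which tables. On a warehouse with real schemas, the same loop could create each view inside a per-department schema instead of using a name prefix.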
Data Marts make your data:
- Relevant to your job and use cases
- Accessible to a variety of people and teams
- Customized for different use cases
- Maintainable by team leads
- Separated to protect sensitive information
A Data Lake is a pile of products in your building.
A Data Warehouse is those products sorted, shelved, and tagged.
A Data Mart is those products shipped out to relevant stores for sale.