Data Lake Security

Last modified: October 15, 2019

Before a Data Lake, analysis meant working with data spread across many different tools. Each tool required its own login, which made it hard to track who had access and to maintain appropriate levels of access.

Data Lakes provide a way to centralize access to data and make it simpler to manage permissions.

Data coming into the Data Lake is likely not cleaned yet, so sensitive information may still come through. Do not Extract or Load sensitive data or columns; you can configure this within the ELT tools. Also be careful about who you grant access to the Data Lake, since it exposes far more data than access to a single source does.
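As a rough illustration of leaving sensitive columns out before the Load step, here is a minimal Python sketch. ELT tools usually expose this as a configuration setting rather than code, so treat it as a picture of the idea only. The column names and the `strip_sensitive` helper are hypothetical.

```python
# Hypothetical column names; replace with the sensitive fields in your own sources.
SENSITIVE_COLUMNS = {"ssn", "credit_card_number", "password_hash"}


def strip_sensitive(rows, sensitive=SENSITIVE_COLUMNS):
    """Return copies of the rows with sensitive columns removed before loading."""
    return [
        {col: val for col, val in row.items() if col.lower() not in sensitive}
        for row in rows
    ]


# Made-up extracted records, for illustration only.
extracted = [
    {"id": 1, "plan": "free", "ssn": "000-00-0000"},
    {"id": 2, "plan": "pro", "ssn": "000-00-0001"},
]

# Only the non-sensitive columns would be loaded into the Data Lake.
loadable = strip_sensitive(extracted)
print(loadable)
```

Filtering against an explicit list of column names keeps the rule easy to audit, but it only catches columns you already know about, so the list needs review whenever a source adds fields.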

Access in a central place

Remove access from individual tools and move all access to the Data Lake. This cuts down on tickets requesting access and on mishaps where people retain access to information they should not have.

Permission tiers

Now that you can easily grant anyone access to data, more people will be accessing it. While this is advantageous for analysis and exploration, you need to be mindful of which schemas and tables they can see.

Start by creating two user groups on the Data Lake:

  1. For admins and engineers
  2. For analysts and business users

You can prevent the second group from accessing sensitive data in the Data Lake by not granting it access to the schemas or tables that contain that type of data.
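The two-group setup can be sketched as a small Python script that generates read grants per group. This assumes a Data Lake that accepts PostgreSQL-style GRANT statements. The group names, schema names, and the `grant_statements` helper are hypothetical, and privileges beyond read access for the admin group are out of scope here.

```python
# Hypothetical schema and group names, for illustration only.
ALL_SCHEMAS = ["sales", "marketing", "support", "hr_sensitive"]
SENSITIVE_SCHEMAS = {"hr_sensitive"}

# Admins and engineers see every schema; analysts and business users
# see everything except the schemas that hold sensitive data.
GROUP_SCHEMAS = {
    "lake_admins": ALL_SCHEMAS,
    "lake_analysts": [s for s in ALL_SCHEMAS if s not in SENSITIVE_SCHEMAS],
}


def grant_statements(group_schemas):
    """Build PostgreSQL-style read grants for each group and schema."""
    statements = []
    for group, schemas in group_schemas.items():
        for schema in schemas:
            statements.append(f"GRANT USAGE ON SCHEMA {schema} TO GROUP {group};")
            statements.append(
                f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO GROUP {group};"
            )
    return statements


statements = grant_statements(GROUP_SCHEMAS)
for statement in statements:
    print(statement)
```

Because the analyst group is never granted anything on the sensitive schema, it has no access there by default, which matches the broad, easy-to-manage controls this chapter recommends.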

In general, set broad controls so they are easy to manage. As you grow in sophistication and turn your Data Lake into a Data Warehouse and Data Mart, you can create more refined permission settings.

Written by: Matt David
Reviewed by: Dave Fowler

