Building a Data Warehouse: Storage

In our Data School tutorial “Basics of Building a Data Warehouse”, we identify some benefits of using a data warehouse and introduce the basic structure needed to build one. The structure consists of three different components: a storage mechanism, operational software, and human resources. In this tutorial, we’ll break down the storage aspect of a data warehouse.

Storage is the structural foundation for a data warehouse. Specifically, it’s where your warehouse lives. When it comes to storage, there are two options: an in-house server or the cloud. Either is feasible for data warehousing; the right choice depends on your needs. Let’s discuss the two options further, including the pros, cons, and costs of each one.

In-house server

An in-house server is hardware set up within your office that, in this case, is used for storing data. Using an in-house server for a data warehouse can be beneficial if you already have a sizeable server established for your business. Even if you don’t already have one, establishing in-house storage might still be your best option. Let’s take a look at the pros and cons:

  • Pros
    • Control the level of security of your data.
    • Customize it to your business’s needs.
    • Expand as your business grows.
  • Cons
    • Expensive initial hardware investment.
    • Hardware upkeep, such as updates and renewals, has to be managed by you.
    • Extra resources may be needed to securely manage the server, such as a Systems Manager or Database Architect.

For an in-house server, the initial installation, configuration, and ongoing maintenance and support are all significant costs. Below, we’ve provided some examples of in-house servers and the price of each:

*Note: The price listed is only for the server hardware — it does not include cost of warranty, installation, maintenance, etc.*

| Server | Features | Price |
|---|---|---|
| HPE ProLiant ML350 Gen10 | 2.1 GHz, 16 GB RAM, no HDD, tower | $2,526.99 |
| Dell PowerEdge R630 | 2.1 GHz, 8 GB RAM, 300 GB HDD, rack-mountable | $2,493.99 |
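
To compare an upfront hardware price with a recurring monthly bill, one rough approach is to spread the purchase price over the number of months you expect to use the server. The sketch below does that for the two listed prices. The 5-year service life and the function name `amortized_monthly_cost` are assumptions made for illustration only, and the result still excludes warranty, installation, and maintenance.

```python
# Hardware-only prices in USD, taken from the examples above.
HARDWARE_PRICES = {
    "HPE ProLiant ML350 Gen10": 2526.99,
    "Dell PowerEdge R630": 2493.99,
}


def amortized_monthly_cost(price: float, service_life_years: int) -> float:
    """Spread an upfront price evenly over an assumed service life."""
    if service_life_years <= 0:
        raise ValueError("service_life_years must be positive")
    return price / (service_life_years * 12)


# ASSUMPTION: a 5-year service life, chosen only for illustration.
ASSUMED_SERVICE_LIFE_YEARS = 5

for name, price in HARDWARE_PRICES.items():
    monthly = amortized_monthly_cost(price, ASSUMED_SERVICE_LIFE_YEARS)
    print(f"{name}: about ${monthly:.2f} per month (hardware only)")
```

Changing the assumed service life changes the monthly figure proportionally, so treat the number as a starting point for budgeting rather than a full cost of ownership.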

Cloud server

A cloud server stores your data on an external platform that doesn’t physically exist within your office. Businesses rent space on this platform from third-party vendors based on the capacity their data needs and how long they plan to use the service. Cloud storage may be a good option for businesses that want a faster and more easily scalable storage solution. Here are the pros and cons:

  • Pros
    • Management and upkeep responsibilities are shifted to third parties.
    • Scalable to fit your growing business.
    • Cheap to start up, with no initial hardware investment required. A traditional capital expense is replaced with an operating expense that can be easily budgeted and is less of a commitment.
    • Connectivity allows users to freely share and access data at any time, from anywhere, on any device.
  • Cons
    • Dependent on reliable internet access.
    • Security is out of your control; it is managed by the third-party service.
    • Limitations on bandwidth for some providers could result in extra operational costs.

With the cloud, many of the costs associated with an in-house server, such as hardware installation, configuration, and maintenance, don’t apply. However, cloud storage services charge separately for accessing and querying data, and the price depends on which service you use. Most rates for cloud storage solutions are per gigabyte stored, and some start at a few cents per GB. Unfortunately, unless you know how many gigabytes you’ll use, the total cost of cloud storage is hard to predict. Below, we’ve provided some examples of cloud storage services and the price of each:

*Note: The price listed is only for storage; it does not include cost of network usage, operations, retrieval, transfers, etc. Prices may also vary by region.*

| Server | Storage type | Price |
|---|---|---|
| Amazon S3 | Standard Storage | $0.023 per GB per month for the first 50 TB |
| Microsoft Azure | Basic “Block Blob” Storage | $0.0184 per GB per month for the first 50 TB |
| Google Cloud | Frequent Multi-Regional Storage | $0.026 per GB per month |
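
Because these rates are charged per gigabyte, a rough monthly storage bill is the amount stored multiplied by the rate. The sketch below applies the listed rates to a hypothetical 500 GB warehouse. The 500 GB figure, the function name `monthly_storage_cost`, and the 1 TB = 1,000 GB conversion are assumptions for illustration. It covers storage only, and it refuses amounts above 50 TB because no rates beyond that tier are listed here.

```python
# Storage-only rates in USD per GB per month, taken from the examples above.
RATES_PER_GB_MONTH = {
    "Amazon S3 (Standard Storage)": 0.023,
    "Microsoft Azure (Basic Block Blob Storage)": 0.0184,
    "Google Cloud (Frequent Multi-Regional Storage)": 0.026,
}

# ASSUMPTION: 1 TB is treated as 1,000 GB for the 50 TB tier limit.
FIRST_TIER_LIMIT_GB = 50 * 1000


def monthly_storage_cost(stored_gb: float, rate_per_gb: float) -> float:
    """Estimate one month of storage cost within the first pricing tier."""
    if stored_gb < 0:
        raise ValueError("stored_gb must not be negative")
    if stored_gb > FIRST_TIER_LIMIT_GB:
        # Rates past the first 50 TB are not given above, so stop
        # here instead of guessing at them.
        raise ValueError("amount exceeds the first 50 TB tier")
    return stored_gb * rate_per_gb


# ASSUMPTION: a hypothetical warehouse holding 500 GB.
EXAMPLE_STORED_GB = 500

for service, rate in RATES_PER_GB_MONTH.items():
    cost = monthly_storage_cost(EXAMPLE_STORED_GB, rate)
    print(f"{service}: about ${cost:.2f} per month for {EXAMPLE_STORED_GB} GB")
```

Running the same function over a low and a high estimate of your data volume gives a cost range, which can be more useful than a single number when your usage is uncertain.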

Conclusion

When building a data warehouse, it’s important to think from the bottom up, beginning with how you will store your warehouse. In-house servers and the cloud are both viable options, each with its own benefits and drawbacks. When deciding which storage option is best for your warehouse, think about your business’s overall goals and needs. Once you have a clear idea of the direction your warehouse should take, choosing a storage service should be much simpler.

About Bryn Burns

Hi! I'm Bryn Burns. I am a current senior at Virginia Tech pursuing degrees in Statistics and Mathematics. Data science and visualization are two things I'm very passionate about, as well as working with numbers and helping people learn. I'm thrilled to share my knowledge here at The Data School!