What is a Graph Database?

In mathematics, a graph is simply a collection of elements, typically called nodes (also called vertices or points), that are joined together by edges. Each node represents a piece of information in the graph, and each edge represents a connection between two nodes.

A directed graph is a type of graph in which every edge has a direction, pointing from one node to another. An undirected graph, by contrast, is one whose edges are simply links with no direction.
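To make the distinction concrete, here is a minimal Python sketch that stores a graph as an adjacency list, a common in-memory representation. The function `build_adjacency` and the sample node names are hypothetical, chosen only for illustration. The only difference between the two cases is whether each edge is recorded in one direction or in both.

```python
from collections import defaultdict


def build_adjacency(edges, directed):
    """Return an adjacency list: a dict mapping each node to a set of neighbours."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        if not directed:
            # An undirected edge is a link usable in both directions.
            adjacency[target].add(source)
    return adjacency


edges = [("alice", "bob"), ("bob", "carol")]

directed = build_adjacency(edges, directed=True)
undirected = build_adjacency(edges, directed=False)

print(sorted(directed["bob"]))    # ['carol']
print(sorted(undirected["bob"]))  # ['alice', 'carol']
```

In the directed version, `bob` only "sees" `carol`, because the edge from `alice` points into `bob` rather than out of it.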

A graph database is a datastore that uses nodes, edges, and properties to store and retrieve data. To understand the basic structure of a graph database, one must first understand its two core components: nodes and edges.

A node is a basic entity within a graph (a person, place, thing, etc.). Nodes carry useful information about the entity they represent, similar to a row in a relational database. Think of a node as a circular container with data in it. An edge is a line that connects one node to another; in many graph databases, edges can also have a type and properties of their own.
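A minimal sketch of these two components in Python, assuming a property-graph model (the model used by graph databases such as Neo4j) in which both nodes and edges can hold key-value properties. The class names `Node` and `Edge`, the labels `User` and `FRIEND_OF`, and the `since` property are illustrative assumptions, not any particular database's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                                  # kind of entity, e.g. "User" or "Page"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                                 # node_id of the start node
    target: str                                 # node_id of the end node
    rel_type: str                               # kind of connection, e.g. "FRIEND_OF"
    properties: dict = field(default_factory=dict)


alice = Node("u1", "User", {"name": "Alice"})
bob = Node("u2", "User", {"name": "Bob"})
friendship = Edge(alice.node_id, bob.node_id, "FRIEND_OF", {"since": 2015})

print(alice.properties["name"], friendship.rel_type, bob.properties["name"])
# Alice FRIEND_OF Bob
```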

Example of a Graph Database

Suppose we have data similar to Facebook's, where users are friends with other users and can also like pages. Structuring this data in a relational database would look like this:

To store the relationships between users, and between users and pages, it is necessary to create two additional tables (as shown above) that connect the identifiers of users and pages: one for friendships and one for likes. For developers who are not used to working with tables, finding the pages that a user likes is not a trivial task, because it requires joining several tables together.
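The relational layout described above can be sketched with Python's built-in `sqlite3` module. The table and column names (`users`, `pages`, `friendships`, `likes`) and the sample rows are assumptions for illustration. The point is that answering "which pages does Alice like?" already takes two joins through the `likes` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Entity tables (hypothetical schema).
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)")
# Join tables that connect the identifiers of users and pages.
cur.execute("CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER)")
cur.execute("CREATE TABLE likes (user_id INTEGER, page_id INTEGER)")

cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
cur.executemany("INSERT INTO pages VALUES (?, ?)", [(10, "Hiking"), (11, "Jazz")])
cur.execute("INSERT INTO friendships VALUES (1, 2)")
cur.executemany("INSERT INTO likes VALUES (?, ?)", [(1, 10), (2, 11)])

# Which pages does Alice like? Two joins take us from a user name to page titles.
rows = cur.execute(
    """
    SELECT p.title
    FROM users u
    JOIN likes l ON l.user_id = u.id
    JOIN pages p ON p.id = l.page_id
    WHERE u.name = ?
    ORDER BY p.title
    """,
    ("Alice",),
).fetchall()

print(rows)  # [('Hiking',)]
```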

The difficulty of visualizing relationships between entities in a relational database is one motivation for graph databases. Using the concept of a graph, we can represent the same scenario in a different structure, as shown in the image below. This is a simple and intuitive way to illustrate the relationships between objects.

With this graphical representation, figuring out who is friends with whom, and which pages a user likes is more intuitive.
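For comparison, here is a minimal sketch of the same data as a graph, assuming a simple in-memory structure in which each node maps a relationship type to the nodes it is directly connected to. The `connect` and `neighbours` helpers and the relationship names `FRIEND_OF` and `LIKES` are hypothetical. Finding a user's pages, or even the pages their friends like, becomes a matter of following edges rather than joining tables.

```python
from collections import defaultdict

# Hypothetical in-memory graph: node -> relationship type -> connected nodes.
graph = defaultdict(lambda: defaultdict(set))


def connect(source, rel_type, target):
    """Add a directed, typed edge from source to target."""
    graph[source][rel_type].add(target)


def neighbours(node, rel_type):
    """Return the nodes reachable from `node` over one edge of `rel_type`."""
    return sorted(graph[node][rel_type])


connect("Alice", "FRIEND_OF", "Bob")
connect("Bob", "FRIEND_OF", "Alice")  # store the friendship in both directions
connect("Alice", "LIKES", "Hiking")
connect("Bob", "LIKES", "Jazz")

print(neighbours("Alice", "LIKES"))      # ['Hiking']
print(neighbours("Alice", "FRIEND_OF"))  # ['Bob']

# Two hops: pages liked by Alice's friends, with no join tables involved.
friends_pages = sorted(
    page
    for friend in neighbours("Alice", "FRIEND_OF")
    for page in neighbours(friend, "LIKES")
)
print(friends_pages)  # ['Jazz']
```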

Graph Database Use Cases

  1. Fraud Detection – Traditional fraud prevention measures focus on discrete data points such as specific accounts, devices, or IP addresses. However, many fraudsters escape detection by forming fraud rings composed of stolen and synthetic identities. To uncover such rings, it is essential to look beyond individual data points to the connections that link them. A graph database makes these difficult-to-detect patterns easier to uncover.
  2. Real-Time Recommendation Engines – Making relevant recommendations in real time requires the ability to correlate product, customer, inventory, supplier, logistics, and even social sentiment data. Graph databases can outperform relational databases here because they connect large volumes of buyer and product data directly instead of joining them at query time.
  3. Network and IT Operations – A graph database enables you to connect monitoring tools and gain critical insights into the complex relationships between different network or data center operations.
  4. Identity and Access Management – Using a graph database, identity and access authorizations and inheritances can be easily tracked with substantial depth and real-time results.
  5. Search – Graph-based search is a newer approach to data management, originally pioneered by Facebook and Google. The key to this enhanced search capability is that, on the very first query, a graph-based search engine takes into account the entire structure of the available connected data. Because graph systems understand how data is related, they can return richer and more precise results.
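The fraud-detection idea above can be illustrated with a deliberately simplified sketch: treat accounts and the identifiers they use (devices, IP addresses, phone numbers) as nodes, link them with edges, and report every group of accounts connected through shared identifiers. This is a plain connected-components search using breadth-first search, not a description of how any particular fraud product works; the account data and the functions `connected_accounts` and `find_rings` are hypothetical, and real systems combine many more signals.

```python
from collections import defaultdict, deque

# Hypothetical data: each account and the identifiers it was seen with.
accounts = {
    "acct1": {"device:A", "ip:1.1.1.1"},
    "acct2": {"device:A"},
    "acct3": {"ip:1.1.1.1", "phone:555"},
    "acct4": {"device:Z"},
}

# Build an undirected graph linking each account to its identifiers.
graph = defaultdict(set)
for account, identifiers in accounts.items():
    for ident in identifiers:
        graph[account].add(ident)
        graph[ident].add(account)


def connected_accounts(start):
    """Breadth-first search; return every account reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(n for n in seen if n in accounts)


def find_rings(min_size=2):
    """Group accounts into connected components; keep groups of min_size or more."""
    rings, visited = [], set()
    for account in accounts:
        if account not in visited:
            ring = connected_accounts(account)
            visited.update(ring)
            if len(ring) >= min_size:
                rings.append(ring)
    return rings


print(find_rings())  # [['acct1', 'acct2', 'acct3']]
```

Here `acct2` and `acct3` share no identifier directly, yet they land in the same group because `acct1` links them through a device and an IP address. Checking each account in isolation would miss this connection.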

Conclusion

While relational databases compute relationships at query time through expensive JOIN operations, a graph database stores connections as first-class citizens. As a result, whenever you run the equivalent of a JOIN operation in a graph database, the database already has direct access to the connected nodes, eliminating the need for an expensive search-and-match computation. This lets you represent complex interactions in your data in a much more natural form, and often gives a closer fit to the real-world data you are working with.
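The difference between computing relationships at query time and storing them directly can be sketched in Python. This is a toy illustration under stated assumptions: the first function searches a separate relationship table for matching rows, while the second follows references stored on each node (an idea often called index-free adjacency in graph-database literature). A real relational database would normally use an index rather than a full scan, so this exaggerates the cost gap; what it shows is where the relationship lives. All names here are hypothetical.

```python
# Relational-style: relationships live in a separate table, so finding a
# user's friends means searching that table for matching rows.
friendships = [("alice", "bob"), ("alice", "carol"), ("bob", "dave")]


def friends_via_table_scan(user):
    return [friend for uid, friend in friendships if uid == user]


# Graph-style: each node stores direct references to its neighbours,
# so a lookup just follows them (cost grows with the node's degree).
class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []  # direct references to other Node objects


nodes = {name: Node(name) for name in ("alice", "bob", "carol", "dave")}
for uid, friend in friendships:
    nodes[uid].friends.append(nodes[friend])


def friends_via_adjacency(user):
    return [friend.name for friend in nodes[user].friends]


print(friends_via_table_scan("alice"))  # ['bob', 'carol']
print(friends_via_adjacency("alice"))   # ['bob', 'carol']
```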



Rohan Joseph

About Rohan Joseph

Practicing the dark arts of data science. I am currently pursuing a Master's in Operations Research at Virginia Tech and working with Chartio to democratize analytics in every organization.