If there’s one message I’d like you to get from this article, it’s that choosing the right option may be impossible if you don't know all the options.
This is essentially what this article is about. There are quite a lot of data storage approaches, but the most popular ones are all of the relational variety. The main reason is that it’s usually the only approach developers know. Quite frequently, more suitable methods exist, but no one even considers them.
But let’s not get ahead of ourselves. Let’s start with the basics.
Why are Relational Database Management Systems (RDBMS) so popular?
As of now, a vast majority of applications use RDBMSs. According to the 2023 developer survey by Stack Overflow, 49.09% of professional developers use PostgreSQL, and 40.59% use MySQL. A few other database systems of the same type are also in the top 10.
To be perfectly fair, those numbers are totally understandable. The biggest strength of those approaches is their versatility. They truly can be used for pretty much every purpose. They’re certainly not perfect for some types of projects, but usually, they will do the job.
The concept itself is relatively simple, and it’s based on the relational model that’s more than 50 years old. In a relational database, data is organized into tables, with each table containing a specific type of information, whatever that may be. Columns define the attributes or characteristics of the data, while rows represent individual records or entries.
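To make the table, column, and row vocabulary concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers table and its contents are invented purely for illustration.

```python
import sqlite3

# In-memory database so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# A table holds one type of information; columns are its attributes.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# Each row is one individual record (hypothetical sample data).
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Alice", "Berlin"), (2, "Bob", "Lisbon"), (3, "Carol", "Berlin")],
)

# Queries select rows by the values stored in their columns.
berlin_customers = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY id", ("Berlin",)
    )
]
print(berlin_customers)
```

Any other relational system would look much the same here, since the SQL in this sketch is deliberately basic.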
We’ve all seen a table. We know how that stuff works. Of course, some newer approaches bring a bit more to the table (sorry about that), but the core idea remains the same.
So, the simplicity of RDBMSs makes them applicable to pretty much any type of software. But it comes with limitations. At some point, they become slow and inefficient.
But even more importantly, they’re limited in, or even incapable of, more complex types of analyses. Alternative database systems are the answer to those limitations.
Graph Databases
As the name suggests, graph databases rely on graphs instead of tables. More specifically, they form a network of interconnected nodes and edges, which makes them perfect for handling complex and highly interconnected data. If complex relationships between data points are vital for a specific purpose, graph databases may be excellent for it.
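The idea of nodes and edges can be sketched in a few lines of plain Python. This is only an illustration of the data model, not how Neo4j or any other graph database works internally. The users, the "follows" relationship, and the recommend function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical "follows" edges between user nodes.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("carol", "dave"),
    ("carol", "erin"),
]

# Adjacency list: each node maps to the set of nodes it points to.
follows = defaultdict(set)
for source, target in edges:
    follows[source].add(target)


def recommend(user):
    """Suggest accounts followed by the people `user` follows,
    ranked by how many of those people follow them."""
    already_followed = follows.get(user, set())
    scores = defaultdict(int)
    for friend in already_followed:
        for candidate in follows.get(friend, set()):
            if candidate != user and candidate not in already_followed:
                scores[candidate] += 1
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda name: (-scores[name], name))


print(recommend("alice"))
```

The traversal simply walks from one node to its neighbors and then to their neighbors. In a relational schema, the same question would typically require joining a relationship table with itself.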
Perhaps the most popular graph database is Neo4j, which is used by companies such as Behance and eBay, as well as a number of financial and cybersecurity organizations. Generally, ideal applications for graph databases include recommendation engines, social media platforms, and fraud detection systems.
One of the key differences between Neo4j and relational database systems is the use of Cypher instead of SQL as the query language, which makes relationship-heavy queries much simpler to express. It almost makes you feel that you’re drawing your queries instead of writing them.
The guide on the Neo4j website does a perfect job of explaining the difference.
Another notable example of a graph database system is Amazon Neptune.
To sum up, graph databases can be a really powerful and scalable tool, perfect for certain types of complex applications. For example, they can be used to create almost tailor-made customer experiences for millions of users at a time or solve massive data challenges that are impossible to tackle with traditional methods.
At the same time, it’s fair to say that using graph databases will be more expensive, partly because the expertise is scarce: only about 2% of professional developers are familiar with them. But you can expect that from a somewhat niche solution that’s rarely used.
Event Sourcing
More often than not, a certain state in our app can be reached in multiple ways.
Think of a shopping cart: no matter the order you place items in, the end result will be pretty much the same unless the cashier scans something twice. The takeaway? If we store only the end result, there’s a good chance we have lost some details along the way, perhaps ones crucial for our business.
Event sourcing is a software architectural pattern that represents the state of an application as a sequence of events. In this model, the application’s state is not stored as is but as a complete historical log of events that occurred and caused the final result. It means that all changes are recorded, and all past states can be reconstructed.
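The pattern can be sketched with the shopping cart from earlier. This is a minimal illustration under assumed names: the Event class, the "item_added" and "item_removed" event kinds, and the replay function are all hypothetical and not taken from any particular event store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str  # "item_added" or "item_removed" (hypothetical event names)
    item: str


def replay(events):
    """Rebuild cart contents (item -> quantity) by applying events in order."""
    cart = {}
    for event in events:
        if event.kind == "item_added":
            cart[event.item] = cart.get(event.item, 0) + 1
        elif event.kind == "item_removed" and cart.get(event.item, 0) > 0:
            cart[event.item] -= 1
            if cart[event.item] == 0:
                del cart[event.item]
    return cart


# The append-only log is the source of truth; state is derived from it.
log = [
    Event("item_added", "book"),
    Event("item_added", "headphones"),
    Event("item_removed", "headphones"),
    Event("item_added", "pen"),
]

current_state = replay(log)      # state after every event
earlier_state = replay(log[:2])  # state as it was after the second event
print(current_state, earlier_state)
```

Note that the final cart says nothing about the headphones, while the log still shows that they were added and then removed. That is exactly the kind of detail a state-only model loses.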
First of all, the trail of changes to the system can be very valuable for compliance, debugging, and analytics purposes. But more importantly, knowing all the details about the causality and effects of every single change means that we’re able to analyze processes and run more and more accurate simulations.
For example, event sourcing can be a wonderful tool in e-commerce for customer behavior analysis. Each action can become an event, and all events and relations between them can easily be analyzed. It’s not impossible to achieve using relational databases (there are even a few libraries that make it possible), but it would get more and more cumbersome, slow, and inefficient. And we all know how customers and search engines love slow e-commerce websites.
When it comes to tools, the most commonly used are EventStore and Axon Server.
Vector Databases
The last alternative approach to databases I want to mention is especially important in the current era because it’s a natural choice for AI solutions. Vector databases can be described as a spatial approach to data management. Instead of rows in a table, a vector database stores information as vectors, that is, points in a multi-dimensional space.
The use of vector space to handle data makes finding similar items significantly faster and more efficient. You can think of it as a map.
Imagine if you had to list all the streets of New York in a table format. While it would provide structured, detailed data, it might not intuitively give you a sense of relative distances or relationships between the streets. Now, when you visualize the same data on a map, you can easily see the spatial relationship between these streets, how they intersect, and how far they are from one another.
Similarly, in a vector space, data points (like documents or images) are positioned based on their features, and the "distance" or similarity between them can be easily measured. This doesn't mean that the data points are physically close, like streets in New York, but rather that their characteristics or features are similar.
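Here is a rough sketch of what measuring that "distance" can look like, using cosine similarity in plain Python. The three-dimensional vectors and document titles are made up for illustration. Real systems use vectors with hundreds or thousands of dimensions produced by a model, and they typically rely on approximate nearest-neighbor indexes rather than the brute-force comparison shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors; in practice a model would produce these.
documents = {
    "cat care guide": [0.9, 0.1, 0.0],
    "dog training tips": [0.8, 0.3, 0.1],
    "tax return checklist": [0.0, 0.2, 0.9],
}


def nearest(query, k=2):
    """Return the k document titles most similar to the query vector."""
    ranked = sorted(
        documents,
        key=lambda title: cosine_similarity(query, documents[title]),
        reverse=True,
    )
    return ranked[:k]


print(nearest([1.0, 0.2, 0.0]))
```

The two pet-related documents rank above the tax checklist because their vectors point in nearly the same direction as the query, regardless of where they sit in any table.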
In fact, the massive multi-purpose maps known as GIS (Geographic Information Systems) are built on a closely related idea, although they usually store geometric shapes such as points, lines, and polygons rather than feature vectors. Of course, the entire set of processes behind vector databases is a bit more complicated, but I hope you get the idea.
The most notable use cases for vector databases today involve large language models (LLMs), such as GPT. Vector databases act as a kind of memory for those tools, letting them retrieve relevant information quickly and opening possibilities for more applications.
Both Spotify and Facebook also base their recommendation engines on vector databases.