- Relational Database Management Systems vs. the Alternative: The Main Dilemma
- How to choose the right approach for a project?
Relational Database Management Systems vs. the Alternative: The Main Dilemma
If there’s one message I’d like you to get from this article, it’s that choosing the right option may be impossible if you don't know all the options.
This is essentially what this article is about. There are quite a lot of data storage approaches, but most of the popular ones are of the relational variety. The main reason for that is that it’s usually the only approach developers know. Quite frequently, there are more suitable methods, but no one even considers them.
But let’s not get ahead of ourselves. Let’s start with the basics.
Why are Relational Database Management Systems (RDBMS) so popular?
As of now, a vast majority of applications use RDBMSs. According to the 2023 developer survey by Stack Overflow, 49.09% of professional developers use PostgreSQL, and 40.59% use MySQL. A few other database systems of the same type are also in the top 10.
To be perfectly fair, those numbers are understandable. The biggest strength of relational databases is their versatility: they can be used for pretty much any purpose. They’re certainly not ideal for every type of project, but usually, they will do the job.
The concept itself is relatively simple, and it’s based on the relational model, which is more than 50 years old. In a relational database, data is organized into tables, with each table containing a specific type of information, whatever that may be. Columns define the attributes or characteristics of the data, while rows represent individual records or entries.
We’ve all seen a table. We know how that stuff works. Of course, some newer approaches bring a bit more to the table (sorry about that), but the core idea remains the same.
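To make that concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The customers table and its rows are made up for illustration: the table holds one type of information, each column is an attribute, and each row is one record.

```python
import sqlite3

# An in-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# One table per type of information; each column is an attribute.
conn.execute(
    """
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        city  TEXT
    )
    """
)

# Each row is one record (hypothetical sample data).
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Alice", "Warsaw"), (2, "Bob", "Berlin"), (3, "Carol", "Warsaw")],
)

# SQL selects rows by the values stored in their columns.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Warsaw",)
).fetchall()
print(rows)  # [('Alice',), ('Carol',)]
```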
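So, the simplicity of RDBMSs makes them applicable to pretty much any type of software. But it comes with limitations. At some point, for example when queries have to join many tables to follow long chains of relationships, they can become slow and inefficient.
But even more importantly, they’re limited in, or even incapable of, more complex types of analysis. Alternative approaches to data management, such as graph databases, event sourcing, and vector databases, are the answer to those limitations.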
As the name suggests, graph databases rely on graphs instead of tables. More specifically, they store data as a network of interconnected nodes and edges, which makes them well suited to handling complex and highly interconnected data. If complex relationships between data points are vital for a specific purpose, a graph database may be an excellent choice.
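A rough way to picture what a graph database does natively is plain Python: below, a small, hypothetical social graph is stored as an adjacency list (each node maps to the nodes its edges point to), and a breadth-first search follows edges to find everyone within a given number of hops. This only illustrates the idea; it is not how Neo4j or any other graph database actually stores or queries data.

```python
from collections import deque

# A tiny, hypothetical social graph: node -> nodes its outgoing edges point to.
graph = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave", "Eve"],
    "Dave": ["Frank"],
    "Eve": [],
    "Frank": [],
}


def within_hops(graph, start, max_hops):
    """Return every node reachable from start in 1..max_hops edges (breadth-first)."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not follow edges beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.add(neighbour)
                queue.append((neighbour, depth + 1))
    return found


print(sorted(within_hops(graph, "Alice", 2)))  # ['Bob', 'Carol', 'Dave', 'Eve']
```

Following a relationship is just a step from one node to its neighbours, which is the operation graph databases are built to make cheap.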
Perhaps the most popular graph database is Neo4j, which is used by Behance, eBay, and a number of financial and cybersecurity organizations. Typical applications for graph databases include recommendation engines, social media platforms, and fraud detection systems.
One of the key differences between Neo4j and relational database systems is that it uses Cypher instead of SQL as its query language, which makes relationship-heavy queries significantly simpler to write. Cypher patterns such as (a)-[:KNOWS]->(b) look like small diagrams of nodes and arrows, so it almost feels like you’re drawing your queries instead of writing them.
The guide on the Neo4j website does a perfect job of explaining the difference.
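To see the difference in practice, here is the same question, "who are the friends of Alice’s friends?", asked both ways in a minimal sketch. The runnable part uses SQLite through Python’s sqlite3 module; the Cypher version appears only as a comment and assumes a hypothetical schema with Person nodes connected by FRIEND relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In a relational database, relationships live in their own table of (from, to) pairs.
conn.execute("CREATE TABLE friends (person TEXT, friend TEXT)")
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)",
    [("Alice", "Bob"), ("Bob", "Dave"), ("Alice", "Carol"), ("Carol", "Eve")],
)

# Friends of Alice's friends: one self-join per hop.
# Roughly the same question in Cypher is written as a pattern:
#   MATCH (:Person {name: 'Alice'})-[:FRIEND]->()-[:FRIEND]->(fof)
#   RETURN DISTINCT fof.name
sql = """
    SELECT DISTINCT f2.friend
    FROM friends AS f1
    JOIN friends AS f2 ON f2.person = f1.friend
    WHERE f1.person = ?
    ORDER BY f2.friend
"""
fof = conn.execute(sql, ("Alice",)).fetchall()
print(fof)  # [('Dave',), ('Eve',)]
```

Every additional hop needs another self-join in SQL, whereas in Cypher you extend the drawn pattern or use a variable-length relationship such as -[:FRIEND*2]->.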
Another notable example of a graph database system is Amazon Neptune.
To sum up, graph databases can be a powerful and scalable tool, perfect for certain types of complex applications. For example, they can be used to create almost tailor-made customer experiences for millions of users at a time or to solve massive data challenges that are impossible to tackle with traditional methods.
At the same time, it’s fair to say that using graph databases will likely be more expensive, not least because experienced developers are hard to find: only about 2% of professional developers are familiar with graph databases. But you can expect that from a somewhat niche solution that’s rarely used.
Next, let’s look at event sourcing. More often than not, a given state in our app can be reached in multiple ways.
Think of a shopping cart: no matter the order in which you add items, and even if you add something and then remove it, the final cart will look pretty much the same (unless the cashier scans something twice). The takeaway? If we store only that final state, there’s a good chance we’ve lost some details, perhaps ones crucial for our business.
Event sourcing is a software architectural pattern that represents the state of an application as a sequence of events. In this model, the application’s current state is not stored directly; instead, the system keeps a complete historical log of the events that occurred and led to it. This means that every change is recorded, and any past state can be reconstructed by replaying the events.
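Here is a minimal sketch of the idea in Python, using the shopping cart example. The event schema (an event kind plus an item) and the cart representation (a set of items, ignoring quantities) are assumptions made for brevity; real event stores also record timestamps, IDs, and much more. The current state is never stored: it is always rebuilt by replaying the log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str  # "added" or "removed" (a hypothetical, minimal event schema)
    item: str


def apply(cart, event):
    """Return the cart state after one event; the previous state is never modified."""
    if event.kind == "added":
        return cart | {event.item}
    if event.kind == "removed":
        return cart - {event.item}
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, upto=None):
    """Rebuild the state by applying events in order; upto stops after that many events."""
    cart = frozenset()
    for event in events[:upto]:
        cart = apply(cart, event)
    return cart


# Two different histories...
log_a = [Event("added", "book"), Event("added", "lamp")]
log_b = [
    Event("added", "book"),
    Event("added", "phone"),
    Event("removed", "phone"),
    Event("added", "lamp"),
]

# ...that end in the same state. Only the log remembers the phone.
print(replay(log_a) == replay(log_b))  # True
# Any past state can be reconstructed, e.g. the cart after the first two events of log_b.
print(sorted(replay(log_b, upto=2)))  # ['book', 'phone']
```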
Why does that matter? First of all, the trail of changes to the system can be very valuable for compliance, debugging, and analytics. But more importantly, knowing the causes and effects of every single change means that we’re able to analyze processes and run increasingly accurate simulations.
For example, event sourcing can be a wonderful tool for customer behavior analysis in e-commerce. Each action can become an event, and all events and the relations between them can easily be analyzed. It’s not impossible to achieve this with relational databases (there are even a few libraries that do it), but it gets more and more cumbersome, slow, and inefficient. And we all know how much customers and search engines love slow e-commerce websites.
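As a hedged illustration of that kind of analysis, the sketch below uses a made-up log of (session, event kind, item) tuples and counts how often each item was added to a cart and later removed. If only the final carts were stored, the removals would leave no trace.

```python
from collections import Counter

# Hypothetical event log for several shopping sessions: (session_id, kind, item).
events = [
    ("s1", "added", "phone"), ("s1", "removed", "phone"), ("s1", "added", "lamp"),
    ("s2", "added", "phone"), ("s2", "removed", "phone"),
    ("s3", "added", "book"), ("s3", "added", "phone"),
]

# Count additions and removals per item straight from the event history.
added = Counter(item for _, kind, item in events if kind == "added")
removed = Counter(item for _, kind, item in events if kind == "removed")

for item, n_removed in removed.most_common():
    print(f"{item}: removed {n_removed} of {added[item]} times added")
# phone: removed 2 of 3 times added
```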
The last alternative approach I want to mention is especially important in the current era because it’s a natural choice for AI solutions. Vector databases can be described as a spatial approach to data management. Instead of rows in a table, a vector database stores each item as a vector: a list of numbers (often called an embedding) that describes the item’s features and acts as a point in a high-dimensional space.
Representing data in a vector space makes it possible to find items similar to a given one quickly and efficiently, even among millions of records. You can think of it as a map.
Imagine if you had to list all the streets of New York in a table format. While it would provide structured, detailed data, it might not intuitively give you a sense of relative distances or relationships between the streets. Now, when you visualize the same data on a map, you can easily see the spatial relationship between these streets, how they intersect, and how far they are from one another.
Similarly, in a vector space, data points (like documents or images) are positioned based on their features, and the "distance" or similarity between them can be easily measured, for example with cosine similarity or Euclidean distance. Here, closeness doesn't mean that the items are physically near each other, like streets in New York, but rather that their characteristics or features are similar.
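In fact, massive multi-purpose maps called GIS (Geographic Information Systems) show a related idea: they store geographic features as points, lines, and polygons (the so-called vector data model), although that is a different use of the word "vector" from the embeddings used in AI. Of course, the entire set of processes behind vector databases is a bit more complicated, but I hope you get the idea.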
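The sketch below shows the core operation behind a vector database: finding the stored items closest to a query vector. The three-dimensional vectors are invented for illustration (real embeddings come from a machine-learning model and have hundreds or thousands of dimensions), and cosine similarity is just one common choice of measure.

```python
import math

# Hypothetical 3-dimensional "embeddings" of songs.
items = {
    "rock ballad": [0.9, 0.1, 0.2],
    "heavy metal": [0.95, 0.05, 0.1],
    "piano sonata": [0.1, 0.9, 0.3],
    "jazz trio": [0.3, 0.7, 0.6],
}


def cosine_similarity(a, b):
    """1.0 means the vectors point the same way (very similar); near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(query, items, k=2):
    """Brute-force top-k search: compare the query with every stored vector."""
    ranked = sorted(
        items, key=lambda name: cosine_similarity(query, items[name]), reverse=True
    )
    return ranked[:k]


query = [0.8, 0.2, 0.1]  # vector of a new, hypothetical song
print(nearest(query, items, k=2))  # ['rock ballad', 'heavy metal']
```

Comparing the query with every item is fine for four songs, but vector databases rely on approximate nearest-neighbor indexes (HNSW is a popular one) to keep this fast across millions of vectors.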
One of the most notable use cases for vector databases is applications built on large language models (LLMs), such as GPT. There, a vector database works as a kind of external memory: the most relevant documents are retrieved by similarity and passed to the model along with the user’s question, which makes those tools more useful and opens possibilities for more applications.
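One common way a vector database serves as an LLM’s memory is retrieval: find the stored documents most similar to the user’s question and include them in the prompt. The minimal sketch below shows only that retrieval step. The embed function is a hypothetical stand-in that counts words instead of calling a real embedding model, and sending the final prompt to an LLM is out of scope here.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words vector,
    so "similar" here only means "shares words"."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# The "memory": each document is stored together with its vector.
documents = [
    "Our store ships orders within two business days.",
    "Returns are accepted within 30 days of delivery.",
    "Gift cards cannot be exchanged for cash.",
]
index = [(doc, embed(doc)) for doc in documents]


def build_prompt(question, k=1):
    """Retrieve the k most similar documents and put them in front of the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:k])
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"


prompt = build_prompt("How many days do I have for returns?")
print(prompt)
```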
Both Spotify and Facebook also base their recommendation engines on vector databases.
How to choose the right approach for a project?
The right choice has to start with considering all the options. Or at least some of the most popular ones. Unfortunately, the reality usually doesn’t look like this.
The key decision-makers, including company founders and CEOs, aren’t always up to date with the most recent technology solutions. They rely on developers’ suggestions. And because a vast majority of devs are familiar only with relational database management systems, that’s what they go for.
Of course, due to the versatility of RDBMSs, that’s usually fine. But it’s not a guarantee. Not realizing that a project needs an alternative approach to data management can be very harmful, even fatal, for that project.
The risk isn’t high, but it’s there.
To put it simply, imagine that you’re trying to repair something, and you realize halfway through that you don’t have that one weird screwdriver. You never needed it before, but you can’t continue without it.
A good, experienced developer has that comprehensive toolbox. Most of the tools will lie there unused. But they’re there, and they can be considered.
And finally, it’s fair to point out that there’s rarely a perfect solution that would solve all our problems. Both the traditional and newer approaches to data management will often come with inevitable trade-offs. A certain technology might help you overcome some of the most crucial challenges in a project, but it will present some other issues you’ll have to take care of.
As with everything, caution and thorough analysis are a must.