Ajay Kulkarni, the co-founder of TimescaleDB, wrote an article titled “Why SQL is beating NoSQL,” which became an instant hit. He made a compelling case that SQL is making a comeback, citing Google Spanner and CockroachDB.
I agree with most of the analysis, except for one major flaw. It is not SQL that is making a comeback; it's NoSQL that is morphing to provide a familiar interface.
Bigtable and MapReduce were developed to deal with the enormous amounts of data at Google. I worked on Google’s incremental indexing system, Caffeine. A single distributed Bigtable instance handled petabytes of web data, with new information being added at the rate of hundreds of terabytes per day. A feat like this is beyond the reach of any popular SQL system, even today.
The typical SQL database has not changed. It is still a single instance, and scaling it is painful and pushes complexity into the application. There’s a real need for scalable systems, something that traditional SQL systems still haven’t evolved to provide. We do not have a distributed MySQL or a distributed Postgres.
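To make that application-side complexity concrete, here is a minimal sketch of what scaling past a single instance tends to look like when the database does not do it for you. Everything in it is hypothetical: the `ShardRouter` class, the `db0` to `db4` names, and the choice of hash-modulo placement are assumptions for illustration, not how any particular company or product does it.

```python
import hashlib


class ShardRouter:
    """Hypothetical application-side router over several single-instance databases."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # Hash the key so placement is stable across processes and restarts.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_names)
        return self.shard_names[index]


def moved_keys(keys, old_router, new_router):
    """Return the keys whose shard changes between two router configurations."""
    return [k for k in keys if old_router.shard_for(k) != new_router.shard_for(k)]


keys = [f"user-{i}" for i in range(1000)]
four = ShardRouter(["db0", "db1", "db2", "db3"])
five = ShardRouter(["db0", "db1", "db2", "db3", "db4"])
moved = moved_keys(keys, four, five)
print(f"{len(moved)} of {len(keys)} keys change shards when a fifth database is added")
```

With this naive modulo scheme, adding one database relocates most of the keys, and the application has to migrate that data itself. Smarter placement schemes reduce the churn, but the routing, the rebalancing, and any query that spans shards remain the application's problem.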
The learning curve is commonly cited as an argument against new entrants to the database world, and Kulkarni makes it too:
“Each NoSQL database offered its own unique query language, which meant: more languages to learn”
I do not buy it. Most companies add new tech despite their devs being unfamiliar with it. Redis, Elasticsearch, and Memcached are (or at least recently were) all unfamiliar technologies that are now commonly part of a typical tech stack.
In fact, two contradictory arguments are being used at once. Devs like “everything new and shiny,” and at the same time, “learning curve limits adoption.” Kulkarni uses the first to explain why NoSQL became big and the second to explain why it did not last. Both fail to understand and explain the NoSQL movement.
As I write this, MongoDB has filed to go public. This is a huge success for any database company! Developers use Mongo not just because it is new and shiny (well, it no longer is new), but because it solves real problems for them.
So what has changed? The change comes from lessons learned by the NoSQL folks themselves. Jeff Dean, the Google Fellow behind MapReduce and Bigtable, said that not supporting distributed transactions in Bigtable was a mistake.
In fact, many projects at Google were using Megastore despite its poor write throughput, because it added transactions on top of Bigtable. Caffeine itself used transactions via Percolator. So Spanner was an evolution of Bigtable to provide distributed transactions. Four of the Bigtable authors worked on Spanner, including Jeff Dean and Sanjay Ghemawat.
Once you evolve a NoSQL database to provide MVCC and transactions, adding a SQL-based query language is not hard. However, it could just as well be any other interface. The interface does not matter. What matters is that both scalability and transactions are important. Bigtable traded off transactional access for scalability, something that Cassandra and MongoDB went on to copy. That, together with the lack of joins, is the primary cause of discontent among NoSQL developers.
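For readers who have not seen MVCC up close, here is a toy sketch of the idea: a key-value store that keeps multiple timestamped versions of each key, gives every transaction a consistent snapshot, and rejects a commit that would overwrite a newer version. This is an in-memory, single-process illustration under my own assumptions. The names `MVCCStore`, `Transaction`, and `ConflictError` are hypothetical, and it is not how Bigtable, Percolator, or Spanner are implemented.

```python
class ConflictError(Exception):
    """Raised when a transaction writes a key that changed after its snapshot."""


class MVCCStore:
    """Toy multi-version key-value store with snapshot reads."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical timestamp of the latest commit

    def begin(self):
        # A transaction reads as of the latest committed timestamp.
        return Transaction(self, self._clock)

    def _read(self, key, snapshot_ts):
        # Return the newest version visible at the snapshot, if any.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def _commit(self, txn):
        # First committer wins: abort if any written key has a newer version.
        for key in txn.writes:
            versions = self._versions.get(key, [])
            if versions and versions[-1][0] > txn.snapshot_ts:
                raise ConflictError(key)
        self._clock += 1
        for key, value in txn.writes.items():
            self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock


class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}  # buffered until commit

    def get(self, key):
        # A transaction sees its own buffered writes first.
        if key in self.writes:
            return self.writes[key]
        return self.store._read(key, self.snapshot_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.store._commit(self)
```

Nothing in this sketch cares whether the caller issues `get` and `put` directly or arrives through a SQL parser and planner. The hard part is the versioning and the commit check, and a real system has to do both across many machines.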
So, just as Spanner is an evolution of Bigtable, CockroachDB is an evolution of MongoDB. SQL has not changed, made a comeback, or beaten NoSQL. NoSQL has morphed into what developers have come to expect from their databases, namely transactions and joins.