The Legend of Cassandra

Image: The Guardian

According to Greek legend, Cassandra was a prophet. She foresaw terrible, tragic futures — but as if that wasn’t enough, she was also cursed to never be believed.

I had always wondered why on earth Apache Cassandra was named after a prophet who was never believed. Doesn’t give you the greatest confidence in their software, no?

But yesterday, I learned a ton about Cassandra from a knowledgeable coworker — and now, the reference to Greek legend makes sense.

What is Cassandra?

Cassandra is a distributed data management system built for handling large volumes of I/O with no single point of failure. With Cassandra, your database is just a file system, with files spread out over a number of nodes that are arranged in a cluster.

Image: PlanetCassandra.org

How is it different from MySQL?

Relational databases like MySQL and Postgres use a binary tree structure to store information. Binary trees are great for reading information quickly — they’re a data structure that accommodates binary search, dividing predictable/sortable information in half and search the new half for the desired row. However, adding new rows to a binary tree is kind of ugly. You add the new information as randomly-inserted leaf nodes in the tree and now your binary tree is out of order.

Cassandra solves this problem by getting rid of the binary tree and replacing it with a token ring. In a token ring, you start with a key (like “start time” or “id”) and generate a unique hashed key, an unsigned Long with a value somewhere between -264 and 263. The ring is divided into chunks, each of which have a node assigned to it. For example, you might have a chunk of hashed keys assigned to Node 1, another chunk to Node 3, another chunk to Node 2, another chunk to Node 1, another chunk to Node 3, etc. The chunks are distributed in a somewhat random way to try to fairly balance the load — you wouldn’t want a bunch of traffic going to Node 1 and overwhelming it with requests.

But, just like the prophet from Greek legend, never believe Cassandra.

Can’t trust Cassandra

Any one node by itself can’t be trusted. In fact, the token ring doesn’t just assign one node to each chunk of hash keys, but a list of nodes in the order in which to try to write the data. For example: chunk 256 => [2,3,1]. This will try to write to Node 2 first, then 3, then 1. The data will be stored on multiple nodes eventually.

Every cluster has a replication factor, a value that determines how many nodes to repeat the data on. For example, if our 3-node cluster has a replication factor (RF) of 3, then data will be stored on all 3 of the nodes. My coworker strongly believes that 3 is the minimum sensible RF — because if any of your nodes ever goes down, then you’d rely on a backup node, and you’d need 1 more node to double-check to verify the data is good. (3 nodes - 1 node = 2 nodes. 1 node for requesting the data, 1 to double-check that the first one is right.)

Image: HakkaLabs

Scalable system

One thing that’s amazing about Cassandra is how easy it is to add a new node and therefore add capacity to a cluster. You add the provisioned server to the cluster by simply starting Cassandra on the server (it’s a Java app), then adding its IP address to the list of known nodes. The Cassandra node gets the schema information from the “seed” nodes (the seed nodes are just the nodes, listed by IP address, found in the Cassandra config under “seeds”), the token ring breaks and re-forms to include the new node, and the new node begins streaming data from older nodes. Once it’s joined the cluster and owns all the information it’s expected to own, then it’s ready to receive traffic, and you’ve just added a terabyte of available storage space to your database.

There are no masters, there are no replicas. Just nodes.

It’s a cool system. The co-op nerd side of me loves the hierarchy-free model based on negotiation, shared ownership, and consent among nodes.

However, there are some serious drawbacks to using Cassandra.

What’s wrong with Cassandra?

The biggest downside of Cassandra that has come up so far in conversation: it was designed to solve the problem of slow writes. When MySQL or Postgres write something new to a binary tree, that information gets spread out over the binary tree – it’s not all in one place. Cassandra, on the other hand, writes new information all together. This is convenient when your model of storage is a spinning hard disk drive, where you’d want all the writes to be together.

But with the advent of solid-state drives, this problem might feel a little irrelevant. And given that Cassandra requires a lot of specialized knowledge outside the comfort zone of people who understand SQL really well, there might not be enough of an incentive to make the switch – or to even stop using Cassandra after having started.


Sources: