System Design

What is the CAP Theorem? (And Why It Matters)

Learn the basics of the CAP Theorem in distributed systems and why you can only choose two out of Consistency, Availability, and Partition Tolerance.

What is the CAP Theorem? (And Why It Matters)

CAP Theorem

The CAP Theorem states that, in a distrubuted system (a collection of interconnected nodes that share data.), you can only have two out of the following t hree guarantees accross a write/read pair:

  1. Consistancy
  2. Availability
  3. Partition Tolerance

One of the must be sacrified.

Img
  • Consistency - A read is guaranteed to return the most recent write for a given client.
  • Availabillity - A non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout)
  • Partition Tolerance - The system will continue to function when network partitions occur.

Why?

Object Oriented Programming != Network Programming

There are assumptions that we take for granted when building applications that share memory, which break down as soon as nodes are split across space and time.

One such fallacy of distributed computing is that networks are reliable. Network and parts of networks go down frequently and unexpectedly. Network failures happen to your system and you don't get choose when they occur.

Given that networks aren't completely reliable, you most tolerate partitions in a distributed system, period. Fortunately, though, you get to choose what to do when a partition does occur. According to the CAP theorem, this means we are left with two options Consistency and Availability

  • CP - Consistency/Partition Tolerance - Wait for a response from the partitioned node which could result in a timeout error. The system can also choose to return an error, depending on the scenario you desire. Choose Consistency over Availability when your business requirements dictate atomic reads and writes.

  • AP - Availability/Partition Tolerance - Return the most recent version of the data you have, which could be stale. This system state will also accept writes that can be processed later when the partition is resolved. Choose Availability over Consistency when your business requirements allow for some flexibility around when the data in the system synchronizes. Availability is also a compelling option when the system needs t o continue to function in spite of external errors (shopping carts, ect..)

The decision between Consistency and Availability is a software trade off. You can choose what to do in the face of a network partition - the control is in your hands. Network outages, both temporary and permanent, are a fact of life and occur whether you want them to or not - this exists outside of your softwrae.

Falaccy of distributed computing is:

  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology doesn't change
  6. There is one administrator
  7. Transport cost is zero
  8. The network is homogeneous