logo
4

CAP Theorem in System Design

A clear, practical explanation of the CAP Theorem and its implications for building distributed systems.

The CAP Theorem is a foundational concept in distributed systems design, often discussed in system design interviews and real-world architecture decisions. It explains why distributed systems must make trade-offs between key properties when failures occur.

What is the CAP Theorem?

In a distributed system, CAP stands for:

  • Consistency – Every read returns the most recent write or an error.
  • Availability – Every request receives a response (success or failure) without guaranteed freshness.
  • Partition Tolerance – The system continues to operate even when network partitions (failures in communication between nodes) occur.

The CAP Theorem states that in the presence of a network partition, a distributed system can only guarantee at most two of these three properties simultaneously. You cannot fully achieve all three at the same time.

Why CAP Matters

Distributed systems run across multiple machines or data centers. Network issues such as dropped messages or delayed communication are inevitable in real environments. The CAP Theorem shows that when such partitions happen, a system must choose:

  • Consistency: make sure data is correct everywhere, even if some parts stop responding.
  • Availability: always respond to client requests, even if some responses may be outdated.

This trade-off influences how systems behave under failure and how users experience data correctness and availability.


The Three Properties in Detail

Consistency

Consistency in CAP means:

After a write completes, all future reads will return that write from every node.

This is stronger than typical ACID consistency — it’s about global visibility of the latest data across every node. Strong consistency ensures correctness but can reduce availability when parts of the network fail.


Availability

Availability means the system will always respond to client requests, even if some parts of the system are unreachable.

However, an available response doesn’t guarantee that the data is up-to-date when partitions occur. Some nodes may return stale data.


Partition Tolerance

Partition tolerance is the system’s ability to continue functioning even when communication breaks between nodes due to network faults.

In practice, partition tolerance isn’t optional — network failures happen — so real-world distributed systems almost always have to tolerate partitions.


CAP Trade-Offs

When a network partition occurs, a system must compromise one property:

CP: Consistency + Partition Tolerance

  • Ensures correct, synchronized data across nodes.
  • May sacrifice availability by rejecting requests during partitions.

Example: Banking systems where data accuracy matters even if some requests are delayed.


AP: Availability + Partition Tolerance

  • Always responds to requests.
  • May serve stale or inconsistent data during partitions.

Example: Social media feeds where responsiveness matters more than immediate global consistency.


CA: Consistency + Availability

  • In theory, serves up-to-date data and responds to requests.
  • Not realistic in distributed systems with partitions, because partition tolerance can’t be ignored.