How to CrAP


  • John D. Rowell

There's a lot of CRAP being said about CAP



The buck stops here

CAP Definition

  • Wikipedia

CAP Definition

  • Wikipedia

CAP Definition

  • Wikipedia
  • The CAP theorem states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: Consistency, Availability, Partition Tolerance

Cargo Cult


Cargo Cult v1.0


Cargo Cult v1.0


Cargo Cult v1.0


Cargo Cult v1.0


Cargo Cult v2.0


Cargo Cult v2.0


Cargo Cult v2.0


Cargo Cult v2.0


Enterprise


FUQs: Frequently Unasked Questions


FUQs: Consistency

  • read-after-write?

FUQs: Consistency

  • read-after-write?
  • read-after-1s-of-write?

FUQs: Consistency

  • read-after-write?
  • read-after-1s-of-write?
  • read-after-10s-of-write?

FUQs: Consistency

  • read-after-write?
  • read-after-1s-of-write?
  • read-after-10s-of-write?
  • read-after-another-session-wrote?

FUQs: Consistency

  • read-after-write?
  • read-after-1s-of-write?
  • read-after-10s-of-write?
  • read-after-another-session-wrote?
  • distributed → latency → no strong consistency

FUQs: Consistency

  • read-after-write?
  • read-after-1s-of-write?
  • read-after-10s-of-write?
  • read-after-another-session-wrote?
  • distributed → latency → no strong consistency
  • eventual consistency: converge to a consistent state, usually in order of ms, entropy reduction

FUQs: Availability

  • read-and-write?

FUQs: Availability

  • read-and-write?
  • always-respond?

FUQs: Availability

  • read-and-write?
  • always-respond?
  • authoritativeness-of-response?

FUQs: Availability

  • read-and-write?
  • always-respond?
  • authoritativeness-of-response?
  • location-awareness?

FUQs: Availability

  • read-and-write?
  • always-respond?
  • authoritativeness-of-response?
  • location-awareness?
  • arbitration?

FUQs: Availability

  • read-and-write?
  • always-respond?
  • authoritativeness-of-response?
  • location-awareness?
  • arbitration?
  • handover?

FUQs: Partition Tolerance

  • read-on-both-sides?

FUQs: Partition Tolerance

  • read-on-both-sides?
  • write-on-both-sides?

FUQs: Partition Tolerance

  • read-on-both-sides?
  • write-on-both-sides?
  • conflict resolution?

FUQs: Partition Tolerance

  • read-on-both-sides?
  • write-on-both-sides?
  • conflict resolution?
  • read-and-write-on-n-partitions?

FUQs: Partition Tolerance

  • read-on-both-sides?
  • write-on-both-sides?
  • conflict resolution?
  • read-and-write-on-n-partitions?
  • data-on-other-side?

FUQs: Partition Tolerance

  • read-on-both-sides?
  • write-on-both-sides?
  • conflict resolution?
  • read-and-write-on-n-partitions?
  • data-on-other-side?
  • partition reporting?

FUQs: Partition Tolerance

  • read-on-both-sides?
  • write-on-both-sides?
  • conflict resolution?
  • read-and-write-on-n-partitions?
  • data-on-other-side?
  • partition reporting?
  • failure-during-partition?

Quiz - Is Redis CA, CP or AP?


Quiz - Is Redis CA, CP or AP?

  • Master fails → No automatic write failover → No "A"

Quiz - Is Redis CA, CP or AP?

  • Master fails → No automatic write failover → No "A"
  • Partition → endpoint unreachable → No "P"

Quiz - Is Redis CA, CP or AP?

  • Master fails → No automatic write failover → No "A"
  • Partition → endpoint unreachable → No "P"
  • No strong persistence (default setup) → No "C"

Quiz - Is Redis CA, CP or AP?

  • Master fails → No automatic write failover → No "A"
  • Partition → endpoint unreachable → No "P"
  • No strong persistence (default setup) → No "C"
  • Doesn't have any CAP attributes but is still extremely useful

Quiz - Redis with master failover and sync writes


Quiz - Redis with master failover and sync writes

  • Master fails → Automatic write failover → BUT the clients doesn't have cluster awareness → No "A"

Quiz - Redis with master failover and sync writes

  • Master fails → Automatic write failover → BUT the clients doesn't have cluster awareness → No "A"
  • Partition → Automatic slave promotion → BUT the clients doesn't have cluster awareness → No "P"

Quiz - Redis with master failover and sync writes

  • Master fails → Automatic write failover → BUT the clients doesn't have cluster awareness → No "A"
  • Partition → Automatic slave promotion → BUT the clients doesn't have cluster awareness → No "P"
  • Strong persistence and durability → Has "C" (yay! terrible performance :P)

Quiz - Redis with master failover and sync writes

  • Master fails → Automatic write failover → BUT the clients doesn't have cluster awareness → No "A"
  • Partition → Automatic slave promotion → BUT the clients doesn't have cluster awareness → No "P"
  • Strong persistence and durability → Has "C" (yay! terrible performance :P)
  • Slower writes than master/slave or master/master RDBMS, less features

Quiz - Is MongoDB master/slave CA, CP or AP?



Quiz - Is MongoDB master/slave CA, CP or AP?

  • Master fails → No automatic write failover → No "A"

Quiz - Is MongoDB master/slave CA, CP or AP?

  • Master fails → No automatic write failover → No "A"
  • Partition → Host unreachable → No "P"

Quiz - Is MongoDB master/slave CA, CP or AP?

  • Master fails → No automatic write failover → No "A"
  • Partition → Host unreachable → No "P"
  • No single server durability → No "C"

Quiz - Is MongoDB master/slave CA, CP or AP?

  • Master fails → No automatic write failover → No "A"
  • Partition → Host unreachable → No "P"
  • No single server durability → No "C"
  • Lack of single server durability makes this setup a #devops nightmare

Quiz - Are MongoDB replica sets CA, CP or AP?



Quiz - Are MongoDB replica sets CA, CP or AP?

  • Master fails → Automatic failover → Has "A" (yay!)

Quiz - Are MongoDB replica sets CA, CP or AP?

  • Master fails → Automatic failover → Has "A" (yay!)
  • Partition → Automatic promotion → Has "P" (yay!)

Quiz - Are MongoDB replica sets CA, CP or AP?

  • Master fails → Automatic failover → Has "A" (yay!)
  • Partition → Automatic promotion → Has "P" (yay!)
  • No consistency during/after partition → *discards conflicting data* → No "C"

Quiz - Are MongoDB replica sets CA, CP or AP?

  • Master fails → Automatic failover → Has "A" (yay!)
  • Partition → Automatic promotion → Has "P" (yay!)
  • No consistency during/after partition → *discards conflicting data* → No "C"
  • Lack of single server durability means that any local glitch requires a full sync to master

Quiz - Is Riak CA, CP or AP?




Quiz - Is Riak CA, CP or AP?

  • No master, N copies of data → Has "A" (yay!)

Quiz - Is Riak CA, CP or AP?

  • No master, N copies of data → Has "A" (yay!)
  • Consistent hashing → Automatic handover → Has "P" (survives most partitions, yay!)

Quiz - Is Riak CA, CP or AP?

  • No master, N copies of data → Has "A" (yay!)
  • Consistent hashing → Automatic handover → Has "P" (survives most partitions, yay!)
  • No strong consistency but does converge by using vector clocks, some application level resolution still required (or use most recent) → No "C"

Quiz - Is Riak CA, CP or AP?

  • No master, N copies of data → Has "A" (yay!)
  • Consistent hashing → Automatic handover → Has "P" (survives most partitions, yay!)
  • No strong consistency but does converge by using vector clocks, some application level resolution still required (or use most recent) → No "C"
  • Riak is indeed AP and very close to C. The tradeoffs are performance and ease of use (no advanced data type/document support).

Play on words

  • Consistency means that all clients have the same view of the data, but if data was lost it's consistently crappy

Play on words

  • Consistency means that all clients have the same view of the data, but if data was lost it's consistently crappy
  • Availability means that a node failure must not hamper others, but if the other option is to risk data corruption during a partition, it's better to be unavailable

Play on words

  • Consistency means that all clients have the same view of the data, but if data was lost it's consistently crappy
  • Availability means that a node failure must not hamper others, but if the other option is to risk data corruption during a partition, it's better to be unavailable
  • Partition Tolerance means the system continues to operate despite message loss, but to remain consistent your must degrade your service for some clients

Turns out it's not as simple as
C - A - P

  • You must plan for and implement different levels for each of C, A and P

Turns out it's not as simple as
C - A - P

  • You must plan for and implement different levels for each of C, A and P
  • These levels depend on the database being used, the configuration of that database, the client library and language, AND your application logic

Turns out it's not as simple as
C - A - P

  • You must plan for and implement different levels for each of C, A and P
  • These levels depend on the database being used, the configuration of that database, the client library and language, AND your application logic
  • On polyglot systems you have the above variables ^n

But wait, there's tunables too!

But wait, there's tunables too!


N, W, R - Consistency

  • R + W > N

N, W, R - Consistency

  • R + W > N

N, W, R - Availability

  • Default quorum is (N/2)+1

N, W, R - Availability

  • Default quorum is (N/2)+1
  • Decrease W for greater write availability
    (eg. N=3, W=1, R=3)

N, W, R - Availability

  • Default quorum is (N/2)+1
  • Decrease W for greater write availability
    (eg. N=3, W=1, R=3)
  • Decrease R for greater read availability
    (eg. N=3, W=3, R=1)

N, W, R - Availability

  • Default quorum is (N/2)+1
  • Decrease W for greater write availability
    (eg. N=3, W=1, R=3)
  • Decrease R for greater read availability
    (eg. N=3, W=3, R=1)
  • Still (eventually) consistent, but guaranteeing different kinds of availability

DW - Durability

  • DW <= W, number of nodes that must complete durable writes before returning success

DW - Durability

  • DW <= W, number of nodes that must complete durable writes before returning success
  • N + R > W and DW = W is already half of ACID

DW - Durability

  • DW <= W, number of nodes that must complete durable writes before returning success
  • N + R > W and DW = W is already half of ACID
  • Atomicity can be had by using data sets and documents

DW - Durability

  • DW <= W, number of nodes that must complete durable writes before returning success
  • N + R > W and DW = W is already half of ACID
  • Atomicity can be had by using data sets and documents
  • Isolation can be pushed to the client - nobody wants to see you trans-acting

Obligatory reference


Conclusion


CAP's merit is not in its mathematical proof or the factors it uses to analyze distributed systems. The important contribution is to the discussion of distributed system design and limitations. The best lesson to take from CAP is to try to come up with a better method for analyzing these systems, based on your own requirements and experience.

Ktksbai


  • John D. Rowell