Distributed systems fundamentals: CAP and consistency

Distributed systems fundamentals: CAP and consistency

13 min
distributed-systemsarchitectureconsistency

Building distributed systems means dealing with network partitions, consistency trade-offs, and failure modes. Explore CAP theorem, eventual consistency models, and consensus algorithms like Raft and Paxos.


Markup Showcase: Distributed Systems

Headings and narrative

In unreliable networks, partitions are normal. Systems choose trade‑offs between availability and consistency. Client experience often improves with monotonic reads and read‑your‑writes guarantees.

See the excellent overview of CAP and Martin Kleppmann’s writing on eventual consistency. Internal notes live in the architecture docs.

Lists

  • Partition tolerance is mandatory in distributed systems
  • Choose between stronger consistency vs higher availability
    • Session guarantees can improve UX
    • Use CRDTs for conflict‑free merges
  1. Detect leader failure
  2. Elect new leader (Raft)
  3. Resume log replication

Quote: “The network is reliable” — a famous fallacy of distributed computing.

Inline code

Client sessions can pin to a leaderId and fall back when heartbeat timeouts occur.

Code blocks

Pseudo‑Raft (very simplified):
onHeartbeatTimeout() {
  becomeCandidate();
  requestVotes();
  if (votes > quorum) becomeLeader();
}
# Simulate packet loss with tc on Linux
sudo tc qdisc add dev eth0 root netem loss 10% delay 50ms 10ms distribution normal
type LogEntry = { index: number; term: number; data: Uint8Array };

function append(entries: LogEntry[], next: LogEntry) {
  const last = entries.at(-1);
  if (last && last.index + 1 !== next.index) throw new Error("hole in log");
  return [...entries, next];
}

Alignment table

ModelConsistencyAvailabilityNotes
Single leaderStrong (R/W)mediumfailover required
Leader + followersStrong (R/W)highread replicas possible
Multi‑leaderEventualhighconflict resolution
Leaderless (quorum)Tunable (R/W)highchoose R/W quorums

Horizontal rule


Sub/sup and long URL

Replication factor Rf=3 is common; latency p95th should remain stable under node loss.

https://example.org/distributed/systems/research/papers/2025/long/path/with/many/segments/to/test/wrapping

Image

Network topology illustration