- Raft stands for Replicated and Fault Tolerant
- Raft is something you can build out of a collection of logs
- Raft is something you can use to get away from the island of Paxos
Goal: Replicated Log
- Replicated log => replicated state machine
- All servers execute same commands in same order
- Consensus module ensures proper log replication
- System makes progress as long as any majority of servers are up
- Failure model: fail-stop(not Byzantine), delayed/lost messages
Approaches to Consensus
Two possible approaches to consensus:
- Symmetric, leader-less:
- All servers have equal roles
- Clients can contact any server
- Asymmetric, leader-based:
- At any given time, one server is in charge, others accept its decision
- Clients communicate with the leader
- Raft uses a leader:
- Decomposes the problem (normal operation, leader changes)
- Simplifies normal operation (no conflicts)
- More efficient than leader-less approaches
Leader Election
1 | +------------------+ |
- Safety: allow at most one winner per term
- Each server gives only one vote per term(persist on disk)
- Majority required to win election
1 | B can't also Voted for |
- Livebess: some candidate must eventually win
- Choose election timeout randomly in [T, 2T] (e.g. 150-300ms)
- One server usually times out and wins election before others time out
- Works well if T >> broadcast time
Randomized approach simpler than ranking
Normal Operation
- Client sends command to leader
- Leader appends command to its log
- Leader send AppendEntries RPCs to all followers
- Once new entry committed
- Leader executes command in its state machine, returns result to client
- Leader notifies followers of committed entries in subsequent AppendEntries RPCs
- Followers execute committed commands in their state machines
- Crashed/slow followers
- Leader retries AppendEntries RPCs until they succeed
- Optimal performance in common case:
- One successful RPC to any majority of servers
Log Structure
1 | type Command string |
- Must survive crashes (store on disk)
- Entry committed if safe to execute in state machines
- Replicated on majority of servers by leader of its term
Log Inconsistencies
Crashes can result in log inconsistencies
- Raft minimizes special code for repairing inconsistencies
- Leader assumes its log is correct
- Normal operation will repair all inconsistencies