A seminal question to answer in a distributed event driven "fault tolerant" system is how to recreate state should there be a need for fault tolerance and recovery.
A typical and easy to rationalize approach is the assignment of a globally increasing sequence to the events flowing across the system. The intent is to bring a global "order" or "sequence" into the system.
Later on this global ordering will pave the way for a resilient yet distributed system.
It will be:
- predicable - replaying the events from message commit logs will always produce the same net effect.
- reliable and fault tolerant - as long as there are commit logs maintained, faults recovery and late joining protocols will always have a way to catch up.
This post attempts to answer the 2nd question in an expository style (and will not delve into details).
......................................................................................................................................................................
In a distributed system, there are typically two main roles for processes: Senders and Destinations
Senders send the messages, destinations (receive and deliver) the messages. This is how the communication is achieved.
In order to achieve global "ordering" or "sequencing", a 3rd role is needed i.e. the role of a "Sequencer".
It turns out that a "Sequencer" based ordering is not the only ordering scheme. As we go through the following sections, we'd see that "total ordering" can be achieved without a sequencer as well.
Achieving global 'total order' with sequencers
In a "fixed" variant, the sequencer's role is fixed to a certain process in the system and the responsibility is not transferred to another process. Senders send the "un-sequenced" messages to the sequencer first which stamps them with a sequence number and publishes to the destinations.
In the "unicast" variant of "fixed" sequencer, senders unicasts the message to the sequencer which then stamps it with a sequence and broadcasts to destinations.
In the "BB broadcast broadcast" variant as shown above, sender broadcasts to all destinations including the sequencer. The sequencer then stamps and broadcasts again to all destinations. This incurs more messages flowing in the system but the trade-off is that this approach is more tolerant to the failure of sequencer.
The 3rd variant of "fixed" sequencer is where the sender requests for a sequence from the sequencer first and then broadcasts the message to all destinations.
As we can see the 1st and 3rd variants suffer from single point failure of the sequencer process but incur less messages in the system.
'Moving sequencer' employs a different approach. The the role of sequencer becomes transferable as shown here. Multiple sequencers in the system distribute the load among themselves hence addressing the issue of 'single point failure' as was the case with 'fixed sequencers'.
In a 'moving sequencer' variant, the sender sends a message to the sequencers ring which circulates a token (carrying the sequence number) and all sequenced messages. Upon receipt of the token, a specific sequencer stamps the un-sequenced messages with the next sequence number, broadcasts such new messages to the destinations and passes the token to next sequencer in the ring.
In this way, all sequencers always have the latest history of sequenced messages thus addressing the failure issue.
However a trade off with 'moving sequencer' approach is its complexity which makes 'fixed' variant a preferred choice.
Achieving global 'total order' without sequencers
In a system, where sequencers are absent, total ordering can still be achieved.
In a 'privilege' based system as shown below, the idea is that the senders can only broadcast a message when they have the privilege to do so.
That privilege manifests in the form of a token which is circulated among the senders. Only the currently owning sender can broadcast sequenced messages to the destinations. If another sender wishes to publish, it must wait for the token ownership. The token carries the same information as was in the case of 'moving sequencer'. In this way all senders always have the history of so far sequenced messages thus bringing total order in the system.
Another prominent scheme to achieve global total order without sequencers is based on agreement among destination processes on how the messages must be delivered. This is the space where 'consensus algorithms' exist.
Such protocols strive to arrive at a consensus in the form of a number or id which will be used as an additional trait to stamp the messages during that round of consensus.
Each round of consensus comes up with a consensus number K, and next round's number K' such that
K < K'
Thus all the messages stamped with K will be delivered before any messages stamped with K'. Hence achieving the effect of global total order.
Popular variations of consensus exist as Raft, Paxos, ZAB, ViewStamped Replication.
It turns out that contemporary distributed systems use consensus to achieve net effects of global total order as we see here:
In this post, we discussed the problem of "global sequencing" in a distributed system and the approaches which eventually lead to atomic broadcast and consensus protocols. For further information, here are some references:
- ZAB: http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf
- Raft: https://raft.github.io/
- Paxos: https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf
- Atomic Broadcast: https://en.wikipedia.org/wiki/Atomic_broadcast
Comments
Post a Comment