Skip to main content

Achieving global total order in distributed systems without Sequencers

A seminal question to answer in a distributed event driven "fault tolerant" system is how to recreate state should there be a need for fault tolerance and recovery. 

A typical and easy to rationalize approach is the assignment of a globally increasing sequence to the events flowing across the system. The intent is to bring a global "order" or "sequence" into the system.

Later on this global ordering will pave the way for a resilient yet distributed system. 

It will be:
  • predicable - replaying the events from message commit logs will always produce the same net effect.
  • reliable and fault tolerant - as long as there are commit logs maintained, faults recovery and late joining protocols will always have a way to catch up.
The question then turns to "who should assign the order or sequence"?

This post attempts to answer the 2nd question in an expository style (and will not delve into details).

......................................................................................................................................................................

In a distributed system, there are typically two main roles for processes: Senders and Destinations

Senders send the messages, destinations (receive and deliver) the messages. This is how the communication is achieved. 

In order to achieve global "ordering" or "sequencing", a 3rd role is needed i.e. the role of a "Sequencer".

It turns out that a "Sequencer" based ordering is not the only ordering scheme. As we go through the following sections, we'd see that "total ordering" can be achieved without a sequencer as well. 

Achieving global 'total order' with sequencers

In a "fixed" variant, the sequencer's role is fixed to a certain process in the system and the responsibility is not transferred to another process. Senders send the "un-sequenced" messages to the sequencer first which stamps them with a sequence number and publishes to the destinations.
 

In the "unicast" variant of "fixed" sequencer, senders unicasts the message to the sequencer which then stamps it with a sequence and broadcasts to destinations.


In the "BB broadcast broadcast" variant as shown above, sender broadcasts to all destinations including the sequencer. The sequencer then stamps and broadcasts again to all destinations. This incurs more messages flowing in the system but the trade-off is that this approach is more tolerant to the failure of sequencer.


The 3rd variant of "fixed" sequencer is where the sender requests for a sequence from the sequencer first and then broadcasts the message to all destinations. 

As we can see the 1st and 3rd variants suffer from single point failure of the sequencer process but incur less messages in the system.


'Moving sequencer' employs a different approach. The the role of sequencer becomes transferable as shown here. Multiple sequencers in the system distribute the load among themselves hence addressing the issue of 'single point failure' as was the case with 'fixed sequencers'.

In a 'moving sequencer' variant, the sender sends a message to the sequencers ring which circulates a token (carrying the sequence number) and all sequenced messages. Upon receipt of the token, a specific sequencer stamps the un-sequenced messages with the next sequence number, broadcasts such new messages to the destinations and passes the token to next sequencer in the ring. 

In this way, all sequencers always have the latest history of sequenced messages thus addressing the failure issue.

However a trade off with 'moving sequencer' approach is its complexity which makes 'fixed' variant a preferred choice.

Achieving global 'total order' without sequencers

In a system, where sequencers are absent, total ordering can still be achieved.

In a 'privilege' based system as shown below, the idea is that the senders can only broadcast a message when they have the privilege to do so.

That privilege manifests in the form of a token which is circulated among the senders. Only the currently owning sender can broadcast sequenced messages to the destinations. If another sender wishes to publish, it must wait for the token ownership. The token carries the same information as was in the case of 'moving sequencer'. In this way all senders always have the history of so far sequenced messages thus bringing total order in the system.


The trade off with such system is the induced wait for token ownership before an aspiring sender can publis
h. This approach is different from 'moving sequencer' because there is no central role of sequencers and senders are responsible for broadcast. The liveness of the senders is managed by time quanta so that all senders have an equal chance. 

Another prominent scheme to achieve global total order without sequencers is based on agreement among destination processes on how the messages must be delivered. This is the space where 'consensus algorithms' exist.


Such protocols strive to arrive at a consensus in the form of a number or id which will be used as an additional trait to stamp the messages during that round of consensus. 

Each round of consensus comes up with a consensus number K, and next round's number K' such that 

K < K' 

Thus all the messages stamped with K will be delivered before any messages stamped with K'. Hence achieving the effect of global total order. 

Popular variations of consensus exist as Raft, Paxos, ZAB, ViewStamped Replication.

It turns out that contemporary distributed systems use consensus to achieve net effects of global total order as we see here:

                                                           
In this post, we discussed the problem of "global sequencing" in a distributed system and the approaches which eventually lead to atomic broadcast and consensus protocols. For further information, here are some references:

  • ZAB: http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf
  • Raft: https://raft.github.io/
  • Paxos: https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf
  • Atomic Broadcast: https://en.wikipedia.org/wiki/Atomic_broadcast










Comments

Popular posts from this blog

C++ logging using Apache Log4cxx on Linux

I'd demonstrate the set up of Apache log4cxx on Ubuntu 12.x in following steps: 1. Download all required packages 2. Build Log4cxx via Apache Maven 3. Use the libraries in a NetBeans C++ project on Ubuntu Linux. Reference URL: http://www.yolinux.com/TUTORIALS/Log4cxx.html This URL is quite detailed and explains things for other *nix operating systems as well. I wanted to start with minimum steps required, hence this post. I have a Windows 7 desktop and have Ubuntu 12.x 64-bit running on it via Oracle Virtualbox. So Ubuntu is running as a guest on my Windows 7 host. [The reference URL mentions different versions of  'libaprXX' libs but we have to use 'libapr1-dev' and 'libaprutil-dev', will see that later.] Prerequisites (install all of the following one by one) autoconf and automake --> sudo apt-get install autoconf automake libxml2 and libxml2-devel ( http://xmlsoft.org/ ) --> sudo apt-get install libxml2 libxml2-dev gmp ( ...

C++11 std::thread Construction and Assignment Tricks

C++11 arrived with std::thread and related classes built within the standard. With the move semantics in place, the construction and assignment practices of 'std threads' deserve a small write up. Let's define a 'Runnable' first: class Runnable {    public:       virtual void operator ()() = 0;       virtual ~Runnable(){} }; class Task : Runnable {      public:          void operator ()()          {               //do sth here..          } };                                           ...

C++ Initialization Subtleties (significance of -Werror compiler option)

Modern C++ continuous integration build pipelines might produce huge logs which may be easily overlooked. Among other errors/warnings, a potential risk caused by invalid narrowing assignments might be lurking in those dark corners... This write up is a little reminder about an essential feature in modern C++ compilers and can help defend against that specific problem. Prior to C++11, the following was a valid assignment and still is even in C++20 or higher ... unsigned int unsignedSinceBirth = 10; unsignedSinceBirth = -10;  // assigning a negative value to an unsigned container printUnsignedVal(unsignedSinceBirth); which when compiled using these options "g++ -std=c++20 <cpp files> -o <executable-name>" does not even emit a warning. And an executable is generated successfully. The output of running that code is an arbitrary unexpected value. Modern C++ (post Cpp11) allow uniform initialization syntax which can help the compiler detect this situation as follows: ...