Skip to main content

Achieving global total order in distributed systems without Sequencers

A seminal question to answer in a distributed event driven "fault tolerant" system is how to recreate state should there be a need for fault tolerance and recovery. 

A typical and easy to rationalize approach is the assignment of a globally increasing sequence to the events flowing across the system. The intent is to bring a global "order" or "sequence" into the system.

Later on this global ordering will pave the way for a resilient yet distributed system. 

It will be:
  • predicable - replaying the events from message commit logs will always produce the same net effect.
  • reliable and fault tolerant - as long as there are commit logs maintained, faults recovery and late joining protocols will always have a way to catch up.
The question then turns to "who should assign the order or sequence"?

This post attempts to answer the 2nd question in an expository style (and will not delve into details).

......................................................................................................................................................................

In a distributed system, there are typically two main roles for processes: Senders and Destinations

Senders send the messages, destinations (receive and deliver) the messages. This is how the communication is achieved. 

In order to achieve global "ordering" or "sequencing", a 3rd role is needed i.e. the role of a "Sequencer".

It turns out that a "Sequencer" based ordering is not the only ordering scheme. As we go through the following sections, we'd see that "total ordering" can be achieved without a sequencer as well. 

Achieving global 'total order' with sequencers

In a "fixed" variant, the sequencer's role is fixed to a certain process in the system and the responsibility is not transferred to another process. Senders send the "un-sequenced" messages to the sequencer first which stamps them with a sequence number and publishes to the destinations.
 

In the "unicast" variant of "fixed" sequencer, senders unicasts the message to the sequencer which then stamps it with a sequence and broadcasts to destinations.


In the "BB broadcast broadcast" variant as shown above, sender broadcasts to all destinations including the sequencer. The sequencer then stamps and broadcasts again to all destinations. This incurs more messages flowing in the system but the trade-off is that this approach is more tolerant to the failure of sequencer.


The 3rd variant of "fixed" sequencer is where the sender requests for a sequence from the sequencer first and then broadcasts the message to all destinations. 

As we can see the 1st and 3rd variants suffer from single point failure of the sequencer process but incur less messages in the system.


'Moving sequencer' employs a different approach. The the role of sequencer becomes transferable as shown here. Multiple sequencers in the system distribute the load among themselves hence addressing the issue of 'single point failure' as was the case with 'fixed sequencers'.

In a 'moving sequencer' variant, the sender sends a message to the sequencers ring which circulates a token (carrying the sequence number) and all sequenced messages. Upon receipt of the token, a specific sequencer stamps the un-sequenced messages with the next sequence number, broadcasts such new messages to the destinations and passes the token to next sequencer in the ring. 

In this way, all sequencers always have the latest history of sequenced messages thus addressing the failure issue.

However a trade off with 'moving sequencer' approach is its complexity which makes 'fixed' variant a preferred choice.

Achieving global 'total order' without sequencers

In a system, where sequencers are absent, total ordering can still be achieved.

In a 'privilege' based system as shown below, the idea is that the senders can only broadcast a message when they have the privilege to do so.

That privilege manifests in the form of a token which is circulated among the senders. Only the currently owning sender can broadcast sequenced messages to the destinations. If another sender wishes to publish, it must wait for the token ownership. The token carries the same information as was in the case of 'moving sequencer'. In this way all senders always have the history of so far sequenced messages thus bringing total order in the system.


The trade off with such system is the induced wait for token ownership before an aspiring sender can publis
h. This approach is different from 'moving sequencer' because there is no central role of sequencers and senders are responsible for broadcast. The liveness of the senders is managed by time quanta so that all senders have an equal chance. 

Another prominent scheme to achieve global total order without sequencers is based on agreement among destination processes on how the messages must be delivered. This is the space where 'consensus algorithms' exist.


Such protocols strive to arrive at a consensus in the form of a number or id which will be used as an additional trait to stamp the messages during that round of consensus. 

Each round of consensus comes up with a consensus number K, and next round's number K' such that 

K < K' 

Thus all the messages stamped with K will be delivered before any messages stamped with K'. Hence achieving the effect of global total order. 

Popular variations of consensus exist as Raft, Paxos, ZAB, ViewStamped Replication.

It turns out that contemporary distributed systems use consensus to achieve net effects of global total order as we see here:

                                                           
In this post, we discussed the problem of "global sequencing" in a distributed system and the approaches which eventually lead to atomic broadcast and consensus protocols. For further information, here are some references:

  • ZAB: http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf
  • Raft: https://raft.github.io/
  • Paxos: https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf
  • Atomic Broadcast: https://en.wikipedia.org/wiki/Atomic_broadcast










Comments

Popular posts from this blog

C++11 std::thread Construction and Assignment Tricks

C++11 arrived with std::thread and related classes built within the standard. With the move semantics in place, the construction and assignment practices of 'std threads' deserve a small write up. Let's define a 'Runnable' first: class Runnable {    public:       virtual void operator ()() = 0;       virtual ~Runnable(){} }; class Task : Runnable {      public:          void operator ()()          {               //do sth here..          } };                                                 //later..      #include <thread>     Task runnable;     std::thread t1; //not attached to any thread                                                 std::thread t2 ( runnable ); //attached and running   Copy Construction of std::thread is not allowed            std::thread t3 ( t2 ); // compilation error            std::thread t4 = t2; // compilation error error : use of deleted function std::thread (const std::th

C++ logging using Apache Log4cxx on Linux

I'd demonstrate the set up of Apache log4cxx on Ubuntu 12.x in following steps: 1. Download all required packages 2. Build Log4cxx via Apache Maven 3. Use the libraries in a NetBeans C++ project on Ubuntu Linux. Reference URL: http://www.yolinux.com/TUTORIALS/Log4cxx.html This URL is quite detailed and explains things for other *nix operating systems as well. I wanted to start with minimum steps required, hence this post. I have a Windows 7 desktop and have Ubuntu 12.x 64-bit running on it via Oracle Virtualbox. So Ubuntu is running as a guest on my Windows 7 host. [The reference URL mentions different versions of  'libaprXX' libs but we have to use 'libapr1-dev' and 'libaprutil-dev', will see that later.] Prerequisites (install all of the following one by one) autoconf and automake --> sudo apt-get install autoconf automake libxml2 and libxml2-devel ( http://xmlsoft.org/ ) --> sudo apt-get install libxml2 libxml2-dev gmp (

..Where Apache Camel meets Spring JMS

This post is relevant for developers who are aware of Apache Camel and use its JMS component to consume from a JMS broker. We are only discussing the message consumption part here. I'm assuming you are conversant with Camel's terminology of routes, endpoints, processors and EIPs. If not, following is a good place to start: Link --->  Apache Camel Home   Ever wondered what happens underneath when a Camel consumer route is defined to take messages from a JMS endpoint ? This post is an attempt to explain how Camel integrates with JMS via Spring APIs and how the business logic defined in a Camel route is invoked ? Following is a simplified sequence diagram showing the set-up process which will eventually lead to message consumption from JMS broker. . Creation of JMS endpoints in a camel context is initiated by definition (via Java/XML DSL). A JMS endpoint takes a number of configurable attributes as seen here Camel JMS Component . As soon as context is bootstrappe