TL;DR. In any distributed system, it is all about coordination and data consistency.
When studying distributed systems, you will eventually stumble into the debate of whether consistency needs consensus or not.
But as a distributed system is basically a collective unit of two distinct parts (distributed computing and distributed data), we have to be laser focused as to what component are we talking to when we refer to consistency.
This is the N-tier (or the business logic tier) when referring to the Three-Tier Architecture. This is the stateless tier as far as the entire system is concerned.
However, that is oversimplification.
You also need to account the management aspect of distributed computing like Kubernetes, Nomad and the like.
It can refer to the following:
- Web server farm (RPC, pull, bi-directional)
- Message queue (Messaging, push, uni-directional)
- Resource scheduling — Kubernetes, Nomad
- Orchestration — Habitat.sh
With distributed computing, the keyword is coordination.
Here, we are not talking about single-node database system. The keyword in distributed data is consistency.
- Distributed SQL-based — CockroachDB
- Distributed NoSQL-based — Cassandra and the like
A Tale of Two Camps
Once you get the breadth and depth of distributed systems, it will dawn on you that distributed systems basically fall into one of two camps.
It is all about how you achieve overall system reliability in the presence of a number of faulty processes.
The quote above is from Wikipedia definition of consensus.
However, just as computing is nothing without data (input/output) and processing, distributed system is the same.
Distributed system is an intricate dance between coordination and data consistency.
a) Centralized Coordination
In a centralized coordination system, the buck stops with the master controller. The master controller sees the overall state of the system with the help of a distributed data store.
Here, intelligence is centralized. It is the typical command-and-control system. The master controller orchestrates the entire ensemble and kicks you out if you are not performing. It alerts humans when something goes wrong.
However, the master controller is not invulnerable.
It too can fail.
That’s why there is leader election when the master controller itself dies.
This is the gist of Paxos-based protocols and its derivatives like Raft.
It is called consensus.
b) Decentralized Coordination
Here, intelligence is localized and distributed among master nodes in a scale-free network fashion. On large scale, it resembles the architecture of the Internet but on local networks, it is more of a gossip-based protocol.
On the Internet, if your ISP is wiped out all of a sudden, the Internet can get around that failure. The packets may still find its way to its destination using the tried-and-tested resiliency of TCP/IP protocols.
With decentralized coordination, distributed computing is based on gossip protocol.
With decentralized coordination, distributed data uses eventual consistency or CRDT (consistency without consensus).
Examples of tools using decentralized coordination:
- Weave Net
- Hashicorp Serf
Distributed systems, whether they exhibit centralized or decentralized coordination, aim to be self-organizing and self-healing. In other words, all aim to be autonomic systems.
Whether you subscribe to one or the other, you owe it to yourself to read more about it here. Maybe, it is not about either/or. It is more about knowing the WHY so you won’t be fooled by media hype and snake oil.