Why ZeroMQ?

This is an excerpt from Pieter Hintjens’ book ZeroMQ:

Many applications these days consist of components that stretch across some kind of network, either a LAN or the Internet. So, many application developers end up doing some kind of messaging. Some developers use message queuing products, but most of the time they do it themselves, using TCP or UDP. These protocols are not hard to use, but there is a great difference between sending a few bytes from A to B and doing messaging in any kind of reliable way. Let’s look at the typical questions we face when we start to connect pieces using raw TCP. Any reusable messaging layer would need to address all or most of these:

  • How do we handle I/O? Does our application block, or do we handle I/O in the background? This is a key design decision. Blocking I/O creates architectures that do not scale well, but background I/O can be very hard to do right.
  • How do we handle dynamic components (i.e., pieces that go away temporarily)? Do we formally split components into “clients” and “servers” and mandate that servers cannot disappear? What, then, if we want to connect servers to servers? Do we try to reconnect every few seconds?
  • How do we represent a message on the wire? How do we frame data so it’s easy to write and read, safe from buffer overflows, and efficient for small messages, yet adequate for the very largest videos of dancing cats wearing party hats?
  • How do we handle messages that we can’t deliver immediately? Particularly if we’re waiting for a component to come back online? Do we discard messages, put them into a database, or put them into a memory queue?
  • Where do we store message queues? What happens if the component reading from a queue is very slow and causes our queues to build up? What’s our strategy then?
  • How do we handle lost messages? Do we wait for fresh data, request a resend, or do we build some kind of reliability layer that ensures messages cannot be lost? What if that layer itself crashes?
  • What if we need to use a different network transport? Say, multicast instead of TCP unicast? Or IPv6? Do we need to rewrite the applications, or is the transport abstracted in some layer?
  • How do we route messages? Can we send the same message to multiple peers? Can we send replies back to an original requester?
  • How do we write an API for another language? Do we reimplement a wire-level protocol, or do we repackage a library? If the former, how can we guarantee efficient and stable stacks? If the latter, how can we guarantee interoperability?
  • How do we represent data so that it can be read between different architectures? Do we enforce a particular encoding for data types? To what extent is this the job of the messaging system rather than a higher layer?
  • How do we handle network errors? Do we wait and retry, ignore them silently, or abort?

Take a typical open source project like Apache ZooKeeper. Read the C API code in src/c/src/zookeeper.c. When I read this code in 2010, it was 3,200 lines of mystery, and in there is an undocumented client/server network communication protocol. I see it’s efficient because it uses poll() instead of select() . But really, ZooKeeper should be using a generic messaging layer and an explicitly documented wire-level protocol. It is incredibly wasteful for teams to be building this particular wheel over and over. But how do we make a reusable messaging layer? Why, when so many projects need this technology, are people still doing it the hard way by driving TCP sockets in their code, and solving the problems in that long list over and over?

It turns out that building reusable messaging systems is really difficult, which is why few free and open source (FOSS) projects ever tried, and why commercial messaging products are complex, expensive, inflexible, and brittle. In 2006, iMatix designed the Advanced Message Queuing Protocol, or AMQP, which started to give FOSS developers perhaps the first reusable recipe for a messaging system. AMQP works better than many other designs, but remains relatively complex, expensive, and brittle. It takes weeks to learn to use it, and months to create stable architectures that don’t crash when things get hairy.

Most messaging projects (like AMQP) that try to solve this long list of problems in a reusable way do so by inventing a new concept, the “broker,” that does addressing, routing, and queuing. This results in a client/server protocol or a set of APIs on top of some undocumented protocol that allows applications to speak to this broker. Brokers are an excellent thing in reducing the complexity of large networks. But adding broker-based messaging to a product like ZooKeeper would make it worse, not better. It would mean adding an additional big box, and a new single point of failure. A broker rapidly becomes a bottleneck and a new risk to manage. If the software supports it, we can add a second, third, and fourth broker and make some failover scheme. People do this. However, it creates more moving pieces, more complexity, more things to break.

Also, a broker-centric setup needs its own operations team. You literally need to watch the brokers day and night, and beat them with a stick when they start misbehaving. You need boxes, and you need backup boxes, and you need people to manage those boxes. It is only worth doing for large applications with many moving pieces, built by several teams of people over several years.

So, small to medium application developers are trapped. Either they avoid network programming and make monolithic applications that do not scale, or they jump into network programming and make brittle, complex applications that are hard to maintain. Or they bet on a messaging product, and end up with scalable applications that depend on expensive, easily broken technology. There has been no really good choice, which
may be why messaging is largely stuck in the last century and stirs strong emotions—negative ones for users, gleeful joy for those selling support and licenses.

What we need is something that does the job of messaging, but does it in such a simple and cheap way that it can work in any application, with close to zero cost. It should be a library with which you link without any other dependencies. No additional moving pieces, so no additional risk. It should run on any OS and work with any programming language.

And this is ØMQ: an efficient, embeddable library that solves most of the problems an application needs to become nicely elastic across a network, without much cost.

Specifically:

  • It handles I/O asynchronously, in background threads. These communicate with application threads using lock-free data structures, so concurrent ØMQ applications need no locks, semaphores, or other wait states.
  • Components can come and go dynamically, and ØMQ will automatically reconnect. This means you can start components in any order. You can create “service-oriented architectures” (SOAs) where services can join and leave the network at any time.
  • It queues messages automatically when needed. It does this intelligently, pushing messages as close as possible to the receiver before queuing them.
  • It has ways of dealing with over-full queues (called the “high-water mark”). When a queue is full, ØMQ automatically blocks senders, or throws away messages, depending on the kind of messaging you are doing (the so-called “pattern”).
  • It lets your applications talk to each other over arbitrary transports: TCP, multicast, in-process, inter-process. You don’t need to change your code to use a different transport.
  • It handles slow/blocked readers safely, using different strategies that depend on the messaging pattern.
  • It lets you route messages using a variety of patterns, such as request-reply and publish-subscribe. These patterns are how you create the topology, the structure of your network.
  • It lets you create proxies to queue, forward, or capture messages with a single call. Proxies can reduce the interconnection complexity of a network.
  • It delivers whole messages exactly as they were sent, using a simple framing on the wire. If you write a 10KB message, you will receive a 10KB message.
  • It does not impose any format on messages. They are blobs of zero bytes to gigabytes large. When you want to represent data you choose some other product on top, such as Google’s protocol buffers, XDR, and others.
  • It handles network errors intelligently. Sometimes it retries, sometimes it tells you an operation failed.
  • It reduces your carbon footprint. Doing more with less CPU means your boxes use less power, and you can keep your old boxes in use for longer. Al Gore would love ØMQ.
Advertisements

Subjectivity aside, leave a reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s