With NSQ, the messaging pipeline goes like this:
Message producer --> NSQ --> Message consumer
The message producer and consumer can be written in any programming language that has an NSQ client library.
Once a producer has posted a message to NSQ, it is the responsibility of NSQ to deliver that message at least once. Delivery to the consumer may fail (network split, server crash, etc.), so NSQ requeues the failed message in memory. If the message consumer keeps returning errors, NSQ slows down the rate at which it pushes messages to that consumer. This is called backpressure.
In short, at-least-once delivery and backpressure are responsibilities of the message queue server (in our case, NSQ).
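To make the requeue-and-slow-down behaviour concrete, here is a minimal sketch of a toy in-memory queue. It is not how NSQ is implemented: the `ToyQueue` class, its parameters, and the exponential delay formula are all assumptions made for illustration, and the delays are only recorded rather than actually waited for.

```python
from collections import deque


class ToyQueue:
    """Toy in-memory queue; illustrates the idea only, not NSQ's implementation."""

    def __init__(self, max_attempts=5, max_delay=60):
        self.pending = deque()
        self.max_attempts = max_attempts  # hypothetical retry limit
        self.max_delay = max_delay        # hypothetical cap on the delay, in seconds

    def publish(self, body):
        self.pending.append({"body": body, "attempts": 0})

    def deliver_all(self, handler):
        """Push each pending message to handler, requeueing it on failure.

        Returns the backoff delays (in seconds) that would be waited;
        this sketch records them instead of sleeping.
        """
        delays = []
        consecutive_failures = 0
        while self.pending:
            msg = self.pending.popleft()
            msg["attempts"] += 1
            try:
                handler(msg["body"])
                consecutive_failures = 0
            except Exception:
                consecutive_failures += 1
                # Slow down: the delay doubles with every consecutive failure.
                delays.append(min(2 ** consecutive_failures, self.max_delay))
                if msg["attempts"] < self.max_attempts:
                    self.pending.append(msg)  # requeue for another attempt
        return delays


attempts = []


def flaky(body):
    """Consumer that fails the first time it sees message "b"."""
    attempts.append(body)
    if body == "b" and attempts.count("b") == 1:
        raise RuntimeError("simulated consumer failure")


queue = ToyQueue()
queue.publish("a")
queue.publish("b")
delays = queue.deliver_all(flaky)
```

Note that the consumer sees message "b" twice. That is exactly what at-least-once delivery means, and it is why the consumer has to cope with duplicates.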
There is another issue, however: message loss.
To mitigate message loss, we can use persistence, with or without replication:
- persistence with nsqd itself (setting -mem-queue-size=0 persists all messages to disk, but that nsqd node is a single point of failure)
- persistence without replication (e.g. a single-node KV store like BoltDB or a single-node SQL database)
- persistence with replication (e.g. a distributed KV store like TokuMX or a replication-enabled SQL database). The message producer pushes the message to NSQ and then chooses whether to write it to the persistent store synchronously or asynchronously.
Note that since NSQ keeps messages in memory by default, it is up to you whether you want to add persistence. With NSQ, you are trading persistence for speed.
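As a rough sketch of the "persistence without replication" option, the producer below pushes a message to the queue and then writes it synchronously to a single-node SQL database (SQLite here). Everything in it is an assumption for illustration: the `messages` table, the `done` column, and the function names are made up, and `publish` is a placeholder callable standing in for whatever call actually sends the message to NSQ.

```python
import sqlite3
import uuid


def open_store(path=":memory:"):
    """Open (or create) the message table in a single-node SQLite database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id TEXT PRIMARY KEY, body TEXT NOT NULL, done INTEGER NOT NULL DEFAULT 0)"
    )
    db.commit()
    return db


def produce(db, body, publish):
    """Push a message to the queue, then write it to the store synchronously."""
    msg_id = str(uuid.uuid4())
    publish(msg_id, body)  # placeholder for the real publish call
    db.execute("INSERT INTO messages (id, body) VALUES (?, ?)", (msg_id, body))
    db.commit()
    return msg_id


def mark_done(db, msg_id):
    """Record that a message has been fully processed."""
    db.execute("UPDATE messages SET done = 1 WHERE id = ?", (msg_id,))
    db.commit()


def replay_pending(db, publish):
    """Re-publish every stored message that has not been marked done."""
    rows = db.execute("SELECT id, body FROM messages WHERE done = 0").fetchall()
    for msg_id, body in rows:
        publish(msg_id, body)
    return len(rows)
```

If the queue loses its in-memory messages, `replay_pending` can push the unfinished ones again. The asynchronous variant would hand the `INSERT` to a background worker instead of waiting for the commit, which is faster but leaves a window in which a crash loses the record.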
The message consumer processes the messages it receives from the message broker. The consumer is responsible for ensuring idempotency: since a message may be delivered more than once, it must be able to detect a duplicate and safely ignore it. Thus, the first task for a message consumer is deduplication (that is, detecting duplicate messages).
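There are several deduplication techniques:
- have the message producer attach a UUID/GUID to each message
- hash the message content and store the hash in groupcache, Redis, etc. (issue: unbounded memory)
- hash the message content and use the reverse of a Bloom filter (bounded memory; it may miss some duplicates but never reports a new message as a duplicate)
- the producer inserts the message with a unique ID, and the consumer updates that ID's record in a SQL database (or its key in a KV store)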
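The reverse-of-a-Bloom-filter technique in the list above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the class name, the slot count, and the choice of SHA-256 are assumptions. Each message hashes to one slot in a fixed-size array, and a message counts as a duplicate only if that slot already holds the same digest.

```python
import hashlib


class InverseBloomFilter:
    """Fixed-size duplicate detector: may forget messages, never invents duplicates."""

    def __init__(self, size=1024):
        self.slots = [None] * size  # memory stays bounded by `size`

    def seen_before(self, message: bytes) -> bool:
        digest = hashlib.sha256(message).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.slots)
        previous = self.slots[index]
        self.slots[index] = digest  # a newer message evicts whatever was there
        return previous == digest
```

With a small array, two different messages can land in the same slot and evict each other, so a later duplicate of the evicted message goes unnoticed. That is the price of bounded memory, and it means the processing step should still tolerate an occasional duplicate.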
The second task for a message consumer is the actual processing of the message, once it has determined that the message is not a duplicate.
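Putting the two tasks together, a consumer could look roughly like the sketch below. It assumes every message carries a unique ID set by the producer and uses SQLite as the store. The `processed` table, `make_consumer`, and `handle` are hypothetical names, and `process` is whatever function does the real work.

```python
import sqlite3


def make_consumer(db, process):
    """Build a handler that deduplicates by message ID, then processes."""
    db.execute("CREATE TABLE IF NOT EXISTS processed (id TEXT PRIMARY KEY)")
    db.commit()

    def handle(msg_id, body):
        # `with db` commits on success and rolls back if an exception escapes.
        with db:
            try:
                db.execute("INSERT INTO processed (id) VALUES (?)", (msg_id,))
            except sqlite3.IntegrityError:
                return False  # first task: ID already recorded, so ignore it
            process(body)     # second task: the actual processing
        return True

    return handle
```

If `process` raises, the transaction is rolled back and the ID is not recorded, so a redelivered copy of the message gets processed instead of being mistaken for a duplicate.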
That’s all there is to it with NSQ.