Addressing Message Loss in NSQ

While reading the NSQ design document, I noticed that it claims there is no SPOF. That claim actually refers to the NSQ consumer side, not to nsqd itself.

Here is the problem with nsqd, as the NSQ documentation itself describes it:

This ensures that the only edge case that would result in message loss is an unclean shutdown of an nsqd process. In that case, any messages that were in memory (or any buffered writes not flushed to disk) would be lost.

If preventing message loss is of the utmost importance, even this edge case can be mitigated. One solution is to stand up redundant nsqd pairs (on separate hosts) that receive copies of the same portion of messages. Because you’ve written your consumers to be idempotent, doing double-time on these messages has no downstream impact and allows the system to endure any single node failure without losing messages.
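The redundant-pair approach only works if consumers really are idempotent. Here is a minimal sketch of one way to get there, assuming each message carries an application-level ID chosen by the publisher (NSQ's own message IDs are assigned by each nsqd, so the two copies would not share one). The `IdempotentHandler` class, the `seen` table and the use of SQLite are all assumptions for illustration and are not part of NSQ.

```python
import sqlite3


class IdempotentHandler:
    """Runs `process` at most once per application-level message ID."""

    def __init__(self, conn, process):
        self.conn = conn
        self.process = process  # hypothetical callback doing the real work
        conn.execute("CREATE TABLE IF NOT EXISTS seen (msg_id TEXT PRIMARY KEY)")

    def handle(self, msg_id, body):
        # The connection context manager commits on success and rolls back
        # if anything inside raises, so a failed message can be retried.
        with self.conn:
            try:
                self.conn.execute("INSERT INTO seen (msg_id) VALUES (?)", (msg_id,))
            except sqlite3.IntegrityError:
                return False  # duplicate copy from the other nsqd: skip it
            self.process(body)
        return True
```

The first copy of a message to arrive is processed and recorded. The second copy hits the primary-key constraint and is dropped. In a real deployment the `seen` table would have to live in a store shared by all consumers.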

So there you have it. Unless you run redundant pairs, each nsqd really is a SPOF for the messages it holds.

But don’t fret.

Adam Keys has one tip.

Keep your application state out of your queue.

In the case of NSQ, you can asynchronously write each message to a durable data store (such as a SQL database or a distributed key-value store) before it is processed. Once the NSQ consumer has finished processing the message, update its record in the durable data store to a DONE status (or something similar). Any message that is still not marked DONE after an nsqd crash can then be found and published again.
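As a rough sketch of that bookkeeping, the snippet below uses Python's built-in `sqlite3` module as a stand-in for the durable store. The table layout, the `PENDING`/`DONE` status values and the helper names (`open_store`, `record`, `mark_done`, `pending`) are assumptions made for illustration. The writes here are synchronous for simplicity, and the actual NSQ publish and consume calls are left out.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the message log; a real setup would use a durable path."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " msg_id TEXT PRIMARY KEY,"
        " body   TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'PENDING')"
    )
    return conn


def record(conn, msg_id, body):
    """Persist the message before it is processed."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO messages (msg_id, body) VALUES (?, ?)",
            (msg_id, body),
        )


def mark_done(conn, msg_id):
    """Called by the consumer once processing has finished."""
    with conn:
        conn.execute("UPDATE messages SET status = 'DONE' WHERE msg_id = ?", (msg_id,))


def pending(conn):
    """Messages never confirmed as processed: candidates for republishing."""
    return conn.execute(
        "SELECT msg_id, body FROM messages WHERE status = 'PENDING' ORDER BY msg_id"
    ).fetchall()
```

After an unclean nsqd shutdown, whatever `pending` returns is the set of messages that may have been lost and can be published again. Combined with idempotent consumers, republishing a message that was in fact processed does no harm.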

For distributed key-value and document stores, there are Riak, CouchDB, and others.

For SQL databases, there are distributed options such as ActorDB, and there is even a distributed SQLite called rqlite (though at the time of writing it is still in the alpha stage).

