Thoughts on Scaling

To quote from a blog post by Alex Payne:

Scaling in the small

In a system of no significant scale, basically anything works.

Basically, you can apply every available anti-pattern and still come out the other end with a workable system, simply because the hardware can move faster than your bad decision-making.

Scaling in the large

In a system of significant scale, there is no magic bullet.

When your system is faced with a deluge of work to do, no one technology is going to make it all better. When you’re operating at scale, pushing the needle means a complex, coordinated dance of well-applied technologies, development techniques, statistical analyses, intra-organizational communication, judicious engineering management, speedy and reliable operationalization of hardware and software, vigilant monitoring, and so forth.

Scaling is hard. So hard, in fact, that the ability to scale is a deep competitive advantage of the sort that you can’t simply go out and download, copy, purchase, or steal.

Your choice of software will dictate your ability to scale, depending on the respective tier of your application.

From the user tier perspective, you have no choice but JavaScript. Whatever information overload you may be getting regarding the so-called “JavaScript fatigue” is partly due to JavaScript’s expressive power and partly due to humans’ ability to ditch bad practices and come up with better ones. Let’s just say JavaScript fatigue is not information overload. It’s filter failure (to quote Clay Shirky). Or more appropriately, it’s failure to separate the wheat from the chaff.

That means it’s a never-ending process of creating the next Browserify or VueJS. It never ends. Moreover, that means NodeJS is relegated to an aid for front-end development, not a backend workhorse (I expect to ruffle some feathers here).

From the business logic perspective (otherwise known as the backend), the Go language is the clear winner. Okay, software wars are not a zero-sum game, but Go is relatively easy to learn and easy to deploy (just a single binary). For multi-machine deployments, the container is the right unit of abstraction. With Docker itself written in Go, a single binary wrapped in a container is the smallest unit of deployment abstraction you can get (no need to include runtime software inside your container, just the bits you need).

That means it’s time to ditch .NET, the JVM, or your favorite dynamic programming language. Dynamic languages are not meant for stateless Web services.

Which brings us to the data tier perspective. Of course, SQL is not going to die. The single horse to beat for now is Apache Spark, whose RDD (resilient distributed dataset) is the right abstraction for distributed data processing. However, the Julia language is the next frontier. It’s just a matter of time before Julia Computing comes up with a Spark alternative (or simply leverages the Spark ecosystem).

All in all, this is purely subjective.

To each according to one’s subjectivity.

To summarize, here is the gist.

Front-end development (JavaScript) – VueJS, NodeJS, Browserify

Back-end development (Go) – oh, and no frameworks please

Data processing (Scala) – Apache Spark rules for now for scaling in the large, so Scala and the heavyweight JVM are here to stay (sigh); but hey, you can use the Julia language for scaling in the small

The above choices are proven in production.

Now is the time to replicate FreeCodeCamp as your next business model and tweak it according to your preferences.

This is what scaling is all about. Scale learning to the masses and use software that scales in production. No corporate bullshit practices required.

