How are threads relevant to Node.js performance?
I have been interested in ways of offloading CPU-intensive work from the Node.js main Event Loop thread ever since I noticed the promising webworker-threads library many years ago, back in the Node.js v0.10.x days. Since the Node.js v10.5.0 release, Worker Threads have been available in the standard library of a stable Node.js version, albeit still marked as experimental. The usefulness of performing work in Node.js threads other than the main thread boils down to two main reasons, discussed below.
1) CPU intensive processing in the Node.js main thread kills throughput
A single Node.js process utilizing one CPU core can process at least in the order of 10 000 requests per second, if completing each request only requires a small amount of CPU work and e.g. your database engine or another backend service does the heavy lifting. This matches the workload of many simple REST API backends, e.g. those with straightforward CRUD APIs.
However, say that just a few times per second you get a request which requires a relatively innocent 100ms of main thread CPU work to complete, for a total of 200-300ms of CPU time every second. While executing that CPU work for those few requests, you will have missed the chance to perform the database queries for several thousand simple requests, and your total throughput will be massively reduced!
I’ve found that there are usually at least some CPU-intensive operations in almost all real-world backend services. Typical examples include:
Serializing large API responses to JSON, such as the results of GraphQL or objection.js relation graph queries
Evaluating complex business logic rules on large data volumes (e.g. dynamic pricing of many products)
Collecting and transforming large batches of data from the database, e.g. when generating a report or document
Parsing large complex documents to an easily queryable representation, e.g. when web scraping
If we could run these kinds of tasks outside the main event loop thread using Worker Threads, the main thread could be operating at peak performance, dispatching queries to the database and returning results to clients etc. all the time!
2) The main thread alone can’t fully utilize modern CPUs
All modern high-performance CPUs have more than one core, and each core might also be utilized more efficiently by running two different threads on it concurrently via hyperthreading. If all your backend code runs in the single Node.js main thread, you can utilize only one of the many CPU cores, and not even that one optimally, as that would require at least two active threads. Even the smallest fixed-performance Amazon EC2 instance type that is suitable for a horizontal auto-scaling setup, c5.large, offers two hyperthreads of the same physical CPU as vCPUs. A single-threaded workload is only able to utilize about 80% of the CPU power of c5.large and other similar 2-vCPU EC2 instances.
One can better utilize a server instance with multiple logical CPU cores by running multiple Node.js processes on it and load-balancing incoming requests between them. This kind of process cluster is very easy to run using e.g. the pm2 process manager, which I highly recommend when running Node.js applications on a multi-core or hyperthreading single-core server instance.
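For reference, a minimal pm2 cluster setup can be described with an ecosystem config file like the one below (the `server.js` entry point is a hypothetical name; `instances: 'max'` tells pm2 to start one process per logical CPU core):

```javascript
// ecosystem.config.js — a minimal pm2 cluster-mode config sketch.
module.exports = {
  apps: [
    {
      name: 'api',
      script: './server.js',   // hypothetical application entry point
      instances: 'max',        // one process per logical CPU core
      exec_mode: 'cluster',    // pm2 load-balances incoming connections
    },
  ],
};
```

Starting the cluster is then just `pm2 start ecosystem.config.js`.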
The drawback of the multi-process cluster model is that it requires lots of RAM, as certain memory items such as the JIT-generated machine code will be duplicated in each process. An even bigger factor is that the different processes in the cluster can’t co-operate on when to garbage collect, so one process might fail to allocate sufficient memory for processing a request, even if another process could easily free up memory by running its GC. Worker Threads can utilize multiple CPU cores in a single Node.js process, with less memory usage than a process cluster setup.
Note that a cluster setup won’t make the more expensive requests finish any faster: no matter how many CPUs you have, each request is executed by one process in the cluster, and thus uses only one CPU core at a time. Some expensive requests could, however, be completed faster by splitting them into parts that run in parallel in separate Worker Threads.
Talking to Worker Threads can be slow
Instead of sharing data directly, the main thread must send input to the workers and workers must send results back to the main thread by using the postMessage method. See e.g. this tutorial for a practical example.
The trouble is that, at least historically, postMessage has been pretty slow with any larger amounts of data. So slow, in fact, that with any kind of complex data, it has been faster to JSON.stringify the data on the sending end, transfer it as a string, and JSON.parse it on the receiving end. But, as noted before, JSON (de)serialization can be pretty slow, and well, if the operation you are trying to offload to a worker is JSON parsing or serializing, having to do the same operation also in the main thread to get the data to the worker pretty much nullifies any benefit you could otherwise get.
Note how the examples I gave above of CPU intensive tasks I’ve encountered in real-world Node.js applications are all expensive specifically because of the large amounts of data involved and/or the complexity of that data. The artificial examples with small inputs but expensive processing, like Fibonacci series generation with a bad algorithm, that you see in so many worker tutorials just aren’t that common in the real world. So, the performance of transferring large and complex data back and forth with the worker threads is key to getting any benefit from executing processing in a thread.
Which kinds of tasks can actually benefit from Worker Threads, then?
So, I will do some benchmarking of my own. Maybe even the JSON parsing and serialization that pretty much every app does could benefit? We’ll see; stay tuned for the results!