Network Latency As a Service


A few years ago I worked on an internal management application. One of the mysteries we faced was that it was painfully slow: a page load could take up to 8 seconds. But when we looked at the problem on our local development machines, pages loaded within 2 seconds. Not great, but a lot better.

So I profiled the app, looking for something that might cost more in production. After a while, about 600 calls to memcached caught my eye. They were fast on our local machines, but in production we had to use memcached as a hosted service, which somewhat defeats the purpose of memcached, because you add network latency to reading data from RAM. So there it was: 600 calls at 5 to 10 ms latency each add up to 3000 to 6000 ms of extra time for each request.
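A quick back-of-the-envelope sketch of that math (the call count is the one from the profiler; the per-call latencies are the measured round-trip range from above):

```python
# Rough cost model for sequential round trips to a remote cache.
CALLS_PER_REQUEST = 600
LATENCY_MS_LOW, LATENCY_MS_HIGH = 5, 10  # measured round trip to hosted memcached

low = CALLS_PER_REQUEST * LATENCY_MS_LOW
high = CALLS_PER_REQUEST * LATENCY_MS_HIGH
print(f"{low} ms to {high} ms of pure network latency per page load")
# -> 3000 ms to 6000 ms
```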

Those calls were made to speed up our ACL checks; without them, the framework we used would have made a call to the database on each request to load the whole ACL. At first I removed duplicated calls, which still left about 150 checks, still 1500 ms per request too many. So I removed the caching code entirely and loaded the ACL from the database, which was much faster than caching the individual checks. So caching had not made things faster but slower in the first place. No one had measured the solution, because it is memcached, it has to be fast.
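To make the difference concrete, here is a minimal sketch of both variants; the names, the `pymemcache` client, and the sqlite3-style `acl` table are illustrative assumptions, not the original framework's code:

```python
from pymemcache.client.base import Client

# Hosted memcached: every .get() is a network round trip of roughly 5-10 ms.
cache = Client(("memcached.example.com", 11211))

# Before: one remote call per ACL check, ~150 of them per request.
def is_allowed_cached(user_id, permission):
    return cache.get(f"acl:{user_id}:{permission}") == b"1"

# After: one database query per request loads the whole ACL at once.
def load_acl(cursor, user_id):
    cursor.execute("SELECT permission FROM acl WHERE user_id = ?", (user_id,))
    return {row[0] for row in cursor.fetchall()}

def is_allowed(acl, permission):
    return permission in acl  # in-process set lookup, no network involved
```

One round trip per request instead of 150 is the whole trick; the lookup itself becomes a local set membership test.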

Actually, the fastest solution would probably have been to just load the ACL from the file system. The file would have been needed on each request, so the disk cache would have kept it in memory anyway. A free local memcached if your hosting cannot provide one.
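A sketch of that variant, assuming the ACL fits in a single JSON file written out at deploy time (file name and format are made up for illustration):

```python
import json

def load_acl_from_file(path="acl.json"):
    # Read on every request. After the first read the OS page cache serves
    # the file straight from RAM, so this behaves like a free local cache.
    with open(path) as f:
        return json.load(f)
```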

So think about network latency before making calls to a remote cache, API, or database. It might kill your performance. Having caching as a hosted service is a bad idea in the first place. But if you have to have your in-memory database slowed down by network latency, the computation it saves per request has to be more expensive than that latency to achieve any benefit. And 5 to 10 ms is a lot of time for a CPU, so recalculating something on each request is probably still faster. So …
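Measuring that is cheap. A hedged sketch of how to check what a call actually costs before trusting it (the callables passed in are whatever variants you are comparing):

```python
import time

def time_per_call(fn, repeat=100):
    # Average a callable over several runs to smooth out network jitter.
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    return (time.perf_counter() - start) * 1000 / repeat

# e.g. time_per_call(lambda: cache.get("acl:42:edit"))
# vs.  time_per_call(lambda: load_acl(cursor, "42"))
```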

tl;dr: measure and don’t forget the cost of network latency.