In a previous post, we talked about caching Spark models in memory for a web service to reduce prediction latency. As with any web application, caching strategy can get very interesting, but the patterns for caching a machine learning model are relatively straightforward, since the model stays static between updates. In particular, I find that Guava provides some handy in-memory cache solutions for our use case.
In the rest of this post, I am going to walk you through some basic caching patterns using Guava. Note that this is inspired by, but not limited to, caching predictive models. As a reference example, we assume the goal is to serve a machine learning model, which is updated daily, in a web application built with the Play Framework.
To start with, a simple caching pattern is to load the model into memory and evict it after a given time period (daily in our case). Here we will use a CacheLoader, since there is a default function that loads the value (the machine learning model) associated with a key (the model identifier); otherwise, you will need to pass a Callable into a get call.
With dependency injection, you could create a CacheProvider for caching.
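As a rough sketch of what such a provider might look like in Java: Model and ModelStore below are hypothetical stand-ins for the real Spark model and its loading code, and the Play/Guice wiring is left out (in Play, the class would typically implement javax.inject.Provider and be bound in a module).

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a trained model. */
final class Model {
    final String id;
    final int version;

    Model(String id, int version) {
        this.id = id;
        this.version = version;
    }
}

/** Hypothetical loader; a real one would read the model from disk or remote storage. */
final class ModelStore {
    static final AtomicInteger loads = new AtomicInteger();

    static Model load(String modelId) {
        // Each load bumps the version so reloads are visible in this sketch.
        return new Model(modelId, loads.incrementAndGet());
    }
}

/** Builds the model cache; DI annotations and bindings are omitted (assumption). */
final class CacheProvider {
    LoadingCache<String, Model> get() {
        return CacheBuilder.newBuilder()
            // Drop the entry 24 hours after it was loaded.
            .expireAfterWrite(24, TimeUnit.HOURS)
            .build(new CacheLoader<String, Model>() {
                @Override
                public Model load(String modelId) {
                    // Called on a cache miss, including the first lookup after eviction.
                    return ModelStore.load(modelId);
                }
            });
    }
}
```

A request handler would then call something like cache.getUnchecked("my-model") and get the cached instance until it expires.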
In this way, the cached model is evicted after 24 hours. The first query after eviction blocks until the model is loaded again, so expect higher latency for that request.
With timed eviction, if something goes wrong during reloading, the service won't be able to return anything, because the old model has already been evicted. This is of course not ideal and may cause serious problems.
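To make the failure mode concrete, here is a small sketch, assuming a loader that works once and then fails. The FakeTicker class and the "churn" key are made up for illustration; the ticker only exists so the example does not have to wait 24 hours, and a String stands in for the model.

```java
import com.google.common.base.Ticker;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.util.concurrent.UncheckedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

final class EvictionFailureDemo {
    /** Manually advanced clock (hypothetical helper for this sketch). */
    static final class FakeTicker extends Ticker {
        private final AtomicLong nanos = new AtomicLong();

        @Override
        public long read() {
            return nanos.get();
        }

        void advance(long duration, TimeUnit unit) {
            nanos.addAndGet(unit.toNanos(duration));
        }
    }

    /** Returns true if the lookup after expiry fails instead of serving the old value. */
    static boolean failsAfterExpiry() {
        FakeTicker ticker = new FakeTicker();
        AtomicInteger calls = new AtomicInteger();
        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
            .expireAfterWrite(24, TimeUnit.HOURS)
            .ticker(ticker)
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String modelId) {
                    // First load succeeds, every later load simulates an outage.
                    if (calls.incrementAndGet() > 1) {
                        throw new IllegalStateException("model store unavailable");
                    }
                    return modelId + "-v1";
                }
            });

        cache.getUnchecked("churn");         // loads "churn-v1"
        ticker.advance(25, TimeUnit.HOURS);  // entry is now expired
        try {
            cache.getUnchecked("churn");     // must reload, and the reload fails
            return false;
        } catch (UncheckedExecutionException e) {
            return true;                     // nothing left to serve
        }
    }
}
```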
Instead, a better solution may be timed refresh. The difference is that the old model (if any) is still returned while the key is being refreshed. Therefore, even if an exception is thrown while refreshing, the service can still return results from the old model, while the exception is logged and swallowed.
The change to switch from timed eviction to timed refresh is minimal: you just need to replace the expiration setting on the builder (for example, expireAfterWrite) with refreshAfterWrite.
The default refresh loads the value synchronously. That means the request that triggers the refresh still blocks until the new model is loaded, so queries arriving at refresh time see high latency and, thus, a bad user experience.
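Here is a minimal sketch of timed refresh with the default, synchronous reload. FlakyLoader is a hypothetical loader that succeeds once and then fails, and a String stands in for the model. The refresh is triggered by hand with refresh(key) rather than by waiting a day.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class RefreshDemo {
    /** Hypothetical loader: the first call succeeds, every later call fails. */
    static final class FlakyLoader extends CacheLoader<String, String> {
        private final AtomicInteger calls = new AtomicInteger();

        @Override
        public String load(String modelId) {
            int n = calls.incrementAndGet();
            if (n > 1) {
                throw new IllegalStateException("model store unavailable");
            }
            return modelId + "-v" + n;
        }
    }

    static LoadingCache<String, String> newCache(CacheLoader<String, String> loader) {
        return CacheBuilder.newBuilder()
            // Entries older than 24 hours are refreshed on the next read, not dropped.
            .refreshAfterWrite(24, TimeUnit.HOURS)
            .build(loader);
    }

    /** Returns the value served after a refresh attempt has failed. */
    static String valueAfterFailedRefresh() {
        LoadingCache<String, String> cache = newCache(new FlakyLoader());
        cache.getUnchecked("churn");        // initial load succeeds
        cache.refresh("churn");             // reload throws; Guava logs and swallows it
        return cache.getUnchecked("churn"); // the old value is still there
    }
}
```

Compared with the eviction case, the failed reload only produces a warning in the log; callers keep getting the previous model.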
The good news is that there is a way to set up the cache built by CacheBuilder so that refresh happens asynchronously. Specifically, you need to override the CacheLoader's reload method to be asynchronous.
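One way this could look, as a sketch rather than the definitive setup: reload wraps the load in a ListenableFutureTask and hands it to an executor, so the calling thread returns immediately with the old value. AsyncLoader and the version counter are hypothetical, and the choice of executor is left to the caller.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListenableFutureTask;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class AsyncRefreshDemo {
    /** Hypothetical loader whose reloads run on a separate executor. */
    static final class AsyncLoader extends CacheLoader<String, String> {
        private final Executor executor;
        private final AtomicInteger version = new AtomicInteger();

        AsyncLoader(Executor executor) {
            this.executor = executor;
        }

        @Override
        public String load(String modelId) {
            // Stand-in for the expensive model load.
            return modelId + "-v" + version.incrementAndGet();
        }

        @Override
        public ListenableFuture<String> reload(String modelId, String oldModel) {
            // Run the load in the background; the cache keeps serving oldModel
            // until the returned future completes.
            ListenableFutureTask<String> task =
                ListenableFutureTask.create(() -> load(modelId));
            executor.execute(task);
            return task;
        }
    }

    static LoadingCache<String, String> newCache(Executor executor) {
        return CacheBuilder.newBuilder()
            .refreshAfterWrite(24, TimeUnit.HOURS)
            .build(new AsyncLoader(executor));
    }
}
```

In a service you might pass something like Executors.newSingleThreadExecutor() to newCache. Guava also ships CacheLoader.asyncReloading(loader, executor), which wraps an existing loader in much the same way if you would rather not override reload yourself.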
Caching is one of the most interesting problems in web applications. Here I have only covered some of the most basic in-memory caching patterns, but they, especially timed asynchronous refresh, seem to work well with predictive models, which are relatively static compared to other content.
As always, I would really appreciate your thoughts/comments. Feel free to leave them following this post or tweet me @_LeiG.