We have talked about various ways to serve machine learning results in production (see an earlier post, Predictive Model Deployment with Spark, for example). That article outlines three architectures for model serving systems. In particular, I found using Spark’s internal serialization logic to persist and load models to be both flexible and reliable. As a follow-up, in this post I am going to showcase how to serve a simple Spark MLlib machine learning model, using a Naive Bayes classifier as an example, in a web application built with the Play Framework, one of the most popular web frameworks for Scala/Java.
Today, machine learning models are often trained on a large Spark cluster to take advantage of its powerful distributed computing capabilities and easy-to-use APIs. Since the offline training part is not the focus of this post, I will simply build on top of a toy training pipeline from the official Spark tutorial.
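For reference, a training pipeline along the lines of the official Spark MLlib Naive Bayes example looks roughly like this (the sample data file ships with the Spark distribution):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.util.MLUtils

val sc = new SparkContext(new SparkConf().setAppName("NaiveBayesExample"))

// Load the LibSVM-formatted sample data that ships with Spark
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

// Split the data into training and test sets
val Array(training, test) = data.randomSplit(Array(0.6, 0.4))

// Train a multinomial Naive Bayes classifier
val model = NaiveBayes.train(training, lambda = 1.0, modelType = "multinomial")
```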
There are a few model storage options, and I decided to use S3 for its simplicity and availability. Moreover, Spark provides nice support for saving serialized models directly to S3. All you need to do is configure your SparkContext with the right S3 credentials and then add the following lines to the previous code snippet.
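A sketch of those lines, assuming the trained model is in a variable named model and that credentials come from environment variables (the bucket name here is a placeholder):

```scala
// Configure S3 credentials on the SparkContext's Hadoop configuration
sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

// Persist the trained model to S3 using Spark's built-in serialization
model.save(sc, "s3a://your-bucket/persisted-models/naivebayesexample")
```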
If you go to your AWS console, you can find .parquet files stored under your S3 bucket at persisted-models/naivebayesexample; these are produced by Spark’s internal model serialization logic.
Note that I am using the s3a URI scheme to interact with S3. There are three variants in total, as described at https://wiki.apache.org/hadoop/AmazonS3:
S3 Native FileSystem (URI scheme: s3n) A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The disadvantage is the 5GB limit on file size imposed by S3.
S3A (URI scheme: s3a) A successor to the S3 Native, s3n fs, the S3a: system uses Amazon’s libraries to interact with S3. This allows S3a to support larger files (no more 5GB limit), higher performance operations and more. The filesystem is intended to be a replacement for/successor to S3 Native: all objects accessible from s3n:// URLs should also be accessible from s3a simply by replacing the URL schema.
S3 Block FileSystem (URI scheme: s3) A block-based filesystem backed by S3. Files are stored as blocks, just like they are in HDFS. This permits efficient implementation of renames. This filesystem requires you to dedicate a bucket for the filesystem - you should not use an existing bucket containing files, or write other files to the same bucket. The files stored by this filesystem can be larger than 5GB, but they are not interoperable with other S3 tools.
I chose S3A because 1) it is an object-based overlay on top of S3, unlike the S3 Block FileSystem, and 2) it offers better performance and supports objects up to 5TB, compared to the S3 Native FileSystem’s 5GB object size limit.
Now let’s come to the online serving part. To save some time, I am assuming you already have a Play application up and running (I may write a follow-up post explaining how to set up a minimal web application with the Play Framework, which should be fairly straightforward). For now, let’s also assume the project name of the Play application is
yoda and its skeleton looks like the following.
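A minimal Play project skeleton might look like this (file names beyond the standard Play layout, such as the controller name, are assumptions carried through the rest of this post):

```
yoda/
├── app/
│   └── controllers/
│       └── Application.scala
├── conf/
│   ├── application.conf
│   └── routes
├── build.sbt
└── project/
    ├── build.properties
    └── plugins.sbt
```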
The trained model is already saved on S3 at this point; what we want is to load the model into memory for
yoda. To achieve this, additional dependencies need to be added to the web application. Specifically, we need to add
guava: dependency injection and cache
hadoop: read files from AWS S3
spark: deserialize parquet files to Spark ML model and make predictions
You can add these dependencies to the
build.sbt file with the following lines.
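A sketch of the build.sbt additions; the version numbers here are illustrative, and you should pin them to whatever matches your Spark cluster:

```scala
// Versions are assumptions; align spark-mllib with your cluster's Spark version
libraryDependencies ++= Seq(
  "com.google.guava"  %  "guava"        % "19.0",
  "org.apache.hadoop" %  "hadoop-aws"   % "2.7.3",
  "org.apache.spark"  %% "spark-mllib"  % "2.1.0"
)
```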
With these dependencies in place, we can now load the trained model from S3 into memory. Note that we can use a caching mechanism to keep the model in memory for prediction and refresh it when the model is updated. For simplicity, in this toy example we are going to keep the model in memory and refresh it every 24 hours. More complex caching logic would be use-case specific.
This loading and caching logic can be added to the web application.
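One way to sketch it is with a Guava LoadingCache; the object name, bucket, and path below are assumptions:

```scala
import java.util.concurrent.TimeUnit
import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.NaiveBayesModel

object ModelCache {
  // Run Spark in local mode inside the web application
  private val sc = new SparkContext(
    new SparkConf().setMaster("local[*]").setAppName("yoda"))

  // Keep the deserialized model in memory, refreshing it every 24 hours
  private val cache: LoadingCache[String, NaiveBayesModel] =
    CacheBuilder.newBuilder()
      .refreshAfterWrite(24, TimeUnit.HOURS)
      .build(new CacheLoader[String, NaiveBayesModel] {
        override def load(path: String): NaiveBayesModel =
          NaiveBayesModel.load(sc, path)
      })

  def model: NaiveBayesModel =
    cache.get("s3a://your-bucket/persisted-models/naivebayesexample")
}
```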
Essentially, the web application
yoda runs Spark in local mode and uses its built-in functionality to load the saved model from S3. This approach is fairly generic, in the sense that any type of model
.saved to S3 can be loaded into memory by a web application with minimal code changes.
Once the Spark MLlib model is loaded in memory, making predictions based on incoming requests is straightforward. Most Spark MLlib models have a built-in
.predict method that takes an array of features and returns a prediction score.
You can add the following lines to your
Application.scala to make predictions on randomly generated features.
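A self-contained sketch of such a controller, assuming the same placeholder bucket as before (the 692-element feature vector matches the dimensionality of Spark’s sample LibSVM data):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.NaiveBayesModel
import org.apache.spark.mllib.linalg.Vectors
import play.api.mvc._
import scala.util.Random

class Application extends Controller {
  // Local-mode Spark context used only for model deserialization and scoring
  private val sc = new SparkContext(
    new SparkConf().setMaster("local[*]").setAppName("yoda"))

  // Load the persisted model from S3 (bucket name is a placeholder)
  private val model =
    NaiveBayesModel.load(sc, "s3a://your-bucket/persisted-models/naivebayesexample")

  // Score a randomly generated feature vector on each request
  def predict = Action {
    val features = Vectors.dense(Array.fill(692)(Random.nextDouble()))
    Ok(s"prediction: ${model.predict(features)}")
  }
}
```

A production version would of course parse the features from the request payload instead of generating them randomly.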
Building machine learning models is fun and challenging, but at the end of the day we want to serve them to our users, which often means exposing endpoints in a web service. The method described in this post aims to provide a generic way to serve all kinds of machine learning models and improve the experience of our users.
As always, I would really appreciate your thoughts and comments. Feel free to leave them below this post or tweet me @_LeiG.