MLOps: Monitoring

Once a Machine Learning model is live in production, it is important to monitor it. In this chapter of the MLOps tutorial, you will learn about monitoring your Machine Learning model in production.

Monitoring is the final step that completes the MLOps cycle.

MLOps Monitoring

Machine Learning models can be monitored at two levels:

  • Functional Level: At a functional level, monitoring a Machine Learning model means checking the model's basic operation and the resources it consumes, such as processing power (CPU or GPU), memory, and latency (a minimal sketch follows this list).
  • Performance Level: At a performance level, monitoring a Machine Learning model means checking how well the model performs according to a chosen performance metric.

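A functional-level check can be as simple as timing the prediction call and recording resource usage. Below is a minimal sketch, assuming a hypothetical `model` object with a `predict` method; real deployments usually rely on infrastructure tooling (e.g., Prometheus or cloud monitoring) rather than hand-rolled instrumentation.

```python
import time
import tracemalloc

def predict_with_metrics(model, features):
    """Run a prediction while recording latency and peak Python memory use."""
    tracemalloc.start()
    start = time.perf_counter()
    prediction = model.predict(features)  # 'model' is a hypothetical stand-in
    latency_ms = (time.perf_counter() - start) * 1000
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"latency: {latency_ms:.2f} ms, peak memory: {peak_bytes / 1024:.1f} KiB")
    return prediction
```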
Monitoring a Machine Learning model at a functional level is no different from monitoring any other piece of software and has long been part of DevOps practice. Therefore, the focus of this chapter will be entirely on monitoring the performance of the model.


Performance Monitoring

The performance of a model in production can degrade for a variety of reasons, the most common culprit being drift. It is therefore important to monitor the model's performance to check whether it is performing as expected or whether it has degraded. More specifically, the model's performance in production should be approximately the same as its performance during training and testing. This chapter focuses on two commonly used methods for performance monitoring.

Assessment using Ground Truth

One way to evaluate the accuracy of the model is to wait until you get the ground truth for its predictions. Once you have the ground truth, you can compare it with the predictions made by your model and compute the model's performance in production. If there is a statistically significant degradation in performance compared to when the model was trained and tested, it might be time to retrain the model or build a new one.
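As a minimal sketch of this idea, the function below compares production accuracy against a baseline accuracy measured at testing time, using a one-sided z-test for a proportion. The function name, the choice of accuracy as the metric, and the particular test are illustrative assumptions, not a prescribed method.

```python
import numpy as np
from scipy import stats

def accuracy_degraded(y_true, y_pred, baseline_accuracy, alpha=0.05):
    """One-sided z-test: is production accuracy significantly below baseline?

    baseline_accuracy is the accuracy measured at training/testing time.
    """
    n = len(y_true)
    prod_accuracy = np.mean(np.asarray(y_true) == np.asarray(y_pred))
    # Under the null hypothesis, accuracy on n samples varies around the
    # baseline with this standard error (normal approximation).
    se = np.sqrt(baseline_accuracy * (1 - baseline_accuracy) / n)
    z = (prod_accuracy - baseline_accuracy) / se
    p_value = stats.norm.cdf(z)  # probability of an accuracy this low by chance
    return p_value < alpha, prod_accuracy, p_value

# Example: flag degradation once ground truth for a batch of predictions arrives
# degraded, acc, p = accuracy_degraded(y_true, y_pred, baseline_accuracy=0.92)
```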

However, obtaining the ground truth is not always easy; in some cases it can be very expensive or outright impossible. Even in cases where the ground truth can be obtained, it is usually not available immediately, so there is a time lag between the time of prediction and the time the ground truth becomes available.

Detecting Drift in Input

Another way to monitor the performance of a model is to monitor the distribution of its input features and prediction values. If there is a statistically significant difference between the distribution of the training data and that of the current input data, there is a high probability that the model's performance has degraded. There are various methods for detecting input drift, the most popular being univariate statistical tests: a statistical test is run on each feature to check whether its distribution differs between the training data and the current input data. Many different univariate statistical tests can be used, depending on the requirements and the types of features. There are also model-based methods for detecting input drift, such as training a classifier to distinguish training data from current data; if it can tell them apart reliably, the distributions have diverged.
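As one concrete example of a univariate check, the sketch below runs a two-sample Kolmogorov-Smirnov test (from scipy) on each numeric feature. The function name and significance threshold are illustrative, and categorical features would need a different test, such as chi-squared.

```python
import pandas as pd
from scipy.stats import ks_2samp

def detect_feature_drift(train_df: pd.DataFrame, live_df: pd.DataFrame, alpha=0.05):
    """Flag numeric features whose live distribution differs from training."""
    drifted = {}
    for col in train_df.select_dtypes(include="number").columns:
        # Two-sample KS test: the null hypothesis is "same distribution"
        _, p_value = ks_2samp(train_df[col].dropna(), live_df[col].dropna())
        if p_value < alpha:
            drifted[col] = p_value
    return drifted  # e.g. {"age": 0.003} means the 'age' feature has drifted
```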


Logging

Monitoring the functioning and performance of your model is one thing, but that monitoring is of little use if the collected data is not stored somewhere for analysis at a later point in time. It is very important to have a proper logging system in place that stores the monitored information about the various models across different environments in a centralized fashion. This data can then be analyzed, automatically or manually, and used to continuously improve the models.
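A minimal sketch of such a logging setup is shown below: each prediction is written as one structured JSON record so it can be parsed and analyzed later. The file-based handler keeps the example self-contained; a real system would ship these records to a centralized store, and the field names here are illustrative assumptions.

```python
import json
import logging
from datetime import datetime, timezone

# A file handler keeps this sketch self-contained; in production the records
# would typically be shipped to a centralized log store instead.
logging.basicConfig(filename="model_monitoring.log",
                    level=logging.INFO, format="%(message)s")

def log_prediction(model_name, model_version, features, prediction, latency_ms):
    """Write one structured, machine-parseable record per prediction."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_name,          # illustrative field names
        "version": model_version,
        "features": features,
        "prediction": prediction,
        "latency_ms": latency_ms,
    }
    logging.info(json.dumps(record))

# Example:
# log_prediction("churn-model", "1.3.0", {"tenure": 12}, 0.81, latency_ms=4.2)
```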