Machine Learning models, once developed and deployed, need to be integrated with software applications. Integration of a model is closely intertwined with deployment. In the previous chapter, you learned about deploying Machine Learning models; in this chapter of the MLOps tutorial, you will learn about integrating them.
Integration is the process of connecting the deployed model with the rest of the software.
Integration Endpoints
How the deployed model is integrated with the rest of the software depends on the integration endpoint. The following are some of the endpoints commonly used for integrating models:
- Batch Inference– It is the process of making inferences on a batch of data at a time. This approach is ideal for scenarios in which latency is not a concern. For example, if a Machine Learning system is used to predict the prices of houses stored in a database, predictions do not need to be instantaneous, so batch inference can be used. Another advantage of batch inference is that issues with the model can be caught through monitoring before they are reflected back to customers or users (a minimal batch scoring sketch follows this list).
- Online Inference– It is the process of making inferences on data in real time. This approach is desired when the latency of inference is a major factor, i.e., when a prediction is needed immediately for each incoming input. For example, online inference is required for applications such as self-driving cars and stock market prediction. A big drawback is that in online inference systems, any issues with the model are immediately reflected back to customers or users (see the online serving sketch after this list).
- Edge Inference– It is the process of making inferences on an edge device, such as a mobile phone or an IoT device. In edge inference, the edge device contains the models necessary for inferencing. The upside is that, since all of the computing is performed on the edge device, no computing is required on the server side, so edge inference does not have scalability issues. On the downside, there is a limit to the size of the model that can be used for inferencing on an edge device. Another downside is that edge devices come with a host of different environments, such as hardware, operating systems, and software, which makes it difficult to maintain compatibility of the model across these environments (see the edge conversion sketch after this list).
- API Inference– It is the process in which you expose your model through an API to your users/customers, and inferences are made by calling the API. This is ideal for cases in which a service from a third party is required, for example, using the Google Text-to-Speech API inside your application for your specific purposes. Machine Learning models exposed through an API are typically offered as paid services, and the API makes it easier to monitor their usage (see the API call sketch after this list).
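Continuing the house price example above, the following is a minimal batch inference sketch. The model artifact house_price_model.joblib, the input file houses.csv, and the feature column names are assumptions made for illustration, not part of a specific setup.

```python
import joblib
import pandas as pd

# Load the trained model from disk (hypothetical artifact name).
model = joblib.load("house_price_model.joblib")

# Read a batch of records; a CSV file stands in here for a database table.
houses = pd.read_csv("houses.csv")

# Score the entire batch in one call, since latency is not a concern.
features = houses[["area", "bedrooms", "age"]]  # assumed feature columns
houses["predicted_price"] = model.predict(features)

# Persist the predictions for downstream consumers to read later.
houses.to_csv("houses_with_predictions.csv", index=False)
```

In production, a scheduler such as a cron job would typically run a script like this on a regular cadence, and the output can be inspected before anyone acts on it.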
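For online inference, the model is usually wrapped in a web service so that applications can request predictions one record at a time. Below is a minimal sketch using FastAPI; the model file, the /predict endpoint path, and the feature names are assumptions for illustration.

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("house_price_model.joblib")  # hypothetical model artifact

class HouseFeatures(BaseModel):
    area: float
    bedrooms: int
    age: float

@app.post("/predict")
def predict(features: HouseFeatures):
    # A single low-latency prediction, returned to the caller immediately.
    prediction = model.predict([[features.area, features.bedrooms, features.age]])
    return {"predicted_price": float(prediction[0])}
```

The service can then be started with a server such as uvicorn and called over HTTP from the rest of the software, which is also why any issue with the model becomes visible to users right away.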
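For edge inference, the model is converted into a format compact enough to ship with the device's application. As one possible example, assuming a TensorFlow model exported in the SavedModel format, it could be converted to TensorFlow Lite roughly as follows; the directory and file names are placeholders.

```python
import tensorflow as tf

# Convert a trained model (assumed to live in the "saved_model/" directory)
# into a compact TensorFlow Lite model suitable for phones and IoT devices.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # shrink the model for constrained hardware
tflite_model = converter.convert()

# Ship this file with the mobile or IoT application; inference then runs
# entirely on the device, with no server-side compute required.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```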
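Finally, from the application's side, API inference is just an HTTP call to the provider. The sketch below uses a hypothetical text-to-speech endpoint and API key as placeholders; a real provider's URL, authentication scheme, and response format will differ.

```python
import requests

# Hypothetical third-party endpoint and credentials; replace with the
# provider's actual URL and API key.
API_URL = "https://api.example.com/v1/text-to-speech"
API_KEY = "YOUR_API_KEY"

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"text": "Hello from the integrated application."},
    timeout=30,
)
response.raise_for_status()

# Providers typically return the synthesized audio (or a link to it);
# they also meter these calls, which is how usage is billed and monitored.
with open("speech.mp3", "wb") as f:
    f.write(response.content)
```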