Any Machine Learning project involves people from many different backgrounds. In this chapter of the MLOps tutorial, you will learn about the different personas involved in a Machine Learning project and their respective roles and skill sets.
Data Engineers
Data Engineers are responsible for retrieving data from various data sources and providing it to Data Scientists. They are also responsible for building the data pipelines that move data out of the databases for purposes such as model training and testing, analytics, and visualization.
Data Engineers have knowledge of various types of relational and non-relational databases, although relational databases such as MySQL, Oracle, and PostgreSQL are the most commonly used. Nowadays, many organizations store their data on cloud providers' storage systems. Therefore, Data Engineers are also expected to know cloud computing and cloud storage providers such as Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure.
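To make the idea of a data pipeline concrete, here is a minimal sketch in Python. It assumes a hypothetical `measurements` table held in an in-memory SQLite database (standing in for a production database such as PostgreSQL), extracts the rows, and splits them into training and testing sets. The table name, column names, and the 80/20 split are illustrative assumptions, not part of any particular project.

```python
import sqlite3


def build_source_db():
    """Create an in-memory database with a hypothetical `measurements` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurements (id INTEGER PRIMARY KEY, feature REAL, label REAL)"
    )
    # Made-up rows for illustration only.
    rows = [(i, float(i), 2.0 * i + 1.0) for i in range(10)]
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def extract(conn):
    """Pull (feature, label) rows out of the database in a stable order."""
    cursor = conn.execute("SELECT feature, label FROM measurements ORDER BY id")
    return cursor.fetchall()


def split(rows, train_fraction=0.8):
    """Split rows by position into a training set and a testing set."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


conn = build_source_db()
train_rows, test_rows = split(extract(conn))
conn.close()

print(len(train_rows), len(test_rows))
```

A real pipeline would usually add validation, scheduling, and incremental loading, but the extract-then-hand-over shape stays the same.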
Data Scientists
Data Scientists are responsible for building the Machine Learning models. This includes everything from analysis, exploration, and visualization of the data to training the model and testing it.
Data Scientists are well versed in high-level programming languages commonly used for building Machine Learning models, such as Python, Julia, and R. Along with their programming skills, they also have a solid understanding of the mathematical concepts required to understand Machine Learning, such as statistics, probability, linear algebra, and derivatives. Many organizations already store their data on the cloud, and processing the data and training models frequently requires dedicated hardware (such as GPUs and TPUs), so training the models in the cloud itself is increasingly popular. Therefore, knowledge of cloud computing and cloud storage providers such as Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure is required from Data Scientists just as it is from Data Engineers.
Since Python is the most used language for training Machine Learning models, knowledge of Machine Learning libraries and frameworks such as XGBoost, TensorFlow, and PyTorch is also required from Data Scientists.
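The train-then-test workflow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is a made-up toy set that follows y = 2x + 1, the model is a one-variable least-squares line, and the helper names `fit_line` and `mean_squared_error` are hypothetical. A real project would typically rely on one of the libraries named above instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between predictions and true labels."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data that follows y = 2x + 1.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]

# Hold out the last two points for testing; train on the rest.
train_x, test_x = xs[:8], xs[8:]
train_y, test_y = ys[:8], ys[8:]

slope, intercept = fit_line(train_x, train_y)
test_error = mean_squared_error(test_x, test_y, slope, intercept)

print(slope, intercept, test_error)
```

Keeping the test points out of training is what makes the reported error a fair estimate of how the model behaves on unseen data.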
Software Developers
Software Developers are not necessarily experts in Machine Learning or databases; they are experts at building software applications. They are the primary stakeholders responsible for integrating Machine Learning models with the rest of the software once a model has to be moved to production.
Software Developers are skilled at building software applications using programming languages such as Java, JavaScript, Python, Go, and Rust. With the rise of cloud computing, Software as a Service, Platform as a Service, and Infrastructure as a Service offerings are increasingly used for deploying applications. The ecosystem provided by leading cloud service providers such as GCP, AWS, and Azure makes moving to the cloud attractive as a one-stop shop. For this reason, knowledge of various cloud computing providers can be really helpful.
DevOps and ITOps
The role of DevOps Engineers is to transition the application (along with the Machine Learning model) from development to operations. The role of IT Operations Engineers is to maintain the application in production.
DevOps Engineers have a very diverse skill set and have experience with various tools such as Docker, Kubernetes, Git, testing frameworks, scripting languages, Jenkins, and Ansible. IT Operations Engineers have knowledge of monitoring, automation, scripting languages, and databases.
With the widespread adoption of the cloud, knowledge of leading cloud computing providers such as GCP, AWS, and Azure can be helpful.
MLOps Engineer
The MLOps Engineer is an emerging role in Machine Learning projects. MLOps Engineers are responsible for implementing MLOps best practices in the Machine Learning project.
MLOps Engineers have a skill set that overlaps with those of Data Scientists and DevOps Engineers, along with an understanding of MLOps best practices. MLOps Engineers should have knowledge of various Machine Learning and Deep Learning algorithms, scripting languages, version control tools, containerization tools, and CI/CD pipeline tools.
Other Roles
Besides the personas mentioned in this chapter, other personas may be involved depending on the project. These include, but are not limited to:
- Subject Matter Experts: Subject Matter Experts (SMEs) have knowledge of the business and specify the use case, goals, and feedback for Machine Learning models.
- Management: Management helps the Machine Learning project team achieve its goals.
- Risk Management: Risk Management personnel manage the risk associated with the use of Machine Learning models.
- Compliance: People in Compliance ensure that both data use and Machine Learning models comply with the local rules and regulations.
- Auditors: Auditors audit the Machine Learning models to check for fairness in predictions.