MLOps and Model Governance


ML operations (MLOps) has received a lot of attention recently because it promises to bring machine learning (ML) models into production quickly, effectively, and for the long term. MLOps extends DevOps from software engineering to the design, development, and sustainable deployment of ML models in software systems. It encompasses a host of processes and frameworks that support the deployment of ML. Automating reproducible data and ML pipelines reduces the time required to bring models into production (time-to-market). Figure 1 shows the most important phases of the ML life cycle according to CRISP-ML(Q). In total there are six interactive phases in the ML development process:

Business and data understanding
Data engineering
Model engineering
Quality assurance for ML systems
Deployment
Monitoring and maintenance

The operationalization of ML models is, however, not the only challenge facing many companies. The use of ML brings with it a range of responsibilities and obligations. To meet them, a company needs processes through which it:

Monitors access to ML models
Implements guidelines and legal requirements
Tracks the interactions with the ML models and their results
Determines on what basis a model is created

Taken together, these processes are termed model governance.

Model Governance as a New Challenge

The importance of model governance often only becomes clear to companies when ML models are about to go into production. Many enterprises have automated their ML pipelines, but are now faced with the task of bringing the models in line with legal requirements. Not all companies are ready for this: According to an Algorithmia study from 2021, some 56 percent of respondents cited the implementation of model governance as one of the biggest challenges for successfully bringing ML applications into production. In Germany, IDG Research Services’ ML 2021 study found that 26.2 percent of companies see compliance risks as the biggest challenge. In addition, 35.8 percent cite legal considerations, such as the transparency of algorithmic decision-making, as a problem that is still to be resolved.

The Use of AI in Germany

Where exactly is ML being applied in German companies? According to the 2021 Machine Learning study, 73 percent of large organizations (with more than 10,000 employees) and 59 percent of smaller companies use ML. The main industries in which ML is used are IT (76%), production (57%), research and development (54.1%), logistics (52.9%), and sales (50.6%). Concrete use cases are quality assurance in production (53.8%), error reduction (43.8%), the automation of processes (40.2%), automated case management (36.7%), optimization of the supply chain (34.9%), chatbots (30.2%), route optimization (30.2%), predictive maintenance (29.6%), customer self-service (29.6%), and intelligent product development (29.6%). For all these use cases, companies must implement MLOps in order to deploy ML models, and at the same time construct model governance frameworks so that the models adhere to regulations and the quality of the systems is ensured.

Model Governance Will Not Be Optional

Companies already have to adhere to a variety of regulations. New, AI-specific requirements will be added to the existing catalog of regulations, adding to its complexity. In April 2021 the EU published a package as an initial legal framework for AI. The proposal adopts an approach that categorizes the different types of AI systems into four different levels of risk.

Fig. 2: The category of the AI system defines the measures that enterprises must adopt in order to meet the requirements of the respective category

Category 1 (“unacceptable risk”)

AI software that is classified as a clear risk for security, livelihoods, and human rights is forbidden (e.g., social scoring systems).

Category 2 (“high risk”)

AI software that falls into the high-risk category is subject to strict regulation. This covers the following aspects: robustness, security, accuracy, documentation and logging, as well as appropriate risk assessment and risk mitigation. Further requirements are high-quality training data, freedom from discrimination, accountability, transparency, human monitoring, and conformity testing with proof of conformity through CE marking. Examples of ML systems in this category are private and public services (e.g., credit scoring) or systems used in school education or professional training to decide on access to education or on a person’s career development (e.g., marking of tests).

Category 3 (“limited risk”)

AI software in this category is subject only to an obligation of transparency. For example, chatbot users must be advised that they are interacting with AI software.

Category 4 (“minimal risk”)

AI software in this category is not subject to any regulation (e.g., spam filters).
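The four categories can be read as a simple lookup from risk level to obligations. The following is a minimal sketch of that reading, not legal guidance: the names RiskCategory, OBLIGATIONS, may_deploy, and required_measures are hypothetical, and the obligation strings merely paraphrase the category descriptions above.

```python
from enum import Enum


class RiskCategory(Enum):
    UNACCEPTABLE = 1
    HIGH = 2
    LIMITED = 3
    MINIMAL = 4


# Paraphrased from the category descriptions in the text; illustrative only.
OBLIGATIONS = {
    RiskCategory.UNACCEPTABLE: ["forbidden"],
    RiskCategory.HIGH: [
        "robustness, security, accuracy",
        "documentation and logging",
        "risk assessment and risk mitigation",
        "high-quality training data",
        "freedom from discrimination",
        "accountability",
        "transparency",
        "human monitoring",
        "conformity testing and CE marking",
    ],
    RiskCategory.LIMITED: ["transparency notice to users"],
    RiskCategory.MINIMAL: [],
}


def may_deploy(category: RiskCategory) -> bool:
    """Only systems with unacceptable risk are ruled out entirely."""
    return category is not RiskCategory.UNACCEPTABLE


def required_measures(category: RiskCategory) -> list:
    """Return a copy of the obligations attached to a category."""
    return list(OBLIGATIONS[category])
```

A chatbot classified as RiskCategory.LIMITED would thus carry only the transparency notice, while a credit scoring model classified as RiskCategory.HIGH would carry the full list.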

As the regulations are not only to apply to companies and individuals located in the EU but also to any company offering AI services within the EU, the law would have a similar scope as the GDPR. The proposals will need to be approved by the EU parliament and pass through the legislative procedures of the individual member states. Once the law enters into force – in 2024 at the earliest – high-risk AI systems must undergo a conformity assessment. Only then can the AI system be registered in the EU database. As a final step a declaration of conformity is required, so that AI systems can obtain the necessary CE mark.

The EU draft has room for improvement though. Probably the greatest challenge for the regulations is the definition of an “AI system” on which the entire proposal is based. This definition has been very broadly conceived, which may lead to different interpretations and will probably be the biggest obstacle to implementation. Companies will struggle to bring their technical work in line with the definitions of the regulations, as the general character of the regulations means it remains unclear how the definitions, evaluation criteria, and requirements are to be implemented in practice. In order to counteract the resulting uncertainties on the part of companies, the draft should be reformulated and refer to specific sectors, industries, and use cases.

It is also important to note that the EU regulations are not the only decisive aspect for model governance. It is also relevant for ML systems outside a regulated domain that, although they fall into the low-risk category, are associated with a high level of business risk. For example, if a company sells a spam filter that frequently deletes important emails, its position in the market is at risk. Model governance is therefore needed not only for the fulfillment of legal requirements, but also for the quality assurance of ML systems.

The Integration of Model Governance and MLOps

The question of how the integration of model governance and MLOps could look is dependent on two aspects:

Strength of the regulations – determined by the primary business domain, the risk category of an ML model, and the business risk

Some industries, such as the health and finance sectors, have a long tradition of strict regulation. In the future, however, the EU categorization will also determine which requirements need to be fulfilled: the lower the determined risk, the weaker the requirements. Existing standards can therefore be complemented by the new AI regulations. Finally, the influence of the AI systems on the success of a business is decisive: If the business is strongly dependent on AI systems, the management standards must be correspondingly strict.

Number of ML models that need to be integrated into the software systems

The number of ML models reflects two aspects: First, how strongly a company has integrated ML into the main business domain, and/or second, how ready the company is for the implementation of this undertaking from a technical and organizational perspective. A low number of ML models can mean either that ML does not play an important role for the business concept of a company, or that the company still has to work on deploying ML to the planned extent.

Figure 3 visualizes the integration of model governance and MLOps along the degrees of regulation and the number of models as Venn diagrams:

Fig. 3: The integration of model governance and MLOps along the degrees of regulation and the number of models as Venn diagrams

Below we explain each of these four boxes:

Variant 1: Many Models and Strict Regulation

Many models and a strongly regulated business domain is the most complex of the four scenarios. Model governance and MLOps are equally important here and closely interwoven – model governance must be integrated in every step of the MLOps life cycle (development, deployment, and operations).

Examples: Which Domains Are Subject to Strict Regulation?

Models deployed in the health and finance sectors are examples of strictly regulated domains. But models that fall into the high-risk category count as well. These include models deployed for process automation in critical infrastructures (e.g., transport) or automated evaluation processes in the fields of school education and professional training.

Framework for Model Governance

It is important to integrate model governance processes into every step of the ML life cycle from the very beginning. A good starting point is the framework for model governance, which covers both legal and corporate requirements and which is presented in this chapter.

The following table describes the main components of the framework for model governance, which should be integrated into every stage of the ML life cycle:


ML Life Cycle	Model Governance Components	Tasks and Artifacts
Development	Reproducibility	Model metadata management, model documentation
	Validation	Validation of the accuracy, of the KPI, of the reproducibility, and the explainability
Deployment & operations	Observation, visibility, control	Logging, metrics, auditing (e.g., the performance), ML infrastructure cost transparency, reporting on model use, managing of model endpoints and API, versioning of models and data sets
	Monitoring and alerting	Logging, metrics, and auditing
	Model service catalog	Model catalog or recording, saving and versioning, connection to the storage location of the models, connection to the source control of the models and associated data pipelines (GitHub, GitLab)
	Security	Data, information, and infrastructure security, compliance with IT standards, authentication, SSO and RBAC, management of model endpoints and API, management of keys and secrets, system testing
	Conformity and auditability	Model logging, metrics, audits, conformity testing and certificate of conformity (CE mark), authentication

Reproducibility and Validation

In the first phase of the ML life cycle, reproducibility must be created and the model validated.

Reproducibility is the ability to achieve the same result twice. In the same way that scientists specify test procedures precisely, ML reproducibility must make relevant metadata available in order to be able to reproduce models according to instructions. Model metadata management includes the type of algorithm, features, and transformations, data snapshots, hyperparameters, performance metrics, verifiable code from the source control, and the training environment.
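As an illustration of what such a metadata record might look like, the sketch below collects the listed items in one immutable structure and derives a fingerprint from them. The class name ModelMetadata, its field names, and the fingerprint method are assumptions made for this example and are not taken from a specific tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelMetadata:
    algorithm: str
    features: tuple
    transformations: tuple
    data_snapshot: str      # identifier or checksum of the training data
    hyperparameters: dict
    metrics: dict
    code_revision: str      # commit hash from source control
    environment: str        # e.g. container image or lock-file identifier

    def fingerprint(self) -> str:
        """Hash of all fields; equal records yield equal fingerprints."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two training runs that report the same fingerprint were configured identically, which gives reviewers a quick first check before they attempt to reproduce a model from its metadata.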

In regulated domains documentation is often part of the regulations. But good documentation is also beneficial within the company, as the risk of technical debt can be minimized through transparency and replicability. The following aspects should be included in the documentation: the explanation of the business context, a high-level explanation of the algorithm, model parameters, selection and definition of features, adaptations to the algorithm(s), instructions on the reproduction of the model and examples for the training of the algorithms, and examples of the making of predictions by means of the algorithm. The documentation can be practically supported by means of toolkits such as model cards and data sheets. Data sheets record which mechanisms or processes were used for the data acquisition or whether ethical review procedures were undertaken. Model cards complement data sheets and provide information on the form of model creation, the assumptions made during the development, as well as expectations regarding model behavior among different cultural, demographic, or phenotypical groups.

The validation of ML models is a multistage process with a range of metrics. These include performance metrics such as accuracy or statistically significant improvement in comparison to a control group with reference to the key performance indicator (KPI) in A/B testing. Determining whether the ML problem is correctly formulated can also be based on KPIs. In addition, the development team should test the reproducibility of the models from the metadata.
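One common way to check for a statistically significant improvement over a control group is a two-proportion z-test on a conversion-style KPI. The sketch below is one possible approach, not a method prescribed by the text; the function name and arguments are hypothetical, and it assumes a binary KPI, samples large enough for the normal approximation, and at least one conversion and one non-conversion overall.

```python
import math


def two_proportion_z_test(conv_control, n_control, conv_model, n_model):
    """One-sided test: is the conversion rate with the model higher?"""
    p_control = conv_control / n_control
    p_model = conv_model / n_model
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_control + conv_model) / (n_control + n_model)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_model))
    z = (p_model - p_control) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
significant = p_value < 0.05
```

The significance level of 0.05 is a conventional choice; a regulated domain may require a stricter threshold.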

A further important component of the validation is the explainability – can the performance of the model be explained? Can it be explained how individual features impact the prediction? Explanations for models with explainable AI can often only be approximated.
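Permutation importance is one widely used approximation for the question of how individual features affect predictions; the text does not name a specific technique, so it is chosen here purely as an illustration. The sketch assumes a model exposed as a plain predict(row) function, rows given as lists, and accuracy as the metric. All names are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=5, seed=0):
    """Average drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted.
            permuted = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(rows, shuffled)
            ]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores yields an importance of exactly zero, because shuffling it cannot change any prediction. The result describes the model's behavior on the given data, not a causal effect.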

After the development phase, the model governance processes must also be integrated into the deployment and operation phases of the ML life cycle:

Observation, Visibility, Control

Through these components, companies can make model access, changes to models and data, and the models and data themselves transparent, and thus make the process explainable to outside parties. This includes logging, metrics, and auditing: Values from model logging are processed and visualized in dashboards so that they can be recorded, analyzed, and communicated. Cost transparency covers not only costs but also the associated use of resources, which makes it easier to bill individual teams for their specific model and resource use. Model usage reports provide transparency on the success and acceptance of the individual models and can support access control.

The management of the model APIs also plays an important role: Those responsible for the management of the endpoints (including permissions for creation, changes, or deletions) need to be defined. The final component is the versioning of models and data sets. Versioning protects the immutability of the model, so that all models can be recovered without data loss or changes. It also ensures that every model prediction can be matched to the model version that produced it.
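A minimal sketch of these two properties, immutability and traceability of predictions, might look as follows. It assumes models are available as serialized bytes and derives the version from their content; ModelStore and predict_with_trace are hypothetical names, not the API of an existing registry.

```python
import hashlib


class ModelStore:
    """Append-only store: a version is derived from the model content."""

    def __init__(self):
        self._blobs = {}

    def put(self, model_bytes: bytes) -> str:
        version = hashlib.sha256(model_bytes).hexdigest()[:12]
        # setdefault never overwrites, so stored versions stay immutable.
        self._blobs.setdefault(version, model_bytes)
        return version

    def get(self, version: str) -> bytes:
        return self._blobs[version]


def predict_with_trace(store, version, predict_fn, features):
    """Attach the model version to every prediction for later audits."""
    model_bytes = store.get(version)
    return {
        "model_version": version,
        "prediction": predict_fn(model_bytes, features),
    }
```

Because the version is a content hash, storing the same model twice returns the same identifier, and any change to the model produces a new one.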

Monitoring and Alerting

Monitoring and alerting must occur automatically so that important metrics can be continually observed and abnormalities can be detected quickly. This monitoring system requires a suitable integration of platform and infrastructure with dashboard and monitoring tools, continuous monitoring of the uptime SLA as a metric for the stability and availability of the application, and alarm functions in the event of problems. The monitoring and alerting system ensures that production models fulfil all relevant requirements.
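At its core, automated alerting compares observed metrics against agreed thresholds. The following sketch shows that comparison for an uptime ratio and a model metric; the Threshold class, the function names, and the threshold values are assumptions for illustration, and a real setup would delegate this to a monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str
    minimum: float


def uptime_ratio(health_checks):
    """Share of successful health checks, as a proxy for the uptime SLA."""
    return sum(1 for ok in health_checks if ok) / len(health_checks)


def evaluate_alerts(metrics, thresholds):
    """Return one alert message per violated or missing metric."""
    alerts = []
    for threshold in thresholds:
        value = metrics.get(threshold.metric)
        if value is None:
            alerts.append(f"{threshold.metric}: no data")
        elif value < threshold.minimum:
            alerts.append(
                f"{threshold.metric}: {value:.3f} below {threshold.minimum:.3f}"
            )
    return alerts
```

Treating a missing metric as an alert is a deliberate choice in this sketch: a model that silently stops reporting should not pass as healthy.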

Model Service Catalog

The model service catalog is an internal marketplace for all ML models in a company. The catalog itself should have a good UX, be connected with the location where the models are saved, and always show the relevant metadata for a model, such as the latest version and inputs and outputs. Employees with the corresponding rights can access the catalog, search for models, and retrieve information on the required models. In this way the model service catalog facilitates the usability and reuse of models within a company. At the same time it fulfils a function for stakeholders: As it provides a structured overview of all services, potential future users can access information on the abilities of the models.
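The core of such a catalog is a searchable set of entries with the metadata named above. The sketch below keeps everything in memory; the classes, the role name catalog-reader, and the storage URI in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    name: str
    latest_version: str
    storage_uri: str        # where the model artifact is saved
    inputs: list
    outputs: list
    description: str = ""


class ModelCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, term, user_roles, required_role="catalog-reader"):
        """Case-insensitive search over names and descriptions."""
        if required_role not in user_roles:
            raise PermissionError("user may not read the model catalog")
        term = term.lower()
        return [
            entry for entry in self._entries.values()
            if term in entry.name.lower() or term in entry.description.lower()
        ]


catalog = ModelCatalog()
catalog.register(CatalogEntry(
    name="churn-predictor",
    latest_version="3",
    storage_uri="s3://example-bucket/models/churn/3",  # made-up location
    inputs=["tenure_months", "monthly_fee"],
    outputs=["churn_probability"],
    description="Predicts customer churn",
))
matches = catalog.search("churn", user_roles={"catalog-reader"})
```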

Security

ML security is an important point on the model governance agenda. A study by Gartner estimates that by 2022 thirty percent of cybersecurity attacks will have an ML-specific character.

To guard against these attacks, measures that fulfil the security standards must be adopted. Models can be made accessible, for example, by opening an HTTP endpoint, but this creates a risk of misuse. This is why adherence to IT standards (DNS, proxies, and load balancing for data traffic) is so important, although this complexity may require the services of third-party providers. The endpoints must be secured (for example by means of tokens) and managed so that only authorized users can create, change, or delete them. Authentication, SSO, and RBAC are further important points on the security list. ML models must be integrated into token-based authentication solutions, so that only eligible users can query the model. Access control also requires role-based access control (RBAC) that checks permissions. The infrastructure itself must likewise be secured by RBAC. In addition, key and secret management should provide the models with a solution for the creation, storage, and management of all secrets and keys. Finally, the models must undergo a security audit. This is another reason why it is so important to involve IT and company experts from an early stage, so that all security requirements are fully considered.
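The interplay of token-based authentication and RBAC can be sketched with signed tokens and a role-to-permission table. This is a simplified illustration only: the roles, actions, and token format are invented for the example, the key is hard-coded where a real system would read it from a secret manager, and tokens here never expire.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-me"  # assumption: supplied by key and secret management

# Hypothetical roles and the actions they may perform.
ROLE_PERMISSIONS = {
    "consumer": {"query"},
    "ml-engineer": {"query", "create_endpoint", "update_endpoint"},
    "admin": {"query", "create_endpoint", "update_endpoint", "delete_endpoint"},
}


def _sign(payload: str, key: bytes) -> str:
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(user: str, role: str, key: bytes = SECRET_KEY) -> str:
    payload = f"{user}:{role}"
    return f"{payload}:{_sign(payload, key)}"


def authorize(token: str, action: str, key: bytes = SECRET_KEY) -> bool:
    """Verify the signature first, then check the role's permissions."""
    try:
        user, role, signature = token.rsplit(":", 2)
    except ValueError:
        return False
    expected = _sign(f"{user}:{role}", key)
    # Constant-time comparison avoids leaking information via timing.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return False
    return action in ROLE_PERMISSIONS.get(role, set())
```

In practice an established standard and an identity provider would take the place of this hand-rolled token; the sketch only shows where authentication ends and the permission check begins.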

Protection against ML-specific cybersecurity attacks is also relevant here. Data and information security play an important role, as models are often trained using sensitive data. Moreover, adversarial attacks on ML systems frequently work with data, so data security is a major challenge for companies. One aid is the Adversarial ML Threat Matrix. This matrix resembles classic attack chain modeling and builds on the established MITRE ATT&CK, a globally accessible knowledge base on the tactics and techniques of attackers. MITRE ATT&CK is used as the basis for the development of specific threat models and methods in the private sector, in governments, and in the realm of cybersecurity products and services. The Adversarial ML Threat Matrix picks up on this concept for ML security: it contains a collection of known weaknesses and the associated attacks and is therefore a practical framework for security.

Conformity and Auditability

For models in the high-risk category, the EU prescribes conformity testing and CE marking in its draft. In order to fulfill the compliance and auditability requirements of a heavily regulated domain and obtain a certificate of conformity, the model governance framework should be as automated, transparent, and complete as possible. But systems outside of the high-risk category also have to undergo conformity testing if they are associated with a high level of business risk. Model logging, metrics, and audits that demonstrate compliance with requirements are important for proving conformity and auditability. This includes collated and visually documented model information as metrics in appropriate dashboards, model and data versioning, and the results of the audits (tested components of the validation in the development phase). The fulfillment of security requirements must also be demonstrated: Permissions and authorized access to ML applications must exist alongside functioning role-based authentication. Each domain is subject to different regulations of different strengths. Conformity and auditability are very complex and often require extensive experience. This makes it all the more important to include compliance and security experts in model governance strategies from the very beginning, so that no gaps arise.

Variant 2: Many Models and Little Regulation

A low level of regulation is found in domains requiring less strict regulation, where the ML model does not fall into a high-risk category, and where the associated business risk is low. But even in this instance, companies need at least a basic level of model governance as the high number of models needs managing. At the same time, the focus here is on operationalization: A large number of models requires efficient MLOps.

The weak regulation and the focus on MLOps highlight that model governance in this variant is part of MLOps, rather than an independent framework as in the previously presented variant. In order to understand how model governance can be integrated into MLOps, an overview of MLOps is useful. The Google paper “Practitioners guide to MLOps: A framework for continuous delivery and automation of machine learning” divides MLOps into seven integrated and iterative processes: ML development, training operationalization, continuous training, model deployment, prediction serving, continuous monitoring, and data and model management. We use this MLOps framework to explain the integration of model governance and MLOps in this second variant (many models and little regulation).

The ML development phase does not come to an end with a finished model, but rather with the implementation of a training pipeline. If new data are available or a loss in performance of the model is noticed, this acts as a trigger that kicks off the (continuous) training pipeline. The model is retrained and is then returned to production. The training pipeline consists of a data engineering component (data ingestion, data validation, data transformation) and a model component (model training, model evaluation, model validation). The training pipeline itself must also be versioned and tested. As a first step in the training, new training data are loaded from the data repository. Data and feature repositories standardize the definition and storage of the data, ensure data consistency for training and inference, and shorten the data preprocessing and feature engineering processes. The data pass through all steps of the data engineering process and are then used to retrain the model. After the training the model is evaluated and, once all performance metrics have been verified, saved as a candidate model in the model registry. All metadata and artifacts arising during the training run are saved in the ML metadata and artifact repository. ML artifacts include statistics and data schemas, trained models, and metrics; ML metadata are the information about these artifacts (pipeline run ID, trigger, process type, step, start and end time, status, environmental configurations, and input parameter values).
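The trigger and the metadata recorded per pipeline step can be sketched as follows. The function names, default thresholds, and the shape of the metadata records are assumptions for this example; each step is a plain function that receives the previous step's artifact and the input parameters.

```python
import time
import uuid


def should_retrain(new_rows, current_accuracy, min_new_rows=1000, min_accuracy=0.9):
    """Trigger: enough new data has arrived or performance has dropped."""
    return new_rows >= min_new_rows or current_accuracy < min_accuracy


def run_pipeline(steps, trigger, params):
    """Run the steps in order and record ML metadata for each of them."""
    run_id = uuid.uuid4().hex
    metadata = []
    artifact = None
    for name, step in steps:
        started = time.time()
        try:
            artifact = step(artifact, params)
            status = "succeeded"
        except Exception as exc:
            # A failed step yields no artifact and stops the run.
            artifact = None
            status = f"failed: {exc!r}"
        metadata.append({
            "run_id": run_id,
            "trigger": trigger,
            "step": name,
            "start": started,
            "end": time.time(),
            "status": status,
            "params": dict(params),
        })
        if status != "succeeded":
            break
    return artifact, metadata
```

A run would pass the steps as (name, function) pairs in the order given above, from data ingestion to model validation, and store the returned metadata in the ML metadata repository.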

Model registries manage the life cycle of ML models. After a model is recorded in the registry, model governance processes determine across multiple steps whether the candidate model will be deployed in the production system (model deployment). After successful deployment the model provides predictions for every input (prediction serving). Model governance continuously monitors the performance of the production system (continuous monitoring) and collates all relevant metrics (e.g., accuracy) in an independent report. As a result of the monitoring, losses in performance or changing input data are diagnosed immediately and retraining is kicked off, so the cycle begins again.

Fig. 4: With their central role in the continuous training pipeline, data and model management are cross-cutting processes in the MLOps life cycle

Model Governance as Part of Model Management

Model governance encompasses the recording, auditing, validation, approval, and monitoring of models for deployment and is the final layer of control that a model must pass before it can be deployed in the production environment. The list below summarizes the model governance components with the necessary tasks and artifacts. To carry out these tasks, model governance draws on information from the ML metadata, the artifact repository, and the model registry.

Saving/versioning of models
Evaluation and explainability
Audit
Approval
Report (summary, visualization, highlighting of metrics) on the quality assurance of the productive model

The saving and versioning of models comprises the adding or updating of model features as well as the tracking of model versions and feature changes. For this purpose the model registry saves all model versions in order to ensure reproducibility and accountability (similar to the model versioning in the “observation, visibility, control” component of the model governance framework).

The evaluation and explainability play a second important role. Here, a model is compared with a model already in operation in production through the collation of metrics and with business KPIs (similar to the validation of the development phase).

The audit is a fundamental aspect. Here, changes must be checked and approved in order to control risks in different categories (for example business, financial, legal, security, data protection, reputational, and ethical risks) as described in this end-to-end framework for the internal auditing of algorithms. This component is equivalent to a less strict variant of the certificate of conformity. The approval governs the management of the model deployment and controls the data traffic that is directed to the model. Here it is particularly clear how model governance and MLOps interact: The automated deployment is dependent on the fulfillment of the model governance tasks.
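The dependency of automated deployment on the governance tasks can be expressed as a gate that a candidate model must pass. The sketch below is one possible shape for such a gate: the task names, the Candidate class, and both functions are hypothetical, and the metric comparison assumes that higher values are better.

```python
from dataclasses import dataclass, field

# Hypothetical task names, loosely following the component list above.
GOVERNANCE_TASKS = ("versioned", "evaluated", "audited", "approved")


@dataclass
class Candidate:
    version: str
    metrics: dict
    completed: set = field(default_factory=set)


def beats_production(candidate_metrics, production_metrics, min_gain=0.0):
    """True if the candidate matches or exceeds every production metric."""
    return all(
        candidate_metrics.get(name, float("-inf")) >= value + min_gain
        for name, value in production_metrics.items()
    )


def can_deploy(candidate):
    """Deployment is allowed only once every governance task is done."""
    missing = [task for task in GOVERNANCE_TASKS if task not in candidate.completed]
    return (not missing, missing)
```

A deployment pipeline would call can_deploy as its last step and stop, reporting the missing tasks, whenever the first element of the result is False.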

Finally, a report brings together the summary, visualization, and highlighting of model performance metrics collected during the monitoring process.

Variant 3: Few Models and Little Regulation

This constellation applies to companies that develop models outside of regulated industries, where the models do not belong to the high-risk category according to the EU proposals and their use does not involve a high level of business risk. The low number of ML models can mean either that ML does not play an important role in the business strategy of the company or that the company is still in the experimental phase of ML development.

A low number of models in combination with light-touch regulation is the simplest of the possible cases, as the scope of the regulations and the number of models are both limited. The lack of requirements and the low number of models mean that model governance is optional, although it is still recommended in order to ensure a high level of technical quality. The components of the development phase are also relevant for this scenario.

Variant 4: Few Models and Strict Regulation

In this scenario, where companies in strictly regulated industries have few models, the only difference from variant 1 is the low number of models. As fewer models need managing, MLOps is less extensive than in scenarios in which many models are employed. The integration of model governance frameworks with MLOps remains unchanged, however: In heavily regulated domains, model governance must cover the complete MLOps life cycle, even if only a few models are in use (see variant 1).
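The four variants form a two-by-two lookup over the number of models and the strength of regulation. As a compact recap, the sketch below encodes that lookup; the function name and the wording of the descriptions are illustrative, and what counts as "many" models or "strict" regulation is left to the caller, since the text gives no numeric boundary.

```python
def governance_variant(many_models: bool, strict_regulation: bool):
    """Map the two dimensions to the variant number and its governance scope."""
    table = {
        (True, True): (1, "model governance integrated into every step of the MLOps life cycle"),
        (True, False): (2, "model governance as part of MLOps (model management)"),
        (False, False): (3, "model governance optional, development-phase components recommended"),
        (False, True): (4, "model governance covers the full life cycle, MLOps less extensive"),
    }
    return table[(many_models, strict_regulation)]
```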

Summary – The Main Components of Model Governance

Although the strength of the regulations and the number of models determine how model governance should be implemented, with the exception of the third scenario some common denominators can be found with regard to technical artifacts:

Comprehensive model documentation or reports. This includes the reporting of the metrics through appropriate visualization techniques and dashboards
Versioning of all models to create transparency for stakeholders (explainability and reproducibility)
Auditing processes (automated approval auditing or CE certification as part of conformity testing)
Comprehensive data documentation to guarantee high data quality and adherence to data protection
Management of ML metadata
Validation of ML models
Ongoing monitoring and logging of model metrics
