This article is also available in German.

The risks of unregulated use of AI technologies were addressed at the EU level in April 2021 in a proposal entitled “Regulation on a European Approach for Artificial Intelligence.” This article considers the societal challenges associated with AI. The second article of this two-part series looks in more detail at the technical aspects of the topic.

AI Models Are Not Inherently Neutral

If we are honest about the challenges faced by democratic societies, the risks of unregulated AI software cannot be ignored. Machine learning and artificial intelligence may be technical methods, but the problems they can solve are not necessarily technical in nature.

AI is fundamentally the application of algorithms to a specific problem. To solve this problem, the algorithms are trained with data. Algorithms belong to the field of mathematics, which is still widely regarded as a neutral discipline. Does the neutrality of mathematics therefore extend to its applications? No, because AI systems and the mathematical algorithms underlying them are not inherently neutral. The data that the algorithms learn from are provided by human beings. It is therefore humans who set the direction in which AI develops a life of its own, and humans have proven to be unsuitable as objective guides.

Risks of Unregulated AI: Bias in Facial Recognition Systems

In 2018 the Ghanaian-American computer scientist Joy Buolamwini, together with Timnit Gebru, a former ethical AI researcher at Google, published the study Gender Shades, in which they analyzed the facial recognition systems of Microsoft, IBM, and Megvii. They showed that facial recognition was less accurate for people with darker skin than for people with lighter skin. Differences in the accuracy of the algorithms were also found with regard to sex: all the tested facial recognition systems worked better for men than for women. IBM’s system was the least accurate: its accuracy for dark-skinned women was around 34 percentage points lower than for light-skinned men. Where did this bias come from?

For facial recognition systems to achieve the desired results, the composition of the images in the training data should correspond to the distribution of facial characteristics to be expected in the real world. Only with a high level of diversity and broad coverage in the training data can a model learn the different manifestations of facial features. If the training data contain biases that reflect social disparities, these biases will be carried over into the generated models.
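What such a comparison between training data and the real world could look like in practice is illustrated by the following minimal Python sketch, which compares the group breakdown of a hypothetical image training set against an expected real-world distribution. The column name skin_type and the target shares are illustrative assumptions, not values from the datasets discussed here.

```python
# Minimal sketch: compare the demographic breakdown of a training set with an
# expected real-world distribution. Column name and shares are hypothetical.
import pandas as pd

def representation_gap(df: pd.DataFrame, column: str, expected: dict) -> pd.DataFrame:
    """Observed vs. expected share per group, and the gap between them."""
    observed = df[column].value_counts(normalize=True)
    report = pd.DataFrame({"observed": observed, "expected": pd.Series(expected)}).fillna(0.0)
    report["gap"] = report["observed"] - report["expected"]
    return report.sort_values("gap")

# Example with made-up numbers: a training set skewed toward lighter skin types.
train_meta = pd.DataFrame({"skin_type": ["light"] * 800 + ["dark"] * 200})
print(representation_gap(train_meta, "skin_type", expected={"light": 0.5, "dark": 0.5}))
```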

Given that facial recognition is used in the USA at airports to confirm the identity of travelers and by the police for criminal prosecution, the results of the study became the initial spark for a civil rights movement that demanded measures to counteract the problem of algorithmic bias. As a consequence of this widespread indignation, IBM decided to put its facial recognition system on ice. IBM then went one step further and presented “Diversity in Faces,” a dataset of annotations for one million images of human faces.

The annotations were generated with careful consideration of scientific facial coding schemes. The aim was to improve the diversity of datasets and ultimately lead to fairer and more balanced algorithmic decision logic. It turned out, however, that the photos in the dataset had been taken from the photo-hosting website Flickr without the permission of the users. The attempt to solve a fairness problem thus turned into a privacy problem.

Language Models on the Rise

Training data also play an essential role for language models. These are unsupervised AI systems that are trained to predict the likelihood of a token (character, word, or string) given either its preceding context or its surrounding context. Many people use large language models directly or indirectly, even if they are not always aware of it. Whether in search engines, text generation, or conversational tools, the importance of language models in everyday life is established and will continue to grow. To satisfy the models’ hunger for data, large amounts of text are collected from the internet as training data.
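As a toy illustration of what “predicting the likelihood of a token given its preceding context” means, the following sketch estimates bigram probabilities from a tiny made-up corpus. Real language models use neural networks and vastly larger corpora, but the underlying prediction task is the same.

```python
# Toy illustration of "predict the likelihood of a token given its preceding
# context": a bigram model estimated from a tiny corpus.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat slept".split()

# Count how often each word follows each preceding word.
following = defaultdict(Counter)
for prev, curr in zip(corpus, corpus[1:]):
    following[prev][curr] += 1

def probability(token: str, context: str) -> float:
    """P(token | preceding context word), estimated from counts."""
    counts = following[context]
    total = sum(counts.values())
    return counts[token] / total if total else 0.0

print(probability("cat", "the"))  # 2/3: "the" is followed by "cat" twice, "mat" once
print(probability("mat", "the"))  # 1/3
```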

Using the internet as a source of data comes with the risk of “absorbing the hegemonic worldview from (the) training data,” as a paper from the University of Washington puts it, and of furnishing the algorithm from the very beginning with language characterized by bias. The result is models that contain stereotypical and pejorative associations with regard to gender, race, ethnicity, and disability. When these models are rolled out, either as part of a classification system or as a language generator, this bias is reinforced in the application.

Moreover, a new cycle starts with the deployment of the model: the text produced by the language model is disseminated with its embedded bias, not only reinforcing stereotypes but also serving as training data for new models. The risk of subtle bias or offensive language generated by the language model becomes a grave problem for the people it is directed against. This encompasses both individual psychological harm and implications for society as a whole through the reinforcement of sexist, racist, and other prejudices. In the worst case, the strengthening of such ideologies can lead to violence.

Necessity of Regulations and Appropriate Frameworks

The Gender Shades study and the possible risks of large language models show how difficult it is to recognize, before software is deployed, whether and what potential for harm is inherent in a system. If that harm does materialize after deployment, locating the source of the error can be impossible, and the consequences can be enormous. There is, of course, a wide range of other uses of AI in a broader societal context. For example, AI can be used to make credit decisions or to support the hiring of staff.

With the increasing integration of AI into society, opportunity loss, economic damage, and social stigmatization are the main risks of an unethical use of AI technologies. It is clear that the algorithms themselves cannot be held accountable; accountability lies with the companies that develop the AI software. To ensure that their handling of ethical guidelines is not merely a reaction to external pressure, such as the threat of reputational damage, binding statutory requirements for the regulation of AI software are needed, along with a realignment of company policies.

Ethical Guidelines: “Lukewarm, Shortsighted, and Deliberately Vague”?

This necessity has been recognized by the EU. In April 2021 the Regulation on a European Approach for Artificial Intelligence was published, the first legal framework for AI that addresses the risks of artificial intelligence. The document recognizes that the use of biometric identification systems involves “specific challenges for the protection of fundamental rights and freedoms.” The proposal also states that “technical inaccuracies can lead to biased results and entail discriminatory effects” (Section 70). Age, ethnicity, sex, or disabilities are particularly relevant here.

The critical influence of the training data on the functioning of the AI system is also highlighted (Section 45). The data should be “sufficiently relevant, representative, free of errors and complete in view of the intended purpose” in order to ensure that the risks to security and fundamental rights are minimized. As this legal framework is to apply not only to companies and private persons located in the EU but also to any company offering AI services within the EU, the geographical scope of the law would be comparable to that of the GDPR and have a global impact on the development of AI.
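The proposal does not specify how “free of errors and complete” is to be verified in practice. Purely as an assumption-laden example, the following sketch shows a few basic checks (missing values, duplicates, label balance) that a team could run on a hypothetical tabular training set; the column names are placeholders.

```python
# Sketch of basic data-quality checks in the spirit of "free of errors and
# complete": row count, missing values, duplicates, and label balance.
# The columns "age" and "credit_granted" are hypothetical examples.
import pandas as pd

def basic_quality_report(df: pd.DataFrame, label_column: str) -> dict:
    return {
        "rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "label_distribution": df[label_column].value_counts(normalize=True).to_dict(),
    }

data = pd.DataFrame({
    "age": [25, 40, None, 31],
    "credit_granted": [1, 0, 0, 1],
})
print(basic_quality_report(data, "credit_granted"))
```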

The idea of regulating AI software is not new for the EU. In 2019 an expert group commissioned by the European Commission published ethics guidelines for the development of AI software. The “Requirements for Trustworthy AI” describe the desired characteristics of AI systems, according to which artificial intelligence should be lawful, ethical, and robust. From these characteristics, seven key requirements were derived: human agency and oversight (AI systems should empower human beings to make informed decisions), technical robustness and safety, privacy and data governance, transparency, diversity, non-discrimination, and fairness, societal and environmental well-being, and accountability.

Although these guidelines were without doubt a step in the right direction, their content drew criticism; one member of the expert group described them as “lukewarm, shortsighted, and deliberately vague.” Without practical guidance on how to actually implement and operationalize the ethical principles, an important precondition for enforcing their normative claims is missing. Whether the freshly published regulation can achieve exactly that remains unclear at the time of writing.

Model Governance – A New Corporate Policy for the Regulation of AI Software

It is clear, however, that the European attempt at regulatory control holds companies that develop AI software accountable. To live up to this accountability, companies need a corporate policy (model governance) that sets out a clear framework for the use of AI software. As a “system by which the whole organization is directed, controlled and held accountable,” model governance makes companies answerable and reduces the risks that the use of AI software involves.

The principle of model governance is not new: there are countless fields with a tradition of verifiable processes and control mechanisms which, although not always error-free, make it possible to achieve safety standards that would otherwise be unattainable. These include safety-critical and regulated industries such as medicine, aviation and space travel, and finance.

In the field of AI, model governance must define access controls as well as inspection and validation processes, which are needed to determine whether AI systems meet ethical requirements. Roles and responsibilities must also be defined, along with standardized change logs and inspection records that document every change to data, model, or system. Such record-keeping not only ensures external transparency but also facilitates bug fixing and helps to fulfill legal and regulatory requirements.
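To make the idea of standardized change logs more tangible, the following sketch defines a simple append-only log entry for changes to data, model, or code. The field names are illustrative assumptions rather than an established schema.

```python
# Minimal sketch of a standardized, append-only change log for model governance.
# Field names are illustrative assumptions, not a standard schema.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class ChangeLogEntry:
    component: str          # "data", "model", or "code"
    description: str        # what was changed and why
    author: str             # who made the change
    approved_by: str        # who reviewed the change (separation of roles)
    artifact_version: str   # e.g. dataset hash or model version tag
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def append_entry(path: str, entry: ChangeLogEntry) -> None:
    """Append one JSON record per line so that existing records are never rewritten."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(entry)) + "\n")

append_entry("model_changelog.jsonl", ChangeLogEntry(
    component="data",
    description="Added 10,000 newly labelled images to the training set",
    author="data-team",
    approved_by="model-risk-officer",
    artifact_version="dataset-v1.3",
))
```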

These requirements include reproducible model results. Models are dynamic, but it must nonetheless be possible to match each model result to the model version that generated it. This can be important, for example, when users complain about a specific algorithmic decision. The clear stipulation of roles, processes, and records allows a more consistent production process that minimizes the operational, legal, and regulatory risks for companies.
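One way to match a model result to the model version that generated it is to attach a version identifier to every prediction. The following sketch, a simplification under assumed conditions, uses a hash of the serialized model as a fingerprint; in a real system the version tag of a model registry would typically be used instead.

```python
# Sketch: tag every prediction with the version of the model that produced it,
# so that a complaint about a specific decision can later be traced back.
# DummyModel stands in for any model object with a predict() method.
import hashlib, json, pickle
from datetime import datetime, timezone

def model_fingerprint(model) -> str:
    """Hash of the serialized model as a simple version identifier."""
    return hashlib.sha256(pickle.dumps(model)).hexdigest()[:12]

def predict_with_provenance(model, features: dict) -> dict:
    record = {
        "prediction": model.predict([list(features.values())])[0],
        "model_version": model_fingerprint(model),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "inputs": features,
    }
    print(json.dumps(record, default=str))  # in production: write to an audit store
    return record

class DummyModel:
    def predict(self, rows):
        return [sum(row) > 50 for row in rows]

predict_with_provenance(DummyModel(), {"age": 42, "income": 30})
```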

Model governance is thus the foundation on which the aspects to be regulated are built: ethics, fairness and bias (overgeneralized assumptions about a specific group of people), interpretability and explainability, and the robustness and security of AI systems. These core aspects are summarized under the term responsible AI.

The Newness of the Field and Lack of Best Practices

At this point it is important to note that terms like model governance or responsible AI have not yet been defined by consensus; the field is new and lacks established frameworks and best practices. There is agreement that AI should be ethical. What that actually means in concrete terms, what makes AI ethical, and which requirements are necessary for its implementation, however, remains fragmented.

The newness of the field may be surprising at first. After all, AI has been heavily hyped within data science in recent years. But while AI itself is not a new science, its applications and their market penetration are still immature. In particular, there is a big difference between developing an AI model in a local Python program or Jupyter Notebook and deploying that model in a production environment. Herein lies the problem: only around 13 percent of all data science projects make it into production.

AI Systems as a New and Ongoing Challenge

How does AI software differ from “classic” software without AI? In contrast to classic software, whose main component is the code, AI software consists of data, model, and code together. These three components are mutually dependent: a change in the data, for example, causes a change in the model (the change-anything-changes-everything principle). If, after deployment in a production environment, a model is confronted with data that differ significantly from the training data, the behavior of the model can change.

Potential risks of AI systems can therefore emerge over time. Because of the dynamic interdependence between data, model, and code, it is impossible to prevent all of these risks before deployment. For this reason, the Regulation on a European Approach for Artificial Intelligence calls for particular attention to be paid to AI systems that continue learning in a production environment and explains the necessity of a post-market monitoring system (Section 83). As an inspection and validation process, this monitoring system should be embedded in the model governance framework.
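What one building block of such a post-market monitoring system could look like is shown in the following sketch: it compares the distribution of a single numeric feature in production against the training data using a two-sample Kolmogorov-Smirnov test. Feature, data, and threshold are assumptions for demonstration purposes only.

```python
# Minimal sketch of post-market data-drift monitoring: compare one numeric
# feature in production against the training data with a two-sample
# Kolmogorov-Smirnov test. Data and threshold are made up for illustration.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_age = rng.normal(loc=35, scale=8, size=5_000)    # data the model was trained on
production_age = rng.normal(loc=45, scale=8, size=1_000)  # data seen after deployment

statistic, p_value = ks_2samp(training_age, production_age)
if p_value < 0.01:
    print(f"Drift suspected (KS={statistic:.3f}, p={p_value:.2e}): trigger a review.")
else:
    print("No significant drift detected.")
```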

Is There Consensus for Ethical Principles?

Since established practices for the successful long-term development of AI software are scarce, it is not surprising that they are also lacking when it comes to implementing ethical guidelines in AI systems. Although the guidelines are currently only weakly binding, in the absence of universal standards they serve as an orientation aid that should be used as a “North Star” in AI development.
Can a course be found that is supported by the majority of stakeholders?

A study by ETH Zurich published in the journal Nature Machine Intelligence considers this question and examines whether ethical principles converge toward a few central concepts. For this purpose, 84 ethics guidelines published by private companies, the public sector, and academic institutions were examined and scanned for similarities. While no single ethical principle appeared in all 84 publications, the study found convergence on transparency (87%), justice and fairness (81%), non-maleficence (71%), responsibility (71%), and privacy (56%).

Responsible AI Concerns Us All

Alongside the need for a legal and company-internal framework, it is clear that responsibility is not limited to developers. Domain knowledge has always been key to creating useful software products. This applies especially to AI software, which requires not only developers and domain experts but also legal and compliance experts, and ultimately a public that engages with the use of AI in a reflective way.

This article is therefore aimed at a broad audience and has attempted a balancing act between different groups of readers. On the one hand, a nontechnical explanation of the subject was required. On the other hand, technical principles are unavoidable if one wishes to understand how AI systems can be scrutinized for compliance with ethical principles and what standardized verification processes within model governance could look like. This balancing act, however, also emphasizes the close connection between AI and society. The second part of this article considers a more technical definition of fairness as well as test strategies.