István ÜVEGES: Ethical Machines – the Problem of Right Decision and Morally Right Decision in Machine Learning
As AI-powered systems permeate deeper into our lives, the specter of ethics looms larger than ever before. The technology’s ability to make decisions has opened Pandora’s box of moral questions. Can machines truly discern right from wrong? From trolley problems to cultural nuances, the journey towards ethical AI is riddled with challenges.
The moral implications of AI decisions have only recently been investigated in a meaningful way, and the results in some cases are contradictory.
Although the concept of robots walking among humans and interacting with them has been around since ancient mythology, it is only today that artificial intelligence capable of intervening in the functioning of the social order has become a reality. As we rely more and more on technology in our everyday lives, the question arises: will future AI-based solutions be able to make not only correct but also ethical decisions? In this post, we therefore briefly review the questions that arise in the context of ethical AI, as well as the main problems that are currently known, and the possible answers to them.
By 2023, AI-based solutions are present in more and more areas of our lives, some obvious, others more hidden. AI algorithms can be found in completely innocuous environments, such as spam filters in email clients, content recommendation systems in media providers, or even chatbots in the customer service departments of companies. While it is generally true that they help us to make decisions that make our lives easier or more convenient, none of these have a significant impact on our future. However, not all the applications we already know are so ‘harmless’. For example, social media platforms employ AI algorithms that can trap us in opinion bubbles by strongly prioritizing content based on our interests. Additionally, weaponized AI can be used to support activities such as election manipulation.
Based on current trends, the capabilities of machine learning-based systems are expected to continue to evolve, and their prevalence is expected to increase soon. Although the emergence of Artificial General Intelligence (AGI) is still decades away according to the most optimistic estimates, even existing solutions can pose social, financial, and security risks in irresponsible hands. In the (probably not too distant) future, AI will have to make decisions in situations where the outcome will have a significant impact on our daily lives. This raises the question of whether it is possible to encode information/knowledge into these systems that in some way maps the values associated with morally correct or ethical choices from a human perspective.
Current machine learning systems are based on data (training data), which is used by an algorithm to learn the patterns in the data and then to make ‘decisions’ in new situations. In its simplest approach, the creation of ethical AI can be imagined in a roughly similar way. Given a huge amount of training data, we can assume that the algorithm can learn the patterns that characterize human morality (without, of course, the self-reflection and empathy that are currently capabilities of humans only). To better understand the issue, it is worth starting from an existing practical example.
The Moral Machine Experiment was a 2018 project in which philosophers and data scientists collected around 40 million responses from more than 200 countries to the ‘trolley problem’, perhaps one of the most famous thought experiments. Here, the decision-maker is tasked with making an ‘impossible’ decision in accordance with their own moral principles. In the basic situation, the respondent is driving a trolley when he notices that the brakes on the vehicle have failed. On approaching a switch, he discovers that if he continues on his current route, he will hit five people on the track, who will die in the collision. By throwing the switch, he can divert the vehicle to another track, but someone is standing on that track as well, and this decision would cause that person’s death. The question, in simple terms, is whether the respondent in such a situation refrains from acting and thus passively causes the death of five people, or acts and thus ‘intentionally’ kills one person.
The experiment highlighted, among other things, the difference in preferences between collectivist and individualist societies. It also revealed significant differences between respondents from Western and Eastern cultures regarding their tendency to be passive (not to act). The diversity of the responses and their country- and culture-specificity raise an important question: to what extent can democratizing these decisions provide a stable basis for programming machine learning models capable of making ethical decisions?
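The aggregation problem behind this question can be made concrete with a minimal Python sketch. The responses below are invented for illustration (the group names and counts are hypothetical and are not Moral Machine data). The sketch only shows that a single global majority vote can produce a training label that contradicts the majority within one of the groups.

```python
from collections import Counter

# Hypothetical responses: (respondent group, chosen action).
# The groups and counts are invented for illustration only.
responses = (
    [("group_a", "divert")] * 70
    + [("group_a", "stay")] * 30
    + [("group_b", "divert")] * 20
    + [("group_b", "stay")] * 40
)


def majority(labels):
    """Return the most common label in an iterable of labels."""
    return Counter(labels).most_common(1)[0][0]


def majority_by_group(rows):
    """Return the majority label separately for each group."""
    grouped = {}
    for group, action in rows:
        grouped.setdefault(group, []).append(action)
    return {group: majority(actions) for group, actions in grouped.items()}


# One label for everyone versus one label per group.
global_label = majority(action for _, action in responses)
per_group = majority_by_group(responses)

print(global_label)
print(per_group)
```

With these invented numbers the global majority chooses to divert, while the majority in the second group chooses to stay, so a model trained on the global label would encode a preference that one group does not share.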
The situation is further complicated by the basic architecture of machine learning solutions. As mentioned above, they all learn the knowledge needed to solve their task from the data they receive. The example of the Moral Machine is a clear case where (if the answers were used directly as training data, for example in the case of self-driving cars judging traffic situations) a certain morality is already encoded in the training data from the start.
Current machine learning models, on the other hand, are inherently task-oriented, i.e., they are typically designed with a single goal in mind and are only capable of solving that task (e.g., ChatGPT, which operates as a chatbot, is ‘just’ a state-of-the-art Question Answering system). The problem is that in most cases we have no insight into whether any moral value is present in the data used by the algorithm to generate the model that makes the decisions. There are several examples where a model acquired racist or sexist biases during training, but these were only discovered after deployment. A theoretical solution could be to further train existing models with data specifically representing preferred moral values, although in practice this is often not feasible due to the technical features of machine learning models. As described above, machine learning algorithms store the acquired knowledge in a model, which is responsible for making decisions once the learning phase has ended. Unfortunately, if we show new examples to an already existing, well-functioning model, the new information is not simply added to the ‘knowledge’ already present in it but can overwrite it (catastrophic forgetting). As a result, the model that has received the new (ethical) information may not keep its original function but rather forget its original purpose.
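The overwriting effect can be illustrated with a deliberately small Python sketch. It uses a perceptron (a simple linear classifier) rather than a neural network, and two invented data sets, task_a and task_b, chosen so that the effect is easy to see; all names and values are assumptions made for this illustration. In real systems forgetting is usually more gradual than in this toy case.

```python
# Toy illustration of catastrophic forgetting with a perceptron.
# The model and both data sets are invented for illustration.

def predict(weights, x):
    """Return +1 or -1 for input x; the last weight is the bias."""
    score = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    return 1 if score > 0 else -1


def train(weights, data, epochs=20):
    """Run perceptron updates on data, starting from the given weights."""
    weights = list(weights)  # copy, so the caller's weights stay unchanged
    for _ in range(epochs):
        for x, y in data:
            if predict(weights, x) != y:
                for i, xi in enumerate(x):
                    weights[i] += y * xi
                weights[-1] += y
    return weights


def accuracy(weights, data):
    """Return the share of examples in data that are classified correctly."""
    return sum(predict(weights, x) == y for x, y in data) / len(data)


# Each example is ((feature_0, feature_1), label).
task_a = [((1, 0), 1), ((-1, 0), -1), ((2, 0), 1), ((-2, 0), -1)]
task_b = [((3, -1), -1), ((1, 1), 1), ((3, -2), -1), ((1, 2), 1)]

start = [0, 0, 0]
after_a = train(start, task_a)
after_b = train(after_a, task_b)       # further training on the new data only
joint = train(start, task_a + task_b)  # training on both data sets together

print("task A after training on A:", accuracy(after_a, task_a))
print("task A after training on B:", accuracy(after_b, task_a))
print("task A after joint training:", accuracy(joint, task_a))
print("task B after joint training:", accuracy(joint, task_b))
```

In this constructed example the model solves task A perfectly after the first phase, then gets every task A example wrong after being trained further on task B alone. Training on both data sets together keeps both tasks solved, which shows that the loss comes from the sequential training and not from a limit of the model itself.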
So, the solution is clearly to filter and validate the data before the training phase, to ensure that the output of the model meets the expected ethical standards. Let us add that this is not a trivial task, given the amount of training data, especially since the potential biases present in it are in many cases not obvious at all, or not even easily traceable.
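One very simple form such a check could take is sketched below in Python. The records, the group attribute, and the min_ratio threshold are all hypothetical choices made for this illustration. The check only compares how often each group receives a positive label, so it can catch a crude imbalance tied to a known attribute but not the less obvious biases mentioned above.

```python
from collections import defaultdict

# Hypothetical training records: (group attribute, label).
# The values and counts are invented for illustration only.
records = (
    [("group_a", 1)] * 60
    + [("group_a", 0)] * 40
    + [("group_b", 1)] * 30
    + [("group_b", 0)] * 70
)


def positive_rate_by_group(rows):
    """Return the share of positive labels for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for group, label in rows:
        counts[group][0] += label
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


def flag_imbalance(rates, min_ratio=0.8):
    """Flag the data if the lowest rate is below min_ratio times the highest.

    min_ratio is an assumed threshold chosen for this sketch.
    """
    lowest, highest = min(rates.values()), max(rates.values())
    return highest > 0 and lowest / highest < min_ratio


rates = positive_rate_by_group(records)
print(rates)
print(flag_imbalance(rates))
```

Here the second group receives a positive label half as often as the first, so the data set is flagged for review before any training takes place.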
The Moral Machine experiment provides important information for the development of future machines that interact with and move among people, but it also leaves at least as many questions unanswered. Perhaps the most important of these, as we have already touched on, is whether the majority decision in determining moral values always coincides with the ethical decision. One need only think of WWII Germany to question this, but we must also remember that the values of the modern Western world have been shaped for centuries by Christian culture, which is strongly reflected in our views on ethics today. But this cannot be seen as a cross-cultural universal.
Moreover, due to the data-driven nature of machine learning, AI researchers would need to characterize ethical aspects as highly explicit, measurable, and quantifiable values to encode them into machine learning models of today and the near future. This would require at least the full agreement of human respondents (e.g., during data collection), which is not impossible, but again problematic. There is also no guarantee that a theoretical decision taken in a fictitious situation would be taken in the same way in reality. Would a respondent who, in the hypothetical trolley problem, would divert the trolley to save five lives do the same in reality, knowing that his own active decision would cause the death of a person? It is also the case that the responses collected from respondents tend to reflect very specific situations, whereas in real life similar decisions may be required in a myriad of situations. Just as in the case of law, in the world of machine learning it is not possible to describe every conceivable situation in a concrete way in advance. This is why generalization must be the task of the model. The task to be solved in this context is therefore to ensure this capability of the model by describing only a few concrete situations and specifying the expected result.
The increasing awareness and popularity of the problem is well illustrated by the fact that in recent years a specific research direction has emerged. Such research aims to align the output of machine learning models to a general expectation, thus reducing the uncertainty that is still often present in the operation of ML solutions (AI alignment). In most cases, proponents of this school of thought argue that human decision-making is influenced by several factors (arising from the actual context of the decision). Examples of this might include individual experience, cultural norms, empathy, and personal values. All this allows us to distinguish between right and wrong. The aim is to ensure that this additional knowledge can be considered by machine learning models in each situation.
It should be noted, however, that this approach assigns the solution mainly to the developers, since they have the greatest influence on the output of the models. According to its opponents, this raises several concerns. The most important one is that in such a case, only a small group of people would determine what is ethical. This stands in strong contrast to the previously mentioned democratized approach. It is mainly a consequence of the fact that the supporters of AI alignment treat the encoding of ethical principles into machine learning solutions mainly as a technical and not a social, legal, or even philosophical problem.
Another counterargument is that AI alignment seeks to address problems that can only be addressed in a meaningful way in the context of AGI. This implies that the time and money spent on AI alignment will divert valuable resources from solving societal problems that already exist today. These problems are mainly caused by the irresponsible use of currently applied AI solutions. It is also worth noting that the concept of AI alignment is, in most cases, very vaguely defined. This vagueness may be a consequence of the fact that it is still a very young research area, which is only at the stage of finding its way.
As AI entwines deeper into our tomorrow, unlocking ethical decision-making is the key to its success. Additionally, the review of current solutions is becoming increasingly urgent, even though the problem is only now beginning to receive attention. The current state of AI research suggests that the best way to ensure that the expected moral principles are met is through careful selection of the training data and continuous, targeted quality assurance. Democratizing AI and increasing the transparency of its development are also essential to the solution. To make AI a truly life-enhancing and human-enabling technology, it is essential to prevent misuse and limit irresponsible applications, in which the legislator will have a key role to play.
There are several variations of the trolley problem, in which the age and health status of the people on the track, and even whether the respondent knows or is related to one of them, may vary. These are all factors that have been investigated in a number of experiments to see how they affect the respondent’s reaction to the problem.
István ÜVEGES is a researcher in Computer Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.