Emotion Recognition—a Sheep or a Wolf in Sheep’s Clothing? (PART I.)
Machine-assisted emotion recognition has recently been back in the spotlight, thanks to the new draft of the EU’s forthcoming AI Act. Based on the current state of the draft, Emotion Recognition (ER) systems are expected to be banned outright in certain settings, such as the workplace, education, and border control. The draft has found both opponents and supporters, and both sides have arguments that sound convincing at first. But what exactly is this technology, what risks does it pose to privacy rights, and, more generally, can we really use algorithms to monitor emotions?
To get a clearer picture of the problem, it is worth starting with the methods that have emerged for the automated identification of emotions in simpler scenarios.
Computer-aided emotion recognition has taken many forms in recent decades. The topic is so popular that it has become a separate research area within Artificial Intelligence (AI) and its sub-field, Natural Language Processing (NLP). NLP is the field of AI concerned with developing algorithms and models that aim to understand text written in human languages the way a human would. It underpins many solutions in widespread use today, such as Machine Translation or applications that transcribe speech into written text.
Almost from the very beginning, NLP has sought to identify the emotions that the author of a written text intended to express. Such research is best summarized under the heading of Sentiment Analysis (SA), but it is also commonly referred to as Opinion Mining or Emotion Analysis. Of these, the first two are nearly synonymous, while the third is slightly different. The key difference is the granularity with which we want to analyze a writer’s attitude toward a topic.
In SA, we typically work with three categories, traditionally the points of the positive – neutral – negative axis. Here, the aim is to determine the overall sentiment that a given text or textual unit carries, seen from a bird’s-eye view. Emotion analysis is more complex, because the target categories are drawn from a much broader spectrum of specific emotions. This is where the difficulties start.
As with any machine learning classification task, the whole process starts with specifying the categories to be (automatically) recognized. So, the logical first step in analyzing emotions is to determine which emotions we are looking for. Without going into the details of the various theories or the debates around them, we can take basic emotion theory as an example. It starts from the assumption that humans have a limited number of biologically and psychologically “basic” emotions. Each of these manifests as an organized, recurring pattern of related behavioral components, which means they can be clearly distinguished from one another. The key word is universality: the theory sought to determine which emotions can be recognized from the same external features (mainly facial expressions), regardless of culture. However, even the theorists disagree on how many such emotions there are.
The two best-known classifications distinguish 6 and 8 basic emotions, respectively. The former is attributed to Ekman and the latter to Plutchik. In the 6-class system, the basic emotions (anger, disgust, fear, happiness, sadness, and surprise) are simply listed as separate categories, whereas in the 8-class system they are arranged in pairs of opposites (e.g., sadness – joy) and come in graded intensities (e.g., annoyance – anger – rage). The latter is also known as Plutchik’s wheel of emotions. The situation becomes even more complex when we consider that both systems describe emotions as discrete, distinct psychological states, which is itself a contested assumption. Some approaches argue that all emotions can be placed in a multidimensional coordinate system whose axes represent the psychological components underlying emotions. Examples of such axes are the pleasant-unpleasant, tension-relaxation, and excitation-calm divisions. It is worth noting that in this system all emotions are essentially of the same kind, differing only in their degree of intensity and pleasantness.
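To make the structural difference between the two label sets concrete, here is a minimal Python sketch of how Plutchik’s wheel (four pairs of opposites, each emotion with a milder and a more intense variant) might be encoded, next to Ekman’s flat list. The intensity words follow the commonly reproduced version of the wheel; the names `PlutchikLabel` and `from_word` and the 0 to 2 intensity scale are illustrative choices, not part of either theory.

```python
from dataclasses import dataclass

# Ekman: six basic emotions, simply listed as separate categories.
EKMAN_BASIC = frozenset({"anger", "disgust", "fear", "happiness", "sadness", "surprise"})

# Plutchik: eight basic emotions, each as (mild, basic, intense).
INTENSITIES = {
    "joy": ("serenity", "joy", "ecstasy"),
    "trust": ("acceptance", "trust", "admiration"),
    "fear": ("apprehension", "fear", "terror"),
    "surprise": ("distraction", "surprise", "amazement"),
    "sadness": ("pensiveness", "sadness", "grief"),
    "disgust": ("boredom", "disgust", "loathing"),
    "anger": ("annoyance", "anger", "rage"),
    "anticipation": ("interest", "anticipation", "vigilance"),
}

# Pairs of opposite basic emotions on the wheel, indexed in both directions.
OPPOSITE_PAIRS = [("joy", "sadness"), ("trust", "disgust"),
                  ("fear", "anger"), ("surprise", "anticipation")]
OPPOSITES = {}
for a, b in OPPOSITE_PAIRS:
    OPPOSITES[a] = b
    OPPOSITES[b] = a


@dataclass(frozen=True)
class PlutchikLabel:
    basic: str      # one of the eight basic emotions
    intensity: int  # hypothetical scale: 0 = mild, 1 = basic, 2 = intense

    def word(self) -> str:
        """Return the emotion word for this basic emotion and intensity."""
        return INTENSITIES[self.basic][self.intensity]

    def opposite(self) -> "PlutchikLabel":
        """Return the opposite emotion at the same intensity level."""
        return PlutchikLabel(OPPOSITES[self.basic], self.intensity)


def from_word(word: str) -> PlutchikLabel:
    """Map an emotion word such as 'rage' to its position on the wheel."""
    for basic, levels in INTENSITIES.items():
        if word in levels:
            return PlutchikLabel(basic, levels.index(word))
    raise KeyError(word)


print(from_word("rage"))                   # PlutchikLabel(basic='anger', intensity=2)
print(from_word("rage").opposite().word())  # terror
```

A flat set like `EKMAN_BASIC` only supports "is this label one of the categories?", while the wheel encoding also lets a system reason about opposites and intensity, which is exactly the extra structure the 8-class model adds.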
The key lesson is that the system of human emotions can be described in ways ranging from the very clear-cut to the highly complex. This holds whether we treat it as a psychological phenomenon or as a machine learning classification problem.
In the latter case, training data must, of course, be produced somehow. In most cases this is done by manual annotation, in which human annotators label texts with the appropriate sentiment or emotion categories. Labeling errors can lead to inconsistent results in model training, so special attention must be paid to validating the dataset.
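One common way to validate an annotated dataset (the text does not name a specific method, so this is an assumption) is to have two annotators label the same items and measure how much they agree beyond what chance would produce, for instance with Cohen’s kappa. A minimal Python sketch, using made-up labels:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both annotators labeled identically.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both annotators used one identical label for everything; kappa is
        # undefined here, and this sketch simply reports perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical sentiment labels from two annotators for the same 8 texts.
annotator_1 = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg"]
annotator_2 = ["pos", "neg", "pos", "pos", "neu", "pos", "neu", "neg"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.619
```

The annotators agree on 6 of 8 items (75%), but kappa is lower because part of that agreement would be expected by chance alone; low values flag label sets or guidelines that need revisiting before training.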
We must also remember that human behavior is extremely complex, so our communicative intentions are diverse and can be expressed both explicitly and implicitly. Irony and sarcasm, for example, are a common problem for SA. In these cases, the intended meaning of a sentence is often the opposite of its literal meaning, which naturally affects the sentiment or emotion value to be detected as well. Such anomalies are particularly hard to spot in writing, where it is often impossible to tell whether the author meant something ironically; even in live speech, irony often has to be inferred from the wider context or from our general knowledge of the world. Handling such cases remains one of the major limitations of existing models.
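A deliberately naive sketch shows why irony trips up methods that rely on the literal surface of a text. The word list below is invented for illustration, and summing word scores is only a simple baseline, not how modern models work; it does, however, make the failure mode easy to see.

```python
import re

# A tiny, hypothetical polarity lexicon: +1 for positive words, -1 for negative.
LEXICON = {"great": 1, "love": 1, "perfect": 1,
           "broke": -1, "late": -1, "terrible": -1}


def lexicon_polarity(text):
    """Label a text positive/neutral/negative by summing literal word scores."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


literal = "The delivery was terrible and the box broke."
sarcastic = "Oh great, the train is late again. I just love waiting in the rain."
print(lexicon_polarity(literal))    # negative
print(lexicon_polarity(sarcastic))  # positive, although the intended sentiment is negative
```

Nothing in the words of the second sentence signals the irony; recognizing it requires knowing that waiting for a late train in the rain is unpleasant, which is precisely the world knowledge the text refers to.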
Choosing the right textual unit for analysis is also far from trivial. It makes a difference whether we want to assign a label to the whole text, a paragraph, a sentence, or even a single clause. Today’s most sophisticated methods can not only classify each sentence of a text but also identify, for example, which word or phrase in a sentence (such as a particular feature of a product) a given sentiment is directed at (Aspect-Based Sentiment Analysis, ABSA).
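The effect of the chosen unit can be illustrated with a naive lexicon-based scorer. In this hedged Python sketch, the lexicon, the aspect list, and the keyword matching are all hypothetical simplifications (real ABSA systems learn to find aspect terms and the sentiment toward them rather than matching a fixed list); the point is only that the same review gets different labels depending on the unit analyzed.

```python
import re

# Hypothetical polarity lexicon and aspect terms, for illustration only.
LEXICON = {"excellent": 1, "sharp": 1, "love": 1,
           "weak": -1, "slow": -1, "disappointing": -1}
ASPECTS = ("screen", "battery", "charging")


def score(text):
    """Sum the literal polarity scores of the words in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)


def label(s):
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def aspect_labels(text):
    """Crude aspect-level view: give each mentioned aspect its sentence's label."""
    result = {}
    for sentence in split_sentences(text):
        for aspect in ASPECTS:
            if aspect in sentence.lower():
                result[aspect] = label(score(sentence))
    return result


review = ("The screen is excellent and sharp. The battery is weak. "
          "Charging is slow and disappointing.")

print(label(score(review)))  # document level: negative
for sentence in split_sentences(review):
    print(label(score(sentence)), "|", sentence)  # positive, negative, negative
print(aspect_labels(review))
# {'screen': 'positive', 'battery': 'negative', 'charging': 'negative'}
```

At the document level the review is simply negative, which hides the clearly positive opinion about the screen; the finer units recover it.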
It is important to note that all the solutions mentioned so far are also highly domain-specific. It matters, for example, whether a model must work correctly on customer opinions about products or identify emotions in political discourse.
Perhaps the most important aspect of the issue is that emotion recognition in human social interactions is essentially a multimodal process. When we instinctively try to assess the emotional state of the person we are talking to, we do not judge solely by what they say. Relying on language alone is, of course, an inherent limitation of NLP methods, since the subject of the field is, in this respect, merely the use of language. In many cases, however, language is sufficient to draw accurate and relevant conclusions. Emotion recognition systems go one (significant) step further by analyzing real-time human behavior.
István ÜVEGES is a researcher in Computational Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies, and the Plain Language Movement.