OpenAI’s recent announcement has made the questions raised by increasingly human-like artificial intelligence particularly topical. The company says that ChatGPT, its most popular product and perhaps the most widely used AI-based tool, can now be instructed by voice and can read out its answers using speech synthesis. This fits a broader trend in which AI is becoming more and more human-like, partly as a natural consequence of scientific progress and partly as a result of conscious human choices. What is behind this process, and how will it affect our daily lives in the near future?
Services based on or supported by artificial intelligence are now part of our everyday lives. Whether we are aware of it or not, they are integral to the algorithms behind social media content recommendations and the biggest search engines, and they power the most popular chatbots designed to answer questions. Technology has advanced to the point where people often ask whether the capabilities of AI applications that interact with humans are genuinely human-like or merely simulated. In most cases, developers have a vested interest in making the capabilities of their products as distinctive as possible. The conclusions to be drawn from these capabilities are then left to the individual.
One of the influential trends characterizing today’s state-of-the-art generative AI systems (or, more precisely, the models they are based on) is anthropomorphization. This issue can be approached from two angles. On the one hand, we can talk about the phenomenon in which a user encounters an AI system and, based on its perceived or real capabilities, accepts it as more or less human-like. On the other hand, we can examine the effects of manufacturers making their products more and more human-like (either to increase efficiency or to increase the level of trust surrounding them). First, let’s take a look at why we tend to see traces of human intelligence and human behavior where they are not actually present.
A good example of this is the operation of Large Language Models (LLMs), whose real-world performance and capabilities relative to humans have been a constant topic since the launch of ChatGPT. It is important to note upfront that most tech companies are extremely cautious about making statements that praise the human-like nature of the AIs they develop. In most cases, the issue is approached from the perspective of efficiency, highlighting the performance of a given tool or service compared to humans. Nevertheless, the issue is not without its scandals. For instance, Google recently fired an engineer who publicly claimed that the conversational language model (LaMDA) developed by the company was sentient. We have written more about one possible explanation for this case here.
Endowing inanimate things with living (human-like) qualities is by all accounts a typical human trait. More specifically, the term anthropomorphism refers to the psychological phenomenon whereby humans tend to relate to non-human entities as if they were human beings. Such entities can be animals, plants, natural or social phenomena, or technological “tools” (both hardware and software). Just think of how children treat their toys as living things in role-playing games, which is a natural part of their development. In their case, there is always a sharp boundary in the perception of the roles, i.e., children are in fact well aware that the object is not actually alive and does not have real human-like properties.
In childhood, this helps them interpret the world around them, even when events are driven by causal links that they cannot yet understand at that age. Bizarre as the parallel may seem at first sight, it is easy to see how a similar situation could arise with Generative Artificial Intelligence (GAI). One reason for this is presumably that we have very little information about how such systems work, and it is not entirely clear how their capabilities can be adequately measured.
The operation of machine learning models based on deep learning (DL) is notoriously difficult to interpret in practice. Such models (for example, the GPT family of LLMs) represent the information in the training data in the connections between artificial neurons, as numerical values (weights) that can number in the hundreds of millions or even billions. To know the relationship between an input and an output, we would need to know not only which neurons are responsible for the latter, but also what the information encoded in them means. Creating this kind of interpretability is one of the biggest open challenges in artificial intelligence today and an active area of research.
In fact, even experts often lack methods to explain the functioning of systems they have designed themselves. This creates an ideal situation not only for speculation and theorizing but also for users to equate, for instance, the quality of a model’s output with the human-like nature of the underlying artificial intelligence.
The uncertainty around the real-world capabilities of GAI is further exacerbated by the fact that, for example, when OpenAI announced its GPT-4 model (in March this year), it also published a long and impressive list of professional and academic exams that the model had completed. These include not only high school tests but also the bar exam, which requires a lot of background knowledge.
While the results listed here are indeed impressive, it is questionable how well they relate to the real-world (or human-like) intelligence of the model. It should be remembered that these tests are invariably designed to measure people’s abilities or their subject-matter knowledge in each domain. The truth is that we do not currently have a set of tools that can measure how intelligently a machine learning model can behave.
In the case of people, there is already a long history of how and why we measure certain skills that are commonly treated as indicators of intelligence. However, a growing number of views hold that there is not just one, but as many as nine different skill sets that can be called intelligence. For humans, we know that these tests measure something that indeed defines us. In the case of machine systems, experiments are only now being carried out to find out what we should measure in order to draw similar conclusions.
All this uncertainty makes it easy to attribute the results that today’s tools produce to some kind of human ability. In many cases, companies are also striving to humanize their products. It is quite common for personal assistants to accept voice input and respond in human language and in a human voice. This simply extends the existing human-machine interface to the communication channel that people normally use with one another.
István ÜVEGES is a researcher in Computer Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.