
István ÜVEGES: Inequalities and Opportunities in the World of Large Language Models

The development of artificial intelligence is at a crossroads. Depending on how we shape the future, it could become a tool for all or a privileged tool for global corporations. While the democratization of AI is a noble and worthwhile goal, there are still significant global inequalities in AI development. The question is: what is at the root of these inequalities and what can we do to address them?

Artificial intelligence – the study of machines and algorithms with capabilities previously thought to be exclusively human – is hardly new. Looking back at the 20th century, the field has attracted attention since the 1950s. Some lines of research that are still dominant today, such as machine translation, developed very early on. However, after the initial enthusiasm (in line with the Gartner hype cycle), a certain disillusionment set in among technology users and customers. This was mainly because the solutions of the time could not compete with human expertise on cost and still fell far short of it in performance.

This phenomenon was particularly striking in the case of the machine translation mentioned above. In the 1960s and 1970s, a series of documents (mainly in the USA) questioned the profitability of artificial intelligence research. At the time, machine translation was a particularly important line of research in the context of the Cold War. The promise that texts acquired from the Soviets would be translated into English quickly, accurately, and automatically proved impossible to keep. The limitations of the computers of the time and the (by today’s standards) immaturity of the algorithms virtually guaranteed that these expectations could not be met.

It is interesting that at that time automatic translation was carried out exclusively on the basis of rules (in contrast to today’s widespread statistical language modeling). This meant that computers had to produce the translated text from virtually all of its constituent parts, drawing on dictionaries, grammar, and semantic rules. The approach was far from fault-tolerant: if, for example, a sentence to be translated did not fit the pre-coded rules perfectly, the translation simply failed. Nor can it be neglected that the rules had to be created one by one by linguists. A complete rule-based description of a language remains an unresolved problem, and the labor and time involved can be staggering. This, of course, also had an impact on costs.

Sooner or later, these problems came to the attention of the official bodies funding or supervising the research. The 1966 report of ALPAC (the Automatic Language Processing Advisory Committee) is perhaps the most famous of the critical voices emerging at the time. These problems and the growing dissatisfaction left machine translation without meaningful funding, and therefore without meaningful development, for nearly 20 years. Since machine translation had initially been the mainstay of artificial intelligence, this led to the virtual eclipse of the entire discipline.

AI research only gained new impetus in the 1980s, thanks to several factors. Perhaps the most important of these was the development of hardware. The algorithms of the time all used ‘traditional’ hardware to perform the computations required, meaning that models were trained predominantly on the computer’s CPU. Note that this did not change significantly until the early 2010s.

The shift was brought about by the spread of neural networks and deep learning. These work on radically different principles, which also changed the hardware required. The algorithms that characterize machine learning – the dominant branch of modern artificial intelligence research – can be run efficiently virtually only on GPUs. In practice, this is most evident in the pre-training phase, when the machine-learned models themselves are generated.

Not surprisingly, neural networks are essentially made up of artificial neurons. These are organized in layers, with connections defined between them. Each connection stores a real number, called a weight. During pre-training, the task is to continuously update these weights, as well as the values (biases) stored in the neurons outside the input layer. Together, weights and biases are called the model’s parameters. This update process continues, in essence, until the mapping between the network’s input and output meets our current goals.
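To make the terms concrete, here is a minimal sketch of such a network in plain Python. The layer sizes and input values are invented for illustration; the point is only to show where the weights and biases live and how they add up to a parameter count.

```python
import random

# Hypothetical toy network: 3 inputs -> 2 hidden neurons -> 1 output.
# Weights live on the connections; biases live on the non-input neurons.
random.seed(0)

w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
b_hidden = [0.0, 0.0]
w_out = [random.uniform(-1, 1) for _ in range(2)]
b_out = 0.0

def count_parameters():
    # Every weight and every bias is one trainable parameter.
    return (sum(len(row) for row in w_hidden) + len(b_hidden)
            + len(w_out) + 1)

def forward(x):
    # One pass from input to output; pre-training nudges the
    # parameters so that this mapping matches the desired one.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

print(count_parameters())  # 6 + 2 + 2 + 1 = 11 parameters
print(forward([1.0, 0.5, -0.5]))
```

Even this tiny network has 11 parameters; the models discussed below scale the same bookkeeping up by many orders of magnitude.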

In the case of modern deep learning networks, we are talking about hundreds of millions or even billions of parameters at a time, whose values must be recalculated thousands of times. This generates a computational demand that a CPU cannot satisfy in a reasonable time, but the graphics cards used in computers are perfectly suited to the task[1].
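A back-of-envelope calculation conveys the scale. GPT-3’s published size is roughly 175 billion parameters; the byte counts below are simple arithmetic based on standard float sizes, not measured values.

```python
# GPT-3-scale parameter storage, back of the envelope.
params = 175_000_000_000          # published GPT-3 parameter count

bytes_fp32 = params * 4           # 32-bit float: 4 bytes per parameter
bytes_fp16 = params * 2           # 16-bit float: 2 bytes per parameter

print(bytes_fp32 / 1e9)  # 700.0 GB just to hold the weights in fp32
print(bytes_fp16 / 1e9)  # 350.0 GB even in half precision
```

Holding the weights alone requires hundreds of gigabytes, before counting the gradients and optimizer state that training adds on top, which is why such models are spread across many GPUs.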

However, the equipment used for pre-training is extremely expensive to both purchase and operate. The extent of this is illustrated by the estimate that, for example, training of a large language model (LLM) like GPT-3 (which was the original model behind ChatGPT) could cost somewhere between $2 million and $12 million.

This is a cost of entry that most SMEs cannot afford. In addition, such models require this kind of hardware not only during pre-training but also during use. The solution is either to purchase your own equipment or to rent resources from a popular cloud infrastructure provider. It is worth bearing in mind, however, that such rental can cost thousands of dollars each month.

The situation is now that, for a company or a research project to use the latest solutions or to create its own version of them, it needs not only the necessary expertise but also, and above all, a lot of money. However, typically only large companies can afford to make such an investment. It is fair to say that the development and deployment of cutting-edge solutions has become virtually the prerogative of a few technology giants.

This trend, however, runs counter to the push for transparency, accountability, and predictability known as the democratization of AI. Its main aim is to make the algorithms used to develop deep learning solutions, as well as the resulting models and the data used to train them, available, public, and open to analysis by anyone.

So the situation is not optimal, but it is important to underline that there have been some encouraging developments recently. The growing demand for democratization, for instance, is giving rise to new development methods. They offer the promise of achieving performance comparable to current market leaders with models that have significantly fewer parameters than traditional LLMs, or which are more efficiently trainable. Another encouraging trend is that there are now several platforms where code for development and model training can be shared for free with anyone (e.g., GitHub, which is widely used by developers). It should also be mentioned that many providers offer cloud computing environments that are suitable for training models or even for (task-specific) fine-tuning.

The amazing pace of development in artificial intelligence, and the fact that the required hardware capacity is becoming cheaper and cheaper, is illustrated by the fact that fine-tuning a BERT model, for example, can now be done for free using cloud infrastructure. One such environment, among others, is Google Colaboratory, where this statement has been verified in practice. Suffice it to say that BERT was the model that revolutionized neural network solutions back in 2018.

In this respect, the situation is twofold: if a pre-trained model is available, anyone can easily fine-tune it to their own needs, but pre-training is still not a trivial task, either in terms of cost or the expertise required. It is also important to note that, given current trends, in many cases even technology from a few years ago already underperforms the market leaders. This is widening the competitiveness gap between those who have the capital to keep up with these developments and those who do not.

The majority of development is currently concentrated in the US, in the hands of large companies such as Meta, OpenAI, Microsoft, Alphabet, and other Fortune 500 companies. This is not to ignore China’s artificial intelligence programme, or the European Union’s programme to promote digital sovereignty. A similar inequality can be observed in research projects. While the top universities have access to the necessary infrastructure, their less fortunate counterparts are almost completely excluded from it. This not only drives the migration of people interested in the field, but also makes the environment in which such developments take place highly homogeneous. Even in the medium term, this homogeneity will hamper the free flow of ideas between researchers with different mindsets, which could ultimately lead to a drastic reduction in the capacity to innovate.

We must also not forget that when the control over a given technology is concentrated in just a few hands, this necessarily increases the vulnerability of those who are unable to access it directly.

Although deep learning solutions have become one of the dominant research and industrial development trends of our time, the benefits of the technology are far from being equally available to all. Due to the very high cost of investing in such developments, the use of state-of-the-art methods is today the prerogative of only a few players. This applies to both industry and research. The solution could be to develop an infrastructure (even at national level) that gives everyone access to the necessary resources. This could also allow for freer experimentation than at present, a better understanding of existing solutions and a more transparent development of future AI-based methods.

[1] It would go far beyond the scope of this post to explain it in detail. However, the background to this is that graphics cards (originally developed for 3D graphics) are extremely efficient at performing matrix operations. This is mainly due to efficient parallelization. This capability comes in handy when neural network parameters need to be recalculated.
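The key property can be shown without any GPU at all. In the matrix product below, each output cell depends only on one row of the first matrix and one column of the second, so all cells can in principle be computed simultaneously; that independence is exactly the pattern GPUs exploit. (The matrices here are illustrative toy values.)

```python
def matmul(a, b):
    """Naive matrix product: every output cell is an independent
    dot product, so a GPU can compute all of them in parallel."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

A CPU works through these cells a handful at a time; a GPU with thousands of cores evaluates them en masse, which is why re-computing billions of neural network parameters is feasible on the latter.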

István ÜVEGES is a researcher in Computer Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.
