
Small Language Models, Big Impact: The New Way of AI
The development of artificial intelligence has brought significant advances in the field of language models in recent years. These models can generate text that is deceptively close to human writing, analyze texts efficiently, and translate between languages with high quality. They are also capable of increasingly complex tasks: for example, they can analyze the emotional content of a text, classify documents by topic, or even anonymize them. Large Language Models (LLMs) such as GPT-4 or PaLM 2 are widely known for these capabilities. Recently, however, a new trend seems to be gaining momentum: the development of Small Language Models (SLMs). In this post, we look at what exactly these models are, how they differ from LLMs, what advantages and disadvantages they have, and why they may soon become important.
The most striking difference between the widely known LLMs and SLMs is, of course, their size. LLMs have billions (or even hundreds of billions) of parameters and require a staggering amount of data to train. To give a sense of this volume, some argue that the Internet in its present form does not contain, and will not contain for a long time, enough training data for such models. LLMs are, of course, often capable of extraordinary achievements, even in areas that might at first seem surprising, such as doctor-patient communication or streamlining research processes. Because of their size, however, these models require significant computing power, which in turn means expensive hardware.
There are several definitions of SLMs, probably because the concept is relatively new. Some sources define them purely by size (the number of parameters), as language models significantly smaller than LLMs. Perhaps more practical is the definition that sees the essence of an SLM in being a smaller, better-optimized version of a large language model. The aim of the transformation is to transfer the capabilities of the large model to a smaller, more efficient, more sustainable model with the least possible loss. One way to do this is quantization: reducing the precision of the weights stored in the model, for example by storing them with 8 bits instead of the original 32-bit numerical representation, which reduces the resources the model needs. Another promising technique is pruning, which removes irrelevant or redundant parameters from the neural network and thereby also improves efficiency.
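To make these two techniques more tangible, here is a minimal sketch using PyTorch's built-in pruning and dynamic quantization utilities. The toy network, the layer sizes, and the 30% pruning ratio are illustrative assumptions standing in for a real language model, not values used by any particular SLM.

```python
# Hedged sketch of pruning and dynamic quantization with PyTorch. The toy
# network, layer sizes, and 30% pruning ratio are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy two-layer network standing in for a much larger transformer.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
)

# Pruning: zero out the 30% of weights with the smallest magnitude in the
# first layer, i.e. the parameters that contribute least to the output.
prune.l1_unstructured(model[0], name="weight", amount=0.3)
prune.remove(model[0], "weight")  # bake the pruning mask into the weights
zeroed = (model[0].weight == 0).float().mean().item()
print(f"zeroed weights after pruning: {zeroed:.0%}")

# Quantization: store Linear weights as 8-bit integers instead of 32-bit
# floats, cutting memory use and speeding up CPU inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)
```

In practice the two techniques are often combined, and the compressed model is typically retrained briefly (or distilled from the original) to recover part of the lost accuracy.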
Although their smaller size necessarily limits some of their capabilities, the efficiency and lower resource requirements of SLMs make them an ideal choice in many areas. To name just a few examples: unlike their larger counterparts, they can run in real time on mobile phones, IoT devices, and other systems with limited capacity, while still performing specific tasks with high accuracy.
The advantage of LLMs lies in their versatility: in theory, they can perform almost any language task, from reading comprehension through summarization to creative writing. This performance comes at a high cost, however. They require special hardware to run, which is not only expensive but also has a significant ecological footprint due to its high energy consumption. Consider, for example, the open-source BLOOM model: with 176 billion parameters it is not even among the largest by today's standards, yet it still consumed some 433,195 kWh of energy.
In practical applications, the question of how fast a model can perform a task also arises frequently. Compared with SLMs, LLMs have slower response times, which often makes them impractical in real-time applications. In this comparison, SLMs count as fast and cost-effective: they are easier to run on smaller devices, for example in mobile applications or edge computing systems. Their disadvantage is that they usually cannot generalize at the same level as LLMs and are unsuitable for some complex tasks. Nevertheless, SLMs are preferable where speed and resource constraints matter.
One of the main reasons for the popularity of SLMs is their cost-effectiveness. Resource efficiency not only reduces energy consumption but also answers the demand for environmentally friendly solutions. This is particularly important today, when the technology sector is often pervaded by pseudo-solutions and greenwashing. It is also important to note that SLMs make local processing possible, which improves data protection, since they can run on local infrastructure; for LLMs, this remains a major source of concern. The fact that critical data does not need to be transferred to the cloud is particularly important in industries such as healthcare or finance, where protecting sensitive information is a top priority.
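As a rough illustration of what local processing can look like, the following sketch runs a small open model entirely on local hardware with the Hugging Face transformers library. The model name and the prompt are only examples; any similarly sized open model could be substituted.

```python
# Minimal sketch of on-premise inference: the prompt, which may contain
# sensitive data, never leaves the local machine.
from transformers import pipeline

# Any small open model can be used here; the name below is only an example.
generator = pipeline(
    "text-generation",
    model="microsoft/phi-2",   # ~2.7B parameters, small enough for a single GPU
    device_map="auto",         # requires the accelerate package
)

prompt = "Summarise in one sentence: 'Mild hypertension, medication adjusted.'"
result = generator(prompt, max_new_tokens=60, do_sample=False)
print(result[0]["generated_text"])
```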
Another advantage is customizability. SLMs are relatively easy to fine-tune for specific industry or business purposes, so a customer service chatbot, for example, can answer common questions faster and more accurately than a general-purpose LLM. This holds even as emerging techniques such as Retrieval-Augmented Generation (RAG) have made significant advances in addressing business-specific use cases.
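To make the fine-tuning step more concrete, here is a hedged sketch of adapting a small pretrained encoder to classify customer questions into FAQ categories with the Hugging Face Trainer. The model name, the labels, and the two-example dataset are purely illustrative assumptions.

```python
# Hedged sketch: fine-tuning a small pretrained encoder to classify customer
# questions into FAQ categories. Model name, labels, and data are illustrative.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Tiny illustrative dataset; a real project would use thousands of examples.
data = Dataset.from_dict({
    "text": ["How do I reset my password?", "When will my order arrive?"],
    "label": [0, 1],  # 0 = account questions, 1 = shipping questions
})

model_name = "distilbert-base-uncased"  # a small, widely used encoder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

data = data.map(tokenize, batched=True).remove_columns(["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="faq-slm", num_train_epochs=3,
                           per_device_train_batch_size=8, logging_steps=1),
    train_dataset=data,
)
trainer.train()  # the fine-tuned model can then be served locally
```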
Technology companies such as Google and Meta are constantly working on improving SLMs to make them more efficient and more widely applicable. New directions in AI research often also focus on combinations of LLMs and SLMs, recognizing that these models can offset each other’s weaknesses. For example, a large model can perform complex text analysis and generalization tasks, while SLMs, due to their smaller size, can provide fast and targeted solutions in local environments. This collaboration not only enables resource optimization but also foreshadows a new generation of personalized AI solutions where performance and efficiency go hand in hand. This is particularly important for applications where real-time response, privacy, and sustainability are critical.
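One simple way to picture such a combination is a cascade: a local SLM answers first, and only low-confidence cases are escalated to a larger hosted model. The sketch below is an illustrative skeleton with placeholder model calls, not a real API; the confidence score and the threshold are assumptions a real system would have to calibrate.

```python
# Illustrative skeleton of an SLM/LLM cascade. Both model functions are
# placeholders; a real system would call an on-device SLM and a hosted LLM API.
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    confidence: float  # assumed confidence score returned by the small model


def small_model(question: str) -> Answer:
    # Placeholder for a fast, local SLM call.
    confident = "opening hours" in question.lower()
    return Answer(text=f"[SLM] short answer to: {question}",
                  confidence=0.95 if confident else 0.40)


def large_model(question: str) -> str:
    # Placeholder for a slower, more capable hosted LLM call.
    return f"[LLM] detailed answer to: {question}"


def route(question: str, threshold: float = 0.8) -> str:
    """Answer locally when the SLM is confident enough; otherwise escalate."""
    local = small_model(question)
    if local.confidence >= threshold:
        return local.text         # fast, cheap, data stays on-premise
    return large_model(question)  # slower and costlier, but more capable


print(route("What are your opening hours?"))            # handled by the SLM
print(route("Assess the legal risks in this clause."))  # escalated to the LLM
```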
Large language models continue to play a prominent role in artificial intelligence, with their versatility and power to solve complex problems. However, the rise of SLMs is not only a trend but also a response to the challenges of the modern technological environment, such as resource efficiency, sustainability, and data protection. SLMs will allow AI solutions to become more widely available, not only for large enterprises but also for smaller organizations and new technology applications.
The key to future success will be how harmoniously the benefits of LLMs and SLMs can be combined. This may determine not only the next steps in technological development but also the extent to which AI becomes sustainable and inclusive in practice. The paradigm shift emerging in AI today is a move from size to functionality, promising a new era in the world of intelligent systems.
István ÜVEGES, PhD, is a computational linguist, researcher, and developer at MONTANA Knowledge Management Ltd. and a researcher at the HUN-REN Centre for Social Sciences. His main interests include the social impacts of Artificial Intelligence (Machine Learning), the nature of legal language (legalese), the Plain Language Movement, and sentiment and emotion analysis.