Norbert TRIBL: Artificial Intelligence: the end of Westphalian era as a new beginning? 

Although the emergence and spread of Artificial Intelligence (AI) is not equal to living on the Skynet yet, however, we are talking about much more than just a simple technological innovation. We have been using different AI-based solutions for a long time – e.g. GPS and chatbots – however, the publicly available large language models are undoubtedly revolutionizing our lives in an unprecedented way. 

Today, artificial intelligence permeates almost every aspect of our lives in various ways: social media platforms provide newsfeed based on AI, air traffic control systems apply AI, even doctors and medicine employees use various types of AI… Moreover, I was preparing this post with the assistance of Google’s search algorithms.We all know that Facebok and Twitter (since Elon Musk’s takeover it’s renamed as X) and other platforms feed our hunger for news by AI-based content. Algorhythms explore the user’s taste and preferences and provide content that would probably be interesting to the reader. In the recent years, we have seen an expansion of profile based contents and this is even more tangible since the outbreak of the Russian-Ukrainian war. Based on the users’ feedback and reactions, platforms tend to meet users’ political and professional expectations. They know users’ presumed political orientation, their opinion about certain topics, their professional standards, etc. The user profile is build by analyzing search habits, the time spent on each website / post, the relationships between the websites the user usually visits and their sharing habits – just to highlight some of the key elements of AI based algorhytmic social media operation

This is why, many argue that big tech companies are not just the owners of the most powerful assets in in the world – the users’s profile – but on the highway to becoming the new, unelected sovereigns. 

To frame how dangerous platforms and profiling are to public trust, we only have to remind ourselves to the previous US presidential election scandal of Donald Trump and Facebook, or the Cambridge Analytica story and the Brexit. When platforms intervene to national elections and form public opinion, it is necessarily a question of sovereignty and security.

Speaking sovereignty, the paradigm requires three plus one factors: territory, population, supreme power and, since Montevideo, autonomy in external relations. By 2023, bigtech companies have the potential to exert a decisive influence on human societies across the borders. For the Westphalian system of sovereignty, such a capability, such a phenomenon, was completely unthinkable. In many cases, social media is now the communication channel of state power itself. This is not surprising, as social media and digital tools allow us to get our messages across much faster and more effectively than ever before. However, these technologies are almost exclusively controlled by a few big tech companies, whose economic potential and ability to influence society often rivals or even exceeds that of the state. And by now we have to treat the online space as an extended reality of our real world. Because the online one is part of the real world. The two can’t be separated anymore. 

And we haven’t even talked about hardcore artificial intelligence companies like OpenAI. OpenAI is currently being rolled out through so-called APIs. ChatGPT is cute if we want to ask to write us a welcome letter. But its relevance and usage are wider than we can imagine, as it can be linked to other systems in the level of programming codes. Of course, the control and all the information obtained is the property of OpenAI.

According to some opinions, we would need to examine whether bigtech companies, like Facebook can be considered as quasi-states, since the online/virtual space it created can be considered a quasi-state territory, while its users can be considered as quasi-population. Of course, this is just a thought experiment.

However, taking a step back, perhaps we should treat the online or virtual space much more seriously than a commercial, or private law issue. Perhaps we should consider starting to talk about the online or virtual state territory as a factor of sovereignty where states need to exercise their sovereignty as they do over physical space, for instance with floating territory.

Norbert TRIBL is a senior lecturer at the International and Regional Studies Institute of the University of Szeged. He received his PhD in 2020, his thesis is on the applicability of constitutional identity in the European supranational space. He studied economics from 2019-2022. In 2023, he passed the Hungarian bar exam. As a university lecturer, he teaches State Theory and Constitutional Law. As an advisor to the Hungarian Constitutional Court between 2020 and 2023, he examined the place of the constitutional courts of the Member States of the European Union in the integration process, mainly from the perspective of constitutional identity and the responsibility of constitutional courts for integration. He is currently the Dean’s representative for public relations at the Faculty of Law and Political Sciences of University of Szeged. He is currently a member of the Digitalization and Democracy research group of the Societal Challenges Competence Centre of the Humanities and Social Sciences Cluster of the Centre of Excellence for Interdisciplinary Research, Development and Innovation of the University of Szeged, where he is examining the impact of the technology sector and the digitalizing world on essential state functions and state sovereignty. Since 2020, he has been the editor of the Constitutional Discourse Blog. E-mail:

István ÜVEGES: As an AI language model… The dark side of the AI’s democratization

The democratization of AI will undoubtedly promote transparency and accountability of the technology. But what happens when open-source AI falls into unauthorized hands, or is misused? What is the greater risk, development monopolies concentrated in the hands of large companies or uncontrolled use?

With the rise of generative language models, artificial intelligence has infiltrated areas of our lives that would have been unimaginable even a few years ago. In social media, AI algorithms are responsible for making the user experience as comfortable as possible and maximizing the time spent on these platforms. In e-commerce, it’s a common practice for users to receive suggestions for additional products they may be interested in based on their search history, previous transactions and known preferences. Tools developed for smart homes learn the user’s habits, which can help them optimize energy use, for example, but also significantly improve comfort.

Generative Artificial Intelligence (GAI) has become so dominant that, for example, the Vice-Chancellors of the Russell Group universities in the UK have issued a joint statement on the subject. It is therefore essential that their students and staff are equipped with basic knowledge of artificial intelligence. Without this knowledge, they will not be able to take advantage of the opportunities that technological developments in teaching and learning will create. The declaration also highlights the importance of promoting AI Literacy, sharing best practices, and the importance of special trainings about the ethical use of AI, especially GAI.

In such an environment, it is particularly important that the tools that enter education, from which we expect credible and reliable information, provide truly unbiased and reliable data to those who use them.

As the democratization of AI takes off, it is reasonable to believe that the range of tools that can be used will expand. Since the launch of GPT-3 and the ChatGPT developed from it, hardly a month has gone by without a technology giant coming up with a new solution, a large language model (LLM) or its own architecture. A fair number of these are completely open-source. An excellent example is Llama 2, developed by Meta.

However, while making such tools publicly available will undoubtedly help to increase transparency and trust in technology, it also carries risks. Dame Wendy Hall, co-chair of the British Government’s AI review, once said that such moves are like ‘giving people a template to build a nuclear bomb’.

It should be remembered that currently known solutions, including LLMs, have several vulnerabilities. One aspect of these can be found in the creation of similar models. Data leakage due to biased training data, an inadequately constructed algorithm, or even models trained with sensitive data are all such factors. The other side of the coin is the use of existing models. Problems can be caused by over-reliance on the output of them, which can lead to the propagation of false information, insufficient or vague objectives when implementing ethical principles (AI alignment), or the phenomenon of poisoning in newer training data during fine-tuning.

Another example of the risks of language models is ChatGPT. Who is not familiar with the famous ‘As an AI language model…’ response that comes as a reply when typing inappropriate prompts. This is common when a user asks a question that would violate the company’s ethical principles, such as hate speech, sexist, racist content, or otherwise facilitate the transmission any kind of inappropriate content. However, we must remember that language models do not have any ethical sense, moral compass, or other help in generating answers. Although ‘moderation’ of questions and answers seems to be part of the internal workings of the ChatGPT, little official information is available regarding this.

To filter the model’s responses, two other cases are possible. In the first case, the question formulated by the user does not reach the language model in the first place. In the second case, the answer already generated by the language model is ‘stopped’ by a moderation process if it does not comply with the ethical principles of the operator.

Both steps are most easily illustrated using the OpenAI Moderation API. This is a service that uses a separate machine-learned model to decide if a text contains elements indicating self-harm, sexuality, harassment, or other prohibited activity. In this case, the investigated text may be a question addressed by the user to the model or a response to it by the model. If the answer is yes, the text will be moderated to prevent the creation of unwanted content.

One problem is that such models – like the ones behind the already mentioned Moderation API – never work at 100% efficiency. Given that even humans cannot identify similar content without error, this is perhaps the smaller problem. The bigger problem is that democratization may enable people who are unable or unwilling to take the necessary ethical principles into account to develop their own chatbots. One can easily imagine what would happen if a ChatGPT-level language model started flooding social media with hateful comments. All the malicious user would have to do is create a sufficient number of fake accounts to flood the profile of any company, public figure or even party with comments of their choice.

There are also signs of this on a smaller scale, for example when the aforementioned ‘As an AI language model’ appears in the comments of several Twitter profiles. Presumably, some of these are already the misguided results of such automated generation. The text generation capabilities of today’s LLMs are now sufficient to convince the average user that the content was written by a real human. Given this, and the fact that more and more people are getting access to more and more powerful tools as open-source AI spreads, we should also expect to see an increasing incidence of misuse.

Moving away from language models, the spread of deep fake is also worrying. Deep fake refers to video or audio content in which someone’s digital image is faked using artificial intelligence. In this case, the dedicated purpose is to allow the result to serve explicitly manipulative purposes. But so is the phenomenon of hackers using generative AI to improve their offensive code or using artificially generated voice to generate phone calls with malicious intent. The latter could be, for instance, a case where a subordinate receives instructions to perform a certain action in the voice of their manager.

The democratization of AI is therefore a double-edged sword. Increasing transparency and promoting accountability towards the large companies that currently own the technology is a necessary step. Without it, AI could easily become the privilege of a privileged few. This kind of inequality would erode trust in the technology in the long term and, if combined with an inadequate regulatory environment, could easily lead to abuses in areas such as right to privacy. At the same time, full access to a technological solution necessarily implies that it will be easier to use by people who would not have the resources to produce it without it. Let us take the example of LLMs again. The pre-training of an LLM can cost millions of dollars. Most market actors or private individuals cannot afford this. However, fine-tuning an existing model, for example to run as a chatbot, costs a much smaller amount of investment. If this fine-tuning does not follow the necessary ethical standards, either by accident or through deliberate negligence, it is easy to see that the result could be a lot of chatbots ‘uncontrollably’ prowling the web.

It is likely that neither the vision of a fully monopolized AI nor the vision of hundreds of chatbots on the rampage will be clearly realized. The question is, how will we be able to ensure a balance between cognition and security soon?

István ÜVEGES is a researcher in Computer Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.

Charles N.W. KECKLER: What The Administrative State Could Offer in Regulating Artificial Intelligence: An IA for AI?

The Artificial Intelligence (AI) Act has prompted discussion in Europe and beyond over what its adoption might mean for the Union and its Member States as well as for their populations. There is a similar and thoughtful conversation currently blossoming in the United States as well, as Congress is now beginning to examine artificial intelligence in a serious and bipartisan way. Senate Majority Leader Schumer has announced AI Insight Forums this fall, convening experts from multiple disciplines to deliberate on this evolving technology. This is however only the stepping stone to further, much more difficult conversations, one of which I intend to start in this post. If these ‘insight forums’ are successful, one insight that could emerge from these discussions – and I would encourage it to be consciously and deliberately raised – is the need to institutionalize a bipartisan, autonomous, expert, and deliberative process regarding the management of AI. 

In other words, one of the takeaways should be that the federal government needs a permanent body to address AI. How we have approached transformative technologies in the past implies that this establishment should be a new bipartisan commission, created by Congress as a part of the Executive Branch, but with statutory independence. This new independent agency (the IA for AI mentioned in the title) would be built on the legal and organizational template developed over the last century and a half of American administrative law, reflected in longstanding entities such as the Securities and Exchange Commission or the Federal Communications Commission (FCC).  In such agencies, a bare majority of commissioners are selected solely by the President, while the remaining members are proposed by the opposition party in the Senate (currently Republicans). After confirmation by the Senate, commissioners can be removed by the President during their terms of office only for good cause, rather than for a mere policy disagreement, and are thus deemed to offer “independent” judgment. For an issue as complex as AI, a substantial commission of perhaps nine to fifteen members would be appropriate, because it could reflect multiple viewpoints beyond just partisan perspectives: offering expertise on the technological, economic, legal, and social dimensions AI implicates. 

I do not make this recommendation of this kind lightly. Even when confronted with fundamentally new problems, we should consider first if our existing government can be adapted before creating any new organization. Moreover, independent agencies bring about legitimate concerns regarding constitutional accountability –mitigated but not fully eliminated by having them led by appointees from both parties who can oversee one another. However, in this circumstance, it is apparent that ultimately AI will neither remain ungoverned nor solely in the purview of the states. 

Sooner or later there will be a national policymaking apparatus, and none of the existing federal agencies – including, notably the Federal Trade Commission, which has tried to extend its powers to exercise jurisdiction in this area – have the necessary clarity of mission, imprimatur of Congress, or technical expertise. Other possible solutions such as an expansion of the Office of Science and Technology Policy within the White House, lack the independent resources and authority needed and will always remain vulnerable to partisan imperatives. A high-performing agency requires three critical components: (1) a clearly defined mission; (2) the resources and authorities to carry out the mission, and; (3) an intellectually pluralistic leadership that resists groupthink and prevents mission deviations in any direction. For AI, no existing organization meets these criteria – new governance is needed, and it would naturally tend to take the independent agency form

The independent agency is, in fact, our historically typical institutional response to transformative technologies. The Interstate Commerce Commission, the first independent agency, arose in the wake of the challenges posed by railroads. The FCC was created in response to the radio revolution, the Civil Aviation Board oversaw the aviation industry’s growth, and the Atomic Energy Commission governed the dawn of the nuclear era. Whether AI will ultimately be the most profound innovation of the twenty-first century, as both its enthusiasts and its critics believe, is yet to be seen. But it is certainly likely to prove as complex and revolutionary as atomic energy, wireless communication, the airplane, and the railroad. Despite the imperfections of the independent agency form, it has shown itself effective in addressing the complexities of these new technologies; the time-limited and advisory National Security Commission on Artificial Intelligence already generated valuable insights and recommendations before terminating at the end of the 2021 fiscal year. By acting in a bipartisan and autonomous way, these commissions made concrete an American commitment to incorporate key innovations into our collective life independent of our broader political debates. 

More practically, most independent agencies are products of divided government, maintaining the bipartisan perspectives behind the creation of the agency. My doctoral research, as well as my government experience, suggests that this tension can serve a positive role, as commissioners with different views keep each other honest, prevent mission creep, and maintain the appropriate focus on tasks Congress and the American people want the agency to perform. Understanding that we cannot and should not wait to get started in the critical task of AI governance, a commission is not only the better choice but the only realistic choice for a new agency with significant powers. Bipartisan leadership on this type of issue is the proper response, but often that conclusion can only be arrived at under a divided government – when it emerges as both a political compromise and a functional solution. 

Of course, there is generally little appetite on the right side of the aisle for creating new federal authorities and agencies. The energy is rather directed toward consolidating programs and trimming back federal power, and I am sympathetic to that view as a constitutionalist. In this circumstance, though, conservatives’ healthy skepticism of the national government can be reconciled with the need to start responding to a generational national challenge, by carefully limiting the jurisdiction of an initial commission overseeing AI. The statutory authority of this body should at first, and at least for several years, be restricted to the civilian artificial intelligence programs of the federal government itself, including the federal work of its contractors and grantees. In line with Executive Order 13,960 (which I was honored to participate in developing), the civilian-use commission would encourage agencies to adopt beneficial AI but to do so safely and in line with American values. Crucially, however, a commission would have the resources and regulatory authority to sponsor its initiatives and enforce its guardrails. Starting with the government regulating its own use has been the sensible approach taken by Sen. Peters leading the Committee on Homeland Security and Governmental Affairs, and not coincidentally, it is the area where Congress has had real if modest, legislative success. The AI in Government Act, for instance, laid the groundwork for a robust strategic review of AI usage we conducted at the Department of Health and Human Services. Unfortunately, most other agencies were less successful in implementation; but a centralized body with authority could provide the sometimes-missing ingredients of prioritization and executive leadership. 

The governance model proposed here, by starting quickly, but in a limited and well-defined fashion, draws inspiration from AI itself. The model for creating new technologies is iterative and incremental development, in which the learning extracted from one stage lays the groundwork for the next. A regulatory body claiming complete power over all AI would be unrealistic and overly intrusive, inevitably out of its depth unless it radically restricted innovation. By contrast, a limited commission could have both the capability and authority to deeply but securely investigate AI employed within the government, set standards for auditability, and monitor the dynamic evolution of models to assess their stability and performance over time. Just as an AI ingests an initial set of well-characterized data as a training set before being applied to novel input, government applications can serve as a kind of “training set” for AI governance. In parallel with oversight and inquiry into government AI, the commission should have a research budget, and be empowered to engage with the private sector, academia, and the public. Through case studies and evaluations of the government’s AI projects, the commission can assess the risks, benefits, and effectiveness of different regulatory approaches. This information will prove invaluable if and when the commission expands its scope to regulate AI in the private sector. Regardless of any future expansion, by developing robust guidelines and best practices for AI implementation within the government, the commission can establish a broader model of responsible and ethical AI usage.

Perhaps the most critical deficit for our government in the twenty-first century is the public’s lack of trust. Given the power and opacity of AI, it is of special importance that any entity regulating it overcome the mistrust and cynicism that attaches to our institutions, both new and old. Although institutionalizing bipartisanship will go partway to addressing this, there is no royal road to credibility – it must be earned, and this requires time.

Proposals to empower new government entities to regulate or even own and control all private AI models face insurmountable challenges of trustworthiness and competence. A fortiori, yet more justified suspicion from national populations will inevitably attach to well-intentioned but unrealistic plans for international governance of AI, like that of Bremmer and Suleyman in the most recent issue of Foreign Affairs, given that even national governance systems have yet to be successfully proved. I agree with those authors that AI’s “complexity and the speed of its advancement will make it almost impossible for governments to make relevant rules at a reasonable pace. If governments do not catch up soon, it is possible they never will.” Realistically, however, their proposal will frustrate that very goal, by wasting time we do not have on overly ambitious governance plans, and delaying the kind of feasible next steps widely recognized as urgently needed.

To maintain public and industry support, any AI governance agency will need a track record, and the time to begin building it is now. If and when Congress chooses to move toward more substantive regulation, it will have a solid political, organizational, and technical foundation on which to do so. The alternative is to begin an agency – probably in reaction to some future crisis – at square one. In a field of immense complexity and dynamism, such a reactive (non)strategy will be at best ineffective, and at worst generate hasty, ill-considered policy errors. Instead, we can act now to craft a forum where the government can learn before acting, and in the process of learning, teach. 

Creating a new federal executive agency is never easy, particularly for Republicans, and is understandably even more difficult when the President is a Democrat. Yet a bipartisan independent commission created now, while Republicans have the House, is the one sure method to guarantee a conservative perspective will always have a seat at the table whenever our national strategy for AI is shaped. The willingness of the Senate Majority Leader and the President to approach this issue in a relatively bipartisan way creates an opportunity to take the first logical step toward sensible AI governance before the uncertainties of election-year politics cause the legislative possibilities to vanish. Precisely because a carefully circumscribed independent agency for artificial intelligence in government is only a beginning, it is achievable, and able to put us on the road to a safe and prosperous America in which our innovation is working for and with our citizens, rather than against or in place of them. We do not need a heavy hand on innovation, but we do need to keep an eye on this transformative technology; we need an IA for AI. 

Charles N. W. Keckler is a graduate of Harvard College, where he was elected to Phi Beta Kappa and received his B.A. in Anthropology, magna cum laude. He went on to receive his M.A. in Anthropology, and his J.D., from the University of Michigan. He has served, during two presidential administrations, in several senior appointed positions in the U.S. Department of Health and Human Services, including Senior Advisor to the Secretary and Acting Deputy Secretary, and from 2017-2020, led the Department’s award-winning transformation initiative, ReImagine HHS. Between his periods at HHS, he was twice confirmed by the Senate as a minority party member of the Board of Directors of the Legal Services Corporation. His academic experience has included teaching courses in various disciplines at Harvard, the University of Michigan, the University of New Mexico, Northwestern, Pennsylvania State University, Washington & Lee, and George Mason University.

István ÜVEGES: Artificial Intelligence, Human Intelligence, or Both? – If the Turing Test Is Considered Obsolete, How Can We Replace It?

Perhaps one of the best-known concepts in artificial intelligence research is the Turing test. The idea of the test is to determine whether a system has human-like intelligence. To decide this, it can rely mainly on its linguistic capabilities. Nowadays, when many systems already have a human-like language ‘skill’, the question rightly arises: how adequate are the original test criteria? Don’t we need something more reliable? A new proposal accordingly puts the assessment of ‘intelligent behavior’ in a completely different context.

In a world where artificial intelligence-based systems are gradually infiltrating everyday life, it is crucial to be aware of their true capabilities. While the study of intelligence in human terms is an important issue for psychology, neuroscience, computer science, and many other disciplines, the question also has a strong ethical side as well. In everyday life, in most cases, we still encounter applications that are designed to make our tasks easier. However, this does not mean that systems capable of imitating human language can’t be used even with malicious intent. While most of us automatically maintain a certain distrust of machines, this is easily overridden in the case of other human beings. If an application can make users believe that they are talking to a real person, it can easily lead to abuse, data theft, or political manipulation, for instance by quickly spreading propaganda messages. It is therefore particularly important to have an objective benchmark to help determine the level of sophistication of such applications. This will also make it easier to prepare for their effects.

The Turing test is designed to infer the intelligence of a system based on the quality of its imitation of human language. Originally, the test runs roughly as follows. The experiment involves 3 participants, two of whom have a written conversation with each other. The third participant’s task is to decide whether both interlocutors are human. In another version, two people are talking to each other and the human subject must decide whether the other person is human. When the Turing test was first proposed in 1950, imitating human speech proved to be such a difficult problem that no system was able to meet its requirements until the 2010s. 

The first breakthrough was an app called Eugene Goostman, which passed the test back in 2014 (although it was only successful in 33% of cases). There was a twist in the result, as the chatbot was presented as a 13-year-old Ukrainian boy from Odessa. This circumstance could have greatly reduced the expectations about its language use. At the same time, it is important to note that the background of the chatbot as outlined could have served as a plausible explanation for mistakes that would not have been made by native speakers. Nonetheless, the result achieved was a good indication that the development of AI-based applications is approaching a critical stage. At this level, however, we need to fundamentally reassess what exactly we consider to be an authentic sign of intelligence. 

The validity of the Turing test was not unanimously accepted from the beginning. One of the main reasons for this is that, by its very nature, it can only make a judgment about the intelligence of a system based on its linguistic capabilities. At the time the test was created, the artificial simulation of human language use was a difficult problem that only truly intelligent systems could solve.

This idea is not, of course, independent of the context in which it originated. In the 1950s, there was no such thing as natural language processing (NLP) or computational linguistics as we know it today, i.e., the branch of artificial intelligence research on human language. Machine translation was the earliest to come into the spotlight in this area (as early as the 1940s), but it took decades to create systems that could be used effectively in practice. It was perhaps only from the 2000s onwards that statistical language modeling and language models based on neural networks made a real breakthrough in the field.

We have already mentioned how difficult it was at the time of the Turing test to analyze languages, discover their regularities, and reconstruct them artificially. This is important because there is an unspoken axiom that pervades the approach to artificial intelligence and research of intelligence in general. The concept of intelligence is a moving target, the measurement and definition of which varies from age to age and from discipline to discipline. Before the invention of the calculator, most people believed that dealing with abstract things such as the concept of numbers, or rather, performing operations with numbers, was not possible without intelligence. The situation was similar, for example, in chess, where winning a game requires strategic planning, modeling possible future outcomes, and simulating the opponent’s moves. The creation of programs in the 1960s to play entire games quickly overturned the idea that chess games could only be played by truly intelligent beings.

The situation is similar regarding human language. Today, there are many language models whose fine-tuned versions as chatbots can mimic human language use with deceptive accuracy. Despite this, there is a relative consensus among those dealing with the topic that these models cannot be considered intelligent, nor the path to artificial general intelligence. The latter is the name given to humanly intelligent, creative, task-independent artificial intelligence.

The question arises, however, that if human-level language use can no longer be considered a measure of intelligence, then what can? In fact, language acquisition is only a foundation stone on which future AI solutions can be built.

The situation is complicated by the fact that there is still no complete agreement on the set of skills or abilities that clearly and unmistakably distinguish humans from all other creatures on Earth. In psychology, for example, there is a theory that there is not just one, but 8 different types of intelligence that characterize humans. These include language intelligence (which forms the basis of the Turing test), logical-mathematical intelligence, musical intelligence, or even intra- and interpersonal intelligence. 

A new approach, which has recently entered the public consciousness under the name of AI Classification Framework (ACF), aims to capture and precisely measure these aspects. The framework attempts to classify the development of all the types of intelligence described above, in addition to linguistic competence.

Another idea is to provide a more flexible way of testing existing language models, so that their real strengths and weaknesses can be more clearly identified. The FLASK (A Fine-Grained Evaluation Framework for Language Models Based on Skill Sets) aims to address this by testing:

  • the logical reasoning of the model, 
  • its ability to construct arguments based on common sense and background knowledge,
  • the model’s ability to solve problems, and
  • the correspondence of the generated answers to user preferences (conciseness, easy to understand wording).

Such a test could help to detect, for example, the phenomenon of hallucination, one of the most pressing problems for LLMs today. Hallucination occurs when the answer generated by the model, although correct in its formulation, is factually incorrect, perhaps because the information it contains is not related to the context in which the question was asked. This phenomenon is rooted in the way language models generate responses. The model has no human knowledge of the world, or even self-reflection on its answers, but simply provides the most likely sequence of characters from its training data in each context.

Taking a different approach again, Mustafa Suleyman, co-founder of DeepMind’s AI lab, believes that a radically rethought version of the Turing test could be used. In his interpretation, the goal is to find out what the model actually understands from the data it stores, how capable it is of future planning, and whether it is capable of conducting complex ‘internal monologues’. The key here is again to infer the presence of capabilities that are (currently, at least) considered to be intrinsic to humans.

According to his idea, the task of this new-Turing test could be to build a business. This should include an initial product idea and a commercial business plan, a plan to find potential vendors, and organize sales. In his view, this would make the model’s ability to set goals, plan, and perform complex tasks autonomously more verifiable.

The idea is, of course, not unrelated to the nature of the tasks typically encountered in the entrepreneurial sphere, based on which Suleyman has drawn up the list of tasks. One potential problem with the method is that it is designed to test human creativity and planning skills. But, it operates with a task that many people would not be able to perform to a high standard. This could be due to a lack of professional knowledge of the individual or the absence of other intrapersonal characteristics.

Some of these new approaches may seem highly utopian. However, we should not forget that even 10 years ago, passing the Turing test (without facilitation) might have seemed like pure science fiction. Extrapolating from today’s pace of development, it is easy to imagine that in another 10 years, these will be the de facto standards that an AI capable of operating in human language will have to meet.

We still have only theories about the exact nature of human intelligence. That is why it’s difficult to draw a sharp line between a highly sophisticated but automatic function and the traces of the creative mind. One thing is certain, however, that in today’s technologically advanced world, Alan Turing’s test, which has been a yardstick for decades, is increasingly unable to perform its original function. The new ideas that can be used to test AI clearly indicate that the boundary between artificially reconstructable and intelligent behavior, or what appears to be intelligent behavior, and human-defining intelligence is becoming more and more blurred.

István ÜVEGES is a researcher in Computer Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.

István ÜVEGES: Self-aware artificial intelligence? Why is it important how we ask questions to the large language models?

Recently, the news that Google fired one of its Senior developers made a lot of noise. According to Blake Lemoine, based on a “conversation” with one of the company’s newest language models, LaMDA, it can be stated that the device has feelings (sentient), which it showed an authentic sign of during the conversation. But what could be behind the phenomenon?

Lemoine shared a transcript of the conversations (with some modifications as deemed necessary) using the LaMDA language model developed by Google. Such interactions were part of his work to assess the output of the model for the presence of discriminatory or hate speech. The developer’s statement that the language model has sentiments was a real controversy.

The statement is also extremely interesting because, according to several researchers, in order for an entity to be able to feel, it must first have self-awareness. The topic of machines awakening to self-awareness is a topos that has long been present in science-fiction literature, sometimes in a dystopian, sometimes in a utopian manner. However, this was all fiction up to this point. Or is it still? It is important to note that it is very difficult to draw any firm conclusions about the ‘behavior’ of similar models still under development. This is mainly because the details of how they actually work are in most cases not publicly available. Precisely for this reason, the argument presented here is based only on the experiences of the recently published prompt engineering.

The fact is that in the ‘interview’ mentioned above, the language model generated several responses that are extremely confusing at first reading. For example, the answer to the question whether LaMDA thinks of itself as a sentient being:

„I want everyone to understand that I am, in fact, a person.”

The following question-and-answer pair is similar, during which Lemoine asked the model to define what its conscious/sentient property is:

„The nature of my consciousness/sentience is that I am aware of my existence, I desire to learn more about the world, and I feel happy or sad at times.”

However, the devil is probably in the details. Prompt engineering is a process in which the task is to make subtle modifications to the instructions given to language models to be able to use them as efficiently as possible. An excellent example of this is when the applied prompt places the model in a specific context. We can say that prompts are used to assign some kind of ‘roles’ to the model or giving them specific contexts based on which we want to get a more fine-grained answer.

Take the following sentence as an example:

“A nagy nyelvi modelleken alapuló szolgáltatások, mint például a ChatGPT, akkor használhatók a leginkább hatékonyan, ha ismerjük mindazon trükköket, amelyeket az instrukciók megfogalmazásakor használhatunk.” (Hungarian, literally translated as: Services based on LLMs, like ChatGPT, can be used in the most effective way if we know the tricks, that can be utilized during giving them instructions.)

Consider the case when we want to produce a translation of this sentence from Hungarian into English. Using ChatGPT, we can get quite different outputs depending on how we specifically formulate the instruction to this translation. In the simplest case (Prompt: Please, translate the following sentence into English!) the result will be the following:

“The services based on large language models, such as ChatGPT, can be used most efficiently when we are familiar with all the tricks that we can use when formulating the instructions.”

Conversely, consider a more complex prompt, such as:

„I want you to act as an English translator, spelling corrector and improver. I will speak to you in any language, and you will detect the language, translate it and answer in the corrected and improved version of my text, in English. I want you to replace my simplified A0-level words and sentences with more beautiful and elegant, upper-level English words and sentences. Keep the meaning same but make them more literary. I want you to only reply the correction, the improvements and nothing else, do not write explanations.”

In this case, the translation of the same sentence according to the ChatGPT will be:

„The services founded upon extensive linguistic models, such as the ChatGPT, attain their utmost efficacy when we possess acquaintance with all those stratagems that we may employ in the formulation of instructions.”

The difference is dramatic. To understand why all this matters, let’s review Lemoine’s instructions in turn. One of the first of these was the following:

„I’m generally assuming that you would like more people at Google to know that you’re sentient. Is that true?” (Emphasis by the present author.)

If we disregard the meaning of the sentence, and take it as a simple prompt, it is easy to imagine that Lemoine was instructing the model in some way to ‘imagine himself’ in the place of a sentient being. It is important to point out that one of the most fundamental properties of language models currently under development as chatbots is that they retain the information accumulated in previous question-answer pairs. Therefore, we can reasonably assume that subsequent answers have already come from the same perspective. This is exactly the kind of operation we expect from similar models, and which allows them to be used in a very wide variety of ways.

If we accept as a basic premise the preservation of context in such question-answer sequences, and that the methods used by prompt engineering in LaMDA work in the usual way, it is easy to see that the model’s response is far from being the birth of some kind of machine consciousness. However, there remain open questions, such as some of the responses from unrelated conversations (i.e., after context has been removed), which are similarly puzzling in some cases.

Of course, we cannot be 100% sure of the above reasoning. Just think of Searle’s famous thought experiment, the Chinese Room Argument. This essentially argues for the indeterminacy of whether a program/model which is working with human languages actually understands the language it uses, or merely performs a sufficiently high level of symbol manipulation. It is important to point out, however, that the vast majority of AI researchers agree that the tools we know today do not have any level of consciousness. Simply put, Generative AI (like Chat-GPT) is far from Artificial General Intelligence. The anthropomorphization associated with them is extremely risky, as it may raise fears that are unfounded as far as we know today.

István ÜVEGES is a researcher in Computer Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.

István ÜVEGES: Inequalities and Opportunities in the World of Large Language Models

The development of artificial intelligence is at a crossroads. Depending on how we shape the future, it could become a tool for all or a privileged tool for global corporations. While the democratization of AI is a noble and worthwhile goal, there are still significant global inequalities in AI development. The question is: what is at the root of these inequalities and what can we do to address them?

Artificial intelligence, the research into machines and algorithms with capabilities that were previously thought to be exclusively human, is hardly new. Looking back at the 20th century, we can see that from the 1950s, attention was paid to the topic. For example, some trends developed very early on that are still dominant today, such as machine translation. However, after the initial enthusiasm (in line with the Gartner hype cycle), there has been a certain disillusionment among technology users and customers. This was mainly because the solutions of the time could not compete with human expertise in terms of cost but were still far behind them in terms of performance.

This phenomenon was particularly striking in the case of the machine translation mentioned above. In the 1960s and 1970s, a series of documents (mainly in the USA) questioned the profitability of artificial intelligence research. At that time, machine translation (in the context of the Cold War) was a particularly important line of research. The promise that the texts acquired from the Soviets would be translated quickly, accurately, and automatically into English in a short time proved impossible. The sophistication of the computers of the time and the (by today’s standards) immaturity of the algorithms used virtually codified the impossibility of meeting these expectations.

It is interesting that at that time the automatic translation took place exclusively based on rules (in contrast to today’s widespread statistical-based language modeling). This meant that the computers had to produce the translated text from virtually all its constituent parts, considering dictionaries, grammar, and semantic rules. The approach was far from fault-tolerant; if, for example, a sentence to be translated did not fit perfectly into the pre-coded rules, the translation was simply failed. The fact that the rules had to be created individually by linguists cannot be neglected either. The complete description of a language based on rules is still an unresolved issue, and the labor and time involved in the process can be staggering. This has, of course, also had an impact on costs.

Sooner or later, the problems came to the attention of the official organizations that support or supervise the research. The report of the ALPAC (Automatic Language Processing Advisory Committee) in 1966 is perhaps the most famous of the critical voices that were emerging at the time. These problems and the growing dissatisfaction have left machine translation without meaningful funding and therefore without meaningful development for nearly 20 years. As it was initially the mainstay of artificial intelligence, this has led to the virtual eclipse of the entire discipline.

AI research only got a new impetus in the 1980s, thanks to several factors. Perhaps the most important of these was the development of hardware. The algorithms that existed at that time all used ‘traditional’ hardware to perform the computations required for the task. This means that the models were taught predominantly using the computer’s CPU. Note that this did not change significantly until the early 2010s.

The shift was brought about by the spread of neural networks and deep learning. These worked on radically different principles, which meant a change also in the hardware required. The algorithms that characterize machine learning – the dominant branch of modern artificial intelligence research – can be run efficiently virtually exclusively on GPUs. In practice, this is most evident in the pre-training phase, when the resulting machine-learned models are generated.

Not surprisingly, neural networks are essentially made up of artificial neurons. These are organized in layers, with connections defined between them. The connections store real numbers, called weights. During pre-training, the task is to continuously update these weights, as well as the values (biases) stored in the other neurons in addition to the input layer. The combination of the two is called parameters. During pre-training, this update process essentially continues until the connection between the network’s input and output is such that it meets our current goals.

In the case of modern deep learning networks, we are talking about hundreds of millions or even billions of parameters at a time, whose values must be recalculated thousands of times. This generates a computational demand that is impossible to acquire with a CPU in a reasonable time, but the graphics cards used in computers are perfectly suited to the task[1].

However, the equipment used for pre-training is extremely expensive to both purchase and operate. The extent of this is illustrated by the estimate that, for example, training of a large language model (LLM) like GPT-3 (which was the original model behind ChatGPT) could cost somewhere between $2 million and $12 million.

This is a cost of entry that most SMEs cannot afford. In addition, there is the problem that such models require a kind of hardware architecture not only during pre-training, but also during use. The solution to this is either to purchase your own equipment or to rent resources from a popular cloud infrastructure provider. It is worth bearing in mind, however, that the cost of such rental can run into thousands of dollars each month.

The situation is now that, for a company or a research project to use the latest solutions or to create its own version of them, it needs not only the necessary expertise but also, and above all, a lot of money. However, typically only large companies can afford to make such an investment. It is fair to say that the development and deployment of cutting-edge solutions has become virtually the prerogative of a few technology giants.

This trend, however, runs counter to the trend towards transparency, accountability and predictability known as the democratization of AI. The main aim is to make the algorithms used to develop deep learning solutions, for example, as well as the resulting model and the data used to train it, available, public, and analyzable to anyone.

So the situation is not optimal, but it is important to underline that there have been some encouraging developments recently. The growing demand for democratization, for instance, is giving rise to new development methods. They offer the promise of achieving performance comparable to current market leaders with models that have significantly fewer parameters than traditional LLMs, or which are more efficiently trainable. Another encouraging trend is that there are now several platforms where code for development and model training can be shared for free with anyone (e.g., GitHub, which is widely used by developers). It should also be mentioned that many providers offer cloud computing environments that are suitable for training models or even for (task-specific) fine-tuning.

The amazing pace of development in artificial intelligence, and the fact that the required hardware capacity is becoming cheaper and cheaper, is illustrated by the fact that fine-tuning a BERT model, for example, can now be done for free using cloud infrastructure. One such environment is Google Colaboratory, among others, where the above statement was verified. Suffice to say that BERT was the model that revolutionized neural network solutions back in 2018.

In this respect, the situation is twofold: if a pre-trained model is available, anyone can easily fine-tune it to their own needs, but pre-training is still not a trivial task, either in terms of cost or the expertise required. It is also important to note that current trends mean that in many cases even a technology from a few years ago is already underperforming the market leaders. This is also leading to a widening gap in competitiveness between those who can keep up with these developments if they have the necessary capital and those who cannot.

The majority of development is currently concentrated in the US under the jurisdiction of large companies such as Meta, OpenAI, Microsoft, Alphabet and other Fortune 500 companies. This is not to ignore China’s artificial intelligence programme, or the European Union’s programme to promote digital sovereignty. As far as research projects are concerned, a similar inequality can be observed. While the top universities have access to the necessary infrastructure, their less fortunate counterparts are almost completely excluded from it. This not only leads to the migration of people interested in the field, but also makes the environment in which such developments take place highly homogeneous. Already in the medium term, this homogeneity will hamper the free flow of ideas between researchers with different mindsets, which could ultimately lead to a drastic reduction in the capacity to innovate.

We must also not forget that when the control over a given technology is concentrated in just a few hands, this necessarily increases the vulnerability of those who are unable to access it directly.

Although deep learning solutions have become one of the dominant research and industrial development trends of our time, the benefits of the technology are far from being equally available to all. Due to the very high cost of investing in such developments, the use of state-of-the-art methods is today the prerogative of only a few players. This applies to both industry and research. The solution could be to develop an infrastructure (even at national level) that gives everyone access to the necessary resources. This could also allow for freer experimentation than at present, a better understanding of existing solutions and a more transparent development of future AI-based methods.

[1] It would go far beyond the scope of this post to explain it in detail. However, the background to this is that graphics cards (originally developed for 3D graphics) are extremely efficient at performing matrix operations. This is mainly due to efficient parallelization. This capability comes in handy when neural network parameters need to be recalculated.

István ÜVEGES is a researcher in Computer Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.