Artificial intelligence-based algorithms are now of inescapable importance in many fields. Their applications include content recommendation systems for streaming providers, chatbots (e.g. ChatGPT) and Google’s search interface. These applications are designed to help users make decisions, find information, or organize the vast amount of information available online so that they can more easily find what they are looking for. In fact, many of the most popular online services are nowadays unthinkable without artificial intelligence, since it makes navigating the vast amount of data in the online space efficient and accessible to all.
In addition to the above, however, other uses of digitized data can be envisaged that are less obvious and are not necessarily aimed at satisfying the needs of the average user, but rather at serving market or political interests, even at the cost of (conscious or unintentional) invasion of privacy.
In the world of artificial intelligence, and specifically in its subfield of machine learning, the quantity and quality of training data are key factors. In the traditional sense, privacy in the online/digital space covers private conversations, social media posts and information related to the individual. In addition to these, however, users leave behind numerous online footprints that the legal rules on privacy either do not protect at all or protect only inadequately.
Examples include data such as browsing history, content viewed or ‘liked’, individual contact networks and geolocation data. Until the last decade, this information had existed mostly in isolation, on separate servers, under the ‘authority’ of different data controllers or collectors. However, once these data sources became interoperable (whether through the activities of data brokers or otherwise), they gave rise to a mass of data (mostly referred to as ‘big data’) that nowadays makes it possible to psychologically profile the individual behind the data, to micro-target ads and content, or even to apply psychometric methods.
Unlike traditional information that people are generally aware of sharing (for example, by uploading a photo), this data is often generated without the user necessarily being aware of it. Nevertheless, machine learning algorithms that use it can be a far more effective tool than earlier methods for profiling an individual, for instance by automatically recognizing and attributing characteristics to a person, be they party preferences or other interests. Mapping the groups formed in this way (e.g., by unsupervised machine learning algorithms) back to the individual is the key to developing effective, automated opinion-forming techniques.
The process by which data is “turned into gold” in the right hands, and the ways in which it can then serve business or political interests, involve multiple stakeholders as well as a range of technological innovations, emerging trends, regulatory challenges and perspectives.
In response to the insatiable demand of machine learning algorithms for data, an entire industry is now dedicated to collecting and selling user data in the most efficient and detailed way possible. Given the rapid progress in both IT and artificial intelligence research, it is reasonable to assume that, without the right regulatory environment, the problems we are already seeing (data leaks, manipulation, micro-targeting, psychometric profiling, etc.) will only get worse, or may be replaced by new challenges that cannot yet be foreseen.
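The clustering step mentioned above can be illustrated with a minimal Python sketch. It assumes two hypothetical behavioral features per user (the share of political and of sports content viewed), invented toy values, and plain k-means as the unsupervised algorithm; real profiling pipelines work with far richer data. The final step, attaching each cluster label to a named user, is the one that turns an abstract segment back into a statement about an individual.

```python
import math

# Hypothetical footprint features per user: (share of political content viewed,
# share of sports content viewed). Toy values invented for illustration only.
footprints = {
    "user_a": (0.90, 0.10),
    "user_b": (0.85, 0.20),
    "user_c": (0.10, 0.80),
    "user_d": (0.15, 0.90),
    "user_e": (0.80, 0.15),
}

def kmeans(points, k, iterations=10):
    """Plain k-means (Lloyd's algorithm), using the first k points as initial centroids."""
    centroids = list(points[:k])
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

users = list(footprints)
labels, centroids = kmeans([footprints[u] for u in users], k=2)

# Mapping the anonymous clusters back to individual users.
segment_of = dict(zip(users, labels))
print(segment_of)
```

With these toy values, users a, b and e end up in one segment and users c and d in the other.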
Among the (already existing) uses of artificial intelligence that are of concern, this paper presents some of the ways in which it can be used to influence election outcomes. The issue of political polarization in social media is also discussed in more detail.
In modern democracies, weaponized / manipulative AI poses a serious threat to the fairness of elections, but also to democratic institutions more generally. In the case of elections, the outcome can be influenced in several ways, in line with the interests of a third party.
Attacks carried out with artificial intelligence, whether for purely malicious ends or in pursuit of economic or political interests, can take the form of “physical” attacks (such as paralyzing critical infrastructure or stealing data) or of psychological effects that poison voters’ trust in the electoral system or discredit certain public actors.
In the present context, micro-targeting refers to personalized messaging that has been fine-tuned based on previously collected data about a given user, such as an identified psychological profile. Messages targeted in this way are much more likely to influence or even manipulate opinion than traditional advertising techniques.
This is exemplified by the cases of suspected abuse uncovered by the Mueller report in connection with the 2016 US presidential election, in which social media platforms were, and remain, one of the main arenas.
The heightened concern about such activities is illustrated by the fact that, following the introduction of the GDPR, several EU Member States have initiated investigations against companies involved in data collection. For example, a report by the Irish Council for Civil Liberties (ICCL) raises serious concerns about the activities of Google and other large-scale operators, whereby data collection companies auction information about users, linked to their real-time geolocation, to potential advertisers, broadcasting the data packets to the bidders, of whom only the ‘winning’ one gets to display its ad (Real-Time Bidding, RTB). In several of the cases studied, the data transmitted in this way included sensitive health characteristics such as diabetes, HIV status, brain tumors, sleep disorders and depression.
The report found that in some cases Google’s RTB system forwarded users’ data packets (which may have included the above-mentioned sensitive data, unfiltered) hundreds of times a day. The value of the data, and the seriousness of the leak, are illustrated by the fact that, also according to the report, it was used by some market/political actors in an attempt to influence the outcome of the 2019 Polish parliamentary elections.
One such actor, the company OnAudience, used data on around 1.4 million Polish citizens to help target people with specific interests with election-related ads. According to the company, although the data packets were processed and transmitted anonymously, they were still uniquely linked to specific, real individuals. Moreover, these identifiers can be linked to the databases of other companies and thus merged into a single profile. This raises concerns not only about the compliance of such market practices with the GDPR, but also about violations of privacy rights.
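The micro-targeting logic described above can be reduced to a simple selection rule. The following Python sketch assumes that each user already has inferred trait scores (the trait names, scores and message texts are all hypothetical) and picks the message variant written for the user’s strongest targeted trait; it illustrates the idea and is not a reconstruction of any real campaign’s system.

```python
# Hypothetical message variants, one per targeted personality trait.
VARIANTS = {
    "neuroticism": "Protect your family from rising crime.",
    "conscientiousness": "A clear plan to balance the budget.",
    "openness": "Fresh ideas for a changing country.",
}
DEFAULT = "Vote on election day."  # fallback for users with no targeted trait

def pick_message(profile):
    """Choose the variant matching the user's highest-scoring targeted trait."""
    targeted = {trait: s for trait, s in profile.items() if trait in VARIANTS}
    if not targeted:
        return DEFAULT
    return VARIANTS[max(targeted, key=targeted.get)]

# Two hypothetical users with previously inferred trait scores.
print(pick_message({"openness": 0.2, "neuroticism": 0.9}))
print(pick_message({"extraversion": 0.8}))
```

The first user receives the fear-oriented variant, while the second, whose only known trait is not targeted, gets the generic message.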
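The privacy problem of real-time bidding lies in the fact that the user data travels with the bid request, not with the winning bid. The Python sketch below illustrates this under simplifying assumptions: three hypothetical bidders, a made-up bid request containing a pseudonymous ID, a location and interest segments, and a plain first-price auction. Every bidder records what it receives, so the data reaches all of them even though only one wins the ad slot.

```python
from dataclasses import dataclass, field

@dataclass
class Bidder:
    name: str
    interest: str      # segment this hypothetical bidder pays extra for
    base_bid: float
    received: list = field(default_factory=list)  # bid requests this bidder has seen

    def bid(self, request):
        self.received.append(request)  # the data leaves the exchange here, win or lose
        bonus = 2.0 if self.interest in request["segments"] else 0.0
        return self.base_bid + bonus

def run_auction(request, bidders):
    """Simplified first-price auction: the highest bid wins and pays its own bid."""
    bids = {b.name: b.bid(request) for b in bidders}
    winner = max(bids, key=bids.get)
    return winner, bids[winner]

bid_request = {
    "user_id": "pseudonymous-123",   # hypothetical pseudonymous identifier
    "geo": (52.23, 21.01),           # hypothetical real-time location
    "segments": ["politics", "health:sleep_disorder"],
}
bidders = [Bidder("dsp_one", "sports", 1.0),
           Bidder("dsp_two", "politics", 0.5),
           Bidder("dsp_three", "travel", 1.2)]

winner, price = run_auction(bid_request, bidders)
print(winner, price)                                         # one bidder wins the slot
print(sum(1 for b in bidders if bid_request in b.received))  # every bidder saw the data
```

Here dsp_two wins at 2.5, but all three bidders have received the location and the health segment.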
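The linking of ‘anonymous’ identifiers across companies needs no names at all: a shared pseudonymous key is enough. In the following Python sketch, two hypothetical datasets (the identifiers and attributes are invented) are merged with a simple join on that key, producing one combined profile per identifier.

```python
# Hypothetical dataset of an ad-tech company: pseudonymous ID -> interests.
adtech_segments = {
    "id-7f3a": {"interests": ["elections", "farming"]},
    "id-91bc": {"interests": ["football"]},
}
# Hypothetical dataset of a second company, keyed by the same IDs.
broker_records = {
    "id-7f3a": {"home_region": "Region X", "device": "phone-model-1"},
    "id-55de": {"home_region": "Region Y", "device": "phone-model-2"},
}

def merge_profiles(*sources):
    """Outer-join any number of ID-keyed sources into one profile per identifier."""
    profiles = {}
    for source in sources:
        for pid, attrs in source.items():
            profiles.setdefault(pid, {}).update(attrs)
    return profiles

profiles = merge_profiles(adtech_segments, broker_records)
print(profiles["id-7f3a"])
```

For id-7f3a, the political interests from one company and the region and device from the other now sit in a single record, although neither dataset contains a name.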
Opinion bubbles and political polarization
In addition to the above, it is also significant that social media platforms, to maximize users’ time on the platform, typically present content that best matches the personality of the user, i.e., that is most likely to be of interest to them.
This kind of (AI-enabled) content pre-screening has brought two new and important problems to light in recent years. The first is the often misleading positive feedback generated by the homogeneity of the ranked content, and the second is the political polarization often associated with it.
The former is driven by the fact that social media platforms enable people to connect with others who share their worldview on an unprecedented scale. This kind of social selectivity, coupled with the platforms’ content filtering technologies, results in psychosocial bubbles that essentially limit the range of possible social connections and interactions, as well as exposure to novel and potentially relevant information.
The phenomenon of such bubbles has been studied since the 2010s, mainly on the basis of informatics and structural measures of online behavior and social networks. Among more recent research, the Identity Bubble Reinforcement Model (IBRM) stands out, as it explicitly aims to integrate the social psychological aspects of the problem and human motivation into the earlier results. According to this model, the expanded opportunities for communication and social networking in social media allow individuals to seek social interactions (mainly) with people who share and value their identity. This identity-driven use of social media platforms can ultimately lead to the creation of identity bubbles, which can manifest themselves for the individual in three main ways:
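The interaction between content filtering and exposure can be sketched with a toy feedback loop in Python. The sketch rests on strong, purely illustrative assumptions: each item has a single hypothetical ‘leaning’ score, the feed shows the k items closest to the user’s inferred leaning, the user always clicks the top item, and every click pulls the inferred leaning toward the clicked item. Even in this crude model, half of the catalog never reaches the user.

```python
# Hypothetical catalog: item ID -> leaning score in [-1, 1].
catalog = {"a": -0.8, "b": -0.45, "c": -0.05, "d": 0.15, "e": 0.55, "f": 0.85}

def rank_feed(profile, items, k):
    """Return the k item IDs whose leaning is closest to the user's inferred leaning."""
    return sorted(items, key=lambda item: abs(items[item] - profile))[:k]

def simulate(profile, items, k=2, rounds=5, rate=0.5):
    """Run the ranking/click loop and record every item the user was ever shown."""
    shown = set()
    for _ in range(rounds):
        feed = rank_feed(profile, items, k)
        shown.update(feed)
        clicked = feed[0]  # assumption: the user always clicks the top item
        # The inferred leaning drifts toward whatever was clicked.
        profile = (1 - rate) * profile + rate * items[clicked]
    return shown, profile

shown, final_profile = simulate(profile=0.3, items=catalog)
print(sorted(shown))                 # items the user was exposed to
print(sorted(set(catalog) - shown))  # items the feed never surfaced
```

Starting from a leaning of 0.3, only items c, d and e ever appear in the feed, while a, b and f stay invisible; the loop narrows rather than widens what the user sees.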
- identification with online social networks (social identification),
- a tendency to interact with like-minded people (homophily),
- and a primary reliance on information from like-minded people on social media (information bias).
Within social media, these three elements are closely correlated and together reflect the process of reinforcing the identity bubble.
The data generated online can also be used to predict users’ personality traits. One of the most prominent uses of such predictions is in psychometrics. This use is closely tied to the online footprint (and thus to the right to privacy and confidentiality) and is now also known as a possible technique for influencing voter opinion.
Psychometrics (also known as psychometry) is the field of psychology that deals with testing, measurement and evaluation. More specifically, it deals with the theory and techniques of psychological measurement, i.e., the quantification of knowledge, skills, attitudes and personality traits. Its classical tests measure, for instance, employees’ general attitude in a work environment, their emotional adaptability and their key motivations, but the field also includes aptitude tests that assess how successfully someone is likely to master specific skills, as well as classical IQ tests.
In relation to social media, and big data in general, psychometrics came to the fore mainly during the 2016 US presidential election, along with another technique, micro-targeting.
Any discussion of this topic inevitably involves Cambridge Analytica, which first received significant media attention in July 2015, shortly after the company was hired by the team of Republican presidential candidate Ted Cruz to support his campaign. Although the campaign was unsuccessful, Cambridge Analytica’s CEO claimed that the candidate’s popularity had increased dramatically thanks to the company’s use of aggregated voter data, personality profiles and personalized messaging/micro-targeting techniques. The firm may also have played a role in shaping the outcome of the Brexit referendum along similar lines. In 2016, Donald Trump, later US President, was also suspected of having hired the company to support his campaign against Hillary Clinton. In this context, there are reports that Cambridge Analytica employed data scientists who enabled the campaign team to identify nearly 20 million swing voters in states where the outcome of the election could be influenced. Winning over these voters could significantly boost Trump’s chances in these key states, and ultimately in the general election.
The company also claimed that one of the keys to its success was the combination of traditional psychometric methods with the potential of big data. Its free personality tests, distributed on social media platforms, promised users more insight into their own personality traits at no cost. Cambridge Analytica could then link the submitted data to the submitter’s name and a link to their profile.
The resulting data set (supplemented by other public and private user data) allowed the company to classify some 220 million US voters into 32 different personality types, each of which could then be targeted with the ads most likely to appeal to it.
Given enough data, the method can also be applied in reverse: if the same data is collected from users who did not take the survey as from those who did, the surveyed users’ data can be used to train machine learning models that then classify the previously unprofiled users into the personality groups mentioned above. Although the real success of Cambridge Analytica’s methods has not been clearly established, the moral, political and security concerns surrounding the company undoubtedly highlight both the potential of online footprint data and the ways in which it can be used in legally unregulated or morally and ethically questionable ways.
Taken together, the above illustrates the potential of the ever-increasing amount of data available on the internet. However, since the so-called ‘data-driven economic model’ (in which the primary source of profit is not industrial production but people’s attention) is not yet fully developed, the ethical and legal concerns already raised highlight the risks of the further proliferation and refinement of AI-based technologies and leave many questions unanswered.
Initiatives to tackle these problems are already under way. For example, the European Union’s efforts to achieve digital sovereignty seek to respond to the uneven global distribution of artificial intelligence capacities (research, infrastructure), which currently works to the detriment of the Union. Significant progress has been made with the adoption of the GDPR regarding the processing and use of personal data, but, as the above-mentioned report of the Irish Council for Civil Liberties reveals, it is far from clear what an effective and appropriate way forward would be in practice, both on issues that are not currently regulated and in terms of detecting abuse.
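The quantification mentioned above typically works by aggregating answers to several questionnaire items per trait. The Python sketch below assumes a 1–5 Likert scale and four hypothetical items (not taken from any validated instrument); reverse-keyed items, where agreement indicates a low level of the trait, are flipped before averaging.

```python
# (item text, trait, reverse_keyed): hypothetical items for illustration only.
ITEMS = [
    ("I am the life of the party.", "extraversion", False),
    ("I prefer to stay in the background.", "extraversion", True),
    ("I pay attention to details.", "conscientiousness", False),
    ("I often leave my belongings around.", "conscientiousness", True),
]

def score(answers, items=ITEMS, scale_max=5):
    """Return the mean score per trait, flipping reverse-keyed answers first."""
    totals, counts = {}, {}
    for (_, trait, reverse), answer in zip(items, answers):
        value = (scale_max + 1 - answer) if reverse else answer
        totals[trait] = totals.get(trait, 0) + value
        counts[trait] = counts.get(trait, 0) + 1
    return {trait: totals[trait] / counts[trait] for trait in totals}

# One hypothetical respondent's answers, in item order.
print(score([5, 2, 3, 3]))
```

For these answers, extraversion scores 4.5 (5 and the flipped 2, i.e. 4) and conscientiousness 3.0.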
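The ‘reverse’ step can be sketched in Python as follows. Users who took the survey serve as labeled examples (footprint → personality type), and a model trained on them assigns a type to users who never took it. The pages, users and type labels are hypothetical, and a nearest-centroid classifier is used only because it is short; this is not a claim about the models Cambridge Analytica actually used.

```python
import math

PAGES = ["page_1", "page_2", "page_3", "page_4"]  # hypothetical pages that can be 'liked'

def to_vector(likes):
    """Encode a set of liked pages as a 0/1 feature vector."""
    return [1.0 if page in likes else 0.0 for page in PAGES]

# Surveyed users: footprint plus the personality type derived from their test.
surveyed = [
    ({"page_1", "page_2"}, "type_A"),
    ({"page_1"}, "type_A"),
    ({"page_3", "page_4"}, "type_B"),
    ({"page_4"}, "type_B"),
]

def train(examples):
    """Average the feature vectors of each type into one centroid per type."""
    sums, counts = {}, {}
    for likes, label in examples:
        acc = sums.setdefault(label, [0.0] * len(PAGES))
        sums[label] = [a + v for a, v in zip(acc, to_vector(likes))]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(likes, centroids):
    """Assign the type whose centroid is closest to the user's footprint."""
    vec = to_vector(likes)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))

centroids = train(surveyed)
# Users who never took the survey, profiled from their footprint alone.
print(classify({"page_2"}, centroids), classify({"page_3"}, centroids))
```

A user who only liked page_2 is assigned type_A and one who only liked page_3 type_B, without either of them ever answering a question.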
Given that the function of law is primarily to respond to social and technological changes that have already occurred by fine-tuning the regulatory environment, a comprehensive study of the problems related to AI from a legal perspective is also essential.
Another issue that is not discussed in detail in this article, but which is also of particular importance, is the ambivalence of AI-based capacities concentrated in the hands of the state. Such capacities can be used both to defend liberal democracies and to build authoritarian (and/or surveillance) states, as the People’s Republic of China has done, for instance, by introducing a ‘social credit system’.
Perhaps the most important conclusion from examining these issues is the need to improve the regulation of artificial intelligence, to update it to meet the challenges of the times, and to develop cyber defense procedures that can detect, predict and possibly prevent manipulative techniques that rely on artificial intelligence.
 The quote refers to a common saying, especially in the United States, which emphasises the data-based dimension of economic growth: ‘Data is the new gold’. (E.g., Rachel Nyswander Thomas: Data is the New Gold: Marketing and Innovation in the New Economy: https://www.uschamberfoundation.org/data-new-gold-marketing-and-innovation-new-economy (Accessed: 12. 22. 2022.))
 In addition, artificial intelligence can be used to amplify the effects of efforts to distort election results, such as gerrymandering, which are not directly relevant to the topic of this paper. Cf. Manheim, Karl – Kaplan, Lyric: Artificial intelligence: Risks to privacy and democracy. Yale JL & Tech. 21, 2019, p. 133–135.
 Mueller, Robert S., III: Report on the Investigation Into Russian Interference in the 2016 Presidential Election: https://www.nbcnews.com/politics/politics-news/read-text-full-mueller-report-n994551 (Accessed: 12. 19. 2022.)
 Regulation (EU) 2016/679 (General Data Protection Regulation, GDPR)
  Ryan, Johnny: Two years of DPC inaction on the ongoing RTB data breach – Submission to the Irish Data Protection Commission (21 September 2020): https://www.iccl.ie/wp-content/uploads/2020/09/1.-Submission-to-Data-Protection-Commissioner.pdf
 Ibid. 6-7.
 Ibid. 5.
 For example, ranking content in the newsfeed according to relevance and interests.
 Kaakinen, Markus – Sirola, Anu – Savolainen, Iina – Oksanen, Atte: Shared identity and shared information in social media: development and validation of the identity bubble reinforcement scale, Media Psychology, 23:1, 25-51, 2020, p. 25-26.
 Pariser, Eli: The filter bubble: What the Internet is hiding from you. London, England: Penguin, 2011
 Zollo, Fabiana – Bessi, Alessandro – Del Vicario, Michela – Scala, Antonio – Caldarelli, Guido – Shekhtman, Louis – Quattrociocchi, Walter: Debunking in a world of tribes. PLoS ONE, 12(7), 2017
 Maddocks, Krysten Godfrey: What is Psychometrics? How Assessments Help Make Hiring Decisions: https://www.snhu.edu/about-us/newsroom/social-sciences/what-is-psychometrics (Accessed: 12. 22. 2022.)
 Vogel, Kenneth P. – Parti, Tarini: Cruz partners with donor’s ‘psychographic’ firm: https://www.politico.com/story/2015/07/ted-cruz-donor-for-data-119813 (Accessed: 12. 22. 2022.)
 Doward, Jamie – Gibbs, Alice: Did Cambridge Analytica influence the Brexit vote and the US election? https://www.theguardian.com/politics/2017/mar/04/nigel-oakes-cambridge-analytica-what-role-brexit-trump (Accessed: 12. 22. 2022.)
 Blakely, Rhys: Data scientists target 20 million new voters for Trump: https://www.thetimes.co.uk/article/trump-calls-in-brexit-experts-to-target-voters-pf0hwcts9 (Accessed: 12. 22. 2022.)
 González, Roberto J.: Hacking the citizenry?: Personality profiling, ‘big data’ and the election of Donald Trump. Anthropology Today 33(3), 2017, p. 9-12.
 The results could be evaluated according to the Big Five personality model, a long-established, fundamental concept in personality psychology that classifies an individual’s personality traits into factor groups. These main traits are extraversion, agreeableness, conscientiousness, emotional stability and openness (intellect).
 Davies, Harry: Ted Cruz using firm that harvested data on millions of unwitting Facebook users: https://www.theguardian.com/us-news/2015/dec/11/senator-ted-cruz-president-campaign-facebook-user-data (Accessed: 12. 22. 2022.)
 Confessore, Nicholas – Hakim, Danny: Data Firm Says ‘Secret Sauce’ Aided Trump; Many Scoff: https://www.nytimes.com/2017/03/06/us/politics/cambridge-analytica.html (Accessed: 12. 22. 2022.)
 EPRS Ideas Paper – Towards a more resilient EU: Digital sovereignty for Europe: https://www.europarl.europa.eu/RegData/etudes/BRIE/2020/651992/EPRS_BRI(2020)651992_EN.pdf (Accessed: 12. 23. 2022.)
 Wright, Nicholas: How Artificial Intelligence Will Reshape the Global Order – The Coming Competition Between Digital Authoritarianism and Liberal Democracy: https://www.foreignaffairs.com/articles/world/2018-07-10/how-artificial-intelligence-will-reshape-global-order?check_logged_in=1&utm_medium=promo_email&utm_source=lo_flows&utm_campaign=registered_user_welcome&utm_term=email_1&utm_content=20221108 (Accessed: 12. 23. 2022.)
István Üveges is a researcher in Computational Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of automation and artificial intelligence (machine learning), legal language (legalese) studies and the Plain Language Movement.