The Social Impact of AI-Based Content Moderation (Part II.)

The spread of disinformation

A prominent case of disinformation is the proliferation of fake news and, more recently, deepfakes (which also rely heavily on generative artificial intelligence). Fake news is deliberately disseminated misleading information whose primary purpose is, for example, to spread propaganda and/or manipulate public opinion. The deliberate spread of political disinformation can also be a tool of psychological warfare. In everyday speech, any false information that is widely disseminated to the public is referred to as disinformation.

Deepfake is a technology that uses Artificial Intelligence, or more specifically Machine Learning, to create fake images, videos, or audio recordings that appear to be real. The term itself comes from a combination of the words “deep learning” and “fake”. One of the most common uses of the technology is to replace one person’s face in a video with another person’s face so that the result appears authentic. In this way, a person can be made to appear to do or say things that they never actually did. For example, fake videos of politicians or celebrities saying or doing untrue things are often the result of deepfake technology.

One of the main goals of content moderation is to prevent the spread of such disinformation. However, automated moderation systems often fail to detect or remove false information in a timely manner. This is particularly problematic for viral content, as algorithms that rank content based on engagement (likes, shares, comments) tend to disseminate this type of information quickly and widely, in extreme cases reaching masses of users before a moderation decision is taken. This is because, although moderation itself is now largely automated, it still relies heavily on user feedback, which often takes time to accumulate. As a result, disinformation can reach the public much faster than algorithms or even human moderators can intervene.
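
To make this race between virality and moderation concrete, the minimal sketch below shows how ranking purely by engagement signals can push an as-yet-unreviewed viral post above already-reviewed content. The weights and numbers are entirely hypothetical illustrations, not any platform’s actual formula.

```python
# Minimal sketch: engagement-based ranking favours fast-spreading posts
# regardless of whether a moderation decision has been reached yet.
# All weights and numbers below are hypothetical.
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    likes: int
    shares: int
    comments: int
    moderation_reviewed: bool = False  # no moderation decision yet

def engagement_score(post: Post) -> float:
    # Shares weighted highest because they expose the post to new audiences.
    return post.likes + 3 * post.shares + 2 * post.comments

feed = [
    Post("verified news report", likes=120, shares=10, comments=15,
         moderation_reviewed=True),
    Post("sensational fake claim", likes=400, shares=250, comments=90),
]

# Ranking purely by engagement puts the unreviewed viral post first.
for post in sorted(feed, key=engagement_score, reverse=True):
    print(f"{engagement_score(post):>6.0f}  reviewed={post.moderation_reviewed}  {post.text}")
```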

While it is true that, for example, the DSA requires content created by artificial intelligence to be clearly distinguished from original content so that users know it is not real, enforcing this in practice is far from trivial, especially given that those who spread disinformation have no interest in this kind of cooperation.

Another problem is that algorithms are often unable to properly interpret the context of content, which can result in real and relevant information being removed in the fight against disinformation. This is especially true in situations where there are only subtle differences between false information and real facts, or where sarcasm, irony, or culturally specific terms appear. Such misunderstandings can also undermine the overall credibility of social media platforms, as users’ trust is reduced if they feel that platforms arbitrarily remove certain content.
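
As a toy illustration of the context problem, the snippet below uses a deliberately naive keyword filter (not a real moderation model) that flags a news report and a sarcastic remark just as readily as genuine abuse, because it only sees the words, not their function. The blocklist and example posts are invented for the illustration.

```python
# Toy illustration only: keyword matching cannot distinguish reporting or
# sarcasm from the abuse it quotes, producing false positives.
import string

BLOCKED_TERMS = {"idiots", "scum"}  # hypothetical blocklist

def naive_flag(post: str) -> bool:
    words = {w.strip(string.punctuation).lower() for w in post.split()}
    return bool(words & BLOCKED_TERMS)

posts = [
    "You are all idiots and scum.",                                       # abusive
    "Politician called protesters 'idiots' - here's why that matters.",   # news report
    "Oh sure, we're the 'idiots' for asking questions ;)",                # sarcasm
]
for p in posts:
    print(naive_flag(p), "|", p)  # all three are flagged
```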

Right to equal treatment

The issue of equity in algorithmic content moderation is also increasingly coming to the fore. As technology has evolved, it has become apparent that automated systems often act in a discriminatory way, especially against marginalized communities. In many cases, these biases are rooted in the training data used to develop the algorithms, which may carry existing societal prejudices and stereotypes.

For any machine learning-based solution, the effectiveness of the algorithm depends heavily on the training data. If this data is biased or incomplete, the algorithms will also operate based on these biases. The biases present in the data can be explicit or implicit, and correcting for the latter is much more complicated, if it can be done at all. For instance, if a moderation algorithm is trained on data in which a disproportionate number of posts from a minority group of users were removed, it is more likely to remove similar posts later, even if they do not violate community guidelines. Conversely, similar content from other communities will remain untouched, leading directly to a form of unequal treatment.
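
This dynamic can be reproduced on synthetic data. In the sketch below (numpy and scikit-learn, with made-up numbers and a deliberately simplified model), the ground truth is identical across two groups, but the historical removal labels are biased against one of them; the classifier trained on those labels then flags harmless posts from that group at a visibly higher rate.

```python
# Minimal synthetic-data sketch: bias in the training labels, not in the
# model itself, produces unequal false-positive rates between user groups.
# All numbers are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

group = rng.integers(0, 2, n)        # 0 = majority, 1 = minority
toxicity = rng.random(n)             # latent "true" harmfulness signal
truly_harmful = toxicity > 0.9       # ground truth, independent of group

# Biased historical labels: harmless minority posts were removed 20% of the
# time, harmless majority posts only 2% of the time.
label_noise = np.where(group == 1, 0.20, 0.02)
train_label = truly_harmful | (rng.random(n) < label_noise)

# The features include a proxy for group membership (e.g. dialect markers),
# which lets the model absorb the labelling bias.
X = np.column_stack([toxicity, group])
clf = LogisticRegression(max_iter=1000).fit(X, train_label)
pred = clf.predict(X)

for g in (0, 1):
    harmless = (~truly_harmful) & (group == g)
    fpr = pred[harmless].mean()
    print(f"group {g}: false-positive removal rate on harmless posts = {fpr:.2%}")
```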

A specific example of this is that members of the Black community who shared content related to the Black Lives Matter movement often had it removed, while similarly strident content from other movements remained untouched. This unequal treatment can undermine trust in platforms and deepen social divisions. Furthermore, such discriminatory practices may force members of marginalized communities to self-censor, leading to further oppression.

To address this problem, technology companies should pay more attention to ensuring the diversity and accuracy of the training data used for machine learning algorithms and take cultural and social differences into account when developing them. It would also be important for human moderators to better complement and correct the decisions made by algorithms (e.g. through approaches such as Reinforcement Learning from Human Feedback, RLHF). This is particularly important in cases where context determines whether a decision is justified. Beyond this, greater transparency and accountability would of course be essential to regain user trust. Most social media platforms today simply treat their moderation algorithms as proprietary know-how, which does nothing to build trust in them and is a worrying practice in many respects, given these platforms’ impact on freedom of expression.
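
RLHF operates at training time; one complementary way to keep humans in the loop at decision time is confidence-based routing, where the automated system only acts on its own when it is highly confident and hands borderline, context-dependent cases to human reviewers. The sketch below is only an illustration of that idea; the thresholds and labels are arbitrary assumptions, not any platform’s policy.

```python
# Minimal sketch of confidence-based routing: automated decisions are applied
# only when the classifier is confident; ambiguous cases go to a human queue.
# Thresholds and labels are illustrative assumptions.
from typing import Literal

Decision = Literal["remove", "keep", "human_review"]

def route(score: float, remove_above: float = 0.95, keep_below: float = 0.30) -> Decision:
    """score: model-estimated probability that the post violates policy."""
    if score >= remove_above:
        return "remove"        # high confidence: act automatically
    if score <= keep_below:
        return "keep"          # high confidence the post is fine
    return "human_review"      # ambiguous, often context-dependent cases

for s in (0.99, 0.60, 0.10):
    print(f"violation score {s:.2f} -> {route(s)}")
```

The asymmetric thresholds reflect a simple design choice: a wrongful automated removal is costly for the user, so the bar for acting without human review is set much higher than the bar for leaving content up.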

The impact of automatic moderation on users

The psychological effects of algorithmic content moderation can be significant for users, especially if they are frequently confronted with error messages or inadequately communicated moderation decisions. These effects can be detrimental in several ways and can contribute to a deterioration of the user experience and a loss of trust in online platforms.

When users experience their content being removed or blocked without a clear explanation, it can easily lead to frustration. This frustration can, of course, be exacerbated if the moderation result is flawed or unfair, for example, if a harmless post is wrongly flagged as a violation (a false positive). The appeals process can also be opaque and unpredictable for users; it is often time-consuming and complicated and does not always lead to a satisfactory outcome.

Anxiety can also be a significant psychological consequence, especially for those who regularly use such platforms for communication, self-expression, or even business purposes. The constant uncertainty about moderating content and the fear of having an important post or account removed can increase users’ stress levels. Systematic removal of content or blocking of accounts can have a serious impact on users’ mental health, especially for those who are part of online communities or whose online presence is essential to their work or to building or maintaining social relationships.

One of the main problems with algorithmic moderation is the lack of transparency mentioned above. Often, users are not given a detailed explanation of why the content they have created has been removed or why their account has been blocked. This lack of transparency can contribute to dissatisfaction, as users may feel that they are the victims of arbitrary decisions and do not know how to avoid similar situations in the future.

Such situations also increase distrust in platforms. If users feel that moderation decisions are inconsistent, or that they have no way of effectively appealing them or understanding the reasoning behind them, this may alienate them from the platforms in the long run. This can be particularly problematic for social media platforms that rely heavily on user activity and engagement.


István ÜVEGES is a researcher in Computer Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the HUN-REN Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.
