Forms of Algorithmic Content Moderation
Automated content moderation is growing rapidly, partly because of the volume of content online and partly to contain costs. However, the details of the technologies used are often obscure, raising serious transparency and accountability concerns.
Discovering the exact technical solutions behind each social media platform's moderation activities is far from trivial, mainly because there is little publicly available, reliable information on the subject: the companies concerned mostly treat it as a trade secret. Publicly available reports, as well as unconventional sources such as investigative journalism, are therefore essential to uncover the situation; their use is not unprecedented in the relevant literature.
Algorithmic content moderation on social media platforms is a set of techniques and procedures designed to automatically filter and monitor user-generated content. It is essentially an automation of the moderation process, in which decisions are made on the basis of full or partial matches against databases, or of classification by machine learning models.
Today, the user base of the largest social media platforms is well into the billions, producing content on the same scale every day. It is not surprising, therefore, that a model in which all content is moderated, or at least reviewed, by human moderators has become unsustainable. Automating content moderation, by contrast, is a practical, scalable solution for social media platforms.
Back in 2019, for example, Facebook made its hash-based solutions PDQ and TMK+PDQF publicly available; their source code is still public today. In computer science, and especially in cryptography, hashing is a process whereby a variable-sized input (such as text or an image) is used to produce a typically shorter, fixed-length representation. The key to the process is the hash function: an algorithm that produces this fixed-length representation by mapping a bit sequence of arbitrary length to one of fixed length. What matters here is that if even a single character of the original bit sequence (e.g. the text of a social media post) changes, the hashed value should also change. It is also important that the probability of two different inputs accidentally having the same hash value is very low. These are the main characteristics of so-called cryptographic hashing.
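To illustrate, a minimal Python sketch using the standard hashlib library (the post texts are of course made up) shows the fixed-length output and the behavior described above, where a one-character change produces a completely different digest:

    import hashlib

    def sha256_hex(text: str) -> str:
        """Return the SHA-256 digest of a UTF-8 string as a 64-character hex string."""
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    post = "This is an example social media post."
    edited = "This is an example social media post!"  # only the last character differs

    # Both digests have the same fixed length, yet they differ completely:
    print(sha256_hex(post))
    print(sha256_hex(edited))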
The next step is that, for content uploaded to social media, once its hash has been calculated, it must also be decided whether the content constitutes an infringement. This could be content that the Terms of Service accepted when using the platform declares contrary to the community guidelines, or content that is a specific violation of the law. To determine this, there are hash databases containing the hash values of known problematic content, which can be shared between providers (the Shared Industry Hash Database). Since calculating a hash and searching such a database are both extremely fast, the method scales well even for the large amounts of data generated every day on social media.
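A much-simplified sketch of how such a lookup might work is given below; the database contents and helper names are purely illustrative, not the actual Shared Industry Hash Database:

    import hashlib

    def sha256_hex(data: bytes) -> str:
        """Hex-encoded SHA-256 digest of raw content bytes."""
        return hashlib.sha256(data).hexdigest()

    # Purely illustrative stand-in for a shared database of hashes of known
    # problematic content; in practice this is curated and shared between providers.
    known_bad_hashes = {sha256_hex(b"previously identified problematic content")}

    def is_known_violation(upload: bytes) -> bool:
        """Hash the uploaded content and look it up in the database (a fast set lookup)."""
        return sha256_hex(upload) in known_bad_hashes

    print(is_known_violation(b"previously identified problematic content"))  # True
    print(is_known_violation(b"a harmless holiday photo"))                   # False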
The problem here is the requirement of an exact match. For an image, for example, changing the color of a single pixel is enough for the resulting hash value to be completely different from the one stored in the database.
This is addressed by hash functions and databases that can detect not only exact matches but also “meaningful” ones: perceptual hash functions. Their distinguishing feature is that they produce identical or very similar hash values for two pieces of content that a human would judge to be similar. This is why they can detect, for example, an image to which only a watermark has been added, or a text in which only a few minor details of the problematic content have been changed. The Facebook solutions mentioned above likely fall into this category.
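As a rough illustration of the principle (not of PDQ itself), the following sketch uses the third-party imagehash and Pillow Python libraries; the file names and the distance threshold are placeholders:

    import imagehash            # third-party: pip install ImageHash
    from PIL import Image       # third-party: pip install Pillow

    # Placeholder file names: assume the second image is a slightly modified copy
    # of the first (e.g. watermarked, re-compressed or resized).
    original = imagehash.phash(Image.open("original.jpg"))
    modified = imagehash.phash(Image.open("watermarked_copy.jpg"))

    # Perceptual hashes of visually similar images differ in only a few bits,
    # so a small Hamming distance indicates a "meaningful" match.
    distance = original - modified
    threshold = 8               # illustrative threshold, tuned in practice
    print(distance, "likely match" if distance <= threshold else "no match")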
Facebook, for example, has also tried to filter pornographic content using machine learning. In addition, there is now research building on state-of-the-art, freely available neural networks. One such approach is to train BERT models for moderation tasks, for example on Reddit posts. The background to this is that many Reddit communities have started to embed their moderation rules and processing logic into algorithms, which in turn makes it possible to obtain targeted training data. There are also working examples on Wikipedia, where fully autonomous systems operate alongside systems that support human moderators. The research community has likewise proposed a multi-step neural network architecture for filtering pornographic content.
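As a hedged sketch of what such a BERT-based moderation classifier might look like in practice, the following Python snippet uses the Hugging Face transformers library; the model identifier is hypothetical and would in reality point to a model fine-tuned on labelled moderation data:

    from transformers import pipeline   # third-party: pip install transformers

    # Hypothetical model identifier: assume a BERT model already fine-tuned on
    # labelled moderation data (e.g. removed vs. kept Reddit comments).
    classifier = pipeline("text-classification", model="my-org/bert-reddit-moderation")

    comments = [
        "Thanks for the detailed explanation, this helped a lot!",
        "Get out of this subreddit, nobody wants you here.",
    ]

    for comment, result in zip(comments, classifier(comments)):
        # Each result is a dict with a predicted label and a confidence score;
        # a platform could auto-remove high-confidence violations and queue
        # borderline cases for human review.
        print(result["label"], round(result["score"], 3), "-", comment)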
However, despite the seemingly unstoppable growth of automated solutions, driven partly by necessity and partly by practicality, many open questions remain about their reliability, their possible biases, and even the acceptability of the decisions such systems make. The latter is particularly interesting because big tech companies are usually protective of the exact details of their solutions. This not only makes transparency and accountability difficult or impossible, but also raises serious concerns. Perhaps the most serious of these is the influence that social media companies exert over the expression of opinion on the internet, in a way that even states alone may not be able to match.
István ÜVEGES is a researcher in Computer Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the HUN-REN Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.