Language as Evidence: The History and Present of Forensic Linguistics in the Age of AI

Forensic linguistics is a relatively young discipline that focuses on the use of language in legal settings. Its intersection with Artificial Intelligence (AI) can be seen as a branch of LegalTech, one that can help not only to uphold the right to a fair trial but also to support the evidentiary process in criminal cases. But what is this discipline, and how can it benefit from the latest advances in AI research?


Forensic linguistics is a branch of applied linguistics that involves the application of linguistic knowledge and methods to legal and criminal matters. As a discipline, forensic linguistics is concerned with the analysis of spoken and written language, with the main aim of finding evidence that can then be used in a legal case. Put another way, it is a discipline that deals with all situations where language and law intersect.

As a discipline, forensic linguistics is relatively young. Its emergence as a separate field within applied linguistics is often associated with Jan Svartvik. In his 1968 work “The Evans Statements: A Case for Forensic Linguistics”, Svartvik examined police statements with reference to the methods used to establish the authorship of texts. Authorship questions of this kind were, incidentally, also a common focus of the forensic linguistic work of the time. Specifically, Svartvik analyzed the statements, including a “confession”, that Timothy Evans was recorded as having made to the police. Evans was a Welshman accused of murdering his wife and daughter; he was convicted of his daughter’s murder and hanged in 1950, and was granted a posthumous pardon in 1966.

Svartvik used a combination of qualitative and quantitative analysis to cast doubt on the authorship of documents. He revealed inconsistencies in the testimonies, including strikingly different grammatical structures in certain incriminating passages. In his analysis, he suggested that these linguistic features were indicative of police insertions in the testimony which, without them, would have suggested the suspect’s innocence.

Similar forensic linguistic studies later became widespread among practitioners in the field. The method of linguistic profiling outlined above has subsequently evolved considerably.

Fields and methods of investigation

When there are one or more incriminated texts and a suspect who is presumed to be their author, a writing sample can be taken from the suspect to serve as the basis for a comparative text analysis. The fundamental question is then whether the suspect could be the author of the incriminated text. It can be answered by comparing the incriminated text with the sample. In such a case, the investigator carries out the analysis at all linguistic levels, including the individual’s vocabulary, fixed expressions, and grammatical structures.
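As a rough illustration of such a comparison, the sketch below reduces each text to the relative frequencies of a few function words and measures how similar the two resulting profiles are. The word list, the function names, and the choice of cosine similarity are assumptions made for this example, not a method described in the article, and a real analysis would use far more features and far longer texts.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short list of English function words.
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in",
                  "that", "it", "i", "was", "not", "but"]


def function_word_profile(text):
    """Return the relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def compare_texts(incriminated, sample):
    """Score how similar two texts are in their use of function words."""
    return cosine_similarity(function_word_profile(incriminated),
                             function_word_profile(sample))
```

A score close to 1 only means that the two texts use these particular words in similar proportions; on its own it says nothing conclusive about authorship.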

A similar analysis can be carried out when there are one or more incriminated texts but no specific suspect. In this case, we can characterize the author based on their language use, draw conclusions about possible suspects, and then try to narrow down the suspect pool as much as possible based on these indicators. In effect, this is a way of singling out the author from among all possible language users.

The above examples represent just one possible application of forensic linguistics, an area also known as the study of linguistic evidence. Work in this area may include, for instance, authorship identification, voice identification, and the linguistic analysis of threats or forged documents. The language of judicial proceedings may also be subject to analysis. Here, from courtroom interactions to police interrogations, linguistic analysis helps to clarify the intent of the communication and ensure a fair trial. Finally, legal texts themselves can be analyzed and interpreted to ensure that they are understandable and comply with legal norms. This area can therefore overlap with the drafting guidelines associated with Plain Language, where the aim is to make documents easily understandable to the (often lay) recipients of the text.

The basis: the individual’s language variant

Of course, the rise of artificial intelligence has not left forensic linguistics untouched. When determining the authorship of documents, it is common practice for the examiner to look for features that indicate the suspect’s individual language variant, or idiolect. In fact, idiolect is a central concept that fundamentally defines the objectives of forensic linguistics.

“Idiolect” refers to the individual’s linguistic variety and/or use of language, from the way he or she pronounces certain sounds and sound combinations (phonemes) to the way he or she prefers to express himself or herself in certain conversational situations. In linguistics, the term also refers to the hypothesis that no two people share the same language or have the same linguistic repertoire. Whether we use language in writing or orally, our chosen vocabulary and phrases are shaped by the linguistic influences to which we have been exposed throughout our lives. This depends on factors such as:

  • whether we grew up speaking a particular dialect,
  • whether we use the vocabulary of a particular social group (sociolect),
  • whether we speak foreign languages, which can make us prone to, for example, anglicisms,
  • what qualifications we have,
  • what our usual style of speaking at work is,
  • the language our parents taught us to use,
  • what style we use to communicate within groups of friends, and so on.

All of the above leave a mark, in one way or another, on the way we choose our words. The sum of these influences is why it can be said that no two people use language in the same way.

It is important to note, however, that although idiolect is mentioned in passing in most introductory linguistics textbooks, it is not easily observed or measured in practice, and there is little consensus about it and even less empirical evidence for it.

Nevertheless, we have all encountered situations in our daily lives where we have recognized someone by a characteristic turn of phrase, so we can say that the existence of an individual language variant is intuitively plausible.

AI and forensic linguistics

Artificial intelligence algorithms are well suited to identifying and learning patterns from data. This can be extremely useful in cases where we want to draw conclusions based on a person’s idiolect, for example, to establish authorship, as mentioned above. It is not surprising, therefore, that in forensic linguistics these capabilities can also be used to uncover hidden patterns of communication that indicate specific linguistic behavior. For example, artificial intelligence can analyze differences in language use between documents to help identify possible forgeries or determine authorship. Machine learning models, particularly those used in Natural Language Processing (NLP), are trained on huge datasets of documents that together are a reasonably good representation of human language use. Models built in this way can therefore be capable of recognizing everything from common legal terminology to the subtler nuances of individual writing styles.

Forensic linguists often work with extensive datasets, including collections of emails, messages, legal documents, and speech recordings in different dialects and languages. Artificial intelligence can process these datasets much more efficiently than human analysts. For example, AI-powered text analytics tools can quickly sift through thousands of emails to extract relevant legal evidence, identify emotions, or flag unusual communication patterns that may indicate manipulative or deceptive behavior. This level of analysis is crucial in cases such as contract disputes, copyright infringement, or even criminal cases where the intent and meaning behind the written or spoken words are in dispute.
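To make the idea of flagging unusual messages more concrete, here is a toy sketch that marks messages whose average word length deviates strongly from the rest of a collection. The single feature, the z-score threshold, and the function names are all assumptions chosen for brevity; real tools rely on much richer features, and a flag of this kind is only a prompt for closer human reading, not evidence of deception.

```python
import re
import statistics


def mean_word_length(message):
    """Average length of the alphabetic words in a message (0.0 if none)."""
    words = re.findall(r"[A-Za-z']+", message)
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def flag_unusual(messages, threshold=2.0):
    """Return indices of messages whose mean word length is a z-score outlier.

    `threshold` is a hypothetical cut-off: a message is flagged when its
    feature value lies more than `threshold` sample standard deviations
    from the mean of the whole collection.
    """
    values = [mean_word_length(m) for m in messages]
    if len(values) < 2:
        return []  # a standard deviation needs at least two values
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []  # all messages look identical on this feature
    return [i for i, value in enumerate(values)
            if abs(value - mean) / stdev > threshold]
```

Because the standard deviation is computed from the same small collection, a single outlier can only be flagged once there are enough ordinary messages around it, which is one reason such a heuristic is unsuitable for very small samples.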

A major advantage of including AI in linguistic analysis is that it can operate without the cognitive biases that humans may carry. AI systems are designed to focus solely on the data and apply the same rules and learning criteria uniformly to all the content they analyze. This objective approach helps reduce errors due to human subjectivity. For example, when attributing authorship, AI can impartially compare the document in question to a corpus of known works, using statistical and probabilistic models to determine linguistic similarities and differences.
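One simple way to picture such a comparison against a corpus of known works is to rank candidate authors by how closely their known writing matches the questioned document. The sketch below uses character trigram counts and cosine similarity; this is an illustrative choice rather than the approach of any particular system, and the function names and the `known_texts` structure are hypothetical.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count overlapping character trigrams in lower-cased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine(p, q):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(p[gram] * q[gram] for gram in p.keys() & q.keys())
    norm_p = math.sqrt(sum(count * count for count in p.values()))
    norm_q = math.sqrt(sum(count * count for count in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def rank_candidates(questioned, known_texts):
    """Rank authors by similarity of their known writing to the questioned text.

    `known_texts` maps an author label to a list of texts known to be theirs.
    Returns (author, score) pairs, most similar first.
    """
    target = trigram_profile(questioned)
    scores = {}
    for author, texts in known_texts.items():
        profile = Counter()
        for text in texts:
            profile.update(trigram_profile(text))  # pool all known texts
        scores[author] = cosine(target, profile)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

The ranking is always relative to the candidates supplied: the top-ranked author is merely the closest of those considered, and the true author may not be among them at all.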

The impact of AI is also notable in real-time applications such as the translation and interpretation of legal proceedings, which is invaluable in a multilingual legal environment. This is particularly important as forensic linguistics also has a major role to play in ensuring fair proceedings. AI-driven translation systems are increasingly used in international courts and immigration negotiations, where they help to break down language barriers, enabling clearer communication and understanding between the different parties involved. These systems use advanced NLP techniques to provide accurate, context-aware translations that are essential for fair legal proceedings.

The role of AI in improving linguistic analysis in forensic linguistics represents a significant advance in the field of legal technology (LegalTech). By automating complex analyses, providing the tools to efficiently manage large volumes of data, and improving the accuracy of language assessments, AI supports a stronger and fairer legal system. Admittedly, forensic linguistics is not yet an integral part of, for example, the evidentiary procedure. However, thanks to ever more advanced technology, its role may grow considerably in our increasingly digitized world.

István ÜVEGES is a researcher in Computational Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the HUN-REN Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.
