OSINT and AI: Possibilities and Drawbacks

Open-Source Intelligence (OSINT) is an intelligence method that involves collecting, evaluating, and analyzing publicly available data to answer a specific intelligence question. Although intelligence agencies were already involved in this kind of activity in the last century, the emergence of Artificial Intelligence has also transformed this field significantly in recent times. In this paper, we look at what exactly Open-Source Intelligence is, what methods it uses, how it exploits AI-based technologies, and what questions it is best suited to answer.

The (very) short history of OSINT [1]

Surprisingly, the roots of OSINT go back centuries, long before the age of the internet. One of the earliest examples of OSINT dates to the 19th century, when both the Union and Confederacy used newspapers and other public documents to gather information on enemy troop movements, supply lines, and morale during the American Civil War. This information was vital in planning military strategies and anticipating enemy actions.

In the 20th century, during the Second World War, the establishment of the BBC Monitoring Service was a major milestone in the history of OSINT. This service monitored and analyzed enemy radio transmissions, providing valuable intelligence to the British government. During this period, OSINT clearly went beyond the mere collection of documents and press products to include the monitoring of radio transmissions and other channels of communication. The OSS (Office of Strategic Services), considered the predecessor of the CIA in the United States, also maintained an entire branch dedicated to Open-Source Intelligence during the Second World War. This branch collected and monitored newspapers, magazines, and radio broadcasts from around the world, looking for photos and articles that could yield important intelligence.

After its initial success, OSINT lost its importance until the 21st century, seemingly taking a back seat to other methods. For a while, not even the advent of the internet changed this, although it gave analysts access to a flow of information from all over the world. Nowadays, social media, online publications, and databases provide real-time information that can also be a key element in information analysis. This potential was first demonstrated by the events of the 2009 Green Revolution in Iran.

At that time, in protest against the regime, millions of young Iranians flooded the internet to coordinate their activities, share viral content, and encourage others to join the campaign. It was the first time that the internet was flooded with citizen-generated information about a major political event. This was, of course, made possible by a combination of smartphones, the internet, and social media. For example, 60% of the blog links posted on Twitter during the first week of the protests were about Iranian politics. This kind of increased online presence and activism gave access to a whole new repository of freely available but strategically important information. At the same time, it highlighted the value of such information and the importance of technologies that can process it.

OSINT vs. traditional intelligence

The above suggests that the main difference between open source and classical intelligence is the data sources available. OSINT can only rely on information that has been deliberately made public by its owners. This contrasts sharply with traditional intelligence, which may use, for example, interception (signals intelligence – SIGINT), information from human sources (human intelligence – HUMINT), or even analysis of satellite imagery (imagery intelligence – IMINT). For obvious reasons, this information is often not public and may require special licenses or technologies to access.

This difference is of course reflected not only in the sources of data collection but also in its methods and purposes. OSINT’s methods mostly include web search, social media analysis, data mining, and natural language processing. Traditional intelligence methods are much more diverse. They also include conducting covert operations, physical surveillance, and establishing personal contacts with potential sources of information. These methods are often more time-consuming and costly than those used for OSINT.

It is also important to note that OSINT can be used in a wide range of areas, including national security, law enforcement, corporate intelligence, market research, journalism, and even to answer specific research questions. In contrast, traditional intelligence is mostly focused on government or military purposes.

In terms of costs and resources, OSINT is relatively cost-effective as it does not require expensive technologies or specialized staff to access publicly available data. In contrast, traditional intelligence methods, such as SIGINT and IMINT, require significant resources, including advanced equipment and highly trained operators.

Artificial Intelligence and OSINT

As in many other areas, Artificial Intelligence (AI)-based tools now play a key role in Open-Source Intelligence. These tools have changed OSINT processes, mostly in the way data is collected, analyzed, and synthesized throughout the entire intelligence lifecycle (preparation, collection, processing, analysis, dissemination).

The most important AI technologies used in today’s OSINT solutions come mainly from the fields of natural language processing (NLP), image and video analysis, and automated data collection (web crawling).

NLP applications allow algorithms or machine learning models to interpret human language use. This is particularly useful for analyzing social media and other data-rich sources. For example, AI tools that use NLP can analyze social media posts, blogs, and news articles to identify trends, public opinion, or views on certain topics.
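To make the idea of trend identification more concrete, the following is a minimal sketch in Python. It is not how production OSINT tools work: real NLP pipelines rely on trained language models, whereas this example only counts word frequencies. The function name `top_terms`, the stopword list, and the sample posts are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "is", "are", "in", "on", "of", "to", "and", "at", "for"}


def top_terms(posts, n=3):
    """Return the n most frequent non-stopword terms across all posts."""
    counts = Counter()
    for post in posts:
        # Lowercase the text and keep only alphabetic tokens.
        tokens = re.findall(r"[a-z']+", post.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Invented sample posts, for illustration only.
SAMPLE_POSTS = [
    "Protest at the main square today",
    "Police are closing roads near the square",
    "Another protest planned for the square tomorrow",
]

if __name__ == "__main__":
    # Print the two most frequent terms with their counts.
    print(top_terms(SAMPLE_POSTS, n=2))
```

Even this crude count surfaces the recurring place and event in the sample posts. An actual tool would add steps such as language detection, entity recognition, and sentiment classification on top of it.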

In the field of image analysis, AI can detect and categorize objects, faces, and patterns in the images and videos being processed. This capability is particularly useful in the areas of law enforcement and intelligence, where the analysis of images and videos is key. As an example, image analysis tools using AI can help identify suspicious activities or locations, even supporting investigative work.

Web crawling robots, or bots, can automate web scraping by continuously browsing the content available on the internet. One such tool is Photon Scanner, which is available to anyone and allows the collection, filtering, and automated downloading of web URLs. This data can then be processed for further analysis. OSINT tools that use bots can therefore automate the collection of information about companies, individuals, or specific events, thus speeding up and simplifying the data collection process.
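The core step of any crawler, collecting and filtering the URLs found on a page, can be sketched with Python's standard library alone. This is an illustrative sketch and not the implementation of Photon or any other tool. The names `LinkCollector` and `extract_links`, the sample HTML, and the `example.org` addresses are assumptions made for the example. To stay self-contained, it parses an HTML string rather than downloading a live page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url, same_host_only=False):
    """Return de-duplicated links in page order, optionally one host only."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    links = collector.links
    if same_host_only:
        host = urlparse(base_url).netloc
        links = [u for u in links if urlparse(u).netloc == host]
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(links))


# Invented sample page, for illustration only.
SAMPLE_HTML = """
<a href="/about">About</a>
<a href="story.html">Story</a>
<a href="https://other.example/x">External</a>
<a href="/about">About again</a>
"""

if __name__ == "__main__":
    for url in extract_links(SAMPLE_HTML, "https://example.org/news/"):
        print(url)
```

A full crawler would repeat this step on every collected URL, keep track of pages already visited, and respect each site's robots.txt and rate limits.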

It may seem marginal today, but the proliferation of synthetic content may also lead to the emergence of OSINT tools that can separate artificially generated or manipulated content from the “original”. Since the insights generated by open-source intelligence are only as valuable as the data behind them is “clean”, such tools are expected to gain significantly in importance in the near future.

As can be seen from the above, thanks to the impact of AI, OSINT is no longer just a supplementary source of information, but a vital, stand-alone data collection and analysis method. Among other things, the application of AI significantly extends the scope and depth of information collection. The continued development of AI and the integration of new technologies into OSINT tools are expected to further enhance the role and effectiveness of OSINT in the future.


István ÜVEGES is a researcher in Computer Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the HUN-REN Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.


[1] See more on the topic:  https://www.tandfonline.com/doi/full/10.1080/16161262.2023.2224091