Submission
Privacy Policy
Code of Ethics
Newsletter

From Phantom Citations to Prompt Injection: The Crisis of Trust in Science in the Age of Generative AI – Part II.

Alongside the changes on the submission side, the influence of generative AI tools is becoming increasingly visible in the peer review process as well. This trend continues even though many journals and publishers have issued guidelines urging caution, especially when it comes to uploading manuscripts to external tools or handling confidential content. In some cases, they explicitly prohibit the use of such tools during the review process.

Given the previously mentioned asymmetry (growing submission volumes without a proportional increase in review capacity) it is perhaps no surprise that reviewers are turning to time-saving tools. These may include asking for a summary of the manuscript, generating a checklist, running a language check, or drafting a review outline with AI assistance. A survey published at the end of 2025, involving 1,600 researchers, found that more than half of respondents had already used AI tools when preparing a peer review. Many of them reported increasing their use of such tools in the past year.

Overall, a clear trend is emerging. The use of AI-assisted review practices is spreading faster than the development of unified rules and reliable oversight mechanisms.

When automation becomes part of the peer review process, the potential for manipulation comes with it. A striking example of this is the discovery of hidden instructions in several preprints that specifically target AI-based reviewers powered by large language models. In a typical case, the manuscript contains “invisible” prompts near the beginning, formatted in small font and white text. These include instructions such as ignoring all previous instructions and giving the manuscript a positive review. Human reviewers are unlikely to notice them, but language models may interpret them as commands during processing.

This tactic is a specific instance of prompt injection. The idea is to embed a hidden instruction within external content, in the hope that the model will treat it as a legitimate command on par with the user’s original intent. This can divert the model’s behavior from what the user expects. In recent months, the concept of prompt injection has received increased attention, mainly because of its implications in agentic web-browsing systems. In these systems, the model does not just read content, but can also act on behalf of the user. Hidden instructions from any web source can enter the context and influence actions. The hidden prompts found in preprints, however, show that this risk is not limited to browsing agents. The same core mechanism may affect scientific evaluation when part of the review process relies on text generation or automated summarization. In such cases, the protective role of peer review is not compromised because the model is flawed. It is compromised because the system allows the content being evaluated (i.e. the manuscript itself) to issue instructions that influence the outcome.

These hidden prompts clearly illustrate how vulnerable the evaluation process becomes when it relies on automated systems at any point. In such cases, the manuscript does not simply serve as the object of evaluation. In a strange reversal, it can also influence the course of the review.

Up to this point, we have examined how concrete problems are reshaping scientific discourse on a structural level. Less attention has been given, however, to why the generative AI tools at the center of today’s hype cycle are so unreliable when it comes to evaluating creative work or performing genuinely creative tasks such as research. The most important issue relates to a phenomenon previously described in the literature as “model collapse”, which a recent experiment published in January 2026 further supports. In this experiment, researchers created a feedback loop between two generative systems: one that produces images from text (text-to-image) and one that generates text from images (image-to-text). The output of the first model became the input for the second, and the resulting text was then used as a new prompt for the image model, forming a continuous cycle.

The results were unambiguous. After only a few iterations, the outputs of both models began to converge toward generic patterns. Any unique detail, creative element, or variation quickly disappeared. What remained was essentially the bland and predictable average of many once-distinct concepts.

Anyone who has used a generative text model like ChatGPT will likely find this phenomenon familiar. The responses often feel like the average of many possible approaches and ideas. In coding tasks, the output may be correct. In text generation, however, it tends to be overly generic, frequently shallow, and lacking in deeper insight or logical nuance. This is essentially the same phenomenon that the experiment made visible through iteration. But in fact, it is already present from the outset in how today’s large language models behave.

The lesson from this experiment is not that artificial intelligence is “stupid” in the human sense. Rather, it is that despite being trained on an enormous amount of data, these models are only able to preserve certain types of meaning with any consistency. Moreover, this homogenizing effect can emerge during normal use. It does not require the model to retrain itself on its own outputs. The loss of diversity and depth can happen simply through repeated generation and reformulation.

Returning to the original question, the debate is no longer just about whether a few bad papers slip into the scientific record. The real concern is how the combination of incentive structures and high-speed text production is shaping the entire academic ecosystem. In this environment, unintentional blandness and intentional misuse often travel through the same channels. Publishing as a service, and the rise of paper mills, thrive precisely where quantitative metrics dominate and the costs of quality control are not borne by those who profit. A 2022 study examined papers retracted for being produced by paper mills and found that these articles were often cited and integrated into other work even after they had been withdrawn. This highlights a key point raised earlier. The damage caused by low-quality or fraudulent content is not immediate, not always visible, and most importantly, it is very difficult to reverse.

In such cases, the most obvious and convenient response is often to say that shallow or even outright unreliable texts simply need to be detected. The problem is that detection, at least with the technology currently available, rests on uncertain ground. There have already been widely used tools that were eventually discontinued due to their limitations in accuracy. Detection tools also find themselves in direct competition with text generators, creating a dynamic that resembles an arms race. In the case of a false positive—when a detector mistakenly labels a human-written text as AI-generated—the reputation of the author can suffer irreparable harm. In the case of a false negative, the error remains in the system, and its consequences may continue to spread unnoticed.

A more effective defense lies in reinforcing the classic norms of scientific practice. These standards have always existed, but now they need to be applied with far greater consistency. The key principle is verifiability. Citations must be traceable. Access to data and code should be provided wherever reasonably possible. The origin and creation process of figures must be clearly explained. Methodological descriptions should be detailed enough for a competent reader to follow exactly what was done.

This brings the broader picture into focus. Generative tools have lowered the cost of producing smooth and well-formed text. At the same time, publication incentives in many areas still reward quantity over quality. Peer review capacity has not increased in proportion, which introduces the temptation of automation and creates opportunities for manipulation. Meanwhile, generated texts tend to drift toward more conventional and generic ideas. This leads not only to quality issues but also to a gradual dulling of academic discourse and a loss of meaningful intellectual engagement.


István ÜVEGES, PhD is a Computational Linguist researcher and developer at GriffSoft Ltd. and a researcher at the ELTE Centre for Social Sciences. His main interests include the social impacts of Artificial Intelligence (Machine Learning), the nature of Legal Language (legalese), the Plain Language Movement, and sentiment- and emotion analysis.

Constitutional Discourse
Privacy Overview

This website uses cookies so that we can provide you with the best user experience possible. Cookie information is stored in your browser and performs functions such as recognising you when you return to our website and helping our team to understand which sections of the website you find most interesting and useful.