Beyond the Hype: Building AI That Actually Runs (Part I.)
From “Smart Search” to Modern AI Systems
In this two-part series, we use our own professional journey as a case study to trace the evolution of AI applications. We ask a practical question: how did AI become genuinely useful in practice, especially in fields such as LegalTech, where reliability, smooth integration into existing workflows, and consistent performance over time matter as much as technical novelty? Through our experiences, particularly the practical lessons learned during the development of our RAG-based system, DOCUTENT, we show how technology matures from an experimental solution into a stable part of professional infrastructure. This retrospective therefore focuses on the role of engineering rigor and operational discipline in turning technical promise into lasting practical value.
The incorporation of Artificial Intelligence (AI) first took shape in professional practice through a simpler challenge – making information retrievable. Long before today’s generative systems came to dominate public discussion, institutions were already trying to solve the problem of how to find the right document or piece of knowledge when it was needed.

That problem was particularly acute in legal environments. A relevant document, prior filing, or internal memorandum might exist somewhere in the system, yet remain invisible at the moment of decision. The real institutional foundations of AI were laid by systems that addressed exactly this difficulty: they transformed dispersed and unstructured document collections into searchable knowledge. In that sense, the early history of applied AI is inseparable from the history of search.
Although our current solutions have become language-agnostic, our own development path began from a specific and demanding starting point: the Hungarian language. In English-language environments, early retrieval systems could build to some extent on the comparatively simpler morphology of the language. In Hungarian, by contrast, the same conceptual term may appear in many different grammatical forms, and a system that relies only on surface-level matching will miss a substantial share of relevant results. For legal search, this is not a minor inconvenience; it affects whether a document set is professionally trustworthy at all.

Computers do not perceive words as jurists do. They process sequences of characters. Two forms of the same word may therefore appear entirely unrelated to a machine unless the system is equipped with explicit linguistic rules. In agglutinative languages, this difficulty is especially pronounced. A single stem may generate dozens, sometimes hundreds, of forms through suffixes and inflection. For that reason, stemming and lemmatization were preconditions of reliable retrieval. They enabled the system to search for underlying lexical units rather than for exact letter sequences alone. In practical terms, a legal researcher will not miss a relevant document merely because the searched concept appeared in a different grammatical form.
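The effect of stemming can be illustrated with a deliberately small sketch. The suffix list, the minimum stem length, and the function names below are assumptions made for illustration only; they do not describe the morphology component of any real system, and actual Hungarian analysis requires far richer linguistic rules.

```python
# Toy suffix list for illustration only; real Hungarian morphology is far richer.
SUFFIXES = ("ben", "ban", "ből", "ból", "nek", "nak", "hez", "hoz", "ek", "ok", "ak", "t")
MIN_STEM_LENGTH = 4  # assumed guard against stripping very short words


def toy_stem(word: str) -> str:
    """Repeatedly strip the longest matching suffix from a lowercased word."""
    stem = word.lower()
    stripped = True
    while stripped:
        stripped = False
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LENGTH:
                stem = stem[: -len(suffix)]
                stripped = True
                break
    return stem


def matches(query: str, text: str) -> bool:
    """Return True if any token of the text shares a stem with the query."""
    query_stem = toy_stem(query)
    return any(toy_stem(token) == query_stem for token in text.split())


sentence = "A szerződésekben foglalt feltételek"
print("szerződést" in sentence.split())  # exact token match fails
print(matches("szerződést", sentence))   # stem-based match succeeds
```

In this sketch the query "szerződést" and the token "szerződésekben" both reduce to "szerződés" (contract), so the stem-based comparison finds the sentence where an exact token comparison does not.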
Comparable developments took place in well-known international legal information systems, including Westlaw and LexisNexis. Their developers also recognized that professional users needed not literal text matching but retrieval that reflected legal meaning and relevance. The underlying institutional premise was the same across jurisdictions: the core problem was making information accessible in a practical, easy-to-search form.
The task of early systems was therefore to consolidate documents scattered across network drives and internal repositories into a common knowledge base. At this stage, the goal was not to generate answers or new text. The technology was designed to build a reliable digital index, that is, a structured, machine-readable map of documents and their key attributes. That required the systematic capture of metadata, including the author, the date of creation, the subject matter, and how a document relates to other materials. Search became dependable once the system could work with those structured signals rather than with raw text alone. Our own development path closely followed this logic.
Consider a simple legal search problem where the user is looking not only for a topic but also for every document that refers to a particular case number, statutory provision, judge, or authority. Full-text indexing alone is often insufficient for that task. What made the next stage of development significant was the system’s growing ability to detect such legally salient elements within unstructured prose.
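A minimal sketch of such an index, assuming a simple in-memory design, might look as follows. The `Document` and `DocumentIndex` names and their fields are hypothetical and chosen only to mirror the metadata listed above; a production index would rely on a dedicated search engine rather than Python dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Document:
    doc_id: str
    text: str
    author: str
    created: date
    subject: str
    related_ids: list[str] = field(default_factory=list)  # links to other materials


class DocumentIndex:
    """Inverted index over document text, with metadata kept alongside."""

    def __init__(self) -> None:
        self._docs: dict[str, Document] = {}
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc: Document) -> None:
        self._docs[doc.doc_id] = doc
        for token in doc.text.lower().split():
            cleaned = token.strip(".,;:()")
            if cleaned:
                self._postings[cleaned].add(doc.doc_id)

    def search(self, term: str, *, author: Optional[str] = None,
               subject: Optional[str] = None) -> list[Document]:
        """Look up a term, then narrow the hits using structured metadata."""
        hits = self._postings.get(term.lower(), set())
        results = [self._docs[doc_id] for doc_id in sorted(hits)]
        if author is not None:
            results = [d for d in results if d.author == author]
        if subject is not None:
            results = [d for d in results if d.subject == subject]
        return results
```

The design choice worth noting is that the text lookup and the metadata filter are separate steps: the same term query can be narrowed by author or subject without touching the raw text again.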

This is where methods such as Named Entity Recognition (NER) became particularly important. NER enabled the automatic detection of proper names, dates, case numbers, statutory references, and other distinctive markers embedded in the text. The point, again, was not generation, but knowledge extraction and metadata enrichment. Once those elements could be identified systematically, the system could do more than retrieve isolated documents: it could trace references between them, connect related materials, and support more sophisticated forms of search. In a legal context, that is already a substantial gain in professional usefulness.
As these systems came to serve collections containing millions, and in some cases hundreds of millions, of documents, the technology ceased to be an experimental tool and became infrastructure. At that scale, the question is no longer simply whether the algorithm works but whether the system remains available, responsive, and stable under sustained institutional use. In a court, government agency, or regulatory body, a search system that works impressively most of the time is not enough. It has to work reliably every time.
That is also the point at which the distinction between a demo and a real product becomes unmistakable. A prototype may work well with a limited corpus (that is, a collection of documents the system is trained on or tested against) and carefully prepared queries. A production system must perform for large numbers of users, across uneven data, every day. Many promising AI projects fail precisely at this stage. Why? The underlying idea is sound, but the surrounding engineering is not sufficient for sustained operation. Productization is the work of closing that gap between the idea and the operating product.
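The idea of extraction and cross-referencing can be sketched with a rule-based stand-in. The code below uses regular expressions rather than a trained NER model, and the patterns for case numbers, statutory references, and dates are hypothetical formats chosen for illustration; real citation formats vary by jurisdiction.

```python
import re
from collections import defaultdict

# Hypothetical, simplified patterns; not the citation formats of any real court.
PATTERNS = {
    "CASE_NUMBER": re.compile(r"\b\d{1,4}/\d{4}\b"),
    "STATUTE": re.compile(r"\bSection \d+(?:\(\d+\))?"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract_entities(text: str) -> list[tuple[str, str, int]]:
    """Return (label, matched text, start offset) tuples in order of appearance."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((label, match.group(), match.start()))
    return sorted(entities, key=lambda entity: entity[2])


def build_reference_map(docs: dict[str, str]) -> dict[tuple[str, str], list[str]]:
    """Map each (label, value) pair to the sorted IDs of documents mentioning it."""
    references = defaultdict(set)
    for doc_id, text in docs.items():
        for label, value, _ in extract_entities(text):
            references[(label, value)].add(doc_id)
    return {key: sorted(ids) for key, ids in references.items()}


docs = {
    "memo-1": "See case 45/2019 and Section 12(3).",
    "brief-2": "Case 45/2019 was decided on 2019-06-14.",
}
print(build_reference_map(docs)[("CASE_NUMBER", "45/2019")])
```

Once entities are stored as structured keys, two documents that mention the same case number become linked even if they share no other wording, which is the cross-referencing gain described above.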
One of the most important lessons of this development process is that AI must be treated as part of the broader IT environment. That requires standardized interfaces, such as OpenAPI-defined services where appropriate, deployment methods that make systems easier to run consistently across environments, and observability in live operation. These may appear to be secondary engineering concerns, but in practice they determine whether a system can be integrated into existing institutional workflows, maintained over time, and governed responsibly. Containerization (i.e., running the system in a standardized packaged environment) helps make dependencies and runtime conditions more consistent across environments, while observability makes performance, failures, and costs visible in time for corrective action. The trajectory may begin with “smart search,” but durable success depends on integration, reproducibility, observability, and engineering discipline.
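As a rough illustration of what observability means at the code level, the following sketch records call counts, failures, and cumulative latency for a function. The `observed` decorator, the `METRICS` dictionary, and the `search` function are hypothetical names introduced here; a live system would typically export such measurements to a dedicated monitoring stack rather than keep them in memory.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory metrics store, assumed for illustration only.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def observed(name: str):
    """Decorator that records calls, errors, and elapsed time under a given name."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                METRICS[name]["errors"] += 1
                raise  # the failure is recorded, not hidden
            finally:
                METRICS[name]["calls"] += 1
                METRICS[name]["total_seconds"] += time.perf_counter() - started
        return wrapper
    return decorator


@observed("search")
def search(query: str) -> list[str]:
    """Placeholder search function standing in for a real retrieval call."""
    if not query:
        raise ValueError("empty query")
    return [f"result for {query}"]
```

Even this small amount of instrumentation makes it possible to ask how often a service is called, how often it fails, and how long it takes on average, which are the questions that matter once a system is in sustained use.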
Daniel Nagy is Interim Director of AI Enablement and Head of Docutent Division at GriffSoft Zrt. Previously, he led software development at MONTANA Knowledge Management. His main interests include AI-powered knowledge management, natural language processing, semantic search, and the development of production-ready AI products and document intelligence systems.