
Making the Invisible Visible: Explainable AI in Everyday Life (Part II.)
As AI systems increasingly influence real-life decisions, interpretability is no longer just a technical concern; it is a societal and legal imperative. In this post, we explore SHAP, a widely used method that reveals how different features contribute to a model’s prediction. But how far does this bring us toward true transparency? We take a closer look at what SHAP can and cannot explain.
SHAP (short for SHapley Additive exPlanations) is a method that brings the concept of Shapley values, originally developed in game theory, into the realm of machine learning. Shapley values were initially used to fairly allocate credit among participants who contributed to a joint outcome. Each participant received a share proportionate to their contribution to the overall result. These values are calculated by considering every possible ordering and combination of participants, offering a precise picture of each individual’s impact on the group’s performance.
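To make the game-theoretic idea concrete, here is a minimal, self-contained sketch that brute-forces Shapley values for an invented three-player game. The payoff numbers are arbitrary and serve only to show how averaging marginal contributions over every possible ordering works.

```python
from itertools import permutations

# A tiny three-player cooperative game: v maps each coalition (as a frozenset)
# to its payoff. The numbers are invented purely for illustration.
v = {
    frozenset(): 0,
    frozenset({"A"}): 10, frozenset({"B"}): 20, frozenset({"C"}): 30,
    frozenset({"A", "B"}): 40, frozenset({"A", "C"}): 50, frozenset({"B", "C"}): 60,
    frozenset({"A", "B", "C"}): 90,
}
players = ["A", "B", "C"]

# The Shapley value of a player is their marginal contribution, averaged over
# every possible ordering in which the players could join the coalition.
shapley = {p: 0.0 for p in players}
orderings = list(permutations(players))
for order in orderings:
    coalition = frozenset()
    for p in order:
        with_p = coalition | {p}
        shapley[p] += (v[with_p] - v[coalition]) / len(orderings)
        coalition = with_p

print(shapley)  # {'A': 20.0, 'B': 30.0, 'C': 40.0} -- the values sum to v of the full coalition (90)
```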
SHAP applies this principle to the features used in a machine learning model, like age, income, or length of employment, and determines how much each feature contributes to a specific prediction. In text classification tasks, the features might be individual words. In such cases, SHAP can highlight, for example, how the presence of a particular word pushes the model’s decision in one direction or the other. If the model is performing a binary readability classification (yes / no), the presence of a complex or technical term might nudge the model toward labeling the text as less comprehensible. SHAP calculates how the model’s prediction would change if a given feature were removed or added and does so by considering all possible combinations. This approach ensures that the resulting explanations are balanced and consistently reflect the model’s behavior. Returning to the previous example, SHAP essentially reveals how much each individual word contributes to the final decision made by the model.
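As a rough illustration of what this looks like in code, the sketch below trains a small model on synthetic data and uses the shap library to extract per-feature contributions for a single prediction. The feature names echo the credit-style example above but are entirely invented, and details such as return shapes can vary between shap versions.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

# Synthetic, illustrative data: the feature names and the target are made up for this sketch.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.integers(18, 70, size=500),
    "income": rng.normal(50_000, 15_000, size=500),
    "years_employed": rng.integers(0, 40, size=500),
})
y = 0.4 * X["income"] / 1_000 + 0.6 * X["years_employed"] + rng.normal(0, 2, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer(X)  # an Explanation: one row of per-feature contributions per prediction

# Contribution of each feature to the first prediction, relative to the baseline
# (the model's average output).
print(dict(zip(X.columns, shap_values.values[0])))
print("baseline:", shap_values.base_values[0], "prediction:", model.predict(X.iloc[[0]])[0])
```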
This type of explanation is known as a local explanation. It sheds light on the reasoning behind a single, specific decision. Local explanations are particularly valuable when we need to justify a model’s output to an individual or an organization in an understandable way: why was a loan application denied, why was a person placed in a different risk category, or why did the AI recommend a particular medical treatment? The same logic applies to tasks like text classification, where the outcome may hinge on the presence of specific words. A technical term that, according to the model, makes a text harder to understand can influence the prediction in much the same way that an income figure might affect a credit evaluation. SHAP helps in this context by showing which features influenced the outcome and to what extent. It presents the prediction relative to a so-called baseline value (typically the model’s average output) and indicates how each feature shifts the prediction above or below this baseline. This reveals the structure of the decision at an individual level. Such a breakdown not only makes the model’s reasoning more interpretable, but can also help detect potential errors, biases, or unexpected patterns in the decision-making process.
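Continuing the hypothetical sketch above, a single prediction can be visualized with shap’s waterfall plot, which starts at the baseline and stacks each feature’s contribution on top of it (this assumes the shap_values object from the previous snippet).

```python
import shap

# A waterfall plot for one prediction: it begins at the baseline (the model's
# expected output) and shows how each feature pushes the final prediction
# above or below that value.
shap.plots.waterfall(shap_values[0])
```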
However, SHAP is not only useful for understanding individual decisions. It can also provide insights at a global level, helping us see which features the model generally considers important. Global SHAP values capture the average impact of each feature across all predictions. This is especially relevant when auditing a model: Are there hidden biases? Are discriminatory patterns present? Does the model’s behavior align with expected professional or logical reasoning? One common approach is to rank features by their average absolute SHAP values. This produces a kind of importance ordering, helping us understand the overall patterns in the model’s behavior.
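Sticking with the same hypothetical example, such a global ranking can be computed by averaging the absolute SHAP values across all predictions; the shap library’s bar plot shows essentially the same thing graphically.

```python
import numpy as np
import shap

# Average absolute SHAP value per feature = a simple global importance ranking.
global_importance = np.abs(shap_values.values).mean(axis=0)
for name, score in sorted(zip(X.columns, global_importance), key=lambda t: -t[1]):
    print(f"{name:>15}: {score:.3f}")

# The built-in bar plot displays the same mean |SHAP| ranking.
shap.plots.bar(shap_values)
```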
One of SHAP’s major strengths is that it is model-agnostic: in theory, it can be applied to any type of machine learning model, whether it’s a decision tree, logistic regression, or a deep neural network. However, it’s important to note that calculating exact Shapley values is computationally intensive: a model with just ten input features already requires evaluating over a thousand feature combinations (2¹⁰ = 1,024 subsets), and the number doubles with every additional feature. In practice, therefore, approximation methods or optimized algorithms are typically used.
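One common workaround, sketched below on the same hypothetical model, is sampling-based approximation: shap’s model-agnostic KernelExplainer estimates Shapley values from a limited number of sampled feature coalitions instead of enumerating all of them.

```python
import shap

# With n features there are 2**n coalitions to evaluate exactly
# (2**10 = 1,024 for ten features), so model-agnostic explainers sample instead.
background = shap.sample(X, 100)                    # small reference sample used as the baseline
explainer = shap.KernelExplainer(model.predict, background)

# nsamples caps how many coalitions are sampled per explanation,
# trading accuracy for speed.
approx_values = explainer.shap_values(X.iloc[:5], nsamples=200)
print(approx_values.shape)                          # (5, number_of_features)
```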
SHAP explanations can also be visualized, which greatly enhances their interpretability. One of the most well-known and informative visualizations is the beeswarm plot, which illustrates how the method works in practice. On the horizontal axis, SHAP values indicate whether a given feature pushed the prediction higher or lower relative to the model’s baseline output. The color of each dot reflects the original value of the feature: red for high values, blue for low ones (e.g., a large or small population). The combination of color and position provides a clear picture of how different values of a feature influenced the model’s predictions. From these patterns, we can infer whether high values tend to have a positive or negative effect: if the red dots (high values) mostly appear on the positive side, the feature tends to increase the prediction; if they cluster on the negative side, the opposite is true. For example, if the model is classifying texts by readability, and one of the features is the presence of technical terms, the beeswarm plot can highlight which words played the most influential role in shaping the model’s decisions.

[Figure: SHAP beeswarm plot for a model trained on the California Housing dataset]
(Source: SHAP documentation)
The figure above serves as an illustration and is based on the predictions of a model trained on the California Housing dataset, where the goal was to estimate property prices from the available features. These include, for example, MedInc (median income in the area), AveOccup (average number of occupants per household), and Longitude and Latitude (geographical location). Other variables shown in the plot are AveRooms, AveBedrms, HouseAge, and Population.
As mentioned earlier, the horizontal axis represents SHAP values, indicating how much and in what direction a given feature influenced a specific price estimate. The color of each point reflects the original value of the feature: blue for low, red for high. For instance, a high median income (red dots in the MedInc row) is associated with positive SHAP values, meaning it increases the predicted housing price. In contrast, a high number of occupants (red dots in the AveOccup row) tends to receive negative SHAP values, thus reducing the estimated price.
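For readers who want to reproduce this kind of plot, the sketch below uses the publicly available California Housing data with a generic gradient-boosted model. Since the figure in the SHAP documentation was produced with a different model, the exact pattern of dots will differ.

```python
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor

# California Housing data: MedInc, HouseAge, AveRooms, AveBedrms, Population,
# AveOccup, Latitude, Longitude; the target is the median house value.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)

# Any reasonably accurate regressor will do for illustration; this is not the
# exact model behind the figure in the SHAP documentation.
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Explain a subsample of predictions and draw the beeswarm plot: one row per
# feature, SHAP value on the x-axis, dot color encoding the feature's raw value.
explainer = shap.TreeExplainer(model)
shap_values = explainer(X.iloc[:1000])   # a subsample keeps the computation fast
shap.plots.beeswarm(shap_values)
```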
It is important, however, to keep in mind the limitations of such explanation methods. They can become less reliable when strong correlations or interactions exist among input features, making it difficult to isolate the individual contribution of each variable. SHAP, for example, shows which features influenced a particular decision and by how much, but it is not suited to answering “what if” questions, such as how the outcome would change if a feature took a different value. Addressing such counterfactual scenarios requires different methods entirely.
In and of itself, no single explainable AI method can fully solve the transparency problem. Still, each contributes to making machine learning decisions more interpretable. As automated decision-making becomes more widespread, the need for approaches that are not only technically sound but also understandable to humans is growing. This is a direction worth pursuing more broadly: developing tools that can bridge the technical depth of machine learning models with the human demand for interpretability.
István ÜVEGES, PhD is a Computational Linguist researcher and developer at MONTANA Knowledge Management Ltd. and a researcher at the HUN-REN Centre for Social Sciences. His main interests include the social impacts of Artificial Intelligence (Machine Learning), the nature of Legal Language (legalese), the Plain Language Movement, and sentiment and emotion analysis.