V-CECE: Visual Counterfactual Explanations via Conceptual Edits

Nikolaos Spanos, Maria Lymperaiou, Giorgos Filandrianos, Konstantinos Thomas, Athanasios Voulodimos, Giorgos Stamou
AILS, National Technical University of Athens

Overview

Recent black-box counterfactual generation frameworks fail to take into account the semantic content of the proposed edits, while relying heavily on training to guide the generation process. We propose a novel, plug-and-play black-box counterfactual generation framework, which suggests step-by-step edits based on theoretical guarantees of optimal edits to produce human-level counterfactual explanations with zero training. Our framework utilizes a pre-trained image editing diffusion model and operates without access to the internals of the classifier, leading to an explainable counterfactual generation process. Throughout our experimentation, we showcase the explanatory gap between human reasoning and neural model behavior by utilizing Convolutional Neural Network (CNN), Vision Transformer (ViT), and Large Vision Language Model (LVLM) classifiers, substantiated through a comprehensive human evaluation.

Model Outline

Outline of our model

The outline of our model. Edits are proposed through extraction from the graph, and their order is determined by suggestions from an LVLM, by the importance metric, or by a balanced combination of the two. The edits are performed and evaluated iteratively by passing the edited image through the classifier and reviewing the output. If the classification does not change, we repeat the loop. In this way, we find the minimal edit set required to flip the class label, as sketched below.
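The following is a minimal sketch of this iterative loop. The helper names (edit_recommender, diffusion_editor, classifier) are hypothetical stand-ins for the components described above, not the released API.

    # Minimal sketch of the iterative counterfactual loop, under the
    # assumption that the edit recommender, the pre-trained image-editing
    # diffusion model, and the black-box classifier are provided as callables.

    def generate_counterfactual(image, source_label, edit_recommender,
                                diffusion_editor, classifier, max_steps=10):
        """Apply ranked conceptual edits until the black-box classifier flips."""
        # Rank candidate edits (e.g. by LVLM suggestion, by the importance
        # metric, or by a balanced combination of the two).
        edits = edit_recommender(image, source_label)

        applied = []          # the edit set accumulated so far
        current = image
        for edit in edits[:max_steps]:
            # Perform the edit with the pre-trained image-editing diffusion model.
            current = diffusion_editor(current, edit)
            applied.append(edit)

            # Query the classifier as a black box; no access to its internals.
            prediction = classifier(current)
            if prediction != source_label:
                # The label flipped: return the counterfactual and the edits used.
                return current, applied

        # No flip within the step budget; return the last edited image.
        return current, applied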

Results on Various Models

Results across the evaluated classifiers

Results across the evaluated models. As the results show, CNNs and ViTs tend to require a significant number of steps before the label flips, indicating that, even when the relevant semantic information has been removed from the image, their statistical dependence on unrelated factors such as color and texture introduces an unintended bias into their decisions. LVLMs, on the other hand, appear more semantically consistent, recognizing the change much sooner and thereby producing higher-quality counterfactual images.

Qualitative Results

Qualitative counterfactual results on BDD100K

Here we present results from the counterfactual process on the BDD100K dataset, where the task is to flip the label from "stop" to "go". The edit recommender accurately proposes removing the car, thereby prompting the classifier to recognize the scene as a "go" instance, since no obstacles or indicators require the vehicle to stop. The CNN classifier, however, insists on the "stop" classification; only after multiple edits does the label flip, by which point the image is heavily altered and effectively an adversarial example that fits the target label. The LVLM, in contrast, recognizes the change as early as the human subjects do.

Citation


@misc{spanos2025vcecevisualcounterfactualexplanations,
  title={V-CECE: Visual Counterfactual Explanations via Conceptual Edits},
  author={Nikolaos Spanos and Maria Lymperaiou and Giorgos Filandrianos and Konstantinos Thomas and Athanasios Voulodimos and Giorgos Stamou},
  year={2025},
  eprint={2509.16567},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.16567},
}