Recent black-box counterfactual generation frameworks fail to take into account the semantic content of the proposed edits while relying heavily on training to guide the generation process. We propose a novel, plug-and-play black-box counterfactual generation framework that suggests step-by-step edits based on theoretical guarantees of optimal edits, producing human-level counterfactual explanations with zero training. Our framework utilizes a pre-trained image editing diffusion model and operates without access to the internals of the classifier, leading to an explainable counterfactual generation process. Throughout our experimentation, we showcase the explanatory gap between human reasoning and neural model behavior by utilizing Convolutional Neural Network (CNN), Vision Transformer (ViT), and Large Vision Language Model (LVLM) classifiers, substantiated through a comprehensive human evaluation.
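To make the process concrete, below is a minimal sketch of the black-box edit loop described above. The helper callables (propose_edit, apply_edit, classify) and the max_steps budget are hypothetical stand-ins, not the paper's actual API: propose_edit would query the edit recommender, apply_edit a pre-trained image-editing diffusion model, and classify the black-box classifier.

# Minimal sketch of the plug-and-play black-box counterfactual loop.
# propose_edit, apply_edit, and classify are assumed, hypothetical callables.
def generate_counterfactual(image, target_label,
                            propose_edit, apply_edit, classify,
                            max_steps=10):
    """Apply semantic edits one at a time until the black-box
    classifier's prediction flips to the target label."""
    edits = []
    for _ in range(max_steps):
        # Ask the recommender for the next most promising semantic edit
        # (e.g. "remove the car"), given the current image and target.
        edit = propose_edit(image, target_label)
        # Realize the edit with a pre-trained image-editing diffusion
        # model; no classifier gradients or internals are used.
        image = apply_edit(image, edit)
        edits.append(edit)
        # Query the classifier as a pure black box.
        if classify(image) == target_label:
            return image, edits  # label flipped: counterfactual found
    return image, edits  # edit budget exhausted without a flip

The number of loop iterations needed before the flip is the per-model step count reported in the results below.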
Results across the evaluated models. As the results show, CNNs and ViTs tend to require a significant number of steps before the label flips, indicating that, even when the relevant semantic information is absent from the image, a statistical dependence on unrelated factors, such as color and texture, introduces an unintended bias into their decisions. LVLMs, on the other hand, appeared more semantically consistent, recognizing the change much sooner and thus yielding higher-quality counterfactual images.
In this image, we present results from the counterfactual process on the BDD100K dataset, where the task is to flip the label from "stop" to "go". The edit recommender accurately proposes removing the car, thereby pushing the classifier to recognize the instance as "go", since no obstacles or indicators require braking. The CNN classifier, however, insists on the "stop" classification, and only after multiple edits is the image heavily altered, producing an adversarial example that fits the label. The LVLM, in contrast, recognizes the change as soon as the human subjects do.
@misc{spanos2025vcecevisualcounterfactualexplanations,
      title={V-CECE: Visual Counterfactual Explanations via Conceptual Edits},
      author={Nikolaos Spanos and Maria Lymperaiou and Giorgos Filandrianos and Konstantinos Thomas and Athanasios Voulodimos and Giorgos Stamou},
      year={2025},
      eprint={2509.16567},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.16567},
}