Convolutional neural networks (CNNs) are highly sensitive to small perturbations of
their input images. These networks are thus prone to malicious attacks that
perturb the inputs to force a misclassification. Such slightly manipulated
images aimed at deceiving the classifier are known as adversarial images. In
this work, we investigate statistical differences between natural images and
adversarial ones. More precisely, we show that employing a proper image
transformation and for a class of adversarial attacks, the distribution of the
leading digit of the pixels in adversarial images deviates from Benford's law.
The stronger the attack, the more distant the resulting distribution is from
approach, which can serve as a basis for alternative adversarial-example
detection methods that neither modify the original CNN classifier nor operate
on the raw high-dimensional pixels as features to defend against attacks.
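
To make the measurement concrete, the sketch below is a minimal illustration (not the paper's exact pipeline) of how one might compute the empirical leading-digit distribution of an array and its distance from Benford's law. The image transformation mentioned above and the specific attack model are omitted, and all function and variable names are placeholders introduced here for illustration.

```python
import numpy as np

def benford_pmf():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digits d = 1..9."""
    d = np.arange(1, 10)
    return np.log10(1.0 + 1.0 / d)

def leading_digit_histogram(values):
    """Empirical distribution of the leading (most significant) digit
    of the positive entries in `values`."""
    v = np.asarray(values, dtype=np.float64).ravel()
    v = v[v > 0]  # the leading digit is undefined for zero entries
    digits = (v // 10.0 ** np.floor(np.log10(v))).astype(int)
    counts = np.bincount(digits, minlength=10)[1:10]
    return counts / counts.sum()

def benford_distance(image):
    """Total-variation distance between the image's leading-digit
    distribution and Benford's law (larger = further from Benford)."""
    return 0.5 * np.abs(leading_digit_histogram(image) - benford_pmf()).sum()

# Toy usage: a stand-in "natural" image versus a randomly perturbed copy.
rng = np.random.default_rng(0)
img = rng.integers(1, 256, size=(224, 224, 3))
adv = np.clip(img + rng.integers(-8, 9, img.shape), 1, 255)
print(benford_distance(img), benford_distance(adv))
```

In the setting described above, such a scalar distance (or another divergence measure) would be computed on a transformed version of the image and thresholded to flag suspected adversarial inputs, without touching the classifier itself.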