Skeletal motion plays a vital role in human activity recognition as either an
independent data source or a complement. The robustness of skeleton-based
activity recognizers has been questioned recently, which shows that they are
vulnerable to adversarial attacks when the full-knowledge of the recognizer is
accessible to the attacker. However, this white-box requirement is overly
restrictive in most scenarios and the attack is not truly threatening. In this
paper, we show that such threats do exist under black-box settings too. To this
end, we propose the first black-box adversarial attack method BASAR. Through
BASAR, we show that adversarial attack is not only truly a threat but also can
be extremely deceitful, because on-manifold adversarial samples are rather
common in skeletal motions, in contrast to the common belief that adversarial
samples only exist off-manifold. Through exhaustive evaluation and comparison,
we show that BASAR can deliver successful attacks across models, data, and
attack modes. Through harsh perceptual studies, we show that it achieves
effective yet imperceptible attacks. By analyzing the attack on different
activity recognizers, BASAR helps identify the potential causes of their
vulnerability and provides insights on what classifiers are likely to be more
robust against attack. Code is available at
https://github.com/realcrane/BASAR-Black-box-Attack-on-Skeletal-Action-Recognition.Comment: Accepted in CVPR 202