In recent years, privacy preservation in deep learning has become an
urgent problem. Accordingly, we propose combining federated learning
(FL) with encrypted images for privacy-preserving image classification using
the vision transformer (ViT). The proposed method allows us not only to
train models across multiple participants without directly sharing their raw
data but also, for the first time, to protect the privacy of test (query)
images. In addition, it maintains the same accuracy as normally trained
models. In an experiment, the proposed method was demonstrated to work well
without any performance degradation on the CIFAR-10 and CIFAR-100 datasets.