RobCaps: Evaluating the Robustness of Capsule Networks against Affine Transformations and Adversarial Attacks
Capsule Networks (CapsNets) are able to hierarchically preserve the pose
relationships between multiple objects for image classification tasks. Besides
achieving high accuracy, another relevant factor in deploying CapsNets in
safety-critical applications is robustness against input transformations
and malicious adversarial attacks.
In this paper, we systematically analyze and evaluate different factors
affecting the robustness of CapsNets, compared to traditional Convolutional
Neural Networks (CNNs). For a comprehensive comparison, we test two CapsNet
models and two CNN models on the MNIST, GTSRB, and CIFAR10 datasets, as well as
on affine-transformed versions of these datasets. Through a thorough analysis,
we show which properties of these architectures contribute most to
robustness, as well as their limitations. Overall, CapsNets achieve better
robustness against adversarial examples and affine transformations than
a traditional CNN with a similar number of parameters. Similar conclusions were
drawn for deeper versions of CapsNets and CNNs. Moreover, our results
reveal a key finding: dynamic routing does not contribute much to
improving the CapsNets' robustness. Rather, the main contribution to
generalization comes from hierarchical feature learning through capsules.
Comment: To appear at the 2023 International Joint Conference on Neural
Networks (IJCNN), Queensland, Australia, June 202
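Because the key finding concerns dynamic routing, a reference point may help. The following is a minimal pure-Python sketch of routing-by-agreement in the form introduced by Sabour et al. (2017). It assumes that the prediction vectors u_hat have already been computed by the lower capsules. The function names and toy inputs are illustrative and are not taken from the RobCaps paper. Setting num_iterations=1 keeps the coupling coefficients uniform, which is one simple way to approximate a "no routing" ablation.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Capsule non-linearity: v = (|s|^2 / (1 + |s|^2)) * s / |s|."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], num_iterations: int = 3) -> List[Vector]:
    """u_hat[i][j] is lower capsule i's prediction for upper capsule j.

    Returns the output vectors v_j of the upper capsules.
    """
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]  # routing logits start at zero
    v: List[Vector] = []
    for it in range(num_iterations):
        # Coupling coefficients: each lower capsule splits its vote over upper capsules.
        c = [softmax(b_i) for b_i in b]
        # Coupling-weighted sum of predictions for each upper capsule, then squash.
        s = [[sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower)) for d in range(dim)]
             for j in range(n_upper)]
        v = [squash(s_j) for s_j in s]
        if it < num_iterations - 1:
            # Agreement update: raise logits where a prediction aligns with the output.
            for i in range(n_lower):
                for j in range(n_upper):
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


# Toy example: 3 lower capsules, 2 upper capsules, 2-D pose vectors.
# All lower capsules agree on upper capsule 0 and disagree on upper capsule 1.
u_hat_example = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
v_uniform = dynamic_routing(u_hat_example, num_iterations=1)  # no routing refinement
v_routed = dynamic_routing(u_hat_example, num_iterations=3)
```

With a single iteration, every lower capsule splits its vote evenly. The output of upper capsule 0 is then the squashed, evenly weighted sum of its predictions. Additional iterations shift the coupling toward the capsule whose predictions agree, which lengthens its output vector.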
SeVuc: A study on the Security Vulnerabilities of Capsule Networks against adversarial attacks
Capsule Networks (CapsNets) preserve the hierarchical spatial relationships between objects, and thereby have the potential to surpass traditional Convolutional Neural Networks (CNNs) on tasks like image classification. This makes CapsNets suitable for smart cyber-physical systems (CPS), where a large amount of training data may not be available. A large body of work has explored adversarial examples for CNNs, but their effectiveness on CapsNets has not yet been studied systematically. In our work, we analyze the vulnerabilities of CapsNets to adversarial attacks. Adversarial perturbations added to the test inputs are small and imperceptible to humans, but can fool the network into mispredicting. We propose a greedy algorithm to automatically generate imperceptible adversarial examples in a black-box attack scenario. We show that this kind of attack, when applied to the German Traffic Sign Recognition Benchmark (GTSRB) and CIFAR10 datasets, misleads CapsNets into incorrect classifications, which can be catastrophic for smart CPS such as autonomous vehicles. Moreover, we apply the same kind of adversarial attack to a 5-layer CNN (LeNet), a 9-layer CNN (VGGNet), and a 20-layer CNN (ResNet), and compare the outcome with that of the CapsNets to study their different behaviors under adversarial attacks.
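The abstract does not spell out the greedy algorithm. As a hedged illustration of the general idea only, the sketch below implements a generic greedy black-box attack. It queries nothing but the model's output probabilities and tries single-pixel changes of size step. It keeps the change that lowers the true-class probability the most, and it caps each pixel's deviation at epsilon to keep the perturbation small. The toy linear model, the parameter names (step, epsilon, max_changes), and the stopping rule are assumptions. This is not the SeVuc algorithm itself.

```python
import math
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], List[float]]  # black box: pixels -> class probabilities


def predicted_class(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=lambda k: probs[k])


def greedy_black_box_attack(model: Model, image: Sequence[float], true_label: int,
                            step: float = 0.1, epsilon: float = 0.2,
                            max_changes: int = 20) -> List[float]:
    """Greedily apply the single-pixel change that most reduces the true-class probability.

    Only model outputs are used (no gradients). Each pixel stays within
    `epsilon` of its original value and inside [0, 1].
    """
    adv = list(image)
    for _ in range(max_changes):
        probs = model(adv)
        if predicted_class(probs) != true_label:
            break  # attack succeeded
        best_score, best_candidate = probs[true_label], None
        for idx in range(len(adv)):
            for delta in (-step, step):
                new_value = min(1.0, max(0.0, adv[idx] + delta))
                if abs(new_value - image[idx]) > epsilon + 1e-9:
                    continue  # would exceed the per-pixel perturbation budget
                candidate = list(adv)
                candidate[idx] = new_value
                score = model(candidate)[true_label]
                if score < best_score:
                    best_score, best_candidate = score, candidate
        if best_candidate is None:
            break  # no allowed single-pixel change lowers the true-class probability
        adv = best_candidate
    return adv


def toy_model(x: Sequence[float]) -> List[float]:
    """Hypothetical 2-class linear softmax classifier standing in for the victim network."""
    weights = [[1.0, 1.0, -1.0, -1.0], [-1.0, -1.0, 1.0, 1.0]]
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return [e / sum(exps) for e in exps]


clean = [0.6, 0.6, 0.5, 0.5]  # classified as class 0 by toy_model
adversarial = greedy_black_box_attack(toy_model, clean, true_label=0)
```

Each outer iteration costs 2n + 1 model queries for an n-pixel input. This cost is why practical black-box attacks usually restrict the candidate pixels or sample them.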
TextCaps: Handwritten Character Recognition with Very Small Datasets
Many localized languages struggle to reap the benefits of recent advancements
in character recognition systems due to the lack of a substantial amount of
labeled training data. This is due to the difficulty of generating large
amounts of labeled data for such languages and the inability of deep learning
techniques to learn properly from a small number of training samples. We solve
this problem by introducing a technique for generating new training samples from
the existing samples, with realistic augmentations that reflect actual
variations present in human handwriting, by adding random controlled
noise to their corresponding instantiation parameters. Our results with a mere
200 training samples per class surpass existing character recognition results
on the EMNIST-letter dataset while matching the existing results on three
datasets: EMNIST-balanced, EMNIST-digits, and MNIST. We also develop a strategy
to effectively use a combination of loss functions to improve reconstructions.
Our system is useful for character recognition in localized languages that lack
labeled training data, and even in more general related contexts such
as object recognition.
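The augmentation idea of perturbing instantiation parameters with random controlled noise can be sketched as follows. The sketch assumes a CapsNet whose class capsule outputs a vector of instantiation parameters (16-dimensional in the original CapsNet). It also assumes a decoder that would turn each perturbed vector back into an image; that decoder is omitted here. Applying uniform noise to a few randomly chosen dimensions is an assumption, not the exact TextCaps scheme, and the function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def perturb_instantiation(params: Sequence[float], noise_scale: float = 0.05,
                          num_dims: int = 2,
                          rng: Optional[random.Random] = None) -> List[float]:
    """Copy a capsule's instantiation vector and shift `num_dims` randomly chosen
    entries by uniform noise in [-noise_scale, noise_scale] (a controlled perturbation)."""
    rng = rng if rng is not None else random.Random()
    out = list(params)
    for d in rng.sample(range(len(out)), num_dims):
        out[d] += rng.uniform(-noise_scale, noise_scale)
    return out


def augment(params: Sequence[float], n_new: int, noise_scale: float = 0.05,
            num_dims: int = 2, rng: Optional[random.Random] = None) -> List[List[float]]:
    """Generate n_new perturbed instantiation vectors from one real sample.

    In a full pipeline, each vector would be passed through a CapsNet decoder
    (not shown here) to reconstruct a new, slightly varied character image.
    """
    rng = rng if rng is not None else random.Random()
    return [perturb_instantiation(params, noise_scale, num_dims, rng) for _ in range(n_new)]


# Hypothetical 16-D instantiation vector for one training character.
base_params = [0.1 * k for k in range(16)]
new_params = augment(base_params, n_new=5, rng=random.Random(0))
```

Keeping noise_scale small and touching only a few dimensions at a time is meant to produce variations such as a slightly different stroke thickness or slant, rather than a different character.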
An Evasion Attack against Stacked Capsule Autoencoder
A capsule network is a type of neural network that uses the spatial
relationships between features to classify images. By capturing the poses and
relative positions of features, it recognizes affine
transformations better and surpasses traditional convolutional neural
networks (CNNs) when handling translation, rotation, and scaling. The Stacked
Capsule Autoencoder (SCAE) is the state-of-the-art capsule network. The SCAE
encodes an image as capsules, each of which contains the poses of features and
their correlations. The encoded contents are then fed into a downstream
classifier to predict the categories of the images. Existing research mainly
focuses on the security of capsule networks with dynamic routing or EM routing,
and little attention has been given to the security and robustness of the SCAE.
In this paper, we propose an evasion attack against the SCAE. A perturbation
is generated based on the output of the object capsules in the model and is
added to an image to reduce the contribution of the object capsules related to
the image's original category, so that the perturbed image is misclassified.
We evaluate the attack in an image classification experiment, and the results
indicate that the attack achieves high success rates and stealthiness. This
confirms that the SCAE has a security vulnerability: adversarial samples that
fool the classifiers can be crafted without changing the original structure of
the image. We hope that our work will make the community aware of this threat
and draw more attention to the SCAE's security.
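One way to "reduce the contribution of the object capsules related to the original category" is a projected signed-gradient attack on those capsules' presence probabilities. The sketch below uses a toy stand-in in which each object capsule's presence is sigmoid(w_k · x + b_k), so the gradient is analytic. The real SCAE encoder, the assignment of capsules to classes, and the step sizes are all assumptions, and this is not claimed to be the paper's exact procedure.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def capsule_presences(x: Sequence[float], weights: Sequence[Sequence[float]],
                      biases: Sequence[float]) -> List[float]:
    """Toy stand-in for object-capsule presence probabilities: sigmoid(w_k . x + b_k)."""
    return [sigmoid(sum(w * xi for w, xi in zip(w_k, x)) + b_k)
            for w_k, b_k in zip(weights, biases)]


def evasion_perturbation(x: Sequence[float], weights: Sequence[Sequence[float]],
                         biases: Sequence[float], source_caps: Sequence[int],
                         epsilon: float = 0.1, step: float = 0.02,
                         iters: int = 20) -> List[float]:
    """Lower the summed presence of the capsules tied to the source class.

    Uses signed-gradient steps projected onto an L-infinity ball of radius
    `epsilon` around x and onto the valid pixel range [0, 1].
    """
    adv = list(x)
    for _ in range(iters):
        pres = capsule_presences(adv, weights, biases)
        # d/dx sigmoid(w_k . x + b_k) = p_k * (1 - p_k) * w_k, summed over source capsules.
        grad = [0.0] * len(adv)
        for k in source_caps:
            p = pres[k]
            for d in range(len(adv)):
                grad[d] += p * (1.0 - p) * weights[k][d]
        for d in range(len(adv)):
            sign = (grad[d] > 0) - (grad[d] < 0)
            lo, hi = max(0.0, x[d] - epsilon), min(1.0, x[d] + epsilon)
            adv[d] = min(hi, max(lo, adv[d] - step * sign))  # descend, then project
    return adv


# Hypothetical toy setup: 3 object capsules on a 4-pixel "image";
# capsules 0 and 1 are assumed to be the ones associated with the true class.
W = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [-1.0, 0.0, 1.0, 0.0]]
B = [0.0, 0.0, 0.0]
image = [0.5, 0.5, 0.5, 0.5]
perturbed = evasion_perturbation(image, W, B, source_caps=[0, 1])
```

The L-infinity projection is what keeps the perturbation small. It stands in for the "stealthiness" property the abstract reports, but the paper's actual constraint may differ.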
Boosting Deep Neural Networks with Geometrical Prior Knowledge: A Survey
While Deep Neural Networks (DNNs) achieve state-of-the-art results in many
different problem settings, they suffer from some crucial weaknesses. On
the one hand, DNNs depend on vast amounts of training data, whose
labeling process is time-consuming and expensive. On the other hand, DNNs are
often treated as black-box systems, which complicates their evaluation and
validation. Both problems can be mitigated by incorporating prior knowledge
into the DNN.
One promising direction, inspired by the success of convolutional neural networks
(CNNs) in computer vision tasks, is to incorporate knowledge about symmetric
geometrical transformations of the problem to be solved. This promises increased
data efficiency and more easily interpretable filter responses. In
this survey, we try to give a concise overview of different approaches to
incorporating geometrical prior knowledge into DNNs. Additionally, we try to
connect those methods to the field of 3D object detection for autonomous
driving, where we expect promising results from applying them.
Comment: Survey Paper
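The kind of geometric prior knowledge the survey discusses can be made concrete with two small checks. The sketch below is an illustrative example and is not taken from the survey. It shows that circular cross-correlation is equivariant to cyclic translations. It also shows that averaging a globally pooled response over the four 90-degree rotations (the group C4) yields a feature that is invariant to such rotations. The grid size, kernel values, and function names are hypothetical, and wrap-around borders are a simplifying assumption.

```python
from typing import List

Grid = List[List[float]]


def shift(x: Grid, dy: int, dx: int) -> Grid:
    """Cyclically translate a 2-D grid by (dy, dx)."""
    h, w = len(x), len(x[0])
    return [[x[(i - dy) % h][(j - dx) % w] for j in range(w)] for i in range(h)]


def rot90(x: Grid) -> Grid:
    """Rotate a 2-D grid by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*x)][::-1]


def circular_correlate(x: Grid, k: Grid) -> Grid:
    """Cross-correlation (the operation in CNN layers), here with wrap-around borders."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    return [[sum(k[a][b] * x[(i + a) % h][(j + b) % w]
                 for a in range(kh) for b in range(kw))
             for j in range(w)] for i in range(h)]


def global_max_response(x: Grid, k: Grid) -> float:
    """Translation-invariant feature: max-pool the correlation map over all positions."""
    return max(max(row) for row in circular_correlate(x, k))


def c4_invariant_response(x: Grid, k: Grid) -> float:
    """Average the feature over the four 90-degree rotations of the input (group C4)."""
    total, r = 0.0, x
    for _ in range(4):
        total += global_max_response(r, k)
        r = rot90(r)
    return total / 4.0


image = [[float(4 * i + j) for j in range(4)] for i in range(4)]  # hypothetical 4x4 input
kernel = [[1.0, -1.0], [0.5, 0.0]]                                 # hypothetical 2x2 filter
```

Weight sharing gives the convolution its translation equivariance for free. Rotation invariance, by contrast, has to be built in explicitly, here by averaging over the group. This is the simplest instance of the group-based approaches that such surveys cover.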
Hybrid Gromov-Wasserstein Embedding for Capsule Learning
Capsule networks (CapsNets) aim to parse images into a hierarchy of objects,
parts, and their relations using a two-step process involving part-whole
transformation and hierarchical component routing. However, this hierarchical
relationship modeling is computationally expensive, which has limited the wider
use of CapsNets despite their potential advantages. Current CapsNet research
primarily compares performance against capsule baselines, and CapsNets still
fall short of deep CNN variants on intricate tasks. To address this
limitation, we present an efficient approach for learning capsules that
surpasses canonical baseline models and even outperforms high-performing
convolutional models. Our contribution has two aspects. First, we introduce
a group of subcapsules onto which an input vector is projected. Second,
we present the Hybrid Gromov-Wasserstein framework, which first quantifies
the dissimilarity between the input and the components modeled by the
subcapsules, and then determines their degree of alignment through optimal
transport. This mechanism defines the alignment between the input and the
subcapsules based on the similarity of their respective component
distributions. This approach enhances CapsNets' capacity
to learn from intricate, high-dimensional data while retaining their
interpretability and hierarchical structure. Our proposed model offers two
distinct advantages: (i) its lightweight nature facilitates the application of
capsules to more intricate vision tasks, including object detection; (ii) it
outperforms baseline approaches in these demanding tasks.
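The abstract describes alignment through optimal transport without giving the formulation. For orientation only, the sketch below computes an entropic optimal transport plan with Sinkhorn iterations between two discrete distributions, for example the components of an input and the components modeled by a subcapsule. This is plain entropic OT on a cost matrix. It is not the Gromov-Wasserstein or hybrid objective of the paper; Gromov-Wasserstein instead compares the intra-domain distance matrices of the two sides. The squared-distance cost, eps, the iteration count, and the 1-D toy components are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sinkhorn(cost: Matrix, a: Sequence[float], b: Sequence[float],
             eps: float = 0.1, iters: int = 200) -> Tuple[Matrix, float]:
    """Entropic optimal transport between histograms a (rows) and b (columns).

    Returns the transport plan P, whose row sums approximate a and column sums
    approximate b, and the transport cost sum_ij P_ij * C_ij.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternate scalings so the plan matches the row marginals, then the column marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    transport_cost = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, transport_cost


def squared_distance_cost(xs: Sequence[float], ys: Sequence[float]) -> Matrix:
    return [[(x - y) ** 2 for y in ys] for x in xs]


# Hypothetical 1-D "components": those of an input vs. those modeled by two subcapsules.
input_parts = [0.0, 1.0]
uniform = [0.5, 0.5]
plan_close, cost_close = sinkhorn(
    squared_distance_cost(input_parts, [0.05, 0.95]), uniform, uniform)
plan_far, cost_far = sinkhorn(
    squared_distance_cost(input_parts, [0.5, 0.5]), uniform, uniform)
```

A lower transport cost indicates that a subcapsule's component distribution aligns more closely with the input's. Smaller eps gives a sharper, less entropic plan, at the price of slower convergence and potential underflow in the kernel.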
- …