Deep Anchored Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have proven extremely successful at
solving computer vision tasks. State-of-the-art methods favor deep network
architectures for their accuracy, at the cost of a massive number of
parameters and high weight redundancy. Previous works have studied how to
prune such CNN weights. In this paper, we go to the other extreme and analyze
the performance of a network that stacks a single convolution kernel across
layers, as well as other weight-sharing techniques.
We name it Deep Anchored Convolutional Neural Network (DACNN). Sharing the same
kernel weights across layers reduces the model size tremendously: disregarding
the fully connected layer used for prediction, the network is compressed in
memory by a factor of L, where L is the desired depth of the network. The
number of parameters in DACNN barely increases as the network grows deeper,
which allows us to build deep DACNNs without any concern about memory costs.
We also introduce a partially shared-weights network (DACNN-mix) as
well as an easy-plug-in module, coined regulators, to boost the performance of
our architecture. We validated our idea on 3 datasets: CIFAR-10, CIFAR-100 and
SVHN. Our results show that we can save massive amounts of memory with our
model, while maintaining high accuracy.
Comment: This paper is accepted to the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
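To make the anchoring idea concrete, here is a minimal sketch in PyTorch (illustrative code, not the authors' released implementation; module names such as AnchoredCNN are ours, and per-layer details like batch normalization are omitted). One convolution kernel is registered once and reused at every layer, so the parameter count is independent of the depth L:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnchoredCNN(nn.Module):
    """Sketch of a DACNN-style network: one conv kernel shared across L layers."""
    def __init__(self, channels=64, depth=16, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        # The single "anchored" kernel, shared by every subsequent layer.
        self.shared = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.depth = depth
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = F.relu(self.stem(x))
        for _ in range(self.depth):   # same weights applied at every depth
            x = F.relu(self.shared(x))
        x = x.mean(dim=(2, 3))        # global average pooling
        return self.head(x)

model = AnchoredCNN(depth=32)
print(sum(p.numel() for p in model.parameters()))  # unchanged if depth doubles
```

Doubling `depth` leaves the printed parameter count untouched, which is the memory saving the abstract describes (up to the fully connected head).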
Do Deep Neural Networks Suffer from Crowding?
Crowding is a visual effect suffered by humans, in which an object that can
be recognized in isolation can no longer be recognized when other objects,
called flankers, are placed close to it. In this work, we study the effect of
crowding in artificial Deep Neural Networks for object recognition. We analyze
both standard deep convolutional neural networks (DCNNs) and a new variant of
DCNNs that is (1) multi-scale and (2) has convolution filters whose size
changes depending on the eccentricity with respect to the center of fixation.
Such networks, which we call eccentricity-dependent, are a computational model
of the feedforward path of the primate visual cortex. Our results reveal that
the eccentricity-dependent model, trained on target objects in isolation, can
recognize such targets in the presence of flankers, if the targets are near the
center of the image, whereas DCNNs cannot. Also, for all tested networks, when
trained on targets in isolation, we find that recognition accuracy of the
networks decreases the closer the flankers are to the target and the more
flankers there are. We find that visual similarity between the target and
flankers also plays a role and that pooling in early layers of the network
leads to more crowding. Additionally, we show that incorporating the flankers
into the images of the training set does not improve performance with crowding.
Comment: CBMM memo
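As an illustration of the experimental setup (a hedged sketch; the canvas size, flanker placement, and 28x28 patch size are placeholder choices, not the paper's exact stimuli), one can compose a target with flankers at a controllable spacing, then measure recognition accuracy of a network trained on isolated targets as a function of that spacing and of the flanker count:

```python
import numpy as np

def make_crowded_stimulus(target, flankers, canvas_hw=(128, 128),
                          target_xy=(64, 64), spacing=40):
    """Paste a target patch at target_xy and flankers around it at +/- spacing."""
    canvas = np.zeros(canvas_hw, dtype=np.float32)

    def paste(patch, cx, cy):
        h, w = patch.shape
        y0, x0 = cy - h // 2, cx - w // 2  # assumes the patch fits the canvas
        canvas[y0:y0 + h, x0:x0 + w] = np.maximum(canvas[y0:y0 + h, x0:x0 + w],
                                                  patch)

    cx, cy = target_xy
    paste(target, cx, cy)
    for i, flanker in enumerate(flankers):   # alternate left/right placement
        side = -1 if i % 2 == 0 else 1
        paste(flanker, cx + side * spacing * (i // 2 + 1), cy)
    return canvas

digit = np.ones((28, 28), dtype=np.float32)  # stand-in for a target/flanker
stimulus = make_crowded_stimulus(digit, [digit, digit], spacing=40)
```

Sweeping `spacing` and the number of flankers, then classifying the resulting canvases, yields the accuracy-versus-flanker-distance measurements described above.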
Foveation-based Mechanisms Alleviate Adversarial Examples
We show that adversarial examples, i.e., the visually imperceptible
perturbations that cause Convolutional Neural Networks (CNNs) to fail, can be
alleviated with a mechanism based on foveations: applying the CNN to different
image regions. To see this, we first report results on ImageNet that lead to a
revision of the hypothesis that adversarial perturbations are a consequence of
CNNs acting as a linear classifier: CNNs act locally linearly to changes in the
image regions with objects recognized by the CNN, and in other regions the CNN
may act non-linearly. Then, we corroborate that when the neural responses are
linear, applying the foveation mechanism to the adversarial example tends to
significantly reduce the effect of the perturbation. This is because,
hypothetically, the CNNs for ImageNet are robust to changes of scale and
translation of the object produced by the foveation, but this property does not
generalize to transformations of the perturbation. As a result, the accuracy
after a foveation is almost the same as the accuracy of the CNN without the
adversarial perturbation, even if the adversarial perturbation is calculated
taking a foveation into account.
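A minimal sketch of a foveation step under these assumptions (the classifier interface and the crop box are illustrative; the paper's actual mechanism for selecting regions may differ): the CNN is re-applied to a region around the object, resized back to the input resolution, so the object is only rescaled and translated while the perturbation is transformed along with it:

```python
import torch
import torch.nn.functional as F

def foveated_logits(model, image, box, out_size=224):
    """Classify a crop ("foveation") of the image instead of the full frame.

    image: (3, H, W) float tensor; box: (y0, x0, y1, x1) region around the object.
    """
    y0, x0, y1, x1 = box
    crop = image[:, y0:y1, x0:x1].unsqueeze(0)   # add batch dimension
    crop = F.interpolate(crop, size=(out_size, out_size),
                         mode="bilinear", align_corners=False)
    with torch.no_grad():
        return model(crop)                       # logits on the foveated view
```

Per the hypothesis above, ImageNet CNNs tolerate the scale and translation change of the object inside the crop, while the rescaled adversarial perturbation loses much of its effect.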
Analyzing Vision Transformers for Image Classification in Class Embedding Space
Despite the growing use of transformer models in computer vision, a
mechanistic understanding of these networks is still needed. This work
introduces a method to reverse-engineer Vision Transformers trained to solve
image classification tasks. Inspired by previous research in NLP, we
demonstrate how the inner representations at any level of the hierarchy can be
projected onto the learned class embedding space to uncover how these networks
build categorical representations for their predictions. We use our framework
to show how image tokens develop class-specific representations that depend on
attention mechanisms and contextual information, and give insights on how
self-attention and MLP layers differentially contribute to this categorical
composition. We additionally demonstrate that this method (1) can be used to
determine the parts of an image that would be important for detecting the class
of interest, and (2) exhibits significant advantages over traditional linear
probing approaches. Taken together, our results position our proposed framework
as a powerful tool for mechanistic interpretability and explainability
research.
Comment: NeurIPS 2023
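The core projection can be sketched in a few lines (hedged: variable names are hypothetical, and `final_norm`/`head` stand for the trained model's final LayerNorm and linear classifier, e.g. `model.norm` and `model.head` in typical ViT implementations). Hidden token states captured at any layer, e.g. with a forward hook, are mapped into the learned class embedding space by reusing the trained head:

```python
import torch

def tokens_to_class_space(hidden, final_norm, head):
    """Project intermediate token states onto the class embedding space.

    hidden: (batch, tokens, dim) states from some transformer block.
    Returns (batch, tokens, num_classes) per-token class logits.
    """
    with torch.no_grad():
        return head(final_norm(hidden))
```

Inspecting which image tokens already score high for the true class, layer by layer, is what reveals how class-specific representations build up across the hierarchy and which image parts matter for the predicted class.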
Different Algorithms (Might) Uncover Different Patterns: A Brain-Age Prediction Case Study
Machine learning is a rapidly evolving field with a wide range of
applications, including biological signal analysis, where novel algorithms
often improve the state-of-the-art. However, robustness to algorithmic
variability, i.e., whether different algorithms consistently uncover similar
findings, is seldom explored. In this paper, we investigate whether established
hypotheses in brain-age prediction from EEG research hold across
algorithms. First, we surveyed the literature and identified various features known
to be informative for brain-age prediction. We employed diverse feature
extraction techniques, processing steps, and models, and utilized the
interpretative power of SHapley Additive exPlanations (SHAP) values to align
our findings with the existing research in the field. Few of our models
achieved state-of-the-art performance on the specific dataset we utilized.
Moreover, our analysis demonstrated that while most models do uncover similar
patterns in the EEG signals, some variability could still be observed. Finally,
a few prominent findings could only be validated using specific models. We
conclude by suggesting remedies to the potential implications of this lack of
robustness to model variability.
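A hedged sketch of the cross-algorithm comparison (the features, target, and the two regressors are placeholders, not the study's pipeline): fit different models on the same EEG-derived features and compare their SHAP attributions to see whether they rank the same features as informative:

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                    # stand-in EEG features
age = 5 * X[:, 0] + 2 * X[:, 3] + rng.normal(size=500) + 40

models = {"gbr": GradientBoostingRegressor().fit(X, age),
          "ridge": Ridge().fit(X, age)}

for name, model in models.items():
    explainer = shap.Explainer(model.predict, X)  # model-agnostic explainer
    sv = explainer(X[:100])                       # attributions on a subsample
    ranking = np.argsort(np.abs(sv.values).mean(axis=0))[::-1]
    print(name, ranking[:3])                      # top-3 features per algorithm
```

Agreement between the rankings is the kind of robustness-to-algorithmic-variability check the abstract argues is seldom performed.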