Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks
Over the last decade, Convolutional Neural Network (CNN) models have been
highly successful in solving complex vision problems. However, these deep
models are perceived as "black box" methods considering the lack of
understanding of their internal functioning. There has been a significant
recent interest in developing explainable deep learning models, and this paper
is an effort in this direction. Building on a recently proposed method called
Grad-CAM, we propose a generalized method called Grad-CAM++ that can provide
better visual explanations of CNN model predictions, in terms of better object
localization as well as explaining occurrences of multiple object instances in
a single image, when compared to state-of-the-art. We provide a mathematical
derivation for the proposed method, which uses a weighted combination of the
positive partial derivatives of the last convolutional layer feature maps with
respect to a specific class score as weights to generate a visual explanation
for the corresponding class label. Our extensive experiments and evaluations,
both subjective and objective, on standard datasets showed that Grad-CAM++
provides promising human-interpretable visual explanations for a given CNN
architecture across multiple tasks including classification, image caption
generation and 3D action recognition; as well as in new settings such as
knowledge distillation.

Comment: 17 pages, 15 figures, 11 tables. Accepted in the proceedings of the IEEE Winter Conf. on Applications of Computer Vision (WACV 2018). Extended version is under review at IEEE Transactions on Pattern Analysis and Machine Intelligence.
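The weighting scheme described above can be sketched in numpy. This is a simplified illustration, not the paper's full method: the real Grad-CAM++ derives pixel-wise weights (alpha) from second- and third-order derivatives, whereas this sketch assumes a uniform alpha and only keeps the positive partial derivatives, as the abstract describes.

```python
import numpy as np

def gradcam_pp_map(feature_maps, grads):
    """Simplified Grad-CAM++-style saliency map.

    feature_maps: (K, H, W) activations of the last convolutional layer.
    grads: (K, H, W) gradients of the class score w.r.t. those activations.
    """
    # Keep only the positive partial derivatives, per the abstract.
    pos_grads = np.maximum(grads, 0.0)
    # Pixel-wise weights alpha -- uniform here for simplicity; the paper
    # derives them from higher-order derivatives.
    alpha = 1.0 / (feature_maps.shape[1] * feature_maps.shape[2])
    # Channel importance: weighted combination of positive gradients.
    weights = (alpha * pos_grads).sum(axis=(1, 2))                # (K,)
    # Combine feature maps; ReLU keeps only class-positive evidence.
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0.0)
    # Normalise to [0, 1] for visualisation.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

In practice the feature maps and gradients would come from a forward/backward pass through the CNN (e.g. via framework hooks); here they are assumed to be given.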
Efficient Defenses Against Adversarial Attacks
Following the recent adoption of deep neural networks (DNNs) across a wide
range of applications, adversarial attacks against these models have proven to
be an indisputable threat. Adversarial samples are crafted with a deliberate
intention of undermining a system. In the case of DNNs, the lack of a deeper understanding of their inner workings has hindered the development of effective defenses. In this paper, we propose a new defense method, based on practical observations, that is easy to integrate into models and performs better than
state-of-the-art defenses. Our proposed solution is meant to reinforce the
structure of a DNN, making its prediction more stable and less likely to be
fooled by adversarial samples. We conduct an extensive experimental study demonstrating the effectiveness of our method against multiple attacks and comparing it to
numerous defenses, both in white-box and black-box setups. Additionally, the
implementation of our method brings almost no overhead to the training
procedure, while maintaining the prediction performance of the original model
on clean samples.

Comment: 16 pages.
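The abstract does not detail its defense, but the adversarial samples it refers to are commonly crafted with the Fast Gradient Sign Method (FGSM, Goodfellow et al.), a standard attack shown here for illustration; it is not the attack or defense proposed in this paper.

```python
import numpy as np

def fgsm(x, grad_loss_x, epsilon=0.1):
    """Fast Gradient Sign Method: perturb the input in the direction that
    increases the loss, then clip back to the valid pixel range [0, 1].

    x: clean input; grad_loss_x: gradient of the loss w.r.t. x.
    """
    return np.clip(x + epsilon * np.sign(grad_loss_x), 0.0, 1.0)
```

A small perturbation per pixel is often imperceptible to a human yet enough to flip a model's prediction, which is what makes such samples "crafted with a deliberate intention of undermining a system".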
Data-free parameter pruning for Deep Neural Networks
Deep Neural nets (NNs) with millions of parameters are at the heart of many
state-of-the-art computer vision systems today. However, recent works have
shown that much smaller models can achieve similar levels of performance. In
this work, we address the problem of pruning parameters in a trained NN model.
Instead of removing individual weights one at a time as done in previous works,
we remove one neuron at a time. We show how similar neurons are redundant, and
propose a systematic way to remove them. Our experiments in pruning the densely
connected layers show that we can remove up to 85% of the total parameters in an MNIST-trained network, and about 35% for AlexNet, without significantly
affecting performance. Our method can be applied on top of most networks with a
fully connected layer to give a smaller network.

Comment: BMVC 2015.
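The neuron-removal idea above can be sketched for a single fully connected layer. This is a minimal illustration of the general principle, assuming a plain Euclidean similarity test; the paper's actual criterion and merging rule may differ. Two neurons with nearly identical incoming weights compute nearly the same activation, so one can be dropped once its outgoing weights are folded into the survivor's.

```python
import numpy as np

def prune_similar_neurons(W_in, W_out, threshold=0.1):
    """Data-free neuron pruning sketch for one fully connected layer.

    W_in:  (n_neurons, n_inputs)  incoming weights of the layer.
    W_out: (n_next, n_neurons)    outgoing weights to the next layer.
    No training data is needed: redundancy is read off the weights alone.
    """
    W_out = W_out.copy()
    keep = list(range(W_in.shape[0]))
    i = 0
    while i < len(keep):
        j = i + 1
        while j < len(keep):
            a, b = keep[i], keep[j]
            if np.linalg.norm(W_in[a] - W_in[b]) < threshold:
                # Neurons a and b fire (almost) identically: fold b's
                # outgoing weights into a's, then drop b.
                W_out[:, a] += W_out[:, b]
                keep.pop(j)
            else:
                j += 1
        i += 1
    return W_in[keep], W_out[:, keep]
```

Because the merged neuron's outgoing weights are summed, the next layer's pre-activations are (approximately) preserved, so the smaller network behaves like the original.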
FastSal: a Computationally Efficient Network for Visual Saliency Prediction
This paper focuses on the problem of visual saliency prediction, predicting
regions of an image that tend to attract human visual attention, under a
constrained computational budget. We modify and test various recent efficient
convolutional neural network architectures like EfficientNet and MobileNetV2
and compare them with existing state-of-the-art saliency models such as SalGAN
and DeepGaze II both in terms of standard accuracy metrics like AUC and NSS,
and in terms of the computational complexity and model size. We find that
MobileNetV2 makes an excellent backbone for a visual saliency model and can be
effective even without a complex decoder. We also show that knowledge transfer
from a more computationally expensive model like DeepGaze II can be achieved
via pseudo-labelling an unlabelled dataset, and that this approach gives results on par with many state-of-the-art algorithms at a fraction of the
computational cost and model size. Source code is available at
https://github.com/feiyanhu/FastSal
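The pseudo-labelling transfer described above can be sketched as follows. The function names and the loss are illustrative assumptions, not the FastSal code: the expensive teacher (e.g. DeepGaze II) predicts saliency maps for unlabelled images, and those predictions then serve as training targets for the cheap student, as if they were human annotations.

```python
import numpy as np

def pseudo_label(teacher, unlabelled_images):
    """The expensive teacher predicts a saliency map for each unlabelled
    image; each (image, map) pair becomes a training example."""
    return [(img, teacher(img)) for img in unlabelled_images]

def student_distill_loss(student, pairs):
    """Mean squared error of the student's saliency maps against the
    teacher's pseudo-labels (one common choice of transfer loss)."""
    return float(np.mean([np.mean((student(img) - target) ** 2)
                          for img, target in pairs]))
```

Minimising this loss over the pseudo-labelled set trains the lightweight student to imitate the teacher without any manually labelled data.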
Dynamic Knowledge Distillation with A Single Stream Structure for RGB-D Salient Object Detection
RGB-D salient object detection (SOD) excels at detection in complex environments thanks to the additional depth information in the data. Inevitably, an independent stream is introduced to extract features from depth images, incurring extra computation and parameters. This methodology, which sacrifices model size to improve detection accuracy, may impede the practical application of SOD. To tackle this dilemma,
we propose a dynamic distillation method along with a lightweight framework,
which significantly reduces the parameters. This method considers the factors
of both teacher and student performance within the training stage and
dynamically assigns the distillation weight instead of applying a fixed weight
on the student model. Extensive experiments are conducted on five public
datasets to demonstrate that our method can achieve competitive performance
compared to 10 prior methods through a 78.2MB lightweight structure
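One way such a dynamic weight could work is sketched below. This is a hypothetical scheme inspired by the abstract, not the paper's actual formula: the distillation weight grows when the teacher currently outperforms the student on a batch, and shrinks toward zero when it does not, so the student only imitates signal that is actually useful.

```python
import numpy as np

def dynamic_distill_weight(student_loss, teacher_loss, base=1.0):
    """Hypothetical dynamic distillation weight: a sigmoid of the
    student-teacher loss gap, so the weight lies in (0, base)."""
    gap = student_loss - teacher_loss
    return base / (1.0 + np.exp(-gap))

def total_loss(task_loss, distill_loss, student_loss, teacher_loss):
    """Combine the supervised task loss with a dynamically weighted
    distillation term, instead of a fixed distillation weight."""
    w = dynamic_distill_weight(student_loss, teacher_loss)
    return task_loss + w * distill_loss
```

With a fixed weight, a student that has already surpassed its teacher on some samples is still dragged toward the teacher's outputs; a performance-aware weight avoids that.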