Distilling Critical Paths in Convolutional Neural Networks
Neural network compression and acceleration are in wide demand due to resource
constraints on most deployment targets. In this paper, by analyzing filter
activations and gradients and visualizing the filters' functionality in
convolutional neural networks, we show that filters in higher layers learn
extremely task-specific features that are exclusive to only a small subset of
the overall tasks, or even a single class. Based on these findings, we reveal
the critical paths of information flow for different classes. Exploiting their
intrinsic exclusiveness, we propose a critical path distillation method that
can effectively customize convolutional neural networks into small ones with
much smaller model size and less computation.
Comment: Accepted in NIPS'18 CDNNRIA workshop
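To make the idea concrete, here is a minimal sketch of how class-wise critical filters might be identified from activation statistics. The toy model, the mean-activation importance score, and the top-k selection are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: identify class-wise "critical" filters from activation statistics.
# The top-k criterion and mean activation as an importance score are
# illustrative assumptions, not the paper's exact method.
import torch
import torch.nn as nn

def critical_filters(model: nn.Module, layer: nn.Conv2d,
                     class_inputs: torch.Tensor, k: int = 8):
    """Return indices of the k filters in `layer` most activated by
    inputs of a single class (a rough proxy for the critical path)."""
    acts = []
    handle = layer.register_forward_hook(
        lambda m, inp, out: acts.append(out.detach()))
    with torch.no_grad():
        model(class_inputs)
    handle.remove()
    # Mean absolute activation per filter, averaged over batch and space.
    score = acts[0].abs().mean(dim=(0, 2, 3))
    return torch.topk(score, k).indices

# Usage with a toy model and random stand-in data for one class:
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
print(critical_filters(model, model[2], torch.randn(8, 3, 32, 32)))
```

Repeating this per class and chaining the selected filters layer by layer yields one candidate "critical path" per class, which a distillation step could then retain.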
DoPa: A Comprehensive CNN Detection Methodology against Physical Adversarial Attacks
Convolutional Neural Networks (CNNs) have recently been shown to be
considerably vulnerable to adversarial attacks: they can be easily misled by
adversarial perturbations. With more aggressive methods being proposed,
adversarial attacks can also be mounted in the physical world, causing
practical problems for various CNN-powered applications. To secure CNNs,
adversarial attack detection is considered the most critical line of defense.
However, most existing works focus on superficial patterns and merely search
for a particular method to differentiate adversarial inputs from natural
inputs, ignoring the analysis of the CNN's inner vulnerability. They therefore
target only specific physical adversarial attacks and lack the versatility to
handle different attacks. To address this issue, we propose DoPa -- a
comprehensive CNN detection methodology for various physical adversarial
attacks. By interpreting the CNN's vulnerability, we find that non-semantic
adversarial perturbations can trigger significantly abnormal activations in a
CNN and even overwhelm the activations of other, semantic input patterns. We
therefore add a self-verification stage that analyzes the semantics of the
distinguished activation patterns, which improves the CNN recognition process.
We apply this detection methodology to both image and audio CNN recognition
scenarios. Experiments show that DoPa achieves an average detection success
rate of 90% for image attacks and 92% for audio attacks.
Announcement: [The original DoPa draft on arXiv was modified and submitted to
a conference, while this short abstract was submitted only for a presentation
at the KDD 2019 AIoT Workshop.]
Comment: 5 pages, 3 figures
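The abnormal-activation observation suggests a simple self-verification check: compare an input's activation energy against statistics collected on natural data. The sketch below is in the spirit of DoPa rather than its exact algorithm; the layer choice, the per-image activation norm, and the z-score threshold are assumptions.

```python
# Sketch of an activation-magnitude self-verification check: flag an input
# whose layer activations are abnormally large relative to natural-data
# statistics. The z-score threshold of 3.0 is an illustrative assumption.
import torch
import torch.nn as nn

class ActivationVerifier:
    def __init__(self, model, layer, natural_batch, z_thresh=3.0):
        self.model, self.layer, self.z = model, layer, z_thresh
        stats = self._layer_norms(natural_batch)
        self.mu, self.sigma = stats.mean(), stats.std()

    def _layer_norms(self, x):
        acts = []
        h = self.layer.register_forward_hook(
            lambda m, i, o: acts.append(o.detach()))
        with torch.no_grad():
            self.model(x)
        h.remove()
        # One activation-energy scalar per input image.
        return acts[0].flatten(1).norm(dim=1)

    def is_suspicious(self, x):
        z = (self._layer_norms(x) - self.mu) / self.sigma
        return z > self.z  # True where activations look adversarial

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
verifier = ActivationVerifier(model, model[0], torch.randn(64, 3, 32, 32))
print(verifier.is_suspicious(torch.randn(4, 3, 32, 32)))
```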
Functionality-Oriented Convolutional Filter Pruning
The sophisticated structure of Convolutional Neural Networks (CNNs) allows for
outstanding performance, but at the cost of intensive computation. As
significant redundancy is inevitably present in such structures, many works
have proposed pruning convolutional filters to reduce computation cost.
Although extremely effective, most of these works rely only on quantitative
characteristics of the convolutional filters and largely overlook the
qualitative interpretation of each filter's specific functionality. In this
work, we interpret the functionality and redundancy of convolutional filters
from different perspectives, and propose a functionality-oriented filter
pruning method. With extensive experimental results, we show that
convolutional filters can be qualitatively significant regardless of
magnitude, demonstrate significant neural network redundancy due to repetitive
filter functions, and analyze filter functionality defects under an
inappropriate retraining process. Such an interpretable pruning approach not
only offers outstanding computation cost reduction over previous filter
pruning methods, but also makes the filter pruning process itself
interpretable.
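One plausible way to operationalize "repetitive filter functions" is to cluster filters by the similarity of their activation patterns and keep a single representative per cluster. The sketch below uses correlation distance and hierarchical clustering; both choices are assumptions for illustration, not the paper's specific criterion.

```python
# Sketch: prune functionally repetitive filters by clustering their activation
# patterns and keeping one representative per cluster. Correlation as the
# similarity measure and the cluster count are illustrative assumptions.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def keep_representatives(acts: np.ndarray, n_keep: int):
    """acts: (n_filters, n_samples) mean activation of each filter per image.
    Returns indices of filters to keep, one per functional cluster."""
    # Distance = 1 - correlation between filter response patterns.
    corr = np.corrcoef(acts)
    dist = 1.0 - corr[np.triu_indices(len(acts), k=1)]
    labels = fcluster(linkage(dist, method="average"),
                      t=n_keep, criterion="maxclust")
    keep = [np.flatnonzero(labels == c)[0] for c in np.unique(labels)]
    return sorted(keep)

# Toy example: 8 filters, two of them near-duplicates in function.
rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 100))
acts[7] = acts[0] + 0.01 * rng.normal(size=100)  # repetitive filter
print(keep_representatives(acts, n_keep=7))      # drops one of the twins
```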
How convolutional neural networks see the world - A survey of convolutional neural network visualization methods
Convolutional Neural Networks (CNNs) have achieved impressive performance on
many computer vision tasks, such as object detection, image recognition, and
image retrieval. These achievements stem from the CNN's outstanding capability
to learn input features through deep layers of neuron structures and an
iterative training process. However, the learned features are hard to identify
and interpret from a human vision perspective, leaving the CNN's internal
working mechanism poorly understood. To improve CNN interpretability, CNN
visualization is widely utilized as a qualitative analysis method that
translates internal features into visually perceptible patterns, and many CNN
visualization works have been proposed to interpret CNNs from the perspectives
of network structure, operation, and semantic concept. In this paper, we
provide a comprehensive survey of several representative CNN visualization
methods, including Activation Maximization, Network Inversion, Deconvolutional
Neural Networks (DeconvNet), and Network Dissection based visualization. These
methods are presented in terms of motivations, algorithms, and experimental
results. Based on these visualization methods, we also discuss their practical
applications to demonstrate the significance of CNN interpretability in areas
such as network design, optimization, and security enhancement.
Comment: 32 pages, 21 figures. Mathematical Foundations of Computing
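Of the surveyed methods, Activation Maximization is the most compact to demonstrate: gradient ascent on the input finds a pattern that maximally excites a chosen unit. Below is a minimal sketch; the toy model, step size, step count, and L2 regularizer weight are illustrative assumptions.

```python
# Sketch of Activation Maximization: gradient ascent on the input to find a
# pattern that maximally excites one unit of the network.
import torch
import torch.nn as nn

def activation_maximization(model, unit, steps=200, lr=0.1):
    x = torch.randn(1, 3, 32, 32, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = model(x)[0, unit]              # activation of the target unit
        (-score + 1e-3 * x.norm()).backward()  # maximize, with L2 regularizer
        opt.step()
    return x.detach()

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(16, 10))
pattern = activation_maximization(model, unit=3)
print(pattern.shape)  # visually perceptible pattern for unit 3
```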
Demystifying Neural Network Filter Pruning
Conventional filter pruning methods for Convolutional Neural Networks (CNNs),
based on filter magnitude ranking (e.g., the L1 norm), have proven highly
effective at reducing computation load. Although effective, these methods are
rarely analyzed from the perspective of filter functionality. In this work, we
explore filter pruning and retraining through qualitative interpretation of
filter functionality. We find that magnitude-based methods fail to eliminate
filters with repetitive functionality, and that the retraining phase is
actually used to reconstruct the remaining filters so they compensate for the
functionality of wrongly pruned critical filters. With a proposed
functionality-oriented pruning method, we further show that, by precisely
addressing filter functionality redundancy, a CNN can be pruned without
considerable accuracy drop, and the retraining phase becomes unnecessary.
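For reference, the conventional baseline the paper analyzes is easy to state in code: rank filters by the L1 norm of their weights and mark the smallest for pruning. The pruning ratio below is an illustrative assumption.

```python
# Sketch of conventional L1-magnitude filter ranking: rank filters by the
# L1 norm of their weights and mark the smallest for pruning.
import torch
import torch.nn as nn

def l1_prune_mask(conv: nn.Conv2d, ratio: float = 0.5) -> torch.Tensor:
    """Return a boolean mask (True = keep) over output filters of `conv`."""
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # L1 per filter
    n_prune = int(ratio * len(scores))
    threshold = scores.sort().values[n_prune]
    return scores >= threshold  # note: ignores what each filter *does*

conv = nn.Conv2d(3, 16, 3)
mask = l1_prune_mask(conv, ratio=0.25)
print(mask.sum().item(), "of", len(mask), "filters kept")
```

As the abstract argues, such a mask can keep two filters with near-identical functionality while discarding a low-magnitude but functionally unique one.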
The Global Convergence of the Alternating Minimization Algorithm for Deep Neural Network Problems
In recent years, stochastic gradient descent (SGD) and its variants have been
the dominant optimization methods for training deep neural networks. However,
SGD suffers from limitations such as a lack of theoretical guarantees,
vanishing gradients, excessive sensitivity to input, and difficulty with
highly non-smooth constraints and functions. To overcome these drawbacks,
alternating minimization-based methods for deep neural network optimization
have recently attracted fast-increasing attention. As an emerging and open
domain, however, several new challenges need to be addressed, including: 1)
the lack of a global convergence guarantee under mild, practical conditions,
and 2) cubic time complexity in the feature dimension. We therefore propose a
novel Deep Learning Alternating Minimization (DLAM) algorithm to deal with
these two challenges. Our inequality-constrained formulation approximates the
original problem with non-convex equality constraints arbitrarily closely,
enabling our proof of global convergence of the DLAM algorithm under mild,
practical conditions. The time complexity is reduced from cubic, O(d^3), to
quadratic, O(d^2), in the feature dimension d via a dedicated algorithm design
for the subproblems, enhanced by iterative quadratic approximations and
backtracking. Experiments on benchmark datasets demonstrate the effectiveness
of our proposed DLAM algorithm.
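To show the general shape of alternating minimization for network training, here is a toy penalty-based scheme for one hidden layer: the loss is split with an auxiliary variable for the hidden activation, and each block is (approximately) minimized in turn. This is a generic illustration, not the paper's DLAM algorithm; the penalty weight rho and the gradient sub-step are assumptions.

```python
# Toy sketch of alternating minimization with an auxiliary activation variable
# `a`: minimize ||a @ W2 - Y||^2 + rho * ||a - relu(X @ W1)||^2 block by block.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5));  Y = rng.normal(size=(100, 1))
W1 = rng.normal(size=(5, 8));   W2 = rng.normal(size=(8, 1))
a = np.maximum(X @ W1, 0)       # auxiliary variable for hidden activation
rho = 1.0

for it in range(50):
    # Block 1: W2 has a closed-form least-squares update given `a`.
    W2 = np.linalg.lstsq(a, Y, rcond=None)[0]
    # Block 2: `a` minimizes a quadratic (fit Y, stay near relu(X @ W1)).
    h = np.maximum(X @ W1, 0)
    a = np.linalg.solve(W2 @ W2.T + rho * np.eye(8),
                        (Y @ W2.T + rho * h).T).T
    # Block 3: gradient step on W1 toward the penalty term.
    g = X.T @ ((np.maximum(X @ W1, 0) - a) * (X @ W1 > 0))
    W1 -= 0.01 * rho * g

print("fit error:", np.linalg.norm(np.maximum(X @ W1, 0) @ W2 - Y))
```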
Interpreting Adversarial Robustness: A View from Decision Surface in Input Space
One popular hypothesis about neural network generalization is that flat local
minima of the loss surface in parameter space lead to good generalization.
However, we demonstrate that the loss surface in parameter space has no
obvious relationship with generalization, especially under adversarial
settings. By visualizing decision surfaces in both parameter space and input
space, we instead show that the geometry of the decision surface in input
space correlates well with adversarial robustness. We then propose an
adversarial robustness indicator, which can evaluate a neural network's
intrinsic robustness without testing its accuracy under adversarial attacks.
Guided by this indicator, we further propose a robust training method. Without
involving adversarial training, our method can enhance a network's intrinsic
adversarial robustness against various adversarial attacks.
Comment: 15 pages, submitted to ICLR 2019
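A crude way to probe input-space decision-surface geometry, in the spirit of such an indicator, is to sample an input's neighborhood and measure how often the prediction flips. The neighborhood radius, the sign-noise sampling, and the sample count below are illustrative assumptions.

```python
# Sketch of an input-space robustness probe: sample points around an input
# and measure how sharply the model's decision changes nearby.
import torch
import torch.nn as nn

def decision_flatness(model, x, eps=0.1, n=64):
    """Fraction of random eps-perturbations that preserve the prediction;
    closer to 1.0 suggests a flatter decision surface around x."""
    with torch.no_grad():
        base = model(x).argmax(dim=1)
        noise = eps * torch.sign(torch.randn(n, *x.shape[1:]))
        preds = model(x + noise).argmax(dim=1)  # broadcasts over n samples
    return (preds == base).float().mean().item()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.randn(1, 3, 32, 32)
print("flatness:", decision_flatness(model, x))
```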
Task-Adaptive Incremental Learning for Intelligent Edge Devices
Convolutional Neural Networks (CNNs) are used for a wide range of
image-related tasks such as image classification and object detection.
However, a large pre-trained CNN model contains considerable redundancy with
respect to task-specific edge applications. Moreover, a statically pre-trained
model cannot efficiently handle the dynamic data of real-world applications,
where CNN training data and labels are collected incrementally. To tackle
these two challenges, we propose TeAM, a task-adaptive incremental learning
framework for CNNs on intelligent edge devices. Given a pre-trained large
model, TeAM can configure it into a specialized model for a dedicated edge
application, and the specialized model can be quickly fine-tuned with local
data to achieve very high accuracy. Moreover, with our global aggregation and
incremental learning scheme, the specialized CNN models can be collaboratively
aggregated into an enhanced global model as new training data arrives.
Comment: 2 pages
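As a rough illustration of specializing a large pre-trained model for one edge task, the sketch below freezes a shared backbone and quickly fine-tunes a small task-specific head on local data. This is a generic specialization recipe, not TeAM's exact configuration mechanism; the toy model and stand-in data are assumptions.

```python
# Sketch: specialize a pre-trained backbone for a dedicated edge task by
# freezing shared features and fine-tuning a small head on local data.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
for p in backbone.parameters():
    p.requires_grad = False          # reuse pre-trained features as-is

head = nn.Linear(16, 2)              # dedicated 2-class edge task
opt = torch.optim.SGD(head.parameters(), lr=0.1)

# Quick fine-tuning with (stand-in) local data.
x, y = torch.randn(32, 3, 32, 32), torch.randint(0, 2, (32,))
for _ in range(20):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(backbone(x)), y)
    loss.backward()
    opt.step()
print("local fine-tune loss:", loss.item())
```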
ASP: A Fast Adversarial Attack Example Generation Framework based on Adversarial Saliency Prediction
With their excellent accuracy and feasibility, neural networks (NNs) have been
widely applied in novel intelligent applications and systems. However, with
the emergence of adversarial attacks, NN-based systems become extremely
vulnerable: image classification results can be arbitrarily misled by
adversarial examples, crafted images with human-imperceptible pixel-level
perturbations. As this raises a significant system security issue, we conduct
a series of investigations on adversarial attacks in this work. We first
identify an image's pixel-level vulnerability to adversarial attacks based on
adversarial saliency analysis. By comparing the analyzed saliency map with the
adversarial perturbation distribution, we propose a new evaluation scheme to
comprehensively assess adversarial attack precision and efficiency. Then,
building on a novel adversarial saliency prediction method, we propose a fast
adversarial example generation framework, "ASP", which significantly improves
attack efficiency and dramatically reduces computation cost. Compared to
previous methods, experiments show that ASP offers up to a 12x speed-up in
adversarial example generation, a 2x lower perturbation rate, and a high
attack success rate of 87% on both MNIST and CIFAR-10. ASP can also be used to
support data-hungry NN adversarial training: by reducing the attack success
rate by as much as 90%, ASP can quickly and effectively enhance the defense
capability of NN-based systems against adversarial attacks.
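The per-pixel vulnerability analysis underlying such a framework can be illustrated with a gradient-based adversarial saliency map (the paper's contribution is to *predict* such maps cheaply; here we only compute one directly). The model and data below are stand-ins.

```python
# Sketch of a gradient-based adversarial saliency map: the per-pixel
# sensitivity of a target-class logit with respect to the input image.
import torch
import torch.nn as nn

def adversarial_saliency(model, x, target: int):
    """Per-pixel sensitivity of the target-class logit w.r.t. the input."""
    x = x.clone().requires_grad_(True)
    model(x)[0, target].backward()
    return x.grad.abs().sum(dim=1)  # aggregate over color channels

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.randn(1, 3, 32, 32)
sal = adversarial_saliency(model, x, target=7)
# Perturb the most vulnerable pixels first, as a saliency-guided attack would.
top = sal.flatten().topk(10).indices
print("most attack-sensitive pixels:", top.tolist())
```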
Interpreting and Evaluating Neural Network Robustness
Adversarial deception has recently become one of the most considerable threats
to deep neural networks. However, compared to the extensive research on new
designs of adversarial attacks and defenses, neural networks' intrinsic
robustness remains under-investigated. This work aims to qualitatively
interpret the adversarial attack and defense mechanisms through loss
visualization, and to establish a quantitative metric for evaluating a neural
network model's intrinsic robustness. The proposed robustness metric
identifies the upper bound of a model's prediction divergence in a given
domain, and thus indicates whether the model can maintain a stable prediction.
With extensive experiments, our metric demonstrates several advantages over
conventional robustness estimation based on adversarial testing accuracy: (1)
it provides a uniform evaluation for models with different structures and
parameter scales; (2) it outperforms conventional accuracy-based robustness
estimation and provides a more reliable evaluation that is invariant to
different test settings; (3) it can be generated quickly without considerable
testing cost.
Comment: Accepted in IJCAI'19
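A metric of this style can be approximated empirically: sample an L-inf ball around an input and record the worst-case divergence between the perturbed and original output distributions. The random corner sampling, the KL divergence, and the epsilon below are illustrative assumptions, not the paper's exact estimator.

```python
# Sketch: empirically estimate the worst-case prediction divergence inside
# an L-inf ball around an input, as a stability/robustness indicator.
import torch
import torch.nn as nn
import torch.nn.functional as F

def max_divergence(model, x, eps=0.03, n=256):
    with torch.no_grad():
        p = F.log_softmax(model(x), dim=1)
        noise = eps * torch.sign(torch.randn(n, *x.shape[1:]))
        q = F.softmax(model(x + noise), dim=1)
        # KL(q || p) for each sampled neighbor; take the worst case.
        kl = (q * (q.clamp_min(1e-12).log() - p)).sum(dim=1)
    return kl.max().item()  # larger = less stable prediction in the ball

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.randn(1, 3, 32, 32)
print("divergence bound estimate:", max_divergence(model, x))
```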