5,664 research outputs found
Towards Optimal Structured CNN Pruning via Generative Adversarial Learning
Structured pruning of filters or neurons has received increased focus for
compressing convolutional neural networks. Most existing methods rely on
multi-stage, layer-wise optimizations for iterative pruning and retraining,
which may not be optimal and can be computationally intensive. Moreover,
these methods are designed to prune a specific structure, such as filters or
blocks, without jointly pruning heterogeneous structures. In this
paper, we propose an effective structured pruning approach that jointly prunes
filters as well as other structures in an end-to-end manner. To accomplish
this, we first introduce a soft mask to scale the output of these structures by
defining a new objective function with sparsity regularization that aligns the
output of the masked network with that of the baseline. We then solve the
optimization problem by generative adversarial learning (GAL), which learns a
sparse soft mask in a label-free and an end-to-end manner. By forcing more
scaling factors in the soft mask to zero, the fast iterative
shrinkage-thresholding algorithm (FISTA) can be leveraged to quickly and reliably
remove the corresponding structures. Extensive experiments demonstrate the
effectiveness of GAL on different datasets, including MNIST, CIFAR-10 and
ImageNet ILSVRC 2012. For example, on ImageNet ILSVRC 2012, the pruned
ResNet-50 achieves 10.88\% Top-5 error and results in a factor of 3.7x speedup.
This significantly outperforms state-of-the-art methods.
Comment: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
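The FISTA step mentioned in this abstract is, for an l1-regularized objective, an accelerated proximal-gradient iteration whose proximal operator is elementwise soft-thresholding, which is how scaling factors are driven exactly to zero. A minimal NumPy sketch; the quadratic toy objective below stands in for the paper's mask-alignment loss, and all names are illustrative:

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of lam * ||x||_1: shrinks values toward zero,
    # setting small entries exactly to zero.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def fista_l1(grad_f, L, x0, lam, iters=100):
    # FISTA: accelerated proximal gradient for f(x) + lam * ||x||_1,
    # where grad_f is the gradient of the smooth part f and L its
    # Lipschitz constant.
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_new = soft_threshold(y - grad_f(y) / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Toy stand-in for the mask objective: 0.5 * ||m - b||^2 + lam * ||m||_1,
# whose minimizer is soft_threshold(b, lam); small mask entries go to zero.
b = np.array([2.0, 0.3, -1.5])
mask = fista_l1(lambda y: y - b, 1.0, np.zeros(3), lam=0.5)
```

Mask entries driven to zero mark the structures (filters, branches, blocks) that can be removed.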
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and they
are currently the fundamental tools of many intelligent systems. At the same
time, the computational complexity and resource consumption of these networks
also continue to increase. This will pose a significant challenge to the
deployment of such networks, especially in real-time applications or on
resource-limited devices. Thus, network acceleration has become a hot topic
within the deep learning community. As for hardware implementation of deep
neural networks, a batch of accelerators based on FPGA/ASIC have been proposed
in recent years. In this paper, we provide a comprehensive survey of recent
advances in network acceleration, compression and accelerator design from both
algorithm and hardware points of view. Specifically, we provide a thorough
analysis of each of the following topics: network pruning, low-rank
approximation, network quantization, teacher-student networks, compact network
design and hardware accelerators. Finally, we will introduce and discuss a few
possible future directions.
Comment: 14 pages, 3 figures
Building Fast and Compact Convolutional Neural Networks for Offline Handwritten Chinese Character Recognition
Like other problems in computer vision, offline handwritten Chinese character
recognition (HCCR) has achieved impressive results using convolutional neural
network (CNN)-based methods. However, larger and deeper networks are needed to
deliver state-of-the-art results in this domain. Such networks intuitively
appear to incur high computational cost, and require the storage of a large
number of parameters, which renders them infeasible for deployment in portable
devices. To address this, we propose a Global Supervised Low-rank
Expansion (GSLRE) method and an Adaptive Drop-weight (ADW) technique to tackle
the problems of speed and storage capacity. We design a nine-layer CNN for HCCR
covering 3,755 classes, and devise an algorithm that can reduce the
network's computational cost by a factor of nine and compress the network to 1/18 of
the original size of the baseline model, with only a 0.21% drop in accuracy. In
tests, the proposed algorithm surpassed the best single-network performance
reported thus far in the literature while requiring only 2.3 MB for storage.
Furthermore, when integrated with our effective forward implementation, the
recognition of an offline character image took only 9.7 ms on a CPU. Compared
with the state-of-the-art CNN model for HCCR, our approach is approximately 30
times faster, yet 10 times more cost-efficient.
Comment: 15 pages, 7 figures, 5 tables
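The low-rank expansion idea can be illustrated with a truncated SVD that factors a weight matrix into two thinner factors; this is a generic sketch of the idea, not the paper's exact GSLRE procedure (the rank choice and names are illustrative):

```python
import numpy as np

def low_rank_factorize(W, rank):
    # Truncated SVD: W (m x n) ~= A @ B with A (m x rank), B (rank x n).
    # Replacing one layer with two thin layers cuts parameters from
    # m*n to rank*(m + n) when rank is small.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb singular values into A
    B = Vt[:rank]
    return A, B

# A weight matrix that is exactly rank 3 is recovered losslessly.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))
A, B = low_rank_factorize(W, 3)
```

In practice the rank is chosen below the true rank, trading a small reconstruction error for a larger parameter and FLOP reduction.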
A Deep Journey into Super-resolution: A survey
Deep convolutional network-based super-resolution is a fast-growing field
with numerous practical applications. In this exposition, we extensively
compare 30+ state-of-the-art super-resolution Convolutional Neural Networks
(CNNs) over three classical and three recently introduced challenging datasets
to benchmark single image super-resolution. We introduce a taxonomy for
deep-learning based super-resolution networks that groups existing methods into
nine categories including linear, residual, multi-branch, recursive,
progressive, attention-based and adversarial designs. We also provide
comparisons between the models in terms of network complexity, memory
footprint, model input and output, learning details, the type of network losses
and important architectural differences (e.g., depth, skip-connections,
filters). The extensive evaluation shows consistent and rapid growth in
accuracy over the past few years, along with a corresponding boost
in model complexity and the availability of large-scale datasets. It is also
observed that the pioneering methods identified as the benchmark have been
significantly outperformed by the current contenders. Despite the progress in
recent years, we identify several shortcomings of existing techniques and
provide future research directions towards the solution of these open problems.
Comment: Accepted in ACM Computing Surveys
BitNet: Bit-Regularized Deep Neural Networks
We present a novel optimization strategy for training neural networks which
we call "BitNet". The parameters of neural networks are usually unconstrained
and have a dynamic range dispersed over all real values. Our key idea is to
limit the expressive power of the network by dynamically controlling the range
and set of values that the parameters can take. We formulate this idea using a
novel end-to-end approach that circumvents the discrete parameter space by
optimizing a relaxed continuous and differentiable upper bound of the typical
classification loss function. The approach can be interpreted as a
regularization inspired by the Minimum Description Length (MDL) principle. For
each layer of the network, our approach optimizes real-valued translation and
scaling factors and arbitrary precision integer-valued parameters (weights). We
empirically compare BitNet to an equivalent unregularized model on the MNIST
and CIFAR-10 datasets. We show that BitNet converges faster to a superior
quality solution. Additionally, the resulting model has significant savings in
memory due to the use of integer-valued parameters.
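The per-layer translation/scaling with integer-valued weights can be sketched as below; note that the translation and step size here are fitted from the weight range rather than learned end-to-end as in the paper, which is a simplifying assumption:

```python
import numpy as np

def quantize_layer(w, n_bits):
    # Map real weights to integers in [0, 2^n_bits - 1] using a
    # per-layer translation t and scale s (range-fitted stand-ins for
    # BitNet's learned factors).
    t = w.min()
    s = (w.max() - w.min()) / (2 ** n_bits - 1)
    q = np.round((w - t) / s).astype(np.int64)   # integer-valued parameters
    return q, t, s

def dequantize(q, t, s):
    # Recover approximate real-valued weights for the forward pass.
    return q * s + t

w = np.linspace(-1.0, 1.0, 7)
q, t, s = quantize_layer(w, n_bits=3)
```

Storing `q` as small integers plus two floats per layer is where the memory savings come from; the rounding error is bounded by half a quantization step.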
PruneNet: Channel Pruning via Global Importance
Channel pruning is one of the predominant approaches for accelerating deep
neural networks. Most existing pruning methods either train from scratch with a
sparsity inducing term such as group lasso, or prune redundant channels in a
pretrained network and then fine-tune it. Both strategies have limitations:
the use of group lasso is computationally expensive, difficult to converge, and
often behaves poorly due to regularization bias. The methods that start with a
pretrained network either
prune channels uniformly across the layers or prune channels based on the basic
statistics of the network parameters. These approaches either ignore the fact
that some CNN layers are more redundant than others or fail to adequately
identify the level of redundancy in different layers. In this work, we
investigate a simple yet effective method for pruning channels, based on a
computationally lightweight, data-driven optimization step that
discovers the necessary width per layer. Experiments conducted on ILSVRC-
confirm the effectiveness of our approach. With non-uniform pruning across the
layers on ResNet-, we are able to match the FLOP reduction of
state-of-the-art channel pruning results while achieving a higher
accuracy. Further, we show that our pruned ResNet- network outperforms
ResNet- and ResNet- networks, and that our pruned ResNet-
outperforms ResNet-.
Comment: 12 pages, 3 figures, published in the ICLR 2020 NAS Workshop
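Non-uniform pruning across layers follows naturally from ranking channels globally rather than per layer. A sketch with the importance scores assumed given (the paper derives them from a data-driven optimization step, which is not reproduced here):

```python
import numpy as np

def global_channel_prune(scores_per_layer, keep_frac):
    # Rank all channels jointly by importance and keep the globally
    # top-scoring fraction, yielding a different width per layer.
    # Assumes keep_frac keeps at least one channel overall.
    all_scores = np.concatenate(scores_per_layer)
    k = max(1, int(len(all_scores) * keep_frac))
    threshold = np.sort(all_scores)[::-1][k - 1]
    return [s >= threshold for s in scores_per_layer]

# Two toy layers with two channels each; keep the global top half.
masks = global_channel_prune(
    [np.array([0.9, 0.1]), np.array([0.8, 0.2])], keep_frac=0.5)
```

A layer whose channels all score low can end up much thinner than a layer full of high scorers, which is exactly the non-uniformity uniform per-layer pruning cannot express.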
C2S2: Cost-aware Channel Sparse Selection for Progressive Network Pruning
This paper describes a channel-selection approach for simplifying deep neural
networks. Specifically, we propose a new type of generic network layer, called
pruning layer, to seamlessly augment a given pre-trained model for compression.
Each pruning layer, comprising depth-wise kernels, is represented
with a dual format: one is real-valued and the other is binary. The former
enables a two-phase optimization process of network pruning to operate with an
end-to-end differentiable network, and the latter yields the mask information
for channel selection. Our method progressively performs the pruning task
layer-wise, and achieves channel selection according to a sparsity criterion to
favor pruning more channels. We also develop a cost-aware mechanism to prevent
the compression from sacrificing the expected network performance. Our results
for compressing several benchmark deep networks on image classification and
semantic segmentation are comparable to those of state-of-the-art methods.
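The dual real/binary representation of a pruning layer can be sketched as a per-channel real-valued score whose thresholded binarization gates channels during the forward pass; the class name and threshold below are illustrative, not from the paper:

```python
import numpy as np

class PruningLayer:
    # One real-valued score per channel (optimized end-to-end);
    # its binarization provides the channel-selection mask.
    def __init__(self, channels, tau=0.5):
        self.m = np.ones(channels)   # real-valued format
        self.tau = tau

    def binary_mask(self):
        # Binary format: channel kept iff its score clears the threshold.
        return (self.m > self.tau).astype(np.float64)

    def forward(self, x):
        # x: (batch, channels, ...); zero out deselected channels.
        mask = self.binary_mask().reshape(1, -1, *([1] * (x.ndim - 2)))
        return x * mask

layer = PruningLayer(3)
layer.m = np.array([1.0, 0.2, 0.9])   # channel 1 falls below tau
y = layer.forward(np.ones((1, 3, 2)))
```

Channels whose binary mask is zero can be physically removed after optimization; the real-valued scores are what gradient descent actually updates.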
Convolutional Neural Networks with Transformed Input based on Robust Tensor Network Decomposition
Tensor network decomposition, which originated in quantum physics to model
entangled many-particle quantum systems, has turned out to be a promising
mathematical technique to efficiently represent and process big data in a
parsimonious manner. In this study, we show that tensor networks can
systematically partition structured data, e.g. color images, for distributed
storage and communication in a privacy-preserving manner. Leveraging the sea of
big data and metadata privacy, empirical results show that neighbouring
subtensors with implicit information stored in tensor network formats cannot be
identified for data reconstruction. This technique complements the existing
encryption and randomization techniques, which store an explicit data
representation in one place and are highly susceptible to adversarial attacks such
as side-channel attacks and de-anonymization. Furthermore, we propose a theory
for adversarial examples that mislead convolutional neural networks to
misclassification using subspace analysis based on singular value decomposition
(SVD). The theory is extended to analyze higher-order tensors using
tensor-train SVD (TT-SVD); it helps to explain the level of susceptibility of
different datasets to adversarial attacks, the structural similarity of
different adversarial attacks including global and localized attacks, and the
efficacy of different adversarial defenses based on input transformation. An
efficient and adaptive algorithm based on robust TT-SVD is then developed to
detect strong and static adversarial attacks.
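The TT-SVD this abstract builds on factorizes a higher-order tensor into a chain of 3-way cores via sequential truncated SVDs. A minimal NumPy sketch, without the robustness modifications the paper adds:

```python
import numpy as np

def tt_svd(tensor, max_rank):
    # Tensor-train decomposition: unfold, SVD, truncate, repeat.
    # Returns cores G_k of shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1.
    shape = tensor.shape
    cores, r = [], 1
    mat = tensor.reshape(r * shape[0], -1)
    for k in range(len(shape) - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r_new = min(max_rank, len(s))
        cores.append(U[:, :r_new].reshape(r, shape[k], r_new))
        mat = (s[:r_new, None] * Vt[:r_new]).reshape(r_new * shape[k + 1], -1)
        r = r_new
    cores.append(mat.reshape(r, shape[-1], 1))
    return cores

# With unrestricted ranks the decomposition is exact.
rng = np.random.default_rng(0)
T = rng.standard_normal((2, 3, 4))
cores = tt_svd(T, max_rank=10)
full = cores[0]
for c in cores[1:]:
    full = np.tensordot(full, c, axes=1)   # contract chained rank axes
full = full.reshape(T.shape)
```

Choosing `max_rank` below the exact ranks yields the compressed (and, in the paper's distributed setting, partitioned) representation.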
2PFPCE: Two-Phase Filter Pruning Based on Conditional Entropy
Deep Convolutional Neural Networks~(CNNs) offer remarkable performance in
classification and regression for many high-dimensional problems and have been
widely utilized in real-world cognitive applications. However, the high
computational cost of CNNs greatly hinders their deployment in
resource-constrained applications, real-time systems and edge computing
platforms. To overcome this challenge, we propose a novel filter-pruning
framework, two-phase filter pruning based on conditional entropy, namely
\textit{2PFPCE}, to compress the CNN models and reduce the inference time with
marginal performance degradation. In our proposed method, we formulate the
filter pruning process as an optimization problem and propose a novel filter
selection criterion measured by conditional entropy. Based on the assumption that the
representation of neurons shall be evenly distributed, we also develop a
maximum-entropy filter freeze technique that can reduce overfitting. Two
filter pruning strategies -- global and layer-wise strategies, are compared.
Our experimental results show that combining these two strategies can achieve a
higher neural network compression ratio than applying only one of them under
the same accuracy drop threshold. Two-phase pruning, that is, combining both
global and layer-wise strategies, achieves a 10x FLOPs reduction and a 46%
inference-time reduction on VGG-16, with a 2% accuracy drop.
Comment: 8 pages, 6 figures
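Entropy-based filter scoring can be roughed out by estimating the entropy of each filter's activation distribution from a histogram. This is a simplified, unconditional stand-in for the paper's conditional-entropy criterion, intended only to convey the intuition that low-entropy filters carry little information:

```python
import numpy as np

def filter_entropy(activations, bins=16):
    # Histogram estimate of the entropy (in bits) of one filter's
    # activation distribution; low entropy suggests a less informative
    # filter and thus a candidate for pruning.
    hist, _ = np.histogram(activations, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A constant (dead) filter carries no information; a spread-out one does.
dead = filter_entropy(np.zeros(100))
alive = filter_entropy(np.arange(1024, dtype=float), bins=16)
```

With 1024 evenly spread activations over 16 bins, the estimate reaches the maximum of log2(16) = 4 bits, while the constant filter scores zero.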
Deep Learning-Based Video Coding: A Review and A Case Study
The past decade has witnessed great success of deep learning technology in
many disciplines, especially in computer vision and image processing. However,
deep learning-based video coding remains in its infancy. This paper reviews the
representative works about using deep learning for image/video coding, which
has been an actively developing research area since 2015. We divide
the related works into two categories: new coding schemes that are built
primarily upon deep networks (deep schemes), and deep network-based coding
tools (deep tools) that shall be used within traditional coding schemes or
together with traditional coding tools. For deep schemes, pixel probability
modeling and auto-encoders are the two main approaches, which can be viewed as
predictive coding and transform coding schemes, respectively. For deep
tools, there have been several proposed techniques using deep learning to
perform intra-picture prediction, inter-picture prediction, cross-channel
prediction, probability distribution prediction, transform, post- or in-loop
filtering, down- and up-sampling, as well as encoding optimizations. In the
hope of advocating the research of deep learning-based video coding, we present
a case study of our developed prototype video codec, namely Deep Learning Video
Coding (DLVC). DLVC features two deep tools that are both based on
convolutional neural network (CNN), namely CNN-based in-loop filter (CNN-ILF)
and CNN-based block adaptive resolution coding (CNN-BARC). Both tools help
improve the compression efficiency by a significant margin. With the two deep
tools as well as other non-deep coding tools, DLVC is able to achieve on
average 39.6\% and 33.0\% bit savings over HEVC, under random-access and
low-delay configurations, respectively. The source code of DLVC has been
released for future research.
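The residual formulation typical of CNN-based in-loop filters (the network predicts a correction that is added back to the reconstructed frame, so it only has to learn the coding error) can be sketched as follows, with a single fixed kernel standing in for the trained CNN:

```python
import numpy as np

def filter2d_same(x, k):
    # 'same'-size 2-D cross-correlation on a single-channel frame,
    # via zero padding (stand-in for a convolutional layer).
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def residual_inloop_filter(recon, kernel):
    # Restoration as recon + predicted residual: a zero kernel leaves
    # the frame untouched, which is the safe fallback.
    return recon + filter2d_same(recon, kernel)

rng = np.random.default_rng(0)
recon = rng.random((4, 5))
restored = residual_inloop_filter(recon, np.zeros((3, 3)))
```

The residual design is a common choice in restoration networks because the identity mapping (output equals input) is trivially representable, so the filter can never make a frame worse than doing nothing unless it predicts a nonzero correction.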