A Selective Overview of Deep Learning
Deep learning has arguably achieved tremendous success in recent years. In
simple words, deep learning uses the composition of many nonlinear functions to
model the complex dependency between input features and labels. While neural
networks have a long history, recent advances have greatly improved their
performance in computer vision, natural language processing, etc. From the
statistical and scientific perspective, it is natural to ask: What is deep
learning? What are the new characteristics of deep learning, compared with
classical methods? What are the theoretical foundations of deep learning? To
answer these questions, we introduce common neural network models (e.g.,
convolutional neural nets, recurrent neural nets, generative adversarial nets)
and training techniques (e.g., stochastic gradient descent, dropout, batch
normalization) from a statistical point of view. Along the way, we highlight
new characteristics of deep learning (including depth and over-parametrization)
and explain their practical and theoretical benefits. We also sample recent
results on theories of deep learning, many of which are only suggestive. While
a complete understanding of deep learning remains elusive, we hope that our
perspectives and discussions serve as a stimulus for new statistical research.
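The core idea named in the abstract, modeling complex dependencies by composing many nonlinear functions, can be sketched in a few lines; this toy multilayer perceptron (with arbitrary, hypothetical weights) is an illustration, not a model from the paper.

```python
import math

def layer(x, W, b):
    """One layer: affine map followed by a tanh nonlinearity."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def mlp(x, params):
    """Compose layers: f = f_L o ... o f_1, the composition the abstract refers to."""
    for W, b in params:
        x = layer(x, W, b)
    return x

# Hypothetical two-layer network mapping R^2 -> R^2 -> R^1.
params = [
    ([[0.5, -0.3], [0.2, 0.8]], [0.1, -0.1]),
    ([[1.0, 1.0]], [0.0]),
]
out = mlp([1.0, 2.0], params)
```
Depth here is simply the number of composed layers; each extra layer composes one more nonlinear map.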
Deep Convolutional Neural Network and Sparse Least Squares Migration
We recast the forward pass of a multilayered convolutional neural network
(CNN) as the solution to the problem of sparse least squares migration (LSM).
The CNN filters and feature maps are shown to be analogous, but not equivalent,
to the migration Green's functions and the quasi-reflectivity distribution,
respectively. This provides a physical interpretation of the filters and
feature maps in deep CNN in terms of the operators for seismic imaging.
Motivated by the connection between sparse LSM and CNN, we propose the neural
network version of sparse LSM. Unlike the standard LSM method that finds the
optimal reflectivity image, neural network LSM (NNLSM) finds both the optimal
quasi-reflectivity image and the quasi-migration Green's functions. These
quasi-migration-Green's functions are also denoted as the convolutional filters
in a CNN and are similar to migration Green's functions. The advantage of NNLSM
over standard LSM is that its computational cost is significantly less and it
can be used for denoising coherent and incoherent noise in migration images.
Its disadvantage is that the NNLSM quasi-reflectivity image is only an
approximation to the actual reflectivity distribution. However, the
quasi-reflectivity image can be used as a superresolution attribute image for
high-resolution delineation of geologic bodies.
Comment: 25 pages, 13 figures
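The forward model underlying this connection, a migration image expressed as convolutional filters applied to a sparse quasi-reflectivity map, can be illustrated with a toy 1-D sketch; the filter and reflectivity values below are hypothetical stand-ins, not the paper's operators.

```python
def conv1d_full(f, s):
    """Full 1-D convolution of filter f with sparse map s."""
    out = [0.0] * (len(f) + len(s) - 1)
    for i, fi in enumerate(f):
        for j, sj in enumerate(s):
            out[i + j] += fi * sj
    return out

# Hypothetical quasi-migration Green's function (the "CNN filter") and a
# sparse quasi-reflectivity map; their convolution plays the role of the
# migration image.
filt = [0.2, 1.0, 0.2]
refl = [0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0]
image = conv1d_full(filt, refl)
```
NNLSM, as described, would optimize both `filt` and `refl` to fit the observed image, rather than fixing the filter.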
OrthoSeg: A Deep Multimodal Convolutional Neural Network for Semantic Segmentation of Orthoimagery
This paper addresses the task of semantic segmentation of orthoimagery using
multimodal data, e.g., optical RGB, infrared, and digital surface models. We
propose a deep convolutional neural network architecture, termed OrthoSeg, for
semantic segmentation using multimodal, orthorectified, and coregistered data.
We also propose a training procedure for supervised training of OrthoSeg. The
training procedure complements the inherent architectural characteristics of
OrthoSeg for preventing complex co-adaptations of learned features, which may
arise due to probable high dimensionality and spatial correlation in multimodal
and/or multispectral coregistered data. OrthoSeg consists of parallel encoding
networks for independent encoding of multimodal feature maps and a decoder
designed for efficiently fusing independently encoded multimodal feature maps.
A softmax layer at the end of the network uses the features generated by the
decoder for pixel-wise classification. The decoder fuses feature maps from the
parallel encoders locally as well as contextually at multiple scales to
generate per-pixel feature maps for final pixel-wise classification resulting
in segmented output. We experimentally show the merits of OrthoSeg by
demonstrating state-of-the-art accuracy on the ISPRS Potsdam 2D Semantic
Segmentation dataset. Adaptability is one of the key motivations behind
OrthoSeg so that it serves as a useful architectural option for a wide range of
problems involving the task of semantic segmentation of coregistered multimodal
and/or multispectral imagery. Hence, OrthoSeg is designed to enable independent
scaling of parallel encoder networks and decoder network to better match
application requirements, such as the number of input channels, the effective
field-of-view, and model capacity.
Comment: 8 pages, 9 figures, 3 tables
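The pipeline described above, independent per-modality encoding, fusion of the encoded features, and a softmax over the fused result, can be caricatured for a single pixel; every weight and feature value below is hypothetical, and the real OrthoSeg uses convolutional encoders and a learned decoder.

```python
import math

def encode(x, w):
    """Toy per-modality encoder: elementwise weighting plus ReLU."""
    return [max(0.0, wi * xi) for wi, xi in zip(w, x)]

def fuse(feats):
    """Decoder-style fusion, reduced here to concatenation."""
    return [v for f in feats for v in f]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Hypothetical pixel with two modalities: RGB-derived and DSM-derived features.
rgb_feat = encode([0.9, 0.1], [1.0, 1.0])
dsm_feat = encode([0.4], [2.0])
fused = fuse([rgb_feat, dsm_feat])
# Hypothetical 2-class head over the fused features.
logits = [sum(w * f for w, f in zip(row, fused))
          for row in [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]]
probs = softmax(logits)
```
Keeping the encoders separate, as here, is what lets each modality branch be scaled independently of the others.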
Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks
This paper proposes three simple, compact yet effective representations of
depth sequences, referred to respectively as Dynamic Depth Images (DDI),
Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images
(DDMNI). These dynamic images are constructed from a sequence of depth maps
using bidirectional rank pooling to effectively capture the spatial-temporal
information. Such image-based representations enable us to fine-tune the
existing ConvNets models trained on image data for classification of depth
sequences, without introducing large numbers of parameters to learn. Upon the
proposed representations, a convolutional neural network (ConvNet) based method is
developed for gesture recognition and evaluated on the Large-scale Isolated
Gesture Recognition at the ChaLearn Looking at People (LAP) challenge 2016. The
method achieved 55.57% classification accuracy and, while it did not place
first in this challenge, was very close to the best performance even though we
only used depth data.
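A dynamic image of the kind described can be sketched with approximate rank pooling, a time-weighted sum with weights 2t - T - 1 that is a common stand-in for exact rank pooling (which the paper applies bidirectionally); the toy depth frames below are hypothetical.

```python
def dynamic_image(frames):
    """Approximate rank pooling: weight frame t by alpha_t = 2t - T - 1,
    t = 1..T, and sum. Static pixels cancel; moving pixels survive."""
    T = len(frames)
    out = [0.0] * len(frames[0])
    for t, frame in enumerate(frames, start=1):
        a = 2 * t - T - 1
        for i, v in enumerate(frame):
            out[i] += a * v
    return out

# Three toy depth "frames" (flattened): pixel 0 moves, pixel 1 is static.
frames = [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
di = dynamic_image(frames)
```
The resulting single image summarizes the temporal evolution, which is what makes fine-tuning image-pretrained ConvNets possible.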
RSO: A Gradient Free Sampling Based Approach For Training Deep Neural Networks
We propose RSO (random search optimization), a gradient free Markov Chain
Monte Carlo search based approach for training deep neural networks. To this
end, RSO adds a perturbation to a weight in a deep neural network and tests if
it reduces the loss on a mini-batch. If this reduces the loss, the weight is
updated, otherwise the existing weight is retained. Surprisingly, we find that
repeating this process a few times for each weight is sufficient to train a
deep neural network. The number of weight updates for RSO is an order of
magnitude smaller than for backpropagation with SGD. RSO can make
aggressive weight updates in each step as there is no concept of learning rate.
The weight update step for individual layers is also not coupled with the
magnitude of the loss. RSO is evaluated on classification tasks on MNIST and
CIFAR-10 datasets with deep neural networks of 6 to 10 layers where it achieves
an accuracy of 99.1% and 81.8% respectively. We also find that after updating
the weights just 5 times, the algorithm obtains a classification accuracy of
98% on MNIST.
Comment: Technical Report
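The RSO update rule as described, perturb one weight and keep the change only if the mini-batch loss drops, is easy to sketch; the quadratic loss below is a hypothetical stand-in for a network's mini-batch loss, and the step size and schedule are illustrative, not the paper's.

```python
import random

def rso_train(weights, loss, rounds=5, step=0.5, seed=0):
    """Gradient-free training in the spirit of RSO: perturb one weight at a
    time; accept the perturbation only if it lowers the loss."""
    rng = random.Random(seed)
    w = list(weights)
    for _ in range(rounds):
        for i in range(len(w)):
            delta = rng.choice([-step, step])
            old = loss(w)
            w[i] += delta
            if loss(w) >= old:  # no improvement: revert
                w[i] -= delta
    return w

# Hypothetical loss with minimum at (1, -2).
loss = lambda w: (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2
w = rso_train([0.0, 0.0], loss, rounds=20)
```
Because updates are accepted only when the loss drops, the procedure is monotone, with no learning rate coupling the step to the loss magnitude.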
A Deeply-Recursive Convolutional Network for Crowd Counting
The estimation of crowd count in images has a wide range of applications such
as video surveillance, traffic monitoring, public safety and urban planning.
Recently, the convolutional neural network (CNN) based approaches have been
shown to be more effective in crowd counting than traditional methods that use
handcrafted features. However, existing CNN-based methods still suffer from a
large number of parameters and a large storage footprint, which demand high
storage and computing resources and thus limit real-world application.
Consequently, we propose a deeply-recursive network (DR-ResNet) based on ResNet
blocks for crowd counting. The recursive structure makes the network deeper
while keeping the number of parameters unchanged, which enhances network
capability to capture statistical regularities in the context of the crowd.
Besides, we generate a new dataset from the video-monitoring data of a Beijing
bus station. Experimental results demonstrate that the proposed method
outperforms most state-of-the-art methods with far fewer parameters.
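The key structural idea, reusing the same residual block repeatedly so depth grows while the parameter count stays fixed, can be sketched with a toy shared-weight block (the block itself is hypothetical, not the paper's ResNet block):

```python
def res_block(x, w):
    """Toy residual block with one shared weight: y = x + ReLU(w * x)."""
    return [xi + max(0.0, w * xi) for xi in x]

def recursive_net(x, w, depth):
    """Apply the SAME block `depth` times: deeper network, same parameters."""
    for _ in range(depth):
        x = res_block(x, w)
    return x

feat = recursive_net([1.0, -1.0], w=0.1, depth=4)
```
Changing `depth` here adds no parameters at all; only the effective depth of the computation grows.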
Memorization in Overparameterized Autoencoders
The ability of deep neural networks to generalize well in the
overparameterized regime has become a subject of significant research interest.
We show that overparameterized autoencoders exhibit memorization, a form of
inductive bias that constrains the functions learned through the optimization
process to concentrate around the training examples, although the network could
in principle represent a much larger function class. In particular, we prove
that single-layer fully-connected autoencoders project data onto the
(nonlinear) span of the training examples. In addition, we show that deep
fully-connected autoencoders learn a map that is locally contractive at the
training examples, and hence iterating the autoencoder results in convergence
to the training examples. Finally, we prove that depth is necessary and provide
empirical evidence that it is also sufficient for memorization in convolutional
autoencoders. Understanding this inductive bias may shed light on the
generalization properties of overparametrized deep neural networks that are
currently unexplained by classical statistical theory.
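The contraction argument can be illustrated in one dimension: if the learned map is locally contractive at each training example, iterating it converges to a training example. The map below is a hypothetical analogue, not a trained autoencoder.

```python
def contractive_map(x, examples, rate=0.5):
    """Move x a fraction `rate` of the way from its nearest training
    example, i.e. a map that is contractive around each example."""
    t = min(examples, key=lambda e: abs(e - x))
    return t + rate * (x - t)

# Two hypothetical "training examples"; iterate the map from x = 3.0.
examples = [0.0, 10.0]
x = 3.0
for _ in range(50):
    x = contractive_map(x, examples)
```
Each iteration halves the distance to the nearest example, so the iterates converge to it, mirroring the memorization behavior described.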
Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation
Nowadays, the majority of state-of-the-art monocular depth estimation
techniques are based on supervised deep learning models. However, collecting
RGB images with associated depth maps is a very time consuming procedure.
Therefore, recent works have proposed deep architectures for addressing the
monocular depth prediction task as a reconstruction problem, thus avoiding the
need of collecting ground-truth depth. Following these works, we propose a
novel self-supervised deep model for estimating depth maps. Our framework
exploits two main strategies: refinement via cycle-inconsistency and
distillation. Specifically, a \emph{student} network is first trained to
predict a disparity map so as to recover, from a frame in one camera view, the
associated image in the opposite view. Then, a backward cycle network is
applied to the generated image to re-synthesize back the input image,
estimating the opposite disparity. A third network exploits the inconsistency
between the original and the reconstructed input frame in order to output a
refined depth map. Finally, knowledge distillation is exploited so as to
transfer information from the refinement network to the student. Our extensive
experimental evaluation demonstrates the effectiveness of the proposed
framework, which outperforms state-of-the-art unsupervised methods on the
KITTI benchmark.
Comment: Accepted at CVPR201
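The final distillation step, pulling the student's predictions toward the refinement network's output, can be sketched as gradient descent on the MSE between the two sets of per-pixel disparities; the values below are hypothetical, and this is a schematic of the loss, not the paper's full training procedure.

```python
def distill_step(student_out, teacher_out, lr=0.1):
    """One distillation-style update: move the student's outputs toward the
    (refined) teacher's outputs via the MSE gradient 2 * (s - t)."""
    return [s - lr * 2 * (s - t) for s, t in zip(student_out, teacher_out)]

# Hypothetical per-pixel disparity predictions for student and refiner.
student = [0.2, 0.8, 0.5]
refined = [0.3, 0.7, 0.5]
for _ in range(100):
    student = distill_step(student, refined)
```
Repeated updates shrink the gap geometrically, so the student's outputs converge to the refiner's.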
Fine-grained Optimization of Deep Neural Networks
In recent studies, several asymptotic upper bounds on generalization errors
on deep neural networks (DNNs) are theoretically derived. These bounds are
functions of several norms of weights of the DNNs, such as the Frobenius and
spectral norms, and they are computed for weights grouped according to either
input or output channels of the DNNs. In this work, we conjecture that if we
can impose multiple constraints on weights of DNNs to upper bound the norms of
the weights, and train the DNNs with these weights, then we can attain
empirical generalization errors closer to the derived theoretical bounds, and
improve accuracy of the DNNs.
To this end, we pose two problems. First, we aim to obtain weights whose
different norms are all upper bounded by a constant number, e.g. 1.0. To
achieve these bounds, we propose a two-stage renormalization procedure: (i)
normalization of weights according to different norms used in the bounds, and
(ii) reparameterization of the normalized weights to set a constant and finite
upper bound of their norms. In the second problem, we consider training DNNs
with these renormalized weights. To this end, we first propose a strategy to
construct joint spaces (manifolds) of weights according to different
constraints in DNNs. Next, we propose a fine-grained SGD algorithm (FG-SGD) for
optimization on the weight manifolds to train DNNs with assurance of
convergence to minima. Experimental results show that image classification
accuracy of baseline DNNs can be boosted using FG-SGD on collections of
manifolds identified by multiple constraints.
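Stages (i) and (ii) of the renormalization can be sketched for the Frobenius norm alone; the paper also handles other norms and optimizes on the resulting weight manifolds, so this fragment only shows the norm-bounding idea on a hypothetical weight matrix.

```python
import math

def frobenius(W):
    """Frobenius norm of a weight matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in W for v in row))

def renormalize(W, bound=1.0):
    """(i) normalize by the Frobenius norm, then (ii) rescale so the
    norm equals a fixed constant `bound` (here 1.0)."""
    n = frobenius(W)
    return [[bound * v / n for v in row] for row in W]

W = [[3.0, 0.0], [0.0, 4.0]]   # hypothetical weights, ||W||_F = 5
Wn = renormalize(W)
```
After renormalization the norm is pinned to the constant, which is exactly the finite upper bound the theoretical generalization results assume.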
3D Face Mask Presentation Attack Detection Based on Intrinsic Image Analysis
Face presentation attacks have become a major threat to face recognition
systems and many countermeasures have been proposed in the past decade.
However, most of them are devoted to 2D face presentation attacks, rather than
3D face masks. Unlike the real face, the 3D face mask is usually made of resin
materials and has a smooth surface, resulting in reflectance differences. So,
we propose a novel detection method for 3D face mask presentation attack by
modeling reflectance differences based on intrinsic image analysis. In the
proposed method, the face image is first processed with intrinsic image
decomposition to compute its reflectance image. Then, the intensity
distribution histograms are extracted from three orthogonal planes to represent
the intensity differences of reflectance images between the real face and 3D
face mask. After that, a 1D convolutional network is used to capture
information describing how different materials or surfaces react differently
to changes in illumination. Extensive experiments on the 3DMAD
database demonstrate the effectiveness of our proposed method in distinguishing
a face mask from a real face, and show that its detection performance
outperforms other state-of-the-art methods.
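The first two steps can be caricatured under the classical intrinsic-image model I = R * S: divide out a shading estimate to get reflectance, then histogram the reflectance intensities. The patch and shading values below are hypothetical, and the paper extracts histograms from three orthogonal planes before the 1D CNN.

```python
def reflectance(image, shading, eps=1e-6):
    """Toy intrinsic decomposition under I = R * S: recover R = I / S."""
    return [i / max(s, eps) for i, s in zip(image, shading)]

def intensity_histogram(values, bins=4, lo=0.0, hi=2.0):
    """Histogram of reflectance intensities, a simple material descriptor."""
    h = [0] * bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / (hi - lo) * bins)))
        h[idx] += 1
    return h

# Hypothetical flattened face patch and its shading estimate.
img = [0.5, 0.9, 0.2, 0.8]
shade = [0.5, 1.0, 0.4, 0.8]
refl = reflectance(img, shade)
hist = intensity_histogram(refl)
```
Resin masks and skin would produce differently shaped reflectance histograms, which is the cue the subsequent classifier exploits.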