A Selective Overview of Deep Learning
Deep learning has arguably achieved tremendous success in recent years. In
simple words, deep learning uses the composition of many nonlinear functions to
model the complex dependency between input features and labels. While neural
networks have a long history, recent advances have greatly improved their
performance in computer vision, natural language processing, etc. From the
statistical and scientific perspective, it is natural to ask: What is deep
learning? What are the new characteristics of deep learning, compared with
classical methods? What are the theoretical foundations of deep learning? To
answer these questions, we introduce common neural network models (e.g.,
convolutional neural nets, recurrent neural nets, generative adversarial nets)
and training techniques (e.g., stochastic gradient descent, dropout, batch
normalization) from a statistical point of view. Along the way, we highlight
new characteristics of deep learning (including depth and over-parametrization)
and explain their practical and theoretical benefits. We also sample recent
results on theories of deep learning, many of which are only suggestive. While
a complete understanding of deep learning remains elusive, we hope that our
perspectives and discussions serve as a stimulus for new statistical research.
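The core idea named in the abstract, modeling complex dependencies by composing many nonlinear functions, can be sketched in a few lines; this toy multilayer perceptron (with arbitrary, hypothetical weights) is an illustration, not a model from the paper.

```python
import math

def layer(x, W, b):
    """One layer: affine map followed by a tanh nonlinearity."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def mlp(x, params):
    """Compose layers: f = f_L o ... o f_1, the composition the abstract refers to."""
    for W, b in params:
        x = layer(x, W, b)
    return x

# Hypothetical two-layer network mapping R^2 -> R^2 -> R^1.
params = [
    ([[0.5, -0.3], [0.2, 0.8]], [0.1, -0.1]),
    ([[1.0, 1.0]], [0.0]),
]
out = mlp([1.0, 2.0], params)
```
Depth here is simply the number of composed layers; each extra layer composes one more nonlinear map.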
Deep Convolutional Neural Network and Sparse Least Squares Migration
We recast the forward pass of a multilayered convolutional neural network
(CNN) as the solution to the problem of sparse least squares migration (LSM).
The CNN filters and feature maps are shown to be analogous, but not equivalent,
to the migration Green's functions and the quasi-reflectivity distribution,
respectively. This provides a physical interpretation of the filters and
feature maps in deep CNN in terms of the operators for seismic imaging.
Motivated by the connection between sparse LSM and CNN, we propose the neural
network version of sparse LSM. Unlike the standard LSM method that finds the
optimal reflectivity image, neural network LSM (NNLSM) finds both the optimal
quasi-reflectivity image and the quasi-migration Green's functions. These
quasi-migration-Green's functions are also denoted as the convolutional filters
in a CNN and are similar to migration Green's functions. The advantage of NNLSM
over standard LSM is that its computational cost is significantly less and it
can be used for denoising coherent and incoherent noise in migration images.
Its disadvantage is that the NNLSM quasi-reflectivity image is only an
approximation to the actual reflectivity distribution. However, the
quasi-reflectivity image can be used as a superresolution attribute image for
high-resolution delineation of geologic bodies.
Comment: 25 pages, 13 figures
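The forward model underlying this connection, a migration image expressed as convolutional filters applied to a sparse quasi-reflectivity map, can be illustrated with a toy 1-D sketch; the filter and reflectivity values below are hypothetical stand-ins, not the paper's operators.

```python
def conv1d_full(f, s):
    """Full 1-D convolution of filter f with sparse map s."""
    out = [0.0] * (len(f) + len(s) - 1)
    for i, fi in enumerate(f):
        for j, sj in enumerate(s):
            out[i + j] += fi * sj
    return out

# Hypothetical quasi-migration Green's function (the "CNN filter") and a
# sparse quasi-reflectivity map; their convolution plays the role of the
# migration image.
filt = [0.2, 1.0, 0.2]
refl = [0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0]
image = conv1d_full(filt, refl)
```
NNLSM, as described, would optimize both `filt` and `refl` to fit the observed image, rather than fixing the filter.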
OrthoSeg: A Deep Multimodal Convolutional Neural Network for Semantic Segmentation of Orthoimagery
This paper addresses the task of semantic segmentation of orthoimagery using
multimodal data, e.g., optical RGB, infrared, and digital surface models. We
propose a deep convolutional neural network architecture, termed OrthoSeg, for
semantic segmentation using multimodal, orthorectified, and coregistered data.
We also propose a training procedure for supervised training of OrthoSeg. The
training procedure complements the inherent architectural characteristics of
OrthoSeg for preventing complex co-adaptations of learned features, which may
arise due to probable high dimensionality and spatial correlation in multimodal
and/or multispectral coregistered data. OrthoSeg consists of parallel encoding
networks for independent encoding of multimodal feature maps and a decoder
designed for efficiently fusing independently encoded multimodal feature maps.
A softmax layer at the end of the network uses the features generated by the
decoder for pixel-wise classification. The decoder fuses feature maps from the
parallel encoders locally as well as contextually at multiple scales to
generate per-pixel feature maps for final pixel-wise classification resulting
in segmented output. We experimentally show the merits of OrthoSeg by
demonstrating state-of-the-art accuracy on the ISPRS Potsdam 2D Semantic
Segmentation dataset. Adaptability is one of the key motivations behind
OrthoSeg so that it serves as a useful architectural option for a wide range of
problems involving the task of semantic segmentation of coregistered multimodal
and/or multispectral imagery. Hence, OrthoSeg is designed to enable independent
scaling of parallel encoder networks and decoder network to better match
application requirements, such as the number of input channels, the effective
field-of-view, and model capacity.
Comment: 8 pages, 9 figures, 3 tables
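The pipeline described above, independent per-modality encoding, fusion of the encoded features, and a softmax over the fused result, can be caricatured for a single pixel; every weight and feature value below is hypothetical, and the real OrthoSeg uses convolutional encoders and a learned decoder.

```python
import math

def encode(x, w):
    """Toy per-modality encoder: elementwise weighting plus ReLU."""
    return [max(0.0, wi * xi) for wi, xi in zip(w, x)]

def fuse(feats):
    """Decoder-style fusion, reduced here to concatenation."""
    return [v for f in feats for v in f]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Hypothetical pixel with two modalities: RGB-derived and DSM-derived features.
rgb_feat = encode([0.9, 0.1], [1.0, 1.0])
dsm_feat = encode([0.4], [2.0])
fused = fuse([rgb_feat, dsm_feat])
# Hypothetical 2-class head over the fused features.
logits = [sum(w * f for w, f in zip(row, fused))
          for row in [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]]
probs = softmax(logits)
```
Keeping the encoders separate, as here, is what lets each modality branch be scaled independently of the others.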
Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks
This paper proposes three simple, compact yet effective representations of
depth sequences, referred to respectively as Dynamic Depth Images (DDI),
Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images
(DDMNI). These dynamic images are constructed from a sequence of depth maps
using bidirectional rank pooling to effectively capture the spatial-temporal
information. Such image-based representations enable us to fine-tune the
existing ConvNets models trained on image data for classification of depth
sequences, without introducing large numbers of parameters to learn. Upon the
proposed representations, a convolutional neural network (ConvNet) based method is
developed for gesture recognition and evaluated on the Large-scale Isolated
Gesture Recognition at the ChaLearn Looking at People (LAP) challenge 2016. The
method achieved 55.57% classification accuracy and, while it did not place
first in this challenge, was very close to the best performance even though we
only used depth data.
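A dynamic image of the kind described can be sketched with approximate rank pooling, a time-weighted sum with weights 2t - T - 1 that is a common stand-in for exact rank pooling (which the paper applies bidirectionally); the toy depth frames below are hypothetical.

```python
def dynamic_image(frames):
    """Approximate rank pooling: weight frame t by alpha_t = 2t - T - 1,
    t = 1..T, and sum. Static pixels cancel; moving pixels survive."""
    T = len(frames)
    out = [0.0] * len(frames[0])
    for t, frame in enumerate(frames, start=1):
        a = 2 * t - T - 1
        for i, v in enumerate(frame):
            out[i] += a * v
    return out

# Three toy depth "frames" (flattened): pixel 0 moves, pixel 1 is static.
frames = [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
di = dynamic_image(frames)
```
The resulting single image summarizes the temporal evolution, which is what makes fine-tuning image-pretrained ConvNets possible.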
RSO: A Gradient Free Sampling Based Approach For Training Deep Neural Networks
We propose RSO (random search optimization), a gradient free Markov Chain
Monte Carlo search based approach for training deep neural networks. To this
end, RSO adds a perturbation to a weight in a deep neural network and tests if
it reduces the loss on a mini-batch. If this reduces the loss, the weight is
updated, otherwise the existing weight is retained. Surprisingly, we find that
repeating this process a few times for each weight is sufficient to train a
deep neural network. The number of weight updates for RSO is an order of
magnitude smaller than for backpropagation with SGD. RSO can make
aggressive weight updates in each step as there is no concept of learning rate.
The weight update step for individual layers is also not coupled with the
magnitude of the loss. RSO is evaluated on classification tasks on MNIST and
CIFAR-10 datasets with deep neural networks of 6 to 10 layers where it achieves
an accuracy of 99.1% and 81.8% respectively. We also find that after updating
the weights just 5 times, the algorithm obtains a classification accuracy of
98% on MNIST.
Comment: Technical Report
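The RSO update rule as described, perturb one weight and keep the change only if the mini-batch loss drops, is easy to sketch; the quadratic loss below is a hypothetical stand-in for a network's mini-batch loss, and the step size and schedule are illustrative, not the paper's.

```python
import random

def rso_train(weights, loss, rounds=5, step=0.5, seed=0):
    """Gradient-free training in the spirit of RSO: perturb one weight at a
    time; accept the perturbation only if it lowers the loss."""
    rng = random.Random(seed)
    w = list(weights)
    for _ in range(rounds):
        for i in range(len(w)):
            delta = rng.choice([-step, step])
            old = loss(w)
            w[i] += delta
            if loss(w) >= old:  # no improvement: revert
                w[i] -= delta
    return w

# Hypothetical loss with minimum at (1, -2).
loss = lambda w: (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2
w = rso_train([0.0, 0.0], loss, rounds=20)
```
Because updates are accepted only when the loss drops, the procedure is monotone, with no learning rate coupling the step to the loss magnitude.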
A Deeply-Recursive Convolutional Network for Crowd Counting
The estimation of crowd count in images has a wide range of applications such
as video surveillance, traffic monitoring, public safety and urban planning.
Recently, the convolutional neural network (CNN) based approaches have been
shown to be more effective in crowd counting than traditional methods that use
handcrafted features. However, existing CNN-based methods still suffer from a
large number of parameters and a large storage footprint, which demand high
storage and computing resources and thus limit real-world application.
Consequently, we propose a deeply-recursive network (DR-ResNet) based on ResNet
blocks for crowd counting. The recursive structure makes the network deeper
while keeping the number of parameters unchanged, which enhances network
capability to capture statistical regularities in the context of the crowd.
Besides, we generate a new dataset from the video-monitoring data of a Beijing
bus station. Experimental results demonstrate that the proposed method
outperforms most state-of-the-art methods with far fewer parameters.
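The key structural idea, reusing the same residual block repeatedly so depth grows while the parameter count stays fixed, can be sketched with a toy shared-weight block (the block itself is hypothetical, not the paper's ResNet block):

```python
def res_block(x, w):
    """Toy residual block with one shared weight: y = x + ReLU(w * x)."""
    return [xi + max(0.0, w * xi) for xi in x]

def recursive_net(x, w, depth):
    """Apply the SAME block `depth` times: deeper network, same parameters."""
    for _ in range(depth):
        x = res_block(x, w)
    return x

feat = recursive_net([1.0, -1.0], w=0.1, depth=4)
```
Changing `depth` here adds no parameters at all; only the effective depth of the computation grows.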
Memorization in Overparameterized Autoencoders
The ability of deep neural networks to generalize well in the
overparameterized regime has become a subject of significant research interest.
We show that overparameterized autoencoders exhibit memorization, a form of
inductive bias that constrains the functions learned through the optimization
process to concentrate around the training examples, although the network could
in principle represent a much larger function class. In particular, we prove
that single-layer fully-connected autoencoders project data onto the
(nonlinear) span of the training examples. In addition, we show that deep
fully-connected autoencoders learn a map that is locally contractive at the
training examples, and hence iterating the autoencoder results in convergence
to the training examples. Finally, we prove that depth is necessary and provide
empirical evidence that it is also sufficient for memorization in convolutional
autoencoders. Understanding this inductive bias may shed light on the
generalization properties of overparametrized deep neural networks that are
currently unexplained by classical statistical theory.
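The contraction argument can be illustrated in one dimension: if the learned map is locally contractive at each training example, iterating it converges to a training example. The map below is a hypothetical analogue, not a trained autoencoder.

```python
def contractive_map(x, examples, rate=0.5):
    """Move x a fraction `rate` of the way from its nearest training
    example, i.e. a map that is contractive around each example."""
    t = min(examples, key=lambda e: abs(e - x))
    return t + rate * (x - t)

# Two hypothetical "training examples"; iterate the map from x = 3.0.
examples = [0.0, 10.0]
x = 3.0
for _ in range(50):
    x = contractive_map(x, examples)
```
Each iteration halves the distance to the nearest example, so the iterates converge to it, mirroring the memorization behavior described.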
Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation
Nowadays, the majority of state-of-the-art monocular depth estimation
techniques are based on supervised deep learning models. However, collecting
RGB images with associated depth maps is a very time consuming procedure.
Therefore, recent works have proposed deep architectures for addressing the
monocular depth prediction task as a reconstruction problem, thus avoiding the
need of collecting ground-truth depth. Following these works, we propose a
novel self-supervised deep model for estimating depth maps. Our framework
exploits two main strategies: refinement via cycle-inconsistency and
distillation. Specifically, a \emph{student} network is first trained to
predict a disparity map so as to recover, from a frame in one camera view, the
associated image in the opposite view. Then, a backward cycle network is
applied to the generated image to re-synthesize back the input image,
estimating the opposite disparity. A third network exploits the inconsistency
between the original and the reconstructed input frame in order to output a
refined depth map. Finally, knowledge distillation is exploited so as to
transfer information from the refinement network to the student. Our extensive
experimental evaluation demonstrates the effectiveness of the proposed
framework, which outperforms state-of-the-art unsupervised methods on the
KITTI benchmark.
Comment: Accepted at CVPR201
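The final distillation step, pulling the student's predictions toward the refinement network's output, can be sketched as gradient descent on the MSE between the two sets of per-pixel disparities; the values below are hypothetical, and this is a schematic of the loss, not the paper's full training procedure.

```python
def distill_step(student_out, teacher_out, lr=0.1):
    """One distillation-style update: move the student's outputs toward the
    (refined) teacher's outputs via the MSE gradient 2 * (s - t)."""
    return [s - lr * 2 * (s - t) for s, t in zip(student_out, teacher_out)]

# Hypothetical per-pixel disparity predictions for student and refiner.
student = [0.2, 0.8, 0.5]
refined = [0.3, 0.7, 0.5]
for _ in range(100):
    student = distill_step(student, refined)
```
Repeated updates shrink the gap geometrically, so the student's outputs converge to the refiner's.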
Fine-grained Optimization of Deep Neural Networks
In recent studies, several asymptotic upper bounds on generalization errors
on deep neural networks (DNNs) are theoretically derived. These bounds are
functions of several norms of weights of the DNNs, such as the Frobenius and
spectral norms, and they are computed for weights grouped according to either
input or output channels of the DNNs. In this work, we conjecture that if we
can impose multiple constraints on weights of DNNs to upper bound the norms of
the weights, and train the DNNs with these weights, then we can attain
empirical generalization errors closer to the derived theoretical bounds, and
improve accuracy of the DNNs.
To this end, we pose two problems. First, we aim to obtain weights whose
different norms are all upper bounded by a constant number, e.g. 1.0. To
achieve these bounds, we propose a two-stage renormalization procedure: (i)
normalization of weights according to different norms used in the bounds, and
(ii) reparameterization of the normalized weights to set a constant and finite
upper bound of their norms. In the second problem, we consider training DNNs
with these renormalized weights. To this end, we first propose a strategy to
construct joint spaces (manifolds) of weights according to different
constraints in DNNs. Next, we propose a fine-grained SGD algorithm (FG-SGD) for
optimization on the weight manifolds to train DNNs with assurance of
convergence to minima. Experimental results show that image classification
accuracy of baseline DNNs can be boosted using FG-SGD on collections of
manifolds identified by multiple constraints.
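Stages (i) and (ii) of the renormalization can be sketched for the Frobenius norm alone; the paper also handles other norms and optimizes on the resulting weight manifolds, so this fragment only shows the norm-bounding idea on a hypothetical weight matrix.

```python
import math

def frobenius(W):
    """Frobenius norm of a weight matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in W for v in row))

def renormalize(W, bound=1.0):
    """(i) normalize by the Frobenius norm, then (ii) rescale so the
    norm equals a fixed constant `bound` (here 1.0)."""
    n = frobenius(W)
    return [[bound * v / n for v in row] for row in W]

W = [[3.0, 0.0], [0.0, 4.0]]   # hypothetical weights, ||W||_F = 5
Wn = renormalize(W)
```
After renormalization the norm is pinned to the constant, which is exactly the finite upper bound the theoretical generalization results assume.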
3D Face Mask Presentation Attack Detection Based on Intrinsic Image Analysis
Face presentation attacks have become a major threat to face recognition
systems and many countermeasures have been proposed in the past decade.
However, most of them are devoted to 2D face presentation attacks, rather than
3D face masks. Unlike the real face, the 3D face mask is usually made of resin
materials and has a smooth surface, resulting in reflectance differences. So,
we propose a novel detection method for 3D face mask presentation attack by
modeling reflectance differences based on intrinsic image analysis. In the
proposed method, the face image is first processed with intrinsic image
decomposition to compute its reflectance image. Then, the intensity
distribution histograms are extracted from three orthogonal planes to represent
the intensity differences of reflectance images between the real face and 3D
face mask. After that, a 1D convolutional network is used to capture
information describing how different materials or surfaces react differently
to changes in illumination. Extensive experiments on the 3DMAD
database demonstrate the effectiveness of our proposed method in distinguishing
a face mask from a real face, and show that its detection performance
outperforms other state-of-the-art methods.
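The first two steps can be caricatured under the classical intrinsic-image model I = R * S: divide out a shading estimate to get reflectance, then histogram the reflectance intensities. The patch and shading values below are hypothetical, and the paper extracts histograms from three orthogonal planes before the 1D CNN.

```python
def reflectance(image, shading, eps=1e-6):
    """Toy intrinsic decomposition under I = R * S: recover R = I / S."""
    return [i / max(s, eps) for i, s in zip(image, shading)]

def intensity_histogram(values, bins=4, lo=0.0, hi=2.0):
    """Histogram of reflectance intensities, a simple material descriptor."""
    h = [0] * bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / (hi - lo) * bins)))
        h[idx] += 1
    return h

# Hypothetical flattened face patch and its shading estimate.
img = [0.5, 0.9, 0.2, 0.8]
shade = [0.5, 1.0, 0.4, 0.8]
refl = reflectance(img, shade)
hist = intensity_histogram(refl)
```
Resin masks and skin would produce differently shaped reflectance histograms, which is the cue the subsequent classifier exploits.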