16,289 research outputs found

    Backward Gradient Normalization in Deep Neural Networks

    Full text link
    We introduce a new technique for gradient normalization during neural network training. The gradients are rescaled during the backward pass using normalization layers introduced at certain points within the network architecture. These normalization nodes do not affect forward activity propagation, but modify backpropagation equations to permit a well-scaled gradient flow that reaches the deepest network layers without experimenting vanishing or explosion. Results on tests with very deep neural networks show that the new technique can do an effective control of the gradient norm, allowing the update of weights in the deepest layers and improving network accuracy on several experimental conditions

    Convolutional Neural Networks for Histopathology Image Classification: Training vs. Using Pre-Trained Networks

    Full text link
    We explore the problem of classification within a medical image data-set based on a feature vector extracted from the deepest layer of pre-trained Convolution Neural Networks. We have used feature vectors from several pre-trained structures, including networks with/without transfer learning to evaluate the performance of pre-trained deep features versus CNNs which have been trained by that specific dataset as well as the impact of transfer learning with a small number of samples. All experiments are done on Kimia Path24 dataset which consists of 27,055 histopathology training patches in 24 tissue texture classes along with 1,325 test patches for evaluation. The result shows that pre-trained networks are quite competitive against training from scratch. As well, fine-tuning does not seem to add any tangible improvement for VGG16 to justify additional training while we observed considerable improvement in retrieval and classification accuracy when we fine-tuned the Inception structure.Comment: To appear in proceedings of the 7th International Conference on Image Processing Theory, Tools and Applications (IPTA 2017), Nov 28-Dec 1, Montreal, Canad

    Towards Automatic Wild Animal Monitoring: Identification of Animal Species in Camera-trap Images using Very Deep Convolutional Neural Networks

    Full text link
    Non intrusive monitoring of animals in the wild is possible using camera trapping framework, which uses cameras triggered by sensors to take a burst of images of animals in their habitat. However camera trapping framework produces a high volume of data (in the order on thousands or millions of images), which must be analyzed by a human expert. In this work, a method for animal species identification in the wild using very deep convolutional neural networks is presented. Multiple versions of the Snapshot Serengeti dataset were used in order to probe the ability of the method to cope with different challenges that camera-trap images demand. The method reached 88.9% of accuracy in Top-1 and 98.1% in Top-5 in the evaluation set using a residual network topology. Also, the results show that the proposed method outperforms previous approximations and proves that recognition in camera-trap images can be automated.Comment: Submitted to ECCV1

    On Minimization of a Quadratic Binary Functional

    Full text link
    The problem of minimization of a quadratic functional depending on great number of binary variables is examined. 3 variants of minimization procedure are studied with the aid of computer simulation for spin-glass matrices. It is shown that under other equal conditions evident superiority has the maximal dynamics (the greedy algorithm). The dependence of the results on a distance between start points and the ground state is investigated. It is determined that the character of distribution of local minima depends on this distance crucially.Comment: 17 pages, 16 figure

    Deep Barcodes for Fast Retrieval of Histopathology Scans

    Full text link
    We investigate the concept of deep barcodes and propose two methods to generate them in order to expedite the process of classification and retrieval of histopathology images. Since binary search is computationally less expensive, in terms of both speed and storage, deep barcodes could be useful when dealing with big data retrieval. Our experiments use the dataset Kimia Path24 to test three pre-trained networks for image retrieval. The dataset consists of 27,055 training images in 24 different classes with large variability, and 1,325 test images for testing. Apart from the high-speed and efficiency, results show a surprising retrieval accuracy of 71.62% for deep barcodes, as compared to 68.91% for deep features and 68.53% for compressed deep features.Comment: Accepted for publication in proceedings of the IEEE World Congress on Computational Intelligence (IEEE WCCI), Rio de Janeiro, Brazil, 8-3 July, 201

    How Do Neural Networks Estimate Optical Flow? A Neuropsychology-Inspired Study

    Full text link
    End-to-end trained convolutional neural networks have led to a breakthrough in optical flow estimation. The most recent advances focus on improving the optical flow estimation by improving the architecture and setting a new benchmark on the publicly available MPI-Sintel dataset. Instead, in this article, we investigate how deep neural networks estimate optical flow. A better understanding of how these networks function is important for (i) assessing their generalization capabilities to unseen inputs, and (ii) suggesting changes to improve their performance. For our investigation, we focus on FlowNetS, as it is the prototype of an encoder-decoder neural network for optical flow estimation. Furthermore, we use a filter identification method that has played a major role in uncovering the motion filters present in animal brains in neuropsychological research. The method shows that the filters in the deepest layer of FlowNetS are sensitive to a variety of motion patterns. Not only do we find translation filters, as demonstrated in animal brains, but thanks to the easier measurements in artificial neural networks, we even unveil dilation, rotation, and occlusion filters. Furthermore, we find similarities in the refinement part of the network and the perceptual filling-in process which occurs in the mammal primary visual cortex.Comment: 16 pages, 15 figure

    Projection-Based 2.5D U-net Architecture for Fast Volumetric Segmentation

    Full text link
    Convolutional neural networks are state-of-the-art for various segmentation tasks. While for 2D images these networks are also computationally efficient, 3D convolutions have huge storage requirements and require long training time. To overcome this issue, we introduce a network structure for volumetric data without 3D convolutional layers. The main idea is to include maximum intensity projections from different directions to transform the volumetric data to a sequence of images, where each image contains information of the full data. We then apply 2D convolutions to these projection images and lift them again to volumetric data using a trainable reconstruction algorithm.The proposed network architecture has less storage requirements than network structures using 3D convolutions. For a tested binary segmentation task, it even shows better performance than the 3D U-net and can be trained much faster.Comment: presented at the SAMPTA 2019 conferenc

    Adaptive Neural Networks for Efficient Inference

    Full text link
    We present an approach to adaptively utilize deep neural networks in order to reduce the evaluation time on new examples without loss of accuracy. Rather than attempting to redesign or approximate existing networks, we propose two schemes that adaptively utilize networks. We first pose an adaptive network evaluation scheme, where we learn a system to adaptively choose the components of a deep network to be evaluated for each example. By allowing examples correctly classified using early layers of the system to exit, we avoid the computational time associated with full evaluation of the network. We extend this to learn a network selection system that adaptively selects the network to be evaluated for each example. We show that computational time can be dramatically reduced by exploiting the fact that many examples can be correctly classified using relatively efficient networks and that complex, computationally costly networks are only necessary for a small fraction of examples. We pose a global objective for learning an adaptive early exit or network selection policy and solve it by reducing the policy learning problem to a layer-by-layer weighted binary classification problem. Empirically, these approaches yield dramatic reductions in computational cost, with up to a 2.8x speedup on state-of-the-art networks from the ImageNet image recognition challenge with minimal (<1%) loss of top5 accuracy

    SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling

    Full text link
    We propose a novel deep architecture, SegNet, for semantic pixel wise image labelling. SegNet has several attractive properties; (i) it only requires forward evaluation of a fully learnt function to obtain smooth label predictions, (ii) with increasing depth, a larger context is considered for pixel labelling which improves accuracy, and (iii) it is easy to visualise the effect of feature activation(s) in the pixel label space at any depth. SegNet is composed of a stack of encoders followed by a corresponding decoder stack which feeds into a soft-max classification layer. The decoders help map low resolution feature maps at the output of the encoder stack to full input image size feature maps. This addresses an important drawback of recent deep learning approaches which have adopted networks designed for object categorization for pixel wise labelling. These methods lack a mechanism to map deep layer feature maps to input dimensions. They resort to ad hoc methods to upsample features, e.g. by replication. This results in noisy predictions and also restricts the number of pooling layers in order to avoid too much upsampling and thus reduces spatial context. SegNet overcomes these problems by learning to map encoder outputs to image pixel labels. We test the performance of SegNet on outdoor RGB scenes from CamVid, KITTI and indoor scenes from the NYU dataset. Our results show that SegNet achieves state-of-the-art performance even without use of additional cues such as depth, video frames or post-processing with CRF models.Comment: This version was first submitted to CVPR' 15 on November 14, 2014 with paper Id 1468. A similar architecture was proposed more recently on May 17, 2015, see http://arxiv.org/pdf/1505.04366.pd

    Machine Learning as Statistical Data Assimilation

    Full text link
    We identify a strong equivalence between neural network based machine learning (ML) methods and the formulation of statistical data assimilation (DA), known to be a problem in statistical physics. DA, as used widely in physical and biological sciences, systematically transfers information in observations to a model of the processes producing the observations. The correspondence is that layer label in the ML setting is the analog of time in the data assimilation setting. Utilizing aspects of this equivalence we discuss how to establish the global minimum of the cost functions in the ML context, using a variational annealing method from DA. This provides a design method for optimal networks for ML applications and may serve as the basis for understanding the success of "deep learning". Results from an ML example are presented. When the layer label is taken to be continuous, the Euler-Lagrange equation for the ML optimization problem is an ordinary differential equation, and we see that the problem being solved is a two point boundary value problem. The use of continuous layers is denoted "deepest learning". The Hamiltonian version provides a direct rationale for back propagation as a solution method for the canonical momentum; however, it suggests other solution methods are to be preferred.Comment: arXiv admin note: text overlap with arXiv:1707.0141
    • …
    corecore