42 research outputs found

    Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers

    Full text link
    Scene parsing, or semantic segmentation, consists in labeling each pixel in an image with the category of the object it belongs to. It is a challenging task that involves the simultaneous detection, segmentation and recognition of all the objects in the image. The scene parsing method proposed here starts by computing a tree of segments from a graph of pixel dissimilarities. Simultaneously, a set of dense feature vectors is computed which encodes regions of multiple sizes centered on each pixel. The feature extractor is a multiscale convolutional network trained from raw pixels. The feature vectors associated with the segments covered by each node in the tree are aggregated and fed to a classifier which produces an estimate of the distribution of object categories contained in the segment. A subset of tree nodes that cover the image are then selected so as to maximize the average "purity" of the class distributions, hence maximizing the overall likelihood that each segment will contain a single object. The convolutional network feature extractor is trained end-to-end from raw pixels, alleviating the need for engineered features. After training, the system is parameter free. The system yields record accuracies on the Stanford Background Dataset (8 classes), the Sift Flow Dataset (33 classes) and the Barcelona Dataset (170 classes) while being an order of magnitude faster than competing approaches, producing a 320 \times 240 image labeling in less than 1 second.Comment: 9 pages, 4 figures - Published in 29th International Conference on Machine Learning (ICML 2012), Jun 2012, Edinburgh, United Kingdo

    Indoor Semantic Segmentation using depth information

    Full text link
    This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on hand-crafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. We obtain state-of-the-art on the NYU-v2 depth dataset with an accuracy of 64.5%. We illustrate the labeling of indoor scenes in videos sequences that could be processed in real-time using appropriate hardware such as an FPGA.Comment: 8 pages, 3 figure

    Clustering Learning for Robotic Vision

    Get PDF
    We present the clustering learning technique applied to multi-layer feedforward deep neural networks. We show that this unsupervised learning technique can compute network filters with only a few minutes and a much reduced set of parameters. The goal of this paper is to promote the technique for general-purpose robotic vision systems. We report its use in static image datasets and object tracking datasets. We show that networks trained with clustering learning can outperform large networks trained for many hours on complex datasets.Comment: Code for this paper is available here: https://github.com/culurciello/CL_paper1_cod

    Computationally Efficient Target Classification in Multispectral Image Data with Deep Neural Networks

    Full text link
    Detecting and classifying targets in video streams from surveillance cameras is a cumbersome, error-prone and expensive task. Often, the incurred costs are prohibitive for real-time monitoring. This leads to data being stored locally or transmitted to a central storage site for post-incident examination. The required communication links and archiving of the video data are still expensive and this setup excludes preemptive actions to respond to imminent threats. An effective way to overcome these limitations is to build a smart camera that transmits alerts when relevant video sequences are detected. Deep neural networks (DNNs) have come to outperform humans in visual classifications tasks. The concept of DNNs and Convolutional Networks (ConvNets) can easily be extended to make use of higher-dimensional input data such as multispectral data. We explore this opportunity in terms of achievable accuracy and required computational effort. To analyze the precision of DNNs for scene labeling in an urban surveillance scenario we have created a dataset with 8 classes obtained in a field experiment. We combine an RGB camera with a 25-channel VIS-NIR snapshot sensor to assess the potential of multispectral image data for target classification. We evaluate several new DNNs, showing that the spectral information fused together with the RGB frames can be used to improve the accuracy of the system or to achieve similar accuracy with a 3x smaller computation effort. We achieve a very high per-pixel accuracy of 99.1%. Even for scarcely occurring, but particularly interesting classes, such as cars, 75% of the pixels are labeled correctly with errors occurring only around the border of the objects. This high accuracy was obtained with a training set of only 30 labeled images, paving the way for fast adaptation to various application scenarios.Comment: Presented at SPIE Security + Defence 2016 Proc. SPIE 9997, Target and Background Signatures I

    Learning Multi-Scale Representations for Material Classification

    Full text link
    The recent progress in sparse coding and deep learning has made unsupervised feature learning methods a strong competitor to hand-crafted descriptors. In computer vision, success stories of learned features have been predominantly reported for object recognition tasks. In this paper, we investigate if and how feature learning can be used for material recognition. We propose two strategies to incorporate scale information into the learning procedure resulting in a novel multi-scale coding procedure. Our results show that our learned features for material recognition outperform hand-crafted descriptors on the FMD and the KTH-TIPS2 material classification benchmarks

    Deep Predictive Models for Collision Risk Assessment in Autonomous Driving

    Full text link
    In this paper, we investigate a predictive approach for collision risk assessment in autonomous and assisted driving. A deep predictive model is trained to anticipate imminent accidents from traditional video streams. In particular, the model learns to identify cues in RGB images that are predictive of hazardous upcoming situations. In contrast to previous work, our approach incorporates (a) temporal information during decision making, (b) multi-modal information about the environment, as well as the proprioceptive state and steering actions of the controlled vehicle, and (c) information about the uncertainty inherent to the task. To this end, we discuss Deep Predictive Models and present an implementation using a Bayesian Convolutional LSTM. Experiments in a simple simulation environment show that the approach can learn to predict impending accidents with reasonable accuracy, especially when multiple cameras are used as input sources.Comment: 8 pages, 4 figure
    corecore