
    Computationally Efficient Target Classification in Multispectral Image Data with Deep Neural Networks

    Detecting and classifying targets in video streams from surveillance cameras is a cumbersome, error-prone and expensive task. Often, the incurred costs are prohibitive for real-time monitoring. This leads to data being stored locally or transmitted to a central storage site for post-incident examination. The required communication links and archiving of the video data are still expensive, and this setup excludes preemptive actions to respond to imminent threats. An effective way to overcome these limitations is to build a smart camera that transmits alerts when relevant video sequences are detected. Deep neural networks (DNNs) have come to outperform humans in visual classification tasks. The concept of DNNs and Convolutional Networks (ConvNets) can easily be extended to make use of higher-dimensional input data such as multispectral data. We explore this opportunity in terms of achievable accuracy and required computational effort. To analyze the precision of DNNs for scene labeling in an urban surveillance scenario, we have created a dataset with 8 classes obtained in a field experiment. We combine an RGB camera with a 25-channel VIS-NIR snapshot sensor to assess the potential of multispectral image data for target classification. We evaluate several new DNNs, showing that the spectral information fused with the RGB frames can be used to improve the accuracy of the system or to achieve similar accuracy with 3x less computational effort. We achieve a very high per-pixel accuracy of 99.1%. Even for scarcely occurring but particularly interesting classes, such as cars, 75% of the pixels are labeled correctly, with errors occurring only around the borders of the objects. This high accuracy was obtained with a training set of only 30 labeled images, paving the way for fast adaptation to various application scenarios. Comment: Presented at SPIE Security + Defence 2016 Proc. SPIE 9997, Target and Background Signatures I
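The abstract above quotes both an overall per-pixel accuracy and a lower accuracy for a rare class. The following is a minimal sketch of how those two figures can differ, not the authors' evaluation code. The function name `pixel_accuracies` and the toy label maps are hypothetical, and per-class accuracy is assumed here to mean the fraction of ground-truth pixels of a class that receive the correct label.

```python
from collections import Counter

def pixel_accuracies(truth, pred):
    """Return (overall accuracy, {class: per-class accuracy}) for two label maps."""
    total = Counter()    # pixels per ground-truth class
    correct = Counter()  # correctly labeled pixels per ground-truth class
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            total[t] += 1
            if t == p:
                correct[t] += 1
    overall = sum(correct.values()) / sum(total.values())
    per_class = {c: correct[c] / total[c] for c in total}
    return overall, per_class

# Toy 4x5 label maps (hypothetical data): "road" dominates, "car" is rare.
truth = [
    ["road", "road", "road", "road", "road"],
    ["road", "car",  "car",  "road", "road"],
    ["road", "car",  "car",  "road", "road"],
    ["road", "road", "road", "road", "road"],
]
pred = [
    ["road", "road", "road", "road", "road"],
    ["road", "car",  "car",  "road", "road"],
    ["road", "car",  "road", "road", "road"],  # one car pixel mislabeled
    ["road", "road", "road", "road", "road"],
]

overall, per_class = pixel_accuracies(truth, pred)
print(overall, per_class)
```

In this toy case a single mislabeled pixel barely moves the overall figure but costs the rare class a quarter of its accuracy, which is why the two numbers are usually reported side by side.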

    Understanding High Resolution Aerial Imagery Using Computer Vision Techniques

    Computer vision can make important contributions to the analysis of remote sensing satellite or aerial imagery. However, the resolution of early satellite imagery was not sufficient to provide useful spatial features. The situation is changing with the advent of very-high-spatial-resolution (VHR) imaging sensors. This change makes it possible to use computer vision techniques to perform analysis of man-made structures. Meanwhile, the development of multi-view imaging techniques allows the generation of accurate point clouds as ancillary knowledge. This dissertation aims at developing computer vision and machine learning algorithms for high resolution aerial imagery analysis in the context of application problems including debris detection, building detection and roof condition assessment. High resolution aerial imagery and point clouds were provided by Pictometry International for this study. Debris detection after natural disasters such as tornadoes, hurricanes or tsunamis is needed for effective debris removal and allocation of limited resources. Significant advances in aerial image acquisition have greatly expanded the possibilities for rapid and automated detection of debris. In this dissertation, a robust debris detection algorithm is proposed. Large scale aerial images are partitioned into homogeneous regions by interactive segmentation. Debris areas are identified based on extracted texture features. Robust building detection is another important part of high resolution aerial imagery understanding. This dissertation develops a 3D scene classification algorithm for building detection using point clouds derived from multi-view imagery. Point clouds are divided into point clusters using Euclidean clustering. Individual point clusters are identified based on extracted spectral and 3D structural features. The inspection of roof condition is an important step in damage claim processing in the insurance industry. Automated roof condition assessment from remotely sensed images is proposed in this dissertation. Initially, texture classification and a bag-of-words model were applied to assess the roof condition using features derived from the whole rooftop. However, considering the complexity of residential rooftops, a more sophisticated method is proposed to divide the task into two stages: 1) roof segmentation, followed by 2) classification of segmented roof regions. Deep learning techniques are investigated for both segmentation and classification. A deep learned feature is proposed and applied in a region merging segmentation algorithm. A fine-tuned deep network is adopted for roof segment classification and found to achieve higher accuracy than traditional methods using hand-crafted features. Contributions of this study include the development of algorithms for debris detection using 2D images and building detection using 3D point clouds. For roof condition assessment, the solutions to this problem are explored in two directions: features derived from the whole rooftop and features extracted from each roof segment. Through our research, roof segmentation followed by segment classification was found to be the more promising method, and the processing workflow was developed and tested. Deep learning techniques are also investigated for both roof segmentation and segment classification. More unsupervised feature extraction techniques using deep learning can be explored in future work.
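The abstract above mentions dividing point clouds into clusters with Euclidean clustering. A minimal sketch of that general technique follows, assuming a brute-force neighbor search and a hypothetical distance threshold `radius`; the dissertation's actual implementation and parameters are not given here, and practical implementations typically use a spatial index such as a k-d tree instead of the quadratic search shown.

```python
from collections import deque
from math import dist

def euclidean_clusters(points, radius):
    """Group point indices so each point lies within `radius` of another point in its cluster."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)          # deterministic choice of the next seed point
        unvisited.remove(seed)
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            # Brute-force neighbor search over the points not yet assigned.
            near = [j for j in unvisited if dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        clusters.append(sorted(cluster))
    return clusters

# Hypothetical 3D points: a chain of three near the origin and a pair far away.
points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0),
          (5.0, 5.0, 5.0), (5.4, 5.0, 5.0)]
clusters = euclidean_clusters(points, radius=0.6)
print(clusters)
```

Points 0 and 2 are farther apart than the threshold but still share a cluster because point 1 links them; this chaining behavior is what distinguishes region-growing clustering from a simple distance cutoff around a single seed.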

    Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks

    Semantic labeling (or pixel-level land-cover classification) in ultra-high resolution imagery (< 10cm) requires statistical models able to learn high level concepts from spatial data with large appearance variations. Convolutional Neural Networks (CNNs) achieve this goal by discriminatively learning a hierarchy of representations of increasing abstraction. In this paper we present a CNN-based system relying on a downsample-then-upsample architecture. Specifically, it first learns a rough spatial map of high-level representations by means of convolutions and then learns to upsample them back to the original resolution by deconvolutions. By doing so, the CNN learns to densely label every pixel at the original resolution of the image. This results in many advantages, including i) state-of-the-art numerical accuracy, ii) improved geometric accuracy of predictions and iii) high efficiency at inference time. We test the proposed system on the Vaihingen and Potsdam sub-decimeter resolution datasets, involving semantic labeling of aerial images of 9cm and 5cm resolution, respectively. These datasets are composed of many large, fully annotated tiles, allowing an unbiased evaluation of models making use of spatial information. We do so by comparing two standard CNN architectures to the proposed one: standard patch classification, prediction of local label patches by employing only convolutions, and full patch labeling by employing deconvolutions. All the systems compare favorably with or outperform a state-of-the-art baseline relying on superpixels and powerful appearance descriptors. The proposed full patch labeling CNN outperforms these models by a large margin, also showing a very appealing inference time. Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 201
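The downsample-then-upsample idea in the abstract above can be sketched in one dimension: a strided convolution reduces resolution, and a transposed convolution (often called a deconvolution) maps the coarse features back to the original length. This is only an illustration under stated assumptions: the function names are hypothetical, the kernels are fixed by hand, and the network in the paper learns its 2D filters from data.

```python
def conv1d_strided(signal, kernel, stride):
    """Valid 1D cross-correlation with a stride; stride > 1 reduces resolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

def deconv1d(features, kernel, stride):
    """Transposed 1D convolution: each input value adds a scaled kernel copy to the output."""
    k = len(kernel)
    out = [0.0] * ((len(features) - 1) * stride + k)
    for i, value in enumerate(features):
        for j in range(k):
            out[i * stride + j] += value * kernel[j]
    return out

# Hypothetical piecewise-constant signal of length 8.
signal = [1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 7.0, 7.0]

# Downsample by 2 with an averaging kernel (hand-picked, not learned).
coarse = conv1d_strided(signal, [0.5, 0.5], stride=2)

# Upsample by 2 with a copying kernel (hand-picked, not learned).
restored = deconv1d(coarse, [1.0, 1.0], stride=2)

print(len(signal), len(coarse), len(restored))
print(restored)
```

With these kernels the output length is `(len(coarse) - 1) * stride + len(kernel)`, which returns to the input length, so a label can be predicted for every original position. The reconstruction is exact here only because the toy signal is piecewise constant.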