
    SPARCNN: SPAtially Related Convolutional Neural Networks

    The ability to accurately detect and classify objects at varying pixel sizes in cluttered scenes is crucial to many Navy applications. However, the detection performance of existing state-of-the-art approaches such as convolutional neural networks (CNNs) degrades when applied to such cluttered, multi-object detection tasks. We conjecture that spatial relationships between objects in an image can be exploited to significantly improve detection accuracy, an approach that, to the best of our knowledge, had not been considered by existing techniques at the time this research was conducted. We introduce a detection and classification technique called Spatially Related Detection with Convolutional Neural Networks (SPARCNN) that learns and exploits a probabilistic representation of inter-object spatial configurations within training images to generate more effective region proposals for use with state-of-the-art CNNs. Our empirical evaluation of SPARCNN on the VOC 2007 dataset shows that it increases classification accuracy by 8% compared to a region proposal technique that does not exploit spatial relations. More importantly, we obtain a higher performance boost of 18.8% when the difficulty of the test set is increased by including highly obscured objects and additional image clutter.
    Comment: 6 pages, AIPR 2016 submission
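    The abstract does not specify how the probabilistic model of inter-object spatial configurations is built, so the sketch below is only one plausible reading: pairwise relations between object classes are discretised into eight compass bins learned from training boxes, and proposal scores are blended with how well a candidate agrees with already-confident detections. The names (relation, SpatialRelationModel, rescore_proposals), the 8-bin discretisation, and the linear blend weight are illustrative assumptions, not the SPARCNN implementation.

```python
# Hedged sketch: learn pairwise spatial relations between object classes from
# training annotations and use them to re-score region proposals.
from collections import defaultdict
import math

def relation(box_a, box_b):
    """Coarse spatial relation of box_b w.r.t. box_a: one of 8 compass bins."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    angle = math.atan2(by - ay, bx - ax)
    return int(((angle + math.pi) / (2 * math.pi)) * 8) % 8

class SpatialRelationModel:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, annotations):
        """annotations: list of [(label, box), ...] per training image."""
        for objects in annotations:
            for label_a, box_a in objects:
                for label_b, box_b in objects:
                    if box_a is not box_b:
                        self.counts[(label_a, label_b)][relation(box_a, box_b)] += 1

    def prob(self, label_a, box_a, label_b, box_b):
        """P(observed relation | class pair), with Laplace smoothing over 8 bins."""
        hist = self.counts[(label_a, label_b)]
        total = sum(hist.values())
        return (hist[relation(box_a, box_b)] + 1) / (total + 8)

def rescore_proposals(proposals, detections, model, weight=0.5):
    """Blend CNN scores with spatial agreement against confident detections.
    proposals: [(label, box, cnn_score), ...]; detections: [(label, box), ...]."""
    rescored = []
    for label, box, cnn_score in proposals:
        spatial = [model.prob(det_label, det_box, label, box)
                   for det_label, det_box in detections]
        context = sum(spatial) / len(spatial) if spatial else 1.0 / 8
        rescored.append((label, box, (1 - weight) * cnn_score + weight * context))
    return sorted(rescored, key=lambda t: t[2], reverse=True)
```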

    An iterative inference procedure applying conditional random fields for simultaneous classification of land cover and land use

    Land cover and land use exhibit strong contextual dependencies. We propose a novel approach for the simultaneous classification of land cover and land use, in which semantic and spatial context is considered. The image sites for land cover and land use classification form a hierarchy consisting of two layers: a land cover layer and a land use layer. We apply Conditional Random Fields (CRF) at both layers. The layers differ with respect to the image entities corresponding to the nodes, the employed features and the classes to be distinguished. In the land cover layer, the nodes represent super-pixels; in the land use layer, the nodes correspond to objects from a geospatial database. Both CRFs model spatial dependencies between neighbouring image sites. The complex semantic relations between land cover and land use are integrated into the classification process by using contextual features. We propose a new iterative inference procedure for the simultaneous classification of land cover and land use, in which the two classification tasks mutually influence each other. This helps to improve the classification accuracy for certain classes. The main idea of this approach is that semantic context helps to refine the class predictions, which, in turn, leads to more expressive context information. Thus, potentially wrong decisions can be reversed at later stages. The approach is designed for input data based on aerial images. Experiments are carried out on a test site to evaluate the performance of the proposed method. We show the effectiveness of the iterative inference procedure and demonstrate that a smaller super-pixel size has a positive influence on the classification result.
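    The core of the procedure is the alternation between the two CRF layers, each conditioned on the other's current predictions. The sketch below captures only that control flow; infer_land_cover and infer_land_use are hypothetical stand-ins for CRF inference at the land-cover (super-pixel) and land-use (database-object) layers, and the fixed iteration count is an assumption.

```python
# Minimal sketch of the alternating two-layer inference loop.
def iterative_inference(superpixels, db_objects, infer_land_cover, infer_land_use,
                        n_iterations=3):
    """Alternate land-cover and land-use inference, feeding each layer the
    other's current labels as contextual features."""
    land_use_labels = {}    # first land-cover pass runs without land-use context
    land_cover_labels = {}
    for _ in range(n_iterations):
        # Land-cover CRF over super-pixels, conditioned on current land-use labels
        land_cover_labels = infer_land_cover(superpixels, context=land_use_labels)
        # Land-use CRF over database objects, conditioned on the cover composition
        land_use_labels = infer_land_use(db_objects, context=land_cover_labels)
    return land_cover_labels, land_use_labels
```

    Each pass can revise decisions made in the previous one, which is how potentially wrong early predictions get reversed at later stages.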

    Spatial and Topological Analysis of Urban Land Cover Structure in New Orleans Using Multispectral Aerial Image and Lidar Data

    Urban land use and land cover (LULC) mapping has been one of the major applications in remote sensing of the urban environment. Land cover refers to the biophysical materials at the surface of the earth (e.g. grass, trees, soils, concrete, water), while land use indicates the socio-economic function of the land (e.g. residential, industrial, commercial land uses). This study addresses the technical issue of how to computationally infer urban land use types from urban land cover structure derived from remote sensing data. In this research, a multispectral aerial image and high-resolution LiDAR topographic data have been integrated to investigate the urban land cover and land use in New Orleans, Louisiana. First, the LiDAR data are used to solve the problems associated with solar shadows of trees and buildings, building lean and occlusions in the multispectral aerial image. A two-stage rule-based classification approach has been developed, and the urban land cover of New Orleans has been classified into six categories: water, grass, trees, impervious ground, elevated bridges, and buildings, with an overall classification accuracy of 94.2%, significantly higher than that of traditional per-pixel classification methods. The buildings are further classified into regular low-rise, multi-story, mid-rise, high-rise, and skyscrapers according to their height. Second, the land cover composition and structure in New Orleans have been quantitatively analyzed for the first time in terms of urban planning districts, yielding information about the characteristic land cover components and structure of different types of land use functions. Third, a graph-theoretic data model, known as the relational attribute neighborhood graph (RANG), is adopted to comprehensively represent geometrical and thematic attributes, compositional and structural properties, and spatial/topological relations between urban land cover patches (objects). Based on an evaluation of the importance of 26 spatial, thematic and topological variables in RANG, the random forest classification method is used to computationally infer and classify the urban land use in New Orleans into 7 types at the urban block level: single-family residential, two-family residential, multi-family residential, commercial, CBD, institutional, and parks and open space, with an overall accuracy of 91.7%.
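    To make the block-level inference step concrete, the snippet below is an illustrative sketch (not the study's code) of training a random forest on a per-block feature table of 26 spatial, thematic and topological variables and ranking those variables by importance. The random feature matrix and labels are placeholders for the RANG-derived features and the 7 land-use classes; scikit-learn is assumed to be available.

```python
# Illustrative block-level land-use classification with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data: 26 features per urban block (cover fractions, building
# heights, adjacency counts from the neighbourhood graph, ...) and one of
# 7 land-use labels per block.
rng = np.random.default_rng(0)
X = rng.random((500, 26))
y = rng.integers(0, 7, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))

# Rank variables by importance, mirroring the evaluation of the 26 RANG variables
ranking = np.argsort(clf.feature_importances_)[::-1]
print("most important feature indices:", ranking[:5])
```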

    Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions

    We aim for zero-shot localization and classification of human actions in video. Where traditional approaches rely on global attribute or object classification scores for their zero-shot knowledge transfer, our main contribution is a spatial-aware object embedding. To arrive at spatial awareness, we build our embedding on top of freely available actor and object detectors. Relevance of objects is determined in a word embedding space and further enforced with estimated spatial preferences. Besides local object awareness, we also embed global object awareness into our embedding to maximize actor and object interaction. Finally, we exploit the object positions and sizes in the spatial-aware embedding to demonstrate a new spatio-temporal action retrieval scenario with composite queries. Action localization and classification experiments on four contemporary action video datasets support our proposal. Apart from state-of-the-art results in the zero-shot localization and classification settings, our spatial-aware embedding is even competitive with recent supervised action localization alternatives.
    Comment: ICCV
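    A minimal sketch of the scoring idea described in the abstract, assuming pre-computed word vectors and per-frame actor/object detections: object relevance to an unseen action name is measured in the embedding space and weighted by a spatial preference relative to the actor. The distance-based preference and the function names (spatial_preference, action_score) are simplifying assumptions, not the paper's learned model.

```python
# Hedged sketch: zero-shot action scoring from object embeddings and
# actor-relative spatial preferences.
import math

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def spatial_preference(actor_box, object_box):
    """Closer objects get higher weight (a simple stand-in for learned preferences)."""
    ax, ay = (actor_box[0] + actor_box[2]) / 2, (actor_box[1] + actor_box[3]) / 2
    ox, oy = (object_box[0] + object_box[2]) / 2, (object_box[1] + object_box[3]) / 2
    return 1.0 / (1.0 + math.hypot(ax - ox, ay - oy))

def action_score(action_vec, actor_box, detections, word_vecs):
    """detections: [(object_name, box, detector_score), ...] for one frame."""
    score = 0.0
    for name, box, det_score in detections:
        relevance = cosine(action_vec, word_vecs[name])   # semantic relevance
        score += det_score * relevance * spatial_preference(actor_box, box)
    return score
```

    Summing such per-frame scores along candidate actor tubes would then give a ranking usable for both localization and classification without action-specific training examples.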