SPARCNN: SPAtially Related Convolutional Neural Networks
The ability to accurately detect and classify objects at varying pixel sizes
in cluttered scenes is crucial to many Navy applications. However, detection
performance of existing state-of-the-art approaches such as convolutional
neural networks (CNNs) degrades when applied to such cluttered and
multi-object detection tasks. We conjecture that spatial relationships between
objects in an image could be exploited to significantly improve detection
accuracy, an approach that had not yet been considered by any existing
techniques (to the best of our knowledge) at the time the research was
conducted. We introduce a detection and classification technique called
Spatially Related Detection with Convolutional Neural Networks (SPARCNN) that
learns and exploits a probabilistic representation of inter-object spatial
configurations within images from training sets for more effective region
proposals to use with state-of-the-art CNNs. Our empirical evaluation of
SPARCNN on the VOC 2007 dataset shows that it increases classification accuracy
by 8% when compared to a region proposal technique that does not exploit
spatial relations. More importantly, we obtained a higher performance boost of
18.8% when task difficulty in the test set is increased by including highly
obscured objects and increased image clutter. Comment: 6 pages, AIPR 2016 submission
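The core idea of the abstract, a learned prior over pairwise spatial relations used to re-rank detections, can be sketched as follows. Everything here is an assumption for illustration: the discrete relation set, the counting-based prior, and the linear score blend are stand-ins, not SPARCNN's actual formulation.

```python
# Sketch: learn P(relation | class_a, class_b) from training boxes, then
# boost detections whose spatial configuration agrees with that prior.
RELATIONS = ("left_of", "right_of", "above", "below")

def spatial_relation(box_a, box_b):
    """Coarse relation of box_a to box_b, boxes as (x1, y1, x2, y2)."""
    axc, ayc = (box_a[0] + box_a[2]) / 2.0, (box_a[1] + box_a[3]) / 2.0
    bxc, byc = (box_b[0] + box_b[2]) / 2.0, (box_b[1] + box_b[3]) / 2.0
    if abs(axc - bxc) >= abs(ayc - byc):
        return "left_of" if axc < bxc else "right_of"
    return "above" if ayc < byc else "below"

def learn_spatial_prior(annotations):
    """annotations: per-image lists of (class_name, box) ground-truth pairs."""
    counts = {}
    for objects in annotations:
        for i, (ca, ba) in enumerate(objects):
            for j, (cb, bb) in enumerate(objects):
                if i == j:
                    continue
                key = (ca, cb)
                counts.setdefault(key, {r: 0 for r in RELATIONS})
                counts[key][spatial_relation(ba, bb)] += 1
    return {
        key: {r: c / float(sum(rc.values())) for r, c in rc.items()}
        for key, rc in counts.items()
    }

def rescore(detections, prior, weight=0.5):
    """Blend each detection's CNN score with its mean spatial agreement.

    detections: list of (class_name, box, cnn_score).
    """
    out = []
    for i, (ci, bi, si) in enumerate(detections):
        support = [
            prior.get((ci, cj), {}).get(spatial_relation(bi, bj), 0.0)
            for j, (cj, bj, _) in enumerate(detections) if i != j
        ]
        bonus = sum(support) / len(support) if support else 0.0
        out.append((ci, bi, (1 - weight) * si + weight * bonus))
    return out
```

With this toy rule, a detection whose relation to its neighbours matches the training statistics (e.g. sky above sea) gets its score raised, which is the intuition behind spatially informed region proposals.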
An iterative inference procedure applying conditional random fields for simultaneous classification of land cover and land use
Land cover and land use exhibit strong contextual dependencies. We propose a novel approach for the simultaneous classification of land cover and land use, where semantic and spatial context is considered. The image sites for land cover and land use classification form a hierarchy consisting of two layers: a land cover layer and a land use layer. We apply Conditional Random Fields (CRF) at both layers. The layers differ with respect to the image entities corresponding to the nodes, the employed features and the classes to be distinguished. In the land cover layer, the nodes represent super-pixels; in the land use layer, the nodes correspond to objects from a geospatial database. Both CRFs model spatial dependencies between neighbouring image sites. The complex semantic relations between land cover and land use are integrated into the classification process by using contextual features. We propose a new iterative inference procedure for the simultaneous classification of land cover and land use, in which the two classification tasks mutually influence each other. This helps to improve the classification accuracy for certain classes. The main idea of this approach is that semantic context helps to refine the class predictions, which, in turn, leads to more expressive context information. Thus, potentially wrong decisions can be reversed at later stages. The approach is designed for input data based on aerial images. Experiments are carried out on a test site to evaluate the performance of the proposed method. We show the effectiveness of the iterative inference procedure and demonstrate that a smaller size of the super-pixels has a positive influence on the classification result.
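The alternating refinement this abstract describes can be illustrated with a deliberately simplified loop (not a full CRF): two label assignments update each other, each adding a learned compatibility bonus for the other's current labels. The score structure, the compatibility table, and the weight are invented for the sketch.

```python
# Sketch: iterative mutual refinement of land-cover (per super-pixel) and
# land-use (per object) labels via a shared compatibility table.
def iterative_inference(cover_unaries, use_unary, compat, n_iter=3, w=0.3):
    """cover_unaries: {site: {cover_class: score}} per super-pixel.
    use_unary: {use_class: score} for the enclosing land-use object.
    compat: {(cover_class, use_class): bonus} (hypothetical, learned)."""
    # independent initial predictions
    cover = {s: max(u, key=u.get) for s, u in cover_unaries.items()}
    use = max(use_unary, key=use_unary.get)
    for _ in range(n_iter):
        # land-cover update: reward agreement with the current land-use label
        cover = {
            s: max(u, key=lambda c: u[c] + w * compat.get((c, use), 0.0))
            for s, u in cover_unaries.items()
        }
        # land-use update: reward agreement with the current cover labels,
        # so an initially wrong land-use decision can be reversed
        use = max(
            use_unary,
            key=lambda lu: use_unary[lu]
            + w * sum(compat.get((cover[s], lu), 0.0) for s in cover),
        )
    return cover, use
```

The point of the toy example is the reversal behaviour the abstract mentions: a land-use class that narrowly loses on its own evidence can win once the refined land-cover labels vote for it.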
Spatial and Topological Analysis of Urban Land Cover Structure in New Orleans Using Multispectral Aerial Image and Lidar Data
Urban land use and land cover (LULC) mapping has been one of the major applications in remote sensing of the urban environment. Land cover refers to the biophysical materials at the surface of the earth (i.e. grass, trees, soils, concrete, water), while land use indicates the socio-economic function of the land (i.e., residential, industrial, commercial land uses). This study addresses the technical issue of how to computationally infer urban land use types based on the urban land cover structures from remote sensing data. In this research, a multispectral aerial image and high-resolution LiDAR topographic data have been integrated to investigate the urban land cover and land use in New Orleans, Louisiana. First, the LiDAR data are used to solve the problems associated with solar shadows of trees and buildings, building lean and occlusions in the multispectral aerial image. A two-stage rule-based classification approach has been developed, and the urban land cover of New Orleans has been classified into six categories: water, grass, trees, impervious ground, elevated bridges, and buildings, with an overall classification accuracy of 94.2%, significantly higher than that of the traditional per-pixel classification method. The buildings are further classified by height into regular low-rise, multi-story, mid-rise, high-rise, and skyscrapers. Second, the land cover composition and structure in New Orleans have been quantitatively analyzed for the first time in terms of urban planning districts, and the information and knowledge about the characteristics of urban land cover components and structure for different types of land use functions have been discovered. Third, a graph-theoretic data model, known as the relational attribute neighborhood graph (RANG), is adopted to comprehensively represent geometrical and thematic attributes, compositional and structural properties, and spatial/topological relations between urban land cover patches (objects).
Based on the evaluation of the importance of 26 spatial, thematic and topological variables in RANG, the random forest classification method is utilized to computationally infer and classify the urban land use in New Orleans into 7 types at the urban block level: single-family residential, two-family residential, multi-family residential, commercial, CBD, institutional, and parks and open space, with an overall accuracy of 91.7%.
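The second stage of this pipeline, inferring block-level land use from land-cover structure with a random forest, can be sketched in a few lines. The three synthetic features below are invented stand-ins for the paper's 26 RANG variables, and the class set is reduced to three for brevity.

```python
# Sketch: block-level land-use inference from land-cover structure features
# with a random forest (scikit-learn). Features and thresholds are toy data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def make_block(land_use):
    """Synthetic per-block features: [building_cover, tree_cover, mean_height]."""
    if land_use == "single_family":
        return [rng.uniform(0.2, 0.4), rng.uniform(0.3, 0.5), rng.uniform(4, 8)]
    if land_use == "commercial":
        return [rng.uniform(0.5, 0.8), rng.uniform(0.0, 0.1), rng.uniform(6, 15)]
    return [rng.uniform(0.0, 0.1), rng.uniform(0.5, 0.9), rng.uniform(2, 6)]  # park

classes = ["single_family", "commercial", "park"]
X, y = [], []
for cls in classes:
    for _ in range(50):
        X.append(make_block(cls))
        y.append(cls)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# feature_importances_ ranks the input variables, mirroring the paper's
# evaluation of variable importance prior to classification
print(clf.feature_importances_)
```

A block with little building cover, heavy tree cover, and low mean height is then classified as park, which is the kind of structure-to-function inference the study automates.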
Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions
We aim for zero-shot localization and classification of human actions in
video. Where traditional approaches rely on global attribute or object
classification scores for their zero-shot knowledge transfer, our main
contribution is a spatial-aware object embedding. To arrive at spatial
awareness, we build our embedding on top of freely available actor and object
detectors. Relevance of objects is determined in a word embedding space and
further enforced with estimated spatial preferences. Besides local object
awareness, we also embed global object awareness into our embedding to maximize
actor and object interaction. Finally, we exploit the object positions and
sizes in the spatial-aware embedding to demonstrate a new spatio-temporal
action retrieval scenario with composite queries. Action localization and
classification experiments on four contemporary action video datasets support
our proposal. Apart from state-of-the-art results in the zero-shot localization
and classification settings, our spatial-aware embedding is even competitive
with recent supervised action localization alternatives. Comment: ICC
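The zero-shot scoring idea here, rank unseen actions by how semantically relevant the detected objects are and how close they sit to the actor, can be sketched with toy data. The tiny hand-made "embeddings", the detections, and the distance-based closeness term (a stand-in for the paper's estimated spatial preferences) are all invented for illustration.

```python
# Sketch: score an unseen action by word-embedding similarity between the
# action name and detected object names, weighted by actor-object proximity.
import math

EMBED = {  # toy 3-d "word embeddings"; real systems would use word2vec etc.
    "kicking": [0.9, 0.1, 0.0],
    "ball":    [0.8, 0.2, 0.1],
    "horse":   [0.0, 0.9, 0.2],
    "riding":  [0.1, 0.8, 0.3],
}

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def centre(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def action_score(action, actor_box, object_dets):
    """object_dets: list of (object_name, box, detector_score)."""
    ax, ay = centre(actor_box)
    score = 0.0
    for name, box, det in object_dets:
        ox, oy = centre(box)
        closeness = 1.0 / (1.0 + math.hypot(ax - ox, ay - oy))  # spatial term
        score += det * cos(EMBED[action], EMBED[name]) * closeness
    return score

actor = (40, 40, 60, 90)
dets = [("ball", (55, 80, 65, 90), 0.9), ("horse", (200, 40, 260, 120), 0.8)]
```

In this configuration "kicking" outscores "riding" because the semantically related object (the ball) is both relevant in embedding space and near the actor, which is the local spatial awareness the abstract builds on.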