610 research outputs found

    Multimodal learning from visual and remotely sensed data

    Autonomous vehicles are often deployed to perform exploration and monitoring missions in unseen environments. In such applications, there is often a compromise between the information richness and the acquisition cost of different sensor modalities. Visual data is usually very information-rich, but requires in-situ acquisition with the robot. In contrast, remotely sensed data has a larger range and footprint, and may be available prior to a mission. In order to effectively and efficiently explore and monitor the environment, it is critical to make use of all of the sensory information available to the robot. One important application is the use of an Autonomous Underwater Vehicle (AUV) to survey the ocean floor. AUVs can take high resolution in-situ photographs of the sea floor, which can be used to classify different regions into various habitat classes that summarise the observed physical and biological properties. This is known as benthic habitat mapping. However, since AUVs can only image a tiny fraction of the ocean floor, habitat mapping is usually performed with remotely sensed bathymetry (ocean depth) data, obtained from shipborne multibeam sonar. With the recent surge in unsupervised feature learning and deep learning techniques, a number of previous techniques have investigated the concept of multimodal learning: capturing the relationship between different sensor modalities in order to perform classification and other inference tasks. This thesis proposes related techniques for visual and remotely sensed data, applied to the task of autonomous exploration and monitoring with an AUV. Doing so enables more accurate classification of the benthic environment, and also assists autonomous survey planning. The first contribution of this thesis is to apply unsupervised feature learning techniques to marine data. 
The proposed techniques are used to extract features from image and bathymetric data separately, and the performance is compared with that of more traditionally used features for each sensor modality. The second contribution is the development of a multimodal learning architecture that captures the relationship between the two modalities. The model is robust to missing modalities, which means it can extract better features for large-scale benthic habitat mapping, where only bathymetry is available. The model is used to perform classification with various combinations of modalities, demonstrating that multimodal learning provides a large performance improvement over the baseline case. The third contribution is an extension of the standard learning architecture using a gated feature learning model, which enables the model to better capture the ‘one-to-many’ relationship between visual and bathymetric data. This opens up further inference capabilities, with the ability to predict visual features from bathymetric data, which allows image-based queries. Such queries are useful for AUV survey planning, especially when supervised labels are unavailable. The final contribution is the novel derivation of a number of information-theoretic measures to aid survey planning. The proposed measures predict the utility of unobserved areas, in terms of the amount of expected additional visual information. As such, they are able to produce utility maps over a large region that can be used by the AUV to determine the most informative locations from a set of candidate missions. The models proposed in this thesis are validated through extensive experiments on real marine data. Furthermore, the introduced techniques have applications in various other areas within robotics. As such, this thesis concludes with a discussion on the broader implications of these contributions, and the future research directions that arise as a result of this work.

    Spatial and Topological Analysis of Urban Land Cover Structure in New Orleans Using Multispectral Aerial Image and Lidar Data

    Urban land use and land cover (LULC) mapping has been one of the major applications in remote sensing of the urban environment. Land cover refers to the biophysical materials at the surface of the earth (e.g., grass, trees, soils, concrete, water), while land use indicates the socio-economic function of the land (e.g., residential, industrial, commercial land uses). This study addresses the technical issue of how to computationally infer urban land use types based on the urban land cover structures from remote sensing data. In this research, a multispectral aerial image and high-resolution LiDAR topographic data have been integrated to investigate the urban land cover and land use in New Orleans, Louisiana. First, the LiDAR data are used to solve the problems associated with solar shadows of trees and buildings, building lean and occlusions in the multispectral aerial image. A two-stage rule-based classification approach has been developed, and the urban land cover of New Orleans has been classified into six categories: water, grass, trees, impervious ground, elevated bridges, and buildings, with an overall classification accuracy of 94.2%, significantly higher than that of traditional per-pixel classification methods. The buildings are further classified into regular low-rise, multi-story, mid-rise, high-rise, and skyscrapers according to height. Second, the land cover composition and structure in New Orleans have been quantitatively analyzed for the first time in terms of urban planning districts, and the information and knowledge about the characteristics of urban land cover components and structure for different types of land use functions have been discovered. Third, a graph-theoretic data model, known as relational attribute neighborhood graph (RANG), is adopted to comprehensively represent geometrical and thematic attributes, compositional and structural properties, and spatial/topological relations between urban land cover patches (objects).
Based on the evaluation of the importance of 26 spatial, thematic and topological variables in RANG, the random forest classification method is utilized to computationally infer and classify the urban land use in New Orleans into seven types at the urban block level: single-family residential, two-family residential, multi-family residential, commercial, CBD, institutional, and parks and open space, with an overall accuracy of 91.7%.
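
The overall accuracy figures quoted above (94.2% and 91.7%) are conventionally computed from a confusion matrix as the share of correctly classified samples. A minimal sketch of that calculation follows; the function name `overall_accuracy` and the example matrix are hypothetical and are not taken from the study.

```python
def overall_accuracy(confusion):
    """Overall accuracy = correctly classified samples / all samples.

    `confusion` is a square list of lists where confusion[i][j] counts
    samples of reference class i that were assigned to class j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    # Correct classifications lie on the main diagonal.
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical two-class example (made-up counts, not the paper's data).
example = [[50, 2],
           [3, 45]]
print(overall_accuracy(example))
```

Overall accuracy hides per-class behaviour, so it is usually read alongside per-class measures when classes are imbalanced.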

    Hyperspectral Unmixing Overview: Geometrical, Statistical, and Sparse Regression-Based Approaches

    Imaging spectrometers measure electromagnetic energy scattered in their instantaneous field of view in hundreds or thousands of spectral channels with higher spectral resolution than multispectral cameras. Imaging spectrometers are therefore often referred to as hyperspectral cameras (HSCs). Higher spectral resolution enables material identification via spectroscopic analysis, which facilitates countless applications that require identifying materials in scenarios unsuitable for classical spectroscopic analysis. Due to low spatial resolution of HSCs, microscopic material mixing, and multiple scattering, spectra measured by HSCs are mixtures of spectra of materials in a scene. Thus, accurate estimation requires unmixing. Pixels are assumed to be mixtures of a few materials, called endmembers. Unmixing involves estimating all or some of: the number of endmembers, their spectral signatures, and their abundances at each pixel. Unmixing is a challenging, ill-posed inverse problem because of model inaccuracies, observation noise, environmental conditions, endmember variability, and data set size. Researchers have devised and investigated many models searching for robust, stable, tractable, and accurate unmixing algorithms. This paper presents an overview of unmixing methods from the time of Keshava and Mustard's unmixing tutorial [1] to the present. Mixing models are first discussed. Signal-subspace, geometrical, statistical, sparsity-based, and spatial-contextual unmixing algorithms are described. Mathematical problems and potential solutions are described. Algorithm characteristics are illustrated experimentally.
    Comment: This work has been accepted for publication in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
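
To make the idea of abundance estimation concrete, here is a minimal sketch of linear unmixing in the simplest possible setting: exactly two known endmembers, with abundances constrained to be non-negative and to sum to one. This is an illustrative toy case, not one of the algorithms surveyed in the paper; the function name `unmix_two_endmembers` and the spectra are hypothetical.

```python
def unmix_two_endmembers(pixel, e1, e2):
    """Least-squares abundances (a1, a2) for pixel ~ a1*e1 + a2*e2,
    subject to a1 + a2 = 1 and a1, a2 >= 0.

    With two endmembers the problem is one-dimensional: project the
    pixel onto the line through e2 and e1, then clip to [0, 1].
    """
    diff = [a - b for a, b in zip(e1, e2)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("endmembers must be distinct")
    a1 = sum((p - b) * d for p, b, d in zip(pixel, e2, diff)) / denom
    a1 = min(1.0, max(0.0, a1))  # enforce the abundance constraints
    return a1, 1.0 - a1


# Hypothetical three-band spectra (made-up reflectance values).
e1 = [0.1, 0.5, 0.9]
e2 = [0.8, 0.4, 0.2]
pixel = [0.59, 0.43, 0.41]  # built as 0.3*e1 + 0.7*e2
print(unmix_two_endmembers(pixel, e1, e2))
```

Real scenes involve more endmembers, unknown signatures, and noise, which is where the geometrical, statistical, and sparse regression approaches named in the abstract come in.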

    A Graph-based Approach for Higher Order Gis Topological Analysis

    Retrieving structured information from an initial random collection of objects may be carried out by understanding the spatial arrangement between them, assuming no prior knowledge about those objects. As far as topology is concerned, contemporary desktop GIS packages do not generally support further analysis beyond adjacency. Thus, one of the original motivations of this work was to develop new ideas for scene analysis by building up a graph-based technique for better interpretation and understanding of spatial relationships between GIS vector-based objects beyond their first level of adjacency; the final aim is to organize local features into a more meaningful global scene by using graph theory. As the example scenario, a LiDAR data set is being used to test the technique that we plan to develop and implement. After the generation of the respective TIN, two different binary classifications were applied to the TIN facets (based on two different slope thresholds), and the TIN facets were aggregated into homogeneous polygons according to their slope characteristics. Further work will apply a graph-based clustering procedure inside these polygonal regions by establishing a neighbourhood graph, followed by the delineation of cluster shapes and the derivation of cluster characteristics, in order to obtain information on higher-level geographic entities (sets of buildings, vegetation areas, and, say, land-use parcels). The results we expect to obtain might be useful to support land-use mapping, image understanding or, generally speaking, clustering analysis and generalization processes.
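
As an illustration of what topology "beyond the first level of adjacency" can mean on a neighbourhood graph, the sketch below groups objects by their adjacency order from a starting object using breadth-first search. It is a generic graph routine under assumed inputs, not the authors' system; the function name `neighbours_by_order` and the polygon identifiers are hypothetical.

```python
from collections import deque


def neighbours_by_order(adjacency, start, max_order):
    """Group nodes by adjacency order (graph distance) from `start`.

    `adjacency` maps each object id to the ids of objects it touches.
    Returns {order: set_of_ids} for orders 1..max_order.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_order:
            continue  # do not expand beyond the requested order
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    by_order = {}
    for node, order in distance.items():
        if order > 0:
            by_order.setdefault(order, set()).add(node)
    return by_order


# Hypothetical chain of four adjacent polygons: A - B - C - D.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(neighbours_by_order(adjacency, "A", 2))
```

Order 1 corresponds to the plain adjacency that desktop GIS packages already support; higher orders are the second-level and further neighbours.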

    Graph theory in higher order topological analysis of urban scenes

    Interpretation and analysis of spatial phenomena is a highly time-consuming and laborious task in several fields of the Geomatics world. That is why the automation of these tasks is especially needed in areas such as GISc. Carrying out those tasks in the context of an urban scene is particularly challenging given the complex spatial pattern of its elements. The aim of retrieving structured information from an initial unstructured data set translated into more meaningful homogeneous regions can be achieved by identifying meaningful structures within the initial collection of objects, and by understanding their topological relationships and spatial arrangement. This task is being accomplished by applying graph theory and by performing urban scene topology analysis. For this purpose, a graph-based system is being developed, and LiDAR data are currently being used as an example scenario. A particular emphasis is being given to the visualisation aspects of graph analysis, as visual inspections can often reveal patterns not discernible by current automated analysis techniques. This paper focuses primarily on the role of graph theory in the design of such a tool for the analysis of urban scene topology.
    http://www.sciencedirect.com/science/article/B6V9K-4P6MPBP-2/1/e1b4066db2881db3de31085d779a27c

    Urban Image Classification: Per-Pixel Classifiers, Sub-Pixel Analysis, Object-Based Image Analysis, and Geospatial Methods

    Remote sensing methods used to generate base maps to analyze the urban environment rely predominantly on digital sensor data from space-borne platforms. This is due in part to new sources of high spatial resolution data covering the globe, a variety of multispectral and multitemporal sources, sophisticated statistical and geospatial methods, and compatibility with GIS data sources and methods. The goal of this chapter is to review the four groups of classification methods for digital sensor data from space-borne platforms: per-pixel, sub-pixel, object-based (spatial-based), and geospatial methods. Per-pixel methods are widely used methods that classify pixels into distinct categories based solely on the spectral and ancillary information within that pixel. They range from simple calculations of environmental indices (e.g., NDVI) to sophisticated expert systems that assign urban land covers. Researchers recognize, however, that even with the smallest pixel size the spectral information within a pixel is really a combination of multiple urban surfaces. Sub-pixel classification methods therefore aim to statistically quantify the mixture of surfaces to improve overall classification accuracy. While within-pixel variations exist, there is also significant evidence that groups of nearby pixels have similar spectral information and therefore belong to the same classification category. Object-oriented methods have emerged that group pixels prior to classification based on spectral similarity and spatial proximity. Classification accuracy using object-based methods shows significant success and promise for numerous urban applications. Like the object-oriented methods that recognize the importance of spatial proximity, geospatial methods for urban mapping also utilize neighboring pixels in the classification process.
The primary difference, though, is that geostatistical methods (e.g., spatial autocorrelation methods) are utilized during both the pre- and post-classification steps. Within this chapter, each of the four approaches is described in terms of scale and accuracy in classifying urban land use and urban land cover, and for its range of urban applications. Figure 1 gives an overview of the four main classification groups, while Table 1 details the approaches with respect to classification requirements and procedures (e.g., reflectance conversion, steps before training sample selection, training samples, spatial approaches commonly used, classifiers, primary inputs for classification, output structures, number of output layers, and accuracy assessment). The chapter concludes with a brief summary of the methods reviewed and the challenges that remain in developing new classification methods for improving the efficiency and accuracy of mapping urban areas.
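
The NDVI mentioned above as an example of a simple per-pixel calculation is the normalized difference of near-infrared and red reflectance, (NIR - Red) / (NIR + Red). A minimal sketch follows; the function names, the zero-signal convention, and the sample reflectance values are assumptions made for illustration and do not come from the chapter.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for a single pixel."""
    total = nir + red
    if total == 0:
        # Assumption: report 0.0 for a pixel with no signal in either band.
        return 0.0
    return (nir - red) / total


def ndvi_image(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally sized 2-D bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 1 x 2 image: a vegetated pixel and a bare pixel.
print(ndvi_image([[0.4, 0.2]], [[0.1, 0.2]]))
```

Because each output value depends only on the bands of that one pixel, this is a per-pixel method in the sense used by the chapter; no neighbouring pixels are consulted.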

    An object-based convolutional neural network (OCNN) for urban land use classification

    Urban land use information is essential for a variety of urban-related applications such as urban planning and regional administration. The extraction of urban land use from very fine spatial resolution (VFSR) remotely sensed imagery has, therefore, drawn much attention in the remote sensing community. Nevertheless, classifying urban land use from VFSR images remains a challenging task, due to the extreme difficulties in differentiating complex spatial patterns to derive high-level semantic labels. Deep convolutional neural networks (CNNs) offer great potential to extract high-level spatial features, thanks to its hierarchical nature with multiple levels of abstraction. However, blurred object boundaries and geometric distortion, as well as huge computational redundancy, severely restrict the potential application of CNN for the classification of urban land use. In this paper, a novel object-based convolutional neural network (OCNN) is proposed for urban land use classification using VFSR images. Rather than pixel-wise convolutional processes, the OCNN relies on segmented objects as its functional units, and CNN networks are used to analyse and label objects such as to partition within-object and between-object variation. Two CNN networks with different model structures and window sizes are developed to predict linearly shaped objects (e.g. Highway, Canal) and general (other non-linearly shaped) objects. Then a rule-based decision fusion is performed to integrate the class-specific classification results. The effectiveness of the proposed OCNN method was tested on aerial photography of two large urban scenes in Southampton and Manchester in Great Britain. The OCNN combined with large and small window sizes achieved excellent classification accuracy and computational efficiency, consistently outperforming its sub-modules, as well as other benchmark comparators, including the pixel-wise CNN, contextual-based MRF and object-based OBIA-SVM methods. 
The proposed method provides the first object-based CNN framework to effectively and efficiently address the complicated problem of urban land use classification from VFSR images.

    Deep learning for land cover and land use classification

    Recent advances in sensor technologies have witnessed a vast amount of very fine spatial resolution (VFSR) remotely sensed imagery being collected on a daily basis. These VFSR images present fine spatial details that are spectrally and spatially complicated, thus posing huge challenges in automatic land cover (LC) and land use (LU) classification. Deep learning reignited the pursuit of artificial intelligence towards a general purpose machine to be able to perform any human-related tasks in an automated fashion. This is largely driven by the wave of excitement in deep machine learning to model the high-level abstractions through hierarchical feature representations without human-designed features or rules, which demonstrates great potential in identifying and characterising LC and LU patterns from VFSR imagery. In this thesis, a set of novel deep learning methods are developed for LC and LU image classification based on the deep convolutional neural networks (CNN) as an example. Several difficulties, however, are encountered when trying to apply the standard pixel-wise CNN for LC and LU classification using VFSR images, including geometric distortions, boundary uncertainties and huge computational redundancy. These technical challenges for LC classification were solved either using rule-based decision fusion or through uncertainty modelling using rough set theory. For land use, an object-based CNN method was proposed, in which each segmented object (a group of homogeneous pixels) was sampled and predicted by CNN with both within-object and between-object information. LU was, thus, classified with high accuracy and efficiency. Both LC and LU formulate a hierarchical ontology at the same geographical space, and such representations are modelled by their joint distribution, in which LC and LU are classified simultaneously through iteration. 
These deep learning techniques achieved by far the highest classification accuracy for both LC and LU, up to around 90%, about 5% higher than existing deep learning methods and 10% higher than traditional pixel-based and object-based approaches. This research made a significant contribution to LC and LU classification through deep learning based innovations, and has great potential utility in a wide range of geospatial applications.

    Dwelling on ontology - semantic reasoning over topographic maps

    The thesis builds upon the hypothesis that the spatial arrangement of topographic features, such as buildings, roads and other land cover parcels, indicates how land is used. The aim is to make this kind of high-level semantic information explicit within topographic data. There is an increasing need to share and use data for a wider range of purposes, and to make data more definitive, intelligent and accessible. Unfortunately, we still encounter a gap between low-level data representations and high-level concepts that typify human qualitative spatial reasoning. The thesis adopts an ontological approach to bridge this gap and to derive functional information by using standard reasoning mechanisms offered by logic-based knowledge representation formalisms. It formulates a framework for the processes involved in interpreting land use information from topographic maps. Land use is a high-level abstract concept, but it is also an observable fact intimately tied to geography. By decomposing this relationship, the thesis correlates a one-to-one mapping between high-level conceptualisations established from human knowledge and real world entities represented in the data. Based on a middle-out approach, it develops a conceptual model that incrementally links different levels of detail, and thereby derives coarser, more meaningful descriptions from more detailed ones. The thesis verifies its proposed ideas by implementing an ontology describing the land use ‘residential area’ in the ontology editor Protégé. By asserting knowledge about high-level concepts such as types of dwellings, urban blocks and residential districts as well as individuals that link directly to topographic features stored in the database, the reasoner successfully infers instances of the defined classes. 
Despite current technological limitations, ontologies are a promising way forward in the manner we handle and integrate geographic data, especially with respect to how humans conceptualise geographic space.