63 research outputs found

    Kernel Feature Extraction Methods for Remote Sensing Data Analysis

    Technological advances in recent decades have improved our ability to collect and store large volumes of data. In some fields, such as remote sensing, this creates data processing problems because of the particular characteristics of the data. High data volume, high dimensionality, heterogeneity, and nonlinearity can make the analysis of these images, and the extraction of relevant information from them, a bottleneck for many real applications. Research that applies image processing and machine learning techniques together with feature extraction reduces the dimensionality of the data while retaining as much information as possible. As a result, the development and application of feature extraction methodologies based on these techniques has grown sharply in remote sensing, improving data visualization and knowledge discovery. Several feature extraction methods have been proposed in the literature; depending on the availability of labeled data, they can be classified as supervised, semisupervised, or unsupervised. In particular, feature extraction can be used in combination with (nonlinear) kernel methods, which makes it easier to obtain a space that retains more of the information content. One of the most important properties of this combination is that it can be used directly for general tasks including classification, regression, clustering, ranking, compression, or data visualization. In this Thesis, we address the problems of different nonlinear feature extraction approaches based on kernel methods for remote sensing data analysis. Several improvements to current feature extraction methods are proposed to transform the data so that tasks on high-dimensional data, such as classification or biophysical parameter estimation, become easier. 
This Thesis focuses on three main objectives for improving current feature extraction methods. The first objective is to include invariances in supervised kernel feature extraction methods. Through these invariances it is possible to generate virtual samples that help mitigate the problem of the small number of labeled samples in supervised methods. The proposed algorithm is a simple method that essentially generates new (synthetic) training samples from the available labeled samples. Used together with the original samples in feature extraction methods, these virtual samples yield features that are more independent of one another than those obtained without virtual samples. Introducing prior knowledge through the virtual samples can make classification and biophysical parameter estimation methods more robust. The second objective is to use generative kernels, i.e. probabilistic kernels, that are learned directly from the original data by means of clustering techniques, finding local-to-global similarities along the manifold. The proposed kernel is useful for general feature extraction purposes. Furthermore, it attempts to improve on current methods because it contains not only labeled data information but also the unlabeled information of the manifold. Moreover, the proposed kernel is parameter-free, in contrast to parameterized functions such as the radial basis function (RBF). Probabilistic kernels are used to obtain new unsupervised and semisupervised methods that reduce the number and cost of labeled data in remote sensing. The third objective is to develop new kernel feature extraction methods that improve the features obtained by current methods. Optimizing the functional can lead to improved algorithms, for instance the Optimized Kernel Entropy Component Analysis (OKECA) method. 
The method is based on the Independent Component Analysis (ICA) framework and is more efficient than the standard Kernel Entropy Component Analysis (KECA) method in terms of dimensionality reduction. In this Thesis, the methods are focused on remote sensing data analysis. Nevertheless, feature extraction methods are used to analyze data in many research fields where the data are multidimensional. For this reason, the results are presented as a sequence of experiments. First, the projections are analyzed on toy examples. The algorithms are then tested on standard databases with supervised information, before the last step, the analysis of remote sensing images with the proposed methods.
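
The abstract above compares against standard KECA without describing it. As a rough illustration only, the following NumPy sketch shows the usual textbook form of KECA: kernel axes are ranked by their contribution to the Rényi entropy estimate (the sum of all kernel matrix entries) rather than by eigenvalue alone, as kernel PCA would do. The RBF kernel, the width `sigma`, and the toy two-cluster data are assumptions made for this example; the thesis's own OKECA optimization is not reproduced here.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def keca(X, n_components, sigma):
    """Project X onto the kernel axes that contribute most to the entropy estimate."""
    K = rbf_kernel(X, X, sigma)                  # no centering, unlike kernel PCA
    eigvals, eigvecs = np.linalg.eigh(K)         # K is symmetric
    eigvals = np.clip(eigvals, 0.0, None)        # guard against tiny negative values
    # Each eigenpair contributes lambda_i * (1^T e_i)^2 to the sum 1^T K 1.
    contrib = eigvals * eigvecs.sum(axis=0) ** 2
    order = np.argsort(contrib)[::-1][:n_components]
    Z = eigvecs[:, order] * np.sqrt(eigvals[order])
    return Z, contrib, K

# Toy data (assumed for illustration): two well-separated clusters in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 5)), rng.normal(2.0, 0.3, (30, 5))])
Z, contrib, K = keca(X, n_components=2, sigma=1.0)
print(Z.shape)
```

The contributions in `contrib` add up to the sum of all entries of `K`, which is what makes them a decomposition of the entropy estimate.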

    Multitask Learning of Vegetation Biochemistry from Hyperspectral Data

    Statistical models have been successful in accurately estimating the biochemical contents of vegetation from reflectance spectra. However, their performance deteriorates when too little ground truth data is available to model the complex nonlinear relationship between the spectrum and the biochemical quantity. We propose a novel Gaussian process based multitask learning method that improves the prediction of one biochemical by transferring knowledge from models learned for predicting related biochemicals. This method is most advantageous when there are few ground truth data for the biochemical of interest, but plenty of ground truth data for related biochemicals. The proposed multitask Gaussian process hypothesizes that the inter-relationship between the biochemical quantities is better modeled by using a combination of two or more covariance functions and inter-task correlation matrices. In the experiments, our method outperformed the current methods on two real-world datasets.
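
One common way to combine several covariance functions with inter-task matrices is a sum of products, with one task matrix per input kernel. The sketch below builds such a covariance and computes the posterior mean for a sparsely labeled task. It is an illustration under stated assumptions, not the paper's model: the RBF kernels, the task matrices, the length scales, the noise level, and the toy data are all invented for the example.

```python
import numpy as np

def rbf(X1, X2, length_scale):
    """RBF covariance between the rows of X1 and X2."""
    sq = (X1**2).sum(1)[:, None] + (X2**2).sum(1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def multitask_cov(X1, t1, X2, t2, task_mats, length_scales):
    """Sum over q of B_q[task, task'] * k_q(x, x')."""
    K = np.zeros((len(X1), len(X2)))
    for B, ls in zip(task_mats, length_scales):
        K += B[np.ix_(t1, t2)] * rbf(X1, X2, ls)
    return K

def predict_mean(X, t, y, Xs, ts, task_mats, length_scales, noise=1e-2):
    """Zero-mean GP posterior mean at inputs Xs for the tasks listed in ts."""
    K = multitask_cov(X, t, X, t, task_mats, length_scales) + noise * np.eye(len(X))
    Ks = multitask_cov(Xs, ts, X, t, task_mats, length_scales)
    return Ks @ np.linalg.solve(K, y)

# Toy setting (assumed): task 0 has 40 labels, the task of interest (1) only 4.
rng = np.random.default_rng(0)
X0, X1 = rng.uniform(0, 1, (40, 1)), rng.uniform(0, 1, (4, 1))
signal = lambda x: np.sin(6.0 * x[:, 0])
X = np.vstack([X0, X1])
t = np.array([0] * 40 + [1] * 4)
y = np.concatenate([signal(X0), 0.8 * signal(X1)])

# Hypothetical positive semidefinite task matrices: one strongly shared, one task-specific.
task_mats = [np.array([[1.0, 0.9], [0.9, 1.0]]), 0.1 * np.eye(2)]
length_scales = [0.2, 1.0]

Xs = np.linspace(0, 1, 20)[:, None]
ts = np.ones(20, dtype=int)          # predict for the sparsely labeled task
mean = predict_mean(X, t, y, Xs, ts, task_mats, length_scales)
K_train = multitask_cov(X, t, X, t, task_mats, length_scales)
print(mean.shape)
```

The off-diagonal entries of each task matrix control how much the labels of the well-sampled task inform predictions for the sparse one. In practice these entries would be learned along with the kernel hyperparameters rather than fixed by hand.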

    Hyperspectral Remote Sensing Data Analysis and Future Challenges

    Advances in Hyperspectral Image Classification Methods for Vegetation and Agricultural Cropland Studies

    Hyperspectral data are becoming more widely available via sensors on airborne and unmanned aerial vehicle (UAV) platforms, as well as proximal platforms. While space-based hyperspectral data continue to be limited in availability, multiple spaceborne Earth-observing missions on traditional platforms are scheduled for launch, and companies are experimenting with small satellites for constellations to observe the Earth, as well as for planetary missions. Land cover mapping via classification is one of the most important applications of hyperspectral remote sensing and will increase in significance as time series of imagery are more readily available. However, while the narrow bands of hyperspectral data provide new opportunities for chemistry-based modeling and mapping, challenges remain. Hyperspectral data are high dimensional, and many bands are highly correlated or irrelevant for a given classification problem. For supervised classification methods, the quantity of training data is typically limited relative to the dimension of the input space. The resulting Hughes phenomenon, often referred to as the curse of dimensionality, increases potential for unstable parameter estimates, overfitting, and poor generalization of classifiers. This is particularly problematic for parametric approaches such as Gaussian maximum likelihood-based classifiers that have been the backbone of pixel-based multispectral classification methods. This issue has motivated investigation of alternatives, including regularization of the class covariance matrices, ensembles of weak classifiers, development of feature selection and extraction methods, adoption of nonparametric classifiers, and exploration of methods to exploit unlabeled samples via semi-supervised and active learning. Data sets are also quite large, motivating computationally efficient algorithms and implementations. 
This chapter provides an overview of the recent advances in classification methods for mapping vegetation using hyperspectral data. Three data sets that are used in the hyperspectral classification literature (e.g., Botswana Hyperion satellite data and AVIRIS airborne data over both Kennedy Space Center and Indian Pines) are described in Section 3.2 and used to illustrate methods described in the chapter. An additional high-resolution hyperspectral data set acquired by a SpecTIR sensor on an airborne platform over the Indian Pines area is included to exemplify the use of new deep learning approaches, and a multiplatform example of airborne hyperspectral data is provided to demonstrate transfer learning in hyperspectral image classification. Classical approaches for supervised and unsupervised feature selection and extraction are reviewed in Section 3.3. In particular, nonlinearities exhibited in hyperspectral imagery have motivated development of nonlinear feature extraction methods in manifold learning, which are outlined in Section 3.3.1.4. Spatial context is also important in classification of both natural vegetation with complex textural patterns and large agricultural fields with significant local variability within fields. Approaches to exploit spatial features at both the pixel level (e.g., co-occurrence-based texture and extended morphological attribute profiles [EMAPs]) and integration of segmentation approaches (e.g., HSeg) are discussed in this context in Section 3.3.2. Recently, classification methods that leverage nonparametric methods originating in the machine learning community have grown in popularity. An overview of both widely used and newly emerging approaches, including support vector machines (SVMs), Gaussian mixture models, and deep learning based on convolutional neural networks is provided in Section 3.4. 
Strategies to exploit unlabeled samples, including active learning and metric learning, which combine feature extraction and augmentation of the pool of training samples in an active learning framework, are outlined in Section 3.5. Integration of image segmentation with classification to accommodate spatial coherence typically observed in vegetation is also explored, including as an integrated active learning system. Exploitation of multisensor strategies for augmenting the pool of training samples is investigated via a transfer learning framework in Section 3.5.1.2. Finally, we look to the future, considering opportunities soon to be provided by new paradigms, as hyperspectral sensing is becoming common at multiple scales from ground-based and airborne autonomous vehicles to manned aircraft and space-based platforms

    Machine Learning for Robust Understanding of Scene Materials in Hyperspectral Images

    The major challenges in hyperspectral (HS) imaging and data analysis are expensive sensors, high dimensionality of the signal, limited ground truth, and spectral variability. This dissertation develops and analyzes machine learning based methods to address these problems. In the first part, we examine one of the most important HS data analysis tasks: vegetation parameter estimation. We present two Gaussian process based approaches for improving the accuracy of vegetation parameter retrieval when ground truth is limited and/or spectral variability is high. The first is the adoption of covariance functions based on well-established metrics, such as spectral angle and spectral correlation, which are known to be better measures of similarity for spectral data. The second is the joint modeling of related vegetation parameters by multitask Gaussian processes so that the prediction accuracy of the vegetation parameter of interest can be improved with the aid of related vegetation parameters for which a larger set of ground truth is available. The efficacy of the proposed methods is demonstrated by comparing them against state-of-the-art approaches on three real-world HS datasets and one synthetic dataset. In the second part, we demonstrate how Bayesian optimization can be applied to jointly tune the different components of hyperspectral data analysis frameworks for better performance. Experimental validation on the spatial-spectral classification framework consisting of a classifier and a Markov random field is provided. In the third part, we investigate whether high dimensional HS spectra can be reconstructed from low dimensional multispectral (MS) signals, which can be obtained from much cheaper, lower spectral resolution sensors. 
A novel end-to-end convolutional residual neural network architecture is proposed that can simultaneously optimize both the MS bands and the transformation to reconstruct HS spectra from MS signals by analyzing a large quantity of HS data. The learned bands can be implemented in sensor hardware and the learned transformation can be incorporated in the data processing pipeline to build a low-cost hyperspectral data collection system. Using a diverse set of real-world datasets, we show how the proposed approach of optimizing the MS bands along with the transformation, rather than optimizing only the transformation with fixed bands as proposed by previous studies, can drastically increase the reconstruction accuracy. Additionally, we investigate the prospects of using reconstructed HS spectra for land cover classification.
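
The spectral angle mentioned in the first part of this abstract is easy to state concretely. The short sketch below computes it and shows why it can be a better similarity measure than Euclidean distance for spectra: it ignores a uniform brightness change. The six-band reflectance values are invented for the example, and the way the dissertation turns the angle into a covariance function is not specified in the abstract, so that step is left out.

```python
import numpy as np

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))   # clip guards against rounding error

# Made-up reflectance values over six bands (illustration only).
spectrum = np.array([0.05, 0.08, 0.04, 0.45, 0.50, 0.30])
brighter = 1.7 * spectrum                        # same shape, stronger illumination
other = np.array([0.20, 0.22, 0.25, 0.27, 0.30, 0.32])

angle_same_shape = spectral_angle(spectrum, brighter)
angle_other = spectral_angle(spectrum, other)
euclid_same_shape = np.linalg.norm(spectrum - brighter)
print(angle_same_shape, angle_other, euclid_same_shape)
```

The scaled copy has an angle of essentially zero to the original even though its Euclidean distance is clearly nonzero, while the differently shaped spectrum has a much larger angle.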

    Graph Embedding via High Dimensional Model Representation for Hyperspectral Images

    Learning the manifold structure of remote sensing images is of paramount relevance for modeling and understanding processes, as well as for encapsulating the high dimensionality in a reduced set of informative features for subsequent classification, regression, or unmixing. Manifold learning methods have shown excellent performance in hyperspectral image (HSI) analysis but, unless specifically designed, they cannot provide an explicit embedding map readily applicable to out-of-sample data. A common assumption to deal with the problem is that the transformation between the high-dimensional input space and the (typically low) latent space is linear. This is a particularly strong assumption, especially when dealing with hyperspectral images due to the well-known nonlinear nature of the data. To address this problem, a manifold learning method based on High Dimensional Model Representation (HDMR) is proposed, which provides a nonlinear embedding function for projecting out-of-sample data into the latent space. The proposed method is compared to manifold learning methods along with its linear counterparts and achieves promising performance in terms of classification accuracy of a representative set of hyperspectral images. Comment: This is an accepted version of work to be published in the IEEE Transactions on Geoscience and Remote Sensing. 11 page

    The EnMAP Managed Vegetation Scientific Processor

    After years of scientific and technical preparation, the orbital phase of an unmanned German space mission is expected to begin at the end of 2020. The Environmental Mapping and Analysis Program (EnMAP) will carry a hyperspectral sensor for mapping land surfaces on board the satellite of the same name. 
Scientists in environmental disciplines concerned with monitoring ecosystems, agricultural, forestry and urban areas, coastal and inland waters, and geology and soils prepared for the upcoming data ahead of the launch. Although a variety of useful algorithms for the analysis of spectral data already exists, new challenges will arise given the uniqueness of the EnMAP mission in the global context of remote sensing: coverage of the full optical spectrum (420 nm – 2450 nm) in combination with a moderate spatial resolution of 30 m and a high signal-to-noise ratio of at least 180 in the shortwave infrared and above 400 in the visible spectrum. This enables an imaging quality that to date has only been reached by airborne systems. The work in this dissertation comprises activities in the scientific preparation phase for agro-geographical questions. Algorithms and tools for the analysis of hyperspectral data are provided free of charge in the QGIS plugin EnMAP-Box 3. Urgent questions in the agricultural sector revolve around the derivation of biochemical and biophysical parameters from remote sensing data. For this reason, the overarching objective of this doctoral project is the development of a scientific EnMAP tool for managed vegetation areas (EnMAP Managed Vegetation Scientific Processor). First, an extensive field campaign was planned and carried out from April 2014. In addition to spectral observations of leaves, canopies and soils in a winter wheat and a maize field, relevant plant parameters were measured at exactly the same positions. These include the Leaf Area Index (LAI), leaf chlorophyll content (Ccab), leaf water content (EWT or Cw), relative dry leaf weight (LMA or Cm), Average Leaf Inclination Angle (ALIA), and secondary parameters such as canopy height, phenological stage and the solar vector. 
Spectral measurements were taken from different observation angles to match the sensing geometry of the future EnMAP satellite, which can be tilted by up to 30° orthogonal to its direction of flight. A common procedure for deriving the relevant crop parameters is to use the radiative transfer model PROSAIL, which simulates the spectral signal of a vegetated surface based on biophysical and biochemical input parameters. If this process is inverted, these parameters can be derived from measured spectral data. To do so, a Look-Up-Table (LUT) of PROSAIL model runs was built and compared against the spectra from the field campaigns. With this approach of LUT inversion from different observation angles, an accuracy of 18 % was achieved for LAI and 20 % for Ccab. Strong anisotropic effects, i.e. a dependence of reflectance on illumination and viewing direction, were identified for winter wheat, mainly in the early stages of plant development. In a subsequent study on the uncertainties of the spectral model, PROSAIL results obtained with in situ measured crop parameters as input were compared with the associated measured reflectance spectra. Strong deviations between measured and modelled spectra were observed in some cases, which for winter wheat followed a seasonal pattern. The model tended to overestimate reflectance in the near infrared at early phenological stages and to underestimate it towards the end of the growing period. The parametrization of the model was identified as a source of uncertainty if the ALIA parameter is interpreted as the true physical leaf inclination. It was concluded that separating LAI and ALIA in the inversion of PROSAIL hinders a correct estimation of the less sensitive parameters. The development of the vegetation processor required the use of Machine Learning Regression Algorithms (MLRA), since distributing large LUTs to users would be impracticable. 
The MLRAs were trained on synthetic datasets, with the initial focus on optimizing their hyperparameters before the algorithms were applied to real spectral data. Meaningful results could only be obtained once artificial noise had been added to the training data, because the algorithms otherwise overfitted to the model environment. With the processor, LAI, ALIA, Ccab and Cw could finally be derived from hyperspectral data. Artificial neural networks serve as black-box models that can process large amounts of data in a short time and thus make a decisive contribution to modern applied remote sensing for a broad user community.
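
The LUT inversion described in this abstract can be sketched in a few lines: simulate many spectra with known parameters, then look up the entries that best match a measured spectrum. The forward model below is a hypothetical toy stand-in with two parameters, not PROSAIL, and the band range, parameter ranges, table size, and the choice to average the ten best matches are assumptions made for this illustration.

```python
import numpy as np

WAVELENGTHS = np.arange(400.0, 1001.0, 10.0)   # nm, assumed band centers

def toy_forward_model(lai, cab):
    """Hypothetical stand-in for a radiative transfer model; this is NOT PROSAIL."""
    lai = np.atleast_1d(lai)[:, None]
    cab = np.atleast_1d(cab)[:, None]
    red_edge = 1.0 / (1.0 + np.exp(-(WAVELENGTHS - 720.0) / 15.0))
    nir_plateau = 0.5 * (1.0 - np.exp(-0.6 * lai))          # saturates with LAI
    red_absorption = 0.04 * (cab / 80.0) * np.exp(-((WAVELENGTHS - 670.0) / 40.0) ** 2)
    return 0.05 + nir_plateau * red_edge - red_absorption

def build_lut(n_entries, rng):
    """Draw random parameter sets and simulate one spectrum per entry."""
    lai = rng.uniform(0.0, 7.0, n_entries)
    cab = rng.uniform(10.0, 80.0, n_entries)
    return np.column_stack([lai, cab]), toy_forward_model(lai, cab)

def invert_lut(measured, lut_params, lut_spectra, n_best=10):
    """Average the parameters of the n_best LUT spectra closest to the measurement."""
    rmse = np.sqrt(((lut_spectra - measured) ** 2).mean(axis=1))
    best = np.argsort(rmse)[:n_best]
    return lut_params[best].mean(axis=0)

rng = np.random.default_rng(0)
lut_params, lut_spectra = build_lut(5000, rng)
measured = toy_forward_model(3.0, 40.0)[0]      # pretend field spectrum, LAI 3, Cab 40
lai_est, cab_est = invert_lut(measured, lut_params, lut_spectra)
print(lai_est, cab_est)
```

Shipping a trained regression model instead of the table itself, as the abstract describes, would replace `invert_lut` with a model fitted on `lut_spectra` and `lut_params`, typically with noise added to the simulated spectra.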