Learning the Structure of High-Dimensional Manifolds with Self-Organizing Maps for Accurate Information Extraction
This paper was submitted by the author prior to the final official version. For the official version, please see http://hdl.handle.net/1911/70515
This work aims to improve the capability of accurate information extraction from high-dimensional data, with a specific neural learning paradigm, the Self-Organizing Map (SOM). The SOM is an unsupervised learning algorithm that can faithfully sense the manifold structure and support supervised learning of relevant information from the data. Yet open problems regarding SOM learning exist. We focus on the following two issues. 1. Evaluation of topology preservation. Topology preservation is essential for SOMs in faithful representation of manifold structure. However, in reality, topology violations are not unusual, especially when the data have complicated structure. Measures capable of accurately quantifying and informatively expressing topology violations are lacking. One contribution of this work is a new measure, the Weighted Differential Topographic Function (WDTF), which differentiates an existing measure, the Topographic Function (TF), and incorporates detailed data distribution as an importance weighting of violations to distinguish severe violations from insignificant ones. Another contribution is an interactive visual tool, TopoView, which facilitates the visual inspection of violations on the SOM lattice. We show the effectiveness of the combined use of the WDTF and TopoView through a simple two-dimensional data set and two hyperspectral images. 2. Learning multiple latent variables from high-dimensional data. We use an existing two-layer SOM-hybrid supervised architecture, which captures the manifold structure in its SOM hidden layer and then uses its output layer to perform the supervised learning of latent variables. In the customary way, the output layer only uses the strongest output of the SOM neurons. This severely limits the learning capability.
We allow multiple, k, strongest responses of the SOM neurons for the supervised learning. Moreover, the fact that different latent variables can be best learned with different values of k motivates a new neural architecture, the Conjoined Twins, which extends the existing architecture with additional copies of the output layer, for preferential use of different values of k in the learning of different latent variables. We also automate the customization of k for different variables with the statistics derived from the SOM. The Conjoined Twins shows its effectiveness in the inference of two physical parameters from Near-Infrared spectra of planetary ices.
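As a rough illustration of using the k strongest SOM responses rather than only the single winner, the following minimal sketch selects the k prototypes nearest to an input. It assumes that a "strong response" means a small Euclidean distance between the input and a neuron's prototype vector, which the abstract does not state; the function name `k_strongest_units` and the toy prototypes are hypothetical.

```python
import numpy as np

def k_strongest_units(prototypes, x, k):
    """Return indices of the k prototypes closest to x, nearest first.

    Assumption: response strength is taken as inverse Euclidean distance.
    """
    distances = np.linalg.norm(prototypes - x, axis=1)
    return np.argsort(distances, kind="stable")[:k]

# Hypothetical 4-neuron SOM with 2-dimensional prototype vectors.
prototypes = np.array([[0.0, 0.0],
                       [1.0, 0.0],
                       [0.0, 1.0],
                       [1.0, 1.0]])
x = np.array([0.9, 0.2])

# k = 1 is the customary single-winner case; k = 2 adds the runner-up.
winners = k_strongest_units(prototypes, x, k=2)
print(winners)
```

A supervised output layer could then be fed the responses of these k units instead of only the best-matching one; how those responses are weighted is left open here.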
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.
Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the text
Machine learning approaches to star-galaxy classification
Accurate star-galaxy classification has many important applications in modern precision cosmology. However, a vast number of faint sources that are detected in the current and next-generation ground-based surveys may be challenged by poor star-galaxy classification. Thus, we explore a variety of machine learning approaches to improve star-galaxy classification in ground-based photometric surveys. In Chapter 2, we present a meta-classification framework that combines existing star-galaxy classifiers, and demonstrate that our Bayesian combination technique improves the overall performance over any individual classification method. In Chapter 3, we show that a deep learning algorithm called convolutional neural networks is able to produce accurate and well-calibrated classifications by learning directly from the pixel values of photometric images. In Chapter 4, we study another deep learning technique called generative adversarial networks in a semi-supervised setting, and demonstrate that our semi-supervised method produces competitive classifications using only a small number of labeled examples.
Hydrocarbon quantification using neural networks and deep learning based hyperspectral unmixing
Hydrocarbon (HC) spills are a global issue that can seriously impact human life and the environment; early identification and early remedial measures are therefore important. Thus, current research efforts aim at remotely quantifying incipient quantities of HC mixed with soils. The increased spectral and spatial resolution of hyperspectral sensors has opened ground-breaking perspectives in many industries, including remote inspection of large areas and the environment. The use of subpixel detection algorithms, and in particular the use of mixture models, has been identified as a future advance that needs to be incorporated in remote sensing. However, there are some challenging tasks, since the spectral signatures of the targets of interest may not be immediately available. Moreover, real-time processing and analysis is required to support fast decision-making. Progressing in this direction, this thesis pioneers and researches novel methodologies for HC quantification capable of exceeding the limitations of existing systems in terms of reduced cost and processing time with improved accuracy. The goal of this research is therefore to develop, implement and test different methods for improving HC detection and quantification using spectral unmixing and machine learning. An efficient hybrid switch method employing neural networks and hyperspectral unmixing is proposed and investigated. This robust method switches between state-of-the-art linear and nonlinear hyperspectral unmixing models. This procedure is well suited for the quantification of small quantities of substances within a pixel with high accuracy, as the most appropriate model is employed. Central to the proposed approach is a novel method for extracting parameters to characterise the non-linearity of the data. These parameters are fed into a feedforward neural network, which decides pixel by pixel which model is more suitable.
The quantification process is fully automated by applying further classification techniques to the acquired hyperspectral images. A deep learning neural network model is designed for the quantification of HC quantities mixed with soils. A three-term backpropagation algorithm with dropout is proposed to avoid overfitting and reduce the computational complexity of the model.
The above methods have been evaluated using classical repository datasets from the literature and a laboratory-controlled dataset. For that, an experimental procedure has been designed to produce a labelled dataset. The data was obtained by mixing and homogenizing different soil types with HC substances and measuring the reflectance with a hyperspectral sensor.
Findings from the research study reveal that the two proposed models have high performance, are suitable for the detection and quantification of HC mixed with soils, and surpass existing methods. Improvements in sensitivity, accuracy and computational time are achieved. Thus, the proposed approaches can be used to detect HC spills at an early stage in order to mitigate significant pollution from the spill areas.
Hybrid spectral unmixing : using artificial neural networks for linear/non-linear switching
Spectral unmixing is a key process in identifying the spectral signatures of materials and quantifying their spatial distribution over an image. The linear model is expected to provide acceptable results when two assumptions are satisfied: (1) the mixing process occurs at the macroscopic level and (2) photons interact with a single material before reaching the sensor. However, these assumptions do not always hold, and more complex nonlinear models are required. This study proposes a new hybrid method for switching between linear and nonlinear spectral unmixing of hyperspectral data based on artificial neural networks. The neural network was trained with parameters within a window of the pixel under consideration. These parameters are computed to represent the diversity of the neighboring pixels and are based on the Spectral Angular Distance, Covariance and a nonlinearity parameter. The endmembers were extracted using Vertex Component Analysis, while the abundances were estimated using the method identified by the neural network (Vertex Component Analysis, Fully Constrained Least Squares Method, Polynomial Post Nonlinear Mixing Model or Generalized Bilinear Model). Results show that the hybrid method performs better than each of the individual techniques with high overall accuracy, while the abundance estimation error is significantly lower than that obtained using the individual methods. Experiments on both a synthetic dataset and real hyperspectral images demonstrated that the proposed hybrid switch method is efficient for solving spectral unmixing of hyperspectral images as compared to individual algorithms.
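To make the linear mixing model and the spectral angle concrete, here is a minimal sketch under stated assumptions. The endmember spectra and abundances are invented toy values, and the abundance estimate uses plain unconstrained least squares, which is a simpler stand-in and not the Fully Constrained Least Squares method named in the abstract (it enforces neither non-negativity nor sum-to-one).

```python
import numpy as np

def spectral_angle(a, b):
    """Angle in radians between two spectra; 0 means identical shape."""
    cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cosine, -1.0, 1.0)))

# Hypothetical endmember matrix: 3 bands (rows) x 2 materials (columns).
endmembers = np.array([[0.1, 0.8],
                       [0.4, 0.6],
                       [0.9, 0.2]])
true_abundances = np.array([0.3, 0.7])

# Linear mixing model: the pixel is an abundance-weighted sum of endmembers.
pixel = endmembers @ true_abundances

# Unconstrained least-squares abundance estimate (stand-in, not FCLS).
estimated, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)

# Spectral angle between the mixed pixel and each endmember.
angles = [spectral_angle(pixel, endmembers[:, j]) for j in range(2)]
print(estimated, angles)
```

In this noise-free toy case the least-squares estimate recovers the abundances used to build the pixel, and the pixel lies at a smaller angle to the endmember with the larger abundance.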
Adaptive Similarity Measures for Material Identification in Hyperspectral Imagery
Remotely-sensed hyperspectral imagery has become one of the most advanced tools for analyzing the processes that shape the Earth and other planets. Effective, rapid analysis of high-volume, high-dimensional hyperspectral image data sets demands efficient, automated techniques to identify signatures of known materials in such imagery. In this thesis, we develop a framework for automatic material identification in hyperspectral imagery using adaptive similarity measures. We frame the material identification problem as a multiclass similarity-based classification problem, where our goal is to predict material labels for unlabeled target spectra based upon their similarities to source spectra with known material labels. As differences in capture conditions affect the spectral representations of materials, we divide the material identification problem into intra-domain (i.e., source and target spectra captured under identical conditions) and inter-domain (i.e., source and target spectra captured under different conditions) settings.
The first component of this thesis develops adaptive similarity measures for intra-domain settings that measure the relevance of spectral features to the given classification task using small amounts of labeled data. We propose a technique based on multiclass Linear Discriminant Analysis (LDA) that combines several distinct similarity measures into a single hybrid measure capturing the strengths of each of the individual measures. We also provide a comparative survey of techniques for low-rank Mahalanobis metric learning, and demonstrate that regularized LDA yields competitive results to the state-of-the-art, at substantially lower computational cost.
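For readers unfamiliar with low-rank Mahalanobis metrics, the sketch below shows the general form only: a distance computed after projecting spectra through a matrix L with fewer rows than bands, which is equivalent to using M = LᵀL in the quadratic form. The matrix L and the spectra are hypothetical toy values; how L is learned (for example by regularized LDA) is not shown and is not taken from the thesis.

```python
import numpy as np

def low_rank_mahalanobis(L, x, y):
    """Distance sqrt((x - y)^T L^T L (x - y)), computed in the projected space."""
    return float(np.linalg.norm(L @ (x - y)))

# Hypothetical rank-2 projection of 3-band spectra.
L = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
x = np.array([1.0, 1.0, 1.0])
y = np.array([0.0, 0.0, 5.0])

distance = low_rank_mahalanobis(L, x, y)

# Same value via the explicit quadratic form with M = L^T L.
M = L.T @ L
diff = x - y
distance_quadratic = float(np.sqrt(diff @ M @ diff))
print(distance, distance_quadratic)
```

Because L here ignores the third band, the large difference in that band does not contribute to the distance, which illustrates how a learned projection can down-weight features that are irrelevant to the classification task.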
The second component of this thesis shifts the focus to inter-domain settings, and proposes a multiclass domain adaptation framework that reconciles systematic differences between spectra captured under similar, but not identical, conditions. Our framework computes a similarity-based mapping that captures structured, relative relationships between classes shared between source and target domains, allowing us to apply a classifier trained using labeled source spectra to classify target spectra. We demonstrate improved domain adaptation accuracy in comparison to recently-proposed multitask learning and manifold alignment techniques in several case studies involving state-of-the-art synthetic and real-world hyperspectral imagery.
The Challenge of Machine Learning in Space Weather Nowcasting and Forecasting
The numerous recent breakthroughs in machine learning (ML) make it imperative to
carefully ponder how the scientific community can benefit from a technology
that, although not necessarily new, is today living its golden age. This Grand
Challenge review paper is focused on the present and future role of machine
learning in space weather. The purpose is twofold. On one hand, we will discuss
previous works that use ML for space weather forecasting, focusing in
particular on the few areas that have seen most activity: the forecasting of
geomagnetic indices, of relativistic electrons at geosynchronous orbits, of
solar flare occurrence, of coronal mass ejection propagation time, and of
solar wind speed. On the other hand, this paper serves as a gentle introduction
to the field of machine learning tailored to the space weather community and as
a pointer to a number of open challenges that we believe the community should
undertake in the next decade. The recurring themes throughout the review are
the need to shift our forecasting paradigm to a probabilistic approach focused
on the reliable assessment of uncertainties, and the combination of
physics-based and machine learning approaches, known as gray-box.
Comment: under review
Earth Observation Open Science and Innovation
geospatial analytics; social observatory; big earth data; open data; citizen science; open innovation; earth system science; crowdsourced geospatial data; science in society; data science