124 research outputs found

    Parsimonious Mahalanobis Kernel for the Classification of High Dimensional Data

    The classification of high dimensional data with kernel methods is considered in this article. Exploiting the emptiness property of high dimensional spaces, a kernel based on the Mahalanobis distance is proposed. The computation of the Mahalanobis distance requires the inversion of a covariance matrix. In high dimensional spaces, the estimated covariance matrix is ill-conditioned and its inversion is unstable or impossible. Using a parsimonious statistical model, namely the High Dimensional Discriminant Analysis model, the specific signal and noise subspaces are estimated for each considered class, making the inverse of the class-specific covariance matrix explicit and stable and leading to the definition of a parsimonious Mahalanobis kernel. An SVM-based framework is used for selecting the hyperparameters of the parsimonious Mahalanobis kernel by optimizing the so-called radius-margin bound. Experimental results on three high dimensional data sets show that the proposed kernel is suitable for classifying high dimensional data, providing better classification accuracies than the conventional Gaussian kernel.
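
The idea of an explicit, stable inverse covariance can be illustrated with a minimal sketch. It is not the article's method: it assumes a simplified model in which the `p` leading eigenvalues of the sample covariance span the signal subspace and the remaining eigenvalues are pooled into one noise variance. The function names, `p`, and `gamma` are hypothetical, and no radius-margin hyperparameter selection is attempted.

```python
import numpy as np

def parsimonious_inverse_cov(X, p):
    """Explicit inverse of a covariance whose trailing eigenvalues are pooled.

    Assumed model (illustrative only): Sigma = Q diag(lam_1..lam_p) Q^T
    + noise * (I - Q Q^T), where Q holds the p leading eigenvectors.
    """
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]                 # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    Q = evecs[:, :p]                                # signal subspace basis
    signal = evals[:p]                              # signal variances
    noise = max(evals[p:].mean(), 1e-12)            # pooled noise variance, kept positive
    d = X.shape[1]
    # Sigma^-1 = Q diag(1/lam) Q^T + (1/noise) (I - Q Q^T): no numerical inversion needed
    return (Q / signal) @ Q.T + (np.eye(d) - Q @ Q.T) / noise

def mahalanobis_kernel(A, B, inv_cov, gamma=1.0):
    """Gaussian-type kernel on the squared Mahalanobis distance."""
    diff = A[:, None, :] - B[None, :, :]            # pairwise differences, shape (n, m, d)
    d2 = np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)
    return np.exp(-gamma * d2)
```

Because the inverse is assembled from the eigendecomposition, it stays well defined even when there are fewer samples than dimensions, which is the situation the abstract describes as ill-conditioned.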

    X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data

    This paper addresses the problem of semi-supervised transfer learning with limited cross-modality data in remote sensing. A large amount of multi-modal earth observation images, such as multispectral imagery (MSI) or synthetic aperture radar (SAR) data, are openly available on a global scale, enabling parsing global urban scenes through remote sensing imagery. However, their ability in identifying materials (pixel-wise classification) remains limited, due to the noisy collection environment and poor discriminative information as well as the limited number of well-annotated training images. To this end, we propose a novel cross-modal deep-learning framework, called X-ModalNet, with three well-designed modules: a self-adversarial module, an interactive learning module, and a label propagation module, by learning to transfer more discriminative information from a small-scale hyperspectral image (HSI) into the classification task using large-scale MSI or SAR data. Significantly, X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network, yielding semi-supervised cross-modality learning. We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.

    Mixture of Latent Variable Models for Remotely Sensed Image Processing

    The processing of remotely sensed data is innately an inverse problem where properties of spatial processes are inferred from the observations based on a generative model. Meaningful data inversion relies on well-defined generative models that capture key factors in the relationship between the underlying physical process and the measurements. Unfortunately, as two mainstream data processing techniques, both mixture models and latent variables models (LVM) are inadequate in describing the complex relationship between the spatial process and the remote sensing data. Consequently, mixture models, such as K-Means, Gaussian Mixture Model (GMM), Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA), characterize a class by statistics in the original space, ignoring the fact that a class can be better represented by discriminative signals in the hidden/latent feature space, while LVMs, such as Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Sparse Representation (SR), seek representational signals in the whole image scene that involves multiple spatial processes, neglecting the fact that signal discovery for individual processes is more efficient. Although the combined use of mixture model and LVMs is required for remote sensing data analysis, there is still a lack of systematic exploration on this important topic in remote sensing literature. 
Driven by the above considerations, this thesis introduces a mixture of LVM (MLVM) framework for combining mixture models and LVMs, under which three models are developed in order to address different aspects of remote sensing data processing: (1) a mixture of probabilistic SR (MPSR) is proposed for supervised classification of hyperspectral remote sensing imagery, considering that SR is an emerging and powerful technique for feature extraction and data representation; (2) a mixture model of K “Purified” means (K-P-Means) is proposed for addressing spectral endmember estimation, which is a fundamental issue in remote sensing data analysis; and (3) a clustering-based PCA model is introduced for SAR image denoising. Under a unified optimization scheme, all models are solved via the Expectation-Maximization (EM) algorithm, by iteratively estimating the two groups of parameters, i.e., the labels of pixels and the latent variables. Experiments on simulated data and real remote sensing data demonstrate the advantages of the proposed models in the respective applications.

    SIMULATIONS-GUIDED DESIGN OF PROCESS ANALYTICAL SENSOR USING MOLECULAR FACTOR COMPUTING

    Many areas of science now generate huge volumes of data that present visualization, modeling, and interpretation challenges. Methods for effectively representing the original data in a reduced coordinate space are therefore receiving much attention. The purpose of this research is to test the hypothesis that molecular computing of vectors for transformation matrices enables spectra to be represented in any arbitrary coordinate system. New coordinate systems are selected to reduce the dimensionality of the spectral hyperspace and simplify the mechanical/electrical/computational construction of a spectrometer. A novel integrated sensing and processing system, termed the Molecular Factor Computing (MFC) based near infrared (NIR) spectrometer, is proposed in this dissertation. In an MFC-based NIR spectrometer, spectral features are encoded by the transmission spectrum of MFC filters, which effectively compute the calibration function or the discriminant functions by weighting the signals received from a broad wavelength band. The novel NIR analyzer proposed in this work is orders of magnitude faster and more rugged than conventional spectroscopy instruments without sacrificing accuracy, which makes it an ideal analytical tool for process analysis. Two different MFC filter-generating algorithms are developed and tested for searching a near-infrared spectral library to select molecular filters for MFC-based spectroscopy. The first, which uses genetic algorithms coupled with predictive modeling methods to select MFC filters from a spectral library for quantitative prediction, is described first. The second filter-generating algorithm, designed to select MFC filters for qualitative classification, is then presented. The concept of molecular factor computing (MFC)-based predictive spectroscopy is demonstrated with quantitative analysis of ethanol-in-water mixtures in an MFC-based prototype instrument.

    Kernel Feature Extraction Methods for Remote Sensing Data Analysis

    Technological advances in recent decades have improved our ability to collect and store large volumes of data. However, in some fields, such as remote sensing, the peculiar characteristics of the data create several problems for data processing. High data volume, high dimensionality, heterogeneity, and nonlinearity can make the analysis and extraction of relevant information from these images a bottleneck for many real applications. Research that applies image processing and machine learning techniques along with feature extraction allows the data dimensionality to be reduced while keeping the maximum information. Therefore, developments and applications of feature extraction methodologies using these techniques have increased exponentially in remote sensing. This improves data visualization and knowledge discovery. Several feature extraction methods have been addressed in the literature, which can be classified, depending on data availability, as supervised, semisupervised, or unsupervised. In particular, feature extraction can be used in combination with (nonlinear) kernel methods. This combination facilitates obtaining a space that keeps greater information content. One of the most important properties of the combination is that it can be directly used for general tasks including classification, regression, clustering, ranking, compression, or data visualization. In this Thesis, we address the problems of different nonlinear feature extraction approaches based on kernel methods for remote sensing data analysis. Several improvements to the current feature extraction methods are proposed to transform the data in order to make high dimensional data tasks, such as classification or biophysical parameter estimation, easier. 
This Thesis focuses on three main objectives to reach these improvements in the current feature extraction methods. The first objective is to include invariances in supervised kernel feature extraction methods. Through these invariances it is possible to generate virtual samples that help to mitigate the problem of the reduced number of samples in supervised methods. The proposed algorithm is a simple method that essentially generates new (synthetic) training samples from the available labeled samples. These samples, along with the original samples, are used in feature extraction methods to obtain features that are more independent of one another than those obtained without virtual samples. The introduction of prior knowledge by means of the virtual samples could make classification and biophysical parameter estimation methods more robust than they are without them. The second objective is to use generative kernels, i.e. probabilistic kernels, that learn directly from the original data by means of clustering techniques, finding local-to-global similarities along the manifold. The proposed kernel is useful for general feature extraction purposes. Furthermore, the kernel attempts to improve on the current methods because it not only contains labeled data information but also uses the unlabeled information of the manifold. Moreover, the proposed kernel is parameter free, in contrast with parameterized functions such as the radial basis function (RBF). Probabilistic kernels are used to obtain new unsupervised and semisupervised methods in order to reduce the number and cost of labeled data in remote sensing. The third objective is to develop new kernel feature extraction methods that improve the features obtained by the current methods. Optimizing the functional could yield improvements in a new algorithm, for instance the Optimized Kernel Entropy Component Analysis (OKECA) method. 
The method is based on the Independent Component Analysis (ICA) framework and is more efficient than the standard Kernel Entropy Component Analysis (KECA) method in terms of dimensionality reduction. In this Thesis, the methods are focused on remote sensing data analysis. Nevertheless, feature extraction methods are used to analyze data in several research fields where data are multidimensional. For these reasons, the results are illustrated in an experimental sequence. First, the projections are analyzed by means of toy examples. Next, the algorithms are tested on standard databases with supervised information. The last step is the analysis of remote sensing images by the proposed methods.
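
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of standard KECA, not of the thesis's OKECA method. It assumes an RBF kernel and ranks kernel eigenpairs by their contribution to the Parzen-window estimate of the Rényi quadratic entropy rather than by eigenvalue alone. The names `rbf_kernel`, `keca`, and `sigma` are illustrative choices, not taken from the thesis.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def keca(X, n_components, sigma=1.0):
    """Project the data onto the kernel axes that contribute most to the entropy estimate.

    The entropy estimate is proportional to 1^T K 1 = sum_i lam_i * (1^T e_i)^2,
    so each eigenpair (lam_i, e_i) is scored by lam_i * (1^T e_i)^2.
    The constant 1/N^2 factor is omitted because it does not change the ranking.
    """
    K = rbf_kernel(X, sigma)
    evals, evecs = np.linalg.eigh(K)
    evals = np.clip(evals, 0.0, None)               # guard against tiny negative round-off
    contrib = evals * evecs.sum(axis=0) ** 2        # entropy contribution per eigenpair
    idx = np.argsort(contrib)[::-1][:n_components]  # largest contributions first
    Z = evecs[:, idx] * np.sqrt(evals[idx])         # projected training data, shape (N, k)
    return Z, contrib[idx]
```

The selected axes need not be those with the largest eigenvalues, which is the main difference from kernel PCA.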

    Computational methods for the analysis of mass spectrometry imaging data

    A powerful enhancement to MS-based detection is the addition of spatial information to the chemical data, an approach called mass spectrometry imaging (MSI). MSI enables two- and three-dimensional overviews of hundreds of molecular species over a wide mass range in complex biological samples. In this work, we present two computational methods and a workflow that address three different aspects of MSI data analysis: correction of mass shifts, unsupervised exploration of the data, and the importance of preprocessing and chemometrics for extracting meaningful information from the data. We introduce a new lock mass-free recalibration procedure that significantly reduces mass shift effects in MSI data. Our method exploits similarities amongst peaklist pairs and takes advantage of the spatial context in three different ways to perform mass correction in an iterative manner. As an extension of this work, we also present a Java-based tool, MSICorrect, that implements our recalibration approach and also allows data visualization. In the next part, an unsupervised approach to rank ion intensity maps based on the abundance of their spatial pattern is presented. Our method assigns a score to every ion intensity map based on the abundance of spatial pattern present in it and then ranks all the maps by that score. To identify which masses exhibit similar spatial distributions, our method uses spatial-similarity based grouping to provide lists of masses that exhibit similar distribution patterns. In the last part, we demonstrate the application of a data preprocessing and multivariate analysis pipeline to a real-world biological dataset: a high-resolution MSI dataset acquired from the leaf surface of Black cottonwood (Populus trichocarpa). Application of the pipeline helped in highlighting and visualizing the chemical specificity on the leaf surface.

    Development of innovative analytical methods based on spectroscopic techniques and multivariate statistical analysis for quality control in the food and pharmaceutical fields.

    The increasing demand for quality assurance and ever more stringent regulations in the food and pharmaceutical fields are promoting the need for analytical techniques that provide reliable and accurate results. However, traditional analytical methods are labor-intensive, time-consuming, and expensive, and they usually require skilled personnel to perform the analysis. For these reasons, in recent decades, quality control protocols based on spectroscopic methods have been developed for many different application fields, including the pharmaceutical and food fields. Vibrational spectroscopic techniques can be an adequate alternative for acquiring both chemical and physical information related to homogeneous and heterogeneous matrices of interest. Moreover, the significant development of powerful data-driven methodologies has made it possible to develop algorithms for the optimal extraction and processing of complex spectroscopic signals, allowing combined approaches to be applied for quantitative and qualitative purposes. The present Doctoral Thesis focuses on the development of ad-hoc analytical strategies based on spectroscopic techniques coupled with multivariate data analysis approaches, providing alternative analytical protocols for quality control in the food and pharmaceutical sectors. Regarding applications in the food sector, excitation-emission Fluorescence Spectroscopy, Near Infrared Spectroscopy (NIRS) and NIR Hyperspectral Imaging (HSI) have been tested for solving analytical issues in independent case studies. Unsupervised approaches based on Principal Component Analysis (PCA) and Parallel Factor Analysis (PARAFAC) have been applied to fluorescence data for characterizing green tea samples, while quantitative predictive approaches such as Partial Least Squares regression have been used to correlate NIR spectra with quality parameters of extra-virgin olive oil samples. 
HSI was applied to study the dynamic chemical processes that occur during cheese ripening, with the aim of mapping chemical and sensory changes over time. The rapid technical progress in spectroscopic instrumentation has led to more flexible portable systems suitable for performing measurements directly in the field or in a manufacturing plant. Within this scenario, NIR spectroscopy has proved to be one of the most powerful Process Analytical Technologies (PAT) for monitoring and controlling complex manufacturing processes. In this thesis, two applications based on miniaturized NIR sensors have been developed for the real-time monitoring of powder blending of pharmaceutical and food formulations, respectively. The main challenges in blending monitoring are related to the assessment of the homogeneity of multicomponent formulations, which is crucial to ensure the safety and effectiveness of a solid pharmaceutical formulation or the quality of a food product. In the third chapter of this thesis, tailor-made qualitative chemometric strategies for obtaining a global understanding of blending processes and optimizing endpoint detection are presented.

    OCM 2017 - Optical Characterization of Materials - conference proceedings

    Each material has its own specific spectral signature, regardless of whether it is food, plastics, or minerals. During the conference we will discuss new trends and developments in material characterization. You will also be informed about the latest highlights in identifying spectral footprints and their realization in industry.

    Molecular diagnosis of cancer using ambient ionization mass spectrometry

    My dissertation focuses on advancing the development and application of ambient ionization mass spectrometry methodology and technology to the biomedical field. The primary ambient ionization method used in my studies is desorption electrospray ionization mass spectrometry (DESI-MS) imaging, which has been previously used to analyze and differentiate disease state (i.e. tumor and normal) and in some cases tumor subtype of human liver, kidney, bladder, testicular, prostate, and brain cancers. DESI-MS imaging is an ideal method for disease diagnosis, because it can be used to directly correlate disease state with histopathology to develop and validate MS libraries built using the molecular profiles that relate to tissue disease states. The goal of this research is to use ambient ionization mass spectrometry for intraoperative surgical-guidance to more accurately diagnose tissue and reduce surgical times. Technological developments during the course of research revolved around touch spray ambient ionization mass spectrometry (TS-MS). This method uses a small probe (e.g. teasing needle) to pick up a minuscule amount of material from a sample, transfer the probe to the front of a mass spectrometer, and, with the addition of high voltage and solvent, induce ESI-like mechanisms for ionization. An evaluation of TS for its use as a potential in vivo surgical tool for disease screening was performed by concurrently studying prostate cancer tissue obtained from surgery with DESI-MS imaging. DESI imaging was used to first establish the relationship between MS molecular profiles and pathology which were then targeted using TS. Further, TS was also evaluated as a non-targeted technique by analyzing prostate specimens with unknown disease states and comparing the unknown data to the previously built MS targeted library. 
Methodological developments include DESI-MS studies for preliminary diagnosis of disease state and tumor subtyping using fine needle aspirations (FNA) of canine lymphoma specimens. Lipid profiles obtained from FNA samples were tested against an MS library built from a matched set of surgical tissue sections with disease states confirmed by histopathology. DESI-MS imaging was also used to expand upon previously investigated human kidney cancer. Previous investigations included two subtypes and low sample numbers (~10 paired normal and tumor samples per subtype); this more recent study includes the three most commonly diagnosed subtypes (clear cell, papillary, and chromophobe) and higher sample numbers (~20 paired normal and tumor samples per subtype). In summary, many methodological and technological advances were made during the course of my dissertation studies. These advances include the development of a novel ambient ionization method, an extension of current applications to include FNA samples for early diagnosis, and an expansion of previous work to build more complex and comprehensive MS libraries. Advances such as these continue to propel ambient ionization mass spectrometry deeper into the biomedical field and give hope for the use of chemical profiling with these methods in biomedical applications in the near future.