Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods
Feature extraction and dimensionality reduction are important tasks in many
fields of science dealing with signal processing and analysis. The relevance of
these techniques is increasing as current sensory devices are developed with
ever higher resolution, and problems involving multimodal data sources become
more common. A plethora of feature extraction methods are available in the
literature collectively grouped under the field of Multivariate Analysis (MVA).
This paper provides a uniform treatment of several methods: Principal Component
Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis
(CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions
derived by means of the theory of reproducing kernel Hilbert spaces. We also
review their connections to other methods for classification and statistical
dependence estimation, and introduce some recent developments to deal with the
extreme cases of large-scale and low-sized problems. To illustrate the wide
applicability of these methods in both classification and regression problems,
we analyze their performance in a benchmark of publicly available data sets,
and pay special attention to specific real applications involving audio
processing for music genre prediction and hyperspectral satellite images for
Earth and climate monitoring.
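As a minimal illustration of the linear/kernel contrast surveyed in this tutorial, the sketch below implements plain PCA and an RBF kernel PCA in numpy. It is a didactic sketch, not the paper's framework; the kernel bandwidth `gamma` is an arbitrary choice.

```python
import numpy as np

def pca(X, k):
    """Linear PCA: project centred data onto the top-k principal directions."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data matrix yields the principal directions in Vt.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]

def kernel_pca(X, k, gamma=1.0):
    """Kernel PCA with an RBF kernel: eigendecompose the centred Gram matrix."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    Kc = J @ K @ J
    vals, vecs = np.linalg.eigh(Kc)          # eigenvalues in ascending order
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    # Projections of the training points onto the k leading kernel components.
    return vecs * np.sqrt(np.maximum(vals, 0))
```

The kernelised variant never forms features explicitly: all geometry enters through the Gram matrix, which is the reproducing-kernel-Hilbert-space view the tutorial develops for PLS, CCA and OPLS as well.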
Tensor Decompositions for Signal Processing Applications: From Two-way to Multiway Component Analysis
The widespread use of multi-sensor technology and the emergence of big
datasets has highlighted the limitations of standard flat-view matrix models
and the necessity to move towards more versatile data analysis tools. We show
that higher-order tensors (i.e., multiway arrays) enable such a fundamental
paradigm shift towards models that are essentially polynomial and whose
uniqueness, unlike that of matrix methods, is guaranteed under very mild and natural
conditions. Benefiting from the power of multilinear algebra as their mathematical
backbone, data analysis techniques using tensor decompositions are shown to
have great flexibility in the choice of constraints that match data properties,
and to find more general latent components in the data than matrix-based
methods. A comprehensive introduction to tensor decompositions is provided from
a signal processing perspective, starting from the algebraic foundations, via
basic Canonical Polyadic and Tucker models, through to advanced cause-effect
and multi-view data analysis schemes. We show that tensor decompositions enable
natural generalizations of some commonly used signal processing paradigms, such
as canonical correlation and subspace techniques, signal separation, linear
regression, feature extraction and classification. We also cover computational
aspects, and point out how ideas from compressed sensing and scientific
computing may be used for addressing the otherwise unmanageable storage and
manipulation problems associated with big datasets. The concepts are supported
by illustrative real world case studies illuminating the benefits of the tensor
framework, as efficient and promising tools for modern signal processing, data
analysis and machine learning applications; these benefits also extend to
vector/matrix data through tensorization.
Keywords: ICA, NMF, CPD, Tucker decomposition, HOSVD, tensor networks, Tensor Train
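The basic Tucker model mentioned in this abstract can be sketched via the truncated higher-order SVD (HOSVD). The numpy version below is a minimal illustration only (no iterative refinement such as HOOI), not the tutorial's reference code.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD: one SVD per mode gives the factor matrices, then the core."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        # Multiply the core by U^T along `mode`.
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from the Tucker core and factor matrices."""
    T = core
    for mode, U in enumerate(factors):
        T = np.moveaxis(np.tensordot(U, np.moveaxis(T, mode, 0), axes=1), 0, mode)
    return T
```

With full multilinear ranks the reconstruction is exact; truncating the ranks gives the compressed, uniqueness-friendly representation the paper advocates over flat-view matrix models.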
Statistical Workflow for Feature Selection in Human Metabolomics Data.
High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity underlying human health and disease. Large-scale metabolomics data sources, generated using either targeted or nontargeted platforms, are becoming more common. Appropriate statistical analysis of these complex high-dimensional data will be critical for extracting meaningful results from such large-scale human metabolomics studies. Therefore, we consider the statistical analytical approaches that have been employed in prior human metabolomics studies. Based on the lessons learned and collective experience to date in the field, we offer a step-by-step framework for pursuing statistical analyses of cohort-based human metabolomics data, with a focus on feature selection. We discuss the range of options and approaches that may be employed at each stage of data management, analysis, and interpretation and offer guidance on the analytical decisions that need to be considered over the course of implementing a data analysis workflow. Certain pervasive analytical challenges facing the field warrant ongoing focused research. Addressing these challenges, particularly those related to analyzing human metabolomics data, will allow for greater standardization of, as well as advances in, how research in the field is practiced. In turn, such major analytical advances will lead to substantial improvements in the overall contributions of human metabolomics investigations.
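One common ingredient of such a feature-selection workflow is multiple-testing control over per-metabolite p-values. Below is a minimal Benjamini-Hochberg step-up sketch in numpy; it illustrates only this single screening step, not the paper's full workflow, and the example p-values in the usage note are invented.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the i-th smallest p-value against alpha * i / m.
    thresh = alpha * (np.arange(1, m + 1) / m)
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest index meeting its threshold
        reject[order[:k + 1]] = True       # reject all hypotheses up to k
    return reject
```

For example, `benjamini_hochberg([0.01, 0.02, 0.03, 0.9], alpha=0.05)` flags the first three features and keeps the fourth, whereas a plain Bonferroni cut at 0.05/4 would flag only the first two.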
Statistical process monitoring of a multiphase flow facility
Industrial needs are evolving fast towards more flexible manufacturing schemes. As a consequence, it is often required to adapt plant production to the demand, which can be volatile depending on the application. This is why it is important to develop tools that can monitor the condition of a process working under varying operational conditions. Canonical Variate Analysis (CVA) is a multivariate data-driven methodology which has been demonstrated to be superior to other methods, particularly under dynamically changing operational conditions. These comparative studies normally use computer-simulated data in benchmark case studies such as the Tennessee Eastman Process Plant (Ricker, N.L. Tennessee Eastman Challenge Archive, Available at 〈http://depts.washington.edu/control/LARRY/TE/download.html〉 Accessed 21.03.2014).
The aim of this work is to provide a benchmark case to demonstrate the ability of different monitoring techniques to detect and diagnose artificially seeded faults in an industrial-scale multiphase flow experimental rig. The changing operational conditions and the size and complexity of the test rig make this case study an ideal candidate for a benchmark that provides a test bed for evaluating the performance of novel multivariate process monitoring techniques using real experimental data. In this paper, the capabilities of CVA to detect and diagnose faults in a real system working under changing operating conditions are assessed and compared with other methodologies. The results obtained demonstrate that CVA can be effectively applied for the detection and diagnosis of faults in real complex systems, and reinforce the idea that the performance of CVA is superior to that of other algorithms.
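For readers wanting a concrete picture of multivariate statistical monitoring, the sketch below scores new samples with a Hotelling T² statistic in a PCA subspace. This is a deliberately simpler stand-in for the CVA method evaluated in the paper (CVA additionally exploits canonical correlations between past and future measurements), and the fault magnitude in the usage note is invented.

```python
import numpy as np

def fit_monitor(X_train, k):
    """Fit a PCA model on normal-operation data; return parameters for T^2 scoring."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Variance captured by each retained component (used to scale the scores).
    var = (s[:k] ** 2) / (len(X_train) - 1)
    return mu, Vt[:k], var

def t_squared(x, mu, P, var):
    """Hotelling T^2 of a new sample in the retained subspace."""
    t = P @ (x - mu)
    return float(np.sum(t ** 2 / var))
```

A sample drawn from normal operation scores near zero; a faulty sample shifted away from the training distribution yields a large T², which is the basic detection logic shared by PCA-, PLS- and CVA-based monitoring schemes.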
Incremental online learning in high dimensions
this article, however, is problematic, as it requires a careful selection of initial ridge regression parameters to stabilize the highly rank-deficient full covariance matrix of the input data, and it is easy to create too much bias or too little numerical stabilization initially, which can trap the local distance metric adaptation in local minima. While the LWPR algorithm computes only about a factor of 10 longer for the 20D experiment in comparison to the 2D experiment, RFWR requires a 1000-fold increase in computation time, thus rendering that algorithm unsuitable for high-dimensional regression. In order to compare LWPR's results to other popular regression methods, we evaluated the 2D, 10D, and 20D cross data sets with gaussian process regression (GP) and support vector (SVM) regression in addition to our LWPR method. It should be noted that neither SVM nor GP is an incremental method, although they can be considered state-of-the-art for batch regression under relatively small numbers of training data and reasonable input dimensionality. The computational complexity of these methods is prohibitively high for real-time applications. The GP algorithm (Gibbs & MacKay, 1997) used a generic covariance function and optimized over the hyperparameters. The SVM regression was performed using a standard available package (Saunders et al., 1998) and optimized for kernel choices. Figure 6 compares the performance of LWPR and gaussian processes for the above-mentioned data sets using 100, 300, and 500 training data points.
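A flavour of the locally weighted regression family that LWPR belongs to can be given in a few lines. The sketch below is plain, batch, locally weighted linear regression, not LWPR itself: there are no incremental updates, no learned projection directions, and the Gaussian bandwidth is a fixed, arbitrary choice.

```python
import numpy as np

def lwr_predict(X, y, x_query, bandwidth=0.3, ridge=1e-8):
    """Locally weighted linear regression: fit a weighted affine model per query."""
    # Gaussian weights: points near the query dominate the local fit.
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * bandwidth ** 2))
    A = np.column_stack([np.ones(len(X)), X])            # local affine model
    a_q = np.concatenate([[1.0], np.atleast_1d(x_query)])
    W = np.diag(w)
    # Weighted ridge-regularized least squares for the local coefficients.
    beta = np.linalg.solve(A.T @ W @ A + ridge * np.eye(A.shape[1]), A.T @ W @ y)
    return float(a_q @ beta)
```

LWPR's contribution over this baseline is precisely what the excerpt discusses: incremental updates of many such local models and low-dimensional projections that keep the cost manageable as the input dimensionality grows.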
Incremental Online Learning in High Dimensions
Locally weighted projection regression (LWPR) is a new algorithm for incremental non-linear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core …
Gradients in urban material composition: A new concept to map cities with spaceborne imaging spectroscopy data
To understand processes in urban environments, such as urban energy fluxes or surface temperature patterns, it is important to map urban surface materials. Airborne imaging spectroscopy data have been successfully used to identify urban surface materials, mainly based on unmixing algorithms. Upcoming spaceborne Imaging Spectrometers (IS), such as the Environmental Mapping and Analysis Program (EnMAP), will reduce the time- and cost-critical limitations of airborne systems for Earth Observation (EO). However, the spatial resolution of all current and planned spaceborne IS will not be higher than 20 to 30 m and, thus, the detection of pure Endmember (EM) candidates in urban areas, a requirement for spectral unmixing, is very limited. Gradient analysis could be an alternative method for retrieving urban surface material compositions in pixels from spaceborne IS. The gradient concept is well known in ecology to identify plant species assemblages formed by similar environmental conditions but has never been tested for urban materials. However, urban areas also contain neighbourhoods with similar physical, compositional and structural characteristics. Based on this assumption, this study investigated (1) whether cover fractions of surface materials change gradually in urban areas and (2) whether these gradients can be adequately mapped and interpreted using imaging spectroscopy data (e.g. EnMAP) with 30 m spatial resolution.
Similarities of material compositions were analysed on the basis of 153 systematically distributed samples on a detailed surface material map using Detrended Correspondence Analysis (DCA). Determined gradient scores for the first two gradients were regressed against the corresponding mean reflectance of simulated EnMAP spectra using Partial Least Squares regression models. Results show strong correlations with R2 = 0.85 and R2 = 0.71 and an RMSE of 0.24 and 0.21 for the first and second axis, respectively. The subsequent mapping of the first gradient reveals patterns that correspond to the transition from predominantly vegetation classes to the dominance of artificial materials. Patterns resulting from the second gradient are associated with surface material compositions that are related to finer structural differences in urban structures. The composite gradient map shows patterns of common surface material compositions that can be related to urban land use classes such as Urban Structure Types (UST). By linking the knowledge of typical material compositions with urban structures, gradient analysis seems to be a powerful tool to map characteristic material compositions in 30 m imaging spectroscopy data of urban areas.
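The regression step described here (gradient scores regressed on mean reflectance via PLS) can be sketched with a minimal single-response PLS1/NIPALS implementation. This is a generic numpy illustration, not the study's processing chain, and the synthetic data in the test bear no relation to EnMAP spectra.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS deflation; returns regression vector and intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc                  # weight: covariance direction with y
        w /= np.linalg.norm(w)
        t = Xc @ w                     # score vector
        tt = t @ t
        p = Xc.T @ t / tt              # X loading
        qk = yc @ t / tt               # y loading
        Xc = Xc - np.outer(t, p)       # deflate X
        yc = yc - qk * t               # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Convert component-wise loadings back to coefficients in the original X space.
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, y_mean - x_mean @ B

def pls1_predict(X, B, b0):
    return X @ B + b0
```

In the study's setting, `X` would hold the mean EnMAP reflectance per sample and `y` a DCA gradient score; the number of components is typically chosen by cross-validation rather than fixed in advance.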