
    Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods

    Feature extraction and dimensionality reduction are important tasks in many fields of science dealing with signal processing and analysis. The relevance of these techniques is increasing as current sensory devices are developed with ever higher resolution, and problems involving multimodal data sources become more common. A plethora of feature extraction methods are available in the literature, collectively grouped under the field of Multivariate Analysis (MVA). This paper provides a uniform treatment of several methods: Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions derived by means of the theory of reproducing kernel Hilbert spaces. We also review their connections to other methods for classification and statistical dependence estimation, and introduce some recent developments to deal with the extreme cases of large-scale and small-sample problems. To illustrate the wide applicability of these methods in both classification and regression problems, we analyze their performance on a benchmark of publicly available data sets, and pay special attention to specific real applications involving audio processing for music genre prediction and hyperspectral satellite images for Earth and climate monitoring.
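    For readers who want to experiment with these projections, the sketch below (assuming scikit-learn and synthetic data, not the benchmark data sets used in the paper) contrasts unsupervised PCA with the supervised PLS/CCA projections and a kernelized extension; the variable names and kernel parameters are illustrative choices.

        # Illustrative sketch (not the authors' code): linear and kernel feature
        # extraction with scikit-learn on a synthetic supervised problem.
        import numpy as np
        from sklearn.decomposition import PCA, KernelPCA
        from sklearn.cross_decomposition import CCA, PLSRegression

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 10))                         # input features (e.g. sensor readings)
        Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(200, 3))  # targets

        # Unsupervised projection: PCA keeps directions of maximum input variance.
        Z_pca = PCA(n_components=3).fit_transform(X)

        # Supervised projections: PLS and CCA use the targets Y to choose directions.
        Z_pls = PLSRegression(n_components=3).fit(X, Y).transform(X)
        Z_cca, _ = CCA(n_components=2).fit(X, Y).transform(X, Y)

        # Non-linear extension via an RBF kernel (reproducing kernel Hilbert space).
        Z_kpca = KernelPCA(n_components=3, kernel="rbf", gamma=0.1).fit_transform(X)

        print(Z_pca.shape, Z_pls.shape, Z_cca.shape, Z_kpca.shape)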

    Tensor Decompositions for Signal Processing Applications From Two-way to Multiway Component Analysis

    The widespread use of multi-sensor technology and the emergence of big datasets have highlighted the limitations of standard flat-view matrix models and the necessity to move towards more versatile data analysis tools. We show that higher-order tensors (i.e., multiway arrays) enable such a fundamental paradigm shift towards models that are essentially polynomial and whose uniqueness, unlike that of matrix methods, is guaranteed under very mild and natural conditions. Benefiting from the power of multilinear algebra as their mathematical backbone, data analysis techniques using tensor decompositions are shown to have great flexibility in the choice of constraints that match data properties, and to find more general latent components in the data than matrix-based methods. A comprehensive introduction to tensor decompositions is provided from a signal processing perspective, starting from the algebraic foundations, via basic Canonical Polyadic and Tucker models, through to advanced cause-effect and multi-view data analysis schemes. We show that tensor decompositions enable natural generalizations of some commonly used signal processing paradigms, such as canonical correlation and subspace techniques, signal separation, linear regression, feature extraction and classification. We also cover computational aspects, and point out how ideas from compressed sensing and scientific computing may be used for addressing the otherwise unmanageable storage and manipulation problems associated with big datasets. The concepts are supported by illustrative real-world case studies illuminating the benefits of the tensor framework, as efficient and promising tools for modern signal processing, data analysis and machine learning applications; these benefits also extend to vector/matrix data through tensorization. Keywords: ICA, NMF, CPD, Tucker decomposition, HOSVD, tensor networks, Tensor Train.
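    As a concrete illustration of the basic Canonical Polyadic (CP) model mentioned above, the following NumPy/SciPy sketch fits a rank-R CP decomposition of a third-order tensor with alternating least squares; the unfolding convention, fixed iteration count and synthetic test tensor are assumptions made for illustration, not the survey's reference implementation.

        # Minimal sketch: rank-R CP decomposition of a 3-way tensor via alternating
        # least squares (ALS), using an explicit C-order unfolding convention.
        import numpy as np
        from scipy.linalg import khatri_rao

        def unfold(T, mode):
            """Mode-n unfolding: rows indexed by `mode`, remaining modes flattened (C order)."""
            return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

        def cp_als(T, rank, n_iter=100, seed=0):
            rng = np.random.default_rng(seed)
            A, B, C = (rng.normal(size=(dim, rank)) for dim in T.shape)
            for _ in range(n_iter):
                # Each factor is the least-squares solution given the other two
                # (Khatri-Rao product on the right, Hadamard product in the normal equations).
                A = unfold(T, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
                B = unfold(T, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
                C = unfold(T, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
            return A, B, C

        # Build a synthetic rank-3 tensor and check the reconstruction error.
        rng = np.random.default_rng(1)
        A0, B0, C0 = rng.normal(size=(6, 3)), rng.normal(size=(7, 3)), rng.normal(size=(8, 3))
        T = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
        A, B, C = cp_als(T, rank=3)
        T_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
        print("relative error:", np.linalg.norm(T - T_hat) / np.linalg.norm(T))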

    A Comprehensive Analysis of MALDI-TOF Spectrometry Data


    Statistical process monitoring of a multiphase flow facility

    Industrial needs are evolving fast towards more flexible manufacturing schemes. As a consequence, it is often required to adapt plant production to demand, which can be volatile depending on the application. This is why it is important to develop tools that can monitor the condition of the process while it operates under varying operational conditions. Canonical Variate Analysis (CVA) is a multivariate data-driven methodology which has been demonstrated to be superior to other methods, particularly under dynamically changing operational conditions. These comparative studies normally use computer-simulated data in benchmark case studies such as the Tennessee Eastman Process Plant (Ricker, N.L., Tennessee Eastman Challenge Archive, available at http://depts.washington.edu/control/LARRY/TE/download.html, accessed 21.03.2014). The aim of this work is to provide a benchmark case that demonstrates the ability of different monitoring techniques to detect and diagnose artificially seeded faults in an industrial-scale multiphase flow experimental rig. The changing operational conditions and the size and complexity of the test rig make this case study an ideal candidate for a benchmark that provides a test bed for evaluating the performance of novel multivariate process monitoring techniques on real experimental data. In this paper, the capabilities of CVA to detect and diagnose faults in a real system working under changing operating conditions are assessed and compared with other methodologies. The results obtained demonstrate that CVA can be effectively applied for the detection and diagnosis of faults in real complex systems, and reinforce the idea that the performance of CVA is superior to that of other algorithms.
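    As a rough illustration of the CVA monitoring idea (lagged past/future vectors, canonical variates from an SVD of the whitened cross-covariance, and a Hotelling-type T2 statistic), the sketch below uses synthetic stand-in data and arbitrary choices of lag and state order; it is not the benchmark implementation described in the paper.

        # Simplified CVA-style monitoring sketch (illustrative assumptions throughout).
        import numpy as np

        def cva_monitor(Y, lag=5, n_states=3):
            """Y: (n_samples, n_vars) measurements from normal operation."""
            Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
            n = len(Y)
            # Past vectors p(t) = [y(t-1); ...; y(t-lag)], future vectors f(t) = [y(t); ...; y(t+lag-1)].
            rows = range(lag, n - lag + 1)
            P = np.array([Y[t - lag:t][::-1].ravel() for t in rows])
            F = np.array([Y[t:t + lag].ravel() for t in rows])
            Spp = np.cov(P, rowvar=False)
            Sff = np.cov(F, rowvar=False)
            Spf = (P - P.mean(0)).T @ (F - F.mean(0)) / (len(P) - 1)
            # Whiten past and future with Cholesky factors, then SVD of the cross-covariance.
            Wp = np.linalg.inv(np.linalg.cholesky(Spp + 1e-6 * np.eye(len(Spp))))
            Wf = np.linalg.inv(np.linalg.cholesky(Sff + 1e-6 * np.eye(len(Sff))))
            U, s, Vt = np.linalg.svd(Wp @ Spf @ Wf.T)
            J = U[:, :n_states].T @ Wp           # maps a past vector to the retained canonical variates
            Z = (P - P.mean(0)) @ J.T
            T2 = np.sum(Z ** 2, axis=1)          # Hotelling-type statistic (variates have unit variance)
            return J, T2

        rng = np.random.default_rng(0)
        Y_train = rng.normal(size=(500, 4))      # stand-in for normal-operation training data
        J, T2 = cva_monitor(Y_train)
        print("99th-percentile training T2 (a simple control limit):", np.percentile(T2, 99))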

    Incremental online learning in high dimensions

    … this article, however, is problematic, as it requires a careful selection of initial ridge regression parameters to stabilize the highly rank-deficient full covariance matrix of the input data, and it is easy to create too much bias or too little numerical stabilization initially, which can trap the local distance metric adaptation in local minima. While the LWPR algorithm takes only about a factor of 10 longer to compute for the 20D experiment in comparison to the 2D experiment, RFWR requires a 1000-fold increase in computation time, rendering this algorithm unsuitable for high-dimensional regression. In order to compare LWPR's results to other popular regression methods, we evaluated the 2D, 10D, and 20D cross data sets with Gaussian process (GP) regression and support vector machine (SVM) regression in addition to our LWPR method. It should be noted that neither the SVM nor the GP method is incremental, although they can be considered state-of-the-art for batch regression with relatively small numbers of training data and reasonable input dimensionality. The computational complexity of these methods is prohibitively high for real-time applications. The GP algorithm (Gibbs & MacKay, 1997) used a generic covariance function and optimized over the hyperparameters. The SVM regression was performed using a standard available package (Saunders et al., 1998) and optimized for kernel choices. Figure 6 compares the performance of LWPR and Gaussian processes for the above-mentioned data sets using 100, 300, and 500 training data points.
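    The style of comparison described in this excerpt can be reproduced in outline with off-the-shelf tools. The sketch below uses scikit-learn's GP and SVM regressors on a simplified 2-D "cross"-like function; the target function, hyperparameters and implementations are assumptions and differ from the data sets and packages cited above.

        # Hedged sketch: batch GP regression vs. SVM regression on a synthetic 2-D target.
        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel
        from sklearn.svm import SVR
        from sklearn.metrics import mean_squared_error

        rng = np.random.default_rng(0)
        X = rng.uniform(-1, 1, size=(300, 2))
        y = np.maximum(np.exp(-10 * X[:, 0] ** 2), np.exp(-50 * X[:, 1] ** 2))   # simplified "cross"-like target
        y += 0.05 * rng.normal(size=len(y))
        X_test = rng.uniform(-1, 1, size=(200, 2))
        y_test = np.maximum(np.exp(-10 * X_test[:, 0] ** 2), np.exp(-50 * X_test[:, 1] ** 2))

        # GP regression: kernel hyperparameters are optimised by maximising the
        # marginal likelihood during fit(); SVM regression uses fixed settings for brevity.
        gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)
        svr = SVR(kernel="rbf", C=10.0, gamma=1.0).fit(X, y)

        for name, model in [("GP", gp), ("SVR", svr)]:
            print(name, "test MSE:", mean_squared_error(y_test, model.predict(X_test)))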

    Incremental Online Learning in High Dimensions

    Locally weighted projection regression (LWPR) is a new algorithm for incremental non-linear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core…
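    A minimal sketch of the locally weighted regression idea underlying LWPR follows; the full algorithm is incremental, builds its local models along projection directions found by partial least squares, and adapts the distance metric online, whereas this illustration is batch, uses a fixed Gaussian receptive field, and is not the authors' implementation.

        # Batch locally weighted linear regression around a single query point.
        import numpy as np

        def lwr_predict(X_train, y_train, x_query, bandwidth=0.3):
            """Fit one weighted linear model centred on x_query and return its prediction."""
            d2 = np.sum((X_train - x_query) ** 2, axis=1)
            w = np.exp(-0.5 * d2 / bandwidth ** 2)                  # Gaussian receptive field
            Xb = np.hstack([X_train, np.ones((len(X_train), 1))])   # append a bias column
            sw = np.sqrt(w)[:, None]                                # weighted least squares via sqrt-weights
            beta = np.linalg.lstsq(sw * Xb, sw.ravel() * y_train, rcond=None)[0]
            return np.append(x_query, 1.0) @ beta

        rng = np.random.default_rng(0)
        X = rng.uniform(-2, 2, size=(400, 2))
        y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + 0.05 * rng.normal(size=len(X))
        x_q = np.array([0.5, -0.3])
        print("prediction:", lwr_predict(X, y, x_q), "  true value:", np.sin(0.5) * np.cos(-0.3))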

    Gradients in urban material composition: A new concept to map cities with spaceborne imaging spectroscopy data

    To understand processes in urban environments, such as urban energy fluxes or surface temperature patterns, it is important to map urban surface materials. Airborne imaging spectroscopy data have been successfully used to identify urban surface materials, mainly based on unmixing algorithms. Upcoming spaceborne Imaging Spectrometers (IS), such as the Environmental Mapping and Analysis Program (EnMAP), will reduce the time- and cost-critical limitations of airborne systems for Earth Observation (EO). However, the spatial resolution of all operational and planned spaceborne IS will not be higher than 20 to 30 m and, thus, the detection of pure Endmember (EM) candidates in urban areas, a requirement for spectral unmixing, is very limited. Gradient analysis could be an alternative method for retrieving urban surface material compositions in pixels from spaceborne IS. The gradient concept is well known in ecology for identifying plant species assemblages formed by similar environmental conditions, but it has never been tested for urban materials. However, urban areas also contain neighbourhoods with similar physical, compositional and structural characteristics. Based on this assumption, this study investigated (1) whether cover fractions of surface materials change gradually in urban areas and (2) whether these gradients can be adequately mapped and interpreted using imaging spectroscopy data (e.g. EnMAP) with 30 m spatial resolution. Similarities of material compositions were analysed on the basis of 153 systematically distributed samples on a detailed surface material map using Detrended Correspondence Analysis (DCA). The gradient scores determined for the first two gradients were regressed against the corresponding mean reflectance of simulated EnMAP spectra using Partial Least Squares (PLS) regression models; a sketch of this regression step is given after the abstract. Results show strong correlations, with R2 = 0.85 and R2 = 0.71 and an RMSE of 0.24 and 0.21 for the first and second axis, respectively. The subsequent mapping of the first gradient reveals patterns that correspond to the transition from predominantly vegetation classes to the dominance of artificial materials. Patterns resulting from the second gradient are associated with surface material compositions that relate to finer structural differences within urban structures. The composite gradient map shows patterns of common surface material compositions that can be related to urban land use classes such as Urban Structure Types (UST). By linking the knowledge of typical material compositions with urban structures, gradient analysis appears to be a powerful tool for mapping characteristic material compositions in 30 m imaging spectroscopy data of urban areas.
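    The regression step described above (gradient scores against mean reflectance) can be sketched with a standard PLS implementation. The example below uses synthetic spectra and scores in place of the DCA outputs and simulated EnMAP data, and an assumed band count and component number, so the numbers it prints are not comparable to the reported R2 and RMSE values.

        # Illustrative sketch: regress two gradient-axis scores on per-sample mean spectra with PLS.
        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import r2_score, mean_squared_error

        rng = np.random.default_rng(0)
        n_samples, n_bands = 153, 200        # 153 samples as in the study; 200 bands is an assumption
        spectra = rng.uniform(0.0, 0.6, size=(n_samples, n_bands))   # stand-in mean reflectance per sample
        scores = spectra[:, :5] @ rng.normal(size=(5, 2)) + 0.05 * rng.normal(size=(n_samples, 2))  # two gradient axes

        X_tr, X_te, y_tr, y_te = train_test_split(spectra, scores, random_state=0)
        pls = PLSRegression(n_components=10).fit(X_tr, y_tr)
        y_hat = pls.predict(X_te)
        for axis in range(2):
            r2 = r2_score(y_te[:, axis], y_hat[:, axis])
            rmse = np.sqrt(mean_squared_error(y_te[:, axis], y_hat[:, axis]))
            print(f"gradient axis {axis + 1}: R2 = {r2:.2f}, RMSE = {rmse:.2f}")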