Kernel PCA for multivariate extremes
We propose kernel PCA as a method for analyzing the dependence structure of
multivariate extremes and demonstrate that it can be a powerful tool for
clustering and dimension reduction. Our work provides some theoretical insight
into the preimages obtained by kernel PCA, demonstrating that under certain
conditions they can effectively identify clusters in the data. We build on
these new insights to characterize rigorously the performance of kernel PCA
based on an extremal sample, i.e., the angular part of random vectors for which
the radius exceeds a large threshold. More specifically, we focus on the
asymptotic dependence of multivariate extremes characterized by the angular or
spectral measure in extreme value theory and provide a careful analysis in the
case where the extremes are generated from a linear factor model. We give
theoretical guarantees on the performance of kernel PCA preimages of such
extremes by leveraging their asymptotic distribution together with Davis-Kahan
perturbation bounds. Our theoretical findings are complemented with numerical
experiments illustrating the finite-sample performance of our methods.
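The extraction of the extremal sample and the subsequent kernel PCA step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two-factor Pareto model, the loading matrix, and the kernel bandwidth are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)

# Hypothetical heavy-tailed sample from a linear factor model
# (the paper's setting): Pareto-distributed factors times a loading matrix.
factors = rng.pareto(2.0, size=(2000, 2)) + 1.0
A = np.array([[1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0]])          # illustrative loading matrix
X = factors @ A

# Extremal sample: angular parts of observations whose radius
# (Euclidean norm here) exceeds a high empirical threshold.
radius = np.linalg.norm(X, axis=1)
threshold = np.quantile(radius, 0.95)
angles = X[radius > threshold] / radius[radius > threshold, None]

# Kernel PCA on the angular sample; the preimages of the leading
# components can then be inspected for clusters.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=2.0,
                 fit_inverse_transform=True)
scores = kpca.fit_transform(angles)
preimages = kpca.inverse_transform(scores)
print(scores.shape, preimages.shape)
```

The angular observations live on the unit sphere, which is why a nonlinear (kernel) method is natural here: linear PCA on the sphere conflates radial and angular variation.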
Lazy stochastic principal component analysis
Stochastic principal component analysis (SPCA) has become a popular
dimensionality reduction strategy for large, high-dimensional datasets. We
derive a simplified algorithm, called Lazy SPCA, which has reduced
computational complexity and is better suited for large-scale distributed
computation. We prove that SPCA and Lazy SPCA find the same approximations to
the principal subspace, and that the pairwise distances between samples in the
lower-dimensional space are invariant to whether SPCA is executed lazily or not.
Empirical studies find downstream predictive performance to be identical for
both methods, and superior to random projections, across a range of predictive
models (linear regression, logistic lasso, and random forests). In our largest
experiment with 4.6 million samples, Lazy SPCA reduced 43.7 hours of
computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix
multiplications, besides an operation on a small square matrix whose size
depends only on the target dimensionality.
Comment: To be published in the 2017 IEEE International Conference on Data Mining Workshops (ICDMW).
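The structural claim in the last sentence, a pipeline built only from matrix multiplications plus one cheap operation on a small square matrix, can be sketched with a generic randomized projection. This is not the Lazy SPCA algorithm itself, only a hedged illustration of that computational shape; the function name and parameters are assumptions.

```python
import numpy as np

def sketch_pca(X, k, rng):
    """Illustrative randomized PCA with the computational shape
    described above: large matrix multiplications, plus an
    eigendecomposition of a small k-by-k matrix, where k is
    the target dimensionality."""
    n, d = X.shape
    Omega = rng.standard_normal((d, k))   # random test matrix
    Y = X @ Omega                         # sketch: one big matmul
    G = Y.T @ Y                           # small k-by-k matrix
    _, V = np.linalg.eigh(G)              # cheap k x k eigenproblem
    return Y @ V[:, ::-1]                 # rotate sketch onto its PCs

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 50))
Z = sketch_pca(X, 5, rng)
print(Z.shape)
```

Because the final step is a rotation by an orthogonal matrix, pairwise distances between rows of the sketch are unchanged, echoing the invariance property the abstract proves for SPCA versus Lazy SPCA.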
Characterizing Pathological Deviations from Normality using Constrained Manifold-Learning
We propose a technique to represent a pathological pattern as a deviation from normality along a manifold structure. Each subject is represented by a map of local motion abnormalities, obtained from a statistical atlas of motion built from a healthy population. The algorithm learns a manifold from a set of patients with varying degrees of the same pathology. The approach extends recent manifold-learning techniques by constraining the manifold to pass through a physiologically meaningful origin representing a normal motion pattern. Individuals are compared to the manifold population through a distance that combines a mapping to the manifold and the path along the manifold to reach its origin. The method is applied in the context of cardiac resynchronization therapy (CRT), focusing on a specific motion pattern of intra-ventricular dyssynchrony called septal flash (SF). We estimate the manifold from 50 CRT candidates with SF and test it on 38 CRT candidates and 21 healthy volunteers. Experiments highlight the need for nonlinear techniques to learn the studied data, and the relevance of the computed distance for comparing individuals to a specific pathological pattern.
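The two-part distance described above (a mapping to the manifold plus the path along the manifold back to the origin) can be sketched with a k-NN graph and shortest paths. The synthetic curve, the neighborhood size, and the nearest-sample mapping are all simplifying assumptions, not the paper's method.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(2)

# Hypothetical stand-in for the learned manifold: samples along a
# 1-D curve in 3-D, with index 0 playing the role of the origin
# (the normal motion pattern the manifold is constrained to pass through).
t = np.sort(rng.uniform(0, 3, 100))
t[0] = 0.0
M = np.c_[t, np.sin(t), 0.1 * t**2]

# Geodesic distances along the manifold via a k-NN graph.
graph = kneighbors_graph(M, n_neighbors=6, mode="distance")
geo_to_origin = dijkstra(graph, directed=False, indices=0)

def distance_to_pattern(x):
    # (1) map the new subject to its nearest manifold sample,
    # (2) add the path along the manifold back to the origin.
    d = np.linalg.norm(M - x, axis=1)
    j = int(np.argmin(d))
    return d[j] + geo_to_origin[j]

print(distance_to_pattern(M[0]))    # the origin itself scores 0
print(distance_to_pattern(M[-1]))   # far along the manifold: large
```

The combined score is small only for subjects that both lie near the manifold and sit close to its normal-motion origin, which is the intuition behind comparing individuals to a specific pathological pattern.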
Sparse Model Selection using Information Complexity
This dissertation studies and uses the application of information complexity to statistical model selection through three different projects. Specifically, we design statistical models that incorporate sparsity features to make the models more explanatory and computationally efficient.
In the first project, we propose a Sparse Bridge Regression model for variable selection when the number of variables is much greater than the number of observations and the model may be misspecified. Numerical simulations and real-world data analyses demonstrate the model's excellent explanatory power in high-dimensional settings.
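The core of bridge regression is a penalized least-squares objective with penalty term lambda * sum(|beta_j|^q); taking 0 < q < 1 encourages sparsity more aggressively than the lasso (q = 1). A minimal sketch of that objective on toy data follows; the optimizer choice, lambda, and q are illustrative assumptions, not the dissertation's algorithm.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Toy data: only the first two of five coefficients are nonzero.
n, p = 60, 5
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def bridge_objective(beta, lam=1.0, q=0.5):
    # Residual sum of squares plus the bridge penalty |beta_j|^q.
    resid = y - X @ beta
    return resid @ resid + lam * np.sum(np.abs(beta) ** q)

# Derivative-free optimization, since the penalty is nonsmooth at 0.
fit = minimize(bridge_objective, x0=np.zeros(p), method="Powell")
print(np.round(fit.x, 2))
```

With q below 1 the objective is nonconvex, which is why practical bridge-regression solvers (and the sparse variant proposed here) need more care than this direct minimization.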
The second project proposes a novel hybrid modeling method that utilizes a mixture of sparse principal component regression (MIX-SPCR) to segment high-dimensional time series data. Using the MIX-SPCR model, we empirically analyze the S&P 500 index data (from 1999 to 2019) and identify two key change points.
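One building block of the hybrid model, sparse principal component regression on a single segment, can be sketched as follows: sparse PCA yields components supported on few variables, which then serve as regressors. The factor structure, penalty strength, and block loadings are illustrative assumptions; the mixture and change-point machinery of MIX-SPCR are not shown.

```python
import numpy as np
from sklearn.decomposition import SparsePCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)

# Synthetic segment: 30 observed series driven by 3 latent factors,
# each factor loading on a distinct block of 10 variables.
n, p = 200, 30
F = rng.standard_normal((n, 3))
B = np.zeros((3, p))
B[0, :10] = 1.0
B[1, 10:20] = 1.0
B[2, 20:] = 1.0
X = F @ B + 0.1 * rng.standard_normal((n, p))
y = F[:, 0] - F[:, 1] + 0.1 * rng.standard_normal(n)

# Sparse PCA extracts components supported on few variables ...
spca = SparsePCA(n_components=3, alpha=1.0, random_state=0)
Z = spca.fit_transform(X)

# ... which are then used as regressors for the response.
reg = LinearRegression().fit(Z, y)
print(round(reg.score(Z, y), 3))
```

In the mixture setting, one such sparse PCR model is fitted per regime, and the segmentation (the change points) is chosen by the information-complexity criterion the dissertation develops.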
The third project investigates the use of nonlinear features in the Sparse Kernel Factor Analysis (SKFA) method to derive the information criterion. Using a variety of wide datasets, we demonstrate the benefits of SKFA in the nonlinear representation and classification of data. The results show the flexibility and utility of information complexity in such data-modeling problems.