25,629 research outputs found

    Robust Sparse Canonical Correlation Analysis

    Full text link
    Canonical correlation analysis (CCA) is a multivariate statistical method which describes the associations between two sets of variables. The objective is to find linear combinations of the variables in each data set having maximal correlation. This paper discusses a method for Robust Sparse CCA. Sparse estimation produces canonical vectors with some of their elements estimated as exactly zero. As such, their interpretability is improved. We also robustify the method such that it can cope with outliers in the data. To estimate the canonical vectors, we convert the CCA problem into an alternating regression framework, and use the sparse Least Trimmed Squares estimator. We illustrate the good performance of the Robust Sparse CCA method in several simulation studies and two real data examples

    Mining Outcome-relevant Brain Imaging Genetic Associations via Three-way Sparse Canonical Correlation Analysis in Alzheimer’s Disease

    Get PDF
    Neuroimaging genetics is an emerging field that aims to identify the associations between genetic variants (e.g., single nucleotide polymorphisms (SNPs)) and quantitative traits (QTs) such as brain imaging phenotypes. In recent studies, in order to detect complex multi-SNP-multi-QT associations, bi-multivariate techniques such as various structured sparse canonical correlation analysis (SCCA) algorithms have been proposed and used in imaging genetics studies. However, associations between genetic markers and imaging QTs identified by existing bi-multivariate methods may not be all disease specific. To bridge this gap, we propose an analytical framework, based on three-way sparse canonical correlation analysis (T-SCCA), to explore the intrinsic associations among genetic markers, imaging QTs, and clinical scores of interest. We perform an empirical study using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort to discover the relationships among SNPs from AD risk gene APOE, imaging QTs extracted from structural magnetic resonance imaging scans, and cognitive and diagnostic outcomes. The proposed T-SCCA model not only outperforms the traditional SCCA method in terms of identifying strong associations, but also discovers robust outcome-relevant imaging genetic patterns, demonstrating its promise for improving disease-related mechanistic understanding

    Some statistical methods for dimension reduction

    Get PDF
    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel UniversityThe aim of the work in this thesis is to carry out dimension reduction (DR) for high dimensional (HD) data by using statistical methods for variable selection, feature extraction and a combination of the two. In Chapter 2, the DR is carried out through robust feature extraction. Robust canonical correlation (RCCA) methods have been proposed. In the correlation matrix of canonical correlation analysis (CCA), we suggest that the Pearson correlation should be substituted by robust correlation measures in order to obtain robust correlation matrices. These matrices have been employed for producing RCCA. Moreover, the classical covariance matrix has been substituted by robust estimators for multivariate location and dispersion in order to get RCCA. In Chapter 3 and 4, the DR is carried out by combining the ideas of variable selection using regularisation methods with feature extraction, through the minimum average variance estimator (MAVE) and single index quantile regression (SIQ) methods, respectively. In particular, we extend the sparse MAVE (SMAVE) reported in (Wang and Yin, 2008) by combining the MAVE loss function with different regularisation penalties in Chapter 3. An extension of the SIQ of Wu et al. (2010) by considering different regularisation penalties is proposed in Chapter 4. In Chapter 5, the DR is done through variable selection under Bayesian framework. A flexible Bayesian framework for regularisation in quantile regression (QR) model has been proposed. This work is different from Bayesian Lasso quantile regression (BLQR), employing the asymmetric Laplace error distribution (ALD). The error distribution is assumed to be an infinite mixture of Gaussian (IMG) densities

    Sparse canonical correlation analysis from a predictive point of view

    Full text link
    Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each data set. However, in high-dimensional settings where the number of variables exceeds the sample size or when the variables are highly correlated, traditional CCA is no longer appropriate. This paper proposes a method for sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each data set, thereby increasing the interpretability of the canonical variates. We consider the CCA problem from a predictive point of view and recast it into a regression framework. By combining an alternating regression approach together with a lasso penalty, we induce sparsity in the canonical vectors. We compare the performance with other sparse CCA techniques in different simulation settings and illustrate its usefulness on a genomic data set

    Robust correlated and individual component analysis

    Get PDF
    © 1979-2012 IEEE.Recovering correlated and individual components of two, possibly temporally misaligned, sets of data is a fundamental task in disciplines such as image, vision, and behavior computing, with application to problems such as multi-modal fusion (via correlated components), predictive analysis, and clustering (via the individual ones). Here, we study the extraction of correlated and individual components under real-world conditions, namely i) the presence of gross non-Gaussian noise and ii) temporally misaligned data. In this light, we propose a method for the Robust Correlated and Individual Component Analysis (RCICA) of two sets of data in the presence of gross, sparse errors. We furthermore extend RCICA in order to handle temporal incongruities arising in the data. To this end, two suitable optimization problems are solved. The generality of the proposed methods is demonstrated by applying them onto 4 applications, namely i) heterogeneous face recognition, ii) multi-modal feature fusion for human behavior analysis (i.e., audio-visual prediction of interest and conflict), iii) face clustering, and iv) thetemporal alignment of facial expressions. Experimental results on 2 synthetic and 7 real world datasets indicate the robustness and effectiveness of the proposed methodson these application domains, outperforming other state-of-the-art methods in the field
    • …
    corecore