174,302 research outputs found

    Determining Principal Component Cardinality through the Principle of Minimum Description Length

    PCA (Principal Component Analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. One significant challenge in using PCA is the choice of the number of principal components. The information-theoretic MDL (Minimum Description Length) principle gives objective compression-based criteria for model selection, but it is difficult to analytically apply its modern definition - NML (Normalized Maximum Likelihood) - to the problem of PCA. This work shows a general reduction of NML problems to lower-dimension problems. Applying this reduction, it bounds the NML of PCA by terms of the NML of linear regression, which are known. Comment: LOD 201
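
    The abstract's core idea, picking the number of components by a description-length criterion, can be illustrated with a crude two-part (BIC-like) code. This is a minimal sketch, not the paper's NML bound; the model is a probabilistic-PCA-style Gaussian, and all names and constants are illustrative:

        # MDL-style selection of the number of principal components.
        # Two-part code: Gaussian data cost plus a parameter-count penalty.
        # NOT the paper's NML criterion; a hedged BIC-like stand-in.
        import numpy as np

        def mdl_num_components(X, k_max=None):
            n, d = X.shape
            Xc = X - X.mean(axis=0)
            # Eigenvalues of the sample covariance, largest first.
            eig = np.linalg.eigvalsh(Xc.T @ Xc / n)[::-1]
            eig = np.maximum(eig, 1e-12)
            k_max = k_max or d - 1
            costs = []
            for k in range(1, k_max + 1):
                # Residual variance spread over the discarded directions.
                sigma2 = eig[k:].mean()
                # Gaussian code length for the data under a k-component
                # model (terms constant in k are dropped).
                data_cost = 0.5 * n * (np.sum(np.log(eig[:k]))
                                       + (d - k) * np.log(sigma2))
                # Roughly log(n)/2 nats per free model parameter.
                n_params = d * k - k * (k - 1) / 2 + 1
                model_cost = 0.5 * n_params * np.log(n)
                costs.append(data_cost + model_cost)
            return int(np.argmin(costs)) + 1

    The chosen k minimizes data cost plus model cost; the paper's contribution is a principled NML version of this trade-off, which the sketch above does not implement.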

    Use of principal components to aggregate rare variants in case-control and family-based association studies in the presence of multiple covariates

    Rare variants may help to explain some of the missing heritability of complex diseases. Technological advances in next-generation sequencing give us the opportunity to test this hypothesis. We propose two new methods (one for case-control studies and one for family-based studies) that combine aggregated rare variants and common variants located within a region through principal components analysis and allow for covariate adjustment. We analyzed 200 replicates consisting of 209 case subjects and 488 control subjects and compared the results to weight-based and step-up aggregation methods. The principal components and collapsing method showed an association between the gene FLT1 and the quantitative trait Q1 (P < 10−30) in a fraction of the computation time of the other methods. The proposed family-based test gave inconclusive results. The two methods provide a fast way to simultaneously analyze rare and common variants at the gene level while adjusting for covariates. However, further evaluation of the statistical efficiency of this approach is warranted.
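
    As a hedged sketch of the case-control variant, the "collapse rare variants, take principal components, test with covariate adjustment" pipeline might look as follows; the MAF threshold, number of PCs, and likelihood-ratio test are illustrative assumptions, not the authors' exact choices:

        # Sketch: collapse rare variants into a burden score, run PCA on
        # that score plus the common variants, and test the leading PCs
        # in a covariate-adjusted logistic model.
        import numpy as np
        import statsmodels.api as sm
        from scipy.stats import chi2

        def pc_collapse_test(G, y, covars, maf_threshold=0.01, n_pcs=2):
            maf = G.mean(axis=0) / 2.0               # additive 0/1/2 coding
            rare = G[:, maf < maf_threshold]
            common = G[:, maf >= maf_threshold]
            burden = rare.sum(axis=1, keepdims=True)  # collapsed rare score
            M = np.hstack([burden, common])
            M = (M - M.mean(axis=0)) / (M.std(axis=0) + 1e-12)
            # Leading principal components of the combined matrix.
            U, s, _ = np.linalg.svd(M, full_matrices=False)
            pcs = U[:, :n_pcs] * s[:n_pcs]
            # Likelihood-ratio test of the PCs, adjusting for covariates.
            X0 = sm.add_constant(covars)
            X1 = sm.add_constant(np.hstack([covars, pcs]))
            ll0 = sm.Logit(y, X0).fit(disp=0).llf
            ll1 = sm.Logit(y, X1).fit(disp=0).llf
            return chi2.sf(2 * (ll1 - ll0), df=n_pcs)

    The single gene-level test over PCs is what keeps the per-gene computation cheap relative to variant-by-variant aggregation schemes.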

    A combined linkage, microarray and exome analysis suggests MAP3K11 as a candidate gene for left ventricular hypertrophy

    Background: Electrocardiographic measures of left ventricular hypertrophy (LVH) are used as predictors of cardiovascular risk. We combined linkage and association analyses to discover novel rare genetic variants involved in three such measures and two principal components derived from them. Methods: The study was conducted among participants from the Erasmus Rucphen Family Study (ERF), a Dutch family-based sample from the southwestern Netherlands. Variance components linkage analyses were performed using Merlin. Regions of interest (LOD > 1.9) were fine-mapped using microarray and exome sequence data. Results: We observed one significant LOD score for the second principal component on chromosome 15 (LOD score = 3.01) and 12 suggestive LOD scores. Several loci contained variants identified in GWAS for these traits; however, these did not explain the linkage peaks, nor did other common variants. Exome sequence data identified two associated variants after multiple testing corrections were applied. Conclusions: We did not find common SNPs explaining these linkage signals. Exome sequencing uncovered a relatively rare variant in MAP3K11 on chromosome 11 (MAF = 0.01) that helped account for the suggestive linkage peak observed for the first principal component. Conditional analysis revealed a drop in LOD from 2.01 to 0.88 for MAP3K11, suggesting that this variant may partially explain the linkage signal at this chromosomal location. MAP3K11 is related to the JNK pathway and is a pro-apoptotic kinase that plays an important role in the induction of cardiomyocyte apoptosis in various pathologies, including LVH.
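
    The variance-components linkage step itself (run in Merlin) is beyond a short example, but the phenotype-construction step, deriving principal-component traits from the three ECG measures, can be sketched; trait handling and names here are assumptions:

        # Sketch: derive PC1/PC2 phenotypes from three ECG LVH measures,
        # for use as quantitative traits in a downstream linkage analysis.
        import numpy as np

        def pc_phenotypes(traits):
            """traits: (n_subjects, 3) array of ECG LVH measures."""
            Z = (traits - traits.mean(axis=0)) / traits.std(axis=0)
            # Eigen-decomposition of the trait correlation matrix.
            evals, evecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
            order = np.argsort(evals)[::-1]
            # PC1 and PC2 scores, exported as linkage phenotypes.
            return Z @ evecs[:, order[:2]]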

    Integrating joint feature selection into subspace learning: A formulation of 2DPCA for outliers robust feature selection

    Since principal component analysis and its variants are sensitive to outliers that affect their performance and applicability in the real world, several variants have been proposed to improve robustness. However, most existing methods are still sensitive to outliers and are unable to select useful features. To overcome the sensitivity of PCA to outliers, in this paper we introduce two-dimensional outliers-robust principal component analysis (ORPCA) by imposing joint constraints on the objective function. ORPCA relaxes the orthogonality constraints and penalizes the regression coefficients; it thus selects important features while ignoring features already captured by other principal components. It is well known that the squared Frobenius norm is sensitive to outliers. To overcome this issue, we devise an alternative way to derive the objective function. Experimental results on four publicly available benchmark datasets show the effectiveness of joint feature selection and better performance compared to state-of-the-art dimensionality-reduction methods.
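
    To illustrate the robustness idea (without the paper's joint feature-selection constraints), the following is a generic iteratively reweighted scheme that swaps the squared Frobenius loss for a row-wise L2,1-style loss; it is a sketch of the principle, not ORPCA itself:

        # Sketch: robust PCA via iteratively reweighted least squares.
        # Minimizes sum_i ||x_i - W W^T x_i||_2 (row-wise, unsquared),
        # so single outlying rows no longer dominate the fit.
        import numpy as np

        def robust_pca_l21(X, k, n_iter=20, eps=1e-6):
            Xc = X - X.mean(axis=0)
            w = np.ones(Xc.shape[0])
            for _ in range(n_iter):
                # Weighted covariance down-weights high-residual rows.
                C = (Xc * w[:, None]).T @ Xc
                evals, evecs = np.linalg.eigh(C)
                W = evecs[:, np.argsort(evals)[::-1][:k]]  # top-k directions
                resid = np.linalg.norm(Xc - Xc @ W @ W.T, axis=1)
                w = 1.0 / np.maximum(resid, eps)           # IRLS reweighting
            return W

    Each reweighting step is equivalent to one majorize-minimize step on the L2,1 objective, which is why outlying rows progressively lose influence over the fitted subspace.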

    Two-phase incremental kernel PCA for learning massive or online datasets

    As a powerful nonlinear feature extractor, kernel principal component analysis (KPCA) has been widely adopted in many machine learning applications. However, KPCA is usually performed in a batch mode, leading to potential problems when handling massive or online datasets. To overcome this drawback of KPCA, in this paper we propose a two-phase incremental KPCA (TP-IKPCA) algorithm that can incorporate data into KPCA in an incremental fashion. In the first phase, an incremental algorithm is developed to explicitly express the data in the kernel space. In the second phase, we extend incremental principal component analysis (IPCA) to estimate the kernel principal components. Extensive experiments on both synthetic and real datasets show that the proposed TP-IKPCA produces principal components similar to those of conventional batch-based KPCA but is computationally faster than KPCA and several of its incremental variants. Therefore, our algorithm can be applied to massive or online datasets where batch methods are infeasible.
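
    A hedged sketch of the two-phase pattern, using scikit-learn's Nystroem approximation as a stand-in for the paper's explicit kernel-space expression and IncrementalPCA for the second phase; the kernel, batch sizes, and dimensions are illustrative:

        # Sketch: (1) express data explicitly in an approximate kernel
        # feature space, (2) run incremental PCA over mini-batches.
        # Stand-in components, not the authors' exact construction.
        import numpy as np
        from sklearn.kernel_approximation import Nystroem
        from sklearn.decomposition import IncrementalPCA

        rng = np.random.default_rng(0)
        X = rng.normal(size=(10_000, 20))        # stand-in dataset

        # Phase 1: explicit (approximate) kernel feature map, fitted on
        # a small landmark sample rather than the full kernel matrix.
        feature_map = Nystroem(kernel="rbf", gamma=0.1, n_components=200)
        feature_map.fit(X[:500])

        # Phase 2: incremental PCA over mini-batches of mapped data.
        ipca = IncrementalPCA(n_components=10)
        for batch in np.array_split(X, 20):
            ipca.partial_fit(feature_map.transform(batch))

        kpc_scores = ipca.transform(feature_map.transform(X[:5]))

    Because each phase touches only a landmark sample or a mini-batch at a time, memory stays bounded even when the full n-by-n kernel matrix would not fit, which is the practical point of the batch-free formulation.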
