    High Dimensional Semiparametric Scale-Invariant Principal Component Analysis

    We propose a new high dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA in three aspects: (i) it is robust to modeling assumptions; (ii) it is robust to outliers and data contamination; (iii) it is scale-invariant and yields more interpretable results. We prove that the COCA estimators obtain fast estimation rates and are feature selection consistent when the dimension is nearly exponentially large relative to the sample size. Careful experiments confirm that COCA outperforms sparse PCA on both synthetic and real-world datasets. Comment: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
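The scale-invariance claim can be illustrated with a small sketch. It assumes a rank-based correlation estimate built from Kendall's tau via the transform sin(pi/2 * tau), which is a common choice for Gaussian copula models; the abstract does not say which estimator COCA actually uses, so the functions `kendall_tau` and `copula_correlation` below are hypothetical names for illustration only. The point shown is that strictly increasing marginal transformations leave the estimate unchanged.

```python
import math
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau (tau-a) for two equal-length sequences, assuming no ties."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 for a concordant pair, -1 for a discordant pair
            total += sign(x[i] - x[j]) * sign(y[i] - y[j])
    return 2.0 * total / (n * (n - 1))


def copula_correlation(x, y):
    """Rank-based correlation estimate (assumed estimator, see lead-in)."""
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))


random.seed(0)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.8 * a + 0.6 * random.gauss(0, 1) for a in x]

r_raw = copula_correlation(x, y)
# exp and cubing are strictly increasing, so all pairwise orderings are preserved
r_transformed = copula_correlation([math.exp(a) for a in x], [b ** 3 for b in y])
print(r_raw, r_transformed)
```

A full method in this spirit would presumably assemble such pairwise estimates into a matrix and run (sparse) PCA on it; that step is omitted here.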

    Sparse Median Graphs Estimation in a High Dimensional Semiparametric Model

    In this manuscript, a unified framework for conducting inference on complex aggregated data in high dimensional settings is proposed. The data are assumed to be a collection of multiple non-Gaussian realizations with underlying undirected graphical structures. Utilizing the concept of median graphs in summarizing the commonality across these graphical structures, a novel semiparametric approach to modeling such complex aggregated data is provided along with robust estimation of the median graph, which is assumed to be sparse. The estimator is proved to be consistent in graph recovery and an upper bound on the rate of convergence is given. Experiments on both synthetic and real datasets are conducted to illustrate the empirical usefulness of the proposed models and methods.

    Distribution-Free Tests of Independence in High Dimensions

    We consider the testing of mutual independence among all entries in a $d$-dimensional random vector based on $n$ independent observations. We study two families of distribution-free test statistics, which include Kendall's tau and Spearman's rho as important examples. We show that under the null hypothesis the test statistics of these two families converge weakly to Gumbel distributions, and propose tests that control the type I error in the high-dimensional setting where $d > n$. We further show that the two tests are rate-optimal in terms of power against sparse alternatives, and outperform competitors in simulations, especially when $d$ is large. Comment: to appear in Biometrika.
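As a rough illustration of a maximum-type statistic built from Kendall's tau, the sketch below takes the largest squared pairwise tau, standardized by the usual null variance of tau for tie-free data, 2(2n+5)/(9n(n-1)). This is an assumption about the general shape of such a statistic, not the paper's exact definition; the centering and Gumbel critical values needed for an actual test are deliberately left out, and `max_standardized_tau` is a hypothetical name.

```python
import itertools
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau (tau-a) for two equal-length sequences, assuming no ties."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sign(x[i] - x[j]) * sign(y[i] - y[j])
    return 2.0 * total / (n * (n - 1))


def max_standardized_tau(columns):
    """Largest squared pairwise tau divided by its variance under independence."""
    n = len(columns[0])
    null_var = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))
    return max(
        kendall_tau(a, b) ** 2 / null_var
        for a, b in itertools.combinations(columns, 2)
    )


random.seed(1)
n, d = 30, 50  # more variables than observations, as in the d > n setting
independent = [[random.gauss(0, 1) for _ in range(n)] for _ in range(d)]

# Sparse alternative: make a single pair of variables strongly dependent.
dependent = [list(col) for col in independent]
dependent[1] = [v + 0.05 * random.gauss(0, 1) for v in dependent[0]]

stat_null = max_standardized_tau(independent)
stat_alt = max_standardized_tau(dependent)
print(stat_null, stat_alt)
```

Because the statistic depends only on ranks, its null distribution does not depend on the marginal distributions, which is what makes tests of this kind distribution-free.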

    On a generalized canonical bundle formula for generically finite morphisms

    We prove a canonical bundle formula for generically finite morphisms in the setting of generalized pairs (with $\mathbb{R}$-coefficients). This complements Filipazzi's canonical bundle formula for morphisms with connected fibres. It is then applied to obtain a subadjunction formula for log canonical centers of generalized pairs. As another application, we show that the image of an anti-nef log canonical generalized pair has the structure of a numerically trivial log canonical generalized pair. This readily implies a result of Chen--Zhang. Along the way we prove that the Shokurov type convex sets for anti-nef log canonical divisors are indeed rational polyhedral sets. Comment: 29 pages, to appear in Ann. Inst. Fourier (Grenoble).