High Dimensional Semiparametric Scale-Invariant Principal Component Analysis
We propose a new high dimensional semiparametric principal component analysis
(PCA) method, named Copula Component Analysis (COCA). The semiparametric model
assumes that, after unspecified marginally monotone transformations, the
distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA
in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust
to outliers and data contamination; (iii) It is scale-invariant and yields more
interpretable results. We prove that the COCA estimators obtain fast estimation
rates and are feature selection consistent when the dimension is nearly
exponentially large relative to the sample size. Careful experiments confirm
that COCA outperforms sparse PCA on both synthetic and real-world datasets.
Comment: Accepted in IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI)
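The scale-invariance claim in (iii) can be illustrated with a small rank-based sketch. The abstract does not say which estimator COCA uses, so the code below assumes one standard choice for Gaussian copula models: Kendall's tau mapped through sin(pi/2 * tau) to estimate the latent correlation matrix, followed by plain power iteration for the leading component. The function names are hypothetical, and the sparsity step needed in high dimensions is omitted.

```python
import math
import random


def sign(a, b):
    """Return +1, 0 or -1 according to the order of a and b."""
    return (a > b) - (a < b)


def kendall_tau(x, y):
    """Kendall's tau by direct O(n^2) counting of concordant pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += sign(x[i], x[j]) * sign(y[i], y[j])
    return 2.0 * s / (n * (n - 1))


def rank_correlation_matrix(columns):
    """Latent correlation estimate sin(pi/2 * tau) for each pair of columns."""
    d = len(columns)
    R = [[1.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(j + 1, d):
            r = math.sin(math.pi / 2 * kendall_tau(columns[j], columns[k]))
            R[j][k] = R[k][j] = r
    return R


def leading_eigenvector(R, iters=500):
    """Power iteration; assumes the largest-magnitude eigenvalue is positive."""
    d = len(R)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(R[j][k] * v[k] for k in range(d)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Latent Gaussian data: columns 0 and 1 are correlated, column 2 is independent.
random.seed(0)
n = 200
z0 = [random.gauss(0, 1) for _ in range(n)]
z1 = [0.8 * a + 0.6 * random.gauss(0, 1) for a in z0]
z2 = [random.gauss(0, 1) for _ in range(n)]
latent = [z0, z1, z2]

# Observed data: each column passed through a different increasing transformation.
observed = [
    [math.exp(a) for a in z0],
    [a ** 3 for a in z1],
    [1000.0 * a for a in z2],
]

R_latent = rank_correlation_matrix(latent)
R_observed = rank_correlation_matrix(observed)
v = leading_eigenvector(R_observed)
```

Because the estimate depends on the data only through pairwise orderings, `R_observed` matches `R_latent` even though the three columns are on very different scales. Ordinary PCA on the raw covariance of `observed` would instead be dominated by the rescaled third column.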
Sparse Median Graphs Estimation in a High Dimensional Semiparametric Model
In this manuscript a unified framework for conducting inference on complex
aggregated data in high dimensional settings is proposed. The data are assumed
to be a collection of multiple non-Gaussian realizations with underlying
undirected graphical structures. Utilizing the concept of median graphs in
summarizing the commonality across these graphical structures, a novel
semiparametric approach to modeling such complex aggregated data is provided
along with robust estimation of the median graph, which is assumed to be
sparse. The estimator is proved to be consistent in graph recovery and an upper
bound on the rate of convergence is given. Experiments on both synthetic and
real datasets are conducted to illustrate the empirical usefulness of the
proposed models and methods.
Distribution-Free Tests of Independence in High Dimensions
We consider the testing of mutual independence among all entries in a
$d$-dimensional random vector based on $n$ independent observations. We study
two families of distribution-free test statistics, which include Kendall's tau
and Spearman's rho as important examples. We show that under the null
hypothesis the test statistics of these two families converge weakly to Gumbel
distributions, and propose tests that control the type I error in the
high-dimensional setting where $d > n$. We further show that the two tests are
rate-optimal in terms of power against sparse alternatives, and outperform
competitors in simulations, especially when $d$ is large.
Comment: to appear in Biometrika
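A minimal sketch of a maximum-type test built on Kendall's tau may help make "distribution-free" concrete. It is not the paper's procedure: the abstract calibrates with a Gumbel limit whose normalization is not given here, so this sketch swaps in Monte Carlo calibration. That works because, for continuous marginals, the null distribution of a rank statistic depends only on the sample size and the dimension. The function names, the small sizes `n = 30` and `d = 6`, and the 0.95 level are all illustrative assumptions.

```python
import math
import random


def sign(a, b):
    """Return +1, 0 or -1 according to the order of a and b."""
    return (a > b) - (a < b)


def kendall_tau(x, y):
    """Kendall's tau by direct O(n^2) counting of concordant pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += sign(x[i], x[j]) * sign(y[i], y[j])
    return 2.0 * s / (n * (n - 1))


def max_abs_tau(columns):
    """Largest absolute pairwise Kendall's tau across all column pairs."""
    d = len(columns)
    return max(
        abs(kendall_tau(columns[j], columns[k]))
        for j in range(d)
        for k in range(j + 1, d)
    )


def null_quantile(n, d, level=0.95, reps=200, rng=None):
    """Simulated null quantile, using independent uniform columns.

    Any continuous distribution would give the same null law, because the
    statistic depends on the data only through ranks.
    """
    rng = rng or random.Random(0)
    stats = sorted(
        max_abs_tau([[rng.random() for _ in range(n)] for _ in range(d)])
        for _ in range(reps)
    )
    return stats[min(reps - 1, math.ceil(level * reps) - 1)]


# Sparse alternative: only columns 0 and 1 are dependent.
rng = random.Random(1)
n, d = 30, 6
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [a + 0.1 * rng.gauss(0, 1) for a in x0]
alt = [x0, x1] + [[rng.gauss(0, 1) for _ in range(n)] for _ in range(d - 2)]

crit = null_quantile(n, d)
stat = max_abs_tau(alt)
reject = stat > crit

# Increasing transformations of each coordinate leave the statistic unchanged.
transformed = [[math.exp(a) for a in col] for col in alt]
stat_transformed = max_abs_tau(transformed)
```

Taking the maximum over pairs is what targets sparse alternatives: a single strongly dependent pair is enough to push the statistic past the simulated threshold, regardless of how many other coordinates are independent.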
On a generalized canonical bundle formula for generically finite morphisms
We prove a canonical bundle formula for generically finite morphisms in the
setting of generalized pairs (with $\mathbb{R}$-coefficients). This complements
Filipazzi's canonical bundle formula for morphisms with connected fibres. It is
then applied to obtain a subadjunction formula for log canonical centers of
generalized pairs. As another application, we show that the image of an
anti-nef log canonical generalized pair has the structure of a numerically
trivial log canonical generalized pair. This readily implies a result of
Chen--Zhang. Along the way we prove that the Shokurov type convex sets for
anti-nef log canonical divisors are indeed rational polyhedral sets.
Comment: 29 pages, to appear in Ann. Inst. Fourier (Grenoble)
