
    Sufficient Canonical Correlation Analysis

    Canonical correlation analysis (CCA) finds two subspaces in which Pearson’s correlation coefficients between the projected random vectors are maximized. Because of its well-established theoretical support and relatively efficient computation, CCA is widely used as a joint dimension reduction tool and has been successfully applied to many image processing and computer vision tasks. However, traditional CCA has been reported to suffer from overfitting in many practical cases. In this paper, we propose sufficient CCA (S-CCA), inspired by the theory of sufficient dimension reduction, to relieve CCA’s overfitting problem. The effectiveness of S-CCA is verified both theoretically and experimentally. Experimental results also demonstrate that S-CCA outperforms some of CCA’s popular extensions during the prediction phase, especially when severe overfitting occurs.
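
For readers unfamiliar with the classical CCA that the abstract above builds on, the following is a minimal NumPy sketch of it. It is not S-CCA and is not taken from the paper. The function name `cca` and the small ridge term `reg` are assumptions added here for illustration and numerical stability.

```python
import numpy as np

def cca(X, Y, reg=1e-8):
    """Classical CCA via whitening and SVD (illustrative sketch, not S-CCA).

    X: (n, p) array, Y: (n, q) array with rows as paired samples.
    reg is a hypothetical ridge term added only for numerical stability.
    Returns canonical correlations s and projection matrices A (p, k), B (q, k).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariance and cross-covariance matrices.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    # Singular values of M are the canonical correlations.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Map the singular vectors back to the original coordinates.
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return s, A, B
```

The returned correlations are in-sample values. Comparing them with the correlations of the same projections on held-out data is one simple way to observe the overfitting the abstract refers to.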

    Penalized Orthogonal Iteration for Sparse Estimation of Generalized Eigenvalue Problem

    We propose a new algorithm for sparse estimation of eigenvectors in generalized eigenvalue problems (GEP). The GEP arises in a number of modern data-analytic situations and statistical methods, including principal component analysis (PCA), multiclass linear discriminant analysis (LDA), canonical correlation analysis (CCA), sufficient dimension reduction (SDR) and invariant coordinate selection. We propose to modify the standard generalized orthogonal iteration with a sparsity-inducing penalty for the eigenvectors. To achieve this goal, we generalize the equation-solving step of orthogonal iteration to a penalized convex optimization problem. The resulting algorithm, called penalized orthogonal iteration, provides accurate estimation of the true eigenspace when it is sparse. We also propose a computationally more efficient alternative, which works well for PCA and LDA problems. Numerical studies reveal that the proposed algorithms are competitive and that our tuning procedure works well. We demonstrate applications of the proposed algorithm to obtain sparse estimates for PCA, multiclass LDA, CCA and SDR. Supplementary materials are available online.
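
As background for the abstract above, here is a minimal sketch of the standard (unpenalized) generalized orthogonal iteration for A v = λ B v, assuming A is symmetric positive semidefinite and B is symmetric positive definite. It does not include the paper's sparsity penalty. The function name and parameters are hypothetical, and the linear solve marked in the comments is the "equation-solving step" that the abstract says is generalized to a penalized convex problem.

```python
import numpy as np

def generalized_orthogonal_iteration(A, B, k, n_iter=500, seed=0):
    """Unpenalized orthogonal iteration for A v = lambda B v (illustrative sketch).

    Assumes A is symmetric PSD and B is symmetric positive definite.
    Returns approximate top-k generalized eigenvalues and B-orthonormal vectors.
    """
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    V = rng.normal(size=(p, k))
    for _ in range(n_iter):
        # Equation-solving step: solve B Z = A V for Z.
        Z = np.linalg.solve(B, A @ V)
        # B-orthonormalize Z so that V^T B V = I, using Z^T B Z = L L^T.
        L = np.linalg.cholesky(Z.T @ B @ Z)
        V = np.linalg.solve(L, Z.T).T  # V = Z L^{-T}
    # Rayleigh quotients; V^T B V = I, so no extra normalization is needed.
    eigvals = np.diag(V.T @ A @ V)
    return eigvals, V
```

A fixed iteration count keeps the sketch short; a practical implementation would more likely stop once the subspace spanned by V stabilizes.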

    Distance-Based Independence Screening for Canonical Analysis

    This paper introduces a new method named Distance-based Independence Screening for Canonical Analysis (DISCA) to reduce the dimensions of two random vectors with arbitrary dimensions. The objective of our method is to identify low-dimensional linear projections of two random vectors such that any linear-projection-based reduction to lower dimensions will surely affect some dependence structure: the removed components are not independent. The essence of DISCA is to use the distance correlation to eliminate the "redundant" dimensions until this becomes infeasible. Unlike existing canonical analysis methods, DISCA does not require the dimensions of the reduced subspaces of the two random vectors to be equal, nor does it require particular distributional assumptions on the random vectors. We show that under mild conditions, our approach does uncover the lowest possible linear dependency structures between two random vectors, and our conditions are weaker than those of some sufficient linear subspace-based methods. Numerically, DISCA requires solving a non-convex optimization problem. We formulate it as a difference-of-convex (DC) optimization problem, and then adopt the alternating direction method of multipliers (ADMM) on the convex step of the DC algorithm to parallelize and accelerate the computation. Some sufficient linear subspace-based methods use a potentially computationally intensive bootstrap method to determine the dimensions of the reduced subspaces in advance; our method avoids this complexity. In simulations, we present cases that DISCA can solve effectively while other methods cannot. In both the simulation studies and real data cases, when the other state-of-the-art dimension reduction methods are applicable, we observe that DISCA performs comparably to or better than most of them. Code and an R package are available on GitHub: https://github.com/ChuanpingYu/DISCA
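
The distance correlation mentioned in the abstract above can be illustrated with a short NumPy sketch of the standard sample statistic, based on double-centered pairwise distance matrices. This is only the dependence measure, not the DISCA algorithm or its R package, and the function names are hypothetical.

```python
import numpy as np

def _centered_distances(X):
    """Double-centered matrix of pairwise Euclidean distances between rows of X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (D - D.mean(axis=0, keepdims=True)
              - D.mean(axis=1, keepdims=True) + D.mean())

def distance_correlation(X, Y):
    """Sample distance correlation between paired samples X and Y (sketch)."""
    A = _centered_distances(X)
    B = _centered_distances(Y)
    # Squared sample distance covariance; clipped at 0 to absorb rounding error.
    dcov2 = max((A * B).mean(), 0.0)
    # Geometric mean of the two squared sample distance variances.
    dvar2 = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if dvar2 == 0 else float(np.sqrt(dcov2 / dvar2))
```

Unlike Pearson's correlation, this statistic can detect nonlinear dependence, for example between a symmetric variable x and x**2, and X and Y may have different dimensions.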

    Multi-dimensional Virtual Values and Second-degree Price Discrimination

    We consider a multi-dimensional screening problem of selling a product with multiple quality levels and design virtual value functions to derive conditions that imply the optimality of selling only the highest quality. A challenge in designing virtual values for multi-dimensional agents is that a mechanism that pointwise optimizes virtual values resulting from a general application of integration by parts is not incentive compatible, and no general methodology is known for selecting the right paths for integration by parts. We resolve this issue by first uniquely solving for paths that satisfy certain necessary conditions that the pointwise optimality of the mechanism imposes on virtual values, and then identifying distributions that ensure the resulting virtual surplus is indeed pointwise optimized by the mechanism. Our method of solving for virtual values is general, and as a second application we use it to derive conditions of optimality for selling only the grand bundle of items to an agent with additive preferences.

    Image patch analysis of sunspots and active regions. I. Intrinsic dimension and correlation analysis

    The flare productivity of an active region is observed to be related to its spatial complexity. Mount Wilson or McIntosh sunspot classifications measure such complexity, but in a categorical way, and may therefore not use all the information present in the observations. Moreover, such categorical schemes hinder a systematic study of, for example, an active region's evolution. We propose fine-scale quantitative descriptors for an active region's complexity and relate them to the Mount Wilson classification. We analyze the local correlation structure within continuum and magnetogram data, as well as the cross-correlation between continuum and magnetogram data. We compute the intrinsic dimension, partial correlation, and canonical correlation analysis (CCA) of image patches of continuum and magnetogram active region images taken from the SOHO-MDI instrument. We use masks of sunspots derived from continuum as well as larger masks of magnetic active regions derived from the magnetogram to analyze the core part of an active region separately from its surrounding part. We find a relationship between the complexity of an active region as measured by Mount Wilson and the intrinsic dimension of its image patches. Partial correlation patterns exhibit approximately a third-order Markov structure. CCA reveals different patterns of correlation between continuum and magnetogram within the sunspots and in the region surrounding the sunspots. These results also pave the way for patch-based dictionary learning with a view towards automatic clustering of active regions. Comment: Accepted for publication in the Journal of Space Weather and Space Climate (SWSC). 23 pages, 11 figure

    Generalized resolution for orthogonal arrays

    The generalized word length pattern of an orthogonal array allows a ranking of orthogonal arrays in terms of the generalized minimum aberration criterion (Xu and Wu [Ann. Statist. 29 (2001) 1066-1077]). We provide a statistical interpretation for the number of shortest words of an orthogonal array in terms of sums of R^2 values (based on orthogonal coding) or sums of squared canonical correlations (based on arbitrary coding). Directly related to these results, we derive two versions of generalized resolution for qualitative factors, both of which are generalizations of the generalized resolution by Deng and Tang [Statist. Sinica 9 (1999) 1071-1082] and Tang and Deng [Ann. Statist. 27 (1999) 1914-1926]. We provide a sufficient condition for one of these to attain its upper bound, and we provide explicit upper bounds for two classes of symmetric designs. Factor-wise generalized resolution values provide useful additional detail. Comment: Published at http://dx.doi.org/10.1214/14-AOS1205 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org