Sufficient Canonical Correlation Analysis
Canonical correlation analysis (CCA) is an effective
way to find two appropriate subspaces in which Pearson’s correlation
coefficients are maximized between projected random vectors.
Due to its well-established theoretical support and relatively
efficient computation, CCA is widely used as a joint dimension
reduction tool and has been successfully applied to many image
processing and computer vision tasks. However, as reported,
the traditional CCA suffers from overfitting in many practical
cases. In this paper, we propose sufficient CCA (S-CCA), inspired by
the theory of sufficient dimension reduction, to relieve CCA’s
overfitting problem. The effectiveness of S-CCA
is verified both theoretically and experimentally. Experimental
results also demonstrate that our S-CCA outperforms some of
CCA’s popular extensions during the prediction phase, especially
when severe overfitting occurs.
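
As a point of reference for the abstract above, classical (not sufficient) CCA can be sketched in a few lines of NumPy. This is a minimal illustration, not the S-CCA method: the function name `canonical_correlations` and the small ridge term `reg` are assumptions introduced here for numerical stability.

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-8):
    """Classical CCA for samples X (n x p) and Y (n x q).

    Returns the canonical correlations s and direction matrices A, B.
    `reg` is a hypothetical ridge term, not part of textbook CCA.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    # Whiten both sides with Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = np.linalg.solve(Lx.T, U)     # directions for X
    B = np.linalg.solve(Ly.T, Vt.T)  # directions for Y
    return s, A, B
```

When the sample size is small relative to p and q, the leading values in `s` tend to be close to 1 even for unrelated data, which is the overfitting behaviour the abstract refers to.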
Penalized Orthogonal Iteration for Sparse Estimation of Generalized Eigenvalue Problem
We propose a new algorithm for sparse estimation of eigenvectors in
generalized eigenvalue problems (GEP). The GEP arises in a number of modern
data-analytic situations and statistical methods, including principal component
analysis (PCA), multiclass linear discriminant analysis (LDA), canonical
correlation analysis (CCA), sufficient dimension reduction (SDR) and invariant
co-ordinate selection. We propose to modify the standard generalized orthogonal
iteration with a sparsity-inducing penalty for the eigenvectors. To achieve
this goal, we generalize the equation-solving step of orthogonal iteration to a
penalized convex optimization problem. The resulting algorithm, called
penalized orthogonal iteration, provides accurate estimation of the true
eigenspace, when it is sparse. Also proposed is a computationally more
efficient alternative, which works well for PCA and LDA problems. Numerical
studies reveal that the proposed algorithms are competitive, and that our
tuning procedure works well. We demonstrate applications of the proposed
algorithm to obtain sparse estimates for PCA, multiclass LDA, CCA and SDR.
Supplementary materials are available online.
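
For orientation, the unpenalized starting point named in the abstract, generalized orthogonal iteration for A v = lambda B v, can be sketched as follows. This is not the authors' penalized algorithm: the function names, the `lam` parameter, and the entrywise soft-thresholding step are assumptions used here as a crude stand-in for the penalized convex problem that the paper solves in place of the linear solve.

```python
import numpy as np

def soft_threshold(W, lam):
    # Entrywise shrinkage toward zero; hypothetical stand-in for a sparsity penalty.
    return np.sign(W) * np.maximum(np.abs(W) - lam, 0.0)

def generalized_orthogonal_iteration(A, B, k, lam=0.0, n_iter=200, seed=0):
    """Approximate the leading k-dimensional eigenspace of A v = lambda B v.

    With lam=0 this is plain generalized orthogonal iteration.
    With lam>0 an illustrative thresholding step is applied (assumption).
    """
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.normal(size=(A.shape[0], k)))
    for _ in range(n_iter):
        W = np.linalg.solve(B, A @ V)  # equation-solving step: B W = A V
        W = soft_threshold(W, lam)     # illustrative sparsity step (assumption)
        if not np.any(W):              # everything shrunk to zero: stop early
            break
        V, _ = np.linalg.qr(W)         # re-orthonormalize the basis
    return V
```

With `lam=0` the returned columns span the dominant eigenspace of B^{-1}A; any positive `lam` trades accuracy for sparsity and would need tuning.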
Distance-Based Independence Screening for Canonical Analysis
This paper introduces a new method named Distance-based Independence
Screening for Canonical Analysis (DISCA) to reduce dimensions of two random
vectors with arbitrary dimensions. The objective of our method is to identify
the low dimensional linear projections of two random vectors, such that any
dimension reduction based on linear projection with lower dimensions will
surely affect some dependent structure -- the removed components are not
independent. The essence of DISCA is to use the distance correlation to
eliminate the "redundant" dimensions until infeasible. Unlike the existing
canonical analysis methods, DISCA does not require the dimensions of the
reduced subspaces of the two random vectors to be equal, nor does it require
specific distributional assumptions on the random vectors. We show that, under
mild conditions, our approach uncovers the lowest possible linear
dependency structures between two random vectors, and our conditions are weaker
than those of some sufficient linear subspace-based methods. Numerically, DISCA
requires solving a non-convex optimization problem. We formulate it as a
difference-of-convex (DC) optimization problem, and then further adopt the
alternating direction method of multipliers (ADMM) on the convex step of the DC
algorithms to parallelize/accelerate the computation. Some sufficient linear
subspace-based methods use a potentially computationally intensive bootstrap
method to determine the dimensions of the reduced subspaces in advance; our method
avoids this complexity. In simulations, we present cases that DISCA can solve
effectively, while other methods cannot. In both the simulation studies and
real data cases, when the other state-of-the-art dimension reduction methods
are applicable, we observe that DISCA performs comparably to or better than
most of them. Code and an R package can be found on GitHub:
https://github.com/ChuanpingYu/DISCA
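
The distance correlation that DISCA relies on can be illustrated with the standard sample (V-statistic) estimator. The sketch below is not taken from the DISCA package; the function names are hypothetical, and the O(n^2) distance matrices make it suitable only for small samples.

```python
import numpy as np

def _centered_distances(Z):
    # Pairwise Euclidean distances, double-centered (rows, columns, grand mean).
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()

def distance_correlation(X, Y):
    """Sample distance correlation between paired samples X and Y."""
    A = _centered_distances(X)
    B = _centered_distances(Y)
    dcov2 = (A * B).mean()  # squared sample distance covariance
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))
```

Unlike Pearson's correlation, this quantity detects nonlinear dependence: for x symmetric about zero, `distance_correlation(x, x**2)` is clearly positive even though x and x**2 are uncorrelated.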
Multi-dimensional Virtual Values and Second-degree Price Discrimination
We consider a multi-dimensional screening problem of selling a product with
multiple quality levels and design virtual value functions to derive conditions
that imply optimality of selling only the highest quality. A challenge of designing
virtual values for multi-dimensional agents is that a mechanism that pointwise
optimizes virtual values resulting from a general application of integration by
parts is not incentive compatible, and no general methodology is known for
selecting the right paths for integration by parts. We resolve this issue by
first uniquely solving for paths that satisfy certain necessary conditions that
the pointwise optimality of the mechanism imposes on virtual values, and then
identifying distributions that ensure the resulting virtual surplus is indeed
pointwise optimized by the mechanism. Our method of solving for virtual values
is general, and as a second application we use it to derive conditions of
optimality for selling only the grand bundle of items to an agent with additive
preferences.
Image patch analysis of sunspots and active regions. I. Intrinsic dimension and correlation analysis
The flare-productivity of an active region is observed to be related to its
spatial complexity. Mount Wilson or McIntosh sunspot classifications measure
such complexity but in a categorical way, and may therefore not use all the
information present in the observations. Moreover, such categorical schemes
hinder, for example, a systematic study of an active region's evolution. We
propose fine-scale quantitative descriptors for an active region's complexity
and relate them to the Mount Wilson classification. We analyze the local
correlation structure within continuum and magnetogram data, as well as the
cross-correlation between continuum and magnetogram data. We compute the
intrinsic dimension, partial correlation, and canonical correlation analysis
(CCA) of image patches of continuum and magnetogram active region images taken
from the SOHO-MDI instrument. We use masks of sunspots derived from continuum
as well as larger masks of magnetic active regions derived from the magnetogram
to analyze separately the core part of an active region from its surrounding
part. We find a relationship between the complexity of an active region as
measured by Mount Wilson and the intrinsic dimension of its image patches.
Partial correlation patterns exhibit approximately a third-order Markov
structure. CCA reveals different patterns of correlation between continuum and
magnetogram within the sunspots and in the region surrounding the sunspots.
These results also pave the way for patch-based dictionary learning with a view
towards automatic clustering of active regions.
Comment: Accepted for publication in the Journal of Space Weather and Space
Climate (SWSC). 23 pages, 11 figures
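
The partial correlations mentioned in the abstract above can be read off the inverse covariance (precision) matrix. The sketch below is a generic illustration, not the authors' image-patch pipeline; the function name is hypothetical, and each column of `X` is assumed to be one variable (for example, one pixel of a vectorized patch).

```python
import numpy as np

def partial_correlations(X):
    """Partial correlation of each pair of columns of X given all other columns."""
    S = np.cov(X, rowvar=False)
    P = np.linalg.pinv(S)       # precision matrix (pseudo-inverse for robustness)
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)     # rho_ij = -P_ij / sqrt(P_ii * P_jj)
    np.fill_diagonal(R, 1.0)
    return R

# Usage: a first-order Markov chain x1 -> x2 -> x3 (synthetic data, assumption).
rng = np.random.default_rng(0)
x1 = rng.normal(size=5000)
x2 = x1 + rng.normal(size=5000)
x3 = x2 + rng.normal(size=5000)
R = partial_correlations(np.column_stack([x1, x2, x3]))
# R[0, 2] is near zero: x1 and x3 are conditionally independent given x2,
# even though their marginal correlation is substantial.
```

A Markov structure of order m shows up in the same way: partial correlations vanish, approximately, between variables more than m steps apart.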
Generalized resolution for orthogonal arrays
The generalized word length pattern of an orthogonal array allows a ranking
of orthogonal arrays in terms of the generalized minimum aberration criterion
(Xu and Wu [Ann. Statist. 29 (2001) 1066-1077]). We provide a statistical
interpretation for the number of shortest words of an orthogonal array in terms
of sums of R^2 values (based on orthogonal coding) or sums of squared
canonical correlations (based on arbitrary coding). Directly related to these
results, we derive two versions of generalized resolution for qualitative
factors, both of which are generalizations of the generalized resolution by
Deng and Tang [Statist. Sinica 9 (1999) 1071-1082] and Tang and Deng [Ann.
Statist. 27 (1999) 1914-1926]. We provide a sufficient condition for one of
these to attain its upper bound, and we provide explicit upper bounds for two
classes of symmetric designs. Factor-wise generalized resolution values provide
useful additional detail.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1205 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)