Canonical correlation analysis (CCA) describes the associations between two
sets of variables by maximizing the correlation between linear combinations of
the variables in each data set. However, in high-dimensional settings, where the
number of variables exceeds the sample size, or when the variables are highly
correlated, traditional CCA is no longer appropriate because the sample covariance
matrices it relies on are singular or ill-conditioned. This paper proposes a
method for sparse CCA. Sparse estimation produces linear combinations of only a
subset of variables from each data set, thereby increasing the interpretability
of the canonical variates. We consider the CCA problem from a predictive point
of view and recast it into a regression framework. By combining an alternating
regression approach with a lasso penalty, we induce sparsity in the canonical
vectors. We compare the performance of the proposed method with that of other
sparse CCA techniques in different simulation settings and illustrate its
usefulness on a genomic data set.
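To make the alternating-regression idea concrete, here is a minimal sketch of the general technique for the first canonical pair, assuming standardized data, fixed penalty parameters `lam_x` and `lam_y`, and a simple initialization (the column of Y most correlated with any column of X). The function names (`lasso_cd`, `sparse_cca_alternating`) and these choices are hypothetical illustrations, not the paper's exact algorithm. Each step regresses the current Y-variate on X with a lasso penalty to get sparse X-weights, then regresses the resulting X-variate on Y to get sparse Y-weights, rescaling each variate to unit variance.

```python
import numpy as np

def soft_threshold(z, lam):
    # Lasso shrinkage operator: sign(z) * max(|z| - lam, 0)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def standardize(M):
    # Center columns and scale to unit (population) variance
    M = M - M.mean(axis=0)
    sd = M.std(axis=0)
    sd[sd == 0] = 1.0
    return M / sd

def lasso_cd(X, y, lam, max_sweeps=500, tol=1e-8):
    # Coordinate descent for (1/(2n))||y - X beta||^2 + lam * ||beta||_1
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(max_sweeps):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)  # keep residual in sync
                beta[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return beta

def sparse_cca_alternating(X, Y, lam_x, lam_y, max_iter=100, tol=1e-6):
    # Hypothetical sketch: first sparse canonical pair via alternating lasso fits
    X, Y = standardize(X), standardize(Y)
    n, q = Y.shape
    # Assumed initialization: Y column with the largest |correlation| to any X column
    C = np.abs(X.T @ Y) / n
    j = np.unravel_index(np.argmax(C), C.shape)[1]
    b = np.zeros(q)
    b[j] = 1.0
    a = np.zeros(X.shape[1])
    for _ in range(max_iter):
        a_new = lasso_cd(X, Y @ b, lam_x)       # sparse X-weights given Y-variate
        su = (X @ a_new).std()
        if su == 0:
            raise ValueError("lam_x too large: all X-weights are zero")
        a_new /= su                             # X-variate gets unit variance
        b_new = lasso_cd(Y, X @ a_new, lam_y)   # sparse Y-weights given X-variate
        sv = (Y @ b_new).std()
        if sv == 0:
            raise ValueError("lam_y too large: all Y-weights are zero")
        b_new /= sv                             # Y-variate gets unit variance
        done = max(np.abs(a_new - a).max(), np.abs(b_new - b).max()) < tol
        a, b = a_new, b_new
        if done:
            break
    rho = np.corrcoef(X @ a, Y @ b)[0, 1]       # estimated canonical correlation
    return a, b, rho
```

In use, one would call `a, b, rho = sparse_cca_alternating(X, Y, lam_x=0.2, lam_y=0.2)` and read off which entries of `a` and `b` are nonzero. In practice the penalties would be chosen by a data-driven criterion such as cross-validation rather than fixed by hand; this sketch leaves that step out.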