A Quadratically Regularized Functional Canonical Correlation Analysis for Identifying the Global Structure of Pleiotropy with NGS Data
Investigating the pleiotropic effects of genetic variants can increase
statistical power, provide important information to achieve deep understanding
of the complex genetic structures of disease, and offer powerful tools for
designing effective treatments with fewer side effects. However, the current
multiple phenotype association analysis paradigm lacks breadth (number of
phenotypes and genetic variants jointly analyzed at the same time) and depth
(hierarchical structure of phenotypes and genotypes). A key issue for
high-dimensional pleiotropic analysis is to effectively extract informative internal
representations and features from high-dimensional genotype and phenotype data.
To explore multiple levels of representations of genetic variants, learn their
internal patterns involved in disease development, and overcome critical
barriers in advancing the development of novel statistical methods and
computational algorithms for genetic pleiotropic analysis, we propose a new
framework referred to as a quadratically regularized functional CCA (QRFCCA)
for association analysis which combines three approaches: (1) quadratically
regularized matrix factorization, (2) functional data analysis and (3)
canonical correlation analysis (CCA). Large-scale simulations show that
QRFCCA has much higher power than the nine competing statistics while
maintaining appropriate type 1 error rates. To further evaluate performance,
QRFCCA and nine other statistics are applied to the whole genome sequencing
dataset from the TwinsUK study. We identify a total of 79 genes with rare
variants and 67 genes with common variants significantly associated with the 46
traits using QRFCCA. The results show that the QRFCCA substantially outperforms
the nine other statistics.
Comment: 64 pages including 12 figures
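As an illustration of the first ingredient listed in the abstract above, the following is a minimal sketch of a generic quadratically regularized matrix factorization, fitted by alternating ridge updates. The abstract does not state the exact objective or solver used in QRFCCA, so the objective below, the function names `quad_reg_objective` and `quad_reg_factorize`, and the parameters `rank`, `lam`, and `n_iter` are all assumptions made for illustration only.

```python
import numpy as np


def quad_reg_objective(X, A, B, lam):
    """Assumed objective: ||X - A B||_F^2 + lam * (||A||_F^2 + ||B||_F^2)."""
    resid = X - A @ B
    return np.sum(resid ** 2) + lam * (np.sum(A ** 2) + np.sum(B ** 2))


def quad_reg_factorize(X, rank, lam=0.1, n_iter=50, seed=0):
    """Alternate exact ridge solves for A (with B fixed) and B (with A fixed)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = rng.standard_normal((n, rank))
    B = rng.standard_normal((rank, p))
    eye = np.eye(rank)
    history = []
    for _ in range(n_iter):
        # With B fixed, the minimizer is A = X B^T (B B^T + lam I)^{-1}.
        A = np.linalg.solve(B @ B.T + lam * eye, B @ X.T).T
        # With A fixed, the minimizer is B = (A^T A + lam I)^{-1} A^T X.
        B = np.linalg.solve(A.T @ A + lam * eye, A.T @ X)
        history.append(quad_reg_objective(X, A, B, lam))
    return A, B, history


# Hypothetical toy data: a noisy rank-3 matrix (not data from the paper).
rng = np.random.default_rng(42)
X_demo = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 25))
X_demo += 0.05 * rng.standard_normal(X_demo.shape)
A_demo, B_demo, hist_demo = quad_reg_factorize(X_demo, rank=3, lam=0.5)
print("objective after first and last sweep:", hist_demo[0], hist_demo[-1])
```

Because each update solves its subproblem exactly, the recorded objective cannot increase from one sweep to the next. The rows of `A_demo` would then serve as low-dimensional features; how the paper feeds such features into functional data analysis and CCA is not specified in the abstract.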
A Comparison of Relaxations of Multiset Canonical Correlation Analysis and Applications
Canonical correlation analysis is a statistical technique that is used to
find relations between two sets of variables. An important extension in pattern
analysis is to consider more than two sets of variables. This problem can be
expressed as a quadratically constrained quadratic program (QCQP), commonly
referred to as Multi-set Canonical Correlation Analysis (MCCA). This is a
non-convex problem and so greedy algorithms converge to local optima without
any guarantees on global optimality. In this paper, we show that despite being
highly structured, finding the optimal solution is NP-Hard. This motivates our
relaxation of the QCQP to a semidefinite program (SDP). The SDP is convex, can
be solved reasonably efficiently and comes with both absolute and
output-sensitive approximation quality. In addition to theoretical guarantees,
we perform an extensive comparison of the QCQP method and the SDP relaxation on a
variety of synthetic and real-world data. Finally, we present two useful
extensions: incorporating kernel methods and computing multiple sets of
canonical vectors.
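The two-set problem that MCCA generalizes can be sketched in a few lines. This is a minimal NumPy version of classical two-view CCA computed by whitening each view and taking an SVD. It is not the QCQP or SDP method of the paper. The function name `cca` and the small ridge term `reg` (added for numerical stability) are illustrative assumptions.

```python
import numpy as np


def cca(X, Y, reg=1e-8):
    """Two-view CCA. Returns canonical correlations and projection matrices."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample covariance blocks; reg keeps the Cholesky factorizations stable.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Cxx)  # Cxx = Lx Lx^T
    Ly = np.linalg.cholesky(Cyy)  # Cyy = Ly Ly^T
    # Whitened cross-covariance T = Lx^{-1} Cxy Ly^{-T}.
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the singular vectors back to the original coordinates.
    Wx = np.linalg.solve(Lx.T, U)
    Wy = np.linalg.solve(Ly.T, Vt.T)
    return s, Wx, Wy


# Hypothetical toy data: Y is a noisy linear function of X.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((500, 3))
Y_demo = X_demo @ rng.standard_normal((3, 2)) + 0.1 * rng.standard_normal((500, 2))
s_demo, Wx_demo, Wy_demo = cca(X_demo, Y_demo)
print("canonical correlations:", s_demo)
```

The singular values of the whitened cross-covariance are the canonical correlations. With more than two views there is no single SVD of this kind, which is where the QCQP formulation described in the abstract comes in.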
Linking Image and Text with 2-Way Nets
Linking two data sources is a basic building block in numerous computer
vision problems. Canonical Correlation Analysis (CCA) achieves this by
utilizing a linear optimizer in order to maximize the correlation between the
two views. Recent work makes use of non-linear models, including deep learning
techniques, that optimize the CCA loss in some feature space. In this paper, we
introduce a novel, bi-directional neural network architecture for the task of
matching vectors from two data sources. Our approach employs two tied neural
network channels that project the two views into a common, maximally correlated
space using the Euclidean loss. We show a direct link between the
correlation-based loss and Euclidean loss, enabling the use of Euclidean loss
for correlation maximization. To overcome common Euclidean regression
optimization problems, we adapt well-known techniques to our problem,
including batch normalization and dropout. We show state-of-the-art results on
a number of computer vision matching tasks including MNIST image matching and
sentence-image matching on the Flickr8k, Flickr30k and COCO datasets.
Comment: 14 pages, 2 figures, 6 tables
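The link between a Euclidean loss and correlation mentioned in the abstract above can be illustrated with an elementary identity: for two variables standardized to zero mean and unit variance, the mean squared difference equals 2(1 - r), where r is their Pearson correlation. The sketch below checks this numerically. It may not match the paper's own derivation, and the function names `standardize` and `mean_sq_distance_and_corr` are hypothetical.

```python
import numpy as np


def standardize(v):
    """Shift to zero mean and scale to unit (population) variance."""
    return (v - v.mean()) / v.std()


def mean_sq_distance_and_corr(x, y):
    """Return (mean squared difference of standardized x and y, Pearson r)."""
    xs, ys = standardize(x), standardize(y)
    dist = np.mean((xs - ys) ** 2)
    r = np.corrcoef(x, y)[0, 1]
    return dist, r


# Hypothetical toy data, not taken from the paper.
rng = np.random.default_rng(0)
x_demo = rng.standard_normal(1000)
y_demo = 0.6 * x_demo + 0.8 * rng.standard_normal(1000)
dist_demo, r_demo = mean_sq_distance_and_corr(x_demo, y_demo)
print("mean squared distance:", dist_demo)
print("2 * (1 - r):          ", 2 * (1 - r_demo))
```

Under this normalization, minimizing the Euclidean distance between the two projected views is equivalent to maximizing their correlation, which is consistent with the role the abstract gives to batch normalization.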