10,748 research outputs found
Sparse logistic principal components analysis for binary data
We develop a new principal components analysis (PCA) type dimension reduction
method for binary data. Different from the standard PCA which is defined on the
observed data, the proposed PCA is defined on the logit transform of the
success probabilities of the binary observations. Sparsity is introduced to the
principal component (PC) loading vectors for enhanced interpretability and more
stable extraction of the principal components. Our sparse PCA is formulated as
solving an optimization problem with a criterion function motivated from a
penalized Bernoulli likelihood. A Majorization--Minimization algorithm is
developed to efficiently solve the optimization problem. The effectiveness of
the proposed sparse logistic PCA method is illustrated by application to a
single nucleotide polymorphism data set and a simulation study.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS327 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Bayesian Inference on Matrix Manifolds for Linear Dimensionality Reduction
We reframe linear dimensionality reduction as a problem of Bayesian inference
on matrix manifolds. This natural paradigm extends the Bayesian framework to
dimensionality reduction tasks in higher dimensions with simpler models at
greater speeds. Here an orthogonal basis is treated as a single point on a
manifold and is associated with a linear subspace on which observations vary
maximally. Throughout this paper, we employ the Grassmann and Stiefel manifolds
for various dimensionality reduction problems, explore the connection between
the two manifolds, and use Hybrid Monte Carlo for posterior sampling on the
Grassmannian for the first time. We delineate in which situations either
manifold should be considered. Further, matrix manifold models are used to
yield scientific insight in the context of cognitive neuroscience, and we
conclude that our methods are suitable for basic inference as well as accurate
prediction.Comment: All datasets and computer programs are publicly available at
http://www.ics.uci.edu/~babaks/Site/Codes.htm
Gene Function Classification Using Bayesian Models with Hierarchy-Based Priors
We investigate the application of hierarchical classification schemes to the
annotation of gene function based on several characteristics of protein
sequences including phylogenic descriptors, sequence based attributes, and
predicted secondary structure. We discuss three Bayesian models and compare
their performance in terms of predictive accuracy. These models are the
ordinary multinomial logit (MNL) model, a hierarchical model based on a set of
nested MNL models, and a MNL model with a prior that introduces correlations
between the parameters for classes that are nearby in the hierarchy. We also
provide a new scheme for combining different sources of information. We use
these models to predict the functional class of Open Reading Frames (ORFs) from
the E. coli genome. The results from all three models show substantial
improvement over previous methods, which were based on the C5 algorithm. The
MNL model using a prior based on the hierarchy outperforms both the
non-hierarchical MNL model and the nested MNL model. In contrast to previous
attempts at combining these sources of information, our approach results in a
higher accuracy rate when compared to models that use each data source alone.
Together, these results show that gene function can be predicted with higher
accuracy than previously achieved, using Bayesian models that incorporate
suitable prior information
- …