Joint Nonparametric Precision Matrix Estimation with Confounding
We consider the problem of precision matrix estimation where, due to
extraneous confounding of the underlying precision matrix, the data are
independent but not identically distributed. While such confounding occurs in
many scientific problems, our approach is inspired by recent neuroscientific
research suggesting that brain function, as measured using functional magnetic
resonance imaging (fMRI), is susceptible to confounding by physiological noise
such as breathing and subject motion. Following the scientific motivation, we
propose a graphical model, which in turn motivates a joint nonparametric
estimator. We provide theoretical guarantees for the consistency and the
convergence rate of the proposed estimator. In addition, we demonstrate that
the optimization of the proposed estimator can be transformed into a series of
linear programming problems, and thus be efficiently solved in parallel.
Empirical results on simulated and real brain imaging data suggest that our
approach improves precision matrix estimation over baseline methods when
confounding is present.
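The abstract notes that the estimator's optimization can be decomposed into a series of linear programs solved in parallel. As a hedged illustration of this general idea (not the paper's estimator), the sketch below recovers one column of a precision matrix CLIME-style: minimize the l1 norm of the column subject to an l-infinity constraint on the residual against the sample covariance, written as a standard LP. The function name `clime_column` and the tolerance `lam` are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def clime_column(S, j, lam):
    """Estimate column j of a precision matrix by solving the LP
    min ||b||_1  s.t.  ||S b - e_j||_inf <= lam  (a CLIME-style sketch)."""
    p = S.shape[0]
    e = np.zeros(p)
    e[j] = 1.0
    # Split b = u - v with u, v >= 0, so ||b||_1 = sum(u) + sum(v).
    c = np.ones(2 * p)
    # |S(u - v) - e|_inf <= lam becomes two stacked inequality blocks.
    A_ub = np.vstack([np.hstack([S, -S]), np.hstack([-S, S])])
    b_ub = np.concatenate([lam + e, lam - e])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
S = np.cov(X, rowvar=False) + 0.05 * np.eye(5)  # regularized sample covariance
col0 = clime_column(S, 0, lam=0.1)
```

Each column is an independent LP of this form, which is why the full estimate parallelizes naturally across columns.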
Structure Learning in Graphical Modeling
A graphical model is a statistical model that is associated with a graph whose
nodes correspond to variables of interest. The edges of the graph reflect
allowed conditional dependencies among the variables. Graphical models admit
computationally convenient factorization properties and have long been a
valuable tool for tractable modeling of multivariate distributions. More
recently, applications such as reconstructing gene regulatory networks from
gene expression data have driven major advances in structure learning, that is,
estimating the graph underlying a model. We review some of these advances and
discuss methods such as the graphical lasso and neighborhood selection for
undirected graphical models (or Markov random fields), and the PC algorithm and
score-based search methods for directed graphical models (or Bayesian
networks). We further review extensions that account for effects of latent
variables and heterogeneous data sources.
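The graphical lasso mentioned above estimates a sparse precision matrix whose nonzero off-diagonal entries define the edges of the undirected graph. A minimal sketch using scikit-learn's implementation, with a chain-structured true precision matrix as toy data (the threshold `1e-3` for edge recovery is an illustrative choice):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
# Chain-structured true precision matrix: nonzeros only on the tridiagonal,
# i.e. the true graph is the path 0-1-2-3-4.
p = 5
Theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta), size=500)

model = GraphicalLasso(alpha=0.1).fit(X)
est = model.precision_         # sparse estimate of Theta
edges = np.abs(est) > 1e-3     # recovered adjacency (thresholded)
```

Neighborhood selection takes a different route to the same graph: it regresses each variable on all others with the lasso and reads off edges from the nonzero regression coefficients.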
Sparse Feature Selection in Kernel Discriminant Analysis via Optimal Scoring
We consider the two-group classification problem and propose a kernel
classifier based on the optimal scoring framework. Unlike previous approaches,
we provide theoretical guarantees on the expected risk consistency of the
method. We also allow for feature selection by imposing structured sparsity
using weighted kernels. We propose fully-automated methods for selection of all
tuning parameters, and in particular adapt kernel shrinkage ideas for ridge
parameter selection. Numerical studies demonstrate the superior classification
performance of the proposed approach compared to existing nonparametric
classifiers.
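The optimal scoring framework reduces two-group discriminant analysis to a regression problem: assign each class a numeric score and regress the scores on the features. A hedged sketch of that reduction with an off-the-shelf kernel ridge regressor (this is a stand-in for the idea, not the paper's weighted-kernel estimator with structured sparsity; the scores ±1 and all tuning values are illustrative):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Two well-separated Gaussian groups as toy data.
rng = np.random.default_rng(1)
X0 = rng.standard_normal((100, 2)) - 1.5
X1 = rng.standard_normal((100, 2)) + 1.5
X = np.vstack([X0, X1])
y = np.r_[-np.ones(100), np.ones(100)]  # class scores for the two groups

# Kernel ridge regression on the scores; classify by the sign of the fit.
clf = KernelRidge(kernel="rbf", alpha=1.0).fit(X, y)
pred = np.sign(clf.predict(X))
acc = np.mean(pred == y)
```

The ridge parameter `alpha` is exactly the kind of tuning parameter the abstract says is selected automatically via kernel shrinkage ideas.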
Unsupervised and Supervised Structure Learning for Protein Contact Prediction
Protein contacts provide key information for the understanding of protein
structure and function, and therefore contact prediction from sequences is an
important problem. Recent research shows that correctly predicted long-range
contacts can aid topology-level structure modeling, and the success of
contact-assisted protein folding further underscores the importance of the
problem. In this thesis, I will briefly introduce related prior work, then show
how to perform contact prediction through unsupervised graphical models with
topology constraints. I will then explain how supervised deep learning methods
further boost the accuracy of contact prediction. Finally, I will propose a
scoring system, the diversity score, to measure the novelty of contact
predictions, along with an algorithm that predicts contacts with respect to
this new scoring system.