Latent Conjunctive Bayesian Network: Unify Attribute Hierarchy and Bayesian Network for Cognitive Diagnosis
Cognitive diagnostic assessment aims to measure specific knowledge structures
in students. To model data arising from such assessments, cognitive diagnostic
models with discrete latent variables have gained popularity in educational and
behavioral sciences. In a learning context, the latent variables often denote
sequentially acquired skill attributes, which are often modeled by the so-called
attribute hierarchy method. One drawback of the traditional attribute hierarchy
method is that its parameter complexity varies substantially with the
hierarchy's graph structure, lacking statistical parsimony. Additionally,
arrows among the attributes do not carry an interpretation of statistical
dependence. Motivated by these, we propose a new family of latent conjunctive
Bayesian networks (LCBNs), which rigorously unify the attribute hierarchy
method for sequential skill mastery and the Bayesian network model in
statistical machine learning. In an LCBN, the latent graph not only retains the
hard constraints on skill prerequisites as an attribute hierarchy, but also
encodes a nice conditional independence interpretation as a Bayesian network.
LCBNs are identifiable, interpretable, and parsimonious statistical tools to
diagnose students' cognitive abilities from assessment data. We propose an
efficient two-step EM algorithm for structure learning and parameter estimation
in LCBNs. Application of our method to an international educational assessment
dataset gives interpretable findings of cognitive diagnosis.
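As a rough illustration of what "hard constraints on skill prerequisites" means, the sketch below enumerates the binary mastery patterns allowed by a small prerequisite graph. The function name `admissible_patterns` and the three-attribute chain are hypothetical choices for this example; the abstract does not specify the LCBN parameterization, so this shows only the attribute hierarchy constraint, not the model itself.

```python
from itertools import product


def admissible_patterns(n_attrs, prereqs):
    """Return all binary mastery patterns that respect the prerequisite edges.

    prereqs is a list of (parent, child) index pairs, meaning the child
    attribute can be mastered only if the parent attribute is mastered.
    """
    patterns = []
    for pattern in product((0, 1), repeat=n_attrs):
        # A pattern is allowed when no child is mastered without its parent.
        if all(pattern[child] <= pattern[parent] for parent, child in prereqs):
            patterns.append(pattern)
    return patterns


# Hypothetical linear hierarchy: attribute 0 -> attribute 1 -> attribute 2.
chain = admissible_patterns(3, [(0, 1), (1, 2)])
# With no prerequisites, every one of the 2**3 patterns is allowed.
unconstrained = admissible_patterns(3, [])
```

In this toy chain only 4 of the 8 possible patterns survive, which is the sense in which the hierarchy shrinks the latent space that a diagnostic model has to cover.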
Partition MCMC for inference on acyclic digraphs
Acyclic digraphs are the underlying representation of Bayesian networks, a
widely used class of probabilistic graphical models. Learning the underlying
graph from data is a way of gaining insights about the structural properties of
a domain. Structure learning forms one of the inference challenges of
statistical graphical models.
MCMC methods that sample graphs from the posterior distribution given the data,
notably structure MCMC, are probably the only viable option for Bayesian
model averaging. Score modularity and restrictions on the number of parents of
each node allow the graphs to be grouped into larger collections, which can be
scored as a whole to improve the chain's convergence. Current examples of
algorithms taking advantage of grouping are the biased order MCMC, which acts
on the alternative space of permuted triangular matrices, and non-ergodic edge
reversal moves.
Here we propose a novel algorithm, which employs the underlying combinatorial
structure of DAGs to define a new grouping. As a result, convergence is improved
compared to structure MCMC, while still retaining the property of producing an
unbiased sample. Finally, the method can be combined with edge reversal moves to
improve the sampler further.
Comment: Revised version. 34 pages, 16 figures. R code available at
https://github.com/annlia/partitionMCM
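For a sense of the size of the space such samplers explore, the sketch below counts labeled DAGs with Robinson's recurrence, which classifies graphs by the number of nodes that have no incoming edges. This is standard background combinatorics and not the paper's algorithm; the abstract does not describe how its grouping is built, and the function name `count_dags` is an assumption made for this example.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recurrence.

    a(n) = sum_{k=1..n} (-1)**(k-1) * C(n, k) * 2**(k*(n-k)) * a(n-k),
    with a(0) = 1. The index k ranges over candidate sets of nodes with
    no incoming edges, combined by inclusion-exclusion.
    """
    if n == 0:
        return 1
    total = 0
    for k in range(1, n + 1):
        sign = 1 if k % 2 == 1 else -1
        total += sign * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
    return total


counts = [count_dags(n) for n in range(6)]
```

The count grows super-exponentially in the number of nodes, which is why exhaustive scoring is infeasible beyond very small graphs and why grouping many DAGs into one scored collection can help a chain move.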
An Efficient Methodology for Learning Bayesian Networks
Statistics from the National Cancer Institute indicate that 1 in 8 women will develop breast cancer in their lifetime. Researchers have developed numerous statistical models to predict breast cancer risk; however, physicians are hesitant to use these models because of disparities in the predictions they produce. In an effort to reduce these disparities, we use Bayesian networks to capture the joint distribution of risk factors, and simulate artificial patient populations (clinical avatars) for interrogating the existing risk prediction models. The challenge in this effort has been to produce a Bayesian network whose dependencies agree with the literature and which gives a good estimate of the joint distribution of risk factors. In this work, we propose a methodology for learning Bayesian networks that uses prior knowledge to guide a collection of search algorithms in identifying an optimum structure. Using data from the Breast Cancer Surveillance Consortium, we have shown that our methodology produces a Bayesian network with consistent dependencies and a better estimate of the distribution of risk factors compared with existing methods.
Understanding the genetic basis of complex polygenic traits through Bayesian model selection of multiple genetic models and network modeling of family-based genetic data
The global aim of this dissertation is to develop advanced statistical modeling to understand the genetic basis of complex polygenic traits. In order to achieve this goal, this dissertation focuses on the development of (i) a novel methodology to detect genetic variants with different inheritance patterns formulated as a Bayesian model selection problem, (ii) integration of genetic data and non-genetic data to dissect the genotype-phenotype associations using Bayesian networks with family-based data, and (iii) an efficient technique to model the family-based data in the Bayesian framework.
In the first part of my dissertation, I present a coherent Bayesian framework for selection of the most likely model from the five genetic models (genotypic, additive, dominant, co-dominant, and recessive) used in genetic association studies. The approach uses a polynomial parameterization of genetic data to simultaneously fit the five models and save computations. I provide a closed-form expression of the marginal likelihood for normally distributed data, and evaluate the performance of the proposed method and existing methods through simulated and real genome-wide data sets.
The second part of this dissertation presents an integrative analytic approach that utilizes Bayesian networks to represent the complex probabilistic dependency structure among many variables from family-based data. I propose a parameterization that extends mixed effects regression models to Bayesian networks by using random effects as additional nodes of the networks to model the between-subjects correlations. I also present results of simulation studies to compare different model selection metrics for mixed models that can be used for learning BNs from correlated data and application of this methodology to real data from a large family-based study.
In the third part of this dissertation, I describe an efficient way to account for family structure in Bayesian inference Using Gibbs Sampling (BUGS). In linear mixed models, a random effects vector has a variance-covariance matrix whose dimension is as large as the sample size. However, direct handling of this multivariate normal distribution is not computationally feasible in BUGS. Therefore, I propose a decomposition of this multivariate normal distribution into univariate normal distributions using singular value decomposition, and present its implementation in BUGS.
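The idea of replacing one multivariate normal with independent univariate normals can be sketched as follows, assuming NumPy is available. The 3-by-3 matrix `K` is a made-up covariance standing in for a family-structure matrix, and `decompose_random_effects` is a hypothetical helper; the dissertation's exact parameterization and its BUGS code are not given in the abstract.

```python
import numpy as np


def decompose_random_effects(K):
    """Return a factor L with L @ L.T == K for a symmetric PSD matrix K.

    For symmetric positive semi-definite K, the SVD K = U diag(s) U.T
    coincides with the eigendecomposition, so L = U * sqrt(s) works.
    """
    U, s, _ = np.linalg.svd(K)
    return U * np.sqrt(s)  # scales column j of U by sqrt(s[j])


# Hypothetical covariance among three related subjects (illustration only).
K = np.array([
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
])
L = decompose_random_effects(K)

rng = np.random.default_rng(0)
z = rng.standard_normal(3)  # independent univariate N(0, 1) draws
u = L @ z                   # correlated random effects with covariance K
```

Because `u` is a fixed linear transform of independent standard normals, a sampler only needs univariate normal nodes for `z`, which is the kind of structure a BUGS model can declare directly.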
Sparse graphical models for cancer signalling
Protein signalling networks play a key role in cellular function, and their dysregulation is central to many diseases, including cancer. Recent advances in biochemical technology have begun to allow high-throughput, data-driven studies of signalling. In this thesis, we investigate multivariate statistical methods, rooted in sparse graphical models, aimed at probing questions in cancer signalling.
First, we propose a Bayesian variable selection method for identifying subsets of proteins that jointly influence an output of interest, such as drug response. Ancillary biological information is incorporated into inference using informative prior distributions. Prior information is selected and weighted in an automated manner using an empirical Bayes formulation. We present examples of informative pathway and network-based priors, and illustrate the proposed method on both synthetic and drug response data.
Second, we use dynamic Bayesian networks to perform structure learning of context-specific signalling network topology from proteomic time-course data. We exploit a connection between variable selection and network structure learning to efficiently carry out exact inference. Existing biology is incorporated using informative network priors, weighted automatically by an empirical Bayes approach. The overall approach is computationally efficient and essentially free of user-set parameters.
We show results from an empirical investigation, comparing the approach to several existing methods, and from an application to breast cancer cell line data. Hypotheses are generated regarding novel signalling links, some of which are validated by independent experiments.
Third, we describe a network-based clustering approach for the discovery of cancer subtypes that differ in terms of subtype-specific signalling network structure.
Model-based clustering is combined with penalised likelihood estimation of undirected graphical models to allow simultaneous learning of cluster assignments and cluster-specific network structure. Results are shown from an empirical investigation comparing several penalisation regimes, and from an application to breast cancer proteomic data.