9,806 research outputs found

    Bayesian testing of many hypotheses ×\times many genes: A study of sleep apnea

    Full text link
    Substantial statistical research has recently been devoted to the analysis of large-scale microarray experiments which provide a measure of the simultaneous expression of thousands of genes in a particular condition. A typical goal is the comparison of gene expression between two conditions (e.g., diseased vs. nondiseased) to detect genes which show differential expression. Classical hypothesis testing procedures have been applied to this problem and more recent work has employed sophisticated models that allow for the sharing of information across genes. However, many recent gene expression studies have an experimental design with several conditions that requires an even more involved hypothesis testing approach. In this paper, we use a hierarchical Bayesian model to address the situation where there are many hypotheses that must be simultaneously tested for each gene. In addition to having many hypotheses within each gene, our analysis also addresses the more typical multiple comparison issue of testing many genes simultaneously. We illustrate our approach with an application to a study of genes involved in obstructive sleep apnea in humans.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS241 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Objective Bayesian Search of Gaussian DAG Models with Non-local Priors

    Get PDF
    Directed Acyclic Graphical (DAG) models are increasingly employed in the study of physical and biological systems, where directed edges between vertices are used to model direct influences between variables. Identifying the graph from data is a challenging endeavor, which can be more reasonably tackled if the variables are assumed to satisfy a given ordering; in this case, we simply have to estimate the presence or absence of each possible edge, whose direction is established by the ordering of the variables. We propose an objective Bayesian methodology for model search over the space of Gaussian DAG models, which only requires default non-local priors as inputs. Priors of this kind are especially suited to learn sparse graphs, because they allow a faster learning rate, relative to ordinary local priors, when the true unknown sampling distribution belongs to a simple model. We implement an efficient stochastic search algorithm, which deals effectively with data sets having sample size smaller than the number of variables. We apply our method to a variety of simulated and real data sets.Fractional Bayes factor; High-dimensional sparse graph; Moment prior; Non-local prior; Objective Bayes; Pathway based prior; Regulatory network; Stochastic search; Structural learning.

    Likelihood based observability analysis and confidence intervals for predictions of dynamic models

    Get PDF
    Mechanistic dynamic models of biochemical networks such as Ordinary Differential Equations (ODEs) contain unknown parameters like the reaction rate constants and the initial concentrations of the compounds. The large number of parameters as well as their nonlinear impact on the model responses hamper the determination of confidence regions for parameter estimates. At the same time, classical approaches translating the uncertainty of the parameters into confidence intervals for model predictions are hardly feasible. In this article it is shown that a so-called prediction profile likelihood yields reliable confidence intervals for model predictions, despite arbitrarily complex and high-dimensional shapes of the confidence regions for the estimated parameters. Prediction confidence intervals of the dynamic states allow a data-based observability analysis. The approach renders the issue of sampling a high-dimensional parameter space into evaluating one-dimensional prediction spaces. The method is also applicable if there are non-identifiable parameters yielding to some insufficiently specified model predictions that can be interpreted as non-observability. Moreover, a validation profile likelihood is introduced that should be applied when noisy validation experiments are to be interpreted. The properties and applicability of the prediction and validation profile likelihood approaches are demonstrated by two examples, a small and instructive ODE model describing two consecutive reactions, and a realistic ODE model for the MAP kinase signal transduction pathway. The presented general approach constitutes a concept for observability analysis and for generating reliable confidence intervals of model predictions, not only, but especially suitable for mathematical models of biological systems

    RANK-BASED TEMPO-SPATIAL CLUSTERING: A FRAMEWORK FOR RAPID OUTBREAK DETECTION USING SINGLE OR MULTIPLE DATA STREAMS

    Get PDF
    In the recent decades, algorithms for disease outbreak detection have become one of the main interests of public health practitioners to identify and localize an outbreak as early as possible in order to warrant further public health response before a pandemic develops. Today’s increased threat of biological warfare and terrorism provide an even stronger impetus to develop methods for outbreak detection based on symptoms as well as definitive laboratory diagnoses. In this dissertation work, I explore the problems of rapid disease outbreak detection using both spatial and temporal information. I develop a framework of non-parameterized algorithms which search for patterns of disease outbreak in spatial sub-regions of the monitored region within a certain period. Compared to the current existing spatial or tempo-spatial algorithm, the algorithms in this framework provide a methodology for fast searching of either univariate data set or multivariate data set. It first measures which study area is more likely to have an outbreak occurring given the baseline data and currently observed data. Then it applies a greedy searching mechanism to look for clusters with high posterior probabilities given the risk measurement for each unit area as heuristic. I also explore the performance of the proposed algorithms. From the perspective of predictive modeling, I adopt a Gamma-Poisson (GP) model to compute the probability of having an outbreak in each cluster when analyzing univariate data. I build a multinomial generalized Dirichlet (MGD) model to identify outbreak clusters from multivariate data which include the OTC data streams collected by the national retail data monitor (NRDM) and the ED data streams collected by the RODS system. Key contributions of this dissertation include 1) it introduces a rank-based tempo-spatial clustering algorithm, RSC, by utilizing greedy searching and Bayesian GP model for disease outbreak detection with comparable detection timeliness, cluster positive prediction value (PPV) and improved running time; 2) it proposes a multivariate extension of RSC (MRSC) which applies MGD model. The evaluation demonstrated the advantage that MGD model can effectively suppress the false alarms caused by elevated signals that are non-disease relevant and occur in all the monitored data streams
    corecore