109 research outputs found

    Structure learning of undirected graphical models for count data

    Biological processes underlying the basic functions of a cell involve complex interactions between genes. From a technical point of view, these interactions can be represented through a graph where genes and their connections are, respectively, nodes and edges. The main objective of this paper is to develop a statistical framework for modelling the interactions between genes when the activity of genes is measured on a discrete scale. Specifically, we define a new algorithm for learning the structure of undirected graphs, PC-LPGM, and prove its theoretical consistency in the limit of infinite observations. The proposed algorithm shows promising results when applied to simulated data as well as to real data.

    Nearest-neighbor estimation for ROC analysis under verification bias

    For a continuous-scale diagnostic test, the receiver operating characteristic (ROC) curve is a popular tool for displaying the ability of the test to discriminate between healthy and diseased subjects. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the test result and other characteristics of the subjects. Estimators of the ROC curve based only on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias, in particular under the assumption that the true disease status, if missing, is missing at random (MAR). The MAR assumption means that the probability of missingness depends on the true disease status only through the test result and observed covariate information. However, the existing methods require parametric models for the (conditional) probability of disease and/or the (conditional) probability of verification, and hence are subject to model misspecification: a wrong specification of such parametric models can affect the behavior of the estimators, which can be inconsistent. To avoid misspecification problems, in this paper we propose a fully nonparametric method for the estimation of the ROC curve of a continuous test under verification bias. The method is based on nearest-neighbor imputation and adopts generic smooth regression models for both the probability that a subject is diseased and the probability that it is verified. Simulation experiments and an illustrative example show the usefulness of the new method. Variance estimation is also discussed.

    Short term ozone effects on morbidity for the city of Milano, Italy, 1996-2003.

    In this paper, we explore a range of concerns that arise in measuring short term ozone effects on health. In particular, we tackle the problem of measuring exposure using alternative daily measures of ozone derived from hourly concentrations. We adopt the exposure paradigm of Chiogna and Bellini (2002), and we compare its performance with that of traditional exposure measures by exploiting model selection. To investigate model selection stability issues, we then apply the idea of bootstrapping the modelling process.

    Semiparametric interval estimation of Pr[Y > X]

    Let X and Y be two independent continuous random variables. We discuss three techniques to obtain confidence intervals for ρ = Pr[Y > X] in a semiparametric framework. One method relies on the asymptotic normality of an estimator for ρ; the remaining methods involve empirical likelihood and combine it with maximum likelihood estimation and with full parametric likelihood, respectively. Finite-sample accuracy of the confidence intervals is assessed through a simulation study. An illustration is given using a dataset on the detection of carriers of Duchenne Muscular Dystrophy.
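
As a rough illustration of the quantity in the abstract above, the following sketch computes the usual nonparametric (Mann-Whitney) estimate of Pr[Y > X] and a normal-approximation interval around it. It is not the semiparametric estimator or any of the three interval methods of the paper: the variance formula is the DeLong-style placement-value estimator, swapped in as an assumption, and the function names `estimate_rho` and `wald_interval` are hypothetical.

```python
import math


def estimate_rho(x, y):
    """Mann-Whitney estimate of Pr[Y > X]: share of (x_i, y_j) pairs with y_j > x_i."""
    wins = sum(1 for xi in x for yj in y if yj > xi)
    return wins / (len(x) * len(y))


def wald_interval(x, y, z=1.96):
    """Normal-approximation interval for Pr[Y > X] (assumed DeLong-style variance)."""
    m, n = len(x), len(y)
    rho = estimate_rho(x, y)
    # Placement values: for each x_i, the share of y's above it;
    # for each y_j, the share of x's below it.
    v10 = [sum(yj > xi for yj in y) / n for xi in x]
    v01 = [sum(xi < yj for xi in x) / m for yj in y]
    # Sample variances of the placement values around the point estimate.
    s10 = sum((v - rho) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - rho) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)
    # Clip to [0, 1] because rho is a probability.
    return max(0.0, rho - z * se), min(1.0, rho + z * se)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]   # toy sample from X (made-up numbers)
    y = [2.5, 3.5, 4.5]   # toy sample from Y (made-up numbers)
    print(estimate_rho(x, y))
    print(wald_interval(x, y))
```

With samples this small the normal approximation is unreliable, which is the kind of finite-sample behavior a simulation study such as the one mentioned above would examine.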

    Searching for a Source of Difference: a Graphical Model Approach

    In this work, we look at a two-sample problem within the framework of Gaussian graphical models. When the global hypothesis of equality of two distributions is rejected, the interest is usually in localizing the source of difference. Motivated by the idea that diseases can be seen as system perturbations, and by the need to distinguish between the origin of perturbation and components affected by the perturbation, we introduce the concept of a minimal seed set, and its graphical counterpart, a graphical seed set. They intuitively consist of variables driving the difference between the two conditions. We propose a simple and fast testing procedure to estimate the graphical seed set from data, and study its finite sample behavior with a simulation study. We illustrate our approach in the context of gene set analysis by means of a publicly available gene expression dataset.

    Combinations of covariance selections for graphical modelling.

    We explore the possibility of composing the results of a fixed number of Gaussian graphical model selections on some partially overlapping variables. This appears to be a useful approach in all the research areas where a large amount of data from different sources and types of experiments is available. The focus is therefore on binding together information coming from heterogeneous studies to improve the understanding of a particular phenomenon of interest. The proposed approach relies on numerical results on artificial and real data.

    Simulating gene silencing through intervention analysis

    We propose a novel method for simulating the effects of gene silencing. Our approach combines relevant subject matter information provided by biological pathways with gene expression levels measured in regular conditions to predict the behavior of the system after one of the genes has been silenced. We achieve this by modeling gene silencing as an external intervention in a causal graphical model. To account for the uncertainty associated with the structure learning of the graphical model, we adopt a bootstrap approach. We illustrate our proposal on a Drosophila melanogaster gene silencing experiment.