    Markov Bases for Typical Block Effect Models of Two-way Contingency Tables

    A Markov basis for a statistical model of contingency tables provides a useful tool for performing conditional tests of the model via the Markov chain Monte Carlo method. In this paper we derive explicit forms of Markov bases for change point models and block diagonal effect models, which are typical block-wise effect models of two-way contingency tables, and perform conditional tests with some real data sets. Comment: 16 pages
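A minimal sketch of the kind of conditional test this abstract describes, under a simplifying assumption: it uses the well-known basic moves (+1/-1 on a 2x2 sub-rectangle) for the plain independence model of a two-way table, not the change point or block diagonal bases derived in the paper. The function names `basic_move_step`, `chi_square`, and `mcmc_p_value` are hypothetical and chosen only for illustration.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic against the independence fit."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def basic_move_step(table, rng):
    """One Metropolis step using a basic move; row and column sums are kept fixed.

    Assumption: the target is the conditional (hypergeometric) distribution
    under independence, proportional to 1 / prod(x_ij!).
    """
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    a, b = table[i1][j1], table[i1][j2]
    c, d = table[i2][j1], table[i2][j2]
    # Proposed move: +1 at (i1, j1) and (i2, j2), -1 at (i1, j2) and (i2, j1).
    if b == 0 or c == 0:
        return  # the move would create a negative cell, so stay put
    ratio = (b * c) / ((a + 1) * (d + 1))  # target(new) / target(old)
    if rng.random() < min(1.0, ratio):
        table[i1][j1] += 1
        table[i2][j2] += 1
        table[i1][j2] -= 1
        table[i2][j1] -= 1


def mcmc_p_value(observed, steps=20000, burn_in=2000, seed=0):
    """Estimate the conditional p-value of the chi-square statistic."""
    rng = random.Random(seed)
    table = [list(row) for row in observed]
    observed_stat = chi_square(observed)
    hits = kept = 0
    for step in range(steps):
        basic_move_step(table, rng)
        if step >= burn_in:
            kept += 1
            # Small tolerance guards against floating-point ties.
            if chi_square(table) >= observed_stat - 1e-9:
                hits += 1
    return hits / kept


if __name__ == "__main__":
    example = [[8, 2, 1], [3, 6, 2], [1, 3, 7]]  # made-up counts for illustration
    print(mcmc_p_value(example))
```

For a block-wise effect model, the set of moves would have to be replaced by a Markov basis for that model, since basic moves alone need not connect the relevant conditional sample space.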

    Markov chain Monte Carlo tests for designed experiments

    We consider conditional exact tests of factor effects in designed experiments for discrete response variables. Similarly to the analysis of contingency tables, a Markov chain Monte Carlo method can be used for performing exact tests, when large-sample approximations are poor and the enumeration of the conditional sample space is infeasible. For designed experiments with a single observation for each run, we formulate log-linear or logistic models and consider a connected Markov chain over an appropriate sample space. In particular, we investigate fractional factorial designs with $2^{p-q}$ runs, noting correspondences to the models for $2^{p-q}$ contingency tables.

    Goodness of fit for log-linear ERGMs

    Many popular models from the networks literature can be viewed through a common lens of contingency tables on network dyads, resulting in \emph{log-linear ERGMs}: exponential family models for random graphs whose sufficient statistics are linear on the dyads. We propose a new model in this family, the \emph{$p_1$-SBM}, which combines node and group effects common in network formation mechanisms. In particular, it is a generalization of several well-known ERGMs including the stochastic blockmodel for undirected graphs, the degree-corrected version of it, and the directed $p_1$ model without group structure. We frame the problem of testing model fit for the log-linear ERGM class through an exact conditional test whose $p$-value can be approximated efficiently in networks of both small and moderately large sizes. The sampling methods we build rely on a dynamic adaptation of Markov bases. We use quick estimation algorithms adapted from the contingency table literature and effective sampling methods rooted in graph theory and algebraic statistics. The performance and scalability of the method are demonstrated on two data sets from biology: the connectome of \emph{C. elegans} and the interactome of \emph{Arabidopsis thaliana}. These two networks -- a neuronal network and a protein-protein interaction network -- have been popular examples in the network science literature. Our work provides a model-based approach to studying them.

    Sequences of regressions and their independences

    Ordered sequences of univariate or multivariate regressions provide statistical models for analysing data from randomized, possibly sequential interventions, from cohort or multi-wave panel studies, but also from cross-sectional or retrospective studies. Conditional independences are captured by what we name regression graphs, provided the generated distribution shares some properties with a joint Gaussian distribution. Regression graphs extend purely directed, acyclic graphs by two types of undirected graph, one type for components of joint responses and the other for components of the context vector variable. We review the special features and the history of regression graphs, derive criteria to read all implied independences of a regression graph and prove criteria for Markov equivalence, that is, to judge whether two different graphs imply the same set of independence statements. Knowledge of Markov equivalence provides alternative interpretations of a given sequence of regressions, is essential for machine learning strategies and permits the use of the simple graphical criteria of regression graphs on graphs for which the corresponding criteria are in general more complex. Under the known conditions that a Markov equivalent directed acyclic graph exists for any given regression graph, we give a polynomial time algorithm to find one such graph. Comment: 43 pages with 17 figures. The manuscript is to appear as an invited discussion paper in the journal TEST

    Graphical Markov models, unifying results and their interpretation

    Graphical Markov models combine conditional independence constraints with graphical representations of stepwise data generating processes. The models started to be formulated about 40 years ago and vigorous development is ongoing. Longitudinal observational studies as well as intervention studies are best modeled via a subclass called regression graph models and, especially, traceable regressions. Regression graphs include two types of undirected graph and directed acyclic graphs in ordered sequences of joint responses. Response components may correspond to discrete or continuous random variables and may depend exclusively on variables which have been generated earlier. These aspects are essential when causal hypotheses are the motivation for the planning of empirical studies. To turn the graphs into useful tools for tracing developmental pathways and for predicting structure in alternative models, the generated distributions have to mimic some properties of joint Gaussian distributions. Here, relevant results concerning these aspects are spelled out and illustrated by examples. With regression graph models, it becomes feasible, for the first time, to derive structural effects of (1) ignoring some of the variables, of (2) selecting subpopulations via fixed levels of some other variables or of (3) changing the order in which the variables might get generated. Thus, the most important future applications of these models will aim at the best possible integration of knowledge from related studies. Comment: 34 pages, 11 figures, 1 table

    A survey of statistical network models

    Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities, have intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics. Comment: 96 pages, 14 figures, 333 references

    Addressing the unmet need for visualizing Conditional Random Fields in Biological Data

    Background: The biological world is replete with phenomena that appear to be ideally modeled and analyzed by one archetypal statistical framework - the Graphical Probabilistic Model (GPM). The structure of GPMs is a uniquely good match for biological problems that range from aligning sequences to modeling the genome-to-phenome relationship. The fundamental questions that GPMs address involve making decisions based on a complex web of interacting factors. Unfortunately, while GPMs ideally fit many questions in biology, they are not an easy solution to apply. Building a GPM is not a simple task for an end user. Moreover, applying GPMs is also impeded by the insidious fact that the complex web of interacting factors inherent to a problem might be easy to define yet intractable to compute upon. Discussion: We propose that the visualization sciences can contribute to many domains of the bio-sciences, by developing tools to address archetypal representation and user interaction issues in GPMs, and in particular a variety of GPM called a Conditional Random Field (CRF). CRFs bring additional power, and additional complexity, because the CRF dependency network can be conditioned on the query data. Conclusions: In this manuscript we examine the shared features of several biological problems that are amenable to modeling with CRFs, highlight the challenges that existing visualization and visual analytics paradigms induce for these data, and document an experimental solution called StickWRLD which, while leaving room for improvement, has been successfully applied in several biological research projects. Comment: BioVis 2014 conference

    Algebraic Statistics in Practice: Applications to Networks

    Algebraic statistics uses tools from algebra (especially from multilinear algebra, commutative algebra and computational algebra), geometry and combinatorics to provide insight into knotty problems in mathematical statistics. In this survey we illustrate this on three problems related to networks, namely network models for relational data, causal structure discovery and phylogenetics. For each problem we give an overview of recent results in algebraic statistics with emphasis on the statistical achievements made possible by these tools and their practical relevance for applications to other scientific disciplines.