Search CORE

3,037 research outputs found

Markov Bases for Typical Block Effect Models of Two-way Contingency Tables

Author: Ogawa Mitsunori
Takemura Akimichi
Publication date: 01/01/2011

A Markov basis for a statistical model of contingency tables provides a useful tool for performing a conditional test of the model via the Markov chain Monte Carlo method. In this paper we derive explicit forms of Markov bases for change point models and block diagonal effect models, which are typical block-wise effect models of two-way contingency tables, and perform conditional tests with some real data sets. Comment: 16 pages
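The way a Markov basis drives a conditional test can be sketched in Python. This is a minimal illustration, assuming the plain independence model for a two-way table, whose Markov basis is the well-known set of "basic moves" (+1, -1, -1, +1 on a 2x2 minor). It does not reproduce the change point or block diagonal bases derived in the paper, and the function names (`basic_move_step`, `mcmc_p_value`) and the use of the Pearson chi-square statistic are illustrative choices.

```python
import random

def chi_square(table):
    """Pearson chi-square statistic against independence (assumes positive margins)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, r in enumerate(row_sums):
        for j, c in enumerate(col_sums):
            expected = r * c / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

def basic_move_step(table, rng):
    """One Metropolis step using a random basic move of the independence model.

    The move adds +1 at (i1, j1) and (i2, j2) and -1 at (i1, j2) and (i2, j1),
    so every row and column sum is unchanged. The target is the conditional
    (hypergeometric) law given the margins, proportional to 1 / prod(x_ij!).
    """
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    if table[i1][j2] == 0 or table[i2][j1] == 0:
        return table  # the move would create a negative cell; stay put
    ratio = (table[i1][j2] * table[i2][j1]) / (
        (table[i1][j1] + 1) * (table[i2][j2] + 1)
    )
    if rng.random() >= ratio:
        return table  # Metropolis rejection
    new = [row[:] for row in table]
    new[i1][j1] += 1
    new[i2][j2] += 1
    new[i1][j2] -= 1
    new[i2][j1] -= 1
    return new

def mcmc_p_value(table, n_steps=20000, burn_in=1000, seed=0):
    """Estimate P(chi-square >= observed | row and column sums) by MCMC."""
    rng = random.Random(seed)
    observed = chi_square(table)
    current = [row[:] for row in table]
    for _ in range(burn_in):
        current = basic_move_step(current, rng)
    hits = 0
    for _ in range(n_steps):
        current = basic_move_step(current, rng)
        if chi_square(current) >= observed - 1e-9:
            hits += 1
    return hits / n_steps

print(mcmc_p_value([[10, 2, 3], [1, 8, 4]]))
```

Because the basic moves connect every table with the same margins, the chain explores the whole conditional sample space; for block-wise effect models a different, larger set of moves is needed, which is what the paper supplies.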

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Markov chain Monte Carlo tests for designed experiments

Author: Agresti
Akimichi Takemura
Aoki
Condra
Diaconis
Dinwoodie
Dobra
Freeman
Haberman
Hamada
Hastings
Hibi
McCullagh
Mehta
Pistone
Robbiano
Satoshi Aoki
Takemura
Publication venue: 'Elsevier BV'
Publication date: 15/11/2006

We consider conditional exact tests of factor effects in designed experiments for discrete response variables. Similarly to the analysis of contingency tables, a Markov chain Monte Carlo method can be used for performing exact tests, when large-sample approximations are poor and the enumeration of the conditional sample space is infeasible. For designed experiments with a single observation for each run, we formulate log-linear or logistic models and consider a connected Markov chain over an appropriate sample space. In particular, we investigate fractional factorial designs with $2^{p-q}$ runs, noting correspondences to the models for $2^{p-q}$ contingency tables.
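To make the link between a fractional factorial design and a conditional test concrete, here is a hedged sketch for one small case, assuming a $2^{4-1}$ design with generator D = ABC and a Poisson log-linear main-effects model for the single count observed at each run. Under that model the sufficient statistic is $X^\top y$, so any integer vector in the kernel of $X^\top$ is a candidate Markov move. The design, generator, counts and function names are hypothetical; the sketch only exhibits one kernel element and does not derive a Markov basis, which is what the paper addresses.

```python
import itertools
import numpy as np

def half_fraction_2_4_1():
    """Runs of a 2^(4-1) design: full 2^3 in A, B, C with generator D = ABC."""
    runs = [(a, b, c, a * b * c) for a, b, c in itertools.product((-1, 1), repeat=3)]
    return np.array(runs)

design = half_fraction_2_4_1()
# Main-effects log-linear model: intercept plus one +/-1 column per factor.
X = np.column_stack([np.ones(len(design), dtype=int), design])

def sufficient_statistic(y):
    """X^T y: the total count and the +/-1 contrast for each factor."""
    return X.T @ y

# The AB interaction column is orthogonal to every column of X, so adding it
# to a count vector leaves the sufficient statistic unchanged.
move_ab = design[:, 0] * design[:, 1]

y = np.array([3, 5, 2, 4, 6, 1, 2, 3])  # hypothetical counts, one per run
y_new = y + move_ab                      # still non-negative for these counts
print(sufficient_statistic(y), sufficient_statistic(y_new))
```

Both calls print the same vector, which is exactly the property a Markov chain over the conditional sample space must preserve at every step.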

arXiv.org e-Print Archive

CiteSeerX

Crossref

Goodness of fit for log-linear ERGMs

Author: Gross Elizabeth
Petrović Sonja
Stasi Despina
Publication date: 31/05/2022

Many popular models from the networks literature can be viewed through a common lens of contingency tables on network dyads, resulting in \emph{log-linear ERGMs}: exponential family models for random graphs whose sufficient statistics are linear on the dyads. We propose a new model in this family, the \emph{$p_1$-SBM}, which combines node and group effects common in network formation mechanisms. In particular, it generalizes several well-known ERGMs, including the stochastic blockmodel for undirected graphs, its degree-corrected version, and the directed $p_1$ model without group structure. We frame the problem of testing model fit for the log-linear ERGM class through an exact conditional test whose $p$-value can be approximated efficiently in networks of both small and moderately large sizes. The sampling methods we build rely on a dynamic adaptation of Markov bases. We use quick estimation algorithms adapted from the contingency table literature and effective sampling methods rooted in graph theory and algebraic statistics. The performance and scalability of the method are demonstrated on two data sets from biology: the connectome of \emph{C. elegans} and the interactome of \emph{Arabidopsis thaliana}. These two networks, a neuronal network and a protein-protein interaction network, have been popular examples in the network science literature. Our work provides a model-based approach to studying them.
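The idea of approximating an exact conditional $p$-value by walking through graphs that share the observed sufficient statistic can be sketched on a much simpler log-linear ERGM. This sketch assumes the undirected beta model, whose sufficient statistic is the degree sequence, so the conditional distribution is uniform over simple graphs with that degree sequence; degree-preserving double edge swaps then serve as moves. It is not the paper's $p_1$-SBM or its dynamic Markov bases, and the triangle count is used only as an illustrative discrepancy statistic. All function names are hypothetical.

```python
import random

def triangle_count(edges):
    """Triangles in an undirected simple graph given as sorted (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is counted once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

def degrees(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def double_edge_swap(edges, rng):
    """Propose replacing {a,b},{c,d} by {a,c},{b,d}; degrees are preserved.

    Proposals that would create a loop or a repeated edge are rejected, so the
    chain keeps the uniform distribution on the fiber stationary.
    """
    (a, b), (c, d) = rng.sample(sorted(edges), 2)
    if rng.random() < 0.5:
        c, d = d, c  # choose one of the two possible rewirings
    if a == c or b == d:
        return edges
    e1, e2 = tuple(sorted((a, c))), tuple(sorted((b, d)))
    if e1 in edges or e2 in edges:
        return edges
    return (edges - {(a, b), tuple(sorted((c, d)))}) | {e1, e2}

def mc_p_value(edges, statistic, n_steps=5000, seed=0):
    """Fraction of sampled graphs whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = statistic(edges)
    current = set(edges)
    hits = 0
    for _ in range(n_steps):
        current = double_edge_swap(current, rng)
        if statistic(current) >= observed:
            hits += 1
    return hits / n_steps

graph = {(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)}
print(mc_p_value(graph, triangle_count))
```

For richer models such as those with group effects, plain edge swaps no longer preserve all sufficient statistics, which is why model-specific moves are required.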

arXiv.org e-Print Archive

Sequences of regressions and their independences

Author: Kayvan Sadeghi
Nanny Wermuth
Publication date: 01/01/2011

Ordered sequences of univariate or multivariate regressions provide statistical models for analysing data from randomized, possibly sequential interventions, from cohort or multi-wave panel studies, and also from cross-sectional or retrospective studies. Conditional independences are captured by what we name regression graphs, provided the generated distribution shares some properties with a joint Gaussian distribution. Regression graphs extend purely directed acyclic graphs by two types of undirected graph, one type for components of joint responses and the other for components of the context vector variable. We review the special features and the history of regression graphs, derive criteria to read all implied independences of a regression graph, and prove criteria for Markov equivalence, that is, for judging whether two different graphs imply the same set of independence statements. Knowledge of Markov equivalence provides alternative interpretations of a given sequence of regressions, is essential for machine learning strategies, and permits the use of the simple graphical criteria of regression graphs on graphs for which the corresponding criteria are in general more complex. Under the known conditions for a Markov equivalent directed acyclic graph to exist for a given regression graph, we give a polynomial time algorithm to find one such graph. Comment: 43 pages with 17 figures. The manuscript is to appear as an invited discussion paper in the journal TEST.
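For the special case of purely directed acyclic graphs, Markov equivalence has a classical and easy-to-check characterization (Verma and Pearl): two DAGs are Markov equivalent exactly when they have the same skeleton and the same v-structures. The sketch below implements only that DAG criterion, assuming DAGs are stored as a dictionary from each node to its set of parents; it is not the regression-graph criteria proved in the paper, and the function names are illustrative.

```python
from itertools import combinations

def skeleton(dag):
    """Undirected edges of a DAG given as {node: set_of_parents}."""
    return {frozenset((p, child)) for child, parents in dag.items() for p in parents}

def v_structures(dag):
    """Unshielded colliders a -> c <- b where a and b are non-adjacent."""
    skel = skeleton(dag)
    found = set()
    for child, parents in dag.items():
        for a, b in combinations(sorted(parents), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def markov_equivalent(g1, g2):
    """Verma-Pearl criterion: same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain = {"x": set(), "y": {"x"}, "z": {"y"}}           # x -> y -> z
reversed_chain = {"x": {"y"}, "y": {"z"}, "z": set()}  # x <- y <- z
collider = {"x": set(), "y": {"x", "z"}, "z": set()}   # x -> y <- z
print(markov_equivalent(chain, reversed_chain))  # True
print(markov_equivalent(chain, collider))        # False
```

The chain and its reversal both encode only "x independent of z given y", while the collider encodes the marginal independence of x and z instead, so the check separates them as expected.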

arXiv.org e-Print Archive

CiteSeerX

Chalmers Research

Chalmers Publication Library

Graphical Markov models, unifying results and their interpretation

Author: Allman
Andersen
Anderson
Andersson
Barndorff-Nielsen
Birch
Bollen
Castelo
Chaudhuri
Cochran
Cox
Darroch
Dawid
Dempster
Drton
Edwards
Eichler
Evans
Fisher
Foygel
Fried
Frydenberg
Geiger
Gibbs
Glonek
Grcar
Green
Haberman
Højsgaard
Jiang
Jöreskog
Kauermann
Khare
Koster
Lauritzen
Letac
Lněnička
Loh
Lupparelli
Ma
Mabry
Marchetti
Markov
McCullagh
Nemeth
Neumann
Oliver
Ostrowski
Pearl
Reingold
Richardson
Robins
Roverato
Sadeghi
San Martin
Schur
Shpitser
Simpson
Speed
Spirtes
Stanghellini
Strotz
Studený
Sundberg
Tarjan
Tikhonov
Tukey
Uhler
Wainer
Wainwright
Weisberg
Wermuth
Whittaker
Wiedenbeck
Won
Wright
Zellner
Zwiernik
Publication date: 09/10/2015

Graphical Markov models combine conditional independence constraints with graphical representations of stepwise data generating processes. The models started to be formulated about 40 years ago, and vigorous development is ongoing. Longitudinal observational studies as well as intervention studies are best modeled via a subclass called regression graph models and, especially, traceable regressions. Regression graphs include two types of undirected graph and directed acyclic graphs in ordered sequences of joint responses. Response components may correspond to discrete or continuous random variables and may depend exclusively on variables which have been generated earlier. These aspects are essential when causal hypotheses are the motivation for the planning of empirical studies. To turn the graphs into useful tools for tracing developmental pathways and for predicting structure in alternative models, the generated distributions have to mimic some properties of joint Gaussian distributions. Here, relevant results concerning these aspects are spelled out and illustrated by examples. With regression graph models, it becomes feasible, for the first time, to derive structural effects of (1) ignoring some of the variables, (2) selecting subpopulations via fixed levels of some other variables, or (3) changing the order in which the variables might get generated. Thus, the most important future applications of these models will aim at the best possible integration of knowledge from related studies. Comment: 34 pages, 11 figures, 1 table
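Effect (2), selecting a subpopulation by fixing some variables, can change which independences hold. In a joint Gaussian distribution this is easy to compute, because conditioning replaces the covariance of the remaining variables by a Schur complement. The sketch assumes a hypothetical linear system with X and Z independent standard normals and Y = X + Z + e, e standard normal; it illustrates the general phenomenon only and is not taken from the paper's examples.

```python
import numpy as np

def conditional_covariance(sigma, keep, given):
    """Covariance of `keep` variables given `given` variables (Schur complement)."""
    s_kk = sigma[np.ix_(keep, keep)]
    s_kg = sigma[np.ix_(keep, given)]
    s_gg = sigma[np.ix_(given, given)]
    return s_kk - s_kg @ np.linalg.solve(s_gg, s_kg.T)

def correlation(cov, i, j):
    return cov[i, j] / np.sqrt(cov[i, i] * cov[j, j])

# Hypothetical system in the order X, Y, Z: Var(Y) = 1 + 1 + 1, Cov(X, Y) = Cov(Z, Y) = 1.
sigma = np.array([[1.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 1.0]])

print(correlation(sigma, 0, 2))                     # 0.0: X and Z marginally independent
cond = conditional_covariance(sigma, keep=[0, 2], given=[1])
print(correlation(cond, 0, 1))                      # about -0.5 once Y is fixed
```

Fixing the common response Y induces a dependence between X and Z that is absent in the full population, which is why the effects of selection need to be traced through the graph.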

arXiv.org e-Print Archive

Crossref

A survey of statistical network models

Author: Alice X. Zheng
Anna Goldenberg
Edoardo M. Airoldi
Stephen E. Fienberg
Publication date: 01/01/2009

Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities, has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics. Comment: 96 pages, 14 figures, 333 references
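As a small illustration of what "interpretation of parameters and their estimation" means for the simplest static network model, consider the random graph in which each of the $n(n-1)/2$ possible edges appears independently with probability $p$. Its single parameter is the edge density, and the maximum likelihood estimate is the observed number of edges divided by the number of possible pairs. The sketch below is an assumption-labeled example of that model only; function names, $n$, $p$ and the seed are hypothetical.

```python
import itertools
import random

def sample_gnp(n, p, seed=0):
    """Each of the n*(n-1)/2 possible edges is present independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p]

def mle_edge_probability(n, edges):
    """Maximum likelihood estimate of p: observed edges over possible pairs."""
    return len(edges) / (n * (n - 1) / 2)

edges = sample_gnp(200, 0.1, seed=42)
p_hat = mle_edge_probability(200, edges)
print(round(p_hat, 3))  # close to the true value 0.1
```

Richer models surveyed in the review replace the single density by node-, block- or time-specific parameters, but estimation still starts from edge counts of this kind.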

arXiv.org e-Print Archive

CiteSeerX

Addressing the unmet need for visualizing Conditional Random Fields in Biological Data

Author: Bartlett Christopher W.
Callahan Nicholas W
Dong Min
Li Q. Quinn
Liang Chun
Magliery Thomas J
Ray William C.
Wolock Samuel L.
Publication date: 01/01/2014

Background: The biological world is replete with phenomena that appear to be ideally modeled and analyzed by one archetypal statistical framework, the Graphical Probabilistic Model (GPM). The structure of GPMs is a uniquely good match for biological problems that range from aligning sequences to modeling the genome-to-phenome relationship. The fundamental questions that GPMs address involve making decisions based on a complex web of interacting factors. Unfortunately, while GPMs ideally fit many questions in biology, they are not an easy solution to apply. Building a GPM is not a simple task for an end user. Moreover, applying GPMs is also impeded by the insidious fact that the complex web of interacting factors inherent to a problem might be easy to define yet intractable to compute upon. Discussion: We propose that the visualization sciences can contribute to many domains of the bio-sciences by developing tools to address archetypal representation and user interaction issues in GPMs, and in particular in a variety of GPM called a Conditional Random Field (CRF). CRFs bring additional power, and additional complexity, because the CRF dependency network can be conditioned on the query data. Conclusions: In this manuscript we examine the shared features of several biological problems that are amenable to modeling with CRFs, highlight the challenges that existing visualization and visual analytics paradigms induce for these data, and document an experimental solution called StickWRLD which, while leaving room for improvement, has been successfully applied in several biological research projects. Comment: BioVis 2014 conference

arXiv.org e-Print Archive

Springer - Publisher Connector

Algebraic Statistics in Practice: Applications to Networks

Author: Casanellas Marta
Petrović Sonja
Uhler Caroline
Publication date: 22/06/2019

Algebraic statistics uses tools from algebra (especially from multilinear algebra, commutative algebra and computational algebra), geometry and combinatorics to provide insight into knotty problems in mathematical statistics. In this survey we illustrate this on three problems related to networks, namely network models for relational data, causal structure discovery and phylogenetics. For each problem we give an overview of recent results in algebraic statistics with emphasis on the statistical achievements made possible by these tools and their practical relevance for applications to other scientific disciplines.
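A standard textbook example of how algebra gives statistical insight, included here only as background and not drawn from the survey itself, is the independence model for a $2 \times 2$ table. If the cell probabilities factor as $p_{ij} = \alpha_i \beta_j$, they satisfy a single polynomial equation, and conversely every positive probability matrix satisfying it has rank one and therefore factors:

```latex
p_{ij} = \alpha_i \beta_j \quad (i, j \in \{1, 2\})
\;\Longrightarrow\;
p_{11} p_{22} - p_{12} p_{21}
  = \alpha_1 \beta_1 \, \alpha_2 \beta_2 - \alpha_1 \beta_2 \, \alpha_2 \beta_1 = 0 .
```

The exponents of the binomial $p_{11} p_{22} - p_{12} p_{21}$ give the move $(+1, -1, -1, +1)$ on the four cells, which is the basic Markov move used for exact conditional tests of independence; in general, Markov bases correspond to generating sets of the model's toric ideal (Diaconis and Sturmfels).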

arXiv.org e-Print Archive

DSpace@MIT

UPCommons. Portal del coneixement obert de la UPC

Recommended from our members

Transcription regulation: models for combinatorial regulation and functional specificity

Author: Thomas David John
Publication date: 16/05/2014

Gene regulation is controlled by transcription factor proteins that bind to specific DNA sequences, known as transcription factor binding sites (TFBSs). Combinations of transcription factors working cooperatively in cis-regulatory modules (CRMs) play a role in regulating gene expression. Current computational methods for TFBS prediction cannot distinguish between functional and non-functional sites, and predict very large numbers of false positives. The thesis focuses on the development of a novel computational model, based on artificial neural networks (ANNs), for the identification of functional TFBSs and the CRMs within which they operate in the human genome. Datasets of 12,239 experimentally verified true positive (TP) TFBSs and 130,199 false positive (FP) TFBSs were extracted using a combination of position weight matrices from the JASPAR database and experimentally verified sites from the Encyclopedia of DNA Elements (ENCODE). A number of machine learning algorithms were tested using a range of genetic information, including gene expression, nucleosome positioning, DNA methylation states and DNA entropy. The best model, which achieved a mean area under the receiver operating characteristic curve of 0.800, was a feedforward ANN trained with backpropagation. This model was then used to predict functional TFBSs in a number of gene sets from the human genome. The predictions, combined with experimentally proven TFBSs from ENCODE, were used to investigate combinatorial patterns of TFBSs operating in CRMs. CRM patterns have been analysed in disease-associated genes located in linkage disequilibrium blocks containing SNPs obtained from Genome Wide Association Studies (GWAS). The potential for the model to make functional TFBS predictions that aid in the annotation of orphan genes of unknown function is discussed. In addition, this thesis presents computational work on a number of smaller published studies.
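The reported figure of 0.800 is an area under the ROC curve, which has a simple pairwise reading: it is the probability that a randomly chosen true positive site receives a higher score than a randomly chosen false positive site, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes AUC that way from raw classifier scores; the scores are hypothetical, and this is not the thesis's evaluation code.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical scores for true and false TFBS candidates.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3]))  # 5 of 6 pairs are ordered correctly
```

This quadratic loop is fine for illustration; with about 12,000 positives and 130,000 negatives, a rank-based computation after a single sort would be the practical choice.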

Sussex Research Online