On Graphical Models via Univariate Exponential Family Distributions
Undirected graphical models, or Markov networks, are a popular class of
statistical models, used in a wide variety of applications. Popular instances
of this class include Gaussian graphical models and Ising models. In many
settings, however, it might not be clear which subclass of graphical models to
use, particularly for non-Gaussian and non-categorical data. In this paper, we
consider a general sub-class of graphical models where the node-wise
conditional distributions arise from exponential families. This allows us to
derive multivariate graphical model distributions from univariate exponential
family distributions, such as the Poisson, negative binomial, and exponential
distributions. Our key contributions include a class of M-estimators to fit
these graphical model distributions, and a rigorous statistical analysis showing
that these M-estimators recover the true graphical model structure exactly,
with high probability. We provide examples of genomic and proteomic networks
learned via instances of our class of graphical models derived from Poisson and
exponential distributions.
Comment: Journal of Machine Learning Research
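As a rough illustration of the construction this abstract describes, the block below writes out one common pairwise form of such a model. The notation is an assumption made for illustration and is not taken from the abstract: $B(\cdot)$ is the sufficient statistic and $C(\cdot)$ the base measure of the chosen univariate exponential family, $N(s)$ is the neighbourhood of node $s$, and $A(\theta)$ is the log-normalising constant.

```latex
% Node-wise conditional distribution (assumed pairwise form):
P\bigl(X_s \mid X_{V \setminus s}\bigr) \;\propto\;
  \exp\Bigl\{ B(X_s)\Bigl(\theta_s + \sum_{t \in N(s)} \theta_{st}\, B(X_t)\Bigr) + C(X_s) \Bigr\}

% A joint distribution consistent with those conditionals:
P(X;\theta) \;=\;
  \exp\Bigl\{ \sum_{s \in V} \theta_s\, B(X_s)
            + \sum_{(s,t) \in E} \theta_{st}\, B(X_s)\, B(X_t)
            + \sum_{s \in V} C(X_s) - A(\theta) \Bigr\}
```

Under these assumptions, taking $B(x) = x$ and $C(x) = -\log(x!)$ gives Poisson node-conditionals, and an edge $(s,t)$ is absent exactly when $\theta_{st} = 0$.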
Discovering Neuronal Cell Types and Their Gene Expression Profiles Using a Spatial Point Process Mixture Model
Cataloging the neuronal cell types that comprise circuitry of individual
brain regions is a major goal of modern neuroscience and the BRAIN initiative.
Single-cell RNA sequencing can now be used to measure the gene expression
profiles of individual neurons and to categorize neurons based on their gene
expression profiles. While the single-cell techniques are extremely powerful
and hold great promise, they are currently still labor intensive, have a high
cost per cell, and, most importantly, do not provide information on the spatial
distribution of cell types in specific regions of the brain. We propose a
complementary approach that uses computational methods to infer the cell types
and their gene expression profiles through analysis of brain-wide single-cell
resolution in situ hybridization (ISH) imagery contained in the Allen Brain
Atlas (ABA). We measure the spatial distribution of neurons labeled in the ISH
image for each gene and model it as a spatial point process mixture, whose
mixture weights are given by the cell types which express that gene. By fitting
a point process mixture model jointly to the ISH images, we infer both the
spatial point process distribution for each cell type and its gene expression
profile. We validate our predictions of cell type-specific gene expression
profiles using single-cell RNA sequencing data, recently published for the
mouse somatosensory cortex. Jointly with the gene expression profiles, cell
features such as cell size, orientation, intensity and local density level are
inferred per cell type.
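The mixture idea in this abstract can be sketched in a few lines. The sketch makes simplifying assumptions that are not stated in the abstract: the image is divided into bins, each cell type has a known per-bin intensity, and counts in each bin are independent Poisson variables. The function names `mixture_intensity` and `poisson_log_likelihood` are hypothetical.

```python
import math


def mixture_intensity(weights, type_intensities):
    """Per-bin intensity for one gene.

    weights[k] is the (assumed) expression weight of the gene in cell type k;
    type_intensities[k][b] is the intensity of cell type k in bin b.
    """
    n_bins = len(type_intensities[0])
    return [
        sum(w * lam[b] for w, lam in zip(weights, type_intensities))
        for b in range(n_bins)
    ]


def poisson_log_likelihood(counts, intensity):
    """Log-likelihood of binned cell counts under independent Poisson bins."""
    total = 0.0
    for c, lam in zip(counts, intensity):
        if lam <= 0.0:
            if c == 0:
                continue  # an empty bin with zero intensity contributes nothing
            return float("-inf")  # cells observed where the model allows none
        total += c * math.log(lam) - lam - math.lgamma(c + 1)
    return total


# Toy example: two cell types occupying different bins, one gene.
type_intensities = [[2.0, 0.0], [0.0, 4.0]]
weights = [1.0, 0.5]
intensity = mixture_intensity(weights, type_intensities)
log_lik = poisson_log_likelihood([2, 2], intensity)
```

Fitting the model would then mean maximising this log-likelihood over the weights and the cell-type intensities jointly across all genes, which the sketch does not attempt.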
The Information Geometry of Sparse Goodness-of-Fit Testing
This paper takes an information-geometric approach to the challenging issue of goodness-of-fit testing in the high dimensional, low sample size context where, potentially, boundary effects dominate. The main contributions of this paper are threefold: first, we present and prove two new theorems on the behaviour of commonly used test statistics in this context; second, we investigate, in the novel environment of the extended multinomial model, the links between information geometry-based divergences and standard goodness-of-fit statistics, allowing us to formalise relationships which have been missing in the literature; finally, we use simulation studies to validate and illustrate our theoretical results and to explore currently open research questions about the way that discretisation effects can dominate sampling distributions near the boundary. Accommodating these discretisation effects is novel and contrasts sharply with the essentially continuous approach of skewness and other corrections flowing from standard higher-order asymptotic analysis.
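To make the sparse setting concrete, the sketch below computes two standard goodness-of-fit statistics, Pearson's chi-squared and the deviance (likelihood-ratio statistic), for a multinomial sample with more cells than observations. It is only an illustration of the statistics involved, not the paper's analysis. The function names and the toy data are assumptions.

```python
import math


def pearson_chi2(counts, probs):
    """Pearson's chi-squared statistic: sum of (observed - expected)^2 / expected."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def deviance(counts, probs):
    """Deviance: 2 * sum of observed * log(observed / expected).

    Empty cells are skipped, using the convention 0 * log(0) = 0.
    """
    n = sum(counts)
    return 2.0 * sum(
        c * math.log(c / (n * p)) for c, p in zip(counts, probs) if c > 0
    )


# Toy sparse sample: 3 observations spread over 10 equally likely cells.
counts = [2, 0, 0, 1, 0, 0, 0, 0, 0, 0]
probs = [0.1] * 10
x2 = pearson_chi2(counts, probs)
g2 = deviance(counts, probs)
```

Both statistics belong to the power-divergence family, yet they treat empty cells differently: each empty cell adds its expected count to Pearson's statistic and nothing to the deviance, so the two can diverge markedly when most cells are empty.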