Convex Rank Tests and Semigraphoids
Convex rank tests are partitions of the symmetric group which have desirable
geometric properties. The statistical tests defined by such partitions involve
counting all permutations in the equivalence classes. Each class consists of
the linear extensions of a partially ordered set specified by data. Our methods
refine existing rank tests of non-parametric statistics, such as the sign test
and the runs test, and are useful for exploratory analysis of ordinal data. We
establish a bijection between convex rank tests and probabilistic conditional
independence structures known as semigraphoids. The subclass of submodular rank
tests is derived from faces of the cone of submodular functions, or from
Minkowski summands of the permutohedron. We enumerate all small instances of
such rank tests. Of particular interest are graphical tests, which correspond
to both graphical models and graph associahedra.
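Each equivalence class of a convex rank test is the set of linear extensions of a poset. Finding a class's probability therefore means counting linear extensions. Under a uniform null on permutations (an assumption made here for illustration), that probability is the count divided by n!. The sketch below is for small posets and assumes the poset is given as pairs (a, b) meaning "a precedes b". The helper `count_linear_extensions` is hypothetical, not code from the paper. It uses dynamic programming over subsets, so it is exponential in n.

```python
from functools import lru_cache


def count_linear_extensions(n, relations):
    """Count orderings of 0..n-1 in which a comes before b for every (a, b)."""
    # preds[b] is a bitmask of the elements that must precede b.
    preds = [0] * n
    for a, b in relations:
        preds[b] |= 1 << a

    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def extend(placed):
        # Number of ways to finish an ordering whose first elements are `placed`.
        if placed == full:
            return 1
        total = 0
        for x in range(n):
            bit = 1 << x
            # x may come next if it is unplaced and all its predecessors are placed.
            if not (placed & bit) and (preds[x] & placed) == preds[x]:
                total += extend(placed | bit)
        return total

    return extend(0)
```

For example, the class on four elements defined by "0 before 1" and "2 before 3" contains 6 of the 24 permutations, so it has probability 0.25 under a uniform null.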
Generalized Permutohedra from Probabilistic Graphical Models
A graphical model encodes conditional independence relations via the Markov
properties. For an undirected graph these conditional independence relations
can be represented by a simple polytope known as the graph associahedron, which
can be constructed as a Minkowski sum of standard simplices. There is an
analogous polytope for conditional independence relations coming from a regular
Gaussian model, and it can be defined using multiinformation or relative
entropy. For directed acyclic graphical models and also for mixed graphical
models containing undirected, directed and bidirected edges, we give a
construction of this polytope, up to equivalence of normal fans, as a Minkowski
sum of matroid polytopes. Finally, we apply this geometric insight to construct
a new ordering-based search algorithm for causal inference via directed acyclic
graphical models.
Comment: Appendix B is expanded. Final version to appear in SIAM J. Discrete Math.
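You can make the Minkowski-sum description of the graph associahedron concrete by computing its vertices. This sketch assumes the standard realization, which sums one simplex Δ_S = conv{e_i : i in S} over every connected vertex subset S. For a generic weight vector w (distinct entries), the vertex of a Minkowski sum that maximizes ⟨w, x⟩ is the sum of the summands' maximizers. On Δ_S that maximizer is e_i, where i is the heaviest element of S. Both function names are hypothetical.

```python
from itertools import combinations


def connected_subsets(n, edges):
    """All nonempty vertex subsets that induce a connected subgraph."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    result = []
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            s = set(subset)
            # Depth-first search that only walks inside s.
            stack, seen = [subset[0]], {subset[0]}
            while stack:
                u = stack.pop()
                for w in adj[u] & s:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
            if seen == s:
                result.append(subset)
    return result


def associahedron_vertex(n, edges, weights):
    """Vertex of sum over connected S of Delta_S maximizing <weights, x>.

    Assumes generic weights, so each simplex has a unique maximizing vertex.
    """
    vertex = [0] * n
    for subset in connected_subsets(n, edges):
        best = max(subset, key=lambda i: weights[i])
        vertex[best] += 1
    return vertex
```

On the path 0-1-2 with w = (3, 2, 1), this gives the vertex (3, 2, 1). On the triangle, every subset is connected and the sum is a permutohedron, and the same w gives (4, 2, 1).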
The Cyclohedron Test for Finding Periodic Genes in Time Course Expression Studies
The problem of finding periodically expressed genes from time course
microarray experiments is at the center of numerous efforts to identify the
molecular components of biological clocks. We present a new approach to this
problem based on the cyclohedron test, which is a rank test inspired by recent
advances in algebraic combinatorics. The test has the advantage of being robust
to measurement errors, and can be used to ascertain the significance of
top-ranked genes. We apply the test to recently published measurements of gene
expression during mouse somitogenesis and find 32 genes that collectively are
significant. Among these are previously identified periodic genes involved in
the Notch/FGF and Wnt signaling pathways, as well as novel candidate genes that
may play a role in regulating the segmentation clock. These results confirm
that there is an abundance of exceptionally periodic genes expressed during
somitogenesis. The emphasis of this paper is on the statistics and
combinatorics that underlie the cyclohedron test and its implementation within
a multiple testing framework.
Comment: Revision consists of reorganization and further statistical
discussion; 19 pages, 4 figures
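The abstract does not spell out the cyclohedron test statistic. This sketch therefore illustrates only the general shape of an exact rank test for periodicity, using a simpler stand-in statistic: the number of cyclic peaks in the time course. A profile that traces one smooth oscillation over the sampled window typically has exactly one. The test depends only on ranks, so monotone distortions of the measurements do not change it. The p-value assumes all n! rank orderings are equally likely under the null and that there are no ties. The function names are hypothetical, and this is not the cyclohedron test itself.

```python
from itertools import permutations
from math import factorial


def cyclic_peaks(values):
    """Number of positions whose value exceeds both cyclic neighbours."""
    n = len(values)
    return sum(
        values[i] > values[i - 1] and values[i] > values[(i + 1) % n]
        for i in range(n)
    )


def peak_test_pvalue(values):
    """Exact P(peaks <= observed) over all n! equally likely rank orderings.

    Exhaustive enumeration, so keep n small (n >= 3, no ties).
    """
    observed = cyclic_peaks(values)
    n = len(values)
    hits = sum(cyclic_peaks(p) <= observed for p in permutations(range(n)))
    return hits / factorial(n)
```

For seven time points with a single peak, such as [1, 3, 5, 7, 6, 4, 2], the p-value is 224/5040 (about 0.044). Exactly 7 * 2^5 = 224 cyclic orderings have one peak.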
"Building" exact confidence nets
Confidence nets, that is, collections of confidence intervals that fill out
the parameter space and whose exact parameter coverage can be computed, are
familiar in nonparametric statistics. Here, the distributional assumptions are
based on invariance under the action of a finite reflection group. Exact
confidence nets are exhibited for a single parameter, based on the root system
of the group. The main result is a formula for the generating function of the
coverage interval probabilities. The proof makes use of the theory of
"buildings" and the Chevalley factorization theorem for the length distribution
on Cayley graphs of finite reflection groups.
Comment: 20 pages. To appear in Bernoulli
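For the symmetric group S_n, viewed as the reflection group of type A_{n-1} with length equal to the number of inversions, the length generating function factors over the degrees 2, ..., n as the product of (1 + q + ... + q^{d-1}) for d = 2, ..., n. The sketch below checks this factorization by brute force for small n. It illustrates only the length distribution, not the confidence-net construction, and the function names are hypothetical.

```python
from itertools import permutations


def inversions(perm):
    """Coxeter length of a permutation in S_n: the number of inverted pairs."""
    return sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )


def length_distribution(n):
    """Coefficient list of sum over S_n of q^length, by enumeration."""
    counts = [0] * (n * (n - 1) // 2 + 1)
    for p in permutations(range(n)):
        counts[inversions(p)] += 1
    return counts


def factorized_distribution(n):
    """Coefficients of the product of (1 + q + ... + q^(d-1)) for d = 2..n."""
    coeffs = [1]
    for d in range(2, n + 1):  # degrees of S_n as a reflection group
        new = [0] * (len(coeffs) + d - 1)
        for i, c in enumerate(coeffs):
            for k in range(d):
                new[i + k] += c
        coeffs = new
    return coeffs
```

For n = 4 both functions return [1, 3, 5, 6, 5, 3, 1].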
Using TPA to count linear extensions
A linear extension of a poset $P$ is a permutation of the elements of the set
that respects the partial order. Let $L(P)$ denote the number of linear
extensions. Determining $L(P)$ exactly for an arbitrary poset is #P-complete,
so randomized approximation algorithms that draw randomly
from the set of linear extensions are used. In this work, the set of linear
extensions is embedded in a larger state space with a continuous parameter $\beta$.
The introduction of a continuous parameter allows for the use of a more
efficient method for approximating $L(P)$ called TPA. Our primary result is
that it is possible to sample from this continuous embedding in time that is as
fast as or faster than the best known methods for sampling uniformly from linear
extensions. For a poset containing $n$ elements, this means we can approximate
$L(P)$ to within a factor of $1 + \epsilon$ with probability at least $1 - \delta$
using an expected number of random bits and comparisons in the poset
which is at most
Comment: 12 pages, 4 algorithms
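TPA (the Tootsie Pop Algorithm) estimates the log-ratio of the measures of two sets in a nested family A(β). Each run draws a uniform point from the current set and shrinks β to the smallest set that still contains the point. It repeats until β falls below β_min. The number of steps is Poisson with mean ln(μ(A(β_max)) / μ(A(β_min))). The paper applies this to a continuous embedding of linear extensions. This sketch instead uses a toy family, balls of radius β in R^d, where uniform sampling is trivial and the exact answer d ln(β_max/β_min) is known. The function names are hypothetical.

```python
import math
import random


def tpa_run(beta_max, beta_min, dim, rng):
    """One TPA run on A(beta) = ball of radius beta in R^dim.

    Returns how many sampled points land in A(beta_min)'s complement
    before the shrinking radius first drops below beta_min.
    """
    beta, steps = beta_max, 0
    while True:
        # The radius of a uniform point in a dim-ball of radius beta is
        # beta * U**(1/dim); the smallest ball containing it has that radius.
        beta = beta * rng.random() ** (1.0 / dim)
        if beta < beta_min:
            return steps
        steps += 1


def tpa_log_ratio(beta_max, beta_min, dim, runs, seed=0):
    """Estimate ln(vol A(beta_max) / vol A(beta_min)) from `runs` TPA runs."""
    rng = random.Random(seed)
    total = sum(tpa_run(beta_max, beta_min, dim, rng) for _ in range(runs))
    return total / runs
```

With d = 10, β_max = 1 and β_min = 0.5, the target is 10 ln 2 ≈ 6.93. Averaging 2000 runs gives a standard error of roughly sqrt(6.93 / 2000) ≈ 0.06. TPA's appeal is that the required number of runs depends on the log of the measure ratio rather than on a hand-tuned cooling schedule.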