Detecting epistasis via Markov bases
Rapid research progress in genotyping techniques has allowed large
genome-wide association studies. Existing methods often focus on determining
associations between single loci and a specific phenotype. However, a
particular phenotype is usually the result of complex relationships between
multiple loci and the environment. In this paper, we describe a two-stage
method for detecting epistasis by combining the traditionally used single-locus
search with a search for multiway interactions. Our method is based on an
extended version of Fisher's exact test. To perform this test, a Markov chain
is constructed on the space of multidimensional contingency tables using the
elements of a Markov basis as moves. We test our method on simulated data and
compare it to a two-stage logistic regression method and to a fully Bayesian
method, showing that we are able to detect the interacting loci when other
methods fail to do so. Finally, we apply our method to a genome-wide data set
consisting of 685 dogs and identify epistasis associated with canine hair
length for four pairs of SNPs.
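
The Markov chain mentioned in this abstract can be illustrated on a much simpler case. The sketch below is not the paper's method: it assumes a two-way table and the independence model, where the Markov basis is known to consist of the basic moves that add 1 to two cells and subtract 1 from two others while preserving all row and column sums. The function names (`mcmc_exact_test`, `chi_square`, `log_hypergeom_weight`) and the choice of the chi-square statistic are assumptions made for illustration only.

```python
import math
import random


def log_hypergeom_weight(table):
    # With fixed margins under independence, P(table) is proportional
    # to 1 / prod(n_ij!), so the log-weight is -sum(log(n_ij!)).
    return -sum(math.lgamma(n + 1) for row in table for n in row)


def chi_square(table):
    # Pearson chi-square statistic; assumes every margin is positive.
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def mcmc_exact_test(observed, steps=20000, burn_in=2000, seed=0):
    """Estimate an exact-test p-value by Metropolis-Hastings sampling
    over tables that share the margins of `observed` (illustrative sketch)."""
    rng = random.Random(seed)
    current = [list(row) for row in observed]
    n_rows, n_cols = len(current), len(current[0])
    observed_stat = chi_square(observed)
    current_logw = log_hypergeom_weight(current)
    extreme = 0
    for step in range(burn_in + steps):
        # Pick a basic move: +1 at (i, j) and (k, l), -1 at (i, l) and (k, j).
        i, k = rng.sample(range(n_rows), 2)
        j, l = rng.sample(range(n_cols), 2)
        # Moves that would create a negative cell are rejected outright.
        if current[i][l] > 0 and current[k][j] > 0:
            proposal = [row[:] for row in current]
            proposal[i][j] += 1
            proposal[k][l] += 1
            proposal[i][l] -= 1
            proposal[k][j] -= 1
            logw = log_hypergeom_weight(proposal)
            # Symmetric proposal, so the acceptance ratio is the weight ratio.
            if rng.random() < math.exp(min(0.0, logw - current_logw)):
                current, current_logw = proposal, logw
        # Count sampled tables at least as extreme as the observed one.
        if step >= burn_in and chi_square(current) >= observed_stat - 1e-9:
            extreme += 1
    return extreme / steps


p_value = mcmc_exact_test([[3, 1], [1, 3]])
print(f"estimated p-value: {p_value:.3f}")
```

For the 2x2 example, the exact p-value can be enumerated by hand as 34/70, which gives a check on the sampler. For three-way and larger tables, as in the abstract, the Markov basis is no longer just these basic moves and generally has to be computed with algebraic software.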
Packing ellipsoids with overlap
The problem of packing ellipsoids of different sizes and shapes into an
ellipsoidal container so as to minimize a measure of overlap between ellipsoids
is considered. A bilevel optimization formulation is given, together with an
algorithm for the general case and a simpler algorithm for the special case in
which all ellipsoids are in fact spheres. Convergence results are proved and
computational experience is described and illustrated. The motivating
application - chromosome organization in the human cell nucleus - is discussed
briefly, and some illustrative results are presented.
Scalable Unbalanced Optimal Transport using Generative Adversarial Networks
Generative adversarial networks (GANs) are an expressive class of neural
generative models with tremendous success in modeling high-dimensional
continuous measures. In this paper, we present a scalable method for unbalanced
optimal transport (OT) based on the generative-adversarial framework. We
formulate unbalanced OT as a problem of simultaneously learning a transport map
and a scaling factor that push a source measure to a target measure in a
cost-optimal manner. In addition, we propose an algorithm for solving this
problem based on stochastic alternating gradient updates, similar in practice
to GANs. We also provide theoretical justification for this formulation,
showing that it is closely related to an existing static formulation by Liero
et al. (2018), and perform numerical experiments demonstrating how this
methodology can be applied to population modeling.
Geometry of Log-Concave Density Estimation
Shape-constrained density estimation is an important topic in mathematical
statistics. We focus on densities on $\mathbb{R}^d$ that are log-concave, and
we study geometric properties of the maximum likelihood estimator (MLE) for
weighted samples. Cule, Samworth, and Stewart showed that the logarithm of the
optimal log-concave density is piecewise linear and supported on a regular
subdivision of the samples. This defines a map from the space of weights to the
set of regular subdivisions of the samples, i.e. the face poset of their
secondary polytope. We prove that this map is surjective. In fact, every
regular subdivision arises in the MLE for some set of weights with positive
probability, but coarser subdivisions appear to be more likely to arise than
finer ones. To quantify these results, we introduce a continuous version of the
secondary polytope, whose dual we name the Samworth body. This article
establishes a new link between geometric combinatorics and nonparametric
statistics, and it suggests numerous open problems.
Comment: 22 pages, 3 figures
Faithfulness and learning hypergraphs from discrete distributions
The concepts of faithfulness and strong-faithfulness are important for
statistical learning of graphical models. Graphs are not sufficient for
describing the association structure of a discrete distribution. Hypergraphs
representing hierarchical log-linear models are considered instead, and the
concept of parametric (strong-) faithfulness with respect to a hypergraph is
introduced. Strong-faithfulness ensures the existence of uniformly consistent
parameter estimators and enables building uniformly consistent procedures for a
hypergraph search. The strength of association in a discrete distribution can
be quantified with various measures, leading to different concepts of
strong-faithfulness. Lower and upper bounds for the proportions of
distributions that do not satisfy strong-faithfulness are computed for
different parameterizations and measures of association.
Comment: 23 pages, 6 figures