Detecting epistasis via Markov bases
Rapid research progress in genotyping techniques has allowed large
genome-wide association studies. Existing methods often focus on determining
associations between single loci and a specific phenotype. However, a
particular phenotype is usually the result of complex relationships between
multiple loci and the environment. In this paper, we describe a two-stage
method for detecting epistasis by combining the traditionally used single-locus
search with a search for multiway interactions. Our method is based on an
extended version of Fisher's exact test. To perform this test, a Markov chain
is constructed on the space of multidimensional contingency tables using the
elements of a Markov basis as moves. We test our method on simulated data and
compare it to a two-stage logistic regression method and to a fully Bayesian
method, showing that we are able to detect the interacting loci when other
methods fail to do so. Finally, we apply our method to a genome-wide data set
consisting of 685 dogs and identify epistasis associated with canine hair
length for four pairs of SNPs.
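The abstract describes an extended Fisher's exact test in which a Markov chain moves through contingency tables using the elements of a Markov basis. Below is a minimal sketch of that idea in the simplest setting, a two-way table. For two-way tables with fixed row and column sums, the "basic moves" (+1/-1 on one diagonal of a 2x2 minor and -1/+1 on the other) are a known Markov basis (Diaconis and Sturmfels). The multiway tables and the bases used in the paper are not reproduced here. The function names and the choice of the Pearson chi-square statistic are assumptions made for illustration.

```python
import math
import random


def log_weight(table):
    # Under independence with fixed row/column sums, P(table) is
    # proportional to 1 / prod(n_ij!); return the log of that weight.
    return -sum(math.lgamma(n + 1) for row in table for n in row)


def chi2_stat(table):
    # Pearson chi-square statistic (hypothetical choice of test statistic).
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def markov_basis_exact_test(table, n_steps=20000, seed=0):
    """Monte Carlo p-value for independence in a two-way table.

    Metropolis walk over tables with the observed margins, using the
    basic 2x2 moves as the Markov basis (two-way case only).
    """
    rng = random.Random(seed)
    current = [list(row) for row in table]
    n_rows, n_cols = len(current), len(current[0])
    observed = chi2_stat(table)
    hits = 0
    for _ in range(n_steps):
        # Pick a random basic move: two rows, two columns, and a sign.
        i1, i2 = rng.sample(range(n_rows), 2)
        j1, j2 = rng.sample(range(n_cols), 2)
        eps = rng.choice((1, -1))
        proposal = [row[:] for row in current]
        proposal[i1][j1] += eps
        proposal[i2][j2] += eps
        proposal[i1][j2] -= eps
        proposal[i2][j1] -= eps
        # Moves that would create negative cells leave the table unchanged.
        if min(min(row) for row in proposal) >= 0:
            log_ratio = log_weight(proposal) - log_weight(current)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                current = proposal
        # Count visits to tables at least as extreme as the observed one.
        if chi2_stat(current) >= observed - 1e-9:
            hits += 1
    return hits / n_steps
```

For example, `markov_basis_exact_test([[10, 2], [3, 12]])` estimates how often tables with the same margins are at least as far from independence as the observed one. Because every move preserves the margins, the chain never leaves the fiber that Fisher's exact test conditions on.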
Packing ellipsoids with overlap
The problem of packing ellipsoids of different sizes and shapes into an
ellipsoidal container so as to minimize a measure of overlap between ellipsoids
is considered. A bilevel optimization formulation is given, together with an
algorithm for the general case and a simpler algorithm for the special case in
which all ellipsoids are in fact spheres. Convergence results are proved and
computational experience is described and illustrated. The motivating
application - chromosome organization in the human cell nucleus - is discussed
briefly, and some illustrative results are presented.
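The abstract does not state which overlap measure is minimized. The sketch below is therefore only an assumed example for the sphere special case: it sums squared penetration depths between each pair of spheres and between each sphere and a spherical container centered at the origin. The function name and the squared-depth penalty are hypothetical. This is not the paper's bilevel formulation for general ellipsoids.

```python
import math


def sphere_overlap_penalty(centers, radii, container_radius):
    """Hypothetical overlap measure for spheres in a spherical container.

    Sums squared penetration depths: sphere-sphere overlaps plus the amount
    each sphere sticks out of a container of the given radius at the origin.
    """
    total = 0.0
    n = len(centers)
    for i in range(n):
        # Depth by which sphere i extends beyond the container boundary.
        outside = math.hypot(*centers[i]) + radii[i] - container_radius
        total += max(0.0, outside) ** 2
        for j in range(i + 1, n):
            # Positive when spheres i and j intersect.
            depth = radii[i] + radii[j] - math.dist(centers[i], centers[j])
            total += max(0.0, depth) ** 2
    return total
```

A penalty of zero means the spheres are pairwise disjoint (up to touching) and lie inside the container. Squaring the depths keeps the measure continuously differentiable, which suits gradient-based minimization.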
Scalable Unbalanced Optimal Transport using Generative Adversarial Networks
Generative adversarial networks (GANs) are an expressive class of neural
generative models with tremendous success in modeling high-dimensional
continuous measures. In this paper, we present a scalable method for unbalanced
optimal transport (OT) based on the generative-adversarial framework. We
formulate unbalanced OT as a problem of simultaneously learning a transport map
and a scaling factor that push a source measure to a target measure in a
cost-optimal manner. In addition, we propose an algorithm for solving this
problem based on stochastic alternating gradient updates, similar in practice
to GANs. We also provide theoretical justification for this formulation,
showing that it is closely related to an existing static formulation by Liero
et al. (2018), and perform numerical experiments demonstrating how this
methodology can be applied to population modeling.
Mastitis in dairy production: Estimation of sensitivity, specificity and disease prevalence in the absence of a gold standard
Mastitis, a worldwide endemic disease of dairy cows, is an important cause of decreased efficiency in milk production. Early medical treatment can reduce the nonreversible losses in milk production caused by this infection. Various diagnostic tests for mastitis are available, including a test measuring the electrical conductivity of milk (MEC test), the industry-standard somatic cell count (SCC test), a bacteriological test, and a recently developed test measuring mammary associated amyloid A (MAA test). None of these tests is considered a gold standard, however. The aim of the present study was to determine which of these tests provides the best results, and at what cost, so as to improve the efficiency of milk production. For this study, 25 cows were tested at all four quarters of the udder with each of the aforementioned diagnostic tests. From these data, the disease prevalence and the sensitivity and specificity of the four tests were estimated with a Bayesian approach, extending the Hui and Walter model for two independent tests and two populations to a model for four partially dependent tests and one population. This model was further combined with a receiver operating characteristic (ROC) analysis to estimate the overall test accuracy.
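Latent-class models of the Hui and Walter type rest on the probability of each pattern of test results given the unknown prevalence and each test's sensitivity and specificity. A minimal sketch of that likelihood is shown below for any number of binary tests in one population. It assumes the tests are conditionally independent given disease status. The study's model instead allows partial dependence between its four tests and is fitted in a Bayesian way, so neither feature is reproduced here. Function names are hypothetical.

```python
import math
from itertools import product


def cell_probabilities(prevalence, sensitivities, specificities):
    """P(pattern of test results) under conditional independence.

    Each pattern is a tuple of 1 (positive) / 0 (negative), one entry per test.
    """
    probs = {}
    for pattern in product((1, 0), repeat=len(sensitivities)):
        p_diseased = prevalence          # mass from truly diseased quarters
        p_healthy = 1.0 - prevalence     # mass from truly healthy quarters
        for result, se, sp in zip(pattern, sensitivities, specificities):
            p_diseased *= se if result else (1.0 - se)
            p_healthy *= (1.0 - sp) if result else sp
        probs[pattern] = p_diseased + p_healthy
    return probs


def log_likelihood(counts, prevalence, sensitivities, specificities):
    # counts maps each observed result pattern to its frequency.
    probs = cell_probabilities(prevalence, sensitivities, specificities)
    return sum(n * math.log(probs[pattern]) for pattern, n in counts.items() if n > 0)
```

With two tests and one population, the four observed cell frequencies carry only three degrees of freedom, but there are five unknown parameters. This is why Hui and Walter used two populations, and why the extended model depends on additional tests and prior information.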
Geometry of maximum likelihood estimation in Gaussian graphical models
We study maximum likelihood estimation in Gaussian graphical models from a
geometric point of view. An algebraic elimination criterion allows us to find
exact lower bounds on the number of observations needed to ensure that the
maximum likelihood estimator (MLE) exists with probability one. This is applied
to bipartite graphs, grids and colored graphs. We also study the ML degree, and
we present the first instance of a graph for which the MLE exists with
probability one, even when the number of observations equals the treewidth.
Comment: Published at http://dx.doi.org/10.1214/11-AOS957 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
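For background on what "the MLE exists" means here, the standard covariance-selection characterization (due to Dempster) is stated below. It is not taken from the abstract. Let G = (V, E) be the graph, S the sample covariance matrix of n observations (assumed centered), and K the concentration matrix. Up to an additive constant, the log-likelihood and the conditions satisfied by the MLE are:

```latex
\ell(K) = \frac{n}{2}\Bigl(\log\det K - \operatorname{tr}(SK)\Bigr),
\qquad K \succ 0,\quad K_{ij} = 0 \ \text{for } ij \notin E,\ i \neq j,

\hat\Sigma = \hat K^{-1}:\qquad
\hat\Sigma_{ii} = S_{ii},\qquad
\hat\Sigma_{ij} = S_{ij}\ \text{for } ij \in E,\qquad
(\hat\Sigma^{-1})_{ij} = 0\ \text{for } ij \notin E,\ i \neq j.
```

The MLE exists exactly when the partial matrix with entries S_ii and S_ij (ij in E) has a positive definite completion. The question studied in the paper is how large n must be for this to hold with probability one.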
Geometry of Log-Concave Density Estimation
Shape-constrained density estimation is an important topic in mathematical
statistics. We focus on densities on ℝ^d that are log-concave, and
we study geometric properties of the maximum likelihood estimator (MLE) for
weighted samples. Cule, Samworth, and Stewart showed that the logarithm of the
optimal log-concave density is piecewise linear and supported on a regular
subdivision of the samples. This defines a map from the space of weights to the
set of regular subdivisions of the samples, i.e. the face poset of their
secondary polytope. We prove that this map is surjective. In fact, every
regular subdivision arises in the MLE for some set of weights with positive
probability, but coarser subdivisions appear to be more likely to arise than
finer ones. To quantify these results, we introduce a continuous version of the
secondary polytope, whose dual we name the Samworth body. This article
establishes a new link between geometric combinatorics and nonparametric
statistics, and it suggests numerous open problems.
Comment: 22 pages, 3 figures
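In this setting, a regular subdivision of the samples is induced by lifting each sample point x_i to a height y_i and taking the upper concave envelope of the lifted points. Its cells are the projections of the envelope's linear pieces. A minimal one-dimensional sketch is shown below. The heights are arbitrary inputs, so the code only illustrates how heights determine a subdivision and does not compute the log-concave MLE. The function name is hypothetical, and the points are assumed distinct.

```python
def regular_subdivision_1d(points, heights):
    """Indices of samples whose lifts (x_i, y_i) are vertices of the
    upper concave envelope, ordered by x. Consecutive indices delimit the
    cells of the induced regular subdivision; other samples lie inside cells.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    hull = []
    for i in order:
        # Monotone-chain upper hull: drop the last vertex while it fails to
        # make a strict clockwise (concave) turn with the new point.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = ((points[b] - points[a]) * (heights[i] - heights[a])
                     - (heights[b] - heights[a]) * (points[i] - points[a]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull
```

For example, the points 0, 1, 2 with heights 0, 1, 0 give the finest subdivision `[0, 1, 2]`, which has two cells. Lowering the middle height to -1 gives `[0, 2]`, the coarsest subdivision, which has a single cell.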
- …