1,500 research outputs found

    Detecting epistasis via Markov bases

    Full text link
    Rapid research progress in genotyping techniques have allowed large genome-wide association studies. Existing methods often focus on determining associations between single loci and a specific phenotype. However, a particular phenotype is usually the result of complex relationships between multiple loci and the environment. In this paper, we describe a two-stage method for detecting epistasis by combining the traditionally used single-locus search with a search for multiway interactions. Our method is based on an extended version of Fisher's exact test. To perform this test, a Markov chain is constructed on the space of multidimensional contingency tables using the elements of a Markov basis as moves. We test our method on simulated data and compare it to a two-stage logistic regression method and to a fully Bayesian method, showing that we are able to detect the interacting loci when other methods fail to do so. Finally, we apply our method to a genome-wide data set consisting of 685 dogs and identify epistasis associated with canine hair length for four pairs of SNPs

    Packing ellipsoids with overlap

    Full text link
    The problem of packing ellipsoids of different sizes and shapes into an ellipsoidal container so as to minimize a measure of overlap between ellipsoids is considered. A bilevel optimization formulation is given, together with an algorithm for the general case and a simpler algorithm for the special case in which all ellipsoids are in fact spheres. Convergence results are proved and computational experience is described and illustrated. The motivating application - chromosome organization in the human cell nucleus - is discussed briefly, and some illustrative results are presented

    Scalable Unbalanced Optimal Transport using Generative Adversarial Networks

    Full text link
    Generative adversarial networks (GANs) are an expressive class of neural generative models with tremendous success in modeling high-dimensional continuous measures. In this paper, we present a scalable method for unbalanced optimal transport (OT) based on the generative-adversarial framework. We formulate unbalanced OT as a problem of simultaneously learning a transport map and a scaling factor that push a source measure to a target measure in a cost-optimal manner. In addition, we propose an algorithm for solving this problem based on stochastic alternating gradient updates, similar in practice to GANs. We also provide theoretical justification for this formulation, showing that it is closely related to an existing static formulation by Liero et al. (2018), and perform numerical experiments demonstrating how this methodology can be applied to population modeling

    Geometry of Log-Concave Density Estimation

    Full text link
    Shape-constrained density estimation is an important topic in mathematical statistics. We focus on densities on Rd\mathbb{R}^d that are log-concave, and we study geometric properties of the maximum likelihood estimator (MLE) for weighted samples. Cule, Samworth, and Stewart showed that the logarithm of the optimal log-concave density is piecewise linear and supported on a regular subdivision of the samples. This defines a map from the space of weights to the set of regular subdivisions of the samples, i.e. the face poset of their secondary polytope. We prove that this map is surjective. In fact, every regular subdivision arises in the MLE for some set of weights with positive probability, but coarser subdivisions appear to be more likely to arise than finer ones. To quantify these results, we introduce a continuous version of the secondary polytope, whose dual we name the Samworth body. This article establishes a new link between geometric combinatorics and nonparametric statistics, and it suggests numerous open problems.Comment: 22 pages, 3 figure

    Faithfulness and learning hypergraphs from discrete distributions

    Full text link
    The concepts of faithfulness and strong-faithfulness are important for statistical learning of graphical models. Graphs are not sufficient for describing the association structure of a discrete distribution. Hypergraphs representing hierarchical log-linear models are considered instead, and the concept of parametric (strong-) faithfulness with respect to a hypergraph is introduced. Strong-faithfulness ensures the existence of uniformly consistent parameter estimators and enables building uniformly consistent procedures for a hypergraph search. The strength of association in a discrete distribution can be quantified with various measures, leading to different concepts of strong-faithfulness. Lower and upper bounds for the proportions of distributions that do not satisfy strong-faithfulness are computed for different parameterizations and measures of association.Comment: 23 pages, 6 figure
    • …
    corecore