
    A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees

    Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work and are publicly available. We also apply our methods to real data.

    A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest

    We give a 2-approximation algorithm for the Maximum Agreement Forest problem on two rooted binary trees. This NP-hard problem has been studied extensively in the past two decades, since it can be used to compute the Subtree Prune-and-Regraft (SPR) distance between two phylogenetic trees. Our result improves on the very recent 2.5-approximation algorithm due to Shi, Feng, You and Wang (2015). Our algorithm is the first approximation algorithm for this problem that uses LP duality in its analysis.

    On unrooted and root-uncertain variants of several well-known phylogenetic network problems

    The hybridization number problem requires us to embed a set of binary rooted phylogenetic trees into a binary rooted phylogenetic network such that the number of nodes with indegree two is minimized. However, from a biological point of view accurately inferring the root location in a phylogenetic tree is notoriously difficult and poor root placement can artificially inflate the hybridization number. To this end we study a number of relaxed variants of this problem. We start by showing that the fundamental problem of determining whether an unrooted phylogenetic network displays (i.e. embeds) an unrooted phylogenetic tree is NP-hard. On the positive side we show that this problem is FPT in reticulation number. In the rooted case the corresponding FPT result is trivial, but here we require more subtle argumentation. Next we show that the hybridization number problem for unrooted networks (when given two unrooted trees) is equivalent to the problem of computing the Tree Bisection and Reconnect (TBR) distance of the two unrooted trees. In the third part of the paper we consider the "root uncertain" variant of hybridization number. Here we are free to choose the root location in each of a set of unrooted input trees such that the hybridization number of the resulting rooted trees is minimized. On the negative side we show that this problem is APX-hard. On the positive side, we show that the problem is FPT in the hybridization number, via kernelization, for any number of input trees. Comment: 28 pages, 8 Figures
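
The quantity being minimized in the abstract above, the number of nodes with indegree two, can be illustrated with a minimal sketch. It assumes a network given as a list of (parent, child) edges; the function name `count_reticulations` and the toy network are hypothetical and are not taken from the paper. The sketch only counts reticulation nodes in a given network; it does not solve the (hard) minimization problem.

```python
from collections import Counter


def count_reticulations(edges):
    """Count nodes with indegree two in a network given as (parent, child) edges."""
    indegree = Counter(child for _, child in edges)
    return sum(1 for degree in indegree.values() if degree == 2)


# Hypothetical toy network on leaves a, b, c; node "h" has two parents (u and v).
network = [
    ("root", "u"), ("root", "v"),
    ("u", "a"), ("u", "h"),
    ("v", "h"), ("v", "c"),
    ("h", "b"),
]

# A plain tree on the same leaves has no node with indegree two.
tree = [("root", "u"), ("root", "c"), ("u", "a"), ("u", "b")]

print(count_reticulations(network))
print(count_reticulations(tree))
```

In the toy network only `h` has two incoming edges, so it is the single reticulation node.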

    A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest

    We give a 2-approximation algorithm for the Maximum Agreement Forest problem on two rooted binary trees. This NP-hard problem has been studied extensively in the past two decades, since it can be used to compute the rooted Subtree Prune-and-Regraft (rSPR) distance between two phylogenetic trees. Our algorithm is combinatorial and its running time is quadratic in the input size. To prove the approximation guarantee, we construct a feasible dual solution for a novel linear programming formulation. In addition, we show this linear program is stronger than previously known formulations, and we give a compact formulation, showing that it can be solved in polynomial time.

    Active Mean Fields for Probabilistic Image Segmentation: Connections with Chan-Vese and Rudin-Osher-Fatemi Models

    Segmentation is a fundamental task for extracting semantically meaningful regions from an image. The goal of segmentation algorithms is to accurately assign object labels to each image location. However, image-noise, shortcomings of algorithms, and image ambiguities cause uncertainty in label assignment. Estimating the uncertainty in label assignment is important in multiple application domains, such as segmenting tumors from medical images for radiation treatment planning. One way to estimate these uncertainties is through the computation of posteriors of Bayesian models, which is computationally prohibitive for many practical applications. On the other hand, most computationally efficient methods fail to estimate label uncertainty. We therefore propose in this paper the Active Mean Fields (AMF) approach, a technique based on Bayesian modeling that uses a mean-field approximation to efficiently compute a segmentation and its corresponding uncertainty. Based on a variational formulation, the resulting convex model combines any label-likelihood measure with a prior on the length of the segmentation boundary. A specific implementation of that model is the Chan-Vese segmentation model (CV), in which the binary segmentation task is defined by a Gaussian likelihood and a prior regularizing the length of the segmentation boundary. Furthermore, the Euler-Lagrange equations derived from the AMF model are equivalent to those of the popular Rudin-Osher-Fatemi (ROF) model for image denoising. Solutions to the AMF model can thus be implemented by directly utilizing highly-efficient ROF solvers on log-likelihood ratio fields. We qualitatively assess the approach on synthetic data as well as on real natural and medical images. For a quantitative evaluation, we apply our approach to the icgbench dataset

    The Statistics of Density Peaks and the Column Density Distribution of the Lyman-Alpha Forest

    We develop a method to calculate the column density distribution of the Lyman-alpha forest for column densities in the range $10^{12.5} - 10^{14.5}$ cm$^{-2}$. The Zel'dovich approximation, with appropriate smoothing, is used to compute the density and peculiar velocity fields. The effect of the latter on absorption profiles is discussed and it is shown to have little effect on the column density distribution. An approximation is introduced in which the column density distribution is related to a statistic of density peaks (involving its height and first and second derivatives along the line of sight) in real space. We show that the slope of the column density distribution is determined by the temperature-density relation as well as the power spectrum on scales $2\,h\,\mathrm{Mpc}^{-1} < k < 20\,h\,\mathrm{Mpc}^{-1}$. An expression relating the three is given. We find very good agreement between the column density distribution obtained by applying the Voigt-profile-fitting technique to the output of a full hydrodynamic simulation and that obtained using our approximate method for a test model. This formalism then is applied to study a group of CDM as well as CHDM models. We show that the amplitude of the column density distribution depends on the combination of parameters $(\Omega_b h^2)^2 T_0^{-0.7} J_{HI}^{-1}$, which is not well-constrained by independent observations. The slope of the distribution, on the other hand, can be used to distinguish between different models: those with a smaller amplitude and a steeper slope of the power spectrum on small scales give rise to steeper distributions, for the range of column densities we study. Comparison with high resolution Keck data is made. Comment: match accepted version; discussion added: the effect of the shape of the power spectrum on the slope of the column density distribution

    Distributed Dominating Set Approximations beyond Planar Graphs

    The Minimum Dominating Set (MDS) problem is one of the most fundamental and challenging problems in distributed computing. While it is well-known that minimum dominating sets cannot be approximated locally on general graphs, over the last years, there has been much progress on computing local approximations on sparse graphs, and in particular planar graphs. In this paper we study distributed and deterministic MDS approximation algorithms for graph classes beyond planar graphs. In particular, we show that existing approximation bounds for planar graphs can be lifted to bounded genus graphs, and present (1) a local constant-time, constant-factor MDS approximation algorithm and (2) a local $\mathcal{O}(\log^*{n})$-time approximation scheme. Our main technical contribution is a new analysis of a slightly modified variant of an existing algorithm by Lenzen et al. Interestingly, unlike existing proofs for planar graphs, our analysis does not rely on direct topological arguments. Comment: arXiv admin note: substantial text overlap with arXiv:1602.0299
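
To make the dominating set condition concrete, here is a minimal sketch. It assumes an undirected graph stored as a dictionary mapping each vertex to the set of its neighbours. The greedy routine is a standard centralized baseline swapped in purely for illustration; it is not the distributed algorithm analysed in the paper, and the names `is_dominating_set` and `greedy_dominating_set` are hypothetical.

```python
def is_dominating_set(adj, chosen):
    """True if every vertex is in `chosen` or has a neighbour in `chosen`."""
    chosen = set(chosen)
    return all(v in chosen or chosen & adj[v] for v in adj)


def greedy_dominating_set(adj):
    """Centralized greedy baseline (assumption: NOT the paper's local algorithm)."""
    undominated = set(adj)
    chosen = set()
    while undominated:
        # Pick the vertex whose closed neighbourhood covers the most
        # still-undominated vertices; ties go to the smallest vertex.
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & undominated))
        chosen.add(best)
        undominated -= {best} | adj[best]
    return chosen


# Star graph: centre 0 joined to leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
# Path graph 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

print(greedy_dominating_set(star))
print(is_dominating_set(path, greedy_dominating_set(path)))
```

On the star graph the centre alone dominates every vertex, which is why a single vertex suffices there.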