4,364 research outputs found

    A MOSAIC of methods: Improving ortholog detection through integration of algorithmic diversity

    Full text link
    Ortholog detection (OD) is a critical step for comparative genomic analysis of protein-coding sequences. In this paper, we begin with a comprehensive comparison of four popular, methodologically diverse OD methods: MultiParanoid, Blat, Multiz, and OMA. In head-to-head comparisons, these methods are shown to significantly outperform one another 12-30% of the time. This high complementarity motivates the presentation of the first tool for integrating methodologically diverse OD methods. We term this program MOSAIC, or Multiple Orthologous Sequence Analysis and Integration by Cluster optimization. Relative to component and competing methods, we demonstrate that MOSAIC more than quintuples the number of alignments for which all species are present, while simultaneously maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality. Further, we demonstrate that this improvement in alignment quality yields 40-280% more confidently aligned sites. Combined, these factors translate to higher estimated levels of overall conservation, while at the same time allowing for the detection of up to 180% more positively selected sites. MOSAIC is available as python package. MOSAIC alignments, source code, and full documentation are available at http://pythonhosted.org/bio-MOSAIC

    Diffusion Approximations for Demographic Inference: DaDi

    Get PDF
    Models of demographic history (population sizes, migration rates, and divergence times) inferred from genetic data complement archeology and serve as null models in genome scans for selection. Most current inference methods are computationally limited to considering simple models or non-recombining data. We introduce a method based on a diffusion approximation to the joint frequency spectrum of genetic variation between populations. Our implementation, DaDi, can model up to three interacting populations and scales well to genome-wide data. We have applied DaDi to human data from Africa, Europe, and East Asia, building the most complex statistically well-characterized model of human migration out of Africa to date
    • …
    corecore