21 research outputs found

Standardised Benchmarking in the Quest for Orthologs

Author: Altenhoff Adrian M.
Boeckmann Brigitte
Bork Peer
Capella-Gutierrez Salvador
Dalquen Daniel A.
DeLuca Todd
Dessimoz Christophe
Forslund Kristoffer
Gabaldón Toni
Huerta-Cepas Jaime
Juhl Jensen Lars
Lecompte Odile
Lewis Suzanna E.
Linard Benjamin
Martin Maria J.
Muffato Matthieu
Pereira Cécile
Pryszcz Leszek P.
Schreiber Fabian
Sjölander Kimmen
Sonnhammer Erik
Sousa da Silva Alan
Szklarczyk Damian
Thomas Paul D.
Train Clément-Marie
von Mering Christian
Xenarios Ioannis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/11/2016

The identification of evolutionarily related genes across different species—orthologs in particular—forms the backbone of many comparative, evolutionary, and functional genomic analyses. Achieving high accuracy in orthology inference is thus essential. Yet the true evolutionary history of genes, required to ascertain orthology, is generally unknown. Furthermore, orthologs are used for very different applications across different phyla, with different requirements in terms of the precision-recall trade-off. As a result, assessing the performance of orthology inference methods remains difficult for both users and method developers. Here, we present a community effort to establish standards in orthology benchmarking and facilitate orthology benchmarking through an automated web-based service (http://orthology.benchmarkservice.org). Using this new service, we characterise the performance of 15 well-established orthology inference methods and resources on a battery of 20 different benchmarks. Standardised benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimal requirement for new tools and resources, and guides the development of more accurate orthology inference methods

Harvard University - DASH

Data from: Maximum likelihood implementation of an isolation-with-migration model for three species

Author: Dalquen Daniel A.
Yang Ziheng
Zhu Tianqi
Publication date: 12/07/2016

We develop a maximum likelihood (ML) method for estimating migration rates between species using genomic sequence data. A species tree is used to accommodate the phylogenetic relationships among three species, allowing for migration between the two sister species, while the third species is used as an out-group. A Markov chain characterization of the genealogical process of coalescence and migration is used to integrate out the migration histories at each locus analytically, whereas Gaussian quadrature is used to integrate over the coalescent times on each genealogical tree numerically. This is an extension of our early implementation of the symmetrical isolation-with-migration model for three species to accommodate arbitrary loci with two or three sequences per locus and to allow asymmetrical migration rates. Our implementation can accommodate tens of thousands of loci, making it feasible to analyze genome-scale data sets to test for gene flow. We calculate the posterior probabilities of gene trees at individual loci to identify genomic regions that are likely to have been transferred between species due to gene flow. We conduct a simulation study to examine the statistical properties of the likelihood ratio test for gene flow between the two in-group species and of the ML estimates of model parameters such as the migration rate. Inclusion of data from a third out-group species is found to increase dramatically the power of the test and the precision of parameter estimation. We compiled and analyzed several genomic data sets from the Drosophila fruit flies. Our analyses suggest no migration from D. melanogaster to D. simulans, and a significant amount of gene flow from D. simulans to D. melanogaster, at the rate of ~0.02 migrant individuals per generation. We discuss the utility of the multispecies coalescent model for species tree estimation, accounting for incomplete lineage sorting and migration
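The likelihood ratio test for gene flow mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `lrt_pvalue` and the log-likelihood values are hypothetical, and the plain chi-square reference distribution is an assumption. Because a migration rate of zero lies on the boundary of the parameter space, a plain chi-square reference is generally conservative.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int) -> float:
    """P-value of a likelihood ratio test against a plain chi-square reference.

    lnl_null: maximised log-likelihood of the model without migration.
    lnl_alt:  maximised log-likelihood of the model with migration.
    df:       number of extra free parameters in the alternative model.
    """
    # Test statistic: twice the log-likelihood gain of the alternative model.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        # Chi-square(1) survival function: P(Z^2 > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # Chi-square(2) survival function has the closed form exp(-x / 2).
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")


# Hypothetical log-likelihoods, not taken from the paper.
p = lrt_pvalue(lnl_null=-10234.7, lnl_alt=-10229.1, df=2)
print(f"p = {p:.4f}")
```

With two migration rates (one per direction) the alternative model has two extra parameters, hence `df=2` in the usage line; a symmetric-migration model would use `df=1`.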

ZENODO

Dryad Digital Repository (Duke University)

Electronic Archiving System

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

ALF—A Simulation Framework for Genome Evolution

Author: Christophe Dessimoz
Daniel A. Dalquen
Gaston H. Gonnet
Maria Anisimova
Publication date: 08/12/2011

In computational evolutionary biology, verification and benchmarking are challenging tasks because the evolutionary history of studied biological entities is usually not known. Computer programs for simulating sequence evolution in silico have been shown to be viable test beds for verifying newly developed methods and comparing different algorithms. However, current simulation packages tend to focus either on gene-level aspects of genome evolution such as character substitutions and insertions and deletions (indels) or on genome-level aspects such as genome rearrangement and speciation events. Here, we introduce Artificial Life Framework (ALF), which aims at simulating the entire range of evolutionary forces that act on genomes: nucleotide, codon, or amino acid substitution (under simple or mixture models), indels, GC-content amelioration, gene duplication, gene loss, gene fusion, gene fission, genome rearrangement, lateral gene transfer (LGT), or speciation. The other distinctive feature of ALF is its user-friendly yet powerful web interface. We illustrate the utility of ALF with two possible applications: 1) we reanalyze data from a study of selection after globin gene duplication and test the statistical significance of the original conclusions and 2) we demonstrate that LGT can dramatically decrease the accuracy of two well-established orthology inference methods. ALF is available as a stand-alone application or via a web interface a

CiteSeerX

Repository for Publications and Research Data

Crossref

PubMed Central

UCL Discovery

Gene copy number variation and its significance in cyanobacterial phylogeny

Author: Anisimova Maria
Bagheri Homayoun C
Dalquen Daniel A
Schirrmeister Bettina E
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Background: In eukaryotes, variation in gene copy numbers is often associated with deleterious effects, but may also have positive effects. For prokaryotes, studies on gene copy number variation are rare. Previous studies have suggested that high numbers of rRNA gene copies can be advantageous in environments with changing resource availability, but further associations between gene copies and phenotypic traits are not documented. We used one of the morphologically most diverse prokaryotic phyla to test whether numbers of gene copies are associated with levels of cell differentiation. Results: We implemented a search algorithm that identified 44 genes with highly conserved copies across 22 fully sequenced cyanobacterial taxa. For two very basal cyanobacterial species, Gloeobacter violaceus and a thermophilic Synechococcus species, distinct phylogenetic positions previously found were supported by identical protein coding gene copy numbers. Furthermore, we found that increased ribosomal gene copy numbers showed a strong correlation to cyanobacteria capable of terminal cell differentiation. Additionally, we detected extremely low variation of 16S rRNA sequence copies within the cyanobacteria. We compared our results for 16S rRNA to three other eubacterial phyla (Chloroflexi, Spirochaetes and Bacteroidetes). Based on Bayesian phylogenetic inference and the comparisons of genetic distances, we could confirm that cyanobacterial 16S rRNA paralogs and orthologs show significantly stronger conservation than found in other eubacterial phyla. Conclusions: A higher number of ribosomal operons could potentially provide an advantage to terminally differentiated cyanobacteria. Furthermore, we suggest that 16S rRNA gene copies in cyanobacteria are homogenized by both concerted evolution and purifying selection. In addition, the small ribosomal subunit in cyanobacteria appears to evolve at extraordinarily slow evolutionary rates, an observation that has been made previously for morphological characteristics of cyanobacteria.

Repository for Publications and Research Data

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ZORA

Explore Bristol Research

The Impact of Gene Duplication, Insertion, Deletion, Lateral Gene Transfer and Sequencing Error on Orthology Inference: A Simulation Study

Author: Adrian M. Altenhoff (103706)
Christophe Dessimoz (18084)
Daniel A. Dalquen (381930)
Gaston H. Gonnet (18086)
Publication date: 01/01/2013

The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically been considered in cross-method assessment studies. Yet, while dependent on model assumptions, simulation-based benchmarking offers unique advantages: contrary to empirical data, all aspects of simulated data are known with certainty. Furthermore, the flexibility of simulation makes it possible to investigate performance factors in isolation of one another. Here, we use simulated data to dissect the performance of six methods for orthology inference available as standalone software packages (Inparanoid, OMA, OrthoInspector, OrthoMCL, QuartetS, SPIMAP) as well as two generic approaches (bidirectional best hit and reciprocal smallest distance). We investigate the impact of various evolutionary forces (gene duplication, insertion, deletion, and lateral gene transfer) and technological artefacts (ambiguous sequences) on orthology inference. We show that while gene duplication/loss and insertion/deletion are well handled by most methods (albeit for different trade-offs of precision and recall), lateral gene transfer disrupts all methods. As for ambiguous sequences, which might result from poor sequencing, assembly, or genome annotation, we show that they affect alignment score-based orthology methods more strongly than their distance-based counterparts.
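As a rough illustration of the generic bidirectional best hit approach and of the precision/recall evaluation named in this abstract, here is a minimal sketch. It assumes pairwise similarity scores between the genes of two genomes are already available; the gene names, scores, and function names are hypothetical and are not taken from the study or from any of the listed packages.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]      # (gene in genome A, gene in genome B)
Scores = Dict[Pair, float]  # similarity score for each compared pair


def bidirectional_best_hits(scores: Scores) -> Set[Pair]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_for_a: Dict[str, Tuple[str, float]] = {}
    best_for_b: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        # Track the highest-scoring partner seen so far in each direction.
        if a not in best_for_a or score > best_for_a[a][1]:
            best_for_a[a] = (b, score)
        if b not in best_for_b or score > best_for_b[b][1]:
            best_for_b[b] = (a, score)
    return {
        (a, b)
        for a, (b, _) in best_for_a.items()
        if best_for_b[b][0] == a
    }


def precision_recall(predicted: Set[Pair], truth: Set[Pair]) -> Tuple[float, float]:
    """Precision = TP / predicted pairs, recall = TP / true pairs."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: in a simulation the true orthologs are known exactly.
toy_scores: Scores = {
    ("a1", "b1"): 90.0, ("a1", "b2"): 40.0,
    ("a2", "b2"): 70.0, ("a2", "b1"): 30.0,
    ("a3", "b2"): 75.0,
}
toy_truth: Set[Pair] = {("a1", "b1"), ("a2", "b2")}

predicted = bidirectional_best_hits(toy_scores)
print(sorted(predicted), precision_recall(predicted, toy_truth))
```

In the toy data, `a3` outscores `a2` against `b2`, so the true pair `("a2", "b2")` is missed and a false pair is reported instead; this is the kind of error that lowers both precision and recall.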

Public Library of Science (PLOS)

Repository for Publications and Research Data

CiteSeerX

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

FigShare

Baseline simulation parameters and key statistics.

Author: Adrian M. Altenhoff (103706)
Christophe Dessimoz (18084)
Daniel A. Dalquen (381930)
Gaston H. Gonnet (18086)

Characteristics of the baseline parameters used to simulate the datasets and resulting key statistics for sequence length, insertions and deletions, and tree topology. Distances and tree height/length given in PAM units.

FigShare

Orthology inference vs. LGT.

Author: Adrian M. Altenhoff (103706)
Christophe Dessimoz (18084)
Daniel A. Dalquen (381930)
Gaston H. Gonnet (18086)

Precision/recall of orthology predictions with different proportions of genes with a history of lateral gene transfer. Each data point corresponds to the mean over all orthologous relations in five replicates (with 95% confidence interval of the mean values in both dimensions).
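The summary statistic in this caption (a mean over five replicates with a 95% confidence interval) could be computed along the following lines. The caption does not say how the interval was obtained, so the Student's t interval used here is an assumption, and the replicate values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% critical values of Student's t, indexed by degrees of freedom.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def mean_ci95(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, half-width of the 95% confidence interval of the mean)."""
    n = len(values)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch only tabulates 3 to 6 replicates")
    mean = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation.
    std_err = statistics.stdev(values) / math.sqrt(n)
    return mean, T_CRIT_95[df] * std_err


# Hypothetical precision and recall values from five replicates.
precision_reps = [0.90, 0.92, 0.91, 0.89, 0.93]
recall_reps = [0.71, 0.68, 0.70, 0.73, 0.69]

for label, reps in (("precision", precision_reps), ("recall", recall_reps)):
    mean, half_width = mean_ci95(reps)
    print(f"{label}: {mean:.3f} +/- {half_width:.3f}")
```

Applying the function once per axis gives the error bars "in both dimensions" for a single data point.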

FigShare

Simulation parameters for analysis of LGT.

Author: Adrian M. Altenhoff (103706)
Christophe Dessimoz (18084)
Daniel A. Dalquen (381930)
Gaston H. Gonnet (18086)

Parameters for gene duplication, gene loss and LGT used to simulate the datasets for investigating the effect of LGT on orthology inference. These rates are per gene, per PAM unit (i.e. relative to substitutions).

FigShare

Orthology inference vs. insertions and deletions.

Author: Adrian M. Altenhoff (103706)
Christophe Dessimoz (18084)
Daniel A. Dalquen (381930)
Gaston H. Gonnet (18086)

Precision/recall of orthology predictions with different rates of insertion and deletion events. Each data point corresponds to the mean over all orthologous relations in five replicates (with 95% confidence interval of the mean values in both dimensions).

FigShare

Orthology inference vs. sequencing artefacts.

Author: Adrian M. Altenhoff (103706)
Christophe Dessimoz (18084)
Daniel A. Dalquen (381930)
Gaston H. Gonnet (18086)

Precision/recall of orthology predictions with different proportions of ambiguous (i.e. “X”) characters. Each data point corresponds to the mean over all orthologous relations in five replicates (with 95% confidence interval of the mean values in both dimensions).

FigShare