626 research outputs found

    PPNID : a reference database and molecular identification pipeline for plant-parasitic nematodes

    Motivation: The phylum Nematoda comprises the most cosmopolitan and abundant metazoans on Earth, and plant-parasitic nematodes represent one of the most significant nematode groups, causing severe losses in agriculture. In practice, demand for accurate nematode identification is high in ecological, agricultural, taxonomic and phylogenetic research. Despite this importance, morphological diagnosis is often difficult because of phenotypic plasticity and the absence of clear diagnostic characters, while molecular identification is hampered by problematic databases and a complex genetic background. Results: The present study attempts to address the shortcomings of currently available databases by creating a manually curated database that includes all up-to-date authentic barcoding sequences. To ease the laborious process of interpreting and identifying a given query sequence, we developed an automatic software pipeline for rapid species identification. The incorporated alignment function facilitates the examination of mutation distribution and therefore also reveals nucleotide autapomorphies, which are important in species delimitation. The implementation of genetic distance, plot and maximum likelihood phylogeny analysis provides more powerful optimality criteria than similarity searching and facilitates species delimitation using evolutionary or phylogenetic species concepts. The pipeline streamlines several functions to facilitate more precise data analyses, and the subsequent interpretation is easy and straightforward.

    Effects of phylogenetic reconstruction method on the robustness of species delimitation using single-locus data

    1. Coalescent-based species delimitation methods combine population genetic and phylogenetic theory to provide an objective means for delineating evolutionarily significant units of diversity. The Generalized Mixed Yule Coalescent (GMYC) and the Poisson Tree Process (PTP) are methods that use ultrametric (GMYC or PTP) or non-ultrametric (PTP) gene trees as input, intended for use mostly with single-locus data such as DNA barcodes. 2. Here we assess how robust the GMYC and PTP are to different phylogenetic reconstruction and branch smoothing methods. We reconstruct over 400 ultrametric trees using up to 30 different combinations of phylogenetic and smoothing methods and perform over 2,000 separate species delimitation analyses across 16 empirical datasets. We then assess how variable diversity estimates are, in terms of richness and identity, with respect to species delimitation, phylogenetic and smoothing methods. 3. The PTP method generally generates diversity estimates that are more robust to different phylogenetic methods. The GMYC is more sensitive, but provides consistent estimates for BEAST trees. The lower consistency of GMYC estimates is likely a result of differences among gene trees introduced by the smoothing step. Unresolved nodes (real anomalies or methodological artefacts) affect both GMYC and PTP estimates, but have a greater effect on GMYC estimates. Branch smoothing is a difficult step and perhaps an underappreciated source of bias that may be widespread among studies of diversity and diversification. 4. Nevertheless, careful choice of phylogenetic method does produce equivalent PTP and GMYC diversity estimates. We recommend simultaneous use of the PTP model with any model-based gene tree (e.g. RAxML) and GMYC approaches with BEAST trees for obtaining species hypotheses

    SCRAPP: A tool to assess the diversity of microbial samples from phylogenetic placements

    Microbial ecology research is currently driven by the continuously decreasing cost of DNA sequencing and the improving accuracy of data analysis methods. One such analysis method is phylogenetic placement, which establishes the phylogenetic identity of the anonymous environmental sequences in a sample by means of a given phylogenetic reference tree. However, assessing the diversity of a sample remains challenging, as traditional methods do not scale well with the increasing data volumes and/or do not leverage the phylogenetic placement information. Here, we present scrapp, a highly parallel and scalable tool that uses a molecular species delimitation algorithm to quantify the diversity distribution over the reference phylogeny for a given phylogenetic placement of the sample. scrapp employs a novel approach to cluster phylogenetic placements, called placement space clustering, to efficiently perform dimensionality reduction, so as to scale to large data volumes. Furthermore, it uses the phylogeny‐aware molecular species delimitation method mPTP to quantify diversity. We evaluated scrapp using both simulated and empirical data sets. We use simulated data to verify our approach. Tests on an empirical data set show that scrapp‐derived metrics can classify samples by their diversity‐correlated features equally well or better than existing, commonly used approaches. scrapp is available at https://github.com/pbdas/scrapp

    A rapid and scalable method for multilocus species delimitation using Bayesian model comparison and rooted triplets

    Multilocus sequence data provide far greater power to resolve species limits than the single-locus data typically used for broad surveys of clades. However, current statistical methods based on a multispecies coalescent framework are computationally demanding, because of the number of possible delimitations that must be compared and the time-consuming likelihood calculations. New methods are therefore needed to open up the power of multilocus approaches to larger systematic surveys. Here, we present a rapid and scalable method that introduces two innovations. First, the method reduces the complexity of likelihood calculations by decomposing the tree into rooted triplets. The distribution of topologies for a triplet across multiple loci is a uniform trinomial when the 3 individuals belong to the same species, but is skewed when they belong to separate species, with a form that is specified by the multispecies coalescent. A Bayesian model comparison framework was developed, and the best delimitation is found by comparing the product of posterior probabilities of all triplets. The second innovation is a new dynamic programming algorithm for finding the optimum delimitation from all those compatible with a guide tree by successively analyzing the subtrees defined by each node. This algorithm removes the need for the heuristic searches used by current methods, guarantees that the best solution is found, and could potentially be used in other systematic applications. We assessed the performance of the method with simulated, published and newly generated data. Analyses of simulated data demonstrate that the combined method has favourable statistical properties and scales well with increasing sample sizes. Analyses of empirical data from both eukaryotes and prokaryotes demonstrate its potential for delimiting species in real cases.
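
The triplet idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces their Bayesian model comparison with a plain log-likelihood ratio at a fixed, user-supplied internal branch length `t` (in coalescent units), and it assumes that `counts[0]` is the number of loci supporting the guide-tree topology. The skewed probabilities use the standard multispecies coalescent result for rooted triplets (concordant topology with probability 1 - (2/3)e^(-t), each discordant topology with probability (1/3)e^(-t)). All function names are hypothetical.

```python
import math


def triplet_topology_probs(t):
    """Topology probabilities for a rooted triplet whose internal branch
    has length t (coalescent units): (concordant, discordant, discordant)."""
    d = math.exp(-t) / 3.0
    return (1.0 - 2.0 * d, d, d)


def multinomial_log_lik(counts, probs):
    """Log-likelihood of topology counts across loci under a trinomial."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


def log_lik_ratio(counts, t):
    """log L(separate species, branch length t) - log L(same species).
    Positive values favour the 'separate species' model.
    Assumption: counts[0] is the guide-tree (concordant) topology."""
    same = multinomial_log_lik(counts, (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0))
    separate = multinomial_log_lik(counts, triplet_topology_probs(t))
    return separate - same


if __name__ == "__main__":
    # Ten loci all agreeing on one topology versus an almost even split.
    print(log_lik_ratio([10, 0, 0], t=1.0))
    print(log_lik_ratio([4, 3, 3], t=2.0))
```

With ten loci that all support one topology the ratio is positive, while a near-even split of topologies favours the uniform (same species) model; a full treatment would integrate over `t` rather than fix it.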

    High-Performance approaches for Phylogenetic Placement, and its application to species and diversity quantification

    In recent years, advances in high-throughput genetic sequencing, combined with the continued exponential growth and availability of computational resources, have led to fundamentally new analytical approaches in biology. It is now possible to comprehensively sequence the genetic content of entire communities of organisms from individual environmental samples. Such methods are particularly relevant to microbiology, which was previously largely limited to studying those microbes that could be cultivated in the laboratory (i.e., in vitro), a group that covers only a small fraction of the diversity found in nature. In contrast, high-throughput sequencing now allows the genetic sequences of a microbiome to be captured directly as it occurs in its natural environment (i.e., in situ). A typical goal of microbiome studies is the taxonomic classification of the sequences contained in a sample (query sequences). Phylogenetic methods are commonly used to determine detailed taxonomic relationships between query sequences and trusted reference sequences that originate from already classified organisms. Because of the high volume (10^6 to 10^9) of query sequences that can be generated from a microbiome sample by high-throughput sequencing, accurate phylogenetic tree reconstruction is no longer computationally feasible. Moreover, the sequencing technologies in common use produce comparatively short sequences with limited phylogenetic signal, which leads to instability when phylogenies are inferred from these sequences. Another typical goal of microbiome studies is to quantify the diversity within a sample or between several samples. Phylogenetic methods are commonly used for this purpose as well.
These methods often require the inference of a phylogenetic tree that comprises either all sequences or a clustered subset of them. As with taxonomic identification, analyses based on this kind of tree inference can yield inaccurate results and/or be computationally infeasible. In contrast to comprehensive phylogenetic inference, phylogenetic placement is a method that determines the phylogenetic context of a query sequence within an established reference tree. The procedure typically treats the reference tree as immutable, i.e. the reference tree is not changed before, during, or after the placement of a sequence. This allows a sequence to be placed in time that is linear in the size of the reference tree. Combined with taxonomic information about the reference sequences, phylogenetic placement thus enables the taxonomic identification of a sequence. In addition, phylogenetic placement allows a variety of further analyses, for example associating the composition of human microbiomes with clinical diagnostic properties. In this dissertation I present my work on the design, implementation, and improvement of EPA-ng, a high-performance implementation of phylogenetic placement under the maximum likelihood model. EPA-ng was designed to scale to billions of query sequences and to run on thousands of cores in shared- and distributed-memory systems. EPA-ng also speeds up processing on a single core by up to 30-fold compared with its direct competitors.
Recently we introduced an additional method for EPA-ng that enables placement on substantially larger reference trees. To this end we use an active memory management approach that trades reduced memory consumption for longer execution times. In addition, I present a massively parallel approach to quantifying the diversity of a sample, based on the results of phylogenetic placement. This software, called SCRAPP, combines current methods for maximum likelihood based phylogenetic inference with methods for molecular species delimitation. The result is a distribution of species counts over the edges of a reference tree for a given sample. Furthermore, I describe a novel approach to clustering placement results, which allows the user to reduce the computational effort.

    Inferring Kangaroo Phylogeny from Incongruent Nuclear and Mitochondrial Genes

    The marsupial genus Macropus includes three subgenera, the familiar large grazing kangaroos and wallaroos of M. (Macropus) and M. (Osphranter), as well as the smaller mixed grazing/browsing wallabies of M. (Notamacropus). A recent study of five concatenated nuclear genes recommended subsuming the predominantly browsing Wallabia bicolor (swamp wallaby) into Macropus. To further examine this proposal we sequenced partial mitochondrial genomes for kangaroos and wallabies. These sequences strongly favour the morphological placement of W. bicolor as sister to Macropus, although place M. irma (black-gloved wallaby) within M. (Osphranter) rather than as expected, with M. (Notamacropus). Species tree estimation from separately analysed mitochondrial and nuclear genes favours retaining Macropus and Wallabia as separate genera. A simulation study finds that incomplete lineage sorting among nuclear genes is a plausible explanation for incongruence with the mitochondrial placement of W. bicolor, while mitochondrial introgression from a wallaroo into M. irma is the deepest such event identified in marsupials. Similar such coalescent simulations for interpreting gene tree conflicts will increase in both relevance and statistical power as species-level phylogenetics enters the genomic age. Ecological considerations in turn, hint at a role for selection in accelerating the fixation of introgressed or incompletely sorted loci. More generally the inclusion of the mitochondrial sequences substantially enhanced phylogenetic resolution. However, we caution that the evolutionary dynamics that enhance mitochondria as speciation indicators in the presence of incomplete lineage sorting may also render them especially susceptible to introgression. This work has been supported by Australian Research Council grants to MJP (DP07745015) and MB (FT0991741). The website for the funder is www.arc.gov.au. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    Codon-based analysis of selection pressure and genetic structure in the Psammobates tentorius (Bell, 1828) species complex, and phylogeny inferred from both codons and amino acid sequences

    This study used codon analysis (dN/dS and Tv/Ti) to investigate selection pressure and genetic structure in the highly polymorphic Psammobates tentorius species complex, and amino acid sequences to construct a phylogenetic tree for it. Our results revealed a strong selection signal at node ‘C2 + C3’, possibly driven by aridity intensification resulting from the development of the Benguela Current. A similar signal was noticed at C3, possibly due to the same driving force. These findings suggest that environmental selection pressure favoured those groups and that further cladogenic events were possible. Selection pressure was also found to be high at C1, C4 and C7, which may indicate that they are also favoured by the current selection pressure. The codon-based phylogeny did not retrieve any potentially undescribed species, but nonetheless provided support for the validity of the seven distinct clades retrieved with the DNA sequence data. The amino acid sequence-based phylogeny generally supported the seven lineages as valid putative species. Investigation at the genomic scale could, however, help to solve the issue. In general, we found the codon, dN, dS, Tv, Ti and amino acid sequence-based phylogenetic inferences useful in species delimitation and recommend their use in species delimitation studies.
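
As a rough illustration of the Tv and Ti quantities named in the abstract above, the sketch below counts transitions (purine to purine or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse) between two aligned sequences. It is only a raw pairwise count under simple assumptions: the sequences are already aligned, and any column containing a gap or an ambiguous base is skipped. It does not reproduce the codon-based analysis used in the study, and the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def count_ti_tv(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned sequences.
    Columns with gaps or ambiguous bases are ignored."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ti = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in BASES or b not in BASES:
            continue  # identical site, gap, or ambiguity code
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ti += 1  # change within the same chemical class
        else:
            tv += 1  # change between purine and pyrimidine
    return ti, tv


if __name__ == "__main__":
    ti, tv = count_ti_tv("AACGT", "GACTA")
    print(ti, tv)
```

The Tv/Ti ratio follows by dividing the two counts when the transition count is non-zero; published analyses usually apply a substitution model rather than raw counts.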