Search CORE

3,121 research outputs found

Using tree diversity to compare phylogenetic heuristics

Author: Matthews Suzanne
Sul Seung-Jin
Williams Tiffani L
Publication venue: BioMed Central
Publication date: 01/01/2009
Crossref

Springer - Publisher Connector

PubMed Central

Texas A&M Repository

Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

Author: A Löytynoja
B Sipos
BG Hall
BP Blackburne
C Chothia
C Dessimoz
C Kemena
C Notredame
CB Do
CL Strope
DA Dalquen
DA Morrison
DH Mathews
ER Mardis
G Blackshields
G Jordan
G Landan
GP Raghava
I Walle Van
J Kim
J Stoye
JD Thompson
JH Havgaard
JP Huelsenbeck
K Mizuguchi
LA Stebbings
M Anisimova
M Pop
MR Aniba
P Gardner
RA Cartwright
RB Russell
RC Edgar
SA Berger
SF Altschul
T Golubchik
T Koestler
T Lassmann
W Fletcher
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/11/2012
Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Alignment accuracy is therefore crucial to a vast range of analyses, often in ways that are difficult to assess within those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should choose a benchmark according to the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy.

arXiv.org e-Print Archive

Crossref

UCL Discovery

High-Performance approaches for Phylogenetic Placement, and its application to species and diversity quantification

Author: Barbera Pierre
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 08/11/2021
In recent years, advances in high-throughput gene sequencing, combined with the continuing exponential growth and availability of computing resources, have led to fundamentally new analytical approaches in biology. It is now possible to comprehensively sequence the genetic content of entire communities of organisms from individual environmental samples. Such methods are particularly relevant to microbiology. Microbiology was previously largely restricted to the study of those microbes that could be cultivated in the laboratory (i.e., in vitro), which covers only a small part of the diversity found in nature. In contrast, high-throughput sequencing now enables the direct capture of the genetic sequences of a microbiome as it occurs in its natural environment (i.e., in situ). A typical goal of microbiome studies is the taxonomic classification of the sequences contained in a sample (query sequences). Phylogenetic methods are usually employed to determine detailed taxonomic relationships between query sequences and trusted reference sequences derived from already classified organisms. Because of the high volume (10^6 to 10^9) of query sequences that can be generated from a microbiome sample by high-throughput sequencing, accurate phylogenetic tree reconstruction is no longer computationally feasible. Moreover, the sequencing technologies in common use today produce comparatively short sequences with limited phylogenetic signal, which makes the inference of phylogenies from these sequences unstable. Another typical goal of microbiome studies is to quantify diversity within a sample or between several samples. Phylogenetic methods are commonly used for this as well. These methods often require the inference of a phylogenetic tree that comprises either all sequences or a clustered subset of them. As with taxonomic identification, analyses based on this kind of tree inference can yield inaccurate results and/or be computationally infeasible. In contrast to comprehensive phylogenetic inference, phylogenetic placement is a method that determines the phylogenetic context of a query sequence within an established reference tree. The procedure typically treats the reference tree as immutable, i.e. the reference tree is not changed before, during, or after the placement of a sequence. This allows the phylogenetic placement of a sequence to be performed in time linear in the size of the reference tree. Combined with taxonomic information about the reference sequences, phylogenetic placement thus enables the taxonomic identification of a sequence. In addition, phylogenetic placement permits a variety of further analyses, for example associating the composition of human microbiomes with clinical-diagnostic properties.
In this dissertation I present my work on the design, implementation, and improvement of EPA-ng, a high-performance implementation of phylogenetic placement under the maximum-likelihood model. EPA-ng was developed to scale to billions of query sequences and to run on thousands of cores in shared-memory and distributed-memory systems. EPA-ng also speeds up processing on a single core by up to 30-fold compared with its direct competitors. We recently introduced an additional method for EPA-ng that enables placement in substantially larger reference trees. For this we use an active memory management approach that trades reduced memory consumption for longer execution times. In addition, I present a massively parallel approach to quantifying the diversity of a sample, based on the results of phylogenetic placement. This software, called SCRAPP, combines current methods for maximum-likelihood-based phylogenetic inference with methods for molecular species delimitation. The result is a distribution of species counts over the edges of a reference tree for a given sample. Furthermore, I describe a novel approach to clustering placement results, with which the user can reduce the computational effort

KITopen

Shallow Genome Sequencing for Phylogenomics of Mycorrhizal Fungi from Endangered Orchids

Author: Barry Kerrie
Daum Christopher
Erba Luigi
Grigoriev Igor
Lipzen Anna
Pires Chris
Stajich Jason
Unruh Sarah
Zettler Lawrence
Publication venue: eScholarship, University of California
Publication date: 03/12/2019
Most plant species form symbioses with mycorrhizal fungi, and this relationship is especially important for orchids. Fungi in the genera Tulasnella, Ceratobasidium, and Serendipita are critically important for orchid germination, growth and development. The goals of this study are to understand the phylogenetic relationships of mycorrhizal fungi and to improve the taxonomic resources for these groups. We identified 32 fungal isolates with the internal transcribed spacer region and used shallow genome sequencing to functionally annotate these isolates. We constructed phylogenetic trees from 408 orthologous nuclear genes for 50 taxa representing 14 genera, 11 families, and five orders in Agaricomycotina. While confirming relationships among the orders Cantharellales, Sebacinales, and Auriculariales, our results suggest novel relationships between families in the Cantharellales. Consistent with previous studies, we found that the genera Ceratobasidium and Thanatephorus of Ceratobasidiaceae are not monophyletic. Within the monophyletic genus Tulasnella, we found strong phylogenetic signals that suggest a potentially new species and a revision of current species boundaries (e.g. Tulasnella calospora); however, it is premature to make taxonomic revisions without further sampling and morphological descriptions. The Serendipita isolates collected are poorly resolved. More sampling is needed from areas around the world before making evolutionarily informed changes in taxonomy. Our study adds value to an important living collection of fungi isolated from endangered orchid species and also informs future investigations of the evolution of orchid mycorrhizal fungi

eScholarship - University of California

Accidental Father-to-Son HIV-1 Transmission During the Seroconversion Period

Author: Abecasis A
Bártolo I
Campos T
Ezeonwumelu I
Leitner T
Martin F
Romero-Severson EO
Taveira N
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/01/2018
A 4-year-old child born to an HIV-1 seronegative mother was diagnosed with HIV-1, the main risk factor being transmission from the child's father, who was seroconverting at the time of the child's birth. In the context of a forensic investigation, we aimed to identify the source of the child's infection and the date of the transmission event. Samples were collected from the father and child at two time points about 4 years after the child's birth. Partial segments of three HIV-1 genes (gag, pol, and env) were sequenced, and maximum likelihood (ML) and Bayesian methods were used to determine the direction and estimate the date of transmission. Neutralizing antibodies were determined using a single cycle assay. Bayesian trees displayed a paraphyletic-monophyletic topology in all three genomic regions, with the father's host label at the root, which is consistent with father-to-son transmission. ML trees found similar topologies in gag and pol and a monophyletic-monophyletic topology in env. Analysis of the time of the most recent common ancestor of each HIV-1 gene population indicated that the child was infected shortly after the father. Consistent with the infection history, both father and son developed broad and potent HIV-specific neutralizing antibody responses. In conclusion, the direction of transmission implicated the father as the source of transmission. Transmission occurred during the seroconversion period, when the father was unaware of the infection, and was likely accidental. This case shows how genetic, phylogenetic, and serological data can contribute to the forensic investigation of HIV transmission.

Repositório Comum

Repositório do Hospital Prof. Doutor Fernando Fonseca

The effect of primer choice and short read sequences on the outcome of 16S rRNA gene based diversity studies

Author: De Vos Paul
Ghyselinck Jonas
Heylen Kim
Pfeiffer Stefan
Sessitsch Angela
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013
Different regions of the bacterial 16S rRNA gene evolve at different rates. The scientific outcome of short read sequencing studies therefore varies with the gene region sequenced. We wanted to gain insight into the impact of primer choice on the outcome of short read sequencing efforts. All the unknowns associated with sequencing data, i.e. primer coverage rate, phylogeny, OTU-richness and taxonomic assignment, were therefore implemented in one study for ten well established universal primers (338f/r, 518f/r, 799f/r, 926f/r and 1062f/r) targeting dispersed regions of the bacterial 16S rRNA gene. All analyses were performed on nearly full length and in silico generated short read sequence libraries containing 1175 sequences that were carefully chosen to provide a representative substitute for the SILVA SSU database. The 518f and 799r primers, targeting the V4 region of the 16S rRNA gene, were found to be particularly suited for short read sequencing studies, while the primer 1062r, targeting V6, seemed to be least reliable. Our results will assist scientists in considering whether the best option for their study is to select the most informative primer, or the primer that excludes interferences by host-organelle DNA. The methodology followed can be extrapolated to other primers, allowing their evaluation prior to the experiment
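
The abstract above names "primer coverage rate" as one of the quantities evaluated but does not say how it was computed. As a rough illustration only, the sketch below estimates coverage as the fraction of sequences containing an exact match to a degenerate primer, expanding IUPAC ambiguity codes into a regular expression. The primer string and sequences are hypothetical toy data, not the primers or the SILVA-derived library used in the study, and the exact-match, forward-strand-only criterion is an assumption.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def coverage_rate(primer, sequences):
    """Fraction of sequences with at least one exact primer match.

    Assumption: no mismatches are tolerated and only the given strand
    is searched; a real evaluation would likely relax both.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    pattern = primer_regex(primer)
    hits = sum(1 for seq in sequences if pattern.search(seq.upper()))
    return hits / len(sequences)


# Hypothetical toy primer and sequences, for illustration only.
toy_primer = "GTGYCAG"
toy_sequences = ["AAGTGCCAGTT", "AAGTGTCAGTT", "AAGTGACAGTT", "CCCC"]
toy_rate = coverage_rate(toy_primer, toy_sequences)
```

Here `Y` matches either `C` or `T`, so the first two toy sequences count as covered and the last two do not.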

CiteSeerX

Ghent University Academic Bibliography

Directory of Open Access Journals

PubMed Central

FigShare

Inference of single-cell phylogenies from lineage tracing data using Cassiopeia.

Author: Chan Michelle M
Hussmann Jeffrey A
Jones Matthew G
Khodaverdian Alex
Quinn Jeffrey J
Wang Robert
Weissman Jonathan S
Xu Chenling
Yosef Nir
Publication venue: eScholarship, University of California
Publication date: 01/04/2020
The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia

eScholarship - University of California

Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads under Maximum Likelihood

Author: Ababneh
Alexandros Stamatakis
Altschul
Berger
Bininda-Emonds
Brady
Chakravorty
Denis Krompass
DeSantis
Eddy
Edgar
Felsenstein
Fierer
Galtier
Ganzert
Hamming
Han
Hillis
Ho
Hudson
Jayaswal
Jermiin
Katoh
Kluge
Koski
Ley
Lozupone
Ludwig
Matsen
McHardy
Moret
Munch
Nielsen
Pruitt
Ronaghi
Simon A. Berger
Stamatakis
Strimmer
Turnbaugh
Von Mering
Publication venue: Oxford University Press
Publication date
We present an evolutionary placement algorithm (EPA) and a Web server for the rapid assignment of sequence fragments (short reads) to edges of a given phylogenetic tree under the maximum-likelihood model. The accuracy of the algorithm is evaluated on several real-world data sets and compared with placement by pair-wise sequence comparison, using edit distances and BLAST. We introduce a slow and accurate as well as a fast and less accurate placement algorithm. For the slow algorithm, we develop additional heuristic techniques that yield almost the same run times as the fast version with only a small loss of accuracy. When those additional heuristics are employed, the run time of the more accurate algorithm is comparable with that of a simple BLAST search for data sets with a high number of short query sequences. Moreover, the accuracy of the EPA is significantly higher, in particular when the sample of taxa in the reference topology is sparse or inadequate. Our algorithm, which has been integrated into RAxML, therefore provides an equally fast but more accurate alternative to BLAST for tree-based inference of the evolutionary origin and composition of short sequence reads. We are also actively developing a Web server that offers a freely available service for computing read placements on trees using the EPA
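
The abstract above compares the EPA against placement by pair-wise sequence comparison using edit distances. The sketch below illustrates only that baseline idea, not the EPA itself: a read is assigned to whichever reference sequence has the smallest Levenshtein distance. The function names and toy sequences are hypothetical, and global edit distance over equal-length toy sequences is a simplifying assumption; the abstract does not say how short reads were compared against longer references.

```python
from __future__ import annotations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def nearest_reference(read: str, references: dict[str, str]) -> tuple[str, int]:
    """Return (name, distance) of the reference closest to the read."""
    scored = ((name, edit_distance(read, seq)) for name, seq in references.items())
    return min(scored, key=lambda pair: pair[1])


# Hypothetical toy reference set and read, for illustration only.
toy_references = {
    "taxon_a": "ACGTACGT",
    "taxon_b": "ACGAACGA",
    "taxon_c": "TTTTCCCC",
}
toy_assignment = nearest_reference("ACGTACGG", toy_references)
```

Unlike placement on the edges of a tree, this baseline can only point to an existing reference sequence, which is one way to see why the abstract reports a larger accuracy gap when the reference taxa are sparsely sampled.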

Crossref

PubMed Central