Search CORE

59 research outputs found

TreeToReads - a pipeline for simulating raw reads from phylogenies.

Author: Allard Marc
Davis Steven
McTavish Emily Jane
Pettengill James
Rand Hugh
Strain Errol
Timme Ruth E
Publication venue: eScholarship, University of California
Publication date: 01/03/2017

Background: Using phylogenomic analysis tools for tracking pathogens has become standard practice in academia, public health agencies, and large industries. Using the same raw read genomic data as input, several different approaches are used to infer phylogenetic trees. These include many different SNP pipelines, wgMLST approaches, k-mer algorithms, whole genome alignment and others; each of these has advantages and disadvantages: some have been extensively validated, some are faster, some have higher resolution. A few of these analysis approaches are well-integrated into the regulatory process of US Federal agencies (e.g. the FDA's SNP pipeline for tracking foodborne pathogens). However, despite extensive validation on benchmark datasets and comparison with other pipelines, we lack methods for fully exploring the effects of multiple parameter values in each pipeline that can potentially have an effect on whether the correct phylogenetic tree is recovered. Results: To resolve this problem, we offer a program, TreeToReads, which can generate raw read data from mutated genomes simulated under a known phylogeny. This simulation pipeline allows direct comparisons of simulated and observed data in a controlled environment. At each step of these simulations, researchers can vary parameters of interest (e.g., input tree topology, amount of sequence divergence, rate of indels, read coverage, distance of reference genome, etc.) to assess the effects of various parameter values on correctly calling SNPs and reconstructing an accurate tree. Conclusions: Such critical assessments of the accuracy and robustness of analytical pipelines are essential to progress in both research and applied settings.

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Uncovering the evolutionary origin of plant molecular processes: comparison of Coleochaete (Coleochaetales) and Spirogyra (Zygnematales) transcriptomes

Author: Delwiche Charles F
Timme Ruth E
Publication date: 25/05/2010

Background: The large and diverse land plant lineage is nested within a clade of freshwater green algae, the charophytes. Collection of genome-scale data for land plants and other organisms over the past decade has invigorated the field of evolutionary biology. One of the core questions in the field asks: how did a colonization event by a green alga over 450 mya lead to one of the most successful lineages on the tree of life? This question can best be answered using the comparative method, the first step of which is to gather genome-scale data across closely related lineages to land plants. Before sequencing an entire genome it is useful to first gather transcriptome data: it is less expensive, it targets the protein coding regions of the genome, and provides support for gene models for future genome sequencing. We built Expressed Sequence Tag (EST) libraries for two charophyte species, Coleochaete orbicularis (Coleochaetales) and Spirogyra pratensis (Zygnematales). We used both Sanger sequencing and next generation 454 sequencing to cover as much of the transcriptome as possible. Results: Our sequencing effort for Spirogyra pratensis yielded 9,984 5' Sanger reads plus 598,460 GS FLX Standard 454 sequences; Coleochaete orbicularis yielded 4,992 5' Sanger reads plus 673,811 GS FLX Titanium 454 sequences. After clustering, S. pratensis yielded 12,000 unique transcripts, or unigenes, and C. orbicularis yielded 19,000. Both transcriptomes were very plant-like, i.e. most of the transcripts were more similar to streptophytes (land plants + charophyte green algae) than to other green algae in the sister group chlorophytes. BLAST results of several land plant genes hypothesized to be important in early land plant evolution resulted in high quality hits in both transcriptomes revealing putative orthologs ripe for follow-up studies. Conclusions: Two main conclusions were drawn from this study. One illustrates the utility of next generation sequencing for transcriptome studies: larger scale data collection at a lower cost enabled us to cover a considerable portion of the transcriptome for both species. And, two, that the charophyte green algal transcriptomes are remarkably plant-like, which gives them the unique capacity to be major players for future evolutionary genomic studies addressing origin of land plant questions. https://doi.org/10.1186/1471-2229-10-9

Springer - Publisher Connector

PubMed Central

Digital Repository at the University of Maryland

Editorial: Integration of NGS in clinical and public health microbiology workflows: applications, compliance, quality considerations

Author: Peera Hemarajata
Ruth E. Timme
Shangxin Yang
Varvara K. Kozyreva
Publication venue: Frontiers Media S.A.
Publication date: 01/01/2024

Directory of Open Access Journals

Phylogenetic networks: modeling, reconstructibility, and accuracy

Author: Linder C. Randy
Moret Bernard M. E.
Nakhleh Luay
Padolina A.
Sun J.
Tholse Anna
Timme Ruth
Warnow Tandy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/12/2006

Phylogenetic networks model the evolutionary history of sets of organisms when events such as hybrid speciation and horizontal gene transfer occur. In spite of their widely acknowledged importance in evolutionary biology, phylogenetic networks have so far been studied mostly for specific data sets. We present a general definition of phylogenetic networks in terms of directed acyclic graphs (DAGs) and a set of conditions. Further, we distinguish between model networks and reconstructible ones and characterize the effect of extinction and taxon sampling on the reconstructibility of the network. Simulation studies are a standard technique for assessing the performance of phylogenetic methods. A main step in such studies entails quantifying the topological error between the model and inferred phylogenies. While many measures of tree topological accuracy have been proposed, none exist for phylogenetic networks. Previously, we proposed the first such measure, which applied only to a restricted class of networks. In this paper, we extend that measure to apply to all networks, and prove that it is a metric on the space of phylogenetic networks. Our results allow for the systematic study of existing network methods, and for the design of new accurate ones

Infoscience - École polytechnique fédérale de Lausanne

PHA4GE quality control contextual data tags: standardized annotations for sharing public health sequence datasets with known quality issues to facilitate testing and training

Author: Barclay Charlotte
Cameron Rhiannon
Chindelevitch Leonid
Dave Mugdha
Dooley Damion
Griffiths Emma J
Guthrie Jennifer L
Holt Kathryn
Hsiao William W L
Karsch-Mizrachi Ilene
Katz Lee
MacCannell Duncan
Maguire Finlay
Mendes Inês
Nasar Muhammad Ibtisam
Oluniyi Paul
Petit III Robert
Raphenya Amogelang
Schmedes Sarah
Timme Ruth E
Waheed Zahra
Wee Bryan A
Yadav Chanchal
Publication date: 11/06/2024

As public health laboratories expand their genomic sequencing and bioinformatics capacity for the surveillance of different pathogens, labs must carry out robust validation, training, and optimization of wet- and dry-lab procedures. Achieving these goals for algorithms, pipelines and instruments often requires that lower quality datasets be made available for analysis and comparison alongside those of higher quality. This range of data quality in reference sets can complicate the sharing of sub-optimal datasets that are vital for the community and for the reproducibility of assays. Sharing of useful, but sub-optimal datasets requires careful annotation and documentation of known issues to enable appropriate interpretation, avoid being mistaken for better quality information, and for these data (and their derivatives) to be easily identifiable in repositories. Unfortunately, there are currently no standardized attributes or mechanisms for tagging poor-quality datasets, or datasets generated for a specific purpose, to maximize their utility, searchability, accessibility and reuse. The Public Health Alliance for Genomic Epidemiology (PHA4GE) is an international community of scientists from public health, industry and academia focused on improving the reproducibility, interoperability, portability, and openness of public health bioinformatic software, skills, tools and data. To address the challenges of sharing lower quality datasets, PHA4GE has developed a set of standardized contextual data tags, namely fields and terms, that can be included in public repository submissions as a means of flagging pathogen sequence data with known quality issues, increasing their discoverability. 
The contextual data tags were developed through consultations with the community including input from the International Nucleotide Sequence Database Collaboration (INSDC), and have been standardized using ontologies - community-based resources for defining the tag properties and the relationships between them. The standardized tags are agnostic to the organism and the sequencing technique used and thus can be applied to data generated from any pathogen using an array of sequencing techniques. The tags can also be applied to synthetic (lab created) data. The list of standardized tags is maintained by PHA4GE and can be found at https://github.com/pha4ge/contextual_data_QC_tags. Definitions, ontology IDs, examples of use, as well as a JSON representation, are provided. The PHA4GE QC tags were tested, and are now implemented, by the FDA's GenomeTrakr laboratory network as part of its routine submission process for SARS-CoV-2 wastewater surveillance. We hope that these simple, standardized tags will help improve communication regarding quality control in public repositories, in addition to making datasets of variable quality more easily identifiable. Suggestions for additional tags can be submitted to PHA4GE via the New Term Request Form in the GitHub repository. By providing a mechanism for feedback and suggestions, we also expect that the tags will evolve with the needs of the community.

Edinburgh Research Explorer

Broad Phylogenomic Sampling and the Sister Lineage of Land Plants

Author: A Simon
A Stamatakis
A-C Berglund
B Becker
B Chevreux
B Marin
C Finet
C Iseli
Charles F. Delwiche
CR Linder
DJ Zwickl
DL Swofford
F Abascal
F Bower
H Schmidt
I Ebersberger
J Forment
J Gray
JD Hall
JW Leigh
K Karol
K Mattox
LS Kubatko
M Turmel
N Lartillot
NA Campbell
P Gensel
R Edgar
R McCourt
RE Timme
Ruth E. Timme
S Capella-Gutierrez
S Wodniok
Simon Joly
Tsvetan R. Bachvaroff
X Huang
Y Qiu
Publication venue: Public Library of Science
Publication date: 01/01/2011

The tremendous diversity of land plants all descended from a single charophyte green alga that colonized the land somewhere between 430 and 470 million years ago. Six orders of charophyte green algae, in addition to embryophytes, comprise the Streptophyta s.l. Previous studies have focused on reconstructing the phylogeny of organisms tied to this key colonization event, but wildly conflicting results have sparked a contentious debate over which lineage gave rise to land plants. The dominant view has been that ‘stoneworts,’ or Charales, are the sister lineage, but an alternative hypothesis supports the Zygnematales (often referred to as “pond scum”) as the sister lineage. In this paper, we provide a well-supported, 160-nuclear-gene phylogenomic analysis supporting the Zygnematales as the closest living relative to land plants. Our study makes two key contributions to the field: 1) the use of an unbiased method to collect a large set of orthologs from deeply diverging species and 2) the use of these data in determining the sister lineage to land plants. We anticipate this updated phylogeny not only will hugely impact lesson plans in introductory biology courses, but also will provide a solid phylogenetic tree for future green-lineage research, whether it be related to plants or green algae

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

A One Health Perspective on Salmonella enterica Serovar Infantis, an Emerging Human Multidrug-Resistant Pathogen

Author: Baker Dave J.
Chattaway Marie Anne
Dallman Timothy J.
Duze Sanelisiwe T.
Hartman Hassan
Keddy Karen
Langridge Gemma C.
Manners Emma J.
Mather Alison E.
Mattock Jennifer
Petrovska Liljana
Smith Anthony M.
Smouse Shannon
Tau Nomsa
Timme Ruth
Wain John
Publication date: 01/04/2024

Salmonella enterica serovar Infantis presents an ever-increasing threat to public health because of its spread throughout many countries and association with high levels of antimicrobial resistance (AMR). We analyzed whole-genome sequences of 5,284 Salmonella Infantis strains from 74 countries, isolated during 1989-2020 from a wide variety of human, animal, and food sources, to compare genetic phylogeny, AMR determinants, and plasmid presence. The global Salmonella Infantis population structure diverged into 3 clusters: a North American cluster, a European cluster, and a global cluster. The levels of AMR varied by Salmonella Infantis cluster and by isolation source; 73% of poultry isolates were multidrug resistant, compared with 35% of human isolates. This finding correlated with the presence of the pESI megaplasmid; 71% of poultry isolates contained pESI, compared with 32% of human isolates. This study provides key information for public health teams engaged in reducing the spread of this pathogen

University of East Anglia digital repository

Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package

Background: The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. Results: As such, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories. Conclusions: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by the NCBI’s BioSample database.

Online Research @ Cardiff

PubMed Central

A Comparison of the First Two Sequenced Chloroplast Genomes in Asteraceae: Lettuce and Sunflower

Author: Timme Ruth E.
Publication venue: eScholarship, University of California
Publication date: 24/07/2009

Ezid

eScholarship - University of California