7,112 research outputs found

Bioinformatics tools for analysing viral genomic data

Author: Davison A.
Gu Q.
Hughes J.
Maabar M.
Modha S.
Orton R.J.
Vattipally Sreenu
Wilkie G.S.
Publication venue: 'O.I.E (World Organisation for Animal Health)'
Publication date: 01/04/2016

The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing
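The first step named in the abstract above, quality control of raw reads, can be illustrated with a minimal sketch. It assumes four-line FASTQ records with Phred+33 quality encoding. The function names and the thresholds (`min_mean_q`, `min_len`) are hypothetical choices for illustration and are not taken from the paper or from any particular tool.

```python
from io import StringIO


def parse_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(handle, min_mean_q=20, min_len=5):
    """Keep reads that are long enough and whose mean quality passes the threshold."""
    return [
        (name, seq, qual)
        for name, seq, qual in parse_fastq(handle)
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q
    ]


# Toy input: read1 has uniformly high quality ('I' = Q40), read2 uniformly low ('!' = Q0).
example = StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGTACGT\n+\n!!!!!!!!\n")
kept = filter_reads(example)
print([name for name, _, _ in kept])
```

Real pipelines typically also trim adapters and low-quality read ends; this sketch filters whole reads only.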

Enlighten

Finding needles in haystacks: linking scientific names, reference specimens and molecular data for Fungi

Author: Abarenkov K
Aime MC
Ariyawansa HA
Bidartondo M
Boekhout T
Buyck B
Cai Q
Cardinali G
Chen J
Crespo A
Crous PW
Damm U
De Beer ZW
Dentinger BTM
Dieguez Uribeondo J
Divakar PK
Duenas M
Duong V
Feau N
Federhen S
Fliegerova K
Garcia MA
Ge Z-W
Griffith G
Groenewald JZ
Groenewald M
Grube M
Gryzenhout M
Gueidan C
Guo L
Hambleton S
Hamelin R
Hansen K
Hofstetter V
Hong S-B
Houbraken J
Hughes K
Hyde KD
Inderbitzin P
Irinyi L
Johnston PR
Karunarathna SC
Kirk PM
Koljalg U
Kovacs GM
Kraichak E
Krizsan K
Kurtzman CP
Larsson K-H
Leavitt S
Letcher PM
Liimatainen K
Liu J-K
Lodge DJ
Luangsa-ard JJ
Lumbsch HT
Maharachchikumbura SSN
Manamgoda D
Martin MP
Meyer W
Miller AN
Minnis AM
Moncalvo J-M
Mule G
Nakasone KK
Nilsson RH
Niskanen T
Olariaga I
Papp T
Petkovits T
Pino-Bodas R
Powell MJ
Raja HA
Redecker D
Robbertse B
Robert V
Sarmiento-Ramirez JM
Schoch CL
Seifert KA
Shrestha B
Stenroos S
Stielow B
Subbarao KV
Suh S-O
Tanaka K
Tedersoo L
Teresa Telleria M
Udayanga D
Untereiner WA
Vagvoelgyi C
Visagie C
Voigt K
Walker DM
Weir BS
Weiss M
Wijayawardene NN
Wingfield MJ
Xu JP
Yang ZL
Zhang N
Zhuang W-Y
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2014

DNA phylogenetic comparisons have shown that morphology-based species recognition often underestimates fungal diversity. Therefore, the need for accurate DNA sequence data, tied to both correct taxonomic names and clearly annotated specimen data, has never been greater. Furthermore, the growing number of molecular ecology and microbiome projects using high-throughput sequencing requires fast and effective methods for en masse species assignments. In this article, we focus on selecting and re-annotating a set of marker reference sequences that represent each currently accepted order of Fungi. The particular focus is on sequences from the internal transcribed spacer region in the nuclear ribosomal cistron, derived from type specimens and/or ex-type cultures. Re-annotated and verified sequences were deposited in a curated public database at the National Center for Biotechnology Information (NCBI), namely the RefSeq Targeted Loci (RTL) database, and will be visible during routine sequence similarity searches with NR_prefixed accession numbers. A set of standards and protocols is proposed to improve the data quality of new sequences, and we suggest how type and other reference sequences can be used to improve identification of Fungi

Shared Research Repository

Wageningen University & Research Publications

Spiral - Imperial College Digital Repository

ViCTree: an automated framework for taxonomic classification from protein sequences

Author: Adams
Adams
Altschul
Andrew J Davison
Anil S Thanki
Bao
Cotmore
Di Tommaso
Edgar
Fu
Hibbett
Izquierdo-Carrasco
Janet Kelso
Joseph Hughes
Kapli
Katoh
Katoh
Kozlov
Lauber
Löytynoja
Löytynoja
Nishimura
Sejal Modha
Sievers
Simmonds
Simmonds
Smith
Stamatakis
Susan F Cotmore
Thompson
Vilella
Wu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 20/02/2018

Motivation: The increasing rate of submission of genetic sequences into public databases is providing a growing resource for classifying the organisms that these sequences represent. To aid viral classification, we have developed ViCTree, which automatically integrates the relevant sets of sequences in NCBI GenBank and transforms them into an interactive maximum likelihood phylogenetic tree that can be updated automatically. ViCTree incorporates ViCTreeView, which is a JavaScript-based visualisation tool that enables the tree to be explored interactively in the context of pairwise distance data. Results: To demonstrate utility, ViCTree was applied to subfamily Densovirinae of family Parvoviridae. This led to the identification of six new species of insect virus. Availability: ViCTree is open-source and can be run on any Linux- or Unix-based computer or cluster. A tutorial, the documentation and the source code are available under a GPL3 license, and can be accessed at http://bioinformatics.cvr.ac.uk/victree_web/
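The abstract above mentions exploring the tree in the context of pairwise distance data, without stating how those distances are computed. As a rough illustration only, the sketch below assumes a simple p-distance (the proportion of differing sites) over pre-aligned protein sequences. The function names and the toy alignment are hypothetical and are not ViCTree's implementation.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def pairwise_distances(alignment):
    """Return {(name1, name2): p-distance} for every pair of sequences, names sorted."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Toy alignment of three hypothetical protein fragments.
alignment = {"A": "MKT-LV", "B": "MKTALV", "C": "MRTALI"}
distances = pairwise_distances(alignment)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A matrix of this kind is what a viewer could show next to a tree, for example to check whether proposed species boundaries match a distance threshold.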

Crossref

Enlighten

Metagenomics for Bacteriology

Author: del Castillo Erika
Izard Jacques
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2015

The study of bacteria, or bacteriology, has gone through transformative waves since its inception in the 1600s. It began with the visualization of bacteria using light microscopy by Antonie van Leeuwenhoek, who first described “animalcules.” Direct cellular observation then evolved to use different wavelengths on novel platforms such as electron, fluorescence, and even near-infrared microscopy. Understanding the link between microbes and disease (pathogenicity) began with the ability to isolate and cultivate organisms through aseptic methodologies starting in the 1700s. These techniques became more prevalent in the following centuries with the work of famous scientists such as Louis Pasteur and Robert Koch, and many others since then. The relationship between bacteria and the host’s immune system was first inferred in the 1800s and is still being unravelled today. During the last century, researchers initiated the era of molecular genetics. The advent of the first-generation sequencing technology, the Sanger method, and, later, the polymerase chain reaction propelled the field of molecular genetics by exponentially expanding knowledge of the relationship between gene structure and function. The rise of commercially available next-generation sequencing methodologies at the beginning of this century allows far larger amounts of information to be acquired, in a manner that democratizes the approach

DigitalCommons@University of Nebraska

Bayesian Model-building in Phylogenetics

Author: Nelson Bradley
Publication venue: LSU Digital Commons
Publication date: 01/01/2014

DNA sequencing costs have decreased dramatically over recent decades, resulting in a flood of phylogenetic information available to researchers. While it is often assumed that additional data will lead to more accurate conclusions, it also raises a number of problems for phylogeneticists, including mundane computational issues such as data management and complex statistical problems such as obtaining a single species tree from multiple conflicting gene trees. Developing new methods to make better use of existing data and probe the causes of conflicting signal will be necessary to confidently resolve phylogenies in the genomic era. Here, we examine two current problems in statistical phylogenetics and attempt to address them in a Bayesian framework. The first problem involves inflated tree lengths in Bayesian phylogenies, which can be an order of magnitude longer than maximum likelihood estimates. We developed EmpPrior, a program which queries TreeBASE for datasets similar to the focal data, then estimates parameters from each dataset to inform priors on the focal data. This approach greatly improves the tree length credible intervals in four exemplar datasets and, when combined with other approaches such as the use of a compound Dirichlet prior on tree length, can nearly eliminate the problem of inflated trees. The second problem involves incongruence between morphological and molecular phylogenies in squamates. Here, we use posterior prediction with inferential test statistics to investigate whether systematic error may be biasing inference in the molecular data. While we detected some model violation in most of the 44 genes, the genes with the most model violation were more distant from the molecular phylogeny. This suggests that model violation is not a major source of error in the molecular data. Hence, the source of incongruence between the molecular and morphological squamate topologies remains unknown. 
In both problems, we found that incorporating tools such as informed priors and posterior prediction from Bayesian statistical literature into phylogenetic analyses can improve results and help uncover why different datasets lead to conflicting topologies. As phylogenetic datasets continue to grow, using methodological best practices will only become more important if we want to have confidence in our conclusions

Louisiana State University

iDNA from terrestrial haematophagous leeches as a wildlife surveying and monitoring tool - prospects, pitfalls and avenues to be developed

Author: Calvignac-Spencer Sebastien
Gilbert M. Thomas. P.
Schnell Ida Baerholm
Siddall Mark E.
Sollmann Rahel
Wilting Andreas
Yu Douglas W.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Invertebrate-derived DNA (iDNA) from terrestrial haematophagous leeches has recently been proposed as a powerful non-invasive tool with which to detect vertebrate species and thus to survey their populations. However, to date little attention has been given to whether and how this, or indeed any other iDNA-derived data, can be combined with state-of-the-art analytical tools to estimate wildlife abundances, population dynamics and distributions. In this review, we discuss the challenges that face the application of existing analytical methods such as site-occupancy and spatial capture-recapture (SCR) models to terrestrial leech iDNA, in particular, possible violations of key assumptions arising from factors intrinsic to invertebrate parasite biology. Specifically, we review the advantages and disadvantages of terrestrial leeches as a source of iDNA and summarize the utility of leeches for presence, occupancy, and spatial capture-recapture models. The main source of uncertainty that attends species detections derived from leech gut contents is attributable to uncertainty about the spatio-temporal sampling frame, since leeches retain host blood for months and can move after feeding. Subsequently, we briefly address how the analytical challenges associated with leeches may apply to other sources of iDNA. Our review highlights that despite the considerable potential of leech (and indeed any) iDNA as a new survey tool, further pilot studies are needed to assess whether analytical methods can overcome the potential biases and assumption violations of the new field of iDNA. Specifically, we argue that studies to compare iDNA sampling with standard survey methods such as camera trapping, and those to improve our knowledge of leech (and other invertebrate parasite) physiology, taxonomy, and ecology will be of immense future value
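For readers unfamiliar with the site-occupancy models mentioned above, the following minimal sketch shows the standard single-season occupancy likelihood, with occupancy probability `psi` and per-visit detection probability `p` held constant across sites and visits. The data layout and function names are illustrative assumptions, and nothing here is specific to leech iDNA. The likelihood assumes each detection can be assigned to a known site and visit, which is the assumption the review flags as uncertain for leeches.

```python
import math


def site_log_likelihood(history, psi, p):
    """Log-likelihood of one site's detection history (list of 0/1 per visit).

    Requires 0 < psi < 1 and 0 < p < 1.
    """
    k = len(history)
    d = sum(history)
    if d > 0:
        # Site must be occupied: psi * p^d * (1 - p)^(k - d)
        return math.log(psi) + d * math.log(p) + (k - d) * math.log(1 - p)
    # No detections: either occupied but always missed, or not occupied at all.
    return math.log(psi * (1 - p) ** k + (1 - psi))


def log_likelihood(histories, psi, p):
    """Sum of site log-likelihoods, assuming sites are independent."""
    return sum(site_log_likelihood(h, psi, p) for h in histories)


# Toy data: three sites, two visits each.
histories = [[1, 0], [0, 0], [1, 1]]
print(log_likelihood(histories, psi=0.5, p=0.5))
```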

Springer - Publisher Connector

Copenhagen University Research Information System

PubMed Central

eScholarship - University of California

University of East Anglia digital repository

espace@Curtin

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)

Publikationsserver des Robert Koch-Instituts

Incorporating molecular data in fungal systematics: a guide for aspiring researchers

Author: Abarenkov Kessy
Bertrand Yann J. K.
Hartmann Martin
Hyde Kevin D.
Kauserud Håvard
Kristiansson Erik
Larsson Ellen
Manamgoda Dimuthu S.
Nilsson Henrik R.
Oxelman Bengt
Ryberg Martin
Tedersoo Leho
Udayanga Dhanushka
Publication date: 01/01/2013

The last twenty years have witnessed molecular data emerge as a primary research instrument in most branches of mycology. Fungal systematics, taxonomy, and ecology have all seen tremendous progress and have undergone rapid, far-reaching changes as disciplines in the wake of continual improvement in DNA sequencing technology. A taxonomic study that draws from molecular data involves a long series of steps, ranging from taxon sampling through the various laboratory procedures and data analysis to the publication process. All steps are important and influence the results and the way they are perceived by the scientific community. The present paper provides a reflective overview of all major steps in such a project with the aim of assisting research students about to begin their first study using DNA-based methods. We also take the opportunity to discuss the role of taxonomy in biology and the life sciences in general in the light of molecular data. While the best way to learn molecular methods is to work side by side with someone experienced, we hope that the present paper will serve to lower the learning threshold for the reader. Comment: Submitted to Current Research in Environmental and Applied Mycology - comments most welcome

arXiv.org e-Print Archive

Chalmers Research

Chalmers Publication Library

Deflating trees: Improving bayesian branch-length estimates using informed priors

Author: Andersen John J.
Brown Jeremy M.
Nelson Bradley J.
Publication venue: LSU Digital Commons
Publication date: 01/05/2015

© The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. Prior distributions can have a strong effect on the results of Bayesian analyses. However, no general consensus exists for how priors should be set in all circumstances. Branch-length priors are of particular interest for phylogenetics, because they affect many parameters and biologically relevant inferences have been shown to be sensitive to the chosen prior distribution. Here, we explore the use of outside information to set informed branch-length priors and compare inferences from these informed analyses to those using default settings. For both the commonly used exponential and the newly proposed compound Dirichlet prior distributions, the incorporation of relevant outside information improves inferences for data sets that have produced problematic branch- and tree-length estimates under default settings. We suggest that informed priors are worthy of further exploration for phylogenetics
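One way to picture an informed exponential branch-length prior is sketched below. It assumes branch-length estimates from related outside data sets are already in hand and sets the exponential rate to its maximum likelihood value, one over the mean branch length. The abstract does not say this is the authors' procedure, and the "default" rate of 10 is a hypothetical stand-in for whatever default a given program uses.

```python
import math


def exponential_rate_mle(branch_lengths):
    """Maximum likelihood rate of an exponential distribution: 1 / sample mean."""
    if not branch_lengths or any(b <= 0 for b in branch_lengths):
        raise ValueError("need a non-empty list of positive branch lengths")
    return len(branch_lengths) / sum(branch_lengths)


def log_exponential_prior(branch_length, rate):
    """Log density of an Exponential(rate) prior evaluated at one branch length."""
    return math.log(rate) - rate * branch_length


# Hypothetical branch lengths (substitutions per site) from outside data sets.
outside_branch_lengths = [0.01, 0.02, 0.03]
informed_rate = exponential_rate_mle(outside_branch_lengths)  # about 50, prior mean about 0.02
default_rate = 10.0  # hypothetical default, prior mean 0.1

short_branch = 0.02
print(log_exponential_prior(short_branch, informed_rate))
print(log_exponential_prior(short_branch, default_rate))
```

With short true branches, the informed prior puts more density near the plausible values than the diffuse default, which is the intuition behind using outside information to curb inflated tree lengths.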

Louisiana State University

Species-level functional profiling of metagenomes and metatranscriptomes.

Author: A Sczyrba
A Shafquat
AE Duran-Pinedo
AK Sharma
B Buchfink
B Langmead
BE Suzek
BK Swan
C Burke
C Luo
Curtis Huttenhower
D Medini
DH Huson
DT Truong
DT Truong
E Pasolli
EA Franzosa
EA Franzosa
Eric A. Franzosa
George Weingart
GG Silva
Gholamali Rahnavard
H Hauswedell
J Kim
J Lloyd-Price
J Lloyd-Price
J Ravel
J. Gregory Caporaso
JA Fuhrman
K Huang
Karen Schwarzberg Lipson
Lauren J. McIver
LR Thompson
LR Thompson
Luke R. Thompson
M Hamady
M Kanehisa
M Scholz
Melanie Schirmer
MY Galperin
N Segata
N Segata
Nicola Segata
OU Mason
P Petrenko
PJ Turnbaugh
R Caspi
RC Edgar
RD Finn
Rob Knight
S Abubucker
S Nayfach
S Sunagawa
S Sunagawa
T Bose
UniProt Consortium.
W Huang
Y Ye
Y Zhao
Publication venue: eScholarship, University of California
Publication date: 01/11/2018

Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types
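The tiered strategy described above can be pictured with a toy sketch: try a fast search against the pangenomes of already-identified species first, and fall back to a slower translated search only for reads that remain unclassified. This is not HUMAnN2's code or data format. The two search callables, the dictionary lookups standing in for them, and the labels `unclassified` and `UNMAPPED` are all hypothetical, and species identification is assumed to have happened upstream.

```python
from collections import Counter


def tiered_profile(reads, pangenome_search, translated_search):
    """Count reads per (species, gene_family), using the fallback tier only when needed."""
    counts = Counter()
    for read in reads:
        hit = pangenome_search(read)  # nucleotide tier: (species, family) or None
        if hit is None:
            family = translated_search(read)  # translated tier: family or None
            if family is not None:
                hit = ("unclassified", family)
            else:
                hit = ("unmapped", "UNMAPPED")
        counts[hit] += 1
    return counts


# Toy stand-ins for the two search tiers: exact dictionary lookups.
pangenome_index = {"ACGT": ("species_1", "family_A")}
protein_index = {"TTTT": "family_B"}

reads = ["ACGT", "TTTT", "GGGG", "ACGT"]
profile = tiered_profile(reads, pangenome_index.get, protein_index.get)
for key, n in sorted(profile.items()):
    print(key, n)
```

Keeping the species label on each count is what makes the resulting profile species-resolved rather than community-level only.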

Crossref

eScholarship - University of California