104 research outputs found

eCAMBer: efficient support for large-scale comparative analysis of multiple bacterial strains

Author: A Palleja
A Pati
A Roetzer
C Camacho
CR Laing
D Hyatt
D Kim
D Vallenet
DE Wood
EJ Richardson
J Dunbar
J Zhou
J-F Yu
Jerzy Tiuryn
JJ Gillespie
JL Klassen
Limsoon Wong
M Touchon
M Wozniak
ME Wall
Michal Wozniak
MS Poptsova
NJ Loman
NM Daniels
P Fournier
P-R Loh
PD Karp
PJA Cock
S Kasif
SP Shah
SV Angiuoli
T Yada
THA Ederveen
V Pavlović
Publication venue: Springer Science and Business Media LLC

Crossref

CloVR-ITS: Automated internal transcribed spacer amplicon sequence analysis pipeline for the characterization of fungal microbiota

Author: A Orgiazzi
AP Jackson
BL Boyanton Jr
C Koetschan
C Landlinger
CL Schoch
D Kruger
DB Rusch
E Bellemain
E Zhang
EK Costello
F Meyer
G Liguori
HK Park
J Orvis
J Ravel
J Wilkening
JG Caporaso
JR White
LC Paulino
M Ryberg
MA Ghannoum
MD Edge
ME Lucero
N Fierer
PC Woo
PD Schloss
RC Edgar
RH Nilsson
S Chen
S Dollive
S Kosakovsky Pond
SF Altschul
SV Angiuoli
T Mullineux
TM Porter
Z Khan
Publication venue: Springer Science and Business Media LLC

Crossref

Bioinformatics on the Cloud Computing Platform Azure

Author: Andrew P. Harrison
Anne M. Owen
BD Halligan
DA de Lima Morais
DP Wall
E Afgan
GEP Ropella
H Eriksson
H Kim
H Parkinson
Hugh P. Shanahan
J Qiu
L Zhang
LD Stein
M Abouelhoda
P Di Tommaso
RC Taylor
S Contrino
Shyamal D. Peddada
SV Angiuoli
T Barrett
VA Fusaro
WB Langdon
Z Wang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walkthrough to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development. © 2014 Shanahan et al.

University of Essex Research Repository

CiteSeerX

Public Library of Science (PLOS)

Crossref

Royal Holloway - Pure

Directory of Open Access Journals

PubMed Central

FigShare

Processing and analyzing multiple genomes alignments with MafFilter

Author: A Scally
Aaron E. Darling
CC Chang
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group
DG Higgins
EH Stukenbrock
J Casper
J Felsenstein
JB Lack
K Katoh
K Prüfer
L Duret
M Blanchette
M Hasegawa
M Slatkin
O Gascuel
S Guindon
S Kurtz
S Myers
S Schiffels
S Schwartz
SM Kiełbasa
SV Angiuoli
Publication venue: Springer Science and Business Media LLC
Publication date: 24/01/2020

As the number of available genome sequences from both closely related species and individuals within species increased, theoretical and methodological convergences between the fields of phylogenomics and population genomics emerged. Population genomics typically focuses on the analysis of variants, while phylogenomics heavily relies on genome alignments. However, these are playing an increasingly important role in studies at the population level. Multiple genome alignments of individuals are used when structural variation is of primary interest and when genome architecture permits the de novo assembly of genome sequences. Here I describe MafFilter, a command-line-driven program for processing genome alignments in the Multiple Alignment Format (MAF). Using concrete examples based on publicly available datasets, I demonstrate how MafFilter can be used to develop efficient and reproducible pipelines with quality assurance for downstream analyses. I further show how MafFilter can be used to perform both basic and advanced population genomic analyses in order to infer the patterns of nucleotide diversity along genomes.

Crossref

MPG.PuRe

Short Term Evolution of a Highly Transmissible Methicillin-Resistant Staphylococcus aureus Clone (ST228) in a Tertiary Care Hospital

Author: A Clements
A Massouras
A Stamatakis
AD Kennedy
B Langmead
BA Diep
C Backman
D Gordon
DA Benson
DH Huson
Dominique S. Blanc
DS Blanc
FD Lowy
G Kuhn
H Grundmann
H Li
HA Schmidt
JA Lindsay
JM-L Sung
JR Fitzgerald
K Strimmer
Laurent Falquet
M Aires de Sousa
M Kuroda
M Magrane
M Salemi
Marco Salemi
MC Enright
MJ Pallen
MTG Holden
N Woodford
ND Pattengale
Patrick Basset
R Li
R Mato
RH Deurenberg
RJL Willems
RK Aziz
RR Gray
RW Jackson
S Calderon-Copete
S Monecke
Sandra P. Calderon-Copete
SR Gill
SR Harris
SV Angiuoli
T Conceição
TC Bruen
U Nübel
V Lazarevic
Valérie Vogel
W Witte
WJB Wannet
Publication venue: Public Library of Science
Publication date: 01/01/2012

Staphylococcus aureus is recognized as one of the major human pathogens and is by far one of the most common nosocomial organisms. The genetic basis for the emergence of highly epidemic strains remains mysterious. Studying the microevolution of the different clones of S. aureus is essential for identifying the forces driving pathogen emergence and spread. The aim of the present study was to determine the genetic changes characterizing a lineage belonging to the South German clone (ST228) that spread over ten years in a tertiary care hospital in Switzerland. For this reason, we compared the whole genome of eight isolates recovered between 2001 and 2008 at the Lausanne hospital. The genetic comparison of these isolates revealed that their genomes are extremely closely related. Yet, a few more important genetic changes, such as the replacement of a plasmid, the loss of large fragments of DNA, or the insertion of transposases, were observed. These transfers of mobile genetic elements shaped the evolution of the ST228 lineage that spread within the Lausanne hospital. Nevertheless, although the strains analyzed differed in their dynamics, we have not been able to link a particular genetic element with spreading success. Finally, the present study showed that new sequencing technologies improve considerably the quality and quantity of information obtained for a single strain; but this information is still difficult to interpret and important investments are required for the technology to become accessible for routine investigations

CiteSeerX

Public Library of Science (PLOS)

Crossref

Serveur académique lausannois

Directory of Open Access Journals

PubMed Central

FigShare

Whole genome sequencing to investigate the emergence of clonal complex 23 Neisseria meningitidis serogroup Y disease in the United States

In the United States, serogroup Y, ST-23 clonal complex Neisseria meningitidis was responsible for an increase in meningococcal disease incidence during the 1990s. This increase was accompanied by antigenic shift of three outer membrane proteins, with a decrease in the population that predominated in the early 1990s as a different population emerged later in that decade. To understand factors that may have been responsible for the emergence of serogroup Y disease, we used whole genome pyrosequencing to investigate genetic differences between isolates from early and late N. meningitidis populations, obtained from meningococcal disease cases in Maryland in the 1990s. The genomes of isolates from the early and late populations were highly similar, with 1231 of 1776 shared genes exhibiting 100% amino acid identity and an average πN = 0.0033 and average πS = 0.0216. However, differences were found in predicted proteins that affect pilin structure and antigen profile and in predicted proteins involved in iron acquisition and uptake. The observed changes are consistent with acquisition of new alleles through horizontal gene transfer. Changes in antigen profile due to the genetic differences found in this study likely allowed the late population to emerge due to escape from population immunity. These findings may predict which antigenic factors are important in the cyclic epidemiology of meningococcal disease

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

D-Scholarship@Pitt

Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies

Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%–63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with “overprediction” of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A High-Resolution View of Genome-Wide Pneumococcal Transformation

Transformation is an important mechanism of microbial evolution through which bacteria have been observed to rapidly adapt in response to clinical interventions; examples include facilitating vaccine evasion and the development of penicillin resistance in the major respiratory pathogen Streptococcus pneumoniae. To characterise the process in detail, the genomes of 124 S. pneumoniae isolates produced through in vitro transformation were sequenced and recombination events detected. Those recombinations importing the selected marker were independent of unselected events elsewhere in the genome, the positions of which were not significantly affected by local sequence similarity between donor and recipient or mismatch repair processes. However, both types of recombinations were sometimes mosaic, with multiple non-contiguous segments originating from the same molecule of donor DNA. The lengths of the unselected events were exponentially distributed with a mean of 2.3 kb, implying that recombinations are stochastically resolved with a fixed per base probability of 4.4×10⁻⁴ bp⁻¹. This distribution of recombination sizes, coupled with an observed under-representation of large insertions within transferred sequence, suggests transformation has the potential to reduce the size of bacterial genomes, and is unlikely to act as an efficient mechanism for the uptake of accessory genomic loci.
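
The link between the mean segment length and the per-base probability quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below assumes a simple memoryless model in which the per-base resolution rate is the reciprocal of the mean length. That model, and the function names, are illustrative assumptions and not the authors' method. With a mean of 2.3 kb the reciprocal comes out close to the reported 4.4×10⁻⁴ bp⁻¹; the small gap is presumably due to the mean being quoted to two significant figures.

```python
import math

# Mean length of unselected recombination events, from the abstract (2.3 kb).
MEAN_LENGTH_BP = 2300.0


def per_base_rate(mean_length_bp: float) -> float:
    """Per-base resolution rate under an assumed memoryless (exponential) model.

    For an exponential length distribution the rate is 1 / mean.
    """
    return 1.0 / mean_length_bp


def prob_longer_than(length_bp: float, mean_length_bp: float) -> float:
    """Probability that a segment exceeds length_bp under the same assumed model."""
    return math.exp(-length_bp / mean_length_bp)


if __name__ == "__main__":
    rate = per_base_rate(MEAN_LENGTH_BP)
    print(f"per-base rate: {rate:.2e} per bp")
    # Illustrative query: how often would a segment exceed 10 kb?
    print(f"P(length > 10 kb): {prob_longer_than(10_000, MEAN_LENGTH_BP):.4f}")
```

Under this assumed model long segments become rare quickly, which is in line with the abstract's point that transformation is unlikely to be an efficient route for importing large accessory loci.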

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Bioinformatics for the human microbiome project

Author: A Brady
A Dupuy
AA Fodor
B Langmead
B Liu
BJ Haas
BL Cantarel
C Luo
CR Woese
Curtis Huttenhower
D Dalevi
D Medini
D Wu
DA Soergel
DB Rusch
Dirk Gevers
DN Frank
DR Littman
DR Zerbino
DT Pride
E Pruesse
EA Grice
EK Costello
F Meyer
GD Wu
GW Tyson
H Li
J Goll
J Kuczynski
J Martin
J Peterson
J Qin
J Raes
J Ravel
JA Eisen
JC Venter
JC Wooley
JG Caporaso
Jonathan A. Eisen
JT Simpson
KE Nelson
KE Wommack
L Wen
M Arumugam
M Ghodsi
M Hamady
M Kanehisa
M Margulies
M Rho
MD Mailman
Mihai Pop
N Segata
NR Pace
P Narasingarao
P Yilmaz
Patrick D. Schloss
PB Eckburg
PD Schloss
PJ Turnbaugh
R Caspi
R Li
RC Edgar
S Abubucker
S Devkota
S Istrail
S Koren
SG Tringe
SM Huse
SR Gill
SV Angiuoli
T Namiki
T Yatsunenko
TJ Sharpton
TZ DeSantis
V Iverson
V Kunin
WR Streit
X Li
Y Peng
Publication venue: Public Library of Science (PLoS)
Publication date: 01/11/2012

Microbes inhabit virtually all sites of the human body, yet we know very little about the role they play in our health. In recent years, there has been increasing interest in studying human-associated microbial communities, particularly since microbial dysbioses have now been implicated in a number of human diseases [1]–[3]. Dysbiosis, the disruption of the normal microbial community structure, however, is impossible to define without first establishing what “normal microbial community structure” means within the healthy human microbiome. Recent advances in sequencing technologies have made it feasible to perform large-scale studies of microbial communities, providing the tools necessary to begin to address this question [4], [5]. This led to the implementation of the Human Microbiome Project (HMP) in 2007, an initiative funded by the National Institutes of Health Roadmap for Biomedical Research and constructed as a large, genome-scale community research project [6]. Any such project must plan for data analysis, computational methods development, and the public availability of tools and data; here, we provide an overview of the corresponding bioinformatics organization, history, and results from the HMP (Figure 1). National Institutes of Health (U.S.) (NIH U54HG004969); National Institutes of Health (U.S.) (grant R01HG004885); National Institutes of Health (U.S.) (grant R01HG005975); National Institutes of Health (U.S.) (grant R01HG005969)

DSpace@MIT

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central