192 research outputs found

Hidden breakpoints in genome alignments

Author: A. Rambaut
A.C.E. Darling
A.E. Darling
A.L. Delcher
C.D. Greenman
D. Medini
E. Tannier
G. Fudenberg
M. Blanchette
M. Nowacki
M.A. Umbarger
S. De
S. Schwartz
S.V. Angiuoli
V. Kolmogorov
Publication date: 01/01/2012

During the course of evolution, an organism's genome can undergo changes that affect the large-scale structure of the genome. These changes include gene gain, loss, duplication, chromosome fusion, fission, and rearrangement. When gene gain and loss occur in addition to other types of rearrangement, breakpoints of rearrangement can exist that are only detectable by comparison of three or more genomes. An arbitrarily large number of these "hidden" breakpoints can exist among genomes that exhibit no rearrangements in pairwise comparisons. We present an extension of the multichromosomal breakpoint median problem to genomes that have undergone gene gain and loss. We then demonstrate that the median distance among three genomes can be used to calculate a lower bound on the number of hidden breakpoints present. We provide an implementation of this calculation including the median distance, along with some practical improvements on the time complexity of the underlying algorithm. We apply our approach to measure the abundance of hidden breakpoints in simulated data sets under a wide range of evolutionary scenarios. We demonstrate that in simulations the hidden breakpoint counts depend strongly on relative rates of inversion and gene gain/loss. Finally, we apply current multiple genome aligners to the simulated genomes, and show that all aligners introduce a high degree of error in hidden breakpoint counts, and that this error grows with evolutionary distance in the simulation. Our results suggest that hidden breakpoint error may be pervasive in genome alignments. Comment: 13 pages, 4 figures.

arXiv.org e-Print Archive

Crossref

OPUS - University of Technology Sydney

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

Bacterial microevolution and the Pangenome

Author: A Bankevich
AE Darling
AE Darling
AJ Page
AO Kislyuk
B Charlesworth
C Buckee
C Collins
C Wiuf
CM Thomas
CS Pepperell
DJ Wilson
DR Zerbino
E Jacox
F Lassalle
GE Sims
GJ Szollosi
GJ Szollosi
GJ Szollosi
H Ochman
IJ Wilson
J Hedge
J Lawrence
JB Joy
JFC Kingman
KAA Jolley
KE Dingle
KT Konstantinidis
L Li
L Petersen
M Csurös
M Nordborg
M Pagel
M Steinegger
M Touchon
M Vos
M Vos
M Vos
MJ Ward
MTG Holden
NA Rosenberg
NJ Croucher
P Donnelly
PAP Moran
R Griffiths
RC Griffiths
RG Everitt
RK Aziz
S Castillo-Ramírez
S Kurtz
S Wright
SF Altschul
SK Sheppard
SK Sheppard
SS Abby
SV Angiuoli
T Ohta
T Seemann
TG Vaughan
WP Maddison
X Didelot
X Didelot
X Didelot
X Didelot
X Didelot
X Didelot
X Didelot
Z Yang
Z Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2020

The comparison of multiple genome sequences sampled from a bacterial population reveals considerable diversity in both the core and the accessory parts of the pangenome. This diversity can be analysed in terms of microevolutionary events that took place since the genomes shared a common ancestor, especially deletion, duplication, and recombination. We review the basic modelling ingredients used implicitly or explicitly when performing such a pangenome analysis. In particular, we describe a basic neutral phylogenetic framework of bacterial pangenome microevolution, which is compatible with evaluating the role of natural selection. We survey the different ways in which pangenome data is summarised in order to be included in microevolutionary models, as well as the main methodological approaches that have been proposed to reconstruct pangenome microevolutionary history.

Crossref

Warwick Research Archives Portal Repository

CloVR-ITS: Automated internal transcribed spacer amplicon sequence analysis pipeline for the characterization of fungal microbiota

Author: A Orgiazzi
AP Jackson
BL Boyanton Jr
C Koetschan
C Koetschan
C Landlinger
C Landlinger
CL Schoch
D Kruger
DB Rusch
E Bellemain
E Zhang
EK Costello
F Meyer
G Liguori
HK Park
J Orvis
J Ravel
J Wilkening
JG Caporaso
JR White
JR White
LC Paulino
M Ryberg
MA Ghannoum
MD Edge
ME Lucero
N Fierer
PC Woo
PD Schloss
RC Edgar
RC Edgar
RC Edgar
RH Nilsson
S Chen
S Dollive
S Kosakovsky Pond
SF Altschul
SV Angiuoli
SV Angiuoli
T Mullineux
TM Porter
Z Khan
Publication venue: 'Springer Science and Business Media LLC'

Crossref

eCAMBer: efficient support for large-scale comparative analysis of multiple bacterial strains

Author: A Palleja
A Pati
A Roetzer
C Camacho
CR Laing
D Hyatt
D Kim
D Vallenet
DE Wood
EJ Richardson
J Dunbar
J Zhou
J-F Yu
Jerzy Tiuryn
JJ Gillespie
JL Klassen
Limsoon Wong
M Touchon
M Wozniak
M Wozniak
M Wozniak
ME Wall
Michal Wozniak
MS Poptsova
NJ Loman
NM Daniels
P Fournier
P-R Loh
PD Karp
PJA Cock
S Kasif
SP Shah
SV Angiuoli
SV Angiuoli
T Yada
THA Ederveen
V Pavlović
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Bioinformatics on the Cloud Computing Platform Azure

Author: Andrew P. Harrison
Anne M. Owen
BD Halligan
DA de Lima Morais
DP Wall
E Afgan
GEP Ropella
H Eriksson
H Kim
H Parkinson
Hugh P. Shanahan
J Qiu
L Zhang
LD Stein
M Abouelhoda
P Di Tommaso
RC Taylor
S Contrino
Shyamal D. Peddada
SV Angiuoli
T Barrett
VA Fusaro
WB Langdon
Z Wang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. In Appendix S1 we provide a walkthrough that demonstrates explicitly how Azure can be used to perform these analyses, and we offer a comparison with a local computation. We note that the Platform as a Service (PaaS) offering of Azure can present a steep learning curve for bioinformatics developers, who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development. © 2014 Shanahan et al.

University of Essex Research Repository

CiteSeerX

Public Library of Science (PLOS)

Crossref

Royal Holloway - Pure

Directory of Open Access Journals

PubMed Central

FigShare

Processing and analyzing multiple genomes alignments with MafFilter

Author: A Scally
Aaron E. Darling
CC Chang
Danecek P Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group
DG Higgins
EH Stukenbrock
EH Stukenbrock
J Casper
J Felsenstein
JB Lack
K Katoh
K Prüfer
L Duret
M Blanchette
M Hasegawa
M Hasegawa
M Slatkin
O Gascuel
S Guindon
S Kurtz
S Myers
S Schiffels
S Schwartz
SM Kiełbasa
SV Angiuoli
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/01/2020

As the number of available genome sequences from both closely related species and individuals within species increased, theoretical and methodological convergences between the fields of phylogenomics and population genomics emerged. Population genomics typically focuses on the analysis of variants, while phylogenomics heavily relies on genome alignments. However, these are playing an increasingly important role in studies at the population level. Multiple genome alignments of individuals are used when structural variation is of primary interest and when genome architecture permits the de novo assembly of genome sequences. Here I describe MafFilter, a command-line-driven program for processing genome alignments in the Multiple Alignment Format (MAF). Using concrete examples based on publicly available datasets, I demonstrate how MafFilter can be used to develop efficient and reproducible pipelines with quality assurance for downstream analyses. I further show how MafFilter can be used to perform both basic and advanced population genomic analyses in order to infer the patterns of nucleotide diversity along genomes.

Crossref

MPG.PuRe

Ergatis: a web interface and scalable software system for bioinformatics workflows

Author: A. Gussman
A. Mahurkar
Ashburner
B. Whitty
Besemer
Birney
Carlton
Crabtree
D. Riley
Dibernardo
Dunning Hotopp
E. Lee
Eilbeck
El-Sayed
Emanuelsson
J. Crabtree
J. M. Inman
J. Orvis
J. P. Sundaram
J. Wortman
K. Galens
Mungall
O. White
S. Nampally
S. V. Angiuoli
Shah
Siepel
Tang
Tettelin
Tiwari
V. Felix
Wilkinson
Publication venue: Oxford University Press

Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users.

Crossref

PubMed Central

CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

Author: A Bateman
A Bateman
A Tridgell
Aaron Gussman
AC Stewart
AL Delcher
B Langmead
B Langmead
BE Suzek
C Hemmerich
C Rapier
Cesar Arze
D Field
D Hull
David R Riley
DL Wheeler
DR Zerbino
E Afgan
EE Schadt
F Meyer
J Dean
J Goecks
J Orvis
J White
J White
J White
James R White
JD Selengut
JG Caporaso
JP Mesirov
JR Cole
JR Miller
JR White
JT Dudley
K Galens
K Keahey
K Lagesen
Kevin Galens
LD Stein
M Reich
Mahesh Vangala
Malcolm Matalka
MC Schatz
MC Schatz
MC Schatz
O Trelles
Owen White
PD Schloss
RC Edgar
RK Aziz
RL Tatusov
S Angiuoli
Samuel V Angiuoli
SD Kahn
SF Altschul
SF Altschul
SR Eddy
TM Lowe
W Florian Fricke
Publication venue: BioMed Central
Publication date: 01/01/2011

Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing. https://doi.org/10.1186/1471-2105-12-35

Crossref

Springer - Publisher Connector

PubMed Central

Digital Repository at the University of Maryland

Correction: Comparative Genomics of Emerging Human Ehrlichiosis Agents

Crossref

Directory of Open Access Journals

PubMed Central