Optimizing substitution matrix choice and gap parameters for sequence alignment

Author: CB Do
CB Do
CN Dewey
D Gusfield
DT Jones
E Kim
G Blackshields
GA Price
GH Gonnet
I Van Walle
J Flannick
J Kececioglu
J Pei
JD Thompson
JD Thompson
JG Henikoff
K Katoh
M Box
MA Larkin
MO Dayhoff
MP Styczynski
MS Waterman
O Chapelle
RC Edgar
RC Edgar
Robert C Edgar
S Henikoff
T Lassmann
T Muller
T Muller
TM Phuong
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when alignment accuracy is the goal rather than homolog recognition. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pair-wise global protein alignments. Results: POP is compared to a recent method due to Kim and Kececioglu and found to achieve from 0.2% to 1.3% higher accuracies on pair-wise benchmarks extracted from BALIBASE. The VTML matrix series is shown to be the most accurate on several global pair-wise alignment benchmarks, with VTML200 giving best or close to the best performance in all tests. BLOSUM matrices are found to be slightly inferior, even with the marginal improvements in the bug-fixed RBLOSUM series. The PAM series is significantly worse, giving accuracies typically 2% less than VTML. Integer rounding is found to cause slight degradations in accuracy. No evidence is found that selecting a matrix based on sequence divergence improves accuracy, suggesting that the use of this heuristic in CLUSTALW may be ineffective. Using VTML200 is found to improve the accuracy of CLUSTALW by 8% on BALIBASE and 5% on PREFAB. Conclusion: The hypothesis that more accurate alignments of distantly related sequences may be achieved using low-identity matrices is shown to be false for commonly used matrix types. Source code and test data are freely available from the author's web site at http://www.drive5.com/pop.
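As a rough illustration of the quantities this abstract refers to, the sketch below scores a pair-wise global alignment from a substitution score and gap penalties, using the standard three-state (Gotoh) dynamic programme with affine gaps. It is not the POP procedure and is not taken from the paper. The function names, the toy match/mismatch scores, and the affine gap model (`gap_open`, `gap_extend`) are assumptions made for illustration only.

```python
from typing import Callable

NEG_INF = float("-inf")


def toy_score(x: str, y: str) -> float:
    """Hypothetical stand-in for a substitution matrix lookup (e.g. a VTML or BLOSUM entry)."""
    return 2.0 if x == y else -1.0


def global_alignment_score(
    a: str,
    b: str,
    score: Callable[[str, str], float],
    gap_open: float,
    gap_extend: float,
) -> float:
    """Best global alignment score; a gap of length k costs gap_open + (k - 1) * gap_extend."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(a[i - 1], b[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs gap_open; continuing the same gap costs gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

With a scorer of this shape, comparing substitution matrices or gap settings amounts to swapping `score`, `gap_open`, and `gap_extend` and measuring the resulting alignments against a reference benchmark.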

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

Author: A Löytynoja
A Löytynoja
B Sipos
BG Hall
BG Hall
BP Blackburne
C Chothia
C Dessimoz
C Kemena
C Kemena
C Notredame
CB Do
CL Strope
DA Dalquen
DA Morrison
DH Mathews
ER Mardis
G Blackshields
G Jordan
G Landan
GP Raghava
I Walle Van
J Kim
J Stoye
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JH Havgaard
JP Huelsenbeck
K Mizuguchi
LA Stebbings
M Anisimova
M Pop
MR Aniba
P Gardner
RA Cartwright
RB Russell
RC Edgar
RC Edgar
SA Berger
SF Altschul
T Golubchik
T Koestler
T Lassmann
T Lassmann
T Lassmann
W Fletcher
Publication venue: Springer Science and Business Media LLC
Publication date: 09/11/2012

Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies--based on simulation, consistency, protein structure, and phylogeny--and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark on the context of application--with a keen awareness of the assumptions underlying each benchmarking strategy.

arXiv.org e-Print Archive

Crossref

UCL Discovery

A user-friendly web portal for T-Coffee on supercomputers

Author: C Notredame
C Notredame
CB Do
Cedric Notredame
Fernando Cores
Francesc Solsona
J Koetsier
J Zola
Jano I van Hemert
Jos Koetsier
Josep Rius
M Orobitg
P Di Tommaso
RC Edgar
RD Finn
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Parallel T-Coffee (PTC) was the first parallel implementation of the T-Coffee multiple sequence alignment tool. It is based on MPI and RMA mechanisms. Its purpose is to reduce the execution time of large-scale sequence alignments. It can be run on distributed memory clusters, allowing users to align data sets consisting of hundreds of proteins within a reasonable time. However, most of the potential users of this tool are not familiar with the use of grids or supercomputers. Results: In this paper we show how PTC can be easily deployed and controlled on a supercomputer architecture using a web portal developed using Rapid. Rapid is a tool for efficiently generating standardized portlets for a wide range of applications, and the approach described here is generic enough to be applied to other applications, or to deploy PTC on different HPC environments. Conclusions: The PTC portal allows users to upload a large number of sequences to be aligned by the parallel version of T-Coffee that cannot be aligned by a single machine due to memory and execution time constraints. The web portal provides a user-friendly solution.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Repositori Obert UdL

Reef fishes at all trophic levels respond positively to effective marine protected areas

Author: A Rogers
AB Hollowed
AE Bates
Anthony T. F. Bernard
B Silverman
BD Stewart
BP Kelaher
BS Halpern
C Mora
C Ward-Paige
CB Edwards
CJ Walters
CR Kelble
D Pauly
D Pauly
David E. Galván
Dennis M. Higgs
DR Bellwood
E Sala
F Micheli
German A. Soler
GJ Edgar
GJ Edgar
GJ Edgar
GJ Edgar
GJ Edgar
GJ Edgar
Graham J. Edgar
H Hillebrand
JB Kellner
KJ Gaston
L Tyberghein
M Ortiz
MD Spalding
NAJ Graham
NC Ban
Neville S. Barrett
NS Barrett
PJ Jones
PJ Mumby
RC Babcock
RD Stuart-Smith
Rick D. Stuart-Smith
RS Steneck
Russell J. Thomson
S Jennings
SE Kingsland
Stuart J. Campbell
Stuart Kininmonth
Terence P. Dawson
Timothy J. Alexander
TJ Willis
TJ Willis
TP Hughes
Trevor J. Willis
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2015

Marine Protected Areas (MPAs) offer a unique opportunity to test the assumption that fishing pressure affects some trophic groups more than others. Removal of larger predators through fishing is often suggested to have positive flow-on effects for some lower trophic groups, in which case protection from fishing should result in suppression of lower trophic groups as predator populations recover. We tested this by assessing differences in the trophic structure of reef fish communities associated with 79 MPAs and open-access sites worldwide, using a standardised quantitative dataset on reef fish community structure. The biomass of all major trophic groups (higher carnivores, benthic carnivores, planktivores and herbivores) was significantly greater (by 40% to 200%) in effective no-take MPAs relative to fished open-access areas. This effect was most pronounced for individuals in large size classes, but with no size class of any trophic group showing signs of depressed biomass in MPAs, as predicted from higher predator abundance. Thus, greater biomass in effective MPAs implies that exploitation on shallow rocky and coral reefs negatively affects biomass of all fish trophic groups and size classes. These direct effects of fishing on trophic structure appear stronger than any top-down effects on lower trophic levels that would be imposed by intact predator populations. We propose that exploitation affects fish assemblages at all trophic levels, and that local ecosystem function is generally modified by fishing.

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Directory of Open Access Journals

University of the South Pacific Electronic Research Repository

PubMed Central

Portsmouth University Research Portal (Pure)

Western Sydney ResearchDirect

University of Dundee Online Publications

King's Research Portal

Osteoarticular Infection in Three Young Thoroughbred Horses Caused by a Novel Gram Negative Cocco-Bacillus

Author: Adkins AR
Begg AP
Blishen A
Bogema D
Chan L
Charles IG
Chicken C
Djordjevic SP
Edgar A
Hudson BJ
Karagiannis T
Mitsakos K
O'Rourke BA
O'Sullivan CB
Raymond B
Roy Chowdhury P
Todhunter KH
Publication venue: Hindawi Limited
Publication date: 01/01/2020

© 2020 Bernard J. Hudson et al. We describe three cases of osteoarticular infection (OAI) in young thoroughbred horses in which the causative organism was identified by MALDI-TOF as Kingella species. The pattern of OAI resembled that reported with Kingella infection in humans. Analysis by 16S rRNA PCR enabled construction of a phylogenetic tree that placed the isolates closer to Simonsiella and Alysiella species than to Kingella species. Average nucleotide identity (ANI) comparison between the new isolate and Kingella kingae and Alysiella crassa, however, revealed low probability that the new isolate belonged to either of these species. This preliminary analysis suggests the organism isolated is a previously unrecognised species.

OPUS - University of Technology Sydney

Directory of Open Access Journals

Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words

Author: AR Panchenko
B Morgenstern
B Rost
C Chothia
C Kemena
C Notredame
CB Do
Cédric Notredame
D Baker
DG Higgins
DT Jones
Eugene A. Permyakov
F Armougom
G Yona
GH Gonnet
H-N Lin
Hsin-Nan Lin
HY Zhou
HY Zhou
J Skolnick
J Soding
JD Thompson
Jia-Ming Chang
JM Pei
JM Pei
JM Pei
K Katoh
L Rychlewski
L Wang
LA Kelley
MJ Sternberg
MO Dayhoff
O O'Sullivan
P Hogeweg
R Hagopian
R Sadreyev
RC Edgar
RC Edgar
RC Edgar
RC Edgar
RC Edgar
RC Edgar
S Henikoff
SF Altschul
SF Altschul
T Hara
T Müller
Ting-Yi Sung
U Roshan
VA Simossis
W Kabsch
W Kabsch
Wen-Lian Hsu
Y Zhang
Y Zhang
Y Zhang
Publication venue: Public Library of Science
Publication date: 02/12/2011

Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of the corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or, more recently, consistency-based approaches.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Grammar-based distance in progressive multiple sequence alignment

Author: AY Mitrophanov
C Notredame
C Notredame
CB Do
David J Russell
DJ Lipman
GH Gonnet
Hasan H Otu
HH Otu
J Stoye
J Ziv
J Ziv
JD Thompson
JD Thompson
K Katoh
K Katoh
K Katoh
Khalid Sayood
MO Albertson
P Clote
R Durbin
RC Edgar
RC Edgar
S Henikoff
S Sze
SB Needleman
VD Gusev
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: We propose a multiple sequence alignment (MSA) algorithm and compare the alignment quality and execution time of the proposed algorithm with those of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment proceeds by pairwise aligning new sequences with an ensemble of the sequences previously aligned. Results: The performance of the proposed algorithm is validated via comparison to popular progressive multiple alignment approaches, ClustalW and T-Coffee, and to the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by Rose software. The proposed algorithm has successfully built multiple alignments comparable to other programs with significant improvements in running time. The results are especially striking for large datasets. Conclusion: We introduce a computationally efficient progressive alignment algorithm using a grammar-based sequence distance particularly useful in aligning large datasets.

Crossref

DigitalCommons@University of Nebraska

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Evolutionary distances in the twilight zone -- a rational kernel approach

Author: A Keller
A Löytynoja
A Stamatakis
B Chor
B Schölkopf
Benjamin Merget
C Cortes
C Daskalakis
CB Do
E Rivas
F Bemm
Florian Markowetz
Frank Förster
G Talavera
HH Otu
I Ulitsky
J Felsenstein
J Friedrich
J Hein
JL Thorne
JL Thorne
Jörg Schultz
KM Wong
LS Wang
M Höhl
M Höhl
M Mohri
M Mohri
M Wolf
MA Buchheim
MA Suchard
Matthias Wolf
MJ Bishop
MK Kuhner
MS Waterman
N Goldman
N Higham
R Durbin
RC Edgar
RF Doolittle
Roland F. Schwarz
S Roch
S Whelan
SR Eddy
T Mailund
T Müller
TH Ogden
V Levenshtein
W Fletcher
W Fletcher
Wayne Delport
William Fletcher
Publication venue: Public Library of Science (PLoS)
Publication date: 23/11/2010

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

Shallow water marine sediment bacterial community shifts along a natural CO2 gradient in the Mediterranean Sea off Vulcano, Italy.

Author: A Ghosh
A Rusch
AD Moy
AS Roy
BB Dias
Binu M. Tripathi
BM Tripathi
CB Munn
D Meron
D Meron
DH Choi
Dorsaf Kerfahi
F Inagaki
FS Chapin
I Lidbury
IE Hendriks
J Olafsson
J Piontek
Jason M. Hall-Spencer
JJ Farmer III
JL Ray
Jonathan M. Adams
JP Bowman
Junghoon Lee
JW Liu
K Caldeira
K Ravenschlag
KE Fabricius
KJ Kroeker
LK Newbold
M Allgaier
M Heyndrickx
M Sperling
M Wagner
Marco Milazzo
MN Price
MV Lindh
N Fierer
P Altenburger
P Kerrison
PD Schloss
R Rodolfo-Metalpa
RC Edgar
S Park
S Simmons
S Uthicke
S Widdicombe
SM Huse
T Unno
V Kitidis
WJ Li
X Hongxiang
Y Nogi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

The effects of increasing atmospheric CO2 on ocean ecosystems are a major environmental concern, as rapid shoaling of the carbonate saturation horizon is exposing vast areas of marine sediments to corrosive waters worldwide. Natural CO2 gradients off Vulcano, Italy, have revealed profound ecosystem changes along rocky shore habitats as carbonate saturation levels decrease, but no investigations have yet been made of the sedimentary habitat. Here, we sampled the upper 2 cm of volcanic sand in three zones, ambient (median pCO2 419 μatm, minimum Ωarag 3.77), moderately CO2-enriched (median pCO2 592 μatm, minimum Ωarag 2.96), and highly CO2-enriched (median pCO2 1611 μatm, minimum Ωarag 0.35). We tested the hypothesis that increasing levels of seawater pCO2 would cause significant shifts in sediment bacterial community composition, as shown recently in epilithic biofilms at the study site. In this study, 454 pyrosequencing of the V1 to V3 region of the 16S rRNA gene revealed a shift in community composition with increasing pCO2. The relative abundances of most of the dominant genera were unaffected by the pCO2 gradient, although there were significant differences for some 5% of the genera present (viz. Georgenia, Lutibacter, Photobacterium, Acinetobacter, and Paenibacillus), and Shannon Diversity was greatest in sediments subject to long-term acidification (>100 years). Overall, this supports the view that globally increased ocean pCO2 will be associated with changes in sediment bacterial community composition but that most of these organisms are resilient. However, further work is required to assess whether these results apply to other types of coastal sediments and whether the changes in relative abundance of bacterial taxa that we observed can significantly alter the biogeochemical functions of marine sediments.
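For readers unfamiliar with the Shannon diversity index mentioned in this abstract, the following minimal sketch computes H = -Σ p_i ln p_i from per-taxon read counts. The function name and the example counts are hypothetical, and the use of the natural logarithm is an assumption; the abstract does not say which base or software the authors used.

```python
import math
from typing import Sequence


def shannon_diversity(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:  # taxa with zero reads contribute nothing to the sum
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical read counts per genus for two samples (illustrative numbers only).
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]
```

An evenly distributed community gives the maximum value for its number of taxa (ln 4 for four genera), while a community dominated by one genus scores much lower.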

Crossref

PEARL (Univ. of Plymouth)

Publishing Network for Geoscientific and Environmental Data

Archivio istituzionale della ricerca - Università di Palermo

Protein sequence alignment with family-specific amino acid similarity matrices

Author: A Agrawal
A Prlić
AR Panchenko
B Qian
B Rost
C Notredame
CB Do
CN Cavasotto
G Vogt
GH Gonnet
GP Raghava
I Van Walle
Igor B Kuznetsov
IN Shindyalov
J Pei
J Söding
JD Blake
JD Thompson
JM Sauder
JS Bernardes
K Mizuguchi
L Holm
L Lo Conte
ML Sierk
MO Dayhoff
MS Johnson
RB Vilim
RC Edgar
RC Edgar
RC Edgar
S Henikoff
S Salem
SB Needleman
SE Brenner
SF Altschul
SR Eddy
T Müller
TF Smith
V Ahola
WR Pearson
WR Taylor
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central