2,473 research outputs found

A diagnostic dilemma: a case report

Author: AG Blundell
CA Langford
David M Comer
H Klinger
J David M Edgar
J Savige
JD Edgar
JD Edgar
JK Rao
L Vittaz
M Boucelma
O Weijtens
SR Weiner
SS Krafcik
Publication venue: BioMed Central
Publication date: 29/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

Author: A Löytynoja
A Löytynoja
B Sipos
BG Hall
BG Hall
BP Blackburne
C Chothia
C Dessimoz
C Kemena
C Kemena
C Notredame
CB Do
CL Strope
DA Dalquen
DA Morrison
DH Mathews
ER Mardis
G Blackshields
G Jordan
G Landan
GP Raghava
I Walle Van
J Kim
J Stoye
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JH Havgaard
JP Huelsenbeck
K Mizuguchi
LA Stebbings
M Anisimova
M Pop
MR Aniba
P Gardner
RA Cartwright
RB Russell
RC Edgar
RC Edgar
SA Berger
SF Altschul
T Golubchik
T Koestler
T Lassmann
T Lassmann
T Lassmann
W Fletcher
Publication venue: Springer Science and Business Media LLC
Publication date: 09/11/2012

Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Alignment accuracy is therefore crucial to a vast range of analyses, often in ways that are difficult to assess within those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should choose their benchmark according to the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy.
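As an illustration of what scoring an aligner against a reference alignment can look like, the sketch below computes the fraction of reference residue pairs that a test alignment recovers. This is a minimal example under stated assumptions, not a method taken from the paper: the function names `aligned_pairs` and `pair_recovery` are hypothetical, alignments are given as equal-length gapped strings in the same row order, and `-` is the gap character.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (row index, residue index within that row),
    so the result does not depend on where gaps were placed.
    """
    counters = [0] * len(alignment)  # next residue index for each row
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for row_index, row in enumerate(alignment):
            if row[col] != gap:
                present.append((row_index, counters[row_index]))
                counters[row_index] += 1
        # Every pair of residues in this column counts as aligned.
        pairs.update(combinations(present, 2))
    return pairs


def pair_recovery(test, reference):
    """Fraction of reference residue pairs also aligned in the test alignment."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(aligned_pairs(test) & ref) / len(ref)


reference = ["AC-G", "ACTG"]
test = ["ACG-", "ACTG"]
print(pair_recovery(test, reference))
```

A score of this kind is only as trustworthy as the reference alignment it is compared against, which is the concern the abstract raises about benchmark assumptions.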

arXiv.org e-Print Archive

Crossref

UCL Discovery

Optimizing substitution matrix choice and gap parameters for sequence alignment

Author: CB Do
CB Do
CN Dewey
D Gusfield
DT Jones
E Kim
G Blackshields
GA Price
GH Gonnet
I Van Walle
J Flannick
J Kececioglu
J Pei
JD Thompson
JD Thompson
JG Henikoff
K Katoh
M Box
MA Larkin
MO Dayhoff
MP Styczynski
MS Waterman
O Chapelle
RC Edgar
RC Edgar
Robert C Edgar
S Henikoff
T Lassmann
T Muller
T Muller
TM Phuong
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when alignment accuracy is the goal rather than homolog recognition. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pair-wise global protein alignments. Results: POP is compared to a recent method due to Kim and Kececioglu and found to achieve from 0.2% to 1.3% higher accuracies on pair-wise benchmarks extracted from BALIBASE. The VTML matrix series is shown to be the most accurate on several global pair-wise alignment benchmarks, with VTML200 giving best or close to the best performance in all tests. BLOSUM matrices are found to be slightly inferior, even with the marginal improvements in the bug-fixed RBLOSUM series. The PAM series is significantly worse, giving accuracies typically 2% less than VTML. Integer rounding is found to cause slight degradations in accuracy. No evidence is found that selecting a matrix based on sequence divergence improves accuracy, suggesting that the use of this heuristic in CLUSTALW may be ineffective. Using VTML200 is found to improve the accuracy of CLUSTALW by 8% on BALIBASE and 5% on PREFAB. Conclusion: The hypothesis that more accurate alignments of distantly related sequences may be achieved using low-identity matrices is shown to be false for commonly used matrix types. Source code and test data are freely available from the author's web site at http://www.drive5.com/pop.
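To make the roles of the substitution matrix and the gap penalty concrete, here is a minimal sketch of pair-wise global alignment scoring (Needleman-Wunsch dynamic programming). It is not the POP procedure. It assumes a single linear per-position gap penalty rather than separate gap-open and gap-extend parameters, and `toy_score` is a hypothetical stand-in for a real matrix such as VTML200.

```python
def global_alignment_score(a, b, score, gap):
    """Best global alignment score of sequences a and b.

    score(x, y) returns the substitution score for residues x and y.
    gap is the (negative) penalty charged for each gapped position.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            curr.append(max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align two residues
                prev[j] + gap,                            # gap in b
                curr[j - 1] + gap,                        # gap in a
            ))
        prev = curr
    return prev[-1]


def toy_score(x, y):
    """Hypothetical match/mismatch scores standing in for a real matrix."""
    return 2 if x == y else -1


print(global_alignment_score("ACG", "AG", toy_score, gap=-2))
```

Changing `gap` or swapping `toy_score` for a different scoring function changes which alignment is optimal, which is why these parameters have to be tuned together.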

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

State of the art: refinement of multiple sequence alignments

Author: A Marchler-Bauer
AB Robinson
AJ Jennings
Anna R Panchenko
C Notredame
C Notredame
CB Do
Christopher J Lanczycki
GJ Barton
IM Wallace
J Chen
J Heringa
J Heringa
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JF Gibrat
K Katoh
K Katoh
O Gotoh
Paul A Thiessen
RC Edgar
S Chakrabarti
Saikat Chakrabarti
SR Eddy
Stephen H Bryant
T Lassmann
T Madej
Teresa M Przytycka
WR Taylor
WS Valdar
Y Wang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Accurate multiple sequence alignments of proteins are very important in computational biology today. Despite the numerous efforts made in this field, all alignment strategies have certain shortcomings resulting in alignments that are not always correct. Refinement of an existing alignment can prove to be an intelligent choice considering the increasing importance of high quality alignments in large scale high-throughput analysis. RESULTS: We provide an extensive comparison of the performance of the alignment refinement algorithms. The accuracy and efficiency of the refinement programs are compared using the 3D structure-based alignments in the BAliBASE benchmark database as well as manually curated high quality alignments from the Conserved Domain Database (CDD). CONCLUSION: Comparison of performance for refined alignments revealed that, despite the absence of dramatic improvements, our refinement method, REFINER, which uses conserved regions as constraints, performs better in improving the alignments generated by different alignment algorithms. In most cases REFINER produces a higher-scoring, modestly improved alignment that does not deteriorate the well-conserved regions of the original alignment.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Grammar-based distance in progressive multiple sequence alignment

Author: AY Mitrophanov
C Notredame
C Notredame
CB Do
David J Russell
DJ Lipman
GH Gonnet
Hasan H Otu
HH Otu
J Stoye
J Ziv
J Ziv
JD Thompson
JD Thompson
K Katoh
K Katoh
K Katoh
Khalid Sayood
MO Albertson
P Clote
R Durbin
RC Edgar
RC Edgar
S Henikoff
S Sze
SB Needleman
VD Gusev
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: We propose a multiple sequence alignment (MSA) algorithm and compare the alignment quality and execution time of the proposed algorithm with those of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment occurs via pairwise aligning new sequences with an ensemble of the sequences previously aligned. Results: The performance of the proposed algorithm is validated via comparison to popular progressive multiple alignment approaches, ClustalW and T-Coffee, and to the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by Rose software. The proposed algorithm has successfully built multiple alignments comparable to other programs with significant improvements in running time. The results are especially striking for large datasets. Conclusion: We introduce a computationally efficient progressive alignment algorithm using a grammar-based sequence distance particularly useful in aligning large datasets.

Crossref

DigitalCommons@University of Nebraska

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost

Author: A Bahr
AR Subramanian
C Grasso
C Notredame
C Notredame
CB Do
Hayato Yamana
J Kececioglu
JD Thompson
JD Thompson
JD Thompson
JD Thompson
K Karplus
K Katoh
K Katoh
MA McClure
O Gotoh
O Gotoh
O Gotoh
O Gotoh
O Gotoh
O Gotoh
O Gotoh
O Gotoh
O O'Sullivan
Osamu Gotoh
RC Edgar
Shinsuke Yamada
T Jiang
W Miller
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. In the alignment of a family of protein sequences, global MSA algorithms perform better than local ones in many cases, while local ones perform better than global ones when some sequences have long insertions or deletions (indels) relative to others. Many recent leading MSA algorithms have incorporated pairwise alignment information obtained from a mixture of sources into their scoring system to improve accuracy of alignment containing long indels. RESULTS: We propose a novel group-to-group sequence alignment algorithm that uses a piecewise linear gap cost. We developed a program called PRIME, which employs our proposed algorithm to optimize the well-defined sum-of-pairs score. PRIME stands for Profile-based Randomized Iteration MEthod. We evaluated PRIME and some recent MSA programs using BAliBASE version 3.0 and PREFAB version 4.0 benchmarks. The results of benchmark tests showed that PRIME can construct accurate alignments comparable to the most accurate programs currently available, including L-INS-i of MAFFT, ProbCons, and T-Coffee. CONCLUSION: PRIME enables users to construct accurate alignments without having to employ pairwise alignment information. PRIME is available at
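The sum-of-pairs score mentioned above can be sketched as follows. This is a simplified illustration, not the PRIME implementation: it charges a flat penalty for every residue-gap pair in a column instead of the piecewise linear gap cost the abstract describes, and `match_mismatch` is a hypothetical stand-in for a real substitution matrix.

```python
from itertools import combinations


def match_mismatch(x, y):
    """Hypothetical substitution scores standing in for a real matrix."""
    return 1 if x == y else -1


def sum_of_pairs(alignment, score, gap_penalty, gap="-"):
    """Sum the pairwise scores of every column over all pairs of rows.

    Residue-residue pairs use score(x, y), residue-gap pairs cost
    gap_penalty, and gap-gap pairs contribute nothing.
    """
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == gap and y == gap:
                continue
            total += gap_penalty if gap in (x, y) else score(x, y)
    return total


msa = ["AC", "AC", "A-"]
print(sum_of_pairs(msa, match_mismatch, gap_penalty=-2))
```

An iterative method can use a score like this as its objective, accepting a change to the alignment only when the total improves.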

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The fitness of African malaria vectors in the presence and limitation of host behaviour

Author: A Kiszewski
Ally A Daraja
B Pluess
BD Roitberg
BL Hart
C Lengeler
CR Davies
Daniel T Haydon
DW Kelly
Edgar M Mbehela
GF Killeen
GL Zhou
GMM Chege
H Briegel
H Briegel
H Briegel
H Ferguson
H Hawlena
H Hurd
Heather M Ferguson
I Lyimo
IN Lyimo
IN Lyimo
IS Khokhlova
Issa N Lyimo
J Jaenike
JB Duchemin
JC Hodgson
JD Edman
JD Edman
JD Edman
JF Day
JK Nayar
JK Waage
JL Clarke III
JM Darbro
JP Bryant
Kasian F Mbina
L Despres
LC Harrington
MJ Crawley
MJ Kirby
MJ Stout
MM Wintrobe
P Bize
PJ Taylor
R Ziegler
RA Anderson
RA Anderson
Richard Reeve
S Holm
S Schofield
SC Weaver
SJ Torr
SJ Torr
SM Muriu
T Dekker
T Habtewold
T Lefevre
TM Katz
W Takken
WHO
X Li
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Background: Host responses are important sources of selection upon the host species range of ectoparasites and phytophagous insects. However, little is known about the role of host responses in defining the host species range of malaria vectors. This study aimed to estimate the relative importance of host behaviour to the feeding success and fitness of African malaria vectors, and assess its ability to predict their known host species preferences in nature. Methods: Paired evaluations of the feeding success and fitness of African vectors Anopheles arabiensis and Anopheles gambiae s.s. in the presence and limitation of host behaviour were conducted in a semi-field system (SFS) at Ifakara Health Institute, Tanzania. In one set of trials, mosquitoes were released within the SFS and allowed to forage overnight on a host that was free to exhibit natural behaviour in response to insect biting. In the other, mosquitoes were allowed to feed directly from the skin surface of immobile hosts. The feeding success and subsequent fitness of vectors under these conditions were investigated on 6 host types (humans, calves, chickens, cows, dogs and goats) to assess whether physical movements of preferred host species (cattle for An. arabiensis, humans for An. gambiae s.s.) were less effective at preventing mosquito bites than those of common alternatives. Results: Anopheles arabiensis generally had greater feeding success when applied directly to host skin than when foraging on unrestricted hosts (in five of six host species). However, An. gambiae s.s. obtained blood meals from free and restrained hosts with similar success from most host types (four out of six). Overall, the blood meal size, oviposition rate, fecundity and post-feeding survival of mosquito vectors were significantly higher after feeding on hosts free to exhibit behaviour than after feeding on hosts that were immobilized during feeding trials. Conclusions: Allowing hosts to move freely during exposure to mosquitoes was associated with moderate reductions in mosquito feeding success, but no detrimental impact to the subsequent fitness of mosquitoes that were able to feed upon them. This suggests that physical defensive behaviours exhibited by common host species including humans do not impose substantial fitness costs on African malaria vectors.

Crossref

Springer - Publisher Connector

PubMed Central

Enlighten

GlyGly-CTERM and Rhombosortase: A C-Terminal Protein Processing Signal in a Many-to-One Pairing with a Rhomboid Family Intramembrane Serine Protease

Author: A Krogh
AH Gaspar
C Meissner
Daniel H. Haft
DH Haft
DH Haft
F Brossier
GE Crooks
JD Bendtsen
JD Selengut
JD Selengut
JD Thompson
K Hofmann
K Strisovsky
LG Stevenson
M Freeman
M Shoji
M Zettl
Maureen J. Donlin
MJ Pallen
Neha Varghese
O Schneewind
RC Edgar
RD Finn
S Urban
S Urban
S Urban
SH Payne
SJ Callister
SR Eddy
Y Sugano
Z Wu
Publication venue: Public Library of Science
Publication date: 14/12/2011

The rhomboid family of serine proteases occurs in all domains of life. Its members contain at least six hydrophobic membrane-spanning helices, with an active site serine located deep within the hydrophobic interior of the plasma membrane. The model member GlpG from Escherichia coli is heavily studied through engineered mutant forms, varied model substrates, and multiple X-ray crystal studies, yet its relationship to endogenous substrates is not well understood. Here we describe an apparent membrane anchoring C-terminal homology domain that appears in numerous genera including Shewanella, Vibrio, Acinetobacter, and Ralstonia, but excluding Escherichia and Haemophilus. Individual genomes encode up to thirteen members, usually homologous to each other only in this C-terminal region. The domain's tripartite architecture consists of motif, transmembrane helix, and cluster of basic residues at the protein C-terminus, as also seen with the LPXTG recognition sequence for sortase A and the PEP-CTERM recognition sequence for exosortase. Partial Phylogenetic Profiling identifies a distinctive rhomboid-like protease subfamily almost perfectly co-distributed with this recognition sequence. This protease subfamily and its putative target domain are hereby renamed rhombosortase and GlyGly-CTERM, respectively. The protease and target are encoded by consecutive genes in most genomes with just a single target, but far apart otherwise. The signature motif of the Rhombo-CTERM domain, often SGGS, only partially resembles known cleavage sites of rhomboid protease family model substrates. Some protein families that have several members with C-terminal GlyGly-CTERM domains also have additional members with LPXTG or PEP-CTERM domains instead, suggesting there may be common themes to the post-translational processing of these proteins by three different membrane protein superfamilies

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Protein sequence alignment with family-specific amino acid similarity matrices

Author: A Agrawal
A Prlić
AR Panchenko
B Qian
B Rost
C Notredame
CB Do
CN Cavasotto
G Vogt
GH Gonnet
GP Raghava
I Van Walle
Igor B Kuznetsov
IN Shindyalov
J Pei
J Söding
JD Blake
JD Thompson
JM Sauder
JS Bernardes
K Mizuguchi
L Holm
L Lo Conte
ML Sierk
MO Dayhoff
MS Johnson
RB Vilim
RC Edgar
RC Edgar
RC Edgar
S Henikoff
S Salem
SB Needleman
SE Brenner
SF Altschul
SR Eddy
T Müller
TF Smith
V Ahola
WR Pearson
WR Taylor
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Evaluation of a Bayesian inference network for ligand-based virtual screening

Author: A Abdo
A Bender
AG Maldonado
AN Jain
AR Leach
AR Leach
Beining Chen
Christoph Mueller
CX Zhai
D Metzler
EJ Gardiner
EM Voorhees
G Salton
GW Bemis
H Eckert
H Turtle
J Bajorath
J Hert
J Hert
J-F Truchon
JA Grant
JD Holliday
JP Callan
JP Callan
JR Fischer
K Spärck Jones
K Spärck Jones
N Nikolova
P Prathipati
P Willett
P Willett
P Willett
P Willett
P Willett
Peter Willett
RC Glen
RD Brown
RP Sheridan
RP Sheridan
S Siegel
SJ Edgar
T Lengauer
T Strohman
TI Oprea
WR Greiff
X Chen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Background: Bayesian inference networks enable the computation of the probability that an event will occur. They have been used previously to rank textual documents in order of decreasing relevance to a user-defined query. Here, we modify the approach to enable a Bayesian inference network to be used for chemical similarity searching, where a database is ranked in order of decreasing probability of bioactivity. Results: Bayesian inference networks were implemented using two different types of network and four different types of belief function. Experiments with the MDDR and WOMBAT databases show that a Bayesian inference network can be used to provide effective ligand-based screening, especially when the active molecules being sought have a high degree of structural homogeneity; in such cases, the network substantially out-performs a conventional, Tanimoto-based similarity searching system. However, the network is much less effective when structurally heterogeneous sets of actives are being sought. Conclusion: A Bayesian inference network provides an interesting alternative to existing tools for ligand-based virtual screening.
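For context, the conventional baseline named in the abstract, Tanimoto-based similarity searching, can be sketched in a few lines. This illustrates the baseline only and is not the Bayesian inference network. It assumes each molecule is represented by the set of bit positions set in its fingerprint, and the molecule names and fingerprints below are made up for the example.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query, database):
    """Return database keys ordered by decreasing similarity to the query."""
    return sorted(
        database,
        key=lambda name: tanimoto(query, database[name]),
        reverse=True,
    )


# Hypothetical fingerprints: each value lists the bit positions that are set.
database = {"m1": {1, 2, 3}, "m2": {7, 8}, "m3": {1, 2, 3, 4}}
print(rank_by_similarity({1, 2, 3}, database))
```

Both approaches produce a ranked database. They differ in how each molecule's rank score is computed.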

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

White Rose Research Online