Search CORE

18 research outputs found

Determination of optimal parameters of MAFFT program based on BAliBASE3.0 database

Publication venue: Springer
Publication date: 16/06/2016

Springer - Publisher Connector

Determination of optimal parameters of MAFFT program based on BAliBASE3.0 database

Author: C Gondro
C Notredame
FS Pais
JD Thompson
JT Reese
K Katoh
K Kazutaka
MO Francisco
MS Madhusudhan
O Gotoh
PA Nuin
RH Lathrop
V Ahola
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor

Author: Gentles Andrew J
Hankus Lukasz
Jurka Jerzy
Kohany Oleksiy
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Repbase is a reference database of eukaryotic repetitive DNA, which includes prototypic sequences of repeats and basic information described in annotations. Updating and maintaining the database requires specialized tools, which we have created and made available for use with Repbase, and which may be useful as a template for other curated databases. RESULTS: We describe the software tools RepbaseSubmitter and Censor, which are designed to facilitate updating and screening the content of Repbase. RepbaseSubmitter is a Java-based interface for formatting and annotating Repbase entries. It eliminates many common formatting errors and automates actions such as calculation of sequence lengths and composition, thus facilitating curation of Repbase sequences. In addition, it has several features for predicting protein-coding regions in sequences; searching and including PubMed references in Repbase entries; and searching the NCBI taxonomy database for correct inclusion of species information and taxonomic position. Censor is a tool to rapidly identify repetitive elements by comparison to known repeats. It uses WU-BLAST for speed and sensitivity, and can conduct DNA-DNA, DNA-protein, or translated DNA-translated DNA searches of genomic sequence. Defragmented output includes a map of repeats present in the query sequence, with the options to report masked query sequence(s), repeat sequences found in the query, and alignments. CONCLUSION: Censor and RepbaseSubmitter are available as both web-based services and downloadable versions. They can be found at (RepbaseSubmitter) and (Censor).

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Testing statistical significance scores of sequence comparison methods with structure similarity

Author: AA Schaffer
AD Kester
EV Kriventseva
G Salton
GA Price
HS Booth
J Park
Jack AM Leunissen
Jacob de Vlieg
JJ Codani
JP Comet
JT Reese
M Gribskov
O Bastien
P Agarwal
Peter MA Groenen
R Apweiler
RF Doolittle
S Henikoff
SE Brenner
SF Altschul
T Hulsen
T Rognes
TF Smith
Tim Hulsen
WR Pearson
Z Chen
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: In recent years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search are determined not only by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We wanted to know whether this could be validated when applied to existing, evolutionarily related protein sequences. RESULTS: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. CONCLUSION: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
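The abstract above contrasts e-values with a Z-score based on Monte-Carlo statistics. As a rough illustration only, the sketch below computes such a Z-score by shuffling one sequence and rescoring. The function names, the match/mismatch/gap values, the linear gap penalty and the shuffle count are all assumptions made for this example; they are not the settings or implementations evaluated in the paper.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def z_score(a, b, n_shuffles=200, seed=0):
    """Monte-Carlo Z-score: (observed - mean of shuffled scores) / their std dev."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = sw_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # shuffling keeps composition, destroys order
        null_scores.append(sw_score(a, "".join(letters)))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        return 0.0  # Z is undefined without spread; report 0.0 by convention here
    return (observed - mean) / sd


if __name__ == "__main__":
    print(z_score("ACGTACGTGATTACA", "ACGTTCGTGATTACA"))
```

Every shuffle requires a full realignment, which is the reason the abstract calls the Z-score compute-intensive compared with an analytically derived e-value.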

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Radboud Repository

The effectiveness of position- and composition-specific gap costs for protein similarity searches

Author: A. Stojmirovic
Barrett
Benner
Chandonia
Chang
E. M. Gertz
Eddy
Finn
Gotoh
Gough
Gribskov
Hajian-Tilaki
Hanley
Henikoff
Hughey
Krogh
Madera
Murzin
Pascarella
Qiu
Reese
S. F. Altschul
Schaffer
Smith
Vinga
Wistrand
Wrabl
Y.-K. Yu
Yu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 16/01/2008

The flexibility in gap cost enjoyed by Hidden Markov Models (HMMs) is expected to afford them better retrieval accuracy than position-specific scoring matrices (PSSMs). We attempt to quantify the effect of more general gap parameters by separately examining the influence of position- and composition-specific gap scores, as well as by comparing the retrieval accuracy of the PSSMs constructed using an iterative procedure to that of the HMMs provided by Pfam and SUPERFAMILY, curated ensembles of multiple alignments. We found that position-specific gap penalties have an advantage over uniform gap costs. We did not explore optimizing distinct uniform gap costs for each query. For Pfam, PSSMs iteratively constructed from seeds based on HMM consensus sequences perform equivalently to HMMs that were adjusted to have constant gap transition probabilities, albeit with much greater variance. We observed no effect of composition-specific gap costs on retrieval performance. Comment: 17 pages, 4 figures, 2 tables.

arXiv.org e-Print Archive

Crossref

PubMed Central

Bioinformatics analysis to assess potential risks of allergenicity and toxicity of HRAP and PFLP proteins in genetically modified bananas resistant to Xanthomonas wilt disease

Author: Goodman R.E.
Jin Y.
Lu M.
Tetteh A.O.
Tripathi L.
Publication venue: 'Elsevier BV'
Publication date: 16/10/2017

Published online: 19 August 2017. Banana Xanthomonas wilt (BXW) disease threatens banana production and food security throughout East Africa. Natural resistance is lacking among common cultivars. Genetically modified (GM) bananas resistant to BXW disease were developed by inserting the hypersensitive response-assisting protein (Hrap) and/or the plant ferredoxin-like protein (Pflp) gene(s) from sweet pepper (Capsicum annuum). Several of these GM banana events showed 100% resistance to BXW disease under field conditions in Uganda. The current study evaluated the potential allergenicity and toxicity of the expressed proteins HRAP and PFLP, based on published information on the history of safe use of the natural source of the proteins and on established bioinformatics sequence comparisons to known allergens (www.AllergenOnline.org and NCBI Protein) and toxins (NCBI Protein). The results did not identify risks of allergenicity or toxicity for either the HRAP or the PFLP protein expressed in the GM bananas that might suggest potential health risks to humans. We recognize that additional tests, including stability of these proteins in a pepsin assay, nutrient analysis and possibly an acute rodent toxicity assay, may be required by national regulatory authorities.

CGSpace

Dynamic use of multiple parameter sets in sequence alignment

Author: Brutlag Douglas
Huang Xiaoqiu
Publication venue: Oxford University Press
Publication date: 19/12/2006

The level of conservation between two homologous sequences often varies among sequence regions; functionally important domains are more conserved than the remaining regions. Thus, multiple parameter sets should be used in alignment of homologous sequences, with a stringent parameter set for highly conserved regions and a moderate parameter set for weakly conserved regions. We describe an alignment algorithm to allow dynamic use of multiple parameter sets with different levels of stringency in computation of an optimal alignment of two sequences. The algorithm dynamically considers various candidate alignments, partitions each candidate alignment into sections, and determines the most appropriate set of parameter values for each section of the alignment. The algorithm and its local alignment version are implemented in a computer program named GAP4. The local alignment algorithm in GAP4, that in its predecessor GAP3, and an ordinary local alignment program SIM were evaluated on 257 716 pairs of homologous sequences from 100 protein families. On 168 475 of the 257 716 pairs (a rate of 65.4%), alignments from GAP4 were more statistically significant than alignments from GAP3 and SIM.

Digital Repository @ Iowa State University (ISU)

CiteSeerX

Crossref

PubMed Central

A novel substitution matrix fitted to the compositional bias in Mollicutes improves the prediction of homologous relationships

Author: A Barré
A Kuzniar
AM Altenhoff
Aurélien Barré
C Notredame
Christine Citti
Claire Lemaitre
EV Koonin
Florence Tardy
François Thiaucourt
GA Singer
JE Coronado
JT Reese
K Brick
M Dayhoff
MJ Gardner
O Bastien
P Sirand-Pugnet
Pascal Sirand-Pugnet
Patricia Thébault
RD Finn
S Henikoff
S Pereyre
SF Altschul
T Fawcett
T Gabaldón
TF Smith
U Paila
WR Pearson
X Robin
Y Yu
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: Substitution matrices are key parameters for the alignment of two protein sequences, and consequently for most comparative genomics studies. The composition of biological sequences can vary substantially between species and groups of species, and classical matrices such as those in the BLOSUM series fail to accurately estimate alignment scores and statistical significance for sequences sharing marked compositional biases. RESULTS: We present a general and simple methodology to build matrices that are specifically fitted to the compositional bias of proteins. Our approach is inspired by the one used to build the BLOSUM matrices and is based on learning substitution and amino acid frequencies from real sequences with the corresponding compositional bias. We applied it to the large-scale comparison of AT-rich Mollicute genomes. The new matrix, MOLLI60, was used to predict pairwise orthology relationships, as well as homolog families, among 24 Mollicute genomes. We show that this new matrix discriminates better between true and false orthologs and improves the clustering of homologous proteins, compared with the classical matrix BLOSUM62. CONCLUSIONS: We show in this paper that well-fitted matrices can improve the predictions of orthologous and homologous relationships among proteins with a similar compositional bias. With the ever-increasing number of sequenced genomes, our approach could prove valuable in numerous comparative studies focusing on atypical genomes.
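The abstract above says the matrix is built by learning substitution and amino acid frequencies from real sequences, in the spirit of BLOSUM. A minimal sketch of the standard BLOSUM-style log-odds calculation follows, assuming ungapped alignment columns as input and half-bit scaling. The function name `log_odds_matrix` and the toy columns are hypothetical, and the sequence clustering step of BLOSUM (and whatever procedure produced MOLLI60) is deliberately left out.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(columns, scale=2.0):
    """BLOSUM-style log-odds scores from ungapped alignment columns.

    columns: iterable of strings, one string of residues per alignment column.
    scale=2.0 gives half-bit units. No sequence clustering is applied, so this
    is a simplification of the real BLOSUM procedure.
    """
    # Count every unordered residue pair observed within each column.
    pair_counts = Counter()
    for col in columns:
        for x, y in combinations(col, 2):
            pair_counts[tuple(sorted((x, y)))] += 1

    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair freq.

    # Marginal residue frequency: p_i = q_ii + sum over j != i of q_ij / 2.
    p = Counter()
    for (x, y), f in q.items():
        if x == y:
            p[x] += f
        else:
            p[x] += f / 2
            p[y] += f / 2

    # Expected frequency of an unordered pair under independence.
    scores = {}
    for (x, y), f in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(scale * math.log2(f / expected))
    return scores


if __name__ == "__main__":
    toy_columns = ["AAAA", "AAAL", "LLLL"]  # made-up data for illustration
    for pair, score in sorted(log_odds_matrix(toy_columns).items()):
        print(pair, score)
```

Because the background frequencies are estimated from the same alignments as the pair frequencies, a training set with a strong compositional bias shifts the expected values and therefore the scores, which is the effect the abstract exploits.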

HAL-CentraleSupelec

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

Parameters for accurate genome alignment

Author: A Morgulis
A Schwartz
A Stark
B Paten
CH Yuh
CN Dewey
D Gusfield
D Karolchik
D States
DA Pollard
E Kim
EH Margulies
F Chiaromonte
G Benson
G Lunter
I Holmes
J Ruan
J Wang
JC Wootton
JE Janecka
JO Kriegs
JT Reese
KD Pruitt
KM Wong
LA Newberg
LE Carvalho
M Brudno
M Hamada
Martin C Frith
MC Frith
Michiaki Hamada
MS Waterman
Paul Horton
PP Gardner
R Durbin
RC Friedman
RK Bradley
S Karlin
S Kumar
S Miyazawa
S Schwartz
S Sheetlin
SF Altschul
TJ Treangen
W Huang
WJ Kent
YK Yu
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed. RESULTS: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases. CONCLUSIONS: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Optimal Sequence Alignment and Its Relationship with Phylogeny

Author: Atoosa Ghahremani
Mahmood A. Mahdavi
Publication venue: 'IntechOpen'
Publication date: 02/11/2011

IntechOpen