5 research outputs found

Prediction of Moisture Content for Congou Black Tea Withering Leaves Using Image Features and Nonlinear Method

Author: Dong Chunwang
Hao Guoshuang
Hu Bin
Jiang Yongwen
Liang Gaozhen
Yuan Haibo
Zhu Hongkai
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Copenhagen University Research Information System

Testing statistical significance scores of sequence comparison methods with structure similarity

Author: AA Schaffer
AD Kester
EV Kriventseva
G Salton
GA Price
HS Booth
J Park
Jack AM Leunissen
Jacob de Vlieg
JJ Codani
JP Comet
JT Reese
M Gribskov
O Bastien
P Agarwal
Peter MA Groenen
R Apweiler
RF Doolittle
S Henikoff
SE Brenner
SF Altschul
T Hulsen
T Rognes
TF Smith
Tim Hulsen
WR Pearson
Z Chen
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: In recent years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search are determined not only by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences.

RESULTS: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores.

CONCLUSION: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
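To make the idea of a Monte-Carlo Z-score concrete, the following is a minimal sketch, not the procedure used by CluSTr, Protein World or the paper itself. It assumes a toy match/mismatch scoring scheme with a linear gap penalty instead of a real substitution matrix, and the function names `smith_waterman_score` and `monte_carlo_z_score`, the parameter values and the number of shuffles are all illustrative choices.

```python
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (toy scoring, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def monte_carlo_z_score(a, b, n_shuffles=200, seed=0):
    """Z = (observed - mean of shuffled scores) / std of shuffled scores."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = smith_waterman_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys residue order
        null_scores.append(smith_waterman_score(a, "".join(letters)))
    mean = statistics.fmean(null_scores)
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        return 0.0  # degenerate null distribution; Z is undefined
    return (observed - mean) / sd
```

Each shuffle requires a full alignment, which is why the abstract describes the Z-score as compute-intensive compared with an e-value derived from a single alignment score.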

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Radboud Repository

Measuring Global Credibility with Application to Local Sequence Alignment

Author: Andrey Rzhetsky
B-JM Webb
Bobbie-Jo M. Webb-Robertson
BP Carlin
C Webber
Charles E. Lawrence
D Naor
DJ Lipman
HS Booth
HT Mevissen
I Holmes
J Zhu
JP Comet
JS Liu
KA Perry
KM Chao
L Yu
LE Carvalho
Lee Ann McCue
M Kendall
M Schlosshauer
M Vingron
M Zuker
ME Dayhoff
ML Tress
MS Waterman
R Durbin
RL Ott
S Henikoff
S Karlin
S Miyazawa
SF Altschul
TF Smith
W Thompson
WR Pearson
Y Ding
YK Yu
Publication venue: Public Library of Science
Publication date: 01/05/2008

Computational biology is replete with high-dimensional (high-D) discrete prediction and inference problems, including sequence alignment, RNA structure prediction, phylogenetic inference, motif finding, prediction of pathways, and model selection problems in statistical genetics. Even though prediction and inference in these settings are uncertain, little attention has been focused on the development of global measures of uncertainty. Regardless of the procedure employed to produce a prediction, when a procedure delivers a single answer, that answer is a point estimate selected from the solution ensemble, the set of all possible solutions. For high-D discrete space, these ensembles are immense, and thus there is considerable uncertainty. We recommend the use of Bayesian credibility limits to describe this uncertainty, where a (1−α)%, 0≤α≤1, credibility limit is the minimum Hamming distance radius of a hyper-sphere containing (1−α)% of the posterior distribution. Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments.
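The credibility limit defined in this abstract can be illustrated on a toy ensemble. The sketch below is an assumption-laden simplification, not the authors' implementation: solutions are unconstrained binary tuples rather than alignments, the whole ensemble is enumerated with known posterior probabilities, and the highest-probability member stands in for the maximum similarity estimate. The names `credibility_limit` and `centroid_binary` are hypothetical.

```python
def hamming(x, y):
    """Number of positions at which two equal-length tuples differ."""
    return sum(a != b for a, b in zip(x, y))


def credibility_limit(estimate, ensemble, alpha=0.05):
    """Smallest Hamming radius around `estimate` that contains at least
    (1 - alpha) of the posterior mass.

    `ensemble` maps each solution (a tuple) to its posterior probability.
    """
    by_distance = sorted((hamming(estimate, s), p) for s, p in ensemble.items())
    mass = 0.0
    for dist, p in by_distance:
        mass += p
        if mass >= 1.0 - alpha - 1e-12:  # small tolerance for float sums
            return dist
    raise ValueError("posterior probabilities sum to less than 1 - alpha")


def centroid_binary(ensemble):
    """Centroid for unconstrained binary vectors: a position is 1 when its
    marginal posterior probability exceeds 0.5 (assumption: no structural
    constraints couple the positions, unlike real alignments)."""
    length = len(next(iter(ensemble)))
    marginals = [
        sum(p for s, p in ensemble.items() if s[i] == 1) for i in range(length)
    ]
    return tuple(1 if m > 0.5 else 0 for m in marginals)


# Toy posterior over four binary solutions (illustrative numbers only).
toy = {
    (1, 1, 1, 1): 0.4,
    (0, 0, 0, 1): 0.2,
    (0, 0, 1, 0): 0.2,
    (0, 1, 0, 0): 0.2,
}
most_probable = max(toy, key=toy.get)
centroid = centroid_binary(toy)
print(credibility_limit(most_probable, toy))  # radius around the mode
print(credibility_limit(centroid, toy))       # radius around the centroid
```

In this toy case the centroid sits closer to the bulk of the posterior mass than the single most probable solution does, so its 95% credibility limit is smaller, which mirrors the qualitative claim in the abstract.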

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Ribosomal History Reveals Origins of Modern Protein Synthesis

Author: Caetano-Anollés Gustavo
Harish Ajith
Publication venue: Public Library of Science
Publication date: 01/01/2012

The origin and evolution of the ribosome is central to our understanding of the cellular world. Most hypotheses posit that the ribosome originated in the peptidyl transferase center of the large ribosomal subunit. However, these proposals do not link protein synthesis to RNA recognition and do not use a phylogenetic comparative framework to study ribosomal evolution. Here we infer evolution of the structural components of the ribosome. Phylogenetic methods widely used in morphometrics are applied directly to RNA structures of thousands of molecules and to a census of protein structures in hundreds of genomes. We find that components of the small subunit involved in ribosomal processivity evolved earlier than the catalytic peptidyl transferase center responsible for protein synthesis. Remarkably, subunit RNA and proteins coevolved, starting with interactions between the oldest proteins (S12 and S17) and the oldest substructure (the ribosomal ratchet) in the small subunit and ending with the rise of a modern multi-subunit ribosome. Ancestral ribonucleoprotein components show similarities to in vitro evolved RNA replicase ribozymes and protein structures in extant replication machinery. Our study therefore provides important clues about the chicken-or-egg dilemma associated with the central dogma of molecular biology by showing that ribosomal history is driven by the gradual structural accretion of protein and RNA structures. Most importantly, results suggest that functionally important and conserved regions of the ribosome were recruited and could be relics of an ancient ribonucleoprotein world.

Public Library of Science (PLOS)

Lund University Publications

Directory of Open Access Journals

PubMed Central

FigShare

An efficient Z-score algorithm for assessing sequence alignments.

Author: Booth Hilary
Gready Jill
Maindonald John
Wilson Susan
Publication venue: Mary Ann Liebert Inc.
Publication date: 11/12/2015

We describe an alternative method for scoring of the pairwise alignment of two biological sequences. Designed to overcome the bias due to the composition of the alignment, it measures the distance (in standard deviations) between the given alignment and the mean value of all other alignments that can be obtained by a permutation of either sequence. We demonstrate that the standard deviation can be calculated efficiently. By concentrating upon the ungapped case, the mean and standard deviation can be calculated exactly and in two steps, the first being O(N) time, where N is the length of the sequence, the second in a fixed number of calculations, i.e., in O(1) time. We argue that this statistic is a more consistent measure than a similarity score based upon a standard scoring matrix. Even in the ungapped case, the statistic proves in many cases to be more accurate than the commonly used FASTA (Pearson and Lipman, 1988) gapped Z-score in which the sequence is matched against a random sample of the database. We demonstrate the use of the POZ-score as a secondary filter which screens out several well-known types of false positive, reducing the amount of manual screening to be done by the biologist.
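For the ungapped case, the mean and standard deviation over all permutations of one sequence can indeed be obtained in closed form, without sampling. The sketch below is not the paper's O(N) algorithm: it builds the full N x N pairwise score matrix for clarity, so it runs in O(N^2) time, and it uses the standard moments of a permutation statistic (Hoeffding's combinatorial formula). The match/mismatch scoring and the names `score`, `permutation_moments` and `permutation_z_score` are assumptions made for illustration.

```python
import math


def score(x, y, match=1.0, mismatch=-1.0):
    """Toy residue-pair score; a real application would use a scoring matrix."""
    return match if x == y else mismatch


def permutation_moments(a, b):
    """Observed ungapped score of `a` against `b`, plus the exact mean and
    standard deviation of that score over all permutations of `b`."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    m = [[score(x, y) for y in b] for x in a]
    observed = sum(m[i][i] for i in range(n))
    row_mean = [sum(row) / n for row in m]
    col_mean = [sum(m[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    mean = n * grand_mean
    # Variance of sum_i m[i][pi(i)] under a uniformly random permutation pi:
    # sum of squared doubly-centred entries divided by (n - 1).
    var = sum(
        (m[i][j] - row_mean[i] - col_mean[j] + grand_mean) ** 2
        for i in range(n)
        for j in range(n)
    ) / (n - 1)
    return observed, mean, math.sqrt(var)


def permutation_z_score(a, b):
    """Distance, in standard deviations, of the observed score from the
    permutation mean."""
    observed, mean, sd = permutation_moments(a, b)
    if math.isclose(sd, 0.0, abs_tol=1e-12):
        return 0.0  # every permutation scores the same; Z is undefined
    return (observed - mean) / sd
```

Because the moments depend on the pairwise scores only through row, column and overall averages, they can in principle be computed from residue composition counts, which is consistent with the linear-time first step the abstract describes.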

The Australian National University