Search CORE

4 research outputs found

Using Phylogeny to Improve Genome-Wide Distant Homology Recognition

Authors: Abeln Sanne; Deane Charlotte M; Teubner Carlo
Publication venue: Public Library of Science
Publication date: 01/01/2007

The gap between the number of known protein sequences and the number of known structures continues to widen, particularly as a result of whole-genome sequencing projects. Recently, there have been many attempts to generate structural assignments for all genes in sets of completed genomes using fold-recognition methods. We developed a method that detects false positives made by these genome-wide structural assignment experiments by identifying isolated occurrences. The method was tested using two sets of assignments, generated by SUPERFAMILY and PSI-BLAST, on 150 completed genomes. A phylogeny of these genomes was built, and a parsimony algorithm was used to identify isolated occurrences by detecting occurrences that cause a gain at leaf level. Isolated occurrences tend to have high E-values, and in both sets of assignments a sudden increase in isolated occurrences is observed for E-values > 10⁻⁸ for SUPERFAMILY and > 10⁻⁴ for PSI-BLAST. Conditions for predicting false positives were derived from these results. Independent tests confirm that the predicted false positives are indeed more likely to be incorrectly assigned. Evaluation of the predicted false positives also showed that the accuracy of profile-based fold-recognition methods might depend on secondary structure content and sequence length. We show that false positives generated by fold-recognition methods can be identified by considering structural occurrence patterns across completed genomes; occurrences that are isolated within the phylogeny tend to be less reliable. The method provides a new, independent way to examine the quality of fold assignments and may be used to improve the output of any genome-wide fold assignment method.

CiteSeerX

VU Research Portal

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive
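
The first result describes flagging isolated occurrences with a parsimony algorithm on a genome phylogeny. A minimal sketch of that idea follows, under stated assumptions: the phylogeny is a rooted binary tree written as nested tuples, each fold assignment is a presence/absence character across genomes, Fitch parsimony is used for the reconstruction (with absence preferred at the root when both states are equally parsimonious), and an occurrence is flagged only if it is a leaf-level gain and its E-value exceeds a chosen threshold. The function names and the single-threshold rule are hypothetical simplifications; this is not the authors' implementation or their exact conditions.

```python
from typing import Dict, List, Set, Tuple, Union

# A leaf is a genome name; an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def annotate(tree: Tree, present: Set[str]) -> dict:
    """Bottom-up Fitch pass: attach the candidate state set to every node.

    State 1 means the fold occurs in that genome (or ancestor); 0 means it does not.
    """
    if isinstance(tree, str):
        return {"name": tree, "set": {1} if tree in present else {0}, "children": []}
    children = [annotate(subtree, present) for subtree in tree]
    left, right = children[0]["set"], children[1]["set"]
    # Fitch rule: intersection if non-empty, otherwise union.
    return {"name": None, "set": (left & right) or (left | right), "children": children}


def leaf_gains(node: dict, parent_state: int, gains: List[str]) -> None:
    """Top-down pass: choose one most-parsimonious state per node and record
    leaves where the fold is present but the parent is assigned absence."""
    candidates = node["set"]
    state = parent_state if parent_state in candidates else min(candidates)
    if not node["children"]:
        if state == 1 and parent_state == 0:
            gains.append(node["name"])
        return
    for child in node["children"]:
        leaf_gains(child, state, gains)


def predict_false_positives(tree: Tree, evalues: Dict[str, float], threshold: float) -> List[str]:
    """Hypothetical rule: flag genomes whose assignment is isolated AND has E-value > threshold.

    evalues maps each genome with the fold assigned to the E-value of that assignment.
    """
    gains: List[str] = []
    # Passing 0 as the root's "parent" state prefers absence at the root when ambiguous.
    leaf_gains(annotate(tree, set(evalues)), 0, gains)
    return sorted(g for g in gains if evalues[g] > threshold)
```

For example, with the tree `(("A", "B"), ("C", "D"))` and an assignment found only in genome A with E-value 1e-3, `predict_false_positives(tree, {"A": 1e-3}, 1e-8)` returns `["A"]`. If the same fold is also assigned in A's sister genome B, A is no longer a leaf-level gain and nothing is flagged. The 1e-8 threshold echoes the SUPERFAMILY cutoff mentioned in the abstract and is used here only for illustration.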

Integrating Overlapping Structures and Background Information of Words Significantly Improves Biological Sequence Comparison

Authors: Dov Joseph Stekel; Fukun Zhao; Lihua Li; Michael Zhang; Qi Dai; Xiaoqing Liu; Yuhua Yao
Publication venue: Public Library of Science
Publication date: 01/01/2011

Word-based models have achieved promising results in sequence comparison. However, how to use the overlapping structures and background information of words, both important statistical properties of words in biological sequences, to improve sequence comparison remains an open problem. This paper proposes a new statistical method that integrates the overlapping structures and the background information of words in biological sequences. To assess the effectiveness of this integration for sequence comparison, two sets of evaluation experiments were carried out. The first, based on receiver operating characteristic (ROC) curve analysis, applies the proposed method to discriminating functionally related regulatory sequences from unrelated sequences, and introns from exons. The second evaluates the method's performance, measured by F-measure, in clustering Hepatitis E virus genotypes. The results demonstrate that the proposed method, which integrates the overlapping structures and the background information of words, significantly improves biological sequence comparison and outperforms existing models.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central
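
The second result combines the overlapping structure of words with background information, but the abstract does not state the statistic itself. As a rough illustration of those two ingredients only, the sketch below counts overlapping words (k-mers, with the window sliding one position at a time), subtracts the counts expected under a background model, and compares two sequences by the cosine similarity of the resulting vectors, an approach similar in spirit to the D2*/D2S family of alignment-free statistics. Assumptions: a DNA alphabet `ACGT`, a zero-order (independent, identically distributed) background estimated from each sequence's own base composition, and hypothetical function names. It is not the paper's method; in particular, it does not model how self-overlapping words inflate the variance of word counts.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt
from typing import Dict


def word_counts(seq: str, k: int) -> Counter:
    """Count overlapping words of length k (the window advances one base at a time)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_counts(seq: str, k: int, alphabet: str = "ACGT") -> Dict[str, float]:
    """Expected word counts under an i.i.d. background fitted to seq's base composition."""
    n = len(seq)
    base_freq = {c: seq.count(c) / n for c in alphabet}
    positions = n - k + 1
    return {
        "".join(w): positions * prod(base_freq[c] for c in w)
        for w in product(alphabet, repeat=k)
    }


def centered_vector(seq: str, k: int, alphabet: str = "ACGT") -> Dict[str, float]:
    """Observed minus expected count for every possible word of length k."""
    observed = word_counts(seq, k)
    return {w: observed[w] - e for w, e in expected_counts(seq, k, alphabet).items()}


def background_corrected_similarity(s1: str, s2: str, k: int = 2) -> float:
    """Cosine similarity of the two centered word-count vectors (range -1 to 1)."""
    v1, v2 = centered_vector(s1, k), centered_vector(s2, k)
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = sqrt(sum(x * x for x in v1.values())) * sqrt(sum(x * x for x in v2.values()))
    return dot / norm if norm else 0.0
```

Note that `word_counts("AAAA", 2)` counts `"AA"` three times because occurrences may overlap. Larger `k` captures longer words but makes the vectors sparse for short sequences; the choice of `k` and of background order here is illustrative only.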

Using phylogeny to improve genome wide distant homology recognition

Author
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2005

Crossref

Fast, sensitive protein sequence searches using iterative pairwise comparison of hidden Markov models

Author: Remmert Michael
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2011

Digitale Hochschulschriften der LMU

MPG.PuRe