Statistical distance between texts and filtration methods in sequence comparison

By Pavel A. Pevzner

Abstract

When searching for local similarities in long sequences, a rapid similarity search becomes essential. The quadratic complexity of dynamic programming algorithms forces the use of filtration methods that eliminate sequences with a low level of similarity. This paper provides a theoretical justification for the filtration method based on the statistical distance between texts. The notion of filtration efficiency is introduced, and the efficiency of several filters is estimated. It is shown that the efficiency of statistical l-tuple filtration in DNA database search is associated with a potential extension of the original four-letter alphabet and grows exponentially with increasing l. A formula for estimating the filtration parameters is presented.
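To make the idea of filtering by a statistical distance concrete, the following is a minimal sketch and not the paper's own formulation. It assumes the distance between two texts is the sum, over all l-tuples, of the absolute difference in their occurrence counts; the function names and the `threshold` parameter are hypothetical, and the abstract does not state the exact definition or how the threshold is chosen.

```python
from collections import Counter


def ltuple_counts(text, l):
    """Count every overlapping l-tuple (substring of length l) in text."""
    return Counter(text[i:i + l] for i in range(len(text) - l + 1))


def ltuple_distance(a, b, l):
    """Assumed statistical distance: sum of absolute count differences
    over all l-tuples occurring in either text."""
    counts_a = ltuple_counts(a, l)
    counts_b = ltuple_counts(b, l)
    tuples = counts_a.keys() | counts_b.keys()
    return sum(abs(counts_a[t] - counts_b[t]) for t in tuples)


def passes_filter(a, b, l, threshold):
    """Hypothetical filter: keep the pair for the expensive dynamic
    programming comparison only if the l-tuple distance is small."""
    return ltuple_distance(a, b, l) <= threshold
```

Counting l-tuples takes time linear in the sequence lengths, so a filter of this kind can discard dissimilar pairs before the quadratic dynamic programming step is run.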

Year: 1992
OAI identifier: oai:CiteSeerX.psu:10.1.1.134.147
Provided by: CiteSeerX