An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

Abstract

Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
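To make the task concrete, the following is a minimal brute-force sketch of what mining maximal contiguous frequent patterns means. It is not the approach proposed in the paper, which the abstract does not describe. It assumes that the support of a pattern is the number of sequences containing it as a contiguous substring, and that a frequent pattern is maximal when no longer frequent pattern contains it. The function names and the min_support parameter are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def frequent_contiguous(sequences, min_support):
    """Return {pattern: support} for every contiguous substring that
    occurs in at least min_support of the input sequences.

    Assumption: support = number of sequences containing the pattern.
    """
    frequent = {}
    length = 1
    while True:
        counts = defaultdict(int)
        for seq in sequences:
            # Use a set so each pattern is counted once per sequence.
            seen = {seq[i:i + length] for i in range(len(seq) - length + 1)}
            for pattern in seen:
                counts[pattern] += 1
        level = {p: c for p, c in counts.items() if c >= min_support}
        if not level:
            # No frequent pattern of this length, so none can exist at any
            # greater length (every substring of a frequent pattern is
            # itself frequent).
            break
        frequent.update(level)
        length += 1
    return frequent


def maximal_contiguous(sequences, min_support):
    """Keep only frequent patterns not contained in a longer frequent one."""
    frequent = frequent_contiguous(sequences, min_support)
    maximal = {}
    for pattern, support in frequent.items():
        # Checking patterns one symbol longer is sufficient: if a longer
        # frequent pattern contains this one, so does one of its frequent
        # substrings of length len(pattern) + 1.
        extended = any(
            len(other) == len(pattern) + 1 and pattern in other
            for other in frequent
        )
        if not extended:
            maximal[pattern] = support
    return maximal


if __name__ == "__main__":
    # Toy data, invented for illustration; not from the paper.
    toy = ["ACGTA", "CACGT", "GGACG"]
    print(maximal_contiguous(toy, min_support=3))  # {'ACG': 3}
    print(maximal_contiguous(toy, min_support=2))  # {'ACGT': 2}
```

Note that the sketch needs no pattern length in advance: it grows the length until no frequent pattern remains. It enumerates every substring at every length, however, so it is only suitable for small inputs and illustrates the cost that an efficient method must avoid.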

oai:CiteSeerX.psu:10.1... Last updated on 10/30/2017

This paper was published in CiteSeerX.
