Novel algorithms for motif discovery in bio-sequence datasets

Authors: Sudha Balla
Publication date: 1 January 2007
Publisher: OpenCommons@UConn

Abstract

The volume of bio-molecular sequence data (DNA, RNA and protein sequences) has grown significantly over the past decade, calling for novel computational techniques to extract meaningful information from such data. Existing methods predominantly identify patterns, or motifs: for example, repeated substrings of bio-sequences, conserved substrings in a group of homologous protein sequences, or similar substrings in a set of DNA sequences. Identifying such motifs has applications in understanding gene function and human disease and in identifying potential therapeutic drug targets, to name a few. Several variants of the motif discovery problem can be found in the literature, and numerous algorithms have been proposed for them. In this research work, we propose novel algorithms, significantly different from the techniques adopted by existing algorithms, to address salient problems in molecular biology that require discovering motifs in a set of bio-sequences. The proposed algorithms employ basic sorting techniques and simple data structures such as arrays and linked lists, and have been shown to perform better in practice than many previously known algorithms when applied to synthetic and real biological datasets.
