Nucleotide Sequence Similarity Search Using Techniques from Content-Based Image Retrieval
The amount of DNA data continues to grow exponentially as a result of
high-throughput next-generation sequencing. Current state-of-the-art tools for
nucleotide sequence similarity search are not equipped to deal with this growth,
and new thinking is needed to tackle the rising scalability challenges.
This thesis investigates the experimental approach of translating DNA sequences
into images and applying state-of-the-art techniques from the field of
content-based image retrieval to index and search the resulting images. The challenges
of translating DNA sequences into images are discussed and two algorithms for
image generation are proposed. We look into the different feature descriptors that
are available and evaluate them in the context of the generated images. Lastly, the
approach as a whole is evaluated with the mean average precision metric, using
BLAST as the gold standard reference.
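As a rough illustration of the evaluation metric, the following minimal sketch computes mean average precision for ranked retrieval results against a reference set of relevant hits. The function names, the dictionary layout, and the assumption that the BLAST hits for each query form an unordered set of relevant sequence identifiers are all hypothetical; the thesis's actual evaluation setup is not described here.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list against a relevant set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item occurs.
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all queries."""
    if not rankings:
        return 0.0
    total = sum(
        average_precision(ranked, gold.get(query, set()))
        for query, ranked in rankings.items()
    )
    return total / len(rankings)


# Hypothetical data: sequence IDs returned by the image-based index (ranked)
# and the IDs assumed to be reported by BLAST for the same queries.
rankings = {"q1": ["a", "x", "b"], "q2": ["y", "c"]}
gold = {"q1": {"a", "b"}, "q2": {"c"}}
score = mean_average_precision(rankings, gold)
```

Under this definition, a system that ranks every reference hit ahead of all other results scores 1.0 for that query.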
The results show that the proposed approach does not approach BLAST in
retrieval performance, but offers a significant reduction in index size
and thus better performance and scalability on large DNA databases.