A two-base encoded DNA sequence alignment problem in computational biology

By Jen-Mei Chang, Chiaka Drakes, Erya Huang, Deidrey Langat, Joseph McGrath, Mark Morabito, Jose Pacheco, Nancy Rodriguez, Daniel Salazar, Rao Vemuri, Man Vu, Hem Wadhar and Qin Wu

Abstract

The recent introduction of instruments capable of producing millions of DNA sequence reads in a single run is rapidly changing the landscape of genetics. The primary objective of the "sequence alignment" problem is to search for a new algorithm that facilitates the use of two-base encoded data for large-scale re-sequencing projects. This algorithm should be able to perform local sequence alignment as well as error detection and correction in a reliable and systematic manner, enabling the direct comparison of encoded DNA sequence reads to a candidate reference DNA sequence.

We first briefly review two well-known sequence alignment approaches and provide a rudimentary improvement for implementation on parallel systems. We then carefully examine a unique sequencing technique that can be implemented, known as the SOLiD™ System, and follow with the results from the global and local sequence alignment.

In this report, the team presents an explanation of the algorithms for color space sequence data from the high-throughput re-sequencing technology and a theoretical parallel approach to the dynamic programming method for global and local alignment. The combination of the di-base approach and dynamic programming provides a possible viewpoint for large-scale re-sequencing projects. We anticipate that distributed computing will be the next-generation engine for large-scale problems such as this.
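The two ideas named in the abstract, di-base (color space) encoding and dynamic programming for local alignment, can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes the commonly described two-base color code, in which the bases are numbered A=0, C=1, G=2, T=3 and the color of an adjacent pair is the XOR of the two numbers. It also assumes a Smith-Waterman style recurrence with a linear gap penalty. The function names and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are hypothetical choices made for illustration only.

```python
# Illustrative sketch only: names and scoring parameters are assumptions,
# not taken from the report.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_colors(seq):
    """Return (first base, list of colors), one color per adjacent base pair."""
    codes = [BASE_CODE[b] for b in seq]
    return seq[0], [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from the first base and the color list."""
    code = BASE_CODE[first_base]
    bases = [first_base]
    for color in colors:
        code ^= color  # XOR with the color yields the next base code
        bases.append(CODE_BASE[code])
    return "".join(bases)


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


if __name__ == "__main__":
    first, reference_colors = encode_colors("ATGGCA")
    print(first, reference_colors)
    print(decode_colors(first, reference_colors))
    # Align a short color read directly against the reference colors.
    print(local_alignment_score([1, 0, 3], reference_colors))
```

Under this code, a single base substitution in the underlying sequence changes two adjacent colors, whereas an isolated measurement error changes only one. That difference is the property usually cited as the basis for error detection in color space.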

Topics: Medical and pharmaceutical
Year: 2009
OAI identifier: oai:generic.eprints.org:285/core70

