
The construction of DNA codes using a computer algebra system

By Niema Ali Aboluion


Coding theory has several applications in genetics and bioengineering. This thesis concentrates on one application from computational biology: the construction of new DNA codes, over an alphabet of four symbols, that satisfy certain combinatorial constraints. Interest in these codes arises because it is possible to synthesise short single strands of DNA, known as oligonucleotides, and the codes are useful in designing them. For example, such codes are used in DNA computing, as bar codes in molecular libraries and in microarray technologies. The computer algebra system Magma, which is well suited to coding-theory computations, is first applied to the construction of DNA codes satisfying a GC-content constraint and a minimum Hamming distance constraint. These constraints are imposed to avoid unwanted hybridizations and to ensure uniform melting temperatures. A further constraint, known as the reverse-complement constraint, is then added to guard against unwanted hybridizations; this constraint is studied using involutions in a permutation group. The codes constructed in this thesis are derived from linear codes over GF(4) and Z4 and from additive codes over GF(4). Previous approaches to constructing these codes are extended in several ways: longer codes are constructed, cyclic and extended cyclic codes are examined more comprehensively, and cosets of codes are considered. In addition, attention is paid to the mapping from field or ring elements to the DNA nucleotides, since different mappings can give different lower bounds. Further improvements are obtained by applying the techniques of shortening and puncturing to the table of best codes, and by searching the tables for codes that have an all-ones vector in their dual. The use of a database of best known linear codes is also considered. In many cases, codes are obtained that are larger than the best previously known codes.
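The three constraints named above can be made concrete with a small checker. This is a minimal sketch in Python, not Magma code and not the thesis's implementation. The function names are hypothetical, and it assumes the reverse-complement constraint takes the form commonly used in the DNA-code literature: the Hamming distance between x and the reverse complement of y is at least d for every pair of codewords x, y, including x = y. Reverse complementation is an involution (applying it twice returns the original word), which is why involutions in a permutation group are a natural tool for studying it.

```python
from itertools import combinations

# Watson-Crick complement of each nucleotide.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def gc_content(word: str) -> int:
    """Number of positions holding G or C."""
    return sum(1 for base in word if base in "GC")


def hamming(x: str, y: str) -> int:
    """Number of positions in which two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have equal length")
    return sum(1 for a, b in zip(x, y) if a != b)


def reverse_complement(word: str) -> str:
    """Complement each base and read the result backwards (an involution)."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def is_dna_code(code, d: int, w: int, rc: bool = True) -> bool:
    """Check constant GC-content w, minimum distance d and (optionally) the RC constraint."""
    words = list(code)
    # GC-content constraint: every codeword has exactly w G/C positions.
    if any(gc_content(x) != w for x in words):
        return False
    # Minimum Hamming distance between distinct codewords.
    if any(hamming(x, y) < d for x, y in combinations(words, 2)):
        return False
    # Assumed RC convention: d_H(x, rc(y)) >= d for all x, y, including x == y.
    if rc and any(hamming(x, reverse_complement(y)) < d for x in words for y in words):
        return False
    return True
```

For example, the code {AACC, CCAA} passes with d = 4 and w = 2, whereas any code containing ACGT fails the reverse-complement check, since ACGT is its own reverse complement.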
For codes of length greater than twenty, linear DNA codes had not been constructed previously, so all codes obtained are the best known results. Generator polynomials are given for the codes constructed, and coset leaders are also given where cosets of linear codes are used. The reader can therefore construct the codes without repeating the work presented in the thesis. In addition, files of codewords are available online for those constructed codes that are the best known codes and have fewer than 50,000 codewords.
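Rebuilding a code from a published generator polynomial and coset leader can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's Magma procedure: it assumes the code is the ideal generated by a single polynomial g(x) in Z4[x]/(x^n - 1), given as a coefficient list in ascending powers (the thesis's tables may use other generator forms). It also uses a hypothetical nucleotide mapping 0→A, 1→C, 2→G, 3→T, which is only one of the possible choices the abstract says can affect the bounds. Because any bijection from Z4 to {A, C, G, T} preserves Hamming distance, the constant-GC subset inherits the minimum distance of the parent code or coset.

```python
# Hypothetical mapping from Z4 to nucleotides; other bijections give other GC patterns.
Z4_TO_DNA = {0: "A", 1: "C", 2: "G", 3: "T"}


def cyclic_shifts(g, n):
    """The n products x^i * g(x) mod (x^n - 1), as Z4 coefficient vectors."""
    g = list(g) + [0] * (n - len(g))
    return [tuple(g[(j - i) % n] for j in range(n)) for i in range(n)]


def z4_span(generators, n):
    """All Z4-linear combinations of the generator vectors."""
    span = {tuple([0] * n)}
    for v in generators:
        span = {tuple((c[j] + a * v[j]) % 4 for j in range(n))
                for c in span for a in range(4)}
    return span


def cyclic_code_z4(g, n):
    """Ideal generated by g(x) in Z4[x]/(x^n - 1): the span of its cyclic shifts."""
    return z4_span(cyclic_shifts(g, n), n)


def coset(code, leader):
    """Translate every codeword by the coset leader (componentwise mod 4)."""
    return {tuple((c + l) % 4 for c, l in zip(word, leader)) for word in code}


def to_dna(word, mapping=Z4_TO_DNA):
    """Map a Z4 vector to a DNA string."""
    return "".join(mapping[s] for s in word)


def gc_content(dna):
    """Number of G or C positions in a DNA string."""
    return sum(1 for b in dna if b in "GC")


def constant_gc_subcode(words, w, mapping=Z4_TO_DNA):
    """Keep only the mapped words whose GC-content equals w (sorted for stable output)."""
    return sorted(dna for dna in (to_dna(x, mapping) for x in words)
                  if gc_content(dna) == w)
```

As a small check, g(x) = x - 1 (coefficients [3, 1]) with n = 3 generates the 16 words of Z4^3 whose coordinates sum to 0 mod 4, and the coset with leader (1, 0, 0) consists of the 16 words whose coordinates sum to 1 mod 4.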

Topics: Genetic engineering, Biological systems, 572.86
Publisher: University of Glamorgan
Year: 2012
Provided by: Glamorgan Dspace

