A number of methods have been proposed over the last decade for encoding
information using deoxyribonucleic acid (DNA), giving rise to the emerging area
of DNA data embedding. Since a DNA sequence is conceptually equivalent to a
sequence of quaternary symbols (bases), DNA data embedding (variously called
DNA watermarking or DNA steganography) can be seen as a digital communications
problem where channel errors are tantamount to mutations of DNA bases.
Depending on the use of coding or noncoding DNA hosts, which, respectively,
denote DNA segments that can or cannot be translated into proteins, DNA data
embedding is essentially a problem of communications with or without side
information at the encoder. In this paper the Shannon capacity of DNA data
embedding is obtained for the case in which DNA sequences are subject to
substitution mutations modelled using the Kimura model from molecular evolution
studies. Inferences are also drawn with respect to the biological implications
of some of the results presented.

Comment: 22 pages, 13 figures; preliminary versions of this work were
presented at the SPIE Media Forensics and Security XII conference (January
2010) and at the IEEE ICASSP conference (March 2010).
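
To make the channel view of DNA data embedding concrete, the following minimal sketch treats a single round of substitution mutations as a quaternary symmetric channel and evaluates its capacity in bits per base. It is an illustration under stated assumptions, not the derivation used in the paper: the parametrisation (`alpha` for the transition probability, `beta` for the probability of each of the two transversions), the base ordering, and all function names are hypothetical choices made here, and only the no-side-information (noncoding) case for one use of the channel is covered. Because every row of such a matrix is a permutation of every other row, and likewise for the columns, the capacity is log2(4) minus the entropy of any row.

```python
import math

# Assumed base ordering: A<->G and C<->T are transitions,
# every other substitution is a transversion.
BASES = "AGCT"


def kimura_matrix(alpha, beta):
    """Two-parameter (Kimura-style) substitution matrix for one channel use.

    alpha: probability of the transition (assumed parametrisation)
    beta:  probability of each of the two transversions (assumed)
    """
    if alpha < 0 or beta < 0 or alpha + 2 * beta > 1:
        raise ValueError("alpha and beta must define a valid distribution")
    s = 1.0 - alpha - 2.0 * beta  # probability that the base is unchanged
    return [
        [s, alpha, beta, beta],   # from A
        [alpha, s, beta, beta],   # from G
        [beta, beta, s, alpha],   # from C
        [beta, beta, alpha, s],   # from T
    ]


def entropy_bits(p):
    """Shannon entropy of a probability vector, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def symmetric_capacity_bits(matrix):
    """Capacity of a symmetric discrete memoryless channel, in bits per symbol.

    Valid only when all rows are permutations of one another and all
    columns are too, as is the case for kimura_matrix above.
    """
    return math.log2(len(matrix)) - entropy_bits(matrix[0])


if __name__ == "__main__":
    m = kimura_matrix(alpha=0.02, beta=0.005)
    print(f"capacity: {symmetric_capacity_bits(m):.4f} bits/base")
```

With no mutations the sketch gives the full 2 bits per base, and with a uniform row (all four outcomes equally likely) it gives zero, which are the two sanity checks one would expect from any quaternary symmetric channel.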