Recent advances in next-generation sequencing technologies have facilitated
the use of deoxyribonucleic acid (DNA) as a novel covert channels in
steganography. There are various methods that exist in other domains to detect
hidden messages in conventional covert channels. However, they have not been
applied to DNA steganography. The current most common detection approaches,
namely frequency analysis-based methods, often overlook important signals when
directly applied to DNA steganography because those methods depend on the
distribution of the number of sequence characters. To address this limitation,
we propose a general sequence learning-based DNA steganalysis framework. The
proposed approach learns the intrinsic distribution of coding and non-coding
sequences and detects hidden messages by exploiting distribution variations
after hiding these messages. Using deep recurrent neural networks (RNNs), our
framework identifies the distribution variations by using the classification
score to predict whether a sequence is to be a coding or non-coding sequence.
We compare our proposed method to various existing methods and biological
sequence analysis methods implemented on top of our framework. According to our
experimental results, our approach delivers a robust detection performance
compared to other tools