A new generalized growth threshold for dynamic SOM for comparing average mutual information and oligonucleotide frequency as a species signature

Abstract

Average mutual information (AMI), a measure from information theory, has been reported in the literature as a strong genome signature, and we have previously reported the use of oligonucleotide frequencies as a genome signature. In this work we improve the use of AMI as a training feature for Growing Self-Organising Maps (GSOM). Although the range of k is considered an important parameter of AMI, no standard range for k has been proposed. Our first contribution is a new growth threshold (GT) for GSOM, which we use to identify the best range of k for clustering 10 kb prokaryotic sequence fragments. We then compare the results obtained with the best k range of AMI against our previously published results obtained with oligonucleotide frequencies. These experiments show that the newly proposed GT equation enables GSOM to analyse different feature sets for the same data efficiently and effectively. The results also support our use of oligonucleotide frequencies rather than AMI.
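
To make the role of k concrete, the following is a minimal sketch of how an AMI profile over a range of k could be computed for one sequence fragment. It uses the standard mutual information between the bases at positions i and i + k. The function names `ami` and `ami_profile`, the choice to estimate the marginal base probabilities from the paired positions, and the k range in the usage note are illustrative assumptions, not the implementation used in this work.

```python
import math
from collections import Counter


def ami(seq, k):
    """Mutual information (in bits) between bases k positions apart.

    Assumption: marginal probabilities are estimated from the same
    paired positions as the joint probabilities.
    """
    seq = seq.upper()
    n_pairs = len(seq) - k
    if k < 1 or n_pairs < 1:
        raise ValueError("k must satisfy 1 <= k < len(seq)")

    left = seq[:-k]   # bases at positions i
    right = seq[k:]   # bases at positions i + k
    pair_counts = Counter(zip(left, right))
    left_counts = Counter(left)
    right_counts = Counter(right)

    total = 0.0
    for (x, y), count in pair_counts.items():
        p_xy = count / n_pairs
        p_x = left_counts[x] / n_pairs
        p_y = right_counts[y] / n_pairs
        total += p_xy * math.log2(p_xy / (p_x * p_y))
    return total


def ami_profile(seq, k_min, k_max):
    """Feature vector of AMI values for k = k_min .. k_max (inclusive)."""
    return [ami(seq, k) for k in range(k_min, k_max + 1)]
```

A call such as `ami_profile(fragment, 1, 50)` would yield one candidate feature vector per fragment; the range 1 to 50 here is only a placeholder, since identifying a suitable range of k is precisely what the work investigates.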
