
    Application of Neural Networks to Biological Data Mining for Automatic Species Identification

    Abstract. The paper designs a scheme for automatically identifying a species from its genome sequence. First, a set of 64 three-letter keywords is generated from the four bases A, T, C and G. These keywords are searched for in N randomly sampled subsequences of the genome, each of a fixed length (10,000 bases), and the frequency of each of the $4^3 = 64$ keywords is counted to obtain a DNA-descriptor for each sample. Principal component analysis is then applied to the DNA-descriptors of the N sampled instances, yielding a unique feature descriptor for identifying the species from its genome sequence. Because the variance of the descriptors within a given genome sequence is negligible, the proposed scheme has broad application in automatic species identification. Next, a computational map is trained with the Self-Organizing Feature Map algorithm, using DNA-descriptors from different species as the training inputs. The map is shown to provide an easier technique for recognizing and classifying a species from its genomic data.
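The descriptor-extraction step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: keywords are counted with an overlapping sliding window, each sample window starts at a uniformly random offset, any window containing a character other than A, T, C or G is skipped, and the names `dna_descriptor` and `sample_descriptors` are hypothetical. The toy genome is a random string, not real sequence data.

```python
import itertools
import random

import numpy as np

# All 4**3 = 64 three-letter keywords over the bases A, T, C, G.
BASES = "ATCG"
KEYWORDS = ["".join(p) for p in itertools.product(BASES, repeat=3)]
KEYWORD_INDEX = {kw: i for i, kw in enumerate(KEYWORDS)}


def dna_descriptor(sample: str) -> np.ndarray:
    """Return the 64-element keyword frequency vector of one sample.

    Assumption: overlapping sliding-window counting; any 3-letter
    window containing a character other than A, T, C, G is skipped.
    """
    counts = np.zeros(len(KEYWORDS), dtype=np.int64)
    for i in range(len(sample) - 2):
        idx = KEYWORD_INDEX.get(sample[i:i + 3])
        if idx is not None:
            counts[idx] += 1
    return counts


def sample_descriptors(genome: str, n: int, length: int = 10_000,
                       seed: int = 0) -> np.ndarray:
    """Hypothetical helper: (n, 64) descriptors from n random windows.

    Assumption: each window starts at a uniformly random offset.
    """
    if len(genome) < length:
        raise ValueError("genome is shorter than the sample length")
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return np.stack([dna_descriptor(genome[s:s + length]) for s in starts])


# Usage on a synthetic random "genome" (not real sequence data).
toy_genome = "".join(random.Random(42).choices(BASES, k=50_000))
D = sample_descriptors(toy_genome, n=20)
print(D.shape)  # (20, 64)
```

Each row of `D` is one DNA-descriptor; for a 10,000-base window made only of A, T, C and G, the 64 counts sum to 9,998 overlapping windows.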
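The principal component analysis step could look like the sketch below, which obtains the components from a singular value decomposition of the mean-centered N × 64 descriptor matrix using NumPy. The abstract does not say how the principal components are turned into a single species signature, so this function only returns the projected scores, the leading principal directions and their variances; the function name `pca` and the Poisson-distributed toy counts are assumptions for illustration.

```python
import numpy as np


def pca(descriptors: np.ndarray, k: int = 2):
    """Project an (N, 64) DNA-descriptor matrix onto its first k principal components.

    Returns (scores, components, variances):
      scores     : (N, k) coordinates of each sample in PC space
      components : (k, 64) unit-length principal directions
      variances  : (k,) variance of the data along each direction
    """
    X = np.asarray(descriptors, dtype=float)
    Xc = X - X.mean(axis=0)  # center each keyword count across the N samples
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    scores = Xc @ components.T
    variances = S[:k] ** 2 / (X.shape[0] - 1)
    return scores, components, variances


# Usage with synthetic Poisson counts standing in for real descriptors.
toy = np.random.default_rng(0).poisson(lam=150, size=(50, 64))
scores, comps, var = pca(toy, k=2)
```

Using the SVD of the centered data avoids forming the 64 × 64 covariance matrix explicitly; the variance along each direction is the squared singular value divided by N − 1.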
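A Self-Organizing Feature Map over DNA-descriptors might be trained roughly as follows. This is a generic Kohonen SOM sketch, not the paper's configuration: the 5 × 5 rectangular grid, Gaussian neighborhood, linearly decaying learning rate and radius, and the helper names `train_som` and `best_matching_unit` are all assumptions. Count-based descriptors would typically be normalized (for example to relative frequencies) before training; here two synthetic 64-dimensional clusters stand in for descriptors from two species.

```python
import numpy as np


def train_som(data, rows=5, cols=5, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular Kohonen Self-Organizing Feature Map (hypothetical helper).

    data: (N, d) array of normalized descriptors.
    Returns the node weight vectors, shape (rows, cols, d).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    weights = rng.random((rows, cols, d))
    # (rows, cols, 2) array holding each node's grid coordinates.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    total_steps, step = epochs * n, 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighborhood radius
            bmu = best_matching_unit(weights, x)
            # Gaussian neighborhood measured on the map grid, not in input space.
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull every node toward x, weighted by its neighborhood value.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - np.asarray(x, dtype=float), axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape))


# Usage: two synthetic, well-separated clusters standing in for two species.
rng = np.random.default_rng(1)
species_a = rng.normal(0.0, 0.05, size=(20, 64))
species_b = rng.normal(1.0, 0.05, size=(20, 64))
W = train_som(np.vstack([species_a, species_b]))
print(best_matching_unit(W, species_a.mean(axis=0)), best_matching_unit(W, species_b.mean(axis=0)))
```

Once the map is trained, one plausible way to classify a new sample is to compute its descriptor, find its best-matching unit, and read off the species whose training descriptors map to that region of the grid.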