Random Fragments Classification of Microbial Marker Clades with Multi-class SVM and N-Best Algorithm
Modeling microbial clades from microarray genome sequences is a challenging
problem in biology, especially for the discovery and categorization of gene
isolates from new species. Marker family genome sequences play an important
role in describing specific microbial clades within species. A support vector
machine (SVM) based framework for microbial species classification with the
N-best algorithm is constructed to classify centroid marker genome fragments
randomly generated from marker genome sequences in MetaRef. A time series feature extraction
method is proposed that segments the centroid gene sequences and maps the
segments into spaces of different dimensions. Two ways of splitting the data
are investigated: randomly splitting fragments along the genome sequence (DI),
or separating the genome sequences into two parts (DII). Two strategies for the
fragment recognition task, dimension-by-dimension and sequence-by-sequence, are
also investigated. The selection of k-mer size, the overlap of segmentation,
and the effect of the random split percentage are also discussed. Experiments on 12390 marker genome
sequences belonging to marker families of 17 species from MetaRef show that,
for both DI and DII, and under both dimension-by-dimension and
sequence-by-sequence recognition, the recognition accuracy exceeds 28% for the
top-1 candidate and 91% for the top-10 candidates on both the training and
testing sets overall.
Comment: 17 pages, 59 figures
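
The abstract mentions overlapping segmentation, k-mer size selection, and top-N (N-best) accuracy without giving details. The following is only a minimal Python sketch of those three ideas under stated assumptions; it is not the paper's implementation. The function names `segment`, `kmer_counts`, and `top_n_accuracy` are hypothetical, the k-mer count representation is an assumed stand-in for the paper's feature mapping, and the per-class scores are assumed to come from some multi-class classifier such as an SVM.

```python
from collections import Counter


def segment(seq, size, overlap):
    """Split seq into windows of length `size`; consecutive windows share
    `overlap` characters. Trailing characters that do not fill a whole
    window are dropped (an assumption of this sketch)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def kmer_counts(fragment, k):
    """Count every length-k substring of a fragment (assumed feature form)."""
    return Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))


def top_n_accuracy(scores, labels, n):
    """Fraction of samples whose true label is among the n best-scored classes.

    scores: one dict per sample mapping class name -> classifier score.
    labels: the true class name of each sample.
    """
    hits = 0
    for sample_scores, true_label in zip(scores, labels):
        ranked = sorted(sample_scores, key=sample_scores.get, reverse=True)
        hits += true_label in ranked[:n]
    return hits / len(labels)


if __name__ == "__main__":
    fragments = segment("ACGTACGT", size=4, overlap=2)
    print(fragments)
    print(kmer_counts(fragments[0], k=2))
    toy_scores = [
        {"a": 0.9, "b": 0.1, "c": 0.0},
        {"a": 0.5, "b": 0.3, "c": 0.2},
    ]
    print(top_n_accuracy(toy_scores, ["a", "c"], n=1))
```

With scores of this shape, top-1 accuracy counts only the best-scored class, while top-10 accuracy counts a hit whenever the true species appears anywhere among the ten best candidates, which is why the two reported figures can differ so widely.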