2 research outputs found
Delineation of Techniques to implement on the enhanced proposed model using data mining for protein sequence classification
In the post-genomic era, new high-throughput technologies generate huge
amounts of complex molecular data. Managing this biological data and
discovering new knowledge from it is challenging because the data are complex
and heterogeneous, and often noisy and incomplete. Data mining has been
applied successfully in the biological domain, yet discovering new knowledge
from biological data remains a major challenge for data mining techniques. The
novelty of the proposed model is its combined use of intelligent techniques to
classify protein sequences quickly and efficiently. Using FFT, a fuzzy
classifier, a string weighted algorithm, a gram encoding method, a neural
network model and a rough set classifier together in a single model, each at
the appropriate stage, can enhance the quality of the classification system.
Thus the primary challenge is to identify and classify large protein sequences
quickly and simply, yet intelligently, so as to reduce time complexity and
space complexity.
Comment: 8 pages, 1 figure
Accuracy of String Kernels for Protein Sequence Classification
Abstract. Determining protein sequence similarity is an important task for protein classification and homology detection. This is typically done with sequence alignment algorithms, yet fast and accurate alignment-free, kernel-based classifiers exist. Viewing sequences as a “bag of words”, we test a simple weighted string kernel, investigating the effects of k-mer length, sequence length and choice of weighting. We also extend the kernel to operate on the k-mer frequency representation of a sequence rather than the “bag of words” representation.
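As an illustration only, the "bag of words" view and a weighted k-mer kernel might be sketched as below. This is not the authors' kernel: the function names (`kmer_counts`, `kmer_frequencies`, `weighted_kernel`, `normalized_kernel`), the uniform default weighting and the cosine-style normalization are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (the "bag of words" view)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_frequencies(seq, k):
    """Relative k-mer frequencies, so sequence length matters less."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}


def weighted_kernel(x, y, weight=None):
    """Weighted dot product over the k-mers shared by x and y.

    `weight` maps a k-mer to a non-negative number; the uniform default
    is an assumption, not the weighting scheme tested in the paper.
    """
    if weight is None:
        weight = lambda kmer: 1.0
    return sum(weight(kmer) * v * y[kmer] for kmer, v in x.items() if kmer in y)


def normalized_kernel(x, y, weight=None):
    """Cosine-style normalization so that k(x, x) == 1 for non-empty x."""
    kxx = weighted_kernel(x, x, weight)
    kyy = weighted_kernel(y, y, weight)
    if not kxx or not kyy:
        return 0.0
    return weighted_kernel(x, y, weight) / sqrt(kxx * kyy)


# Usage: compare two short toy sequences with k = 2.
a = kmer_counts("ABAB", 2)   # counts: AB twice, BA once
b = kmer_counts("ABA", 2)    # counts: AB once, BA once
similarity = normalized_kernel(a, b)
```

Passing the output of `kmer_frequencies` instead of `kmer_counts` gives the frequency-based variant; a matrix of such kernel values could then be fed to any classifier that accepts a precomputed kernel.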