2,323 research outputs found
Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome
We evaluate a version of the recently-proposed classification system named
Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space
of sequences of generic objects. The ODSE system has been originally presented
as a classification system for patterns represented as labeled graphs. However,
since ODSE is founded on the dissimilarity space representation of the input
data, the classifier can be easily adapted to any input domain where it is
possible to define a meaningful dissimilarity measure. Here we demonstrate the
effectiveness of the ODSE classifier for sequences by considering an
application dealing with the recognition of the solubility degree of the
Escherichia coli proteome. Solubility, or analogously aggregation propensity,
is an important property of protein molecules, which is intimately related to
the mechanisms underlying the chemico-physical process of folding. Each protein
of our dataset is initially associated with a solubility degree and it is
represented as a sequence of symbols, denoting the 20 amino acid residues. The
herein obtained computational results, which we stress that have been achieved
with no context-dependent tuning of the ODSE system, confirm the validity and
generality of the ODSE-based approach for structured data classification.Comment: 10 pages, 49 reference
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
Efficient classification using parallel and scalable compressed model and Its application on intrusion detection
In order to achieve high efficiency of classification in intrusion detection,
a compressed model is proposed in this paper which combines horizontal
compression with vertical compression. OneR is utilized as horizontal
com-pression for attribute reduction, and affinity propagation is employed as
vertical compression to select small representative exemplars from large
training data. As to be able to computationally compress the larger volume of
training data with scalability, MapReduce based parallelization approach is
then implemented and evaluated for each step of the model compression process
abovementioned, on which common but efficient classification methods can be
directly used. Experimental application study on two publicly available
datasets of intrusion detection, KDD99 and CMDC2012, demonstrates that the
classification using the compressed model proposed can effectively speed up the
detection procedure at up to 184 times, most importantly at the cost of a
minimal accuracy difference with less than 1% on average
Classification of microarray gene expression cancer data by using artificial intelligence methods
Günümüzde bilgisayar teknolojilerinin gelişmesi ile birçok alanda yapılan çalışmaları etkilemiştir. Moleküler biyoloji ve bilgisayar teknolojilerinde meydana gelen gelişmeler biyoinformatik adlı bilimi ortaya çıkarmıştır. Biyoinformatik alanında meydana gelen hızlı gelişmeler, bu alanda çözülmeyi bekleyen birçok probleme çözüm olma yolunda büyük katkılar sağlamıştır. DNA mikroarray gen ekspresyonlarının sınıflandırılması da bu problemlerden birisidir. DNA mikroarray çalışmaları, biyoinformatik alanında kullanılan bir teknolojidir. DNA mikroarray veri analizi, kanser gibi genlerle alakalı hastalıkların teşhisinde çok etkin bir rol oynamaktadır. Hastalık türüne bağlı gen ifadeleri belirlenerek, herhangi bir bireyin hastalıklı gene sahip olup olmadığı büyük bir başarı oranı ile tespit edilebilir. Bireyin sağlıklı olup olmadığının tespiti için, mikroarray gen ekspresyonları üzerinde yüksek performanslı sınıflandırma tekniklerinin kullanılması büyük öneme sahiptir.
DNA mikroarray’lerini sınıflandırmak için birçok yöntem bulunmaktadır. Destek Vektör Makinaları, Naive Bayes, k-En yakın Komşu, Karar Ağaçları gibi birçok istatistiksel yöntemler yaygın olarak kullanlmaktadır. Fakat bu yöntemler tek başına kullanıldığında, mikroarray verilerini sınıflandırmada her zaman yüksek başarı oranları vermemektedir. Bu yüzden mikroarray verilerini sınıflandırmada yüksek başarı oranları elde etmek için yapay zekâ tabanlı yöntemlerin de kullanılması yapılan çalışmalarda görülmektedir.
Bu çalışmada, bu istatistiksel yöntemlere ek olarak yapay zekâ tabanlı ANFIS gibi bir yöntemi kullanarak daha yüksek başarı oranları elde etmek amaçlanmıştır. İstatistiksel sınıflandırma yöntemleri olarak K-En Yakın Komşuluk, Naive Bayes ve Destek Vektör Makineleri kullanılmıştır. Burada Göğüs ve Merkezi Sinir Sistemi kanseri olmak üzere iki farklı kanser veri seti üzerinde çalışmalar yapılmıştır.
Sonuçlardan elde edilen bilgilere göre, genel olarak yapay zekâ tabanlı ANFIS tekniğinin, istatistiksel yöntemlere göre daha başarılı olduğu tespit edilmiştir
An Advanced Conceptual Diagnostic Healthcare Framework for Diabetes and Cardiovascular Disorders
The data mining along with emerging computing techniques have astonishingly
influenced the healthcare industry. Researchers have used different Data Mining
and Internet of Things (IoT) for enrooting a programmed solution for diabetes
and heart patients. However, still, more advanced and united solution is needed
that can offer a therapeutic opinion to individual diabetic and cardio
patients. Therefore, here, a smart data mining and IoT (SMDIoT) based advanced
healthcare system for proficient diabetes and cardiovascular diseases have been
proposed. The hybridization of data mining and IoT with other emerging
computing techniques is supposed to give an effective and economical solution
to diabetes and cardio patients. SMDIoT hybridized the ideas of data mining,
Internet of Things, chatbots, contextual entity search (CES), bio-sensors,
semantic analysis and granular computing (GC). The bio-sensors of the proposed
system assist in getting the current and precise status of the concerned
patients so that in case of an emergency, the needful medical assistance can be
provided. The novelty lies in the hybrid framework and the adequate support of
chatbots, granular computing, context entity search and semantic analysis. The
practical implementation of this system is very challenging and costly.
However, it appears to be more operative and economical solution for diabetes
and cardio patients.Comment: 11 PAGE
CLASSIFICATION OF KIDNEY DISEASE USING GENETIC MODIFIED KNN AND ARTIFICIAL BEE COLONY ALGORITHM
The health care system is currently improving with the development of intelligent artificial systems in detecting diseases. Early detection of kidney disease is essential by recognizing symptoms to prevent more severe damages. This study introduces a classification system for kidney diseases using the Artificial Bee Colony (ABC) algorithm and genetically modified K-Nearest Neighbor (KNN). ABC algorithm is used as a feature selection to determine relevant symptoms used in influencing kidney disease and Genetic modified KNN used for classification. This research consists of 3 stages: pre-processing, feature selection, and classification. However, it focuses on the pre-processing stage of chronic kidney disease using 400 records with 24 attributes for the feature selection and classification. Kidney disease data is classified into two classes, namely chronic kidney disease and not chronic kidney disease. Furthermore, the performance of the proposed method is compared with other methods. The result showed that an accuracy of 98.27% was obtained by dividing the dataset into 280 training and 120 test data
- …