Efficient Decision Trees for Multi-class Support Vector Machines Using Entropy and Generalization Error Estimation
We propose new methods for Support Vector Machines (SVMs) using a tree
architecture for multi-class classification. In each node of the tree, we
select an appropriate binary classifier using entropy and generalization error
estimation, then group the examples into positive and negative classes based on
the selected classifier and train a new classifier for use in the
classification phase. The proposed methods have a time complexity between
O(log2 N) and O(N), where N is the number of classes. We compared the performance
of our proposed methods to the traditional techniques on the UCI machine
learning repository using 10-fold cross-validation. The experimental results
show that our proposed methods are useful for problems that need fast
classification time or that have a large number of classes, as the proposed
methods run much faster than the traditional techniques but still provide
comparable accuracy.
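The O(log2 N) to O(N) range quoted in the abstract above can be illustrated with a small sketch. It does not implement the authors' entropy-based classifier selection; it only counts how many binary decisions a tree of classifiers needs in the worst case. The helper names (`build_balanced_tree`, `build_chain_tree`, `max_depth`) are hypothetical and chosen for illustration.

```python
def build_balanced_tree(classes):
    """Recursively split the class list in half; a leaf is a single class label."""
    if len(classes) == 1:
        return classes[0]
    mid = len(classes) // 2
    return (build_balanced_tree(classes[:mid]), build_balanced_tree(classes[mid:]))


def build_chain_tree(classes):
    """Degenerate tree: each node separates one class from all remaining classes."""
    if len(classes) == 1:
        return classes[0]
    return (classes[0], build_chain_tree(classes[1:]))


def max_depth(tree):
    """Worst-case number of binary classifier evaluations needed to reach a leaf."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(max_depth(tree[0]), max_depth(tree[1]))


classes = list(range(8))  # N = 8 class labels (illustrative)
balanced_cost = max_depth(build_balanced_tree(classes))  # grows like log2(N)
chain_cost = max_depth(build_chain_tree(classes))        # grows like N - 1
print(balanced_cost, chain_cost)
```

With a balanced grouping of classes the worst case is logarithmic in N, while a one-class-at-a-time split degrades to linear, which matches the two ends of the stated range.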
Role of Class-specific Features in Various Classification Frameworks for Human Epithelial (HEp-2) Cell Images
Antinuclear antibody detection with human epithelial cells is a popular
approach for autoimmune disease diagnosis. Manual evaluation demands time,
effort and capital, and automation in screening can greatly aid the physicians
in these respects. In this work, we employ simple, efficient and visually more
interpretable class-specific features, which are defined based on the visual
characteristics of each class. We believe that defining features with a good
visual interpretation is indeed important in a scenario where such an
approach is used in an interactive CAD system for pathologists. Considering
that the problem consists of few classes, and given our rather simplistic feature
definitions, frameworks can be structured as hierarchies of various binary
classifiers. These variants include frameworks which have been explored earlier and
some which have not been explored for this task. We perform various experiments which
include traditional texture features and demonstrate the effectiveness of
class-specific features in various frameworks. We make insightful comparisons
between different types of classification frameworks given their salient aspects
and their pros and cons relative to each other. We also demonstrate an experiment with only
intermediate samples for testing. The proposed work yields encouraging results
with respect to the state-of-the-art and highlights the role of class-specific
features in different classification frameworks.
Enhancing Multi-Class Classification of Random Forest using Random Vector Functional Neural Network and Oblique Decision Surfaces
Both neural networks and decision trees are popular machine learning methods
and are widely used to solve problems from diverse domains. These two
classifiers are commonly used base classifiers in an ensemble framework. In
this paper, we first present a new variant of oblique decision tree based on a
linear classifier, then construct an ensemble classifier based on the fusion of
a fast neural network, the random vector functional link network, and oblique
decision trees. The Random Vector Functional Link Network has an elegant
closed-form solution with extremely short training time. The neural network partitions
each training bag (obtained using bagging) at the root level into C subsets
where C is the number of classes in the dataset and subsequently, C oblique
decision trees are trained on such partitions. The proposed method provides a
rich insight into the data by grouping the confusing or hard-to-classify
samples for each class and thus provides an opportunity to employ a fine-grained
classification rule over the data. The performance of the ensemble classifier
is evaluated on several multi-class datasets, where it demonstrates superior
performance compared to other state-of-the-art classifiers.
Comment: 8 pages, 5 figures
A Linear-complexity Multi-biometric Forensic Document Analysis System, by Fusing the Stylome and Signature Modalities
Forensic Document Analysis (FDA) addresses the problem of finding the
authorship of a given document. Identification of the document writer via a
number of its modalities (e.g. handwriting, signature, linguistic writing style
(i.e. stylome), etc.) has been studied in the FDA state-of-the-art. However, no
research has been conducted on the fusion of stylome and signature modalities. In
this paper, we propose such a bimodal FDA system (which has vast applications
in judicial, police-related, and historical documents analysis) with a focus on
time-complexity. The proposed bimodal system can be trained and tested with
linear time complexity. For this purpose, we first revisit Multinomial Naïve
Bayes (MNB), as the best state-of-the-art linear-complexity authorship
attribution system and, then, prove its superior accuracy to the well-known
linear-complexity classifiers in the state-of-the-art. Then, we propose a fuzzy
version of MNB to be fused with a well-known state-of-the-art
linear-complexity fuzzy signature recognition system. For the evaluation
purposes, we construct a chimeric dataset, composed of signatures and textual
contents of different letters. Despite its linear complexity, the proposed
multi-biometric system is proven to meaningfully improve on its state-of-the-art
unimodal counterparts with regard to the accuracy, F-Score, Detection Error
Trade-off (DET), Cumulative Match Characteristics (CMC), and Match Score
Histograms (MSH) evaluation metrics.
Building an Effective Intrusion Detection System using Unsupervised Feature Selection in Multi-objective Optimization Framework
Intrusion Detection Systems (IDS) are developed to protect the network by
detecting attacks. The current paper proposes an unsupervised feature
selection technique for analyzing the network data. The search capability of
the non-dominated sorting genetic algorithm (NSGA-II) has been employed for
optimizing three different objective functions utilizing different
information-theoretic measures, including mutual information, standard deviation, and
information gain, to identify a mutually exclusive and high-variance subset of
features. Finally, the Pareto optimal front of the different optimal feature
subsets is obtained, and these feature subsets are utilized for developing
classification systems using different popular machine learning models such as
support vector machines, decision trees and the k-nearest neighbour (k=5)
classifier. We have evaluated the results of the algorithm on the KDD-99,
NSL-KDD and Kyoto 2006+ datasets. The experimental results on the KDD-99 dataset
show that the decision tree provides better results than the other available
classifiers. The proposed system obtains the best results of 99.78% accuracy,
99.27% detection rate and false alarm rate of 0.2%, which are better than all
the previous results for the KDD dataset. We achieved an accuracy of 99.83% on 20%
testing data of the NSL-KDD dataset and 99.65% accuracy for 10-fold
cross-validation on the Kyoto dataset. The most attractive characteristic of the
proposed scheme is that during the selection of an appropriate feature subset, no
labeled information is utilized and different feature quality measures are
optimized simultaneously using the multi-objective optimization framework.
Comment: 3 figures
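The idea of a Pareto optimal front over candidate feature subsets, as mentioned in the abstract above, can be sketched as a plain non-dominated filter. This is not NSGA-II and not the authors' objective functions; the score tuples are made-up illustrative values, and `dominates` and `pareto_front` are hypothetical helper names. All objectives are assumed to be maximized.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) scores for five candidate feature subsets.
candidate_scores = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
front = pareto_front(candidate_scores)
print(front)
```

Each surviving tuple represents a trade-off that cannot be improved in one objective without losing in another, which is why several subsets, rather than a single best one, are passed on to the classifiers.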
A New Approach in Persian Handwritten Letters Recognition Using Error Correcting Output Coding
Classification Ensemble, which uses the weighted polling of outputs, is the
art of combining a set of basic classifiers for generating high-performance,
robust and more stable results. This study aims to improve the results of
identifying Persian handwritten letters using the Error Correcting Output
Coding (ECOC) ensemble method. Furthermore, feature selection is used to
reduce the costs of errors in our proposed method. ECOC is a method for
decomposing a multi-way classification problem into many binary classification
tasks and then combining the results of the subtasks into a hypothesized
solution to the original problem. First, the image features are extracted by
Principal Components Analysis (PCA). After that, ECOC is used to
identify the Persian handwritten letters, with a Support Vector
Machine (SVM) as the base classifier. The empirical results of applying this
ensemble method using 10 real-world data sets of Persian handwritten letters
indicate that this method has better results in identifying Persian
handwritten letters than other ensemble methods and also single
classifiers. Moreover, by testing a number of different features, this
paper found that we can reduce the additional cost in the feature selection stage
by using this method.
Comment: Journal of Advances in Computer Research
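The ECOC decomposition described in the abstract above can be sketched with a small code matrix and Hamming-distance decoding. This is a generic illustration, not the paper's setup: the four class names, the 7-bit codewords and the simulated binary outputs are assumptions, and no SVM or PCA step is included.

```python
# Each class gets a 7-bit codeword; each bit position corresponds to one binary classifier.
CODE_MATRIX = {
    "c1": (1, 1, 1, 1, 1, 1, 1),
    "c2": (0, 0, 0, 0, 1, 1, 1),
    "c3": (0, 0, 1, 1, 0, 0, 1),
    "c4": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions where two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(binary_outputs):
    """Return the class whose codeword is closest to the binary classifier outputs."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], binary_outputs))


# Simulated outputs of the seven binary classifiers, with the first bit wrong for class c3.
noisy_outputs = (1, 0, 1, 1, 0, 0, 1)
predicted = decode(noisy_outputs)
print(predicted)
```

Because every pair of codewords in this matrix differs in four positions, a single wrong binary classifier still leaves the correct class as the nearest codeword, which is the error-correcting property the method relies on.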
Automated detection and classification of cryptographic algorithms in binary programs through machine learning
Threats from the internet, particularly malicious software (i.e., malware),
often use cryptographic algorithms to disguise their actions and even to take
control of a victim's system (as in the case of ransomware). Malware and other
threats proliferate too quickly for the time-consuming traditional methods of
binary analysis to be effective. By automating detection and classification of
cryptographic algorithms, we can speed program analysis and more efficiently
combat malware.
This thesis will present several methods of leveraging machine learning to
automatically discover and classify cryptographic algorithms in compiled binary
programs.
While further work is necessary to fully evaluate these methods on real-world
binary programs, the results in this paper suggest that machine learning can be
used successfully to detect and identify cryptographic primitives in compiled
code. Currently, these techniques successfully detect and classify
cryptographic algorithms in small single-purpose programs, and further work is
proposed to apply them to real-world examples.
Comment: Thesis submitted in partial fulfillment of MSE CS degree at Johns
Hopkins University, 25 pages
Ontology-supported processing of clinical text using medical knowledge integration for multi-label classification of diagnosis coding
This paper discusses the knowledge integration of clinical information
extracted from a distributed medical ontology in order to improve a machine
learning-based multi-label coding assignment system. The proposed approach is
implemented using a decision tree based cascade hierarchical technique on the
university hospital data for patients with Coronary Heart Disease (CHD). The
preliminary results obtained are satisfactory.
Comment: IEEE Publication format, ISSN 1947 5500,
http://sites.google.com/site/ijcsis
Image Classification on IoT Edge Devices: Profiling and Modeling
With the advent of powerful, low-cost IoT systems, processing data closer to
where the data originates, known as edge computing, has become an increasingly
viable option. In addition to lowering the cost of networking infrastructures,
edge computing reduces edge-cloud delay, which is essential for
mission-critical applications. In this paper, we show the feasibility and study
the performance of image classification using IoT devices. Specifically, we
explore the relationships between various factors of image classification
algorithms that may affect energy consumption such as dataset size, image
resolution, algorithm type, algorithm phase, and device hardware. Our
experiments show a strong, positive linear relationship between three predictor
variables, namely model complexity, image resolution, and dataset size, and
energy consumption. In addition, in order to provide a means of
predicting the energy consumption of an edge device performing image
classification, we investigate the usage of three machine learning algorithms
using the data generated from our experiments. The performance as well as the
trade-offs of using linear regression, Gaussian process, and random forests
are discussed and validated. Our results indicate that the random forest model
outperforms the other two algorithms, with R-squared values of 0.95 and 0.79
for two different validation datasets.
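For readers unfamiliar with the R-squared metric quoted in the abstract above, a minimal sketch of its standard definition follows. The energy readings are made-up numbers for illustration only and are not taken from the paper; `r_squared` is a hypothetical helper name.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured energy values and two sets of predictions.
measured = [2.0, 4.0, 6.0, 8.0]
perfect_score = r_squared(measured, [2.0, 4.0, 6.0, 8.0])   # predictions match exactly
baseline_score = r_squared(measured, [5.0, 5.0, 5.0, 5.0])  # always predicts the mean
print(perfect_score, baseline_score)
```

A value near 1 means the model explains most of the variance in the measurements, while a value near 0 means it does no better than predicting the mean.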
Hierarchical Routing Mixture of Experts
In regression tasks the distribution of the data is often too complex to be
fitted by a single model. In contrast, partition-based models are developed
where data is divided and fitted by local models. These models partition the
input space and do not leverage the input-output dependency of
multimodal-distributed data, and strong local models are needed to make good
predictions. Addressing these problems, we propose a binary tree-structured
hierarchical routing mixture of experts (HRME) model that has classifiers as
non-leaf node experts and simple regression models as leaf node experts. The
classifier nodes jointly soft-partition the input-output space based on the
natural separateness of multimodal data. This enables simple leaf experts to be
effective for prediction. Further, we develop a probabilistic framework for the
HRME model, and propose a recursive Expectation-Maximization (EM) based
algorithm to learn both the tree structure and the expert models. Experiments
on a collection of regression tasks validate the effectiveness of our method
compared to a variety of other regression models.
Comment: 9 pages, 4 figures
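The soft routing described in the HRME abstract above can be sketched, under strong simplifying assumptions, as a binary tree whose internal nodes are logistic gates and whose leaves are linear regressors. This is not the authors' probabilistic model or their EM algorithm: the gates here depend only on a scalar input, all parameters are hand-picked, and the dictionary layout and the `predict` helper are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function used as a soft left/right gate."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(node, x):
    """Probability-weighted prediction: gates mix the outputs of the leaf experts."""
    if node["kind"] == "leaf":
        return node["w"] * x + node["b"]  # simple linear leaf expert
    p_left = sigmoid(node["w"] * x + node["b"])
    return p_left * predict(node["left"], x) + (1.0 - p_left) * predict(node["right"], x)


# Hand-picked parameters: the gate sends x < 0 mostly left and x > 0 mostly right.
tree = {
    "kind": "gate", "w": -10.0, "b": 0.0,
    "left": {"kind": "leaf", "w": 1.0, "b": 0.0},    # expert for the left region: y = x
    "right": {"kind": "leaf", "w": -1.0, "b": 2.0},  # expert for the right region: y = 2 - x
}
at_boundary = predict(tree, 0.0)
far_left = predict(tree, -5.0)
print(at_boundary, far_left)
```

At the gate boundary both experts contribute equally, while far from it the prediction is essentially that of a single simple expert, which is the intuition behind letting simple leaf models work on a softly partitioned space.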