A Max-relevance-min-divergence Criterion for Data Discretization with Applications on Naive Bayes
In many classification models, data is discretized to better estimate its
distribution. Existing discretization methods often aim to maximize the
discriminant power of the discretized data, while overlooking the fact that the
primary goal of discretization in classification is to improve generalization
performance. As a result, the data tend to be over-split into many small bins,
since undiscretized data retain the maximal discriminant information. We
therefore propose a Max-Dependency-Min-Divergence
(MDmD) criterion that maximizes both the discriminant information and
generalization ability of the discretized data. More specifically, the
Max-Dependency criterion maximizes the statistical dependency between the
discretized data and the classification variable while the Min-Divergence
criterion explicitly minimizes the JS-divergence between the training data and
the validation data for a given discretization scheme. The proposed MDmD
criterion is technically appealing, but it is difficult to reliably estimate
the high-order joint distributions of attributes and the classification
variable. We hence further propose a more practical solution,
Max-Relevance-Min-Divergence (MRmD) discretization scheme, where each attribute
is discretized separately, by simultaneously maximizing the discriminant
information and the generalization ability of the discretized data. The
proposed MRmD is compared with the state-of-the-art discretization algorithms
under the naive Bayes classification framework on 45 machine-learning benchmark
datasets. It significantly outperforms all the compared methods on most of the
datasets.
Comment: Under major revision of Pattern Recognition
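The per-attribute trade-off can be sketched in code. This is a minimal illustration under stated assumptions, not the paper's exact scheme: it uses equal-width candidate binnings and a hypothetical trade-off weight `lam`, scoring each candidate by the empirical mutual information between the binned attribute and the class, minus the JS-divergence between the train and validation bin distributions.

```python
# Sketch of the MRmD idea for one attribute: choose the bin count that
# maximizes relevance (mutual information with the class) minus divergence
# (JS-divergence between train and validation bin distributions).
# Equal-width bins, the candidate list, and `lam` are assumptions.
import numpy as np

def mutual_information(x_bins, y, n_bins, n_classes):
    """Empirical mutual information (nats) between binned attribute and class."""
    joint = np.zeros((n_bins, n_classes))
    for b, c in zip(x_bins, y):
        joint[b, c] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    def kl(a, b):
        nz = a > 0
        return float((a[nz] * np.log(a[nz] / b[nz])).sum())
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mrmd_select_bins(x_train, y_train, x_val, candidates=(2, 3, 5, 8, 13), lam=1.0):
    """Return the candidate bin count maximizing relevance minus divergence."""
    n_classes = int(y_train.max()) + 1
    best, best_score = None, -np.inf
    for k in candidates:
        edges = np.linspace(x_train.min(), x_train.max(), k + 1)
        # use inner edges so out-of-range values fall into the end bins
        tb = np.clip(np.digitize(x_train, edges[1:-1]), 0, k - 1)
        vb = np.clip(np.digitize(x_val, edges[1:-1]), 0, k - 1)
        p = np.bincount(tb, minlength=k) / len(tb)
        q = np.bincount(vb, minlength=k) / len(vb)
        score = mutual_information(tb, y_train, k, n_classes) - lam * js_divergence(p, q)
        if score > best_score:
            best, best_score = k, score
    return best
```

A large `lam` favors coarse, stable binnings; `lam = 0` recovers the pure max-relevance behaviour the abstract argues over-splits the data.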
A Decision tree-based attribute weighting filter for naive Bayes
The naive Bayes classifier continues to be a popular learning algorithm for data mining applications due to its simplicity and linear run-time. Many enhancements to the basic algorithm have been proposed to help mitigate its primary weakness: the assumption that attributes are independent given the class. All of them improve the performance of naive Bayes at the expense (to a greater or lesser degree) of execution time and/or simplicity of the final model. In this paper we present a simple filter method for setting attribute weights for use with naive Bayes. Experimental results show that naive Bayes with attribute weights rarely degrades the quality of the model compared to standard naive Bayes and, in many cases, improves it dramatically. The main advantages of this method compared to other approaches for improving naive Bayes are its
run-time complexity and the fact that it maintains the simplicity of the final model.
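One way such a decision-tree filter can be sketched, under the assumption that an attribute tested near the root of a tree matters more: weight each attribute by 1/sqrt(d), where d is the shallowest depth at which the attribute is tested, and give attributes the tree never uses weight 0. The tiny ID3-style tree below is included only to keep the sketch self-contained; it stands in for whatever tree learner the filter actually uses.

```python
# Sketch: derive naive Bayes attribute weights from the depths at which a
# greedy information-gain tree first tests each (categorical) attribute.
# Unused attributes get weight 0; depth-1 (root) attributes get weight 1.
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def build_min_depths(X, y, features, depth, min_depth):
    """Greedy ID3-style splitting, recording each feature's shallowest test depth."""
    if len(np.unique(y)) <= 1 or not features:
        return
    gains = []
    for f in features:
        vals, counts = np.unique(X[:, f], return_counts=True)
        cond = sum(c / len(y) * entropy(y[X[:, f] == v]) for v, c in zip(vals, counts))
        gains.append(entropy(y) - cond)
    if max(gains) <= 0:
        return
    best = features[int(np.argmax(gains))]
    min_depth[best] = min(min_depth.get(best, depth), depth)
    rest = [f for f in features if f != best]
    for v in np.unique(X[:, best]):
        mask = X[:, best] == v
        build_min_depths(X[mask], y[mask], rest, depth + 1, min_depth)

def tree_attribute_weights(X, y):
    """Weight attribute f by 1/sqrt(min test depth of f), 0 if never tested."""
    min_depth = {}
    build_min_depths(X, y, list(range(X.shape[1])), 1, min_depth)
    w = np.zeros(X.shape[1])
    for f, d in min_depth.items():
        w[f] = 1.0 / np.sqrt(d)
    return w
```

In weighted naive Bayes, the weights then scale the per-attribute log-likelihood terms: the class score becomes log P(c) + sum_i w_i * log P(x_i | c), so irrelevant attributes are softly discounted rather than dropped.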
Bayesian network classifiers for categorizing cortical GABAergic interneurons
An accepted classification of GABAergic interneurons of the cerebral cortex is a major goal in neuroscience. A recently proposed taxonomy based on patterns of axonal arborization promises to be a pragmatic method for achieving this goal. It involves characterizing interneurons according to five axonal arborization features, called F1–F5, and classifying them into a set of predefined types, most of which are established in the literature.
Unfortunately, there is little consensus among expert neuroscientists regarding the morphological definitions of
some of the proposed types. While supervised classifiers
were able to categorize the interneurons in accordance with
experts’ assignments, their accuracy was limited because
they were trained with disputed labels. Here, therefore, we automatically classify interneuron subsets at different label reliability thresholds, i.e., subsets in which every cell’s label is backed by at least a threshold number of experts.
We quantify the cells with parameters of axonal and dendritic morphologies and, in order to predict the type, also with axonal features F1–F4 provided by the experts. Using Bayesian network classifiers, we accurately characterize and classify the interneurons and identify useful predictor variables. In particular, we discriminate among reliable examples of common basket, horse-tail, large basket, and Martinotti cells with up to 89.52% accuracy, and single out the number of branches at 180 µm from the soma, the convex hull 2D area, and axonal features F1–F4 as especially useful predictors for distinguishing among these types.
These results open up new possibilities for an objective and
pragmatic classification of interneurons
Distribution of Mutual Information from Complete and Incomplete Data
Mutual information is widely used, in a descriptive way, to measure the
stochastic dependence of categorical random variables. In order to address
questions such as the reliability of the descriptive value, one must consider
sample-to-population inferential approaches. This paper deals with the
posterior distribution of mutual information, as obtained in a Bayesian
framework by a second-order Dirichlet prior distribution. The exact analytical
expression for the mean, and analytical approximations for the variance,
skewness and kurtosis are derived. These approximations have a guaranteed
accuracy level of the order O(1/n^3), where n is the sample size. Leading order
approximations for the mean and the variance are derived in the case of
incomplete samples. The derived analytical expressions allow the distribution
of mutual information to be approximated reliably and quickly. In fact, the
derived expressions can be computed with the same order of complexity needed
for descriptive mutual information. This makes the distribution of mutual
information become a concrete alternative to descriptive mutual information in
many applications which would benefit from moving to the inductive side. Some
of these prospective applications are discussed, and one of them, namely
feature selection, is shown to perform significantly better when inductive
mutual information is used.
Comment: 26 pages, LaTeX, 5 figures, 4 tables
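The leading-order behaviour described above can be sketched numerically. This is only the familiar first-order picture, not the paper's exact or higher-order expressions: the plug-in (descriptive) mutual information of a contingency table plus the O(1/n) correction (r-1)(s-1)/(2n), which appears as the leading term of the Bayesian posterior mean for an r-by-s table.

```python
# Sketch: descriptive mutual information from a contingency table of counts,
# and the first-order corrected value. The exact posterior mean and the
# variance/skewness/kurtosis approximations in the paper are more refined.
import numpy as np

def empirical_mi(counts):
    """Descriptive mutual information (in nats) from an r-by-s count table."""
    n = counts.sum()
    p = counts / n
    pi = p.sum(axis=1, keepdims=True)  # row marginals
    pj = p.sum(axis=0, keepdims=True)  # column marginals
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (pi @ pj)[nz])).sum())

def mi_first_order_mean(counts):
    """Empirical MI plus the leading O(1/n) correction (r-1)(s-1)/(2n)."""
    n = counts.sum()
    r, s = counts.shape
    return empirical_mi(counts) + (r - 1) * (s - 1) / (2 * n)
```

Both quantities cost the same O(rs) as descriptive mutual information, which is the point the abstract makes about the inductive version being a practical drop-in.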
Improvement of the Accuracy of Prediction Using Unsupervised Discretization Method: Educational Data Set Case Study
This paper presents a comparison of the efficacy of unsupervised and supervised discretization methods for educational data from a blended learning environment. A naïve Bayes classifier was trained on each discretized data set and a comparative analysis of the prediction models was conducted. The research goal was to transform numeric features into maximally independent discrete values with minimum loss of information and a reduction of classification error. The proposed unsupervised discretization method was based on the histogram distribution and the implementation of an oversampling technique. The main contribution of this research is the improvement of prediction accuracy using the unsupervised discretization method, which reduces the effect of ignoring the class feature for the educational data set
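The unsupervised, histogram-driven preprocessing step can be sketched as follows. This is a minimal illustration: bin edges come from the training data distribution alone (here numpy's data-driven 'auto' rule), ignoring the class feature entirely, and the paper's oversampling step is omitted.

```python
# Sketch: histogram-based unsupervised discretization of one numeric column,
# with edges estimated on training data and reused for test data.
import numpy as np

def histogram_discretize(train_col, test_col):
    """Bin a numeric column using histogram edges from the training data."""
    edges = np.histogram_bin_edges(train_col, bins="auto")
    # keep only inner edges so values outside the training range
    # fall into the first or last bin instead of a new one
    train_bins = np.digitize(train_col, edges[1:-1])
    test_bins = np.digitize(test_col, edges[1:-1])
    n_bins = len(edges) - 1
    return train_bins, test_bins, n_bins
```

Each discretized column can then feed a categorical naive Bayes model (with Laplace smoothing so empty bins do not zero out a class posterior).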
Occam's hammer: a link between randomized learning and multiple testing FDR control
We establish a generic theoretical tool to construct probabilistic bounds for
algorithms where the output is a subset of objects from an initial pool of
candidates (or more generally, a probability distribution on said pool). This
general device, dubbed "Occam's hammer'', acts as a meta layer when a
probabilistic bound is already known on the objects of the pool taken
individually, and aims at controlling the proportion of the objects in the set
output not satisfying their individual bound. In this regard, it can be seen as
a non-trivial generalization of the "union bound with a prior'' ("Occam's
razor''), a familiar tool in learning theory. We give applications of this
principle to randomized classifiers (providing an interesting alternative
approach to PAC-Bayes bounds) and multiple testing (where it allows one to exactly recover
and extend the so-called Benjamini-Yekutieli testing procedure).
Comment: 13 pages, conference communication type format
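For reference, the Benjamini-Yekutieli step-up procedure mentioned above can be stated in a few lines. This is the standard textbook procedure (FDR control at level q under arbitrary dependence between tests), not the paper's generalization of it.

```python
# Benjamini-Yekutieli step-up procedure: reject the k smallest p-values,
# where k is the largest index i with p_(i) <= i * q / (m * c(m)) and
# c(m) = 1 + 1/2 + ... + 1/m is the harmonic correction for dependence.
import numpy as np

def benjamini_yekutieli(pvals, q=0.05):
    """Return a boolean rejection mask controlling FDR at level q."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    c_m = np.sum(1.0 / np.arange(1, m + 1))  # harmonic correction factor
    order = np.argsort(p)
    thresholds = (np.arange(1, m + 1) * q) / (m * c_m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index meeting its threshold
        reject[order[: k + 1]] = True
    return reject
```

Dropping the c(m) factor recovers the Benjamini-Hochberg procedure, which is valid only under independence or positive dependence; c(m) is the price paid for arbitrary dependence.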
Multi-label and multimodal classifier for affective states recognition in virtual rehabilitation
Computational systems that process multiple affective states may benefit from explicitly considering the interaction between
the states to enhance their recognition performance. This work proposes the combination of a multi-label classifier, Circular Classifier
Chain (CCC), with a multimodal classifier, Fusion using a Semi-Naive Bayesian classifier (FSNBC), to explicitly include the
dependencies between multiple affective states during the automatic recognition process. This combination of classifiers is applied to a
virtual rehabilitation context of post-stroke patients. We collected data from post-stroke patients, which include finger pressure, hand
movements, and facial expressions during ten longitudinal sessions. Videos of the sessions were labelled by clinicians to recognize
four states: tiredness, anxiety, pain, and engagement. Each state was modelled by the FSNBC receiving the information of finger
pressure, hand movements, and facial expressions. The four FSNBCs were linked in the CCC to exploit the dependency relationships
between the states. CCC converged within at most 5 iterations for every patient. Results (ROC AUC) of CCC with
the FSNBC are above 0.940 ± 0.045 (mean ± standard deviation) for all four states. Relationships of mutual exclusion between engagement
and all the other states and co-occurrences between pain and anxiety were detected and discussed
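The circular chain mechanism can be sketched in code. This is a minimal illustration with assumed details: one binary classifier per state, each fed the base features plus the current predictions for the other states, iterated until the joint predictions stop changing. The tiny Bernoulli naive Bayes base learner below is a stand-in for the paper's semi-naive Bayesian fusion classifier (FSNBC), used only to keep the sketch self-contained.

```python
# Sketch of a Circular Classifier Chain: per-label classifiers whose inputs
# include the other labels, with test-time predictions iterated to a fixed
# point. The base learner is an illustrative Bernoulli naive Bayes.
import numpy as np

class BernoulliNB:
    """Minimal Bernoulli naive Bayes with Laplace smoothing (binary features)."""
    def fit(self, X, y):
        self.log_prior = np.log((np.bincount(y, minlength=2) + 1) / (len(y) + 2))
        self.theta = np.stack([
            (X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2) for c in (0, 1)
        ])
        return self

    def predict(self, X):
        log_lik = X @ np.log(self.theta).T + (1 - X) @ np.log(1 - self.theta).T
        return np.argmax(log_lik + self.log_prior, axis=1)

def circular_chain_predict(X_train, Y_train, X_test, max_iter=5):
    """Train one model per label; circulate predictions until they stabilize."""
    n_labels = Y_train.shape[1]
    models = []
    for j in range(n_labels):
        others = np.delete(Y_train, j, axis=1)  # true other-labels as features
        models.append(BernoulliNB().fit(np.hstack([X_train, others]), Y_train[:, j]))
    Y_pred = np.zeros((len(X_test), n_labels), dtype=int)
    for _ in range(max_iter):
        prev = Y_pred.copy()
        for j in range(n_labels):
            others = np.delete(Y_pred, j, axis=1)  # current predictions circulate
            Y_pred[:, j] = models[j].predict(np.hstack([X_test, others]))
        if np.array_equal(Y_pred, prev):  # chain has converged
            break
    return Y_pred
```

The fixed-point iteration is what lets mutual-exclusion and co-occurrence relationships (like those the abstract reports between engagement, pain, and anxiety) propagate between the per-state classifiers.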
Analysis of Intelligent Classifiers and Enhancing the Detection Accuracy for Intrusion Detection System
In this paper we discuss and analyze some of the intelligent classifiers
that allow automatic detection and classification of network attacks for
an intrusion detection system. We first analyze the classifiers using the
WEKA software on a well-known IDS (Intrusion Detection System) benchmark,
the NSL-KDD dataset. NSL-KDD is a refined version of the KDD Cup 99
dataset, which was derived from traffic collected in a simulated military
network by MIT Lincoln Labs. We then discuss and experiment with some of
the hybrid AI (Artificial Intelligence) classifiers that can be used for
IDS, and finally we developed Java software with the three most efficient
classifiers and compared it with other options. The outputs show the
detection accuracy and efficiency of the single and combined classifiers used
Incremental Decision Tree based on order statistics
New application domains generate data which are no longer persistent but volatile: network management, web profile modeling... These data arrive quickly and massively, and are visible just once. They therefore have to be learnt in their order of arrival. For classification problems, online decision trees are known to perform well and are widely used on streaming data. In this paper, we propose a new decision tree method based on order statistics. The construction of an online tree usually requires summaries in the leaves. Our solution uses bounded-error quantile summaries. A robust and efficient discretization or grouping method uses these summaries to provide, at the same time, a criterion for finding the best split and better density estimates. These estimates are then used to build a naïve Bayes classifier in the leaves to improve prediction in the early learning stage
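The per-leaf summary idea can be sketched in a much simplified form. The bounded-memory structure below is a plain reservoir sample standing in for a bounded-error quantile summary (such as Greenwald-Khanna); the principle is the same: each leaf keeps a fixed-size digest of the numeric values it has seen, and split candidates are read off the digest's empirical quantiles.

```python
# Sketch: bounded-memory per-leaf summary of a streaming numeric attribute,
# using reservoir sampling as a simple stand-in for a bounded-error
# quantile summary. Split candidates come from its empirical quantiles.
import random

class LeafSummary:
    """Fixed-size summary of a numeric attribute observed in a stream."""
    def __init__(self, capacity=100, seed=0):
        self.capacity = capacity
        self.values = []
        self.count = 0
        self.rng = random.Random(seed)

    def insert(self, x):
        """Reservoir sampling: keep each arriving value with prob capacity/count."""
        self.count += 1
        if len(self.values) < self.capacity:
            self.values.append(x)
        else:
            j = self.rng.randrange(self.count)
            if j < self.capacity:
                self.values[j] = x

    def quantile(self, q):
        """Approximate q-quantile of the stream from the bounded summary."""
        s = sorted(self.values)
        return s[min(int(q * len(s)), len(s) - 1)]
```

A leaf would propose, say, the deciles `summary.quantile(k / 10)` as candidate split points, and the same summary gives the per-bin density estimates the leaf-level naïve Bayes classifier needs; a true Greenwald-Khanna summary replaces the random sample with deterministic error guarantees.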