Information theoretic novelty detection
We present a novel approach to online change detection problems when the training sample size is small. The proposed approach is based on estimating the expected information content of a new data point and allows accurate control of the false positive rate even for small data sets. In the case of the Gaussian distribution, our approach is analytically tractable and closely related to classical statistical tests. We then propose an approximation scheme to extend our approach to mixtures of Gaussians. We extensively evaluate our approach on synthetic data and on three real benchmark data sets. The experimental validation shows that our method maintains good overall accuracy while significantly improving control over the false positive rate.
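To make the Gaussian case concrete, here is a minimal sketch of thresholding the information content (surprisal) of a new point under a fitted univariate Gaussian. It is not the method of the paper: it uses plug-in estimates of the mean and standard deviation and a standard normal critical value, so it ignores the estimation uncertainty that matters for small training sets, and its real false positive rate will tend to exceed the nominal `alpha`. The names `surprisal`, `fit_detector`, and `alpha` are illustrative assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def surprisal(x, mu, sigma):
    """Information content -log p(x), in nats, under N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)


def fit_detector(train, alpha=0.05):
    """Return a function that flags points whose surprisal is too high.

    Assumption: plug-in Gaussian fit with a two-sided normal critical value.
    Thresholding the surprisal is then equivalent to a z-test on x.
    """
    mu, sigma = mean(train), stdev(train)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    threshold = surprisal(mu + z * sigma, mu, sigma)
    return lambda x: surprisal(x, mu, sigma) > threshold


train = [9.8, 10.1, 10.0, 9.9, 10.3, 10.2, 9.7, 10.0]
is_novel = fit_detector(train, alpha=0.05)
```

Because the surprisal of a Gaussian is a monotone function of the squared standardized distance from the mean, the surprisal threshold and the classical z-score threshold select the same points, which illustrates the link to classical tests mentioned in the abstract.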
One-class classifiers based on entropic spanning graphs
One-class classifiers offer valuable tools to assess the presence of outliers
in data. In this paper, we propose a design methodology for one-class
classifiers based on entropic spanning graphs. Our approach can also process
non-numeric data by means of an embedding procedure. The spanning graph is
learned on the embedded input data and the resulting partition of vertices
defines the classifier. The final partition is
derived by exploiting a criterion based on mutual information minimization.
Here, we compute the mutual information by using a convenient formulation
provided in terms of the α-Jensen difference. Once training is
completed, in order to associate a confidence level with the classifier
decision, a graph-based fuzzy model is constructed. The fuzzification process
is based only on topological information of the vertices of the entropic
spanning graph. As such, the proposed one-class classifier is also suitable for
data characterized by complex geometric structures. We provide experiments on
well-known benchmarks containing both feature vectors and labeled graphs. In
addition, we apply the method to the protein solubility recognition problem by
considering several representations for the input samples. Experimental results
demonstrate the effectiveness and versatility of the proposed method with
respect to other state-of-the-art approaches.
Comment: Extended and revised version of the paper "One-Class Classification
Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN,
Vancouver, Canada.
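As a rough illustration of the entropic spanning graph idea, the sketch below computes the power-weighted length of a Euclidean minimum spanning tree with Prim's algorithm and turns it into a Rényi entropy estimate. It is not the classifier described above. The estimator follows the commonly cited form in which the order is alpha = (d - gamma) / d, and it omits an additive constant that does not depend on the data, so only differences between estimates are meaningful. The function names and the `gamma` parameter are assumptions made for this example.

```python
import math


def mst_length(points, gamma=1.0):
    """Sum of (edge length ** gamma) over a Euclidean minimum spanning tree."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Prim's algorithm: add the closest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def renyi_entropy_estimate(points, gamma=1.0):
    """Renyi entropy estimate of order (d - gamma) / d, up to an additive constant."""
    n, d = len(points), len(points[0])
    alpha = (d - gamma) / d  # requires 0 < gamma < d
    return math.log(mst_length(points, gamma) / n ** alpha) / (1 - alpha)


tight = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
wide = [(10 * x, 10 * y) for x, y in tight]
```

A more spread-out sample yields a longer spanning tree and therefore a larger entropy estimate, which is the property that graph-based criteria of this kind rely on.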
BINet: Multi-perspective Business Process Anomaly Classification
In this paper, we introduce BINet, a neural network architecture for
real-time multi-perspective anomaly detection in business process event logs.
BINet is designed to handle both the control flow and the data perspective of a
business process. Additionally, we propose a set of heuristics for setting the
threshold of an anomaly detection algorithm automatically. We demonstrate that
BINet can be used to detect anomalies in event logs not only on a case level
but also on event attribute level. Finally, we demonstrate that a simple set of
rules can be used to utilize the output of BINet for anomaly classification. We
compare BINet to eight other state-of-the-art anomaly detection algorithms and
evaluate their performance on an elaborate data corpus of 29 synthetic and 15
real-life event logs. BINet outperforms all other methods on both the synthetic
and the real-life datasets.
Beat histogram features for rhythm-based musical genre classification using multiple novelty functions
In this paper we present beat histogram features for multi-level rhythm description and evaluate them in a musical genre classification task. Audio features pertaining to various musical content categories and their related novelty functions are extracted as a basis for the creation of beat histograms. The proposed features capture not only amplitude, but also tonal and general spectral changes in the signal, aiming to represent as much rhythmic information as possible. The most and least informative features are identified through feature selection methods and are then tested using Support Vector Machines on five genre datasets, comparing classification accuracy against a baseline feature set. Results show that the presented features provide comparable classification accuracy with respect to other genre classification approaches using periodicity histograms and display a performance close to that of much more elaborate up-to-date approaches for rhythm description. The use of bar boundary annotations for the texture frames has provided an improvement for the dance-oriented Ballroom dataset. The comparably small number of descriptors and the possibility of evaluating the influence of specific signal components on the general rhythmic content encourage the further use of the method in rhythm description tasks.
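The step from a frame-wise audio feature to a beat histogram can be sketched as follows. This is a simplified illustration under stated assumptions, not the feature extraction used in the paper: the novelty function is taken to be the half-wave rectified first difference of the feature, and the histogram is taken to be the autocorrelation of that novelty function over a range of lags. The function names and the toy input are hypothetical.

```python
def novelty(feature):
    """Half-wave rectified first difference: keeps only increases in the feature."""
    return [max(b - a, 0.0) for a, b in zip(feature, feature[1:])]


def beat_histogram(nov, max_lag):
    """Autocorrelation of the mean-removed novelty function for lags 1..max_lag."""
    n = len(nov)
    m = sum(nov) / n
    centered = [x - m for x in nov]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag))
        for lag in range(1, max_lag + 1)
    ]


# Toy amplitude envelope with a pulse every 4 frames.
envelope = [1.0 if i % 4 == 0 else 0.0 for i in range(32)]
hist = beat_histogram(novelty(envelope), max_lag=8)
dominant_lag = hist.index(max(hist)) + 1  # lags start at 1
```

Applying the same two steps to tonal or spectral features instead of amplitude would give one histogram per feature, in the spirit of the multiple novelty functions described in the abstract.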