One-class classifiers based on entropic spanning graphs
One-class classifiers offer valuable tools to assess the presence of outliers
in data. In this paper, we propose a design methodology for one-class
classifiers based on entropic spanning graphs. Our approach also accommodates
non-numeric data by means of an embedding procedure. The spanning graph is
learned on the embedded input data, and the resulting partition of its vertices
defines the classifier. The final partition is
derived by exploiting a criterion based on mutual information minimization.
Here, we compute the mutual information by using a convenient formulation
provided in terms of the α-Jensen difference. Once training is
completed, in order to associate a confidence level with the classifier
decision, a graph-based fuzzy model is constructed. The fuzzification process
relies only on topological information about the vertices of the entropic
spanning graph. As such, the proposed one-class classifier is also suitable for
data characterized by complex geometric structures. We provide experiments on
well-known benchmarks containing both feature vectors and labeled graphs. In
addition, we apply the method to the protein solubility recognition problem by
considering several representations for the input samples. Experimental results
demonstrate the effectiveness and versatility of the proposed method with
respect to other state-of-the-art approaches.

Comment: Extended and revised version of the paper "One-Class Classification
Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN,
Vancouver, Canada
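As a concrete illustration of the entropy machinery behind the method: an entropic spanning graph estimates Rényi entropy from the total edge length of a graph such as a minimum spanning tree, and the α-Jensen difference combines such entropy estimates. Below is a minimal sketch of that building block, assuming the MST variant, 0 < α < 1, and a placeholder normalizing constant beta (in practice it depends on the dimension and is usually estimated by simulation); mst_renyi_entropy is an illustrative name, not the authors' code.

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree
    from scipy.spatial.distance import pdist, squareform

    def mst_renyi_entropy(X, alpha=0.5, beta=1.0):
        """Estimate the order-alpha Renyi entropy of the sample X (n x d)
        from the total edge length of its Euclidean minimum spanning tree."""
        n, d = X.shape
        gamma = d * (1.0 - alpha)           # edge-length exponent, in (0, d)
        W = squareform(pdist(X)) ** gamma   # pairwise distances raised to gamma
        L = minimum_spanning_tree(W).sum()  # total weighted MST length
        return (np.log(L / n ** alpha) - np.log(beta)) / (1.0 - alpha)

    X = np.random.randn(200, 2)
    print(mst_renyi_entropy(X))

The α-Jensen difference between two samples is then obtained by comparing the entropy of the pooled sample against the weighted sum of the two individual entropies.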
Kernel Mean Shrinkage Estimators
A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel
mean, is central to kernel methods in that it is used by many classical
algorithms such as kernel principal component analysis, and it also forms the
core inference step of modern kernel methods that rely on embedding probability
distributions in RKHSs. Given a finite sample, the empirical average has
commonly been used as the standard estimator of the true kernel mean. Despite
the widespread use of this estimator, we show that it can be improved thanks to the
well-known Stein phenomenon. We propose a new family of estimators called
kernel mean shrinkage estimators (KMSEs), which benefit from both theoretical
justifications and good empirical performance. The results demonstrate that the
proposed estimators outperform the standard one, especially in a "large d,
small n" paradigm.

Comment: 41 pages
A Posteriori Probability Estimation and Pattern Classification with Hadamard Transformed Neural Networks
Neural networks trained with the backpropagation algorithm have been applied to various classification problems. For linearly separable and nonseparable problems, they have been shown to approximate the a posteriori probability of an input vector X belonging to a specific class C. In order to achieve high accuracy, large training data sets have to be used. For a small number of input dimensions, the accuracy of estimation was inferior to estimates using Parzen density estimation. In this thesis, we propose two new techniques that drastically lower the mean square estimation error and achieve better classification. In the past, the desired output patterns used for training have been of a binary nature, using one for the class C the vector belongs to and zero for the other classes. This work shows that by training against the columns of a Hadamard matrix, and then taking the inverse Hadamard transform of the network output, we can obtain more accurate estimates. The second change proposed, in comparison with standard backpropagation networks, is the use of redundant output nodes. In standard backpropagation, the number of output nodes equals the number of different classes. In this thesis, it is shown that adding redundant output nodes enables us to decrease the mean square error at the output further, reaching better classification and lower mean square error rates than the Parzen density estimator. Comparisons are given between the statistical methods (Parzen density estimation and histogramming), the conventional neural network, and the Hadamard transformed neural network with redundant output nodes. Further, the effects of the proposed changes to the backpropagation algorithm on the convergence speed and the risk of getting stuck in a local minimum are studied.
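The Hadamard encoding is easy to state in code. Below is a minimal sketch of that encoding step alone (the network and training loop are omitted, and the order-8 code for four classes is an illustrative choice, yielding four redundant output nodes): each class is trained against a row of a Hadamard matrix instead of a one-hot vector, and the inverse transform of the network output recovers the posterior estimates.

    import numpy as np
    from scipy.linalg import hadamard

    n_classes = 4
    H = hadamard(8)            # symmetric +/-1 matrix with H @ H.T = 8 * I
    codes = H[:n_classes]      # one length-8 code word per class

    def encode(labels):
        """Replace one-hot training targets with Hadamard code words."""
        return codes[labels].astype(float)

    def decode(outputs):
        """Inverse Hadamard transform of the network outputs; the first
        n_classes coefficients estimate the class posteriors, since an
        MSE-trained network approximates sum_c P(c|x) * codes[c]."""
        return (outputs @ H.T / H.shape[0])[:, :n_classes]

    y = np.array([0, 2, 1])
    targets = encode(y)        # train the network against these targets
    print(decode(targets))     # exactly one-hot when outputs are exact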
Deep Divergence-Based Approach to Clustering
A promising direction in deep learning research consists in learning
representations and simultaneously discovering cluster structure in unlabeled
data by optimizing a discriminative loss function. As opposed to supervised
deep learning, this line of research is in its infancy, and how to design and
optimize suitable loss functions to train deep neural networks for clustering
is still an open question. Our contribution to this emerging field is a new
deep clustering network that leverages the discriminative power of
information-theoretic divergence measures, which have been shown to be
effective in traditional clustering. We propose a novel loss function that
incorporates geometric regularization constraints, thus avoiding degenerate
structures of the resulting clustering partition. Experiments on synthetic
benchmarks and real datasets show that the proposed network achieves
competitive performance with respect to other state-of-the-art methods, scales
well to large datasets, and does not require pre-training steps.
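To make the divergence idea concrete, here is a minimal numpy sketch (not the paper's implementation) of a Cauchy-Schwarz-style divergence between soft cluster assignments, computed through a kernel on the hidden representations; the network, the optimizer, and the geometric regularization terms are omitted, and all names are illustrative. In practice such a loss would be written in an autodiff framework and minimized end to end.

    import numpy as np

    def rbf_gram(Z, sigma=1.0):
        """Gaussian RBF Gram matrix of the embeddings Z (n x d)."""
        sq = np.sum(Z ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def cs_cluster_loss(K, A, eps=1e-9):
        """K: (n, n) kernel over embeddings; A: (n, k) soft assignments.
        Returns the mean Cauchy-Schwarz similarity over cluster pairs;
        minimizing it pushes the clusters apart in the kernel space."""
        M = A.T @ K @ A                          # (k, k) between-cluster kernel mass
        d = np.sqrt(np.diag(M)) + eps
        sim = M / np.outer(d, d)                 # normalized pairwise similarity
        iu = np.triu_indices(A.shape[1], k=1)
        return sim[iu].mean()

    Z = np.random.randn(100, 5)                  # stand-in hidden-layer embeddings
    A = np.random.dirichlet(np.ones(3), 100)     # stand-in softmax assignments
    print(cs_cluster_loss(rbf_gram(Z), A))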