Cross-Entropy Clustering
We construct a cross-entropy clustering (CEC) theory which finds the optimal
number of clusters by automatically removing groups which carry no information.
Moreover, our theory gives a simple and efficient criterion to verify cluster
validity.
Although CEC can be built on an arbitrary family of densities, in the most
important case of Gaussian CEC:
{\em -- the division into clusters is affine invariant;
-- the clustering has a tendency to divide the data into
ellipsoid-type shapes;
-- the approach is computationally efficient, as we can apply the Hartigan
approach.}
We also pay particular attention to clustering based on spherical Gaussian
densities and on Gaussian densities with covariance s·I. In
the latter case we show that as s converges to zero we obtain the
classical k-means clustering.
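The s → 0 limit can be illustrated numerically. The sketch below (helper name is ours, not the authors' implementation) evaluates the spherical-Gaussian CEC cost of a partition for a fixed variance s; for small s the within-cluster squared-error term dominates, so the k-means-style partition is the cheaper one.

```python
import numpy as np

def spherical_cec_cost(X, labels, k, s):
    """CEC cost of a partition under Gaussian densities with covariance s*I.

    Sketch only: per cluster the cost is p * (-ln p + (d/2) ln(2*pi*s))
    plus the within-cluster squared error scaled by 1/(2*s*n)."""
    n, d = X.shape
    cost = 0.0
    for i in range(k):
        pts = X[labels == i]
        if len(pts) == 0:
            continue  # empty clusters are "removed" and carry no cost
        p = len(pts) / n
        mu = pts.mean(axis=0)
        sse = ((pts - mu) ** 2).sum()
        cost += p * (-np.log(p) + 0.5 * d * np.log(2 * np.pi * s)) + sse / (2 * s * n)
    return cost

# Two 1-D blobs around 0 and 10; for small s the clean split is cheaper
# than any labeling that misassigns a point, exactly as in k-means.
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
good = np.array([0, 0, 0, 1, 1, 1])
bad = np.array([0, 0, 1, 1, 1, 1])  # one point misassigned
assert spherical_cec_cost(X, good, 2, 1e-3) < spherical_cec_cost(X, bad, 2, 1e-3)
```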
Uniform Cross-entropy Clustering
Robust mixture model approaches, which use non-normal distributions, have recently been upgraded to accommodate data with fixed bounds. In this article we propose a new method based on uniform distributions and Cross-Entropy Clustering (CEC). We combine a simple density model with a clustering method which allows us to treat groups separately and estimate parameters in each cluster individually. Consequently, we obtain an effective clustering algorithm which deals with non-normal data.
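The idea can be sketched numerically: model each cluster by a uniform density on its axis-aligned bounding box, so a cluster's cross-entropy cost is p(-ln p + ln Vol). This is a simplified sketch of the principle, not the paper's exact estimator.

```python
import numpy as np

def uniform_cec_cost(X, labels, k):
    """Cross-entropy cost of a partition under per-cluster uniform densities
    on axis-aligned bounding boxes (a simplified sketch of the idea)."""
    n, _ = X.shape
    cost = 0.0
    for i in range(k):
        pts = X[labels == i]
        if len(pts) == 0:
            continue  # empty clusters carry no cost
        p = len(pts) / n
        vol = np.prod(pts.max(axis=0) - pts.min(axis=0))
        cost += p * (-np.log(p) + np.log(vol))
    return cost

rng = np.random.default_rng(0)
A = rng.uniform(0.0, 1.0, (50, 2))    # cluster inside the unit square
B = rng.uniform(10.0, 11.0, (50, 2))  # a second, well-separated square
X = np.vstack([A, B])
split = np.repeat([0, 1], 50)
merged = np.zeros(100, dtype=int)
# Splitting into the two natural boxes beats one big bounding box.
assert uniform_cec_cost(X, split, 2) < uniform_cec_cost(X, merged, 1)
```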
Cross-entropy based image thresholding
This paper presents a novel global thresholding algorithm for the binarization of documents and gray-scale images using Cross-Entropy Clustering. In the first step, a gray-level histogram is constructed and Gaussian densities are fitted. The thresholds are then determined as the crossing points of the Gaussian densities. This approach automatically detects the number of components (only an upper limit on the number of Gaussian densities is required).
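Once the weights, means and variances of two Gaussian components have been fitted to the histogram, the threshold between them can be found analytically: equating the two weighted log-densities gives a quadratic in the gray level. A sketch (the function name is ours):

```python
import numpy as np

def gaussian_crossings(w1, m1, s1, w2, m2, s2):
    """Gray levels where w1*N(m1, s1^2) and w2*N(m2, s2^2) intersect.

    Equating the weighted log-densities yields a quadratic
    a*x^2 + b*x + c = 0 (it degenerates to linear when s1 == s2)."""
    a = 1.0 / (2 * s2**2) - 1.0 / (2 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = m2**2 / (2 * s2**2) - m1**2 / (2 * s1**2) + np.log((w1 * s2) / (w2 * s1))
    return np.roots([a, b, c])  # np.roots drops a leading zero coefficient

# Equal weights and variances: the threshold is simply the midpoint.
t = gaussian_crossings(0.5, 50.0, 10.0, 0.5, 200.0, 10.0)
assert np.isclose(t[0], 125.0)
```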
Subspace memory clustering
We present a new subspace clustering method called SuMC (Subspace Memory Clustering), which efficiently divides a dataset D ⊂ R^N into k ∈ N pairwise disjoint clusters of possibly different dimensions. Since our approach is based on memory compression, we do not need to explicitly specify the dimensions of the groups: in fact, we only need to specify the mean number of scalars used to describe a data point. In the case of one cluster, our method reduces to the classical Karhunen-Loève (PCA) transform. We test our method on typical data from the UCI repository and on data coming from real-life experiments.
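The memory-compression view can be sketched per cluster: describing a cluster with a d-dimensional subspace costs d scalars per point, and the price of that compression is the PCA reconstruction error, i.e. the energy in the discarded principal directions. A minimal sketch (the helper name is ours):

```python
import numpy as np

def compression_error(pts, dims):
    """Total squared reconstruction error when a cluster is compressed
    to its top-`dims` principal directions (per-cluster PCA via SVD)."""
    centered = pts - pts.mean(axis=0)
    svals = np.linalg.svd(centered, compute_uv=False)
    return float((svals[dims:] ** 2).sum())  # energy in discarded directions

# Points lying exactly on a 2-D plane embedded in R^3: two scalars per
# point describe them losslessly, one scalar per point loses energy.
rng = np.random.default_rng(0)
coeffs = rng.normal(size=(100, 2))
basis = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
pts = coeffs @ basis
assert compression_error(pts, 2) < 1e-9
assert compression_error(pts, 1) > 1.0
```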
SVM with a neutral class
In many real binary classification problems, in addition to the presence of positive and negative classes, we are also given the examples of a third, neutral class, i.e., examples with an uncertain or intermediate state between positive and negative. Although it is a common practice to ignore the neutral class in a learning process, its appropriate use can lead to an improvement in classification accuracy. In this paper, to include neutral examples in the training stage, we adapt two variants of Tri-Class SVM (proposed by Angulo et al. in Neural Process Lett 23(1):89–101, 2006), a method designed to solve three-class problems with the use of a single learning model. In analogy to classical SVM, we look for a hyperplane which maximizes the margin between positive and negative instances and which is localized as close to the neutral class as possible. In addition to the original paper by Angulo et al., we give a new interpretation of the model and show that it can be easily implemented in the primal. Our experiments demonstrate that the considered methods obtain better results in binary classification problems than classical SVM and semi-supervised SVM.
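A toy primal sketch of the idea (our own simplification, not Angulo et al.'s exact formulation): keep the usual hinge loss on the positive/negative margin, and add an absolute-value term that pulls the hyperplane toward the neutral examples. Subgradient descent in a few lines:

```python
import numpy as np

rng = np.random.default_rng(0)
Xp = rng.normal(2.0, 0.3, (20, 1))   # positive class
Xn = rng.normal(-2.0, 0.3, (20, 1))  # negative class
X0 = rng.normal(0.0, 0.3, (20, 1))   # neutral class

w, b = np.zeros(1), 0.0
C, Cn, lr = 1.0, 0.5, 0.01
for _ in range(500):
    gw, gb = w.copy(), 0.0           # gradient of the 0.5*||w||^2 term
    for Xc, y in ((Xp, 1.0), (Xn, -1.0)):
        viol = y * (Xc @ w + b) < 1  # margin violators -> hinge subgradient
        gw += C * (-y) * Xc[viol].sum(axis=0)
        gb += C * (-y) * viol.sum()
    s = np.sign(X0 @ w + b)          # |f(x)| term pulls neutrals to the plane
    gw += Cn * (X0 * s[:, None]).sum(axis=0)
    gb += Cn * s.sum()
    w -= lr * gw
    b -= lr * gb

# The learned hyperplane still separates positives from negatives.
assert (Xp @ w + b > 0).all() and (Xn @ w + b < 0).all()
```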
Online updating of active function cross-entropy clustering
Gaussian mixture models have many applications in density estimation and data clustering. However, the model does not adapt well to curved and strongly nonlinear data, since many Gaussian components are typically needed to appropriately fit data that lie around a nonlinear manifold. To solve this problem, the active function cross-entropy clustering (afCEC) method was constructed. In this article, we present an online afCEC algorithm. Thanks to this modification, we obtain a method which is able to remove unnecessary clusters very quickly and, consequently, has lower computational complexity. Moreover, it reaches a better minimum (with a lower value of the cost function). The modification also makes it possible to process data streams.
Set Aggregation Network as a Trainable Pooling Layer
Global pooling, such as max- or sum-pooling, is one of the key ingredients in
deep neural networks used for processing images, texts, graphs and other types
of structured data. Based on the recent DeepSets architecture proposed by
Zaheer et al. (NIPS 2017), we introduce a Set Aggregation Network (SAN) as an
alternative global pooling layer. In contrast to typical pooling operators, SAN
allows a given set of features to be embedded into a vector representation of
arbitrary size. We show that by adjusting the size of the embedding, SAN is
capable of preserving the whole information from the input. In experiments, we
demonstrate that replacing the global pooling layer with SAN leads to an
improvement in classification accuracy. Moreover, it is less prone to
overfitting and can be used as a regularizer.
Comment: ICONIP 201
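The mechanism can be sketched in a few lines: each element of the (variable-size) set is passed through a shared affine map plus nonlinearity, and the results are summed, so the output dimension K is a free parameter independent of the set size. Weights below are random placeholders; a real SAN layer learns them by backpropagation.

```python
import numpy as np

def san_pool(X, W, b):
    """Aggregate a set of n feature vectors (n x d) into one K-dimensional
    embedding: sum over the set of a shared per-element map (sketch only)."""
    return np.maximum(X @ W + b, 0.0).sum(axis=0)  # ReLU, then sum over the set

rng = np.random.default_rng(0)
d, K = 4, 16                   # K (the embedding size) is chosen freely
W, b = rng.normal(size=(d, K)), rng.normal(size=K)
X = rng.normal(size=(7, d))    # a set of 7 feature vectors

out = san_pool(X, W, b)
# The pooling output has size K (not n) and is permutation invariant.
assert out.shape == (K,)
assert np.allclose(out, san_pool(X[rng.permutation(7)], W, b))
```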