1,193 research outputs found

Semi-supervised cross-entropy clustering with information bottleneck constraint

Author: Geiger Bernhard C.
Śmieja Marek
Publication venue: Elsevier BV
Publication date: 01/01/2017

In this paper, we propose a semi-supervised clustering method, CEC-IB, that models data with a set of Gaussian distributions and retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades off three conflicting goals: the accuracy with which the data set is modeled, the simplicity of the model, and the consistency of the clustering with the side information. Experiments demonstrate that CEC-IB performs comparably to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but is faster, more robust to noisy labels, automatically determines the optimal number of clusters, and performs well when not all classes are present in the side information. Moreover, in contrast to other semi-supervised models, it can be successfully applied to discovering natural subgroups if the partition-level side information is derived from the top levels of a hierarchical clustering.

arXiv.org e-Print Archive

Jagiellonian University Repository

Semi-supervised model-based clustering with controlled clusters leakage

Author: Struski Łukasz
Tabor Jacek
Śmieja Marek
Publication date: 01/01/2017

In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of the Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by a user-defined leakage level, which controls the maximal inconsistency between the initial categorization and the resulting clustering. Our method can be implemented as a module in practical expert systems to detect clusters that combine expert knowledge with the true distribution of the data. Moreover, it can be used to improve the results of less flexible clustering techniques, such as projection pursuit clustering. The paper presents an extensive theoretical analysis of the model and a fast algorithm for its optimization. Experimental results show that C3L finds a high-quality clustering model, which can be applied to discovering meaningful groups in partially classified data.

arXiv.org e-Print Archive

Jagiellonian University Repository

Online updating of active function cross-entropy clustering

Author: Byrski Krzysztof
Spurek Przemysław
Tabor Jacek
Publication venue: Springer Science and Business Media LLC
Publication date: 05/04/2018

Gaussian mixture models have many applications in density estimation and data clustering. However, the model does not adapt well to curved and strongly nonlinear data, since many Gaussian components are typically needed to fit data that lie around a nonlinear manifold. To solve this problem, the active function cross-entropy clustering (afCEC) method was constructed. In this article, we present an online afCEC algorithm. This modification yields a method that removes unnecessary clusters very quickly and consequently has lower computational complexity. Moreover, it reaches a better minimum (a lower value of the cost function). The modification also allows data streams to be processed.

Crossref

Jagiellonian University Repository

Designing large quantum key distribution networks via medoid-based algorithms

Author: Garcia-Cobo I.
Menéndez H.
Publication venue: Elsevier Science
Publication date: 01/01/2021

The current development of quantum mechanics and its applications poses a threat to modern cryptography as it was conceived. The ability of quantum computers to solve complex mathematical problems, a strong computational novelty, is the root of that risk. However, quantum technologies can also counter this threat by leveraging quantum methods to distribute keys. This field, called Quantum Key Distribution (QKD), is growing, although it still needs more physical groundwork to become as widespread as the Internet. This work proposes a novel methodology that leverages medoid-based clustering techniques to design quantum key distribution networks on commercial fiber optics systems. Our methodology focuses on the current limitations of these communication systems, their error loss, and how trusted repeaters can enable proper communication with current technology. We adapt our model to current data on a wide territory covering an area of almost 100,000 km², and prove that, considering physical limitations of around 45 km with 3.1 error loss, our design can provide service to the whole area. This technique is the first to extend state-of-the-art network design, which focuses on up to 10 nodes, to networks with more than 200 nodes.

Middlesex University Research Repository

Soil nutrient maps of Sub-Saharan Africa: assessment of soil nutrient content at 250 m spatial resolution using machine learning

Author: Berkhout E.
Cooper M.
Fegraus E.
Hengl T.
Heuvelink Gerard B.M.
Kwabena N. A.
Leenaars Johan G.B.
Mamo T.
Shepherd Keith D.
Tilahun H.
Walsh M. G.
Wheeler I.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Spatial predictions of soil macro and micro-nutrient content across Sub-Saharan Africa at 250 m spatial resolution and for 0–30 cm depth interval are presented. Predictions were produced for 15 target nutrients: organic carbon (C) and total (organic) nitrogen (N), total phosphorus (P), and extractable—phosphorus (P), potassium (K), calcium (Ca), magnesium (Mg), sulfur (S), sodium (Na), iron (Fe), manganese (Mn), zinc (Zn), copper (Cu), aluminum (Al) and boron (B). Model training was performed using soil samples from ca. 59,000 locations (a compilation of soil samples from the AfSIS, EthioSIS, One Acre Fund, VitalSigns and legacy soil data) and an extensive stack of remote sensing covariates in addition to landform, lithologic and land cover maps. An ensemble model was then created for each nutrient from two machine learning algorithms

Wageningen University & Research Publications

CGSpace