Semi-supervised cross-entropy clustering with information bottleneck constraint
In this paper, we propose a semi-supervised clustering method, CEC-IB, that
models data with a set of Gaussian distributions and that retrieves clusters
based on a partial labeling provided by the user (partition-level side
information). By combining the ideas from cross-entropy clustering (CEC) with
those from the information bottleneck method (IB), our method trades off
three conflicting goals: the accuracy with which the data set is modeled, the
simplicity of the model, and the consistency of the clustering with side
information. Experiments demonstrate that CEC-IB has a performance comparable
to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but
is faster, more robust to noisy labels, automatically determines the optimal
number of clusters, and performs well when not all classes are present in the
side information. Moreover, in contrast to other semi-supervised models, it can
be successfully applied in discovering natural subgroups if the partition-level
side information is derived from the top levels of a hierarchical clustering.
Semi-supervised model-based clustering with controlled clusters leakage
In this paper, we focus on finding clusters in partially categorized data
sets. We propose a semi-supervised version of the Gaussian mixture model, called
C3L, which retrieves natural subgroups of given categories. In contrast to
other semi-supervised models, C3L is parametrized by a user-defined leakage
level, which controls the maximal inconsistency between the initial categorization
and the resulting clustering. Our method can be implemented as a module in practical
expert systems to detect clusters that combine expert knowledge with the true
distribution of the data. Moreover, it can be used to improve the results of
less flexible clustering techniques, such as projection pursuit clustering. The
paper presents an extensive theoretical analysis of the model and a fast algorithm
for its efficient optimization. Experimental results show that C3L finds a
high-quality clustering model, which can be applied to discover meaningful groups
in partially classified data.
Online updating of active function cross-entropy clustering
Gaussian mixture models have many applications in density estimation and data clustering. However, the model does not adapt well to curved and strongly nonlinear data, since many Gaussian components are typically needed to fit data that lie around a nonlinear manifold. To solve this problem, the active function cross-entropy clustering (afCEC) method was constructed. In this article, we present an online afCEC algorithm. With this modification, the method removes unnecessary clusters very quickly and consequently has lower computational complexity. Moreover, it reaches a better minimum (a lower value of the cost function). The modification also makes it possible to process data streams.
Designing large quantum key distribution networks via medoid-based algorithms
Current developments in quantum mechanics and its applications pose a threat to modern cryptography as originally conceived. The root of that risk is the ability of quantum computers to solve complex mathematical problems. However, quantum technologies can also counter this threat by using quantum methods to distribute keys. This field, called Quantum Key Distribution (QKD), is growing, although it still needs a stronger physical foundation before it becomes as widespread as the Internet. This work proposes a novel methodology that uses medoid-based clustering techniques to design quantum key distribution networks on commercial fiber optics systems. Our methodology focuses on the current limitations of these communication systems, their error loss, and how trusted repeaters can enable reliable communication with current technology. We apply our model to current data for a territory of almost 100,000 km² and show that, under physical limitations of around 45 km with 3.1 error loss, our design can provide service to the whole area. This technique is the first to extend the state of the art in network design, which has focused on networks of up to 10 nodes, to networks with more than 200 nodes.
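To make the medoid-based idea concrete, here is a minimal sketch of alternating k-medoids clustering in plain Python. It is not the authors' algorithm: the function names (`assign`, `k_medoids`, `max_link_length`), the use of planar Euclidean distance, and the deterministic initialisation are all assumptions made for illustration. A medoid is always one of the input nodes, which is what makes it a natural candidate site for a trusted repeater, and the longest node-to-medoid link can then be compared against a physical limit such as the roughly 45 km mentioned in the abstract.

```python
import math


def assign(points, medoids):
    """Assign every point index to its nearest medoid (Euclidean distance)."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids; assumes distinct points and k <= len(points).

    Initialisation with the first k points is an assumption chosen so that
    the example is deterministic.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep an empty cluster's medoid unchanged
                new_medoids.append(m)
                continue
            # New medoid: the member with the smallest total distance
            # to all other members of its cluster.
            best = min(
                members,
                key=lambda c: sum(math.dist(points[c], points[j]) for j in members),
            )
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids), assign(points, medoids)


def max_link_length(points, clusters):
    """Longest distance between any point and the medoid it is assigned to."""
    return max(
        math.dist(points[i], points[m])
        for m, members in clusters.items()
        for i in members
    )


# Hypothetical node coordinates (in km) forming two well-separated groups.
nodes = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
medoids, clusters = k_medoids(nodes, k=2)
feasible = max_link_length(nodes, clusters) <= 45.0  # 45 km limit from the abstract
```

In a design loop one could increase `k` until `feasible` becomes true, although whether the paper proceeds this way is not stated in the abstract.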
Soil nutrient maps of Sub-Saharan Africa: assessment of soil nutrient content at 250 m spatial resolution using machine learning
Spatial predictions of soil macro- and micronutrient content across Sub-Saharan Africa at
250 m spatial resolution and for the 0–30 cm depth interval are presented. Predictions were produced for
15 target nutrients: organic carbon (C) and total (organic) nitrogen (N), total phosphorus (P), and
extractable: phosphorus (P), potassium (K), calcium (Ca), magnesium (Mg), sulfur (S), sodium (Na), iron
(Fe), manganese (Mn), zinc (Zn), copper (Cu), aluminum (Al) and boron (B). Model training was
performed using soil samples from ca. 59,000 locations (a compilation of soil samples from the AfSIS,
EthioSIS, One Acre Fund, VitalSigns and legacy soil data) and an extensive stack of remote sensing
covariates in addition to landform, lithologic and land cover maps. An ensemble model was then created for
each nutrient from two machine learning algorithms
- …
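As a rough illustration of combining two models into one ensemble prediction, the sketch below takes a weighted average of two prediction lists. The abstract is truncated and does not say which algorithms were used or how they were combined, so the averaging rule, the function name `ensemble_predict`, and the parameter `weight_a` are all assumptions and not the authors' method.

```python
def ensemble_predict(pred_a, pred_b, weight_a=0.5):
    """Weighted average of two models' predictions for the same locations.

    weight_a is the weight given to the first model; the second model
    receives 1 - weight_a. Equal weighting is an illustrative default.
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical nutrient predictions from two models at two locations.
combined = ensemble_predict([1.0, 3.0], [3.0, 5.0])
```

In practice the weight might be chosen from each model's cross-validated error, but that choice is outside what the abstract states.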