Classification methods for Hilbert data based on surrogate density
Unsupervised and supervised classification approaches for Hilbert random
curves are studied. Both rest on the use of a surrogate of the probability
density which is defined, in a distribution-free mixture context, from an
asymptotic factorization of the small-ball probability. That surrogate density
is estimated by a kernel approach from the principal components of the data.
The focus is on the illustration of the classification algorithms and the
computational implications, with particular attention to the tuning of the
parameters involved. Some asymptotic results are sketched. Applications on
simulated and real datasets show how the proposed methods work.
Comment: 33 pages, 11 figures, 6 tables
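A minimal sketch of the supervised variant, assuming curves arrive as discretized vectors: the surrogate density is approximated by a kernel density estimate fitted to the first d principal component scores of each class, and a new curve is assigned to the class with the highest estimated density. All names, d, and the bandwidth below are illustrative choices, not the authors' code.
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity

def fit_surrogate_classifier(X, y, d=3, bandwidth=0.5):
    """X: (n_curves, n_grid) discretized curves; y: class labels."""
    y = np.asarray(y)
    pca = PCA(n_components=d).fit(X)
    scores = pca.transform(X)
    # one kernel density per class on the PC scores: the "surrogate density"
    densities = {c: KernelDensity(bandwidth=bandwidth).fit(scores[y == c])
                 for c in np.unique(y)}
    return pca, densities

def predict(pca, densities, X_new):
    s = pca.transform(X_new)
    classes = sorted(densities)
    # log-density of each new curve under each class surrogate density
    logp = np.column_stack([densities[c].score_samples(s) for c in classes])
    return np.array(classes)[np.argmax(logp, axis=1)]
```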
Structural Equation Modeling and simultaneous clustering through the Partial Least Squares algorithm
The identification of different homogeneous groups of observations and their
appropriate analysis in PLS-SEM has become a critical issue in many
application fields. Usually, both SEM and PLS-SEM assume the homogeneity of all
units on which the model is estimated, and the segmentation approaches in the
literature consist in estimating separate models for each segment of
statistical units, the segments having been obtained by assigning the units to
a priori defined groups. However, these approaches are not fully acceptable
because no causal structure among the variables is postulated. In other words,
a modeling approach should be used, where the obtained clusters are homogeneous
with respect to the structural causal relationships. In this paper, a new
methodology for simultaneous non-hierarchical clustering and PLS-SEM is
proposed. This methodology is motivated by the fact that the sequential
approach of first applying SEM or PLS-SEM and then a clustering algorithm
such as K-means to the latent scores may fail to find the
correct clustering structure present in the data. A simulation study and an
application on real data are included to evaluate the performance of the
proposed methodology.
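A toy illustration of the sequential strategy the paper argues against: estimate latent scores once on the pooled sample, then run K-means on those scores. A single PLS regression block stands in for a full PLS path model here, which is an assumption of this sketch; in real PLS-SEM the scores come from the estimated structural model.
```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                                 # manifest variables, exogenous block
Y = X @ rng.normal(size=(6, 2)) + rng.normal(size=(200, 2))   # endogenous block

pls = PLSRegression(n_components=2).fit(X, Y)
scores = pls.transform(X)                                     # latent variable scores
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
# The simultaneous approach instead re-estimates segment-specific path
# coefficients while updating the partition, so clusters end up homogeneous
# in the structural relationships rather than merely in the scores.
```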
Efficient Decentralized Visual Place Recognition From Full-Image Descriptors
In this paper, we discuss the adaptation of our decentralized place
recognition method described in [1] to full image descriptors. As we have shown,
the key to making decentralized visual place recognition scalable lies in
exploiting deterministic key assignment in a distributed key-value map. Through
this, it is possible to reduce bandwidth by up to a factor of n, the robot
count, by casting visual place recognition to a key-value lookup problem. In
[1], we exploited this for the bag-of-words method [3], [4]. Our method of
casting bag-of-words, however, results in a complex decentralized system, which
has inherently worse recall than its centralized counterpart. In this paper, we
instead start from the recent full-image description method NetVLAD [5]. As we
show, casting this to a key-value lookup problem can be achieved with k-means
clustering, and results in a much simpler system than [1]. The resulting system
still has some flaws, albeit of a completely different nature: it suffers when
the environment seen during deployment lies in a different distribution in
feature space than the environment seen during training.
Comment: 3 pages, 4 figures. This is a self-published paper that accompanies
our original work [1] as well as the ICRA 2017 Workshop on Multi-robot
Perception-Driven Control and Planning [2]
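A schematic sketch of the key-value casting under our assumptions, not the paper's implementation: full-image descriptors are quantized with k-means, and the cluster id deterministically names the robot responsible for that region of descriptor space, so storage and queries touch a single peer instead of all n robots.
```python
import numpy as np
from sklearn.cluster import KMeans

N_ROBOTS = 10
train_desc = np.random.rand(5000, 128)   # stand-in for NetVLAD descriptors
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(train_desc)

def responsible_robot(descriptor):
    """Deterministic key assignment: cluster id -> robot id."""
    cluster = kmeans.predict(descriptor.reshape(1, -1))[0]
    return cluster % N_ROBOTS

# Each robot sends a new descriptor only to responsible_robot(d), which
# matches it against the descriptors it already stores for that key --
# roughly a factor-n bandwidth saving over querying every robot.
```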
Kernel Spectral Clustering and applications
In this chapter we review the main literature related to kernel spectral
clustering (KSC), an approach to clustering cast within a kernel-based
optimization setting. KSC represents a least-squares support vector machine
based formulation of spectral clustering described by a weighted kernel PCA
objective. Just as in the classifier case, the binary clustering model is
expressed by a hyperplane in a high dimensional space induced by a kernel. In
addition, the multi-way clustering can be obtained by combining a set of binary
decision functions via an Error Correcting Output Codes (ECOC) encoding scheme.
Because of its model-based nature, the KSC method encompasses three main steps:
training, validation, testing. In the validation stage model selection is
performed to obtain tuning parameters, like the number of clusters present in
the data. This is a major advantage compared to classical spectral clustering
where the determination of the clustering parameters is unclear and relies on
heuristics. Once a KSC model is trained on a small subset of the entire data,
it is able to generalize well to unseen test points. Beyond the basic
formulation, sparse KSC algorithms based on the Incomplete Cholesky
Decomposition (ICD) and $L_0$, $L_1$, $L_0 + L_1$, and Group Lasso regularization are
reviewed. In that respect, we show how it is possible to handle large-scale
data. Also, two possible ways to perform hierarchical clustering and a soft
clustering method are presented. Finally, real-world applications such as image
segmentation, power load time-series clustering, document clustering and big
data learning are considered.
Comment: chapter contribution to the book "Unsupervised Learning Algorithms
- …
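A compact sketch of the KSC training step, assuming an RBF kernel and plain NumPy/SciPy; this is an illustrative reading of the weighted kernel PCA formulation described above, not the chapter's code. It solves the eigenvalue problem $D^{-1} M_D \Omega \alpha = \lambda \alpha$ and encodes points by the signs of their projections, which serve as the ECOC codewords.
```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eig

def ksc_train(X, k, sigma=1.0):
    n = X.shape[0]
    Omega = np.exp(-cdist(X, X, 'sqeuclidean') / (2 * sigma**2))  # RBF kernel matrix
    d_inv = 1.0 / Omega.sum(axis=1)                               # inverse degrees D^{-1}
    one = np.ones(n)
    # weighted centering matrix M_D = I - (1 1^T D^{-1}) / (1^T D^{-1} 1)
    Md = np.eye(n) - np.outer(one, d_inv) / d_inv.sum()
    vals, vecs = eig((d_inv[:, None] * Md) @ Omega)
    idx = np.argsort(-vals.real)[:k - 1]                          # top k-1 eigenvectors
    alpha = vecs[:, idx].real
    bias = -(d_inv @ (Omega @ alpha)) / d_inv.sum()
    codes = np.sign(Omega @ alpha + bias)                         # sign patterns = codewords
    return alpha, bias, codes
    # out-of-sample points are scored as sign(Omega_test @ alpha + bias)
    # and assigned to the cluster with the nearest codeword
```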