48,024 research outputs found
Dissimilarity-based Ensembles for Multiple Instance Learning
In multiple instance learning, objects are sets (bags) of feature vectors
(instances) rather than individual feature vectors. In this paper we address
the problem of how these bags can best be represented. Two standard approaches
are to use (dis)similarities between bags and prototype bags, or between bags
and prototype instances. The first approach results in a relatively
low-dimensional representation determined by the number of training bags, while
the second approach results in a relatively high-dimensional representation,
determined by the total number of instances in the training set. In this paper
a third, intermediate approach is proposed, which links the two approaches and
combines their strengths. Our classifier is inspired by a random subspace
ensemble, and considers subspaces of the dissimilarity space, defined by
subsets of instances, as prototypes. We provide guidelines for using such an
ensemble, and show state-of-the-art performances on a range of multiple
instance learning problems.Comment: Submitted to IEEE Transactions on Neural Networks and Learning
Systems, Special Issue on Learning in Non-(geo)metric Space
Dealing with non-metric dissimilarities in fuzzy central clustering algorithms
Clustering is the problem of grouping objects on the basis of a similarity measure among them. Relational clustering methods can be employed when a feature-based representation of the objects is not available, and their description is given in terms of pairwise (dis)similarities. This paper focuses on the relational duals of fuzzy central clustering algorithms, and their application in situations when patterns are represented by means of non-metric pairwise dissimilarities. Symmetrization and shift operations have been proposed to transform the dissimilarities among patterns from non-metric to metric. In this paper, we analyze how four popular fuzzy central clustering algorithms are affected by such transformations. The main contributions include the lack of invariance to shift operations, as well as the invariance to symmetrization. Moreover, we highlight the connections between relational duals of central clustering algorithms and central clustering algorithms in kernel-induced spaces. One among the presented algorithms has never been proposed for non-metric relational clustering, and turns out to be very robust to shift operations. (C) 2008 Elsevier Inc. All rights reserved
On aggregation operators of transitive similarity and dissimilarity relations
Similarity and dissimilarity are widely used concepts. One of the most studied matters is their combination or aggregation. However, transitivity property is often ignored when aggregating despite being a highly important property, studied by many authors but from different points of view. We collect here some results in preserving transitivity when aggregating, intending to clarify the relationship between aggregation and transitivity and making it useful to design aggregation operators that keep transitivity property. Some examples of the utility of the results are also shown.Peer ReviewedPostprint (published version
Ranking and significance of variable-length similarity-based time series motifs
The detection of very similar patterns in a time series, commonly called
motifs, has received continuous and increasing attention from diverse
scientific communities. In particular, recent approaches for discovering
similar motifs of different lengths have been proposed. In this work, we show
that such variable-length similarity-based motifs cannot be directly compared,
and hence ranked, by their normalized dissimilarities. Specifically, we find
that length-normalized motif dissimilarities still have intrinsic dependencies
on the motif length, and that lowest dissimilarities are particularly affected
by this dependency. Moreover, we find that such dependencies are generally
non-linear and change with the considered data set and dissimilarity measure.
Based on these findings, we propose a solution to rank those motifs and measure
their significance. This solution relies on a compact but accurate model of the
dissimilarity space, using a beta distribution with three parameters that
depend on the motif length in a non-linear way. We believe the incomparability
of variable-length dissimilarities could go beyond the field of time series,
and that similar modeling strategies as the one used here could be of help in a
more broad context.Comment: 20 pages, 10 figure
Further results on dissimilarity spaces for hyperspectral images RF-CBIR
Content-Based Image Retrieval (CBIR) systems are powerful search tools in
image databases that have been little applied to hyperspectral images.
Relevance feedback (RF) is an iterative process that uses machine learning
techniques and user's feedback to improve the CBIR systems performance. We
pursued to expand previous research in hyperspectral CBIR systems built on
dissimilarity functions defined either on spectral and spatial features
extracted by spectral unmixing techniques, or on dictionaries extracted by
dictionary-based compressors. These dissimilarity functions were not suitable
for direct application in common machine learning techniques. We propose to use
a RF general approach based on dissimilarity spaces which is more appropriate
for the application of machine learning algorithms to the hyperspectral
RF-CBIR. We validate the proposed RF method for hyperspectral CBIR systems over
a real hyperspectral dataset.Comment: In Pattern Recognition Letters (2013
Evidential relational clustering using medoids
In real clustering applications, proximity data, in which only pairwise
similarities or dissimilarities are known, is more general than object data, in
which each pattern is described explicitly by a list of attributes.
Medoid-based clustering algorithms, which assume the prototypes of classes are
objects, are of great value for partitioning relational data sets. In this
paper a new prototype-based clustering method, named Evidential C-Medoids
(ECMdd), which is an extension of Fuzzy C-Medoids (FCMdd) on the theoretical
framework of belief functions is proposed. In ECMdd, medoids are utilized as
the prototypes to represent the detected classes, including specific classes
and imprecise classes. Specific classes are for the data which are distinctly
far from the prototypes of other classes, while imprecise classes accept the
objects that may be close to the prototypes of more than one class. This soft
decision mechanism could make the clustering results more cautious and reduce
the misclassification rates. Experiments in synthetic and real data sets are
used to illustrate the performance of ECMdd. The results show that ECMdd could
capture well the uncertainty in the internal data structure. Moreover, it is
more robust to the initializations compared with FCMdd.Comment: in The 18th International Conference on Information Fusion, July
2015, Washington, DC, USA , Jul 2015, Washington, United State
Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm
The Ward error sum of squares hierarchical clustering method has been very
widely used since its first description by Ward in a 1963 publication. It has
also been generalized in various ways. However there are different
interpretations in the literature and there are different implementations of
the Ward agglomerative algorithm in commonly used software systems, including
differing expressions of the agglomerative criterion. Our survey work and case
studies will be useful for all those involved in developing software for data
analysis using Ward's hierarchical clustering method.Comment: 20 pages, 21 citations, 4 figure
- âŠ