4,720 research outputs found
Semi-supervised and Active Image Clustering with Pairwise Constraints from Humans
Clustering images has been an interesting problem for computer vision and machine learning researchers for many years. However as the number of categories increases, image clustering becomes extremely hard and is not possible to use for many practical applications. Researchers have proposed several methods that use semi-supervision from humans to improve clustering. Constrained clustering, where users indicate whether an image pair belong to the same category or not, is a well-known paradigm for semi-supervision. Past research has shown that pairwise constraints have the potential to significantly improve clustering performance.
There are two major components to constrained clustering research: how pairwise constraints can be used to improve clustering (e.g: constrained clustering algorithms, distance or metric learning methods) and determining which constraints are most useful for improving clustering (e.g.: active or interactive clustering methods). In this thesis we propose three different approaches to improve pairwise constrained clustering spanning both of these components. First, we propose a distance learning method in non-vector spaces, where the triangle inequality is used to propagate the pairwise constraints to the unsupervised image pairs. This approach can work with any pairwise distance and does not require any vector representation of images. Second, we propose an algorithm for active image pair selection. A novel method is developed to choose the most useful pairs to show a person, obtaining constraints that improve clustering. Third, we study how pairwise constraints can effectively be used to cluster large image datasets. Complete clustering of large datasets requires an extremely large number of pairwise constraints and may not be feasible in practice. We propose a new algorithm to cluster a subset of the images only (we call this subclustering), which will produce a few examples from each class. Subclustering will produce smaller but purer clusters and can be used for summarization, category discovery, browsing, image search, etc.... Finally, we make use of human input in an active subclustering algorithm to further improve results. We perform experiments on several real world datasets such as faces, leaves, videos and scenes and empirically show that our approaches can advance the state-of-the-art in clustering
Semi-supervised model-based clustering with controlled clusters leakage
In this paper, we focus on finding clusters in partially categorized data
sets. We propose a semi-supervised version of Gaussian mixture model, called
C3L, which retrieves natural subgroups of given categories. In contrast to
other semi-supervised models, C3L is parametrized by user-defined leakage
level, which controls maximal inconsistency between initial categorization and
resulting clustering. Our method can be implemented as a module in practical
expert systems to detect clusters, which combine expert knowledge with true
distribution of data. Moreover, it can be used for improving the results of
less flexible clustering techniques, such as projection pursuit clustering. The
paper presents extensive theoretical analysis of the model and fast algorithm
for its efficient optimization. Experimental results show that C3L finds high
quality clustering model, which can be applied in discovering meaningful groups
in partially classified data
Semi-Supervised Overlapping Community Finding based on Label Propagation with Pairwise Constraints
Algorithms for detecting communities in complex networks are generally
unsupervised, relying solely on the structure of the network. However, these
methods can often fail to uncover meaningful groupings that reflect the
underlying communities in the data, particularly when those structures are
highly overlapping. One way to improve the usefulness of these algorithms is by
incorporating additional background information, which can be used as a source
of constraints to direct the community detection process. In this work, we
explore the potential of semi-supervised strategies to improve algorithms for
finding overlapping communities in networks. Specifically, we propose a new
method, based on label propagation, for finding communities using a limited
number of pairwise constraints. Evaluations on synthetic and real-world
datasets demonstrate the potential of this approach for uncovering meaningful
community structures in cases where each node can potentially belong to more
than one community.Comment: Fix table
Multi-view constrained clustering with an incomplete mapping between views
Multi-view learning algorithms typically assume a complete bipartite mapping
between the different views in order to exchange information during the
learning process. However, many applications provide only a partial mapping
between the views, creating a challenge for current methods. To address this
problem, we propose a multi-view algorithm based on constrained clustering that
can operate with an incomplete mapping. Given a set of pairwise constraints in
each view, our approach propagates these constraints using a local similarity
measure to those instances that can be mapped to the other views, allowing the
propagated constraints to be transferred across views via the partial mapping.
It uses co-EM to iteratively estimate the propagation within each view based on
the current clustering model, transfer the constraints across views, and then
update the clustering model. By alternating the learning process between views,
this approach produces a unified clustering model that is consistent with all
views. We show that this approach significantly improves clustering performance
over several other methods for transferring constraints and allows multi-view
clustering to be reliably applied when given a limited mapping between the
views. Our evaluation reveals that the propagated constraints have high
precision with respect to the true clusters in the data, explaining their
benefit to clustering performance in both single- and multi-view learning
scenarios
Deep Learning vs Spectral Clustering into an active clustering with pairwise constraints propagation
International audienceIn our data driven world, categorization is of major importance to help end-users and decision makers understanding information structures. Supervised learning techniques rely on annotated samples that are often difficult to obtain and training often overfits. On the other hand, unsupervised clustering techniques study the structure of the data without disposing of any training data. Given the difficulty of the task, supervised learning often outperforms unsupervised learning. A compromise is to use a partial knowledge, selected in a smart way, in order to boost performance while minimizing learning costs, what is called semi-supervised learning. In such use case, Spectral Clustering proved to be an efficient method. Also, Deep Learning outperformed several state of the art classification approaches and it is interesting to test it in our context. In this paper, we firstly introduce the concept of Deep Learning into an active semi-supervised clustering process and compare it with Spectral Clustering. Secondly, we introduce constraint propagation and demonstrate how it maximizes partitioning quality while reducing annotation costs. Experimental validation is conducted on two different real datasets. Results show the potential of the clustering methods
- …