Search CORE

66,478 research outputs found

Recommended from our members

Learning with Partial Supervision for Clustering and Classification

Author: Pei Yuanli
Publication venue: 'Oregon State University'
Publication date
Field of study

In the field of machine learning, clustering and classification are two fundamental tasks. Traditionally, clustering is an unsupervised method, where no supervision about the data is available for learning; classification is a supervised task, where fully-labeled data are collected for training a classifier. In some scenarios, however, we may not have the full label but only partial supervision about the data, such as instance similarities or incomplete label assignments. In such cases, traditional clustering and classification methods do not directly apply. To address such problems, this thesis focuses on the task of learning from partial supervision for clustering and classification tasks. For clustering with partial supervision, we investigate three problems: a) constrained clustering in multi-instance multi-label learning, where the goal is to group instances into clusters that respect the background knowledge given by the bag-level labels; b) clustering with constraints, where the partial supervision is expressed as "pairwise constraints" or "relative constraints", regarding similarities about instance pairs and triplets respectively; c) active learning of pairwise constraints for clustering, where the goal is to improve the clustering with minimum human effort by iteratively querying the most informative pairs to an oracle. For classification with partial supervision, we address the problem of multi-label learning where data is associated with a latent label hierarchy and incomplete label assignments, and the goal is to simultaneously discover the latent hierarchy as well as to learn a multi-label classifier that is consistent with the hierarchy.Keywords: Classification, Partial Supervision, Active Learning, Clusterin

ScholarsArchive@OSU

Data Clustering and Partial Supervision with Some Parallel Developments

Author: A. Salem Sameh
Publication venue
Publication date
Field of study

Data Clustering and Partial Supell'ision with SOllie Parallel Developments by Sameh A. Salem Clustering is an important and irreplaceable step towards the search for structures in the data. Many different clustering algorithms have been proposed. Yet, the sources of variability in most clustering algorithms affect the reliability of their results. Moreover, the majority tend to be based on the knowledge of the number of clusters as one of the input parameters. Unfortunately, there are many scenarios, where this knowledge may not be available. In addition, clustering algorithms are very computationally intensive which leads to a major challenging problem in scaling up to large datasets. This thesis gives possible solutions for such problems. First, new measures - called clustering performance measures (CPMs) - for assessing the reliability of a clustering algorithm are introduced. These CPMs can be used to evaluate: I) clustering algorithms that have a structure bias to certain type of data distribution as well as those that have no such biases, 2) clustering algorithms that have initialisation dependency as well as the clustering algorithms that have a unique solution for a given set of parameter values with no initialisation dependency. Then, a novel clustering algorithm, which is a RAdius based Clustering ALgorithm (RACAL), is proposed. RACAL uses a distance based principle to map the distributions of the data assuming that clusters are determined by a distance parameter, without having to specify the number of clusters. Furthermore, RACAL is enhanced by a validity index to choose the best clustering result, i.e. result has compact clusters with wide cluster separations, for a given input parameter. Comparisons with other clustering algorithms indicate the applicability and reliability of the proposed clustering algorithm. Additionally, an adaptive partial supervision strategy is proposed for using in conjunction with RACAL_to make it act as a classifier. Results from RACAL with partial supervision, RACAL-PS, indicate its robustness in classification. Additionally, a parallel version of RACAL (P-RACAL) is proposed. The parallel evaluations of P-RACAL indicate that P-RACAL is scalable in terms of speedup and scaleup, which gives the ability to handle large datasets of high dimensions in a reasonable time. Next, a novel clustering algorithm, which achieves clustering without any control of cluster sizes, is introduced. This algorithm, which is called Nearest Neighbour Clustering, Algorithm (NNCA), uses the same concept as the K-Nearest Neighbour (KNN) classifier with the advantage that the algorithm needs no training set and it is completely unsupervised. Additionally, NNCA is augmented with a partial supervision strategy, NNCA-PS, to act as a classifier. Comparisons with other methods indicate the robustness of the proposed method in classification. Additionally, experiments on parallel environment indicate the suitability and scalability of the parallel NNCA, P-NNCA, in handling large datasets. Further investigations on more challenging data are carried out. In this context, microarray data is considered. In such data, the number of clusters is not clearly defined. This points directly towards the clustering algorithms that does not require the knowledge of the number of clusters. Therefore, the efficacy of one of these algorithms is examined. Finally, a novel integrated clustering performance measure (lCPM) is proposed to be used as a guideline for choosing the proper clustering algorithm that has the ability to extract useful biological information in a particular dataset. Supplied by The British Library - 'The world's knowledge' Supplied by The British Library - 'The world's knowledge

University of Liverpool Repository

Recommended from our members

Orderly Subspace Clustering

Author: Liang Y.
Linchuan X.
Suzuki A.
Tian Feng
Wang J.
Yamanishi K.
Publication venue
Publication date: 24/02/2019
Field of study

Semi-supervised representation-based subspace clustering is to partition data into their underlying subspaces by finding effective data representations with partial supervisions. Essentially, an effective and accurate representation should be able to uncover and preserve the true data structure. Meanwhile, a reliable and easy-to-obtain supervision is desirable for practical learning. To meet these two objectives, in this paper we make the first attempt towards utilizing the orderly relationship, such as the data a is closer to b than to c, as a novel supervision. We propose an orderly subspace clustering approach with a novel regularization term. OSC enforces the learned representations to simultaneously capture the intrinsic subspace structure and reveal orderly structure that is faithful to true data relationship. Experimental results with several benchmarks have demonstrated that aside from more accurate clustering against state-of-the-arts, OSC interprets orderly data structure which is beyond what current approaches can offer

Greenwich Academic Literature Archive

Bournemouth University Research Online

Association for the Advancement of Artificial Intelligence: AAAI Publications

Adaptive constrained clustering with application to dynamic image database categorization and visualization.

Author: Meredith Jason Daron, 1982-
Publication venue: ThinkIR: The University of Louisville\u27s Institutional Repository
Publication date: 01/08/2010
Field of study

The advent of larger storage spaces, affordable digital capturing devices, and an ever growing online community dedicated to sharing images has created a great need for efficient analysis methods. In fact, analyzing images for the purpose of automatic categorization and retrieval is quickly becoming an overwhelming task even for the casual user. Initially, systems designed for these applications relied on contextual information associated with images. However, it was realized that this approach does not scale to very large data sets and can be subjective. Then researchers proposed methods relying on the content of the images. This approach has also proved to be limited due to the semantic gap between the low-level representation of the image and the high-level user perception. In this dissertation, we introduce a novel clustering technique that is designed to combine multiple forms of information in order to overcome the disadvantages observed while using a single information domain. Our proposed approach, called Adaptive Constrained Clustering (ACC), is a robust, dynamic, and semi-supervised algorithm. It is based on minimizing a single objective function incorporating the abilities to: (i) use multiple feature subsets while learning cluster independent feature relevance weights; (ii) search for the optimal number of clusters; and (iii) incorporate partial supervision in the form of pairwise constraints. The content of the images is used to extract the features used in the clustering process. The context information is used in constructing a set of appropriate constraints. These constraints are used as partial supervision information to guide the clustering process. The ACC algorithm is dynamic in the sense that the number of categories are allowed to expand and contract depending on the distribution of the data and the available set of constraints. We show that the proposed ACC algorithm is able to partition a given data set into meaningful clusters using an adaptive, soft constraint satisfaction methodology for the purpose of automatically categorizing and summarizing an image database. We show that the ACC algorithm has the ability to incorporate various types of contextual information. This contextual information includes: spatial information provided by geo-referenced images that include GPS coordinates pinpointing their location, temporal information provided by each image\u27s time stamp indicating the capture time, and textual information provided by a set of keywords describing the semantics of the associated images

University of Louisville

Fast Gaussian Pairwise Constrained Spectral Clustering

Author: D.J. Klein
J. Shi
L. Hubert
M. Belkin
M. Saerens
S. Guattery
T. Bie De
U. Luxburg
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

International audienceWe consider the problem of spectral clustering with partial supervision in the form of must-link and cannot-link constraints. Such pairwise constraints are common in problems like coreference resolution in natural language processing. The approach developed in this paper is to learn a new representation space for the data together with a dis-tance in this new space. The representation space is obtained through a constraint-driven linear transformation of a spectral embedding of the data. Constraints are expressed with a Gaussian function that locally reweights the similarities in the projected space. A global, non-convex optimization objective is then derived and the model is learned via gradi-ent descent techniques. Our algorithm is evaluated on standard datasets and compared with state of the art algorithms, like [14,18,31]. Results on these datasets, as well on the CoNLL-2012 coreference resolution shared task dataset, show that our algorithm significantly outperforms related approaches and is also much more scalable

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server