
An Interactive Tool for Constrained Clustering with Human Sampling

Authors: Masayuki Okabe, Seiji Yamada
Publication venue
Publication date: 11/04/2020

Abstract: This paper describes an interactive tool for constrained clustering that helps users select effective constraints efficiently during the clustering process. The tool provides functions such as a 2-D visual arrangement of a data set and constraint assignment by mouse manipulation. It can also execute distance metric learning and k-medoids clustering. We give an overview of the tool and how it works, focusing on display arrangement by multi-dimensional scaling and on incremental distance metric learning. Finally, we report a preliminary experiment in which human heuristics found through our GUI improve the clustering. This study provides fundamental technologies for interactive clustering of Web pages and Web usage.

I. INTRODUCTION

Constrained clustering is a promising approach for improving clustering accuracy by using prior knowledge about the data. This prior knowledge generally takes the form of two simple types of constraint on a pair of data points. The first, called "must-link", is a pair that must be in the same cluster. The second, called "cannot-link", is a pair that must be in different clusters. Several approaches have been proposed to utilize these constraints. For example, a well-known constrained clustering algorithm, COP-Kmeans [1], uses these constraints as exception rules in the data allocation step of the k-means algorithm. A data point may not be allocated to the nearest cluster center if it forms a cannot-link with a member of that cluster, or a must-link with a member of another cluster. Other studies [2], [3],

Although the use of constraints is effective, preparing them poses some problems. One problem is the efficiency of the process. Because a human user generally needs to label many pairs as "must-link" or "cannot-link", the cognitive cost seems very high. Thus we need an interactive system to help users cut down this operation cost. The other problem is the effectiveness of the prepared constraints. Many experimental results in recent studies have shown that clustering performance does not improve monotonically (and sometimes deteriorates) as th
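
The allocation rule described above for COP-Kmeans [1] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's implementation or the reference algorithm: the function names `violates` and `assign_points`, the representation of constraints as pairs of point indices, and the fixed processing order are all hypothetical choices made here. The full algorithm also alternates this step with a center update, which is omitted.

```python
from math import dist


def violates(i, cluster, assignment, must_link, cannot_link):
    """Return True if placing point i in `cluster` breaks a constraint
    with a point that has already been assigned (hypothetical helper)."""
    for a, b in must_link:
        if i in (a, b):
            other = b if i == a else a
            # A must-link partner already sits in a different cluster.
            if other in assignment and assignment[other] != cluster:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            other = b if i == a else a
            # A cannot-link partner already sits in this cluster.
            if assignment.get(other) == cluster:
                return True
    return False


def assign_points(points, centers, must_link, cannot_link):
    """One constrained allocation pass: each point goes to the nearest
    center whose cluster does not violate any constraint."""
    assignment = {}
    for i, p in enumerate(points):
        # Try centers from nearest to farthest.
        order = sorted(range(len(centers)), key=lambda c: dist(p, centers[c]))
        for c in order:
            if not violates(i, c, assignment, must_link, cannot_link):
                assignment[i] = c
                break
        else:
            # No cluster satisfies the constraints for this point.
            raise ValueError(f"no feasible cluster for point {i}")
    return assignment
```

With points `[(0, 0), (0.1, 0), (5, 5), (0.2, 0.1)]` and centers `[(0, 0), (5, 5)]`, the last point would normally join the first cluster; adding the cannot-link pair `(0, 3)` pushes it to the second cluster instead, even though that center is farther away.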

CiteSeerX