Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback
Abstract: While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment. Without knowing the user's intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address the problem of clustering documents along the user-desired dimension, previous work has focused on learning a similarity metric from data manually annotated with the user's intention or having a human construct a feature space in an interactive manner during the clustering process. With the goal of reducing reliance on human knowledge for fine-tuning the similarity function or selecting the relevant features required by these approaches, we propose a novel active clustering algorithm, which allows a user to easily select the dimension along which she wants to cluster the documents by inspecting only a small number of words. We demonstrate the viability of our algorithm on a variety of commonly used sentiment datasets.
Feedback Clustering for Online Travel Agencies Searches: a Case Study
Understanding the choices made by online customers is a growing need in the
travel industry. In many practical situations, the only available information
is the flight search query performed by the customer with no additional profile
knowledge. In general, customer flight bookings are driven by prices, duration,
number of connections, and so on. However, not all customers might assign the
same importance to each of those criteria. Hence the need to group
together all flight searches performed by the same kind of customer, that is,
customers having the same booking criteria. The effectiveness of a set of
recommendations, for a single cluster, can be measured in terms of the number
of bookings historically performed. This effectiveness measure plays the role
of feedback, that is, external knowledge that can be used to
iteratively obtain a final segmentation. In this paper, we describe our Online
Travel Agencies (OTA) flight search use case and highlight its specific
features. We address the flight search segmentation problem motivated above by
proposing a novel algorithm called Split-or-Merge (S/M). This algorithm is a
variation of the Split-Merge-Evolve (SME) method. The SME method has already
been introduced in the community as an iterative process updating a clustering
given by the K-means algorithm by splitting and merging clusters subject to
feedback independent evaluations. To the best of our knowledge, no previous
application of the SME method to real-world data has been reported in the
literature. Here, we provide experimental evaluations of the SME and S/M
methods on real-world data. The impact on our domain-specific metrics obtained
under both methods suggests that feedback clustering techniques are promising
for handling OTA flight searches.
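
To make the idea of feedback-driven splitting and merging concrete, here is a minimal Python sketch. It is not the authors' S/M or SME algorithm, whose details the abstract does not give. It assumes a hypothetical `feedback` function that scores a cluster (higher is better), starts from a plain K-means partition, and accepts a split or a merge only when the size-weighted mean feedback improves. All function names, the toy feedback, and the sample points are illustrative assumptions.

```python
import random

def centroid(cluster):
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=25, seed=0):
    """Plain Lloyd iterations; returns only the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [list(points)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

def overall(clusters, feedback):
    """Size-weighted mean feedback over a whole segmentation."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) * feedback(c) for c in clusters) / total

def split_or_merge(points, feedback, k=2, rounds=10):
    clusters = kmeans(points, k)
    for _ in range(rounds):
        candidates = []
        # Split candidate: re-cluster the lowest-scoring cluster into two.
        worst = min(range(len(clusters)), key=lambda i: feedback(clusters[i]))
        if len(clusters[worst]) >= 2:
            halves = kmeans(clusters[worst], 2)
            if len(halves) == 2:
                candidates.append(clusters[:worst] + clusters[worst + 1:] + halves)
        # Merge candidate: fuse the two clusters whose centroids are closest.
        if len(clusters) >= 2:
            pairs = [(i, j) for i in range(len(clusters))
                     for j in range(i + 1, len(clusters))]
            i, j = min(pairs, key=lambda ij: sq_dist(centroid(clusters[ij[0]]),
                                                     centroid(clusters[ij[1]])))
            rest = [c for n, c in enumerate(clusters) if n not in (i, j)]
            candidates.append(rest + [clusters[i] + clusters[j]])
        if not candidates:
            break
        best = max(candidates, key=lambda seg: overall(seg, feedback))
        if overall(best, feedback) <= overall(clusters, feedback):
            break  # neither candidate improves the feedback, so stop
        clusters = best
    return clusters

def toy_feedback(cluster):
    """Hypothetical score: tight clusters are good, tiny clusters are penalised."""
    c = centroid(cluster)
    spread = sum(sq_dist(p, c) for p in cluster) / len(cluster)
    return -spread - 1.0 / len(cluster)

# Illustrative 2-D points standing in for flight-search feature vectors.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 4.9),
          (10.0, 0.0), (10.1, 0.2), (9.9, 0.1)]
segments = split_or_merge(points, toy_feedback)
```

In the use case described above, the feedback would instead come from an external measure such as historical bookings per cluster. The geometric score used here is only a stand-in so that the sketch runs without any data.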