Relational clustering models for knowledge discovery and recommender systems

By Tao Li

Abstract

Cluster analysis is a fundamental research field in Knowledge Discovery and Data Mining (KDD). It aims at partitioning a given dataset into homogeneous clusters so as to reflect the natural hidden data structure. Various heuristic or statistical approaches have been developed for analyzing propositional datasets. Nevertheless, in relational clustering the existence of multi-type relationships greatly degrades the performance of traditional clustering algorithms. This issue motivates us to find more effective algorithms for conducting cluster analysis on relational datasets. In this thesis we comprehensively study the idea of Representative Objects for approximating data distribution and then design a multi-phase clustering framework for analyzing relational datasets with high effectiveness and efficiency.

The second task considered in this thesis is to provide better data models for people as well as machines to browse and navigate a dataset. The hierarchical taxonomy is widely used for this purpose. Compared with manually created taxonomies, automatically derived ones are more appealing because of their low creation/maintenance cost and high scalability. Up to now, taxonomy generation techniques have mainly been used to organize document corpora. We investigate the possibility of applying them to relational datasets and then propose some algorithmic improvements. Another non-trivial problem is how to assign suitable labels to the taxonomic nodes so as to credibly summarize the content of each node. To the best of our knowledge, this field has not been investigated sufficiently, and so we attempt to fill the gap by proposing some novel approaches.

The final goal of our cluster analysis and taxonomy generation techniques is to improve the scalability of recommender systems that are developed to tackle the problem of information overload. Recent research in recommender systems integrates the exploitation of domain knowledge to improve the recommendation quality, which, however, reduces the scalability of the whole system at the same time. We address this issue by applying the automatically derived taxonomy to preserve the pair-wise similarities between items, and then modeling the user visits by another hierarchical structure. Experimental results show that the computational complexity of the recommendation procedure can be greatly reduced and thus the system scalability can be improved.

Topics: QA
OAI identifier: oai:wrap.warwick.ac.uk:3759
