Scalable factorization model to discover implicit and explicit similarities across domains

Do, Duc Minh Quan

Authors: Duc Minh Quan Do
Publication date: 1 January 2018

Abstract

University of Technology Sydney. Faculty of Engineering and Information Technology.

E-commerce businesses increasingly depend on recommendation systems to introduce personalized services and products to their target customers. Accurate recommendations require a sufficient understanding of user preferences and item characteristics. Given current innovations on the Web, coupled datasets are abundantly available across domains. An analysis of these datasets can provide broader knowledge of the underlying relationships between users and items. This understanding yields more collaborative filtering power and leads to higher recommendation accuracy. However, how to use this knowledge effectively for recommendation remains a challenging problem. In this research, we propose to exploit both explicit and implicit similarities extracted from latent factors across domains with matrix tri-factorization. On the coupled dimensions, the common parts of the coupled factors are shared across domains, while their domain-specific parts are preserved. We show that such a configuration of both common and domain-specific parts benefits cross-domain recommendations significantly. Moreover, on the non-coupled dimensions, we propose using the middle factor of the tri-factorization to match closely related clusters across datasets and to align the matched ones, transferring cross-domain implicit similarities and further improving the recommendation. Furthermore, when dealing with data coupled from different sources, the scalability of the analytical method is another significant concern. We design a distributed factorization model that can scale up as the observed data across domains increases. Our data parallelism, based on Apache Spark, minimizes the model’s communication cost. The model is also equipped with an optimized solver that converges faster. We demonstrate that these key features stabilize our model’s performance as the data grows. Validated on real-world datasets, our model outperforms existing algorithms in recommendation accuracy and scalability. These empirical results illustrate the potential of our research in exploiting both explicit and implicit similarities across domains for improving recommendation performance.

Available Versions

OPUS - University of Technology Sydney

oai:opus.lib.uts.edu.au:10453/...

Last time updated on 18/10/2019