Fast Matrix Factorization for Online Recommendation with Implicit Feedback
This paper contributes improvements on both the effectiveness and efficiency
of Matrix Factorization (MF) methods for implicit feedback. We highlight two
critical issues of existing works. First, due to the large space of unobserved
feedback, most existing works resort to assigning a uniform weight to the missing
data to reduce computational complexity. However, such a uniform assumption is
invalid in real-world settings. Second, most methods are also designed in an
offline setting and fail to keep up with the dynamic nature of online data. We
address the above two issues in learning MF models from implicit feedback. We
first propose to weight the missing data based on item popularity, which is
more effective and flexible than the uniform-weight assumption. However, such a
non-uniform weighting poses an efficiency challenge in learning the model. To
address this, we specifically design a new learning algorithm based on the
element-wise Alternating Least Squares (eALS) technique, for efficiently
optimizing an MF model with variably-weighted missing data. We exploit this
efficiency to then seamlessly devise an incremental update strategy that
instantly refreshes an MF model given new feedback. Through comprehensive
experiments on two public datasets in both offline and online protocols, we
show that our eALS method consistently outperforms state-of-the-art implicit MF
methods. Our implementation is available at
https://github.com/hexiangnan/sigir16-eals.
Comment: 10 pages, 8 figures
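The idea of weighting missing data by item popularity can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so the normalised power form below, and the names `popularity_weights`, `c0` and `alpha`, are assumptions made for illustration only.

```python
def popularity_weights(item_counts, c0=1.0, alpha=0.5):
    """Return one missing-data weight per item.

    Assumed form (not taken from the abstract): each weight is
    c0 * count**alpha / sum(count**alpha over all items), so more
    popular items get a larger weight when they are missing.
    """
    powered = [count ** alpha for count in item_counts]
    total = sum(powered)
    if total == 0:
        raise ValueError("at least one item needs a non-zero interaction count")
    return [c0 * p / total for p in powered]


# Hypothetical interaction counts for four items.
counts = [100, 25, 4, 1]
weights = popularity_weights(counts, c0=1.0, alpha=0.5)

# With alpha=0 every item gets the same weight, which recovers
# the uniform-weight assumption the abstract argues against.
uniform = popularity_weights(counts, c0=1.0, alpha=0.0)
```

In this sketch `alpha` controls how strongly popularity skews the weights, and `c0` sets the overall weight given to missing data.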
Regularizing Matrix Factorization with User and Item Embeddings for Recommendation
Following recent successes in exploiting both latent factor and word
embedding models in recommendation, we propose a novel Regularized
Multi-Embedding (RME) based recommendation model that simultaneously
encapsulates the following ideas via decomposition: (1) which items a user
likes, (2) which two users co-like the same items, (3) which two items users
often co-liked, and (4) which two items users often co-disliked. In
experimental validation, the RME outperforms competing state-of-the-art models
in both explicit and implicit feedback datasets, significantly improving
Recall@5 by 5.9~7.0%, NDCG@20 by 4.3~5.6%, and MAP@10 by 7.9~8.9%. In addition,
under the cold-start scenario for users with the lowest number of interactions,
the RME improves NDCG@5 over the competing models by 20.2% and 29.4% on the
MovieLens-10M and MovieLens-20M datasets, respectively. Our datasets and source
code are available at: https://github.com/thanhdtran/RME.git.
Comment: CIKM 201
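The "which two items users often co-liked" signal can be illustrated by counting how many users like both items of a pair. This is only a sketch of raw co-occurrence counting; the abstract does not describe how RME builds or decomposes its matrices, and `co_liked_counts` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_liked_counts(user_likes):
    """Count, for every unordered item pair, how many users liked both items."""
    counts = Counter()
    for items in user_likes.values():
        # sorted(set(...)) removes duplicates and fixes the pair order.
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical data: user id -> items the user liked.
likes = {
    "u1": ["A", "B", "C"],
    "u2": ["A", "B"],
    "u3": ["B", "C"],
}
pairs = co_liked_counts(likes)
```

The same function applied to a mapping from items to the users who liked them would count user pairs that co-like the same items.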
ESAM: Discriminative Domain Adaptation with Non-Displayed Items to Improve Long-Tail Performance
Most ranking models are trained only with displayed items (most are hot
items), but they are utilized to retrieve items in the entire space which
consists of both displayed and non-displayed items (most are long-tail items).
Due to the sample selection bias, the long-tail items lack sufficient records
to learn good feature representations, i.e. data sparsity and cold start
problems. The resultant distribution discrepancy between displayed and
non-displayed items would cause poor long-tail performance. To this end, we
propose an entire space adaptation model (ESAM) to address this problem from
the perspective of domain adaptation (DA). ESAM regards displayed and
non-displayed items as source and target domains respectively. Specifically, we
design the attribute correlation alignment that considers the correlation
between high-level attributes of the item to achieve distribution alignment.
Furthermore, we introduce two effective regularization strategies, i.e.
center-wise clustering and self-training, to improve the DA
process. Without requiring any auxiliary information and auxiliary domains,
ESAM transfers the knowledge from displayed items to non-displayed items for
alleviating the distribution inconsistency. Experiments on two public datasets
and a large-scale industrial dataset collected from Taobao demonstrate that
ESAM achieves state-of-the-art performance, especially in the long-tail space.
Besides, we deploy ESAM to the Taobao search engine, leading to significant
improvement in online performance. The code is available at
https://github.com/A-bone1/ESAM.git
Comment: Accept by SIGIR-202
Wellness Representation of Users in Social Media: Towards Joint Modelling of Heterogeneity and Temporality
The increasing popularity of social media has encouraged health consumers to share, explore, and validate health and wellness information on social networks, which provide a rich repository of Patient Generated Wellness Data (PGWD). While data-driven healthcare has attracted a lot of attention from academia and industry for improving care delivery through personalized healthcare, limited research has been done on harvesting and utilizing the PGWD available on social networks. Recently, representation learning has been widely used in many applications to learn low-dimensional embeddings of users. However, existing approaches for representation learning are not directly applicable to PGWD owing to the nature of the domain, which is characterized by longitudinality, incompleteness, and sparsity of observed data as well as heterogeneity of the patient population. To tackle these problems, we propose an approach which directly learns the embedding from longitudinal data of users, instead of a vector-based representation. In particular, we simultaneously learn a low-dimensional latent space as well as the temporal evolution of users in the wellness space. The proposed method takes into account two types of wellness prior knowledge: (1) temporal progression of wellness attributes; and (2) heterogeneity of wellness attributes in the patient population. Our approach scales well to large datasets using parallel stochastic gradient descent. We conduct extensive experiments to evaluate our framework on three major tasks in the wellness domain: attribute prediction, success prediction, and community detection. Experimental results on two real-world datasets demonstrate the ability of our approach to learn effective user representations.
Optimizing E-Management Using Web Data Mining
Today, one of the biggest challenges that E-management systems face is the explosive growth of operating data and the use of this data to enhance services. Web usage mining has emerged as an important technique to provide useful management information from users' Web data. One area where such information is needed is Web-based academic digital libraries. A digital library (D-library) is an information resource system that stores resources in digital format and provides access to users through the network. Academic libraries offer a huge amount of information resources, which overwhelm students and make it difficult for them to access relevant information. Proposed solutions to alleviate this issue emphasize the need to build Web recommender systems that offer each student a list of resources that they would be interested in. Collaborative filtering is the most successful technique used to offer recommendations to users. Collaborative filtering provides recommendations according to the user relevance feedback that tells the system their preferences. Most recent work on D-library recommender systems uses explicit feedback.
Explicit feedback requires students to rate resources, which makes the recommendation process unrealistic because few students are willing to provide their interests explicitly. Thus, collaborative filtering suffers from the “data sparsity” problem. In response, the study proposed a Web usage mining framework to alleviate the sparsity problem. The framework incorporates a clustering technique and usage data in the recommendation process. Students perform different actions on the D-library. In this study, five different actions are identified: printing, downloading, bookmarking, reading, and viewing the abstract. These actions provide the system with large quantities of implicit feedback data. The proposed framework also utilizes a clustering data mining approach to reduce the sparsity problem. Furthermore, generating recommendations based on clusters produces better results because students belonging to the same cluster usually have similar interests.
The proposed framework is divided into two main components: an off-line component and an online component. The off-line component comprises two stages: data pre-processing and the derivation of student clusters. The online component comprises two stages: building the student's profile and generating recommendations. The second stage consists of three steps. In the first step, the target student profile is assigned to the closest cluster profile using the cosine similarity measure. In the second step, the Pearson correlation coefficient is used to select the students most similar to the target student from the chosen cluster to serve as a source of prediction. Finally, a top list of resources is presented. Using the Book-Crossing dataset, the effectiveness of the proposed framework was evaluated based on sparsity level and, for accuracy, Mean Absolute Error (MAE). The proposed framework reduced the sparsity level in the sub-matrices (reported reductions of between 0.07% and 26.71%): the sparsity level is between 99.79% and 78.81% with the proposed framework, against 99.86% for the original matrix before the framework is applied. The experimental results indicated that the performance of the proposed framework is as much as 13.12% better than clustering with explicit feedback data only, and 21.14% better than the standard K Nearest Neighbours method. The overall results show that the proposed framework can alleviate the sparsity problem, thereby improving the accuracy of the recommendations.
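The three steps of the recommendation stage (cosine assignment to a cluster, Pearson selection of neighbours, and a ranked list of resources) can be sketched roughly as follows. This is an illustrative sketch, not the study's implementation: the function names, the toy profiles, the choice of `k`, and the scoring of unseen resources by the neighbours' mean implicit score are all assumptions, and Pearson correlation is computed over whole profile vectors for simplicity.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def closest_cluster(profile, centroids):
    """Step 1: pick the cluster whose profile is most cosine-similar to the target."""
    return max(centroids, key=lambda name: cosine(profile, centroids[name]))


def top_neighbours(profile, members, k=2):
    """Step 2: pick the k cluster members most Pearson-correlated with the target."""
    ranked = sorted(members, key=lambda s: pearson(profile, members[s]), reverse=True)
    return ranked[:k]


def recommend(profile, neighbours, members, n=2):
    """Step 3: rank resources the target has not used by the neighbours' mean score."""
    scores = {}
    for idx, seen in enumerate(profile):
        if seen == 0:
            scores[idx] = sum(members[s][idx] for s in neighbours) / len(neighbours)
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical implicit-feedback scores over four resources.
centroids = {"c1": [1, 1, 0, 0], "c2": [0, 0, 1, 1]}
clusters = {
    "c1": {"s1": [3, 2, 2, 0], "s2": [2, 3, 0, 1], "s3": [0, 1, 3, 3]},
    "c2": {"s4": [0, 0, 3, 2]},
}
target = [3, 2, 0, 0]

cluster = closest_cluster(target, centroids)
neighbours = top_neighbours(target, clusters[cluster], k=2)
suggested = recommend(target, neighbours, clusters[cluster], n=2)
```

Restricting the Pearson search to one cluster keeps the neighbour search small, which matches the motivation given for splitting the work into off-line clustering and online recommendation.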