Neural recommender models for sparse and skewed behavioral data
Modern online platforms offer recommendations and personalized search and services to a large and diverse user base while still aiming to acquaint users with the broader community on the platform. Prior work backed by large volumes of user data has shown that user retention is reliant on catering to their specific eccentric tastes, in addition to providing them popular services or content on the platform.
Long-tailed distributions are a fundamental characteristic of human activity, owing to the bursty nature of human attention. As a result, we often observe skew in data facets that involve human interaction. While there are superficial similarities to Zipf's law in textual data and other domains, the challenges with user data extend further. Individual words may have skewed frequencies in the corpus, but the long-tail words by themselves do not significantly impact downstream text-mining tasks. On the contrary, while sparse users (a majority on most online platforms) contribute little to the training data, they are equally crucial at inference time. Perhaps more so, since they are likely to churn.
In this thesis, we study platforms and applications that elicit user participation in rich social settings incorporating user-generated content, user-user interaction, and other modalities of user participation and data generation. For instance, users on the Yelp review platform participate in a follower-followee network and also create and interact with review text (two modalities of user data). Similarly, community question-answer (CQA) platforms incorporate user interaction and collaboratively authored content over diverse domains and discussion threads. Since user participation is multimodal, we develop generalizable abstractions beyond any single data modality.
Specifically, we aim to address the distributional mismatch that occurs with user data independent of dataset specifics. While a minority of the users generates most training samples, it is insufficient to learn the preferences of only this subset of users. The data's overall skew and individual users' sparsity are closely interlinked: sparse users with uncommon preferences are under-represented. Thus, we propose to treat these problems jointly with a skew-aware grouping mechanism that iteratively sharpens the identification of preference groups within the user population. As a result, we improve user characterization, content recommendation, and activity prediction (+6-22% AUC, +6-43% AUC, +12-25% RMSE over state-of-the-art baselines), primarily for users with sparse activity.
The size of the item or content inventories compounds the skew problem. Recommendation models can achieve very high aggregate performance while recommending only a tiny proportion of the inventory (as little as 5%) to users. We propose a data-driven solution guided by the aggregate co-occurrence information across items in the dataset. We specifically note that different co-occurrences are not equally significant. For example, some co-occurring items are easily substituted while others are not. We develop a self-supervised learning framework where the aggregate co-occurrences guide the recommendation problem while providing room to learn these variations among the item associations. As a result, we improve coverage to ~100% (up from 5%) of the inventory and increase long-tail item recall by up to 25%.
We also note that the skew and sparsity problems repeat across data modalities. For instance, social interactions and review content both exhibit aggregate skew, although individual users who actively generate reviews may not participate socially and vice-versa. It is necessary to differentially weight and merge different data sources for each user towards inference tasks in such cases. We show that the problem is inherently adversarial since the user participation modalities compete to describe a user accurately. We develop a framework to unify these representations while algorithmically tackling mode collapse, a well-known pitfall with adversarial models.
A more challenging but important instantiation of sparsity is the few-shot setting or cross-domain setting. We may only have a single or a few interactions for users or items in the sparse domains or partitions. We show that contextualizing user-item interactions helps us infer behavioral invariants in the dense domain, allowing us to correlate sparse participants to their active counterparts (resulting in 3x faster training, ~19% recall gains in multi-domain settings).
Finally, we consider the multi-task setting, where the platform incorporates multiple distinct recommendation and prediction tasks for each user. A single user representation is insufficient for users who exhibit different preferences along each dimension. At the same time, it is counter-productive to handle correlated prediction or inference tasks in isolation. We develop a multi-faceted representation approach grounded in residual learning with heterogeneous knowledge graph representations, which provides us with an expressive data representation for specialized domains and applications with multimodal user data. We achieve knowledge sharing by unifying task-independent and task-specific representations of each entity within a single knowledge graph framework.
In each chapter, we also discuss and demonstrate how the proposed frameworks directly incorporate a wide range of gradient-optimizable recommendation and behavior models, maximizing their applicability and pertinence to user-centered inference tasks and platforms.
NAIS: Neural Attentive Item Similarity Model for Recommendation
Item-to-item collaborative filtering (a.k.a. item-based CF) has long been used
for building recommender systems in industrial settings, owing to its
interpretability and efficiency in real-time personalization. It builds a
user's profile as her historically interacted items, recommending new items
that are similar to the user's profile. As such, the key to an item-based CF
method is in the estimation of item similarities. Early approaches use
statistical measures such as cosine similarity and Pearson coefficient to
estimate item similarities, which are less accurate since they lack tailored
optimization for the recommendation task. In recent years, several works
attempt to learn item similarities from data, by expressing the similarity as
an underlying model and estimating model parameters by optimizing a
recommendation-aware objective function. While extensive efforts have been made
to use shallow linear models for learning item similarities, there has been
relatively less work exploring nonlinear neural network models for item-based
CF.
In this work, we propose a neural network model named Neural Attentive Item
Similarity model (NAIS) for item-based CF. The key to our design of NAIS is an
attention network, which is capable of distinguishing which historical items in
a user profile are more important for a prediction. Compared to the
state-of-the-art item-based CF method Factored Item Similarity Model (FISM),
our NAIS has stronger representation power with only a few additional
parameters brought by the attention network. Extensive experiments on two
public benchmarks demonstrate the effectiveness of NAIS. This work is the first
attempt that designs neural network models for item-based CF, opening up new
research possibilities for future developments of neural recommender systems.
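The attention idea in the NAIS abstract above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `attentive_item_score`, the one-hidden-layer attention network, the parameter shapes, and the plain softmax normalization are all assumptions made for the example, and the published model may differ in these details.

```python
import numpy as np

def attentive_item_score(target_vec, history_vecs, W, b, h):
    """Score a candidate item against a user's history with attention weights.

    target_vec:   (d,)   embedding of the candidate item
    history_vecs: (n, d) embeddings of items the user interacted with
    W, b, h:      attention-network parameters with hypothetical shapes
                  (d, k), (k,), and (k,)
    """
    # Element-wise interaction between the target and each historical item.
    interactions = history_vecs * target_vec           # (n, d)
    # One hidden ReLU layer yields an unnormalized importance per item.
    hidden = np.maximum(interactions @ W + b, 0.0)     # (n, k)
    logits = hidden @ h                                # (n,)
    # Numerically stable softmax over the history (assumed normalization).
    exp_logits = np.exp(logits - logits.max())
    weights = exp_logits / exp_logits.sum()
    # Attention-weighted sum of inner-product item similarities.
    similarities = history_vecs @ target_vec           # (n,)
    return float(weights @ similarities), weights
```

With the output vector `h` set to zero, every historical item receives the same weight, so the score reduces to the average item similarity; the attention parameters are what allow some historical items to count more than others.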
Deep Item-based Collaborative Filtering for Top-N Recommendation
Item-based Collaborative Filtering (ICF for short) has been widely adopted in
recommender systems in industry, owing to its strength in user interest
modeling and ease in online personalization. By constructing a user's profile
with the items that the user has consumed, ICF recommends items that are
similar to the user's profile. With the prevalence of machine learning in
recent years, significant progress has been made for ICF by learning item
similarity (or representation) from data. Nevertheless, we argue that most
existing works have only considered linear and shallow relationships between
items, which are insufficient to capture the complicated decision-making
process of users.
In this work, we propose a more expressive ICF solution by accounting for the
nonlinear and higher-order relationship among items. Going beyond modeling only
the second-order interaction (e.g. similarity) between two items, we
additionally consider the interaction among all interacted item pairs by using
nonlinear neural networks. In this way, we can effectively model the
higher-order relationship among items, capturing more complicated effects in
user decision-making. For example, it can differentiate which historical
itemsets in a user's profile are more important in affecting the user to make a
purchase decision on an item. We treat this solution as a deep variant of ICF,
and thus term it DeepICF. To justify our proposal, we perform empirical studies
on two public datasets from MovieLens and Pinterest. Extensive experiments
verify the highly positive effect of higher-order item interaction modeling
with nonlinear neural networks. Moreover, we demonstrate that with more
fine-grained second-order interaction modeling using an attention network, the
performance of our DeepICF method can be further improved.
Comment: 25 pages, submitted to TOI
Discrete Factorization Machines for Fast Feature-based Recommendation
Side-information features of users and items are crucial for accurate
recommendation. However, the large number of feature dimensions, often
larger than 10^7, results in expensive storage and computational cost. This
prohibits fast recommendation, especially on mobile applications where
computational resources are very limited. In this paper, we develop a generic
feature-based recommendation model, called Discrete Factorization Machine
(DFM), for fast and accurate recommendation. DFM binarizes the real-valued
model parameters (e.g., float32) of every feature embedding into binary codes
(e.g., boolean), and thus supports efficient storage and fast user-item score
computation. To avoid the severe quantization loss of the binarization, we
propose a convergent updating rule that resolves the challenging discrete
optimization of DFM. Through extensive experiments on two real-world datasets,
we show that 1) DFM consistently outperforms state-of-the-art binarized
recommendation models, and 2) DFM shows very competitive performance compared
to its real-valued version (FM), demonstrating the minimized quantization loss.
This work is accepted by IJCAI 2018.
Comment: Appeared in IJCAI 201
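To illustrate why binary codes make scoring cheap, as the DFM abstract above claims, here is a simplified sketch. It is not the DFM training procedure: taking the sign of real-valued embeddings is a naive stand-in for the learned discrete optimization the abstract describes, and the helper names `binarize` and `hamming_score` are hypothetical.

```python
import numpy as np

def binarize(embeddings):
    """Map real-valued embeddings to {-1, +1} codes by sign (zeros map to +1).

    Assumption: naive sign quantization, used here only for illustration.
    """
    return np.where(embeddings >= 0, 1, -1).astype(np.int8)

def hamming_score(code_a, code_b):
    """Inner product of two +/-1 codes, computed from their Hamming distance.

    Positions that agree contribute +1 and positions that differ contribute -1,
    so for codes of length k the inner product equals k - 2 * hamming.
    """
    k = code_a.shape[0]
    hamming = int(np.count_nonzero(code_a != code_b))
    return k - 2 * hamming
```

Because the score depends only on the number of differing positions, it can in principle be computed with bitwise operations on packed codes instead of floating-point multiplications, which is the source of the storage and speed benefits.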
Outer Product-based Neural Collaborative Filtering
In this work, we contribute a new multi-layer neural network architecture
named ONCF to perform collaborative filtering. The idea is to use an outer
product to explicitly model the pairwise correlations between the dimensions of
the embedding space. In contrast to existing neural recommender models that
combine user embedding and item embedding via a simple concatenation or
element-wise product, our proposal of using outer product above the embedding
layer results in a two-dimensional interaction map that is more expressive and
semantically plausible. Above the interaction map obtained by outer product, we
propose to employ a convolutional neural network to learn high-order
correlations among embedding dimensions. Extensive experiments on two public
implicit feedback datasets demonstrate the effectiveness of our proposed ONCF
framework, in particular, the positive effect of using outer product to model
the correlations between embedding dimensions at the low level of a multi-layer
neural recommender model. The experiment code is available at:
https://github.com/duxy-me/ConvNCF
Comment: IJCAI 201
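The interaction map described in the ONCF abstract above can be illustrated with a few lines of NumPy. This sketch covers only the outer-product step; the convolutional layers stacked on top of the map are omitted, and the function name `interaction_map` is an assumption for the example.

```python
import numpy as np

def interaction_map(user_vec, item_vec):
    """Return the k x k outer product of a user and an item embedding.

    Entry (a, b) is user_vec[a] * item_vec[b], so the map holds every pairwise
    combination of embedding dimensions rather than only matching dimensions.
    """
    return np.outer(user_vec, item_vec)
```

The diagonal of the map is the element-wise product of the two embeddings, and its trace is their inner product, so the map contains the signal used by matrix factorization plus the off-diagonal cross-dimension terms.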
TransNets: Learning to Transform for Recommendation
Recently, deep learning methods have been shown to improve the performance of
recommender systems over traditional methods, especially when review text is
available. For example, a recent model, DeepCoNN, uses neural nets to learn one
latent representation for the text of all reviews written by a target user, and
a second latent representation for the text of all reviews for a target item,
and then combines these latent representations to obtain state-of-the-art
performance on recommendation tasks. We show that (unsurprisingly) much of the
predictive value of review text comes from reviews of the target user for the
target item. We then introduce a way in which this information can be used in
recommendation, even when the target user's review for the target item is not
available. Our model, called TransNets, extends the DeepCoNN model by
introducing an additional latent layer representing the target user-target item
pair. We then regularize this layer, at training time, to be similar to another
latent representation of the target user's review of the target item. We show
that TransNets and extensions of it improve substantially over the previous
state-of-the-art.
Comment: Accepted for publication in the 11th ACM Conference on Recommender
Systems (RecSys 2017)
Adversarial Personalized Ranking for Recommendation
Item recommendation is a personalized ranking task. To this end, many
recommender systems optimize models with pairwise ranking objectives, such as
the Bayesian Personalized Ranking (BPR). Using Matrix Factorization (MF),
the most widely used model in recommendation, as a demonstration, we show
that optimizing it with BPR leads to a recommender model that is not robust. In
particular, we find that the resultant model is highly vulnerable to
adversarial perturbations on its model parameters, which implies a possibly
large generalization error.
To enhance the robustness of a recommender model and thus improve its
generalization performance, we propose a new optimization framework, namely
Adversarial Personalized Ranking (APR). In short, our APR enhances the pairwise
ranking method BPR by performing adversarial training. It can be interpreted as
playing a minimax game, where the minimization of the BPR objective function
simultaneously defends against an adversary, which adds adversarial perturbations
on model parameters to maximize the BPR objective function. To illustrate how it works,
we implement APR on MF by adding adversarial perturbations on the embedding
vectors of users and items. Extensive experiments on three public real-world
datasets demonstrate the effectiveness of APR: optimizing MF with APR
outperforms BPR with a relative improvement of 11.2% on average and achieves
state-of-the-art performance for item recommendation. Our implementation is
available at: https://github.com/hexiangnan/adversarial_personalized_ranking.
Comment: SIGIR 201
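The minimax idea in the APR abstract above can be sketched for a single training triple. This is an illustrative sketch under stated assumptions, not the released implementation: the perturbation is approximated by one normalized gradient step of size `eps`, and the function names and the `eps` and `reg` parameters are hypothetical choices for the example.

```python
import numpy as np

def bpr_loss(p_u, q_i, q_j):
    """BPR loss for one (user, positive item, negative item) triple."""
    x = p_u @ q_i - p_u @ q_j
    return float(np.log1p(np.exp(-x)))  # equals -log(sigmoid(x))

def bpr_grads(p_u, q_i, q_j):
    """Gradients of the BPR loss with respect to p_u, q_i, and q_j."""
    x = p_u @ q_i - p_u @ q_j
    coeff = -1.0 / (1.0 + np.exp(x))    # derivative of the loss w.r.t. x
    return coeff * (q_i - q_j), coeff * p_u, -coeff * p_u

def adversarial_perturbation(p_u, q_i, q_j, eps=0.5):
    """Loss-increasing perturbation of total L2 norm eps (assumed one-step rule)."""
    grads = bpr_grads(p_u, q_i, q_j)
    norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if norm == 0.0:
        return tuple(np.zeros_like(g) for g in grads)
    return tuple(eps * g / norm for g in grads)

def apr_loss(p_u, q_i, q_j, eps=0.5, reg=1.0):
    """Clean BPR loss plus reg times the BPR loss on perturbed embeddings."""
    d_u, d_i, d_j = adversarial_perturbation(p_u, q_i, q_j, eps)
    clean = bpr_loss(p_u, q_i, q_j)
    perturbed = bpr_loss(p_u + d_u, q_i + d_i, q_j + d_j)
    return clean + reg * perturbed
```

Minimizing `apr_loss` over the embeddings while the perturbation is recomputed at each step mirrors the game in the abstract: the adversary pushes the embeddings in the direction that raises the ranking loss, and the model is trained to rank correctly even under that push.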