Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline
Recommender systems constitute the core engine of most social network
platforms nowadays, aiming to maximize user satisfaction along with other key
business objectives. Twitter is no exception. Despite the fact that Twitter
data has been extensively used to understand socioeconomic and political
phenomena and user behaviour, the implicit feedback provided by users on Tweets
through their engagements on the Home Timeline has only been explored to a
limited extent. At the same time, there is a lack of large-scale public social
network datasets that would enable the scientific community to both benchmark
and build more powerful and comprehensive models that tailor content to user
interests. By releasing an original dataset of 160 million Tweets along with
engagement information, Twitter aims to address exactly that. During this
release, special attention is paid to maintaining compliance with existing
privacy laws. Apart from user privacy, this paper touches on the key challenges
faced by researchers and professionals striving to predict user engagements. It
further describes the key aspects of the RecSys 2020 Challenge that was
organized by ACM RecSys in partnership with Twitter using this dataset.
Comment: 16 pages, 2 table
Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Recent advancements in deep neural networks for graph-structured data have
led to state-of-the-art performance on recommender system benchmarks. However,
making these methods practical and scalable to web-scale recommendation tasks
with billions of items and hundreds of millions of users remains a challenge.
Here we describe a large-scale deep recommendation engine that we developed and
deployed at Pinterest. We develop a data-efficient Graph Convolutional Network
(GCN) algorithm PinSage, which combines efficient random walks and graph
convolutions to generate embeddings of nodes (i.e., items) that incorporate
both graph structure and node feature information. Compared to prior GCN
approaches, we develop a novel method based on highly efficient random walks to
structure the convolutions and design a novel training strategy that relies on
harder-and-harder training examples to improve robustness and convergence of
the model. We also develop an efficient MapReduce model inference algorithm to
generate embeddings using a trained model. We deploy PinSage at Pinterest and
train it on 7.5 billion examples on a graph with 3 billion nodes representing
pins and boards, and 18 billion edges. According to offline metrics, user
studies and A/B tests, PinSage generates higher-quality recommendations than
comparable deep learning and graph-based alternatives. To our knowledge, this
is the largest application of deep graph embeddings to date and paves the way
for a new generation of web-scale recommender systems based on graph
convolutional architectures.
Comment: KDD 201
Efficient Graph based Recommender System with Weighted Averaging of Messages
We showcase a novel solution to a recommendation system problem where we face
a perpetual soft item cold start issue. Our system aims to recommend demanded
products to prospective sellers for listing in Amazon stores. These products
always have only a few interactions, thereby giving rise to a perpetual soft item
cold start situation. Modern collaborative filtering methods solve cold start
using content attributes and exploit the existing implicit signals from warm
start items. This approach fails in our use case since our entire item set
always faces the cold start issue. Our Product Graph has over 500 million nodes and
over 5 billion edges, which makes training and inference using modern graph
algorithms very compute intensive. To overcome these challenges we propose a
system which reduces the dataset size and employs an improved modelling
technique to reduce storage and compute without loss in performance.
In particular, we reduce our graph size using a filtering technique and then
exploit this reduced product graph using the Weighted Averaging of Messages over
Layers (WAML) algorithm. WAML simplifies training on large graphs and improves
over previous methods by reducing compute time to 1/7 of LightGCN and 1/26 of
Graph Attention Network (GAT) and increasing recall by 66% over LightGCN
and 2.3x over GAT.
Comment: Accepted to The Second International Conference on AI-ML Systems
(AIMLSystems, October 12-15, 2022
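The abstract above does not spell out the WAML update rule. As a rough illustration only, the sketch below assumes a LightGCN-style scheme: parameter-free mean aggregation over neighbors at each layer, followed by a weighted average of the per-layer embeddings. The function names, the toy graph, and the layer weights are all hypothetical and are not taken from the paper.

```python
def propagate(emb, neighbors):
    """One parameter-free message-passing step: mean of neighbor embeddings."""
    out = {}
    for node, nbrs in neighbors.items():
        dim = len(emb[node])
        out[node] = [sum(emb[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
    return out


def layer_weighted_embeddings(emb0, neighbors, weights):
    """Combine layer-0..K embeddings with one scalar weight per layer (assumed scheme)."""
    layers = [emb0]
    for _ in range(len(weights) - 1):
        layers.append(propagate(layers[-1], neighbors))
    dim = len(next(iter(emb0.values())))
    return {
        node: [sum(w * layer[node][d] for w, layer in zip(weights, layers))
               for d in range(dim)]
        for node in emb0
    }


# Toy seller-product graph (hypothetical): one seller linked to two products.
neighbors = {"s1": ["p1", "p2"], "p1": ["s1"], "p2": ["s1"]}
emb0 = {"s1": [1.0, 0.0], "p1": [0.0, 1.0], "p2": [0.0, 3.0]}
final = layer_weighted_embeddings(emb0, neighbors, weights=[0.5, 0.5])
```

Because each propagation step has no trainable parameters, only the initial embeddings (and possibly the layer weights) would need to be learned, which is one plausible reason such schemes are cheaper than attention-based layers.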
Semantic data mining and linked data for a recommender system in the AEC industry
Even though it can provide design teams with valuable performance insights and enhance decision-making, monitored building data is rarely reused in an effective feedback loop from operation to design. Data mining allows users to obtain such insights from the large datasets generated throughout the building life cycle. Furthermore, semantic web technologies make it possible to formally represent the built environment and retrieve knowledge in response to domain-specific requirements. Both approaches have independently established themselves as powerful aids in decision-making. Combining them can enrich data mining processes with domain knowledge and facilitate knowledge discovery, representation and reuse. In this article, we look into the available data mining techniques and investigate to what extent they can be fused with semantic web technologies to provide recommendations to the end user in performance-oriented design. We demonstrate an initial implementation of a linked data-based system for generation of recommendations
Hessian-aware Quantized Node Embeddings for Recommendation
Graph Neural Networks (GNNs) have achieved state-of-the-art performance in
recommender systems. Nevertheless, the process of searching and ranking from a
large item corpus usually incurs high latency, which limits the widespread
deployment of GNNs in industry-scale applications. To address this issue, many
methods compress user/item representations into the binary embedding space to
reduce space requirements and accelerate inference. Also, they use the
Straight-through Estimator (STE) to prevent vanishing gradients during
back-propagation. However, the STE often causes the gradient mismatch problem,
leading to sub-optimal results.
In this work, we present the Hessian-aware Quantized GNN (HQ-GNN) as an
effective solution for discrete representations of users/items that enable fast
retrieval. HQ-GNN is composed of two components: a GNN encoder for learning
continuous node embeddings and a quantized module for compressing
full-precision embeddings into low-bit ones. Consequently, HQ-GNN benefits from
both lower memory requirements and faster inference speeds compared to vanilla
GNNs. To address the gradient mismatch problem in STE, we further consider the
quantization errors and their second-order derivatives for better stability. The
experimental results on several large-scale datasets show that HQ-GNN achieves
a good balance between latency and performance
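To make the gradient mismatch mentioned above concrete, here is a minimal sketch of the generic clipped Straight-through Estimator for 1-bit (sign) quantization. It is not HQ-GNN's Hessian-aware correction, which the abstract does not detail. The helper names, the clipping threshold, and the squared-error loss are assumptions chosen for illustration.

```python
def sign(x):
    """1-bit quantizer: maps a real value to +1.0 or -1.0."""
    return 1.0 if x >= 0 else -1.0


def quantize_forward(xs):
    """Forward pass: quantize every coordinate of an embedding."""
    return [sign(x) for x in xs]


def ste_backward(xs, grad_q, clip=1.0):
    """Backward pass with STE: copy the gradient through where |x| <= clip.

    The true derivative of sign() is zero almost everywhere, so this
    pass-through is a surrogate; the gap between the two is the
    gradient mismatch.
    """
    return [g if abs(x) <= clip else 0.0 for x, g in zip(xs, grad_q)]


xs = [0.3, -0.7, 1.5]          # full-precision embedding (toy values)
target = [1.0, 1.0, -1.0]      # hypothetical target code
q = quantize_forward(xs)
# Gradient of the squared error sum((q - t)^2) with respect to q.
grad_q = [2 * (qi - ti) for qi, ti in zip(q, target)]
grad_x = ste_backward(xs, grad_q)
```

In this toy run the third coordinate is quantized wrongly but receives no gradient because it lies outside the clipping range, which is one way the surrogate can mislead training.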
Regression and Learning to Rank Aggregation for User Engagement Evaluation
User engagement refers to the amount of interaction an instance (e.g., tweet,
news, and forum post) achieves. Ranking the items on social media websites
based on the amount of user participation in them can be used in different
applications, such as recommender systems. In this paper, we consider a tweet
containing a rating for a movie as an instance and focus on ranking the
instances of each user based on their engagement, i.e., the total number of
retweets and favorites each will gain.
For this task, we define several features which can be extracted from the
meta-data of each tweet. The features are partitioned into three categories:
user-based, movie-based, and tweet-based. We show that in order to obtain good
results, features from all categories should be considered. We exploit
regression and learning to rank methods to rank the tweets and propose to
aggregate the results of regression and learning to rank methods to achieve
better performance. We ran our experiments on an extended version of the
MovieTweeting dataset provided by the ACM RecSys Challenge 2014. The results show
that the learning to rank approach outperforms most of the regression models and
that combining the two can improve performance significantly.
Comment: In Proceedings of the 2014 ACM Recommender Systems Challenge,
RecSysChallenge '1
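The abstract above does not say how the regression and learning to rank outputs are aggregated. One common option, shown here purely as an assumed stand-in, is Borda-style rank aggregation: convert each model's scores to rank positions and order items by the sum of their positions. The tweet identifiers and scores are made up.

```python
def ranks(scores):
    """Map each item to its rank position (0 = highest score)."""
    ordered = sorted(scores, key=lambda item: -scores[item])
    return {item: pos for pos, item in enumerate(ordered)}


def aggregate(score_lists):
    """Order items by the sum of their rank positions across all models."""
    rank_maps = [ranks(s) for s in score_lists]
    items = list(score_lists[0])
    total = {item: sum(r[item] for r in rank_maps) for item in items}
    return sorted(items, key=lambda item: total[item])


# Hypothetical outputs for one user's tweets.
regression_scores = {"t1": 12.0, "t2": 3.0, "t3": 7.5, "t4": 9.0}  # predicted engagement
ltr_scores = {"t1": 0.6, "t2": 0.1, "t3": 0.9, "t4": 0.3}          # ranker scores
combined = aggregate([regression_scores, ltr_scores])
```

Working on rank positions rather than raw scores sidesteps the fact that the two models produce values on different scales.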
Discrete Factorization Machines for Fast Feature-based Recommendation
User and item features of side information are crucial for accurate
recommendation. However, the large number of feature dimensions, e.g., usually
larger than 10^7, results in expensive storage and computational cost. This
prohibits fast recommendation especially on mobile applications where the
computational resource is very limited. In this paper, we develop a generic
feature-based recommendation model, called Discrete Factorization Machine
(DFM), for fast and accurate recommendation. DFM binarizes the real-valued
model parameters (e.g., float32) of every feature embedding into binary codes
(e.g., boolean), and thus supports efficient storage and fast user-item score
computation. To avoid the severe quantization loss of the binarization, we
propose a convergent updating rule that resolves the challenging discrete
optimization of DFM. Through extensive experiments on two real-world datasets,
we show that 1) DFM consistently outperforms state-of-the-art binarized
recommendation models, and 2) DFM shows very competitive performance compared
to its real-valued version (FM), demonstrating the minimized quantization loss.
This work was accepted by IJCAI 2018.
Comment: Appeared in IJCAI 201
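As a small illustration of why binary codes allow fast score computation, the sketch below uses the standard identity that the inner product of two codes in {+1, -1} of length d equals d minus twice their Hamming distance, so it reduces to an XOR and a bit count. This shows only that identity, not the full DFM scoring function, and the helper names and example codes are hypothetical.

```python
def pack_bits(code):
    """Pack a list of +1/-1 values into an int (a set bit means +1)."""
    packed = 0
    for i, value in enumerate(code):
        if value == 1:
            packed |= 1 << i
    return packed


def binary_dot(a_bits, b_bits, dim):
    """Inner product of two +1/-1 codes computed from their packed forms."""
    hamming = bin(a_bits ^ b_bits).count("1")  # positions where the codes differ
    return dim - 2 * hamming


user = [1, -1, 1, 1, -1, -1, 1, -1]
item = [1, 1, 1, -1, -1, 1, 1, -1]
dim = len(user)

fast = binary_dot(pack_bits(user), pack_bits(item), dim)
slow = sum(u * v for u, v in zip(user, item))  # reference floating-style dot product
```

Packing also cuts storage from 32 bits per dimension for float32 to 1 bit per dimension.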