1,174 research outputs found

    Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline

    Full text link
    Recommender systems constitute the core engine of most social network platforms nowadays, aiming to maximize user satisfaction along with other key business objectives. Twitter is no exception. Despite the fact that Twitter data has been extensively used to understand socioeconomic and political phenomena and user behaviour, the implicit feedback provided by users on Tweets through their engagements on the Home Timeline has only been explored to a limited extent. At the same time, there is a lack of large-scale public social network datasets that would enable the scientific community to both benchmark and build more powerful and comprehensive models that tailor content to user interests. By releasing an original dataset of 160 million Tweets along with engagement information, Twitter aims to address exactly that. During this release, special attention is drawn on maintaining compliance with existing privacy laws. Apart from user privacy, this paper touches on the key challenges faced by researchers and professionals striving to predict user engagements. It further describes the key aspects of the RecSys 2020 Challenge that was organized by ACM RecSys in partnership with Twitter using this dataset.Comment: 16 pages, 2 table

    Graph Convolutional Neural Networks for Web-Scale Recommender Systems

    Full text link
    Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. However, making these methods practical and scalable to web-scale recommendation tasks with billions of items and hundreds of millions of users remains a challenge. Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a data-efficient Graph Convolutional Network (GCN) algorithm PinSage, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure as well as node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model. We also develop an efficient MapReduce model inference algorithm to generate embeddings using a trained model. We deploy PinSage at Pinterest and train it on 7.5 billion examples on a graph with 3 billion nodes representing pins and boards, and 18 billion edges. According to offline metrics, user studies and A/B tests, PinSage generates higher-quality recommendations than comparable deep learning and graph-based alternatives. To our knowledge, this is the largest application of deep graph embeddings to date and paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures.Comment: KDD 201

    Efficient Graph based Recommender System with Weighted Averaging of Messages

    Full text link
    We showcase a novel solution to a recommendation system problem where we face a perpetual soft item cold start issue. Our system aims to recommend demanded products to prospective sellers for listing in Amazon stores. These products always have only few interactions thereby giving rise to a perpetual soft item cold start situation. Modern collaborative filtering methods solve cold start using content attributes and exploit the existing implicit signals from warm start items. This approach fails in our use-case since our entire item set faces cold start issue always. Our Product Graph has over 500 Million nodes and over 5 Billion edges which makes training and inference using modern graph algorithms very compute intensive. To overcome these challenges we propose a system which reduces the dataset size and employs an improved modelling technique to reduce storage and compute without loss in performance. Particularly, we reduce our graph size using a filtering technique and then exploit this reduced product graph using Weighted Averaging of Messages over Layers (WAML) algorithm. WAML simplifies training on large graphs and improves over previous methods by reducing compute time to 1/7 of LightGCN and 1/26 of Graph Attention Network (GAT) and increasing recall@100@100 by 66% over LightGCN and 2.3x over GAT.Comment: Accepted to The Second International Conference on AI-ML Systems (AIMLSystems, October 12-15, 2022

    Semantic data mining and linked data for a recommender system in the AEC industry

    Get PDF
    Even though it can provide design teams with valuable performance insights and enhance decision-making, monitored building data is rarely reused in an effective feedback loop from operation to design. Data mining allows users to obtain such insights from the large datasets generated throughout the building life cycle. Furthermore, semantic web technologies allow to formally represent the built environment and retrieve knowledge in response to domain-specific requirements. Both approaches have independently established themselves as powerful aids in decision-making. Combining them can enrich data mining processes with domain knowledge and facilitate knowledge discovery, representation and reuse. In this article, we look into the available data mining techniques and investigate to what extent they can be fused with semantic web technologies to provide recommendations to the end user in performance-oriented design. We demonstrate an initial implementation of a linked data-based system for generation of recommendations

    Hessian-aware Quantized Node Embeddings for Recommendation

    Full text link
    Graph Neural Networks (GNNs) have achieved state-of-the-art performance in recommender systems. Nevertheless, the process of searching and ranking from a large item corpus usually requires high latency, which limits the widespread deployment of GNNs in industry-scale applications. To address this issue, many methods compress user/item representations into the binary embedding space to reduce space requirements and accelerate inference. Also, they use the Straight-through Estimator (STE) to prevent vanishing gradients during back-propagation. However, the STE often causes the gradient mismatch problem, leading to sub-optimal results. In this work, we present the Hessian-aware Quantized GNN (HQ-GNN) as an effective solution for discrete representations of users/items that enable fast retrieval. HQ-GNN is composed of two components: a GNN encoder for learning continuous node embeddings and a quantized module for compressing full-precision embeddings into low-bit ones. Consequently, HQ-GNN benefits from both lower memory requirements and faster inference speeds compared to vanilla GNNs. To address the gradient mismatch problem in STE, we further consider the quantized errors and its second-order derivatives for better stability. The experimental results on several large-scale datasets show that HQ-GNN achieves a good balance between latency and performance

    Regression and Learning to Rank Aggregation for User Engagement Evaluation

    Full text link
    User engagement refers to the amount of interaction an instance (e.g., tweet, news, and forum post) achieves. Ranking the items in social media websites based on the amount of user participation in them, can be used in different applications, such as recommender systems. In this paper, we consider a tweet containing a rating for a movie as an instance and focus on ranking the instances of each user based on their engagement, i.e., the total number of retweets and favorites it will gain. For this task, we define several features which can be extracted from the meta-data of each tweet. The features are partitioned into three categories: user-based, movie-based, and tweet-based. We show that in order to obtain good results, features from all categories should be considered. We exploit regression and learning to rank methods to rank the tweets and propose to aggregate the results of regression and learning to rank methods to achieve better performance. We have run our experiments on an extended version of MovieTweeting dataset provided by ACM RecSys Challenge 2014. The results show that learning to rank approach outperforms most of the regression models and the combination can improve the performance significantly.Comment: In Proceedings of the 2014 ACM Recommender Systems Challenge, RecSysChallenge '1

    Discrete Factorization Machines for Fast Feature-based Recommendation

    Full text link
    User and item features of side information are crucial for accurate recommendation. However, the large number of feature dimensions, e.g., usually larger than 10^7, results in expensive storage and computational cost. This prohibits fast recommendation especially on mobile applications where the computational resource is very limited. In this paper, we develop a generic feature-based recommendation model, called Discrete Factorization Machine (DFM), for fast and accurate recommendation. DFM binarizes the real-valued model parameters (e.g., float32) of every feature embedding into binary codes (e.g., boolean), and thus supports efficient storage and fast user-item score computation. To avoid the severe quantization loss of the binarization, we propose a convergent updating rule that resolves the challenging discrete optimization of DFM. Through extensive experiments on two real-world datasets, we show that 1) DFM consistently outperforms state-of-the-art binarized recommendation models, and 2) DFM shows very competitive performance compared to its real-valued version (FM), demonstrating the minimized quantization loss. This work is accepted by IJCAI 2018.Comment: Appeared in IJCAI 201
    corecore