
    On the instability of embeddings for recommender systems: the case of Matrix Factorization

    Most state-of-the-art top-N collaborative recommender systems work by learning embeddings to jointly represent users and items. Learned embeddings are considered effective for solving a variety of tasks, among others providing and explaining recommendations. In this paper we question the reliability of the embeddings learned by Matrix Factorization (MF). We empirically demonstrate that, by simply changing the initial values assigned to the latent factors, the same MF method generates very different embeddings of items and users, and we highlight that this effect is stronger for less popular items. To overcome these drawbacks, we present a generalization of MF, called Nearest Neighbors Matrix Factorization (NNMF). The new method propagates the information about items and users to their neighbors, speeding up the training procedure and extending the amount of information that supports recommendations and representations. We describe the NNMF variants of three common MF approaches, and with extensive experiments on five different datasets we show that they strongly mitigate the instability issues of the original MF versions and improve the accuracy of recommendations on the long tail.
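    The instability effect reported in the abstract can be observed with a minimal sketch: train the same plain SGD matrix factorization twice on the same synthetic ratings, changing only the random seed of the initial latent factors, and compare the resulting item representations via the overlap of their nearest neighbors. This is an illustration under assumed hyperparameters and toy data, not the paper's experimental setup, datasets, or MF implementations.

```python
# Sketch: same data, same MF, different random initialization of the factors.
import numpy as np

def train_mf(R, k=16, epochs=30, lr=0.05, reg=0.01, seed=0):
    """Plain SGD matrix factorization on the observed (non-zero) entries of R."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))   # user latent factors
    V = rng.normal(scale=0.1, size=(n_items, k))   # item latent factors
    users, items = R.nonzero()
    for _ in range(epochs):
        for u, i in zip(users, items):
            pu = U[u].copy()
            err = R[u, i] - pu @ V[i]
            U[u] += lr * (err * V[i] - reg * pu)
            V[i] += lr * (err * pu - reg * V[i])
    return U, V

def neighbor_overlap(V1, V2, top_k=10):
    """Average Jaccard overlap of each item's top-k neighbors in two embeddings."""
    def top_neighbors(V):
        sim = V @ V.T
        np.fill_diagonal(sim, -np.inf)
        return np.argsort(-sim, axis=1)[:, :top_k]
    overlaps = [len(set(a) & set(b)) / len(set(a) | set(b))
                for a, b in zip(top_neighbors(V1), top_neighbors(V2))]
    return float(np.mean(overlaps))

# Same synthetic rating matrix, same hyperparameters, different initial factors.
rng = np.random.default_rng(42)
mask = rng.random((200, 100)) < 0.05
R = mask * rng.integers(1, 6, size=(200, 100)).astype(float)
_, V_a = train_mf(R, seed=1)
_, V_b = train_mf(R, seed=2)
print(f"item neighborhood overlap across seeds: {neighbor_overlap(V_a, V_b):.3f}")
```

    Items with few or no observed ratings keep most of their random initialization, which is one way to see why the instability is stronger for less popular items.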

    A novel graph-based model for hybrid recommendations in cold-start scenarios

    Cold-start is a very common and still open problem in the Recommender Systems literature. Since cold-start items do not have any interactions, collaborative algorithms are not applicable. One of the main strategies is to use pure or hybrid content-based approaches, which usually yield lower recommendation quality than collaborative ones. Some techniques to optimize the performance of these approaches have been studied in the recent past. One of them, called feature weighting, assigns to every feature a real value, called a weight, that estimates its importance. Statistical techniques for feature weighting commonly used in Information Retrieval, like TF-IDF, have been adapted for Recommender Systems, but they often do not provide sufficient quality improvements. More recent approaches, FBSM and LFW, estimate weights by leveraging collaborative information via machine learning, in order to learn the importance of a feature based on other users' opinions. These models have shown promising results compared to the classic statistical analyses cited previously. We propose a novel graph-based, feature-based machine learning model to face the cold-start item scenario, learning the relevance of features from the probabilities of item-based collaborative filtering algorithms.
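    For reference, the classic statistical baseline mentioned in the abstract can be sketched as follows: TF-IDF weighting of a binary item-feature matrix, followed by a content-based item-item similarity that can score a cold-start item. This is not FBSM, LFW, or the proposed graph-based model; the data, smoothing choices, and names below are illustrative assumptions.

```python
# Sketch: TF-IDF feature weighting + content-based scoring of a cold-start item.
import numpy as np

def tfidf_weights(F):
    """F: (n_items, n_features) binary matrix. Returns a TF-IDF weighted copy."""
    n_items = F.shape[0]
    df = F.sum(axis=0)                                     # feature document frequency
    idf = np.log((n_items + 1) / (df + 1)) + 1             # smoothed IDF
    tf = F / np.maximum(F.sum(axis=1, keepdims=True), 1)   # feature frequency per item
    return tf * idf

def content_similarity(F_weighted):
    """Cosine similarity between item feature vectors."""
    norms = np.linalg.norm(F_weighted, axis=1, keepdims=True)
    X = F_weighted / np.maximum(norms, 1e-12)
    return X @ X.T

# Toy example: 6 items, 5 features; the last item is cold (no interactions),
# so its score comes purely from content similarity to items the user liked.
F = np.array([
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 1, 0, 1],   # cold-start item
], dtype=float)
S = content_similarity(tfidf_weights(F))
user_profile = np.array([1, 1, 0, 0, 0, 0], dtype=float)  # items the user interacted with
scores = user_profile @ S
print("score for the cold-start item:", scores[5])
```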

    Analyzing and improving stability of matrix factorization for recommender systems

    Thanks to their flexibility and scalability, collaborative embedding-based models are widely employed for the top-N recommendation task. Their goal is to jointly represent users and items in a common low-dimensional embedding space where users are represented close to items for which they expressed a positive preference. The training procedure of these techniques is influenced by several sources of randomness that can have a strong impact on the embeddings learned by the models. In this paper we analyze this impact on Matrix Factorization (MF). In particular, we focus on the effects of training the same model on the same data, but with different initial values for the latent representations of users and items. We perform several experiments employing three well-known MF implementations over five datasets. We show that different random initializations lead the same MF technique to generate very different latent representations and recommendation lists. We refer to these inconsistencies as instability of representations and instability of recommendations, respectively. We report that the stability of item representations is positively correlated with the accuracy of the model. We show that the stability issues also affect the items for which the recommender correctly predicts positive preferences. Moreover, we highlight that the effect is stronger for less popular items. To overcome these drawbacks, we present a generalization of MF called Nearest Neighbors Matrix Factorization (NNMF). The new framework learns the embedding of each user and item as a weighted linear combination of the representations of the respective nearest neighbors. This strategy propagates the information about items and users to their neighbors as well, and allows the embeddings of users and items with few interactions to be supported by a larger amount of information. To empirically demonstrate the advantages of the new framework, we provide a detailed description of the NNMF variants of three common MF techniques. We show that NNMF models, compared to their MF counterparts, largely improve the stability of both representations and recommendations, obtain higher and more stable accuracy, especially on long-tail items, and reach convergence in a fraction of the epochs.
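    A minimal sketch of the core idea stated in the abstract: composing each item embedding as a weighted linear combination of the latent factors of its nearest neighbors (including the item itself as its own neighbor). In NNMF the neighbor sets, weights, and base factors are part of the learned model; everything below is a placeholder illustration, not the authors' implementation.

```python
# Sketch: compose item embeddings from the factors of their nearest neighbors.
import numpy as np

def nnmf_item_embeddings(V, neighbors, weights):
    """
    V:         (n_items, k) base latent factors.
    neighbors: (n_items, m) indices of the m nearest neighbors of each item.
    weights:   (n_items, m) combination weights (rows sum to 1).
    Returns the composed (n_items, k) embeddings.
    """
    # weights[i, j] scales the factors of the j-th neighbor of item i.
    return np.einsum('im,imk->ik', weights, V[neighbors])

rng = np.random.default_rng(0)
n_items, k, m = 8, 4, 3
V = rng.normal(size=(n_items, k))
# Hypothetical neighbor lists (e.g. taken from an item-item similarity), incl. self.
neighbors = np.stack([np.arange(n_items),
                      (np.arange(n_items) + 1) % n_items,
                      (np.arange(n_items) + 2) % n_items], axis=1)
weights = np.full((n_items, m), 1.0 / m)   # uniform combination, for the sketch only
E = nnmf_item_embeddings(V, neighbors, weights)
print(E.shape)  # (8, 4): each row mixes the factors of the item and its neighbors
```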

    Lightweight and Scalable Model for Tweet Engagements Predictions in a Resource-constrained Environment

    In this paper we provide an overview of the approach we used as team Trial&Error for the ACM RecSys Challenge 2021. The competition, organized by Twitter, addresses the problem of predicting different categories of user engagements (Like, Reply, Retweet and Retweet with Comment), given a dataset of previous interactions on the Twitter platform. Our proposed method relies on efficiently leveraging the massive amount of data, crafting a wide variety of features, and designing a lightweight solution. This results in a significant reduction of computational resource requirements during both the training and inference phases. The final model, an optimized LightGBM, allowed our team to reach the 4th position on the final leaderboard and to rank 1st among the academic teams.
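    A minimal sketch of the kind of model described: one LightGBM binary classifier per engagement type (here a single "Like" target), trained on hand-crafted features via the standard LightGBM scikit-learn interface. The synthetic data, feature meanings, and hyperparameters are assumptions for illustration, not the challenge dataset or the team's actual feature set.

```python
# Sketch: LightGBM binary classifier for a single engagement target.
import numpy as np
import lightgbm as lgb  # requires the lightgbm package

rng = np.random.default_rng(0)
n = 10_000
X = np.column_stack([
    rng.integers(0, 5000, n),      # e.g. follower-count bucket of the tweet author
    rng.integers(0, 10, n),        # e.g. number of hashtags / media in the tweet
    rng.random(n),                 # e.g. historical like-rate of the engaging user
])
y = (rng.random(n) < 0.1).astype(int)   # binary "Like" engagement label

model = lgb.LGBMClassifier(
    n_estimators=200,
    learning_rate=0.05,
    num_leaves=31,
)
model.fit(X, y)
like_probability = model.predict_proba(X[:5])[:, 1]  # predicted engagement probabilities
print(like_probability)
```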

    From Data Analysis to Intent-based Recommendation: an Industrial Case Study in the Video Domain

    This work presents a comprehensive study, from an industrial perspective, of the process between the collection of raw data and the generation of next-item recommendations in the domain of Video-on-Demand (VoD). Most research papers focus their efforts on analyzing recommender systems on already-processed datasets, but they do not face the same challenges that occur naturally in industry, e.g., processing raw interaction logs to create datasets for testing. This paper describes the whole process between data collection and recommendation, including cleaning, processing, feature engineering, session inference, and all the challenges posed by a dataset provided by an industrial player in the domain. A comparison of several intent-based recommendation techniques on the new dataset in the next-item recommendation task then follows, studying the impact of factors such as the session length and the number of previous sessions available for a user. The results show that taking advantage of the sequential data available in the dataset benefits recommendation quality, since deep learning algorithms for session-aware recommendation are consistently the most accurate recommenders. Lastly, a summary of the different challenges in the VoD domain is proposed, discussing the best algorithmic solutions found and proposing future research directions based on the results obtained.
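    One of the preprocessing steps mentioned above, session inference, is commonly done by splitting each user's interaction log on inactivity gaps. The sketch below assumes a simple (user_id, item_id, timestamp) schema and a 30-minute threshold; both are illustrative assumptions, not the pipeline actually used in the paper.

```python
# Sketch: infer session ids from a raw interaction log using an inactivity gap.
import pandas as pd

logs = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 2],
    "item_id":   [10, 11, 12, 20, 21],
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:05", "2024-01-01 12:00",
        "2024-01-01 09:00", "2024-01-01 09:10",
    ]),
})

GAP = pd.Timedelta(minutes=30)   # assumed inactivity threshold
logs = logs.sort_values(["user_id", "timestamp"])
# Start a new session whenever the user changes or the inactivity gap is exceeded.
new_session = (
    (logs["user_id"] != logs["user_id"].shift())
    | ((logs["timestamp"] - logs["timestamp"].shift()) > GAP)
)
logs["session_id"] = new_session.cumsum()
print(logs)
```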