180 research outputs found

    AutoSVD++: An Efficient Hybrid Collaborative Filtering Model via Contractive Auto-encoders

    Collaborative filtering (CF) has been successfully used to provide users with personalized products and services. However, dealing with the increasing sparseness of the user-item matrix remains a challenge. To tackle this issue, hybrid CF approaches, such as combining CF with content-based filtering and leveraging side information about users and items, have been extensively studied to enhance performance. However, most of these approaches depend on hand-crafted feature engineering, which is usually noise-prone and biased by different feature extraction and selection schemes. In this paper, we propose a new hybrid model that generalizes the contractive auto-encoder paradigm into the matrix factorization framework with good scalability and computational efficiency. The model jointly learns effective and compact representations of content information and leverages implicit user feedback to make accurate recommendations. Extensive experiments conducted over three large-scale real datasets indicate that the proposed approach outperforms the compared methods for item recommendation. Comment: 4 pages, 3 figure

    Benchmarking news recommendations: the CLEF NewsREEL use case

    The CLEF NewsREEL challenge is a campaign-style evaluation lab allowing participants to evaluate and optimize news recommender algorithms. The goal is to create an algorithm that is able to recommend news items that users would click, while respecting a strict time constraint. The lab challenges participants to either compete in a "living lab" (Task 1) or perform an evaluation that replays recorded streams (Task 2). In this report, we discuss the objectives and challenges of the NewsREEL lab, summarize last year's campaign, and outline the main research challenges that can be addressed by participating in NewsREEL 2016.

    Why Do Cascade Sizes Follow a Power-Law?

    We introduce a random directed acyclic graph and use it to model the information diffusion network. Subsequently, we analyze the cascade generation model (CGM) introduced by Leskovec et al. [19]. Until now, this model has only been studied empirically. In this paper, we present the first theoretical proof that the sizes of cascades generated by the CGM follow a power-law distribution, which is consistent with multiple empirical analyses of large social networks. We compare the assumptions of our model with the Twitter social network and test the goodness of the approximation. Comment: 8 pages, 7 figures, accepted to WWW 201

    False Discovery Rate Controlled Heterogeneous Treatment Effect Detection for Online Controlled Experiments

    Online controlled experiments (a.k.a. A/B testing) have been used as the mantra for data-driven decision making on feature changes and product shipping in many Internet companies. However, it is still a great challenge to systematically measure how every code or feature change impacts millions of users with great heterogeneity (e.g. countries, ages, devices). The most commonly used A/B testing framework in many companies is based on the Average Treatment Effect (ATE), which cannot detect the heterogeneity of the treatment effect on users with different characteristics. In this paper, we propose statistical methods that can systematically and accurately identify the Heterogeneous Treatment Effect (HTE) of any user cohort of interest (e.g. mobile device type, country), and determine which factors (e.g. age, gender) of users contribute to the heterogeneity of the treatment effect in an A/B test. By applying these methods to both simulation data and real-world experimentation data, we show how they work robustly with a controlled low False Discovery Rate (FDR) and, at the same time, provide us with useful insights about the heterogeneity of identified user groups. We have deployed a toolkit based on these methods, and have used it to measure the Heterogeneous Treatment Effect of many A/B tests at Snap
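
The abstract above does not say which FDR-controlling procedure the authors use. As a rough illustration of what "controlled FDR" means when many cohorts are tested at once, here is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. The function name, the cohort labels, and the p-values are hypothetical and chosen only for this example; this is not presented as the paper's method.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, aligned with p_values, that is True
    where the corresponding null hypothesis is rejected.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical per-cohort p-values for a treatment-effect difference.
cohorts = ["country=US", "country=FR", "device=ios", "device=android"]
p_vals = [0.001, 0.04, 0.012, 0.30]

for cohort, flag in zip(cohorts, benjamini_hochberg(p_vals)):
    print(cohort, "heterogeneous effect flagged" if flag else "not flagged")
```

Note that a cohort with a raw p-value of 0.04 is not flagged here even though it is below 0.05: the threshold for each rank is scaled by the number of cohorts tested, which is how the procedure limits the expected share of false discoveries.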

    Affinity Uncertainty-based Hard Negative Mining in Graph Contrastive Learning

    Hard negative mining has been shown to be effective in enhancing self-supervised contrastive learning (CL) on diverse data types, including graph CL (GCL). Existing hardness-aware CL methods typically treat the negative instances that are most similar to the anchor instance as hard negatives, which helps improve CL performance, especially on image data. However, on graph data this approach often fails to identify the hard negatives and instead leads to many false negatives. This is mainly because the learned graph representations are not sufficiently discriminative, owing to oversmoothed representations and/or non-independent and identically distributed (non-i.i.d.) issues in graph data. To tackle this problem, this article proposes a novel approach that builds a discriminative model on collective affinity information (i.e., two sets of pairwise affinities between the negative instances and the anchor instance) to mine hard negatives in GCL. In particular, the proposed approach evaluates how confident/uncertain the discriminative model is about the affinity of each negative instance to an anchor instance to determine its hardness weight relative to the anchor instance. This uncertainty information is then incorporated into the existing GCL loss functions via a weighting term to enhance their performance. The enhanced GCL is theoretically grounded in that the resulting GCL loss is equivalent to a triplet loss with an adaptive margin that is exponentially proportional to the learned uncertainty of each negative instance. Extensive experiments on ten graph datasets show that our approach 1) consistently enhances different state-of-the-art (SOTA) GCL methods in both graph and node classification tasks and 2) significantly improves their robustness against adversarial attacks. Code is available at https://github.com/mala-lab/AUGCL. Comment: Accepted to TNNL
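
To make the idea of "a weighting term" in a contrastive loss concrete, the following is a minimal sketch of an InfoNCE-style loss for a single anchor in which each negative carries its own weight. It is an assumption-laden illustration, not the authors' implementation: the function name, the temperature default, and the way the weights are supplied are hypothetical, and the abstract does not describe how the uncertainty-based weights are actually computed.

```python
import math


def weighted_info_nce(pos_sim, neg_sims, neg_weights, temperature=0.5):
    """InfoNCE-style loss for one anchor with per-negative weights.

    pos_sim     -- similarity between the anchor and its positive
    neg_sims    -- similarities between the anchor and each negative
    neg_weights -- hypothetical hardness weights, one per negative
                   (placeholders for a learned uncertainty score)
    """
    if len(neg_sims) != len(neg_weights):
        raise ValueError("neg_sims and neg_weights must have equal length")

    pos_term = math.exp(pos_sim / temperature)
    # Each negative contributes to the denominator in proportion to its weight.
    neg_term = sum(
        w * math.exp(s / temperature) for s, w in zip(neg_sims, neg_weights)
    )
    return -math.log(pos_term / (pos_term + neg_term))


# With all weights equal to 1 this reduces to the usual InfoNCE loss.
uniform = weighted_info_nce(0.8, [0.6, 0.1], [1.0, 1.0])
# Up-weighting the more similar negative increases its share of the loss.
reweighted = weighted_info_nce(0.8, [0.6, 0.1], [2.0, 1.0])
print(uniform, reweighted)
```

Setting a weight above 1 makes that negative push harder against the anchor, while a weight near 0 effectively removes a suspected false negative from the loss.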