AutoSVD++: An Efficient Hybrid Collaborative Filtering Model via Contractive Auto-encoders
Collaborative filtering (CF) has been successfully used to provide users with
personalized products and services. However, dealing with the increasing
sparseness of the user-item matrix remains a challenge. To tackle this issue,
hybrid CF approaches, such as combining CF with content-based filtering and
leveraging side information about users and items, have been extensively
studied to enhance performance. However, most of these approaches depend on
hand-crafted feature engineering, which is usually noise-prone and biased by
the chosen feature extraction and selection schemes. In this paper, we propose
a new hybrid model that generalizes the contractive auto-encoder paradigm into
a matrix factorization framework with good scalability and computational
efficiency; it jointly models content information as effective, compact
representations and leverages implicit user feedback to make accurate
recommendations. Extensive experiments conducted on three large-scale real
datasets indicate that the proposed approach outperforms the compared methods
for item recommendation. (Comment: 4 pages, 3 figures)
Benchmarking news recommendations: the CLEF NewsREEL use case
The CLEF NewsREEL challenge is a campaign-style evaluation lab allowing participants to evaluate and optimize news recommender algorithms. The goal is to create an algorithm that is able to generate news items that users would click, respecting a strict time constraint. The lab challenges participants to compete in either a "living lab" (Task 1) or perform an evaluation that replays recorded streams (Task 2). In this report, we discuss the objectives and challenges of the NewsREEL lab, summarize last year's campaign, and outline the main research challenges that can be addressed by participating in NewsREEL 2016.
Why Do Cascade Sizes Follow a Power-Law?
We introduce a random directed acyclic graph and use it to model the
information diffusion network. Subsequently, we analyze the cascade generation
model (CGM) introduced by Leskovec et al. [19]. Until now, only empirical
studies of this model had been done. In this paper, we present the first
theoretical proof that the sizes of cascades generated by the CGM follow a
power-law distribution, which is consistent with multiple empirical analyses
of large social networks. We compared the assumptions of our model against the
Twitter social network and tested the goodness of the approximation. (Comment: 8 pages, 7 figures, accepted to WWW 201)
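The cascade process itself is easy to simulate. The toy sketch below is my own minimal reading of a CGM-style process — not the paper's exact parametrization — on a random DAG: each out-neighbor of an infected node is infected independently with a fixed probability, and the empirical size distribution is sharply skewed, with most cascades tiny and a few large.

```python
import random

random.seed(0)

def random_dag(n, p):
    """Random DAG: edge u -> v (for u < v) present independently with prob p."""
    return {u: [v for v in range(u + 1, n) if random.random() < p]
            for u in range(n)}

def cascade_size(dag, root, beta):
    """CGM-style cascade: starting from `root`, each out-neighbor of an
    infected node becomes infected independently with probability `beta`."""
    infected, frontier = {root}, [root]
    while frontier:
        u = frontier.pop()
        for v in dag[u]:
            if v not in infected and random.random() < beta:
                infected.add(v)
                frontier.append(v)
    return len(infected)

dag = random_dag(2000, 0.005)
sizes = sorted(cascade_size(dag, random.choice(range(2000)), 0.4)
               for _ in range(3000))
# Skewed sizes: the median cascade is tiny, the largest is much bigger.
print(sizes[len(sizes) // 2], sizes[-1])
```

Verifying an actual power-law exponent would require fitting the tail (e.g. maximum-likelihood estimation on the size histogram), which the paper does theoretically; this sketch only reproduces the qualitative heavy-tail behavior.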
False Discovery Rate Controlled Heterogeneous Treatment Effect Detection for Online Controlled Experiments
Online controlled experiments (a.k.a. A/B testing) have been used as the
mantra for data-driven decision making on feature changes and product shipping
in many Internet companies. However, it remains a great challenge to
systematically measure how every code or feature change impacts millions of
users with great heterogeneity (e.g. countries, ages, devices). The most
commonly used A/B testing framework in many companies is based on Average
Treatment Effect (ATE), which cannot detect the heterogeneity of treatment
effect on users with different characteristics. In this paper, we propose
statistical methods that can systematically and accurately identify
Heterogeneous Treatment Effect (HTE) of any user cohort of interest (e.g.
mobile device type, country), and determine which factors (e.g. age, gender) of
users contribute to the heterogeneity of the treatment effect in an A/B test.
By applying these methods on both simulation data and real-world
experimentation data, we show that they work robustly with a controlled, low
False Discovery Rate (FDR) and, at the same time, provide useful insights
about the heterogeneity of the identified user groups. We have deployed a
toolkit based on these methods, and have used it to measure the Heterogeneous
Treatment Effect of many A/B tests at Snap.
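The FDR-control component can be illustrated with the standard Benjamini-Hochberg procedure — a plausible building block for this setting, though the paper's exact method may differ. The cohort p-values below are made up: imagine one treatment-vs-control test per user segment (device type, country, etc.).

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg procedure: given per-cohort p-values (e.g. from a
    treatment-vs-control test in each user segment), return the indices of
    cohorts whose treatment effect is declared significant while controlling
    the false discovery rate at level `alpha`."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Compare the rank-th smallest p-value against its BH threshold.
        if pvals[i] <= alpha * rank / m:
            k = rank                 # largest rank that passes the threshold
    return sorted(order[:k])

# Hypothetical p-values for six user cohorts:
pvals = [0.001, 0.004, 0.19, 0.03, 0.8, 0.02]
print(benjamini_hochberg(pvals))     # → [0, 1, 3, 5]
```

Note that BH rejects every cohort up to the *largest* rank passing its threshold, so cohort 3 (p = 0.03) is kept even though 0.03 exceeds the rank-3 threshold — a detail naive per-cohort thresholding would miss.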
Affinity Uncertainty-based Hard Negative Mining in Graph Contrastive Learning
Hard negative mining has proven effective in enhancing self-supervised
contrastive learning (CL) on diverse data types, including graph CL (GCL).
Existing hardness-aware CL methods typically treat the negative instances most
similar to the anchor instance as hard negatives, which helps improve CL
performance, especially on image data. However, on graph data this approach
often fails to identify hard negatives and instead produces many false
negatives. This is mainly because the learned graph representations are not
sufficiently discriminative, owing to oversmoothing and/or non-independent and
identically distributed (non-i.i.d.) issues in graph data.
To tackle this problem, this article proposes a novel approach that builds a
discriminative model on collective affinity information (i.e., two sets of
pairwise affinities between the negative instances and the anchor instance) to
mine hard negatives in GCL. In particular, the proposed approach evaluates how
confident/uncertain the discriminative model is about the affinity of each
negative instance to an anchor instance to determine its hardness weight
relative to the anchor instance. This uncertainty information is then
incorporated into the existing GCL loss functions via a weighting term to
enhance their performance. The enhanced GCL is theoretically grounded: the
resulting GCL loss is equivalent to a triplet loss with an adaptive margin
that is exponentially proportional to the learned uncertainty of each negative
instance. Extensive experiments on ten graph datasets show that our approach
does the following: 1) consistently enhances different state-of-the-art (SOTA)
GCL methods in both graph and node classification tasks and 2) significantly
improves their robustness against adversarial attacks. Code is available at
https://github.com/mala-lab/AUGCL. (Comment: Accepted to TNNL)
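A minimal sketch of the weighting idea, assuming an InfoNCE-style loss and stand-in per-negative hardness weights. The paper derives its weights from a discriminative model over collective affinities; here the weights, embeddings, and function names are all synthetic placeholders, used only to show where a weighting term enters the loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_info_nce(anchor, positive, negatives, weights, tau=0.5):
    """InfoNCE-style contrastive loss in which each negative's similarity
    term is scaled by a hardness weight -- a stand-in for the
    affinity-uncertainty weights described in the abstract.
    weights[j] ~ how confidently negative j is a true (hard) negative."""
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(sim(anchor, positive) / tau)
    neg = sum(w * np.exp(sim(anchor, n) / tau)
              for n, w in zip(negatives, weights))
    return -np.log(pos / (pos + neg))

anchor = rng.normal(size=8)
positive = anchor + 0.1 * rng.normal(size=8)   # a perturbed view
negatives = [rng.normal(size=8) for _ in range(5)]

uniform = weighted_info_nce(anchor, positive, negatives, [1.0] * 5)
# Down-weighting suspected false negatives shrinks the denominator,
# so the loss (and its gradient pressure from those pairs) drops:
downweighted = weighted_info_nce(anchor, positive, negatives, [0.2] * 5)
print(uniform > downweighted)
```

In the paper's framing, the weights come from how uncertain a discriminative model is about each negative's affinity to the anchor, rather than from fixed constants as here.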