Graph Contrastive Learning with Multi-Objective for Personalized Product Retrieval in Taobao Search
In e-commerce search, personalized retrieval is a crucial technique for
improving user shopping experience. Recent works in this domain have achieved
significant improvements through the representation learning paradigm, e.g.,
embedding-based retrieval (EBR) and collaborative filtering (CF). EBR methods
do not sufficiently exploit the useful collaborative signal and struggle to
learn good representations of long-tail items. Graph-based CF methods
improve personalization by modeling collaborative signal within the user click
graph. However, existing graph-based methods ignore users' multiple behaviours,
such as click/purchase, and the relevance constraint between user behaviours and
items. In this paper, we propose a Graph Contrastive Learning with
Multi-Objective (GCL-MO) collaborative filtering model, which solves the
problems of weak relevance and incomplete personalization in e-commerce search.
Specifically, GCL-MO builds a homogeneous graph of items and then optimizes a
multi-objective function of personalization and relevance. Moreover, we propose
a modified contrastive loss for multi-objective graph learning, which avoids
the mutual suppression among positive samples and thus improves the
generalization and robustness of long-tail item representations. These learned
item embeddings are then used for personalized retrieval by constructing an
efficient offline-to-online inverted table. GCL-MO outperforms the online
collaborative filtering baseline on both offline and online experimental metrics
and shows a significant improvement in the online A/B testing of Taobao search.
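
The abstract does not give the form of the modified contrastive loss, so the following is only a minimal sketch of one common way to keep positive samples from suppressing each other: each positive is contrasted against the negatives alone, and the other positives are left out of its denominator. The function name, the dot-product similarity, and the `temperature` parameter are illustrative assumptions, not the authors' formulation.

```python
import math

def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))

def multi_positive_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss averaged over positives (hypothetical sketch).

    Each positive competes only with the negatives, so adding more
    positives never pushes down the score of an existing positive.
    """
    # Shared negative mass for the anchor.
    neg_sum = sum(math.exp(dot(anchor, n) / temperature) for n in negatives)
    losses = []
    for p in positives:
        pos_term = math.exp(dot(anchor, p) / temperature)
        # Other positives are deliberately excluded from the denominator.
        losses.append(-math.log(pos_term / (pos_term + neg_sum)))
    return sum(losses) / len(losses)
```

In this sketch, duplicating a positive leaves the loss unchanged, whereas a standard softmax over all candidates would make the two positives compete for probability mass.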
Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
We develop a learning principle and an efficient algorithm for batch learning
from logged bandit feedback. This learning setting is ubiquitous in online
systems (e.g., ad placement, web search, recommendation), where an algorithm
makes a prediction (e.g., ad ranking) for a given input (e.g., query) and
observes bandit feedback (e.g., user clicks on presented ads). We first address
the counterfactual nature of the learning problem through propensity scoring.
Next, we prove generalization error bounds that account for the variance of the
propensity-weighted empirical risk estimator. These constructive bounds give
rise to the Counterfactual Risk Minimization (CRM) principle. We show how CRM
can be used to derive a new learning method -- called Policy Optimizer for
Exponential Models (POEM) -- for learning stochastic linear rules for
structured output prediction. We present a decomposition of the POEM objective
that enables efficient stochastic gradient optimization. POEM is evaluated on
several multi-label classification problems showing substantially improved
robustness and generalization performance compared to the state-of-the-art.
Comment: 10 page
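
As a rough illustration of the idea summarized above, the sketch below computes a propensity-weighted empirical risk and adds a penalty proportional to the estimator's sample standard deviation. The clipping constant `clip`, the trade-off weight `lam`, and the function name are assumptions made for this example; it is not the POEM algorithm itself, which also covers the policy parametrization and the stochastic optimization.

```python
import math

def crm_objective(losses, new_probs, logged_propensities, clip=10.0, lam=1.0):
    """Variance-regularized, propensity-weighted risk (hypothetical sketch).

    losses[i]              : observed loss for the logged action i
    new_probs[i]           : probability the candidate policy assigns to that action
    logged_propensities[i] : probability the logging policy assigned to it
    """
    n = len(losses)
    # Importance-weighted losses, with weights clipped to limit variance.
    u = [d * min(clip, q / p)
         for d, q, p in zip(losses, new_probs, logged_propensities)]
    mean = sum(u) / n
    # Unbiased sample variance; defined as 0 for a single sample.
    var = sum((x - mean) ** 2 for x in u) / (n - 1) if n > 1 else 0.0
    # Empirical risk plus a penalty on the estimator's standard error.
    return mean + lam * math.sqrt(var / n)
```

Setting `lam=0.0` recovers the plain clipped propensity-weighted estimate, which makes it easy to see how much the variance term shifts the preference between candidate policies.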