An adversarial imitation click model for information retrieval
Modern information retrieval systems, including web search, ads placement, and recommender systems, typically rely on learning from user feedback. Click models, which study how users interact with a ranked list of items, provide a useful understanding of user feedback for learning ranking models. Constructing the "right" dependencies is the key to any successful click model. However, probabilistic graphical models (PGMs) have to rely on manually assigned dependencies, and oversimplify user behaviors. Existing neural network based methods improve on PGMs by enhancing the expressive ability and allowing flexible dependencies, but still suffer from exposure bias and inferior estimation. In this paper, we propose a novel framework, Adversarial Imitation Click Model (AICM), based on imitation learning. Firstly, we explicitly learn the reward function that recovers users' intrinsic utility and underlying intentions. Secondly, we model user interactions with a ranked list as a dynamic system instead of one-step click prediction, alleviating the exposure bias problem. Finally, we minimize the JS divergence through adversarial training and learn a stable distribution of click sequences, which makes AICM generalize well across different distributions of ranked lists. A theoretical analysis indicates that AICM reduces the exposure bias from O(T^2) to O(T). Our studies on a public web search dataset show that AICM not only outperforms state-of-the-art models in traditional click metrics but also achieves superior performance in addressing the exposure bias and recovering the underlying patterns of click sequences.
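For readers unfamiliar with the JS divergence mentioned in this abstract, the sketch below computes it for two discrete distributions over click sequences. It only illustrates the quantity itself. It is not AICM's adversarial training procedure, and the function name and the toy distributions are assumptions made for this example.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions.

    p and q are hypothetical probability vectors over the same set of
    click sequences, e.g. empirical frequencies of observed vs. generated
    click patterns. They are normalized here for safety.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # KL(a || b), skipping zero-probability entries of a (0 * log 0 = 0).
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy distributions over four possible click patterns (illustrative values).
observed = [0.4, 0.3, 0.2, 0.1]
generated = [0.25, 0.25, 0.25, 0.25]
print(js_divergence(observed, generated))
```

The divergence is zero when the two distributions coincide and is bounded above by log 2, which it reaches when they have disjoint support.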
Off-Policy Evaluation of Ranking Policies under Diverse User Behavior
Ranking interfaces are everywhere in online platforms. There is thus an
ever-growing interest in their Off-Policy Evaluation (OPE), which aims to
evaluate the performance of ranking policies accurately using logged data. A
de facto approach for OPE is Inverse Propensity Scoring (IPS), which provides
an unbiased and consistent value estimate. However, it becomes extremely
inaccurate in the ranking setup due to its high variance under large action
spaces. To deal with this problem, previous studies assume either independent
or cascade user behavior, yielding ranking-specific variants of IPS. While
these estimators are somewhat effective in reducing the variance, all existing
estimators apply a single universal assumption to every user, causing excessive
bias and variance. Therefore, this work explores a far more general formulation
where user behavior is diverse and can vary depending on the user context. We
show that the resulting estimator, which we call Adaptive IPS (AIPS), can be
unbiased under any complex user behavior. Moreover, AIPS achieves the minimum
variance among all unbiased estimators based on IPS. We further develop a
procedure to identify the appropriate user behavior model to minimize the mean
squared error (MSE) of AIPS in a data-driven fashion. Extensive experiments
demonstrate that the empirical accuracy improvement can be significant,
enabling effective OPE of ranking systems even under diverse user behavior.
Comment: KDD2023 Research trac
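As a rough illustration of the estimators this abstract refers to, the sketch below contrasts standard IPS, which reweights by the probability of the whole logged ranking, with the variant that assumes independent user behavior and reweights each position separately. The function names, array layouts, and toy numbers are assumptions made for this example; it is not the paper's AIPS estimator.

```python
import numpy as np

def standard_ips(pi_e_ranking, pi_b_ranking, rewards):
    """Standard IPS for rankings.

    pi_e_ranking, pi_b_ranking: shape (n,), probability of the whole logged
        ranking under the evaluation and logging policies (assumed known).
    rewards: shape (n, K), reward observed at each of K positions.
    """
    w = np.asarray(pi_e_ranking, dtype=float) / np.asarray(pi_b_ranking, dtype=float)
    return float(np.mean(w * np.asarray(rewards, dtype=float).sum(axis=1)))

def independent_ips(pi_e_pos, pi_b_pos, rewards):
    """IPS under the independence assumption.

    pi_e_pos, pi_b_pos: shape (n, K), marginal probability that the logged
        item appears at each position under each policy (assumed known).
    """
    w = np.asarray(pi_e_pos, dtype=float) / np.asarray(pi_b_pos, dtype=float)
    return float(np.mean((w * np.asarray(rewards, dtype=float)).sum(axis=1)))

# Toy logged data: n = 2 rankings of length K = 2 (illustrative values).
rewards = np.array([[1.0, 0.0], [0.0, 1.0]])
v_std = standard_ips([0.25, 0.5], [0.5, 0.25], rewards)
v_ind = independent_ips([[0.25, 0.5], [0.5, 1.0]],
                        [[0.5, 0.5], [0.5, 0.5]], rewards)
print(v_std, v_ind)
```

The ranking-level weight is a ratio of probabilities over all possible rankings, which is why its variance grows with the action space. The position-wise weights are ratios of marginals and are usually much less extreme, but they are only unbiased if users really do behave independently across positions.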
Understanding or Manipulation: Rethinking Online Performance Gains of Modern Recommender Systems
Recommender systems are expected to be assistants that help human users find
relevant information automatically without explicit queries. As recommender
systems evolve, increasingly sophisticated learning techniques are applied and
have achieved better performance in terms of user engagement metrics such as
clicks and browsing time. The increase in the measured performance, however,
can be attributed to two possible causes: a better understanding of user preferences,
and a more proactive ability to exploit human bounded rationality to induce
user over-consumption. A natural follow-up question is whether current
recommendation algorithms are manipulating user preferences. If so, can we
measure the manipulation level? In this paper, we present a general framework
for benchmarking the degree of manipulation by recommendation algorithms, in
both slate recommendation and sequential recommendation scenarios. The
framework consists of four stages: initial preference calculation, training
data collection, algorithm training and interaction, and metrics calculation,
which involves two proposed metrics. We benchmark some representative
recommendation algorithms on both synthetic and real-world datasets under the
proposed framework. We observe that a high online click-through rate does
not necessarily reflect a better understanding of users' initial preferences, but
can instead end up prompting users to choose more documents they did not initially favor.
Moreover, we find that the training data have a notable impact on the
degree of manipulation, and algorithms with more powerful modeling abilities are
more sensitive to this impact. The experiments also verify the usefulness of
the proposed metrics for measuring the degree of manipulation. We advocate
that future studies of recommendation algorithms be framed as an
optimization problem with constrained user preference manipulation.
Comment: 33 pages, 11 figures, 4 tables, ACM Transactions on Information
System