Life-long user behavior modeling, i.e., extracting a user's hidden interests
from rich historical behaviors spanning months or even years, plays a central role in
modern CTR prediction systems. Conventional algorithms mostly follow two
cascading stages: a simple General Search Unit (GSU) for fast and coarse search
over tens of thousands of long-term behaviors and an Exact Search Unit (ESU)
for effective Target Attention (TA) over the small number of finalists from
GSU. Although efficient, existing algorithms mostly suffer from a crucial
limitation: the \textit{inconsistent} target-behavior relevance metrics between
GSU and ESU. As a result, their GSU usually misses highly relevant behaviors
but retrieves ones considered irrelevant by ESU. In such cases, the TA in ESU,
no matter how attention is allocated, mostly deviates from the real user
interests and thus degrades the overall CTR prediction accuracy. To address
such inconsistency, we propose \textbf{TWo-stage Interest Network (TWIN)},
where our Consistency-Preserved GSU (CP-GSU) adopts the identical
target-behavior relevance metric as the TA in ESU, making the two stages twins.
Specifically, to break TA's computational bottleneck and extend it from ESU to
GSU, i.e., from behavior lengths of $10^2$ to $10^4$--$10^5$, we build a
novel attention mechanism via behavior feature splitting. For the inherent
(video-side) features of a behavior, we compute their linear projections with
efficient pre-computing \& caching strategies. For the user-item cross features,
we compress each into a one-dimensional bias term in the attention score
calculation, reducing the computational cost. The consistency between the two
stages, together with the effective TA-based relevance metric in CP-GSU,
contributes to significant performance gains in CTR prediction.
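
Below is a minimal NumPy sketch of the feature-split attention scoring described above, written from this abstract alone; all names (e.g., \texttt{twin\_style\_attention}, \texttt{k\_inherent\_proj}, \texttt{cross\_bias}) are hypothetical and not the authors' released code. It illustrates how the projections of the inherent features can be pre-computed and cached (they do not depend on the user or target), while each user-item cross feature enters only as a one-dimensional bias added to the attention score, keeping the metric cheap enough to score $10^4$--$10^5$ behaviors in CP-GSU.

\begin{verbatim}
import numpy as np

def twin_style_attention(q, k_inherent_proj, cross_bias, v):
    """Target attention with behavior feature splitting.

    q:               (d,)    projected query of the target item
    k_inherent_proj: (L, d)  cached projections of inherent features
    cross_bias:      (L,)    scalar bias per behavior from cross features
    v:               (L, dv) value projections of the behaviors
    """
    d = q.shape[0]
    # Inherent part: ordinary scaled dot-product attention; the (L, d)
    # projection is pre-computed offline and served from a cache.
    scores = k_inherent_proj @ q / np.sqrt(d)
    # Cross part: each cross feature is compressed to a single bias
    # term, avoiding any L x d projection at serving time.
    scores = scores + cross_bias
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v, scores

# Toy usage: 10,000 long-term behaviors, 32-dim keys, 16-dim values.
rng = np.random.default_rng(0)
L, d, dv = 10_000, 32, 16
out, scores = twin_style_attention(
    q=rng.normal(size=d),
    k_inherent_proj=rng.normal(size=(L, d)),  # cached in practice
    cross_bias=0.1 * rng.normal(size=L),
    v=rng.normal(size=(L, dv)),
)
# CP-GSU ranks behaviors by the same scores (e.g., top-100) before
# the ESU applies full TA over the finalists.
top_100 = np.argsort(scores)[-100:]
\end{verbatim}

Because one score function drives both the top-$k$ retrieval in CP-GSU and the attention weighting in ESU, the behaviors that the GSU keeps are exactly those the ESU would weight most heavily, which is the consistency the paper targets.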