663 research outputs found
UFIN: Universal Feature Interaction Network for Multi-Domain Click-Through Rate Prediction
Click-Through Rate (CTR) prediction, which aims to estimate the probability
of a user clicking on an item, is a key task in online advertising. Numerous
existing CTR models focus on modeling feature interactions within a single
domain, which makes them inadequate for multi-domain recommendation in real
industrial scenarios. Some
recent approaches propose intricate architectures to enhance knowledge sharing
and augment model training across multiple domains. However, these approaches
encounter difficulties when transferred to new recommendation domains,
owing to their reliance on the modeling of ID features (e.g., item id). To
address the above issue, we propose the Universal Feature Interaction Network
(UFIN) approach for CTR prediction. UFIN exploits textual data to learn
universal feature interactions that can be effectively transferred across
diverse domains. For learning universal feature representations, we regard the
text and feature as two different modalities and propose an encoder-decoder
network built on a Large Language Model (LLM) to transfer knowledge
from the text modality to the feature modality. Building upon this
foundation, we further develop a mixture-of-experts (MoE) enhanced adaptive
feature interaction model to learn transferable collaborative patterns across
multiple domains. Furthermore, we propose a multi-domain knowledge distillation
framework to enhance feature interaction learning. Based on the above methods,
UFIN can effectively bridge the semantic gap to learn common knowledge across
various domains, surpassing the constraints of ID-based models. Extensive
experiments conducted on eight datasets show the effectiveness of UFIN, in both
multi-domain and cross-platform settings. Our code is available at
https://github.com/RUCAIBox/UFIN
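The MoE-enhanced adaptive feature interaction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation; the expert parameterization, dimensions, and gating are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MoEInteraction:
    """Mixture-of-experts over feature-interaction experts (illustrative only).

    Each 'expert' is a simple linear map over the flattened feature
    embeddings; a softmax gate mixes the expert outputs per example, so
    different domains can weight experts differently.
    """
    def __init__(self, in_dim, out_dim, n_experts=4):
        self.experts = [rng.normal(0, 0.1, (in_dim, out_dim))
                        for _ in range(n_experts)]
        self.gate = rng.normal(0, 0.1, (in_dim, n_experts))

    def __call__(self, x):                      # x: (batch, in_dim)
        weights = softmax(x @ self.gate)        # (batch, n_experts)
        outs = np.stack([x @ W for W in self.experts], axis=1)
        return (weights[:, :, None] * outs).sum(axis=1)   # (batch, out_dim)

moe = MoEInteraction(in_dim=16, out_dim=8)
y = moe(rng.normal(size=(4, 16)))
print(y.shape)  # (4, 8)
```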
Unified Data Management and Comprehensive Performance Evaluation for Urban Spatial-Temporal Prediction [Experiment, Analysis & Benchmark]
The field of urban spatial-temporal prediction is advancing rapidly with the
development of deep learning techniques and the availability of large-scale
datasets. However, challenges persist in accessing and utilizing diverse urban
spatial-temporal datasets from different sources and stored in different
formats, as well as determining effective model structures and components with
the proliferation of deep learning models. This work addresses these challenges
and provides three significant contributions. Firstly, we introduce "atomic
files", a unified storage format designed for urban spatial-temporal big data,
and validate its effectiveness on 40 diverse datasets, simplifying data
management. Secondly, we present a comprehensive overview of technological
advances in urban spatial-temporal prediction models, guiding the development
of robust models. Thirdly, we conduct extensive experiments using diverse
models and datasets, establishing a performance leaderboard and identifying
promising research directions. Overall, this work effectively manages urban
spatial-temporal data, guides future efforts, and facilitates the development
of accurate and efficient urban spatial-temporal prediction models. It can
potentially make long-term contributions to urban spatial-temporal data
management and prediction, ultimately leading to improved urban living
standards.
Comment: 14 pages, 3 figures. arXiv admin note: text overlap with arXiv:2304.1434
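The idea of a unified storage format for heterogeneous spatial-temporal data can be illustrated with a small sketch. The record schema below is hypothetical, chosen only to show the normalization step; it is not the paper's actual atomic-file specification:

```python
import csv
import io

# Hypothetical unified record: every source dataset is normalized to rows of
# (entity_id, timestamp, lat, lon, value) before storage, so downstream
# models can load any dataset through one code path.
FIELDS = ["entity_id", "timestamp", "lat", "lon", "value"]

def to_unified(rows):
    """Serialize heterogeneous records into one CSV-based unified format."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for r in rows:
        writer.writerow({f: r.get(f, "") for f in FIELDS})
    return buf.getvalue()

sensor_rows = [
    {"entity_id": "sensor_1", "timestamp": "2023-01-01T00:00",
     "lat": 39.9, "lon": 116.4, "value": 42.0},
    {"entity_id": "sensor_2", "timestamp": "2023-01-01T00:00",
     "lat": 31.2, "lon": 121.5, "value": 17.5},
]
print(to_unified(sensor_rows).splitlines()[0])  # the shared header line
```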
EulerNet: Adaptive Feature Interaction Learning via Euler's Formula for CTR Prediction
Learning effective high-order feature interactions is crucial in the CTR
prediction task. However, it is very time-consuming to calculate high-order
feature interactions with massive features in online e-commerce platforms. Most
existing methods manually design a maximal order and further filter out the
useless interactions from them. Although they reduce the high computational
costs caused by the exponential growth of high-order feature combinations, they
still suffer from the degradation of model capability due to the suboptimal
learning of the restricted feature orders. Maintaining model capability while
keeping computation efficient is a technical challenge that has not been
adequately addressed. To address this issue, we propose an adaptive
feature interaction learning model, named EulerNet, in which the feature
interactions are learned in a complex vector space by conducting space mapping
according to Euler's formula. EulerNet converts the exponential powers of
feature interactions into simple linear combinations of the modulus and phase
of the complex features, making it possible to adaptively learn the high-order
feature interactions in an efficient way. Furthermore, EulerNet incorporates
the implicit and explicit feature interactions into a unified architecture,
which achieves mutual enhancement and largely boosts model capability. Such a
network can be fully learned from data, with no need for a pre-designed form
or order of feature interactions. Extensive experiments
conducted on three public datasets have demonstrated the effectiveness and
efficiency of our approach. Our code is available at:
https://github.com/RUCAIBox/EulerNet.
Comment: 10 pages, 7 figures, accepted for publication in SIGIR'2
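The core identity behind the complex-space mapping can be checked directly. For complex features x_j = r_j * e^(i*theta_j), the exponential interaction prod_j x_j^(a_j) has log-modulus sum_j a_j*log(r_j) and phase sum_j a_j*theta_j, i.e. a linear combination of moduli and phases. The sketch below fixes the interaction orders for illustration; in EulerNet they are learnable:

```python
import cmath
import math

# Two toy complex features and arbitrary (even fractional) interaction orders.
features = [complex(2.0, 1.0), complex(0.5, -1.5)]
orders = [1.5, 0.7]

# Linear combination of log-moduli and phases ...
log_r = sum(a * math.log(abs(x)) for a, x in zip(orders, features))
theta = sum(a * cmath.phase(x) for a, x in zip(orders, features))
# ... reassembled as r * e^(i*theta) via Euler's formula.
interaction = cmath.exp(complex(log_r, theta))

# Agrees with direct exponentiation (principal branch) up to float error.
direct = features[0] ** orders[0] * features[1] ** orders[1]
print(abs(interaction - direct) < 1e-9)  # True
```

This is why the model can learn high-order, even fractional, interaction orders with only linear-cost operations on the modulus and phase.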
Dense Text Retrieval based on Pretrained Language Models: A Survey
Text retrieval is a long-standing research topic on information seeking,
where a system is required to return relevant information resources in
response to users' natural language queries. From classic retrieval methods to
learning-based ranking functions, the underlying retrieval models have
continually evolved with ongoing technical innovation. To design effective
retrieval models, a key point lies in how to learn the text representation and
model the relevance matching. The recent success of pretrained language models
(PLMs) sheds light on developing more capable text retrieval approaches by
leveraging the excellent modeling capacity of PLMs. With powerful PLMs, we can
effectively learn the representations of queries and texts in the latent
representation space, and further construct the semantic matching function
between the dense vectors for relevance modeling. Such a retrieval approach is
referred to as dense retrieval, since it employs dense vectors (a.k.a.
embeddings) to represent the texts. Considering the rapid progress on dense
retrieval, in this survey, we systematically review the recent advances on
PLM-based dense retrieval. Different from previous surveys on dense retrieval,
we take a new perspective to organize the related work by four major aspects,
including architecture, training, indexing and integration, and summarize the
mainstream techniques for each aspect. We thoroughly survey the literature, and
include 300+ related reference papers on dense retrieval. To support our
survey, we create a website that provides useful resources, and release a code
repository and toolkit for implementing dense retrieval models. This survey aims
to provide a comprehensive, practical reference focused on the major progress
for dense text retrieval.
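The dense retrieval pipeline the survey covers can be sketched in a few lines: encode queries and documents into one vector space, then rank by inner product. The hash-based "encoder" below is a deterministic stand-in for a PLM bi-encoder, used purely to illustrate the retrieval step:

```python
import zlib

import numpy as np

DIM = 64

def token_vec(tok):
    """Deterministic pseudo-embedding for a token (stand-in for a PLM)."""
    return np.random.default_rng(zlib.crc32(tok.encode())).normal(size=DIM)

def embed(text):
    """Encode text as a unit-length dense vector by summing token vectors.
    A real dense retriever would use a pretrained bi-encoder here."""
    v = sum((token_vec(t) for t in text.lower().split()), np.zeros(DIM))
    return v / (np.linalg.norm(v) + 1e-9)

docs = ["dense retrieval with embeddings",
        "classic boolean keyword search",
        "a story about cats and dogs"]
doc_vecs = np.stack([embed(d) for d in docs])
query_vec = embed("retrieval using dense embeddings")

scores = doc_vecs @ query_vec      # dot product of unit vectors = cosine
ranking = np.argsort(-scores)
print(docs[ranking[0]])
```

In production, the matrix-vector scoring step is replaced by an approximate nearest-neighbor index over the precomputed document embeddings.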
MVP: Multi-task Supervised Pre-training for Natural Language Generation
Pre-trained language models (PLMs) have achieved remarkable success in
natural language generation (NLG) tasks. Up to now, most NLG-oriented PLMs are
pre-trained in an unsupervised manner on large-scale general corpora.
Meanwhile, an increasing number of models pre-trained with labeled data
(i.e. "supervised pre-training") showcase superior performance compared to
unsupervised pre-trained models. Motivated by the success of supervised
pre-training, we propose Multi-task superVised Pre-training (MVP) for natural
language generation. We collect a large-scale natural language generation
corpus, MVPCorpus, from datasets covering diverse NLG tasks. Then we
unify these examples into a general text-to-text format to pre-train the text
generation model MVP in a supervised manner. For each task, we further
pre-train specific soft prompts to stimulate the model's capacity to perform a
specific task. Our MVP model can be seen as an application of recent
instruction tuning to relatively small PLMs. Extensive experiments have
demonstrated the effectiveness and generality of our MVP model across a
variety of NLG tasks, where it achieves state-of-the-art performance on many
of the evaluated datasets, outperforming BART and Flan-T5.
Comment: Accepted by ACL 202
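The "general text-to-text format" used for supervised pre-training can be illustrated with a toy converter. The task prompts and field names below are hypothetical; MVPCorpus's real templates may differ:

```python
def to_text2text(task, example):
    """Unify a labeled example from any NLG task into a (source, target)
    text pair, prefixing a task instruction so one model handles all tasks."""
    prompts = {
        "summarization": "Summarize: ",
        "data_to_text": "Describe the table: ",
        "question_generation": "Generate a question: ",
    }
    return prompts[task] + example["input"], example["output"]

src, tgt = to_text2text(
    "summarization",
    {"input": "Long article text ...", "output": "Short summary."},
)
print(src)  # Summarize: Long article text ...
```

Once every task is in this shared format, a single encoder-decoder model can be pre-trained on the mixture, with per-task soft prompts tuned afterwards.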