Pareto-based Multi-Objective Recommender System with Forgetting Curve
Recommender systems with cascading architectures play an increasingly
significant role in online recommendation platforms, where the approach to
dealing with negative feedback is a vital issue. For instance, on short-video
platforms, users quickly skip candidates they find aversive, and recommender
systems are expected to take this explicit negative feedback into account and
adjust to avoid similar recommendations.
Considering the recency effect in memory, we propose a forgetting model based
on the Ebbinghaus forgetting curve to cope with negative feedback. In addition,
we introduce a Pareto optimization solver to guarantee a better trade-off
between recency and model performance. Finally, we propose the Pareto-based
Multi-Objective Recommender System with forgetting curve (PMORS), which can be
applied to any multi-objective recommendation task and shows clear superiority
when facing explicit negative feedback. We have evaluated PMORS in short-video
scenarios on both a public dataset and an industrial dataset, with favorable
outcomes. Since being deployed on the online short-video platform WeChat
Channels in May 2023, PMORS has not only demonstrated promising results for
both consistency and recency but also achieved an improvement of up to +1.45%
in GMV.
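The abstract does not spell out PMORS's forgetting model, but a minimal sketch of an Ebbinghaus-style decay applied to negative feedback might look like this (Python; the `penalized_score` combination rule and the `penalty`/`strength` knobs are illustrative assumptions, not the paper's values):

```python
import math

def forgetting_weight(hours_since: float, strength: float = 24.0) -> float:
    """Ebbinghaus-style retention R(t) = exp(-t / S): memory of a negative
    event decays exponentially, so recent skips weigh more than old ones."""
    return math.exp(-hours_since / strength)

def penalized_score(base_score: float, negative_event_ages_h: list,
                    penalty: float = 0.5, strength: float = 24.0) -> float:
    """Downweight a candidate's ranking score by its decayed negative memory.
    Hypothetical combination rule; the paper balances recency against model
    performance with a Pareto solver rather than a fixed penalty weight."""
    memory = sum(forgetting_weight(t, strength) for t in negative_event_ages_h)
    return base_score - penalty * memory

# A candidate video the user skipped 2 hours ago and again 40 hours ago:
print(penalized_score(0.9, [2.0, 40.0]))
```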
Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters
Recently, Large Language Models (LLMs) have achieved impressive zero-shot
learning performance on a variety of Natural Language Processing (NLP) tasks,
especially text generation tasks. However, their large size often leads to high
computational costs for model training and online deployment. In our
work, we present ALTER, a system that effectively builds the multi-tAsk
Learners with mixTure-of-task-adaptERs upon small language models (with <1B
parameters) to address multiple NLP tasks simultaneously, capturing the
commonalities and differences between tasks, in order to support
domain-specific applications. Specifically, in ALTER, we propose the
Mixture-of-Task-Adapters (MTA) module as an extension to the transformer
architecture for the underlying model to capture the intra-task and inter-task
knowledge. A two-stage training method is further proposed to optimize the
collaboration between adapters at a small computational cost. Experimental
results over a mixture of NLP tasks show that our proposed MTA architecture and
the two-stage training method achieve good performance. Based on ALTER, we have
also produced MTA-equipped language models for various domains.
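The abstract leaves the MTA internals open; the PyTorch sketch below shows one plausible reading, with a shared adapter capturing inter-task knowledge and per-task adapters capturing intra-task knowledge. The bottleneck size, the two-way gate, and all class names are assumptions:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, d_model: int, d_bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

class MixtureOfTaskAdapters(nn.Module):
    """Blend a shared adapter (inter-task knowledge) with a task-specific
    adapter (intra-task knowledge) via a learned, task-conditioned gate."""
    def __init__(self, d_model: int, num_tasks: int):
        super().__init__()
        self.shared = Adapter(d_model)
        self.per_task = nn.ModuleList([Adapter(d_model) for _ in range(num_tasks)])
        self.gate = nn.Embedding(num_tasks, 2)  # weights for [shared, specific]

    def forward(self, x, task_id: int):
        w = torch.softmax(self.gate.weight[task_id], dim=-1)
        return w[0] * self.shared(x) + w[1] * self.per_task[task_id](x)

# Inserted after a transformer layer's output in a small (<1B) backbone:
mta = MixtureOfTaskAdapters(d_model=768, num_tasks=3)
out = mta(torch.randn(2, 16, 768), task_id=1)  # (batch, seq_len, hidden)
```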
Automatically Extracting Information in Medical Dialogue: Expert System And Attention for Labelling
Medical dialogue information extraction is becoming an increasingly
significant problem in modern medical care. It is difficult to extract key
information from electronic medical records (EMRs) due to their sheer volume.
Previously, researchers proposed attention-based models for retrieving features
from EMRs, but these models are limited by their inability to distinguish
different categories in medical dialogues. In this paper, we propose a novel
model, Expert System and Attention for Labelling (ESAL). We use a mixture of
experts and pre-trained BERT to retrieve the semantics of different categories,
enabling the model to fuse the differences between them. In our experiment,
ESAL was applied to a public dataset, and the experimental results indicated
that ESAL significantly improved the performance of Medical Information
Classification.
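Under the assumption that "mixture of experts" here means one expert per dialogue category gated over pre-trained BERT features, a minimal sketch could look as follows (PyTorch; the layer sizes, the single-gate fusion, and reusing one category count for both experts and labels are all assumptions):

```python
import torch
import torch.nn as nn

class ExpertMoELabeller(nn.Module):
    """One expert per category; a gate attends over expert outputs so the
    model can fuse category-specific views of the same utterance."""
    def __init__(self, d_bert: int = 768, num_categories: int = 4,
                 d_hidden: int = 256):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_bert, d_hidden), nn.ReLU())
            for _ in range(num_categories)
        ])
        self.gate = nn.Linear(d_bert, num_categories)    # attention over experts
        self.classifier = nn.Linear(d_hidden, num_categories)

    def forward(self, cls_embedding):
        # cls_embedding: (batch, d_bert), e.g. pre-trained BERT's [CLS] vector.
        weights = torch.softmax(self.gate(cls_embedding), dim=-1)      # (B, E)
        outputs = torch.stack([e(cls_embedding) for e in self.experts],
                              dim=1)                                   # (B, E, H)
        fused = (weights.unsqueeze(-1) * outputs).sum(dim=1)           # (B, H)
        return self.classifier(fused)

# Random stand-in for a batch of BERT [CLS] embeddings:
logits = ExpertMoELabeller()(torch.randn(8, 768))
```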
Curriculum Modeling the Dependence among Targets with Multi-task Learning for Financial Marketing
Multi-task learning for various real-world applications usually involves
tasks with logical sequential dependence. For example, in online marketing, the
cascade behavior pattern of impression → click → conversion is usually modeled
as multiple tasks in a multi-task manner, where the sequential dependence
between tasks is simply connected through an explicitly defined function or
implicitly transferred information in current works.
methods alleviate the data sparsity problem for long-path sequential tasks as
positive feedback becomes sparser along the task sequence. However,
error accumulation and negative transfer become severe problems for downstream
tasks. In particular, at the beginning of training, the parameters of former
tasks have not yet converged, and thus the information transferred to
downstream tasks is negative. In this paper, we
propose a prior information merged model (PIMM), which explicitly models the
logical dependence among tasks with a novel prior information merged (PIM)
module for multiple sequential dependence task learning in a
curriculum manner. Specifically, during training, the PIM module uses a soft
sampling strategy to randomly select either the true label or the prior task's
prediction to transfer to the downstream task. Following an
easy-to-difficult curriculum paradigm, we dynamically adjust the sampling
probability so that the downstream task receives effective information as
training progresses. The offline experimental results on both
public and product datasets verify that PIMM outperforms state-of-the-art
baselines. Moreover, we deploy PIMM on a large-scale FinTech platform, and the
online experiments also demonstrate its effectiveness.
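The core of the PIM module, soft sampling between the upstream task's true label and its own prediction under an easy-to-difficult schedule, can be sketched in a few lines (Python; the linear decay schedule is an assumption, since the abstract does not give the paper's curriculum):

```python
import torch

def pim_transfer(true_label: torch.Tensor, prior_pred: torch.Tensor,
                 step: int, total_steps: int) -> torch.Tensor:
    """Soft-sample the signal passed to the downstream task: early in training
    (when the upstream task has not converged) mostly feed its ground-truth
    label; later, increasingly feed its own prediction."""
    p_label = max(0.0, 1.0 - step / total_steps)  # assumed linear schedule
    use_label = torch.rand(true_label.shape) < p_label
    return torch.where(use_label, true_label, prior_pred)

# Upstream click labels vs. the upstream model's click predictions:
labels = torch.tensor([1.0, 0.0, 1.0])
preds = torch.tensor([0.8, 0.3, 0.6])
signal = pim_transfer(labels, preds, step=200, total_steps=1000)
```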
Gradient Coordination for Quantifying and Maximizing Knowledge Transference in Multi-Task Learning
Multi-task learning (MTL) has been widely applied in online advertising and
recommender systems. To address the negative transfer issue, recent studies
have proposed optimization methods that focus on aligning gradient directions
or magnitudes. However, since prior studies have shown that both general and
specific knowledge must coexist in the limited shared capacity, overemphasizing
gradient alignment may crowd out task-specific knowledge, and vice versa. In
this paper, we propose CoGrad, a transference-driven approach
that adaptively maximizes knowledge transference via Coordinated Gradient
modification. We explicitly quantify the transference as loss reduction from
one task to another, and then derive an auxiliary gradient from optimizing it.
We perform the optimization by incorporating this gradient into the original task
gradients, making the model automatically maximize inter-task transfer and
minimize individual losses. Thus, CoGrad can harmonize between general and
specific knowledge to boost overall performance. Besides, we introduce an
efficient approximation of the Hessian matrix, making CoGrad computationally
efficient and simple to implement. Both offline and online experiments verify
that CoGrad significantly outperforms previous methods.
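To first order, the transference of task i onto task j (the loss reduction on j after a step along task i's gradient) is proportional to the dot product of their gradients, and maximizing it requires Hessian-vector products. The toy sketch below (PyTorch) uses a finite-difference Hessian-vector product in place of the paper's own Hessian approximation, whose form the abstract does not give; everything here is illustrative:

```python
import torch

# Toy setup: one shared parameter vector and two quadratic task losses.
theta = torch.randn(4, requires_grad=True)

def task_losses(p):
    return [(p - 1.0).pow(2).sum(), (p + 0.5).pow(2).sum()]

def cograd_update(lr=0.05, beta=0.1, eps=1e-3):
    """One coordinated step: minimize both losses while nudging the parameters
    toward higher inter-task transfer (larger g_0 . g_1)."""
    g = [torch.autograd.grad(L, theta, retain_graph=True)[0]
         for L in task_losses(theta)]

    def hvp(v, task):
        # H_task @ v  ≈  (∇L_task(θ + εv) − ∇L_task(θ)) / ε
        theta_eps = (theta + eps * v).detach().requires_grad_(True)
        g_eps = torch.autograd.grad(task_losses(theta_eps)[task], theta_eps)[0]
        return (g_eps - g[task]) / eps

    # ∇_θ (g_0 . g_1) = H_0 g_1 + H_1 g_0: the transference-increasing direction.
    aux = hvp(g[1], 0) + hvp(g[0], 1)
    with torch.no_grad():
        theta -= lr * (g[0] + g[1] - beta * aux)

for _ in range(100):
    cograd_update()
print([round(float(L), 4) for L in task_losses(theta)])
```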
Reweighting Clicks with Dwell Time in Recommendation
Click behavior is the most widely used form of positive user feedback in
recommendation. However, treating every click equally in training is vulnerable
to clickbait and title-content mismatch, and thus fails to precisely capture
users' real satisfaction with items. Dwell time can be viewed as a high-quality
quantitative indicator of user preference for each click, yet existing
recommendation models do not fully explore the modeling of dwell
time. In this work, we focus on reweighting clicks with dwell time in
recommendation. Specifically, we first define a new behavior named valid read,
which helps to select high-quality click instances for different users and
items via dwell time. Next, we propose a normalized dwell time function to
reweight click signals in training, which better guides the model toward
high-quality and efficient reading. The click reweighting model achieves
significant improvements in both offline and online evaluations in a
real-world system.
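A minimal sketch of the two ingredients, a valid-read filter and a normalized dwell-time weight for the per-click loss, might look like this (Python; the per-user normalization, sigmoid squashing, clipping range, and the valid-read threshold are all assumptions rather than the paper's definitions):

```python
import math

def is_valid_read(dwell_sec: float, item_length_chars: int,
                  min_sec_per_100_chars: float = 1.0) -> bool:
    """A click counts as a 'valid read' only if the user stayed long enough
    relative to the item's length (hypothetical threshold)."""
    return dwell_sec >= min_sec_per_100_chars * item_length_chars / 100.0

def normalized_dwell_weight(dwell_sec: float, user_mean: float, user_std: float,
                            lo: float = 0.5, hi: float = 2.0) -> float:
    """Map a click's dwell time to a training weight: normalize per user
    (reading speeds differ), squash with a sigmoid, and keep the weight in
    (lo, hi) so no click is discarded or allowed to dominate."""
    z = (dwell_sec - user_mean) / max(user_std, 1e-6)
    return lo + (hi - lo) / (1.0 + math.exp(-z))

# A 45-second click from a user who averages 30s (std 10s) on a 3000-char item:
if is_valid_read(45.0, item_length_chars=3000):
    print(normalized_dwell_weight(45.0, user_mean=30.0, user_std=10.0))
```

The returned weight would then multiply that click's loss term during training.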