116 research outputs found
Dynamic Learning of Sequential Choice Bandit Problem under Marketing Fatigue
Motivated by the observation that overexposure to unwanted marketing
activities leads to customer dissatisfaction, we consider a setting where a
platform offers a sequence of messages to its users and is penalized when users
abandon the platform due to marketing fatigue. We propose a novel sequential
choice model to capture multiple interactions taking place between the platform
and its user: upon receiving a message, a user chooses one of three actions:
accept the message, skip it and receive the next message, or abandon the
platform. Based on user feedback, the platform dynamically learns users'
abandonment distribution and their valuations of messages to determine the
length of the sequence and the order of the messages, while maximizing the
cumulative payoff over a horizon of length T. We refer to this online learning
task as the sequential choice bandit problem. For the offline combinatorial
optimization problem, we show that an efficient polynomial-time algorithm
exists. For the online problem, we propose an algorithm that balances
exploration and exploitation, and characterize its regret bound. Lastly, we
demonstrate how to extend the model with user contexts to incorporate
personalization.
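The three-way user response described above can be sketched as a short simulation. This is a minimal illustration, not the paper's method: the acceptance probabilities, abandonment probability, rewards, and penalty below are all hypothetical parameters.

```python
import random

def simulate_session(accept_probs, abandon_prob, rewards, rng=None):
    """Simulate one user session under the sequential choice model
    (hypothetical parameterization): for each message in the sequence,
    the user accepts it (session ends with a reward), abandons the
    platform (session ends with a penalty), or skips to the next one."""
    rng = rng or random.Random()
    PENALTY = 1.0  # assumed abandonment penalty
    for i, accept_prob in enumerate(accept_probs):
        u = rng.random()
        if u < accept_prob:
            return rewards[i]       # user accepts message i
        elif u < accept_prob + abandon_prob:
            return -PENALTY         # user abandons due to fatigue
        # otherwise the user skips and views the next message
    return 0.0                      # sequence exhausted without acceptance
```

The platform's online problem is then to learn these quantities from feedback and pick the sequence length and message order that maximize expected payoff over T such sessions.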
Zero-Resource Hallucination Prevention for Large Language Models
The prevalent use of large language models (LLMs) in various domains has
drawn attention to the issue of "hallucination," which refers to instances
where LLMs generate factually inaccurate or ungrounded information. Existing
techniques for hallucination detection in language assistants rely either on
intricate, fuzzy free-language-based chain-of-thought (CoT) techniques or on
parameter-based methods that suffer from interpretability issues. Additionally,
the methods that identify hallucinations post-generation cannot prevent
their occurrence and suffer from inconsistent performance due to the influence
of the instruction format and model style. In this paper, we introduce a novel
pre-detection self-evaluation technique, referred to as {\method}, which
focuses on evaluating the model's familiarity with the concepts present in the
input instruction and withholding response generation in the case of
unfamiliar concepts. This approach emulates the human ability to refrain from
responding to unfamiliar topics, thus reducing hallucinations. We validate
{\method} across four different large language models, demonstrating
consistently superior performance compared to existing techniques. Our findings
propose a significant shift towards preemptive strategies for hallucination
mitigation in LLM assistants, promising improvements in reliability,
applicability, and interpretability.
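The pre-detection idea above can be illustrated with a small sketch. The concept extractor, familiarity scorer, and generator here are hypothetical stand-ins, not the paper's actual components:

```python
def guarded_generate(instruction, extract_concepts, familiarity, generate,
                     threshold=0.5):
    """Pre-detection self-evaluation sketch: before generating, score the
    model's familiarity with each concept in the instruction and withhold
    the response if any concept falls below an assumed threshold."""
    for concept in extract_concepts(instruction):
        if familiarity(concept) < threshold:
            # unfamiliar concept: refuse rather than risk hallucinating
            return f"I am not familiar enough with '{concept}' to answer."
    # all concepts look familiar: proceed with normal generation
    return generate(instruction)
```

In the actual technique the familiarity signal would come from the model itself; the point of the sketch is only the control flow, i.e. that generation is gated before it starts rather than checked afterwards.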
Fatigue-aware Bandits for Dependent Click Models
As recommender systems send a massive amount of content to keep users
engaged, users may experience fatigue, driven by 1) overexposure to
irrelevant content and 2) boredom from seeing too many similar
recommendations. To address this problem, we consider an online learning
setting where a platform learns a policy to recommend content that takes user
fatigue into account. We propose an extension of the Dependent Click Model
(DCM) to describe users' behavior. We stipulate that for each piece of content,
its attractiveness to a user depends on its intrinsic relevance and a discount
factor that measures how much similar content has already been shown. Users view the
recommended content sequentially and click on the ones that they find
attractive. Users may leave the platform at any time, and the probability of
exiting is higher when they do not like the content. Based on users' feedback,
the platform learns the relevance of the underlying content as well as the
discounting effect due to content fatigue. We refer to this learning task as
the "fatigue-aware DCM bandit" problem. We consider two learning scenarios
depending on whether the discounting effect is known. For each scenario, we
propose a learning algorithm which simultaneously explores and exploits, and
characterize its regret bound.
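The fatigue discount described above can be written as a one-line model. The multiplicative form and the value of gamma are assumptions for illustration; the paper's exact functional form may differ:

```python
def attractiveness(relevance, num_similar_shown, gamma=0.8):
    """Fatigue-discounted attractiveness: each similar item already shown
    multiplies the intrinsic relevance by gamma (0 < gamma <= 1), so
    repeated similar content becomes progressively less clickable."""
    return relevance * (gamma ** num_similar_shown)
```

For example, with gamma = 0.5 the third similar item shown is only a quarter as attractive as the first, which is exactly the boredom effect the model is meant to capture.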
A Benchmark Dataset for Understandable Medical Language Translation
In this paper, we introduce MedLane -- a new human-annotated Medical Language
translation dataset, to align professional medical sentences with
layperson-understandable expressions. The dataset contains 12,801 training
samples, 1,015 validation samples, and 1,016 testing samples. We then evaluate
one naive and six deep learning-based approaches on the MedLane dataset,
including direct copying, the statistical machine translation system Moses,
four neural machine translation approaches (i.e., the proposed PMBERT-MT model,
Seq2Seq and its two variants), and a modified text summarization model
PointerNet. To compare the results, we utilize eleven metrics, including three
new measures specifically designed for this task. Finally, we discuss the
limitations of MedLane and baselines, and point out possible research
directions for this task.
Microwave-induced phase escape in a Josephson tunnel junction
This is the published version, also available at http://dx.doi.org/10.1103/PhysRevB.77.104531.
We perform both theoretical and experimental investigations of the phase escape of a current-biased Josephson tunnel junction under microwave irradiation. The switching current distributions exhibit rich nonlinear behaviors depending on the power and frequency of the applied microwave. We present a model to describe the behavior of the primary peak in the switching current distribution, which is confirmed by our experimental results. The obtained features can be used to characterize the damping parameter of Josephson junctions.