The Assistive Multi-Armed Bandit
Learning preferences implicit in the choices humans make is a well studied
problem in both economics and computer science. However, most work makes the
assumption that humans are acting (noisily) optimally with respect to their
preferences. Such approaches can fail when people are themselves learning about
what they want. In this work, we introduce the assistive multi-armed bandit,
where a robot assists a human playing a bandit task to maximize cumulative
reward. In this problem, the human does not know the reward function but can
learn it through the rewards received from arm pulls; the robot only observes
which arms the human pulls but not the reward associated with each pull. We
offer sufficient and necessary conditions for successfully assisting the human
in this framework. Surprisingly, better human performance in isolation does not
necessarily lead to better performance when assisted by the robot: a human
policy can do better by effectively communicating its observed rewards to the
robot. We conduct proof-of-concept experiments that support these results. We
see this work as contributing towards a theory behind algorithms for
human-robot interaction. Comment: Accepted to HRI 201
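The observation model in the abstract above (the human sees rewards, the robot sees only which arms are pulled) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm: the epsilon-greedy human, the Bernoulli rewards, and the names `simulate_human_pulls` and `robot_guess_best_arm` are all hypothetical choices made for illustration, and the robot here only infers a best arm rather than acting on the human's behalf.

```python
import random
from collections import Counter


def simulate_human_pulls(arm_means, steps, epsilon=0.1, seed=0):
    """Simulate a human who learns arm values from observed rewards.

    Assumption: the human is epsilon-greedy with Bernoulli rewards.
    Only the sequence of pulled arms is returned, because that is all
    the robot is allowed to observe.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    pulls = []
    for t in range(steps):
        if t < n_arms:
            arm = t  # try every arm once before acting greedily
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # occasional exploration
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # Incremental mean update of the human's private estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        pulls.append(arm)
    return pulls


def robot_guess_best_arm(pulls):
    """Robot-side inference from pulls alone: pick the most-pulled arm."""
    return Counter(pulls).most_common(1)[0][0]


if __name__ == "__main__":
    observed = simulate_human_pulls([0.1, 0.9], steps=2000)
    print("robot's guess:", robot_guess_best_arm(observed))
```

The point of the sketch is that a human whose pulls track their reward estimates leaks information about those rewards, which is the sense in which a human policy can "communicate" with the robot.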
Language models in molecular discovery
The success of language models, especially transformer-based architectures,
has trickled into other domains giving rise to "scientific language models"
that operate on small molecules, proteins or polymers. In chemistry, language
models contribute to accelerating the molecule discovery cycle as evidenced by
promising recent findings in early-stage drug discovery. Here, we review the
role of language models in molecular discovery, underlining their strength in
de novo drug design, property prediction and reaction chemistry. We highlight
valuable open-source software assets thus lowering the entry barrier to the
field of scientific language modeling. Last, we sketch a vision for future
molecular design that combines a chatbot interface with access to computational
chemistry tools. Our contribution serves as a valuable resource for
researchers, chemists, and AI enthusiasts interested in understanding how
language models can and will be used to accelerate chemical discovery. Comment: Under review
A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems
Conversational recommender systems (CRS) generate recommendations through an
interactive process. However, not all CRS approaches use human conversations as
their source of interaction data; the majority of prior CRS work simulates
interactions by exchanging entity-level information. As a result, claims of
prior CRS work do not generalise to real-world settings where conversations
take unexpected turns, or where conversational and intent understanding is not
perfect. To tackle this challenge, the research community has started to
examine holistic CRS, which are trained using conversational data collected
from real-world scenarios. Despite their emergence, such holistic approaches
are under-explored.
We present a comprehensive survey of holistic CRS methods by summarizing the
literature in a structured manner. Our survey recognises holistic CRS
approaches as having three components: 1) a backbone language model, the
optional use of 2) external knowledge, and/or 3) external guidance. We also
give a detailed analysis of CRS datasets and evaluation methods in real
application scenarios. We offer our insight as to the current challenges of
holistic CRS and possible future trends. Comment: Accepted by 5th KaRS Workshop @ ACM RecSys 2023, 8 pages
A personality aware recommendation system
A Conversational Recommendation System (CRS) is a system that provides personalized
recommendations through a session of natural language dialogue turns with users. Unlike
traditional one-shot recommendation systems, which only assume the user’s previous
preferences as the ground truth, CRS uses both previous and current user preferences.
Recent research shows that understanding the contextual meaning of user preferences and
dialogue turns can significantly improve recommendation performance. It also shows a
strong link between users’ personality traits and recommendation systems. Personality
and preferences are essential variables in computational sociology and social science.
They describe the differences between people, both at the individual and collective level.
Recent personality-based recommendation approaches are traditional one-shot systems, or
“non-conversational systems”. Therefore, there is a significant need to detect and employ
individuals’ personality traits within the CRS paradigm to ensure a better and more
personalized dialogue recommendation performance.
Driven by the aforementioned facts, this study proposes a modularized, personality-aware
CRS that ensures a personalized dialogue recommendation session using the users’
personality traits. We also propose a novel personality detection approach, which is a
context-specific language model for detecting individuals’ personality traits using their
social media data. The goal is to create a personality-aware and topic-guided CRS model
that performs better than the standard CRS models. Experimental results show that our
personality-aware conversational recommendation system outperforms state-of-the-art
approaches on several metrics on the topic-guided conversation recommendation
dataset.
Reinforcement Learning for Machine Translation: from Simulations to Real-World Applications
If a machine translation is wrong, how can we tell the underlying model to fix it? Answering this question requires (1) a machine learning algorithm to define update rules, (2) an interface for feedback to be submitted, and (3) expertise on the side of the human who gives the feedback. This thesis investigates solutions for machine learning updates, the suitability of feedback interfaces, and the dependency on reliability and expertise for different types of feedback.
We start with an interactive online learning scenario where a machine translation (MT) system receives bandit feedback (i.e. only once per source) instead of references for learning. Policy gradient algorithms for statistical and neural MT are developed to learn from absolute and pairwise judgments. Our experiments on domain adaptation with simulated online feedback show that the models can largely improve under weak feedback, with variance reduction techniques being very effective.
In production environments, offline learning is often preferred over online learning. We evaluate algorithms for counterfactual learning from human feedback in a study on eBay product title translations. Feedback is collected either explicitly, via star ratings from users, or implicitly, from user interaction with cross-lingual product search. Leveraging implicit feedback turns out to be more successful due to lower levels of noise. We compare the reliability and learnability of absolute Likert-scale ratings with pairwise preferences in a smaller user study, and find that absolute ratings are overall more effective for improvements in downstream tasks. Furthermore, we discover that error markings provide a cheap and practical alternative to error corrections.
In a generalized interactive learning framework, we propose a self-regulation approach, where the learner, guided by a regulator module, decides which type of feedback to choose for each input. The regulator is reinforced to find a good trade-off between supervision effect and cost. In our experiments, it discovers strategies that are more efficient than active learning and standard fully supervised learning.
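The bandit-feedback setting described in this abstract (one scalar judgment for one sampled output, no reference) can be sketched with a toy policy gradient update. This is an illustrative sketch only, not the thesis's MT models: the fixed candidate set, the deterministic per-candidate rewards, the running-average baseline used for variance reduction, and the names `softmax` and `train_from_bandit_feedback` are all assumptions made for the example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_from_bandit_feedback(rewards, steps=5000, lr=0.1, seed=0):
    """REINFORCE-style learning over a fixed set of candidate outputs.

    rewards[k] stands in for the absolute judgment a user would give
    candidate k. Each step samples ONE candidate and sees only its reward.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0  # running average of observed rewards
    for _ in range(steps):
        probs = softmax(logits)
        k = rng.choices(range(len(rewards)), weights=probs)[0]
        advantage = rewards[k] - baseline
        # Gradient of log pi(k) w.r.t. logit a is 1[a == k] - pi(a).
        for a in range(len(logits)):
            indicator = 1.0 if a == k else 0.0
            logits[a] += lr * advantage * (indicator - probs[a])
        baseline += 0.1 * (rewards[k] - baseline)
    return softmax(logits)


if __name__ == "__main__":
    final_probs = train_from_bandit_feedback([0.1, 0.9, 0.3])
    print([round(p, 3) for p in final_probs])
```

Subtracting the baseline leaves the expected gradient unchanged while shrinking its variance, which is one simple form of the variance reduction the abstract refers to.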
Towards Personalized Learning using Counterfactual Inference for Randomized Controlled Trials
Personalized learning considers that the causal effects of a studied learning intervention may differ for the individual student (e.g., maybe girls do better with video hints while boys do better with text hints). To evaluate a learning intervention inside ASSISTments, we run a randomized controlled trial (RCT) by randomly assigning students to either a control condition or a treatment condition. Making inferences about the causal effects of the studied interventions is a central problem. Counterfactual inference answers “What if” questions, such as “Would this particular student benefit more if given the video hint instead of the text hint when the student cannot solve a problem?” Counterfactual prediction provides a way to estimate individual treatment effects and helps us assign students to the learning intervention that leads to better learning. A variant of Michael Jordan's Residual Transfer Networks was proposed for counterfactual inference. The model first uses feed-forward neural networks to learn a balancing representation of students by minimizing the distance between the distributions of the control and the treated populations, and then adopts a residual block to estimate the individual treatment effect. Students in the RCT have usually done a number of problems prior to participating in it, so each student has a sequence of actions (a performance sequence). We proposed a pipeline that uses the performance sequence to improve counterfactual inference. Since deep learning has been highly successful at learning representations from raw logged data, student representations were learned by applying a sequence autoencoder to the performance sequences. These representations were then incorporated into the model for counterfactual inference. Empirical results showed that the representations learned by the sequence autoencoder improved the performance of counterfactual inference.
Large Language Models and Knowledge Graphs: Opportunities and Challenges
Large Language Models (LLMs) have taken Knowledge Representation -- and the
world -- by storm. This inflection point marks a shift from explicit knowledge
representation to a renewed focus on the hybrid representation of both explicit
knowledge and parametric knowledge. In this position paper, we will discuss
some of the common debate points within the community on LLMs (parametric
knowledge) and Knowledge Graphs (explicit knowledge) and speculate on
opportunities and visions that the renewed focus brings, as well as related
research topics and challenges. Comment: 30 pages