212 research outputs found

    The Assistive Multi-Armed Bandit

    Learning preferences implicit in the choices humans make is a well-studied problem in both economics and computer science. However, most work assumes that humans act (noisily) optimally with respect to their preferences. Such approaches can fail when people are themselves learning about what they want. In this work, we introduce the assistive multi-armed bandit, where a robot assists a human playing a bandit task to maximize cumulative reward. In this problem, the human does not know the reward function but can learn it through the rewards received from arm pulls; the robot observes only which arms the human pulls, not the reward associated with each pull. We offer sufficient and necessary conditions for successfully assisting the human in this framework. Surprisingly, better human performance in isolation does not necessarily lead to better performance when assisted by the robot: a human policy can do better by effectively communicating its observed rewards to the robot. We conduct proof-of-concept experiments that support these results. We see this work as contributing towards a theory behind algorithms for human-robot interaction. Comment: Accepted to HRI 201

    Language models in molecular discovery

    The success of language models, especially transformer-based architectures, has trickled into other domains, giving rise to "scientific language models" that operate on small molecules, proteins or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle, as evidenced by promising recent findings in early-stage drug discovery. Here, we review the role of language models in molecular discovery, underlining their strength in de novo drug design, property prediction and reaction chemistry. We highlight valuable open-source software assets, thus lowering the entry barrier to the field of scientific language modeling. Last, we sketch a vision for future molecular design that combines a chatbot interface with access to computational chemistry tools. Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery. Comment: Under review

    A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems

    Conversational recommender systems (CRS) generate recommendations through an interactive process. However, not all CRS approaches use human conversations as their source of interaction data; the majority of prior CRS work simulates interactions by exchanging entity-level information. As a result, claims of prior CRS work do not generalise to real-world settings where conversations take unexpected turns, or where conversational and intent understanding is not perfect. To tackle this challenge, the research community has started to examine holistic CRS, which are trained using conversational data collected from real-world scenarios. Despite their emergence, such holistic approaches are under-explored. We present a comprehensive survey of holistic CRS methods by summarizing the literature in a structured manner. Our survey recognises holistic CRS approaches as having three components: 1) a backbone language model, and the optional use of 2) external knowledge and/or 3) external guidance. We also give a detailed analysis of CRS datasets and evaluation methods in real application scenarios. We offer our insight into the current challenges of holistic CRS and possible future trends. Comment: Accepted by 5th KaRS Workshop @ ACM RecSys 2023, 8 pages

    A personality aware recommendation system

    A Conversational Recommendation System (CRS) is a system that provides personalized recommendations through a session of natural language dialogue turns with users. Unlike traditional one-shot recommendation systems, which take only the user’s previous preferences as the ground truth, a CRS uses both previous and current user preferences. Recent research shows that understanding the contextual meaning of user preferences and dialogue turns can significantly improve recommendation performance. It also shows a strong link between users’ personality traits and recommendation systems. Personality and preferences are essential variables in computational sociology and social science. They describe the differences between people, both at the individual and collective level. Recent personality-based recommendation approaches are traditional one-shot systems, or “non-conversational systems”. Therefore, there is a significant need to detect and employ individuals’ personality traits within the CRS paradigm to ensure better and more personalized dialogue and recommendation performance. To this end, this study proposes a modularized, personality-aware CRS that ensures a personalized dialogue recommendation session using the users’ personality traits. We also propose a novel personality detection approach, which is a context-specific language model for detecting individuals’ personality traits using their social media data. The goal is to create a personality-aware and topic-guided CRS model that performs better than the standard CRS models. Experimental results show that our personality-aware conversation recommendation system outperforms state-of-the-art approaches across the metrics considered on the topic-guided conversation recommendation dataset.

    Reinforcement Learning for Machine Translation: from Simulations to Real-World Applications

    If a machine translation is wrong, how can we tell the underlying model to fix it? Answering this question requires (1) a machine learning algorithm to define update rules, (2) an interface for feedback to be submitted, and (3) expertise on the side of the human who gives the feedback. This thesis investigates solutions for machine learning updates, the suitability of feedback interfaces, and the dependency on reliability and expertise for different types of feedback. We start with an interactive online learning scenario where a machine translation (MT) system receives bandit feedback (i.e. only once per source) instead of references for learning. Policy gradient algorithms for statistical and neural MT are developed to learn from absolute and pairwise judgments. Our experiments on domain adaptation with simulated online feedback show that the models can largely improve under weak feedback, with variance reduction techniques being very effective. In production environments, offline learning is often preferred over online learning. We evaluate algorithms for counterfactual learning from human feedback in a study on eBay product title translations. Feedback is collected either explicitly, via star ratings from users, or implicitly, from user interaction with cross-lingual product search. Leveraging implicit feedback turns out to be more successful due to lower levels of noise. We compare the reliability and learnability of absolute Likert-scale ratings with pairwise preferences in a smaller user study, and find that absolute ratings are overall more effective for improvements in downstream tasks. Furthermore, we discover that error markings provide a cheap and practical alternative to error corrections. In a generalized interactive learning framework, we propose a self-regulation approach, where the learner, guided by a regulator module, decides which type of feedback to choose for each input. The regulator is reinforced to find a good trade-off between supervision effect and cost. In our experiments, it discovers strategies that are more efficient than active learning and standard fully supervised learning.

    Towards Personalized Learning using Counterfactual Inference for Randomized Controlled Trials

    Personalized learning considers that the causal effects of a studied learning intervention may differ for the individual student (e.g., maybe girls do better with video hints while boys do better with text hints). To evaluate a learning intervention inside ASSISTments, we run a randomized controlled trial (RCT) by randomly assigning students to either a control condition or a treatment condition. Making inferences about the causal effects of studied interventions is a central problem. Counterfactual inference answers “What if” questions, such as “Would this particular student benefit more if the student were given the video hint instead of the text hint when the student cannot solve a problem?”. Counterfactual prediction provides a way to estimate the individual treatment effects and helps us to assign students to the learning intervention that leads to better learning. A variant of Michael Jordan's Residual Transfer Networks was proposed for the counterfactual inference. The model first uses feed-forward neural networks to learn a balancing representation of students by minimizing the distance between the distributions of the control and the treated populations, and then adopts a residual block to estimate the individual treatment effect. Students in the RCT have usually done a number of problems prior to participating in it. Each student has a sequence of actions (performance sequence). We proposed a pipeline that uses the performance sequence to improve the performance of counterfactual inference. Since deep learning has achieved a huge amount of success in learning representations from raw logged data, student representations were learned by applying the sequence autoencoder to performance sequences. These representations were then incorporated into the model for counterfactual inference. Empirical results showed that the representations learned from the sequence autoencoder improved the performance of counterfactual inference.

    Large Language Models and Knowledge Graphs: Opportunities and Challenges

    Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on opportunities and visions that the renewed focus brings, as well as related research topics and challenges. Comment: 30 pages