
    Implementing ChatBots using Neural Machine Translation techniques

    Get PDF
    Conversational agents, or chatbots (short for chat robot), are a branch of Natural Language Processing (NLP) that has attracted considerable interest recently due to the large number of applications in company services, such as customer support or automated FAQs, and in personal assistant services such as Siri or Cortana. There are three types: rule-based models, retrieval-based models, and generative models. The difference between them is the freedom they have when generating an answer to a question. The chatbot models usually deployed in public services are rule-based or retrieval-based, given the need to guarantee correct, high-quality answers to users. However, these models can only handle conversations aligned with their pre-written answers, so a conversation can sound artificial if a user strays off topic. Generative models handle open conversation better, which makes them a more generalizable approach. Promising results have been achieved with generative models by applying neural machine translation techniques with the recurrent encoder/decoder architecture. This project implements, compares, and analyzes two generative models that constitute the state of the art in neural machine translation applied to chatbots: one based on recurrence with attention mechanisms, the other based exclusively on attention. Additionally, a model based exclusively on recurrence is used as a reference for the experiments. The experiments show that, as in translation, an architecture based only on attention mechanisms obtains better results than the recurrence-based models.
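    The attention-only architecture referenced above is the Transformer. The following is a minimal sketch of such an encoder/decoder chatbot in plain PyTorch; the class name, hyperparameters, and the omission of positional encodings are illustrative choices, not details taken from the project.

```python
# Minimal sketch of an attention-only (Transformer) encoder/decoder chatbot.
# Positional encodings are omitted for brevity; all sizes are illustrative.
import torch
import torch.nn as nn

class TransformerChatbot(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, question_ids, answer_ids):
        # Causal mask so the decoder cannot attend to future answer tokens.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(answer_ids.size(1))
        hidden = self.transformer(
            self.embed(question_ids), self.embed(answer_ids), tgt_mask=tgt_mask
        )
        return self.out(hidden)  # per-position logits over the vocabulary

# Toy usage: a batch of 2 questions (length 7) and partial answers (length 5).
model = TransformerChatbot(vocab_size=10_000)
q = torch.randint(0, 10_000, (2, 7))
a = torch.randint(0, 10_000, (2, 5))
logits = model(q, a)  # shape (2, 5, 10000)
```

    The recurrence-with-attention baseline would replace the Transformer with an RNN encoder/decoder plus an attention layer; training for both is standard cross-entropy over answer tokens.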

    Improving Matching Models with Hierarchical Contextualized Representations for Multi-turn Response Selection

    Full text link
    In this paper, we study context-response matching with pre-trained contextualized representations for multi-turn response selection in retrieval-based chatbots. Existing models, such as CoVe and ELMo, are trained with limited context (often a single sentence or paragraph) and may not work well on multi-turn conversations due to their hierarchical nature, informal language, and domain-specific words. To address these challenges, we propose pre-training hierarchical contextualized representations, including contextual word-level and sentence-level representations, by learning a dialogue generation model from large-scale conversations with a hierarchical encoder-decoder architecture. The two levels of representations are then blended into the input and output layers of a matching model, respectively. Experimental results on two benchmark conversation datasets indicate that the proposed hierarchical contextualized representations bring significant and consistent improvements to existing matching models for response selection. (Comment: 6 pages, 1 figure)
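    As a rough illustration of the hierarchical encoder idea (a word-level encoder within each utterance, a sentence-level encoder across utterances), here is a hedged PyTorch sketch. Module names and sizes are assumptions; the paper's decoder and its blending mechanism into the matching model are not reproduced.

```python
# Two-level (hierarchical) encoder sketch: a word-level GRU runs within each
# utterance, then a sentence-level GRU runs over the utterance vectors.
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, vocab_size, emb=300, hid=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.word_rnn = nn.GRU(emb, hid, batch_first=True)  # within an utterance
        self.sent_rnn = nn.GRU(hid, hid, batch_first=True)  # across utterances

    def forward(self, turns):
        # turns: (batch, n_turns, n_words) token IDs for a multi-turn context
        b, t, w = turns.shape
        words, last = self.word_rnn(self.embed(turns.reshape(b * t, w)))
        word_reprs = words.reshape(b, t, w, -1)              # contextual word-level vectors
        sent_reprs, _ = self.sent_rnn(last[-1].reshape(b, t, -1))  # sentence-level vectors
        return word_reprs, sent_reprs
```

    Per the abstract, the word-level and sentence-level outputs would feed the input and output layers of an existing matching model, respectively.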

    A Conditional Generative Chatbot using Transformer Model

    Full text link
    A Chatbot serves as a communication tool between a human user and a machine, producing an appropriate answer to the human's input. More recent approaches combine Natural Language Processing with sequential models to build generative Chatbots. The main challenge of these models is their sequential nature, which leads to less accurate results. To tackle this challenge, this paper proposes a novel end-to-end architecture using conditional Wasserstein Generative Adversarial Networks and a transformer model for answer generation in Chatbots. While the generator of the proposed model consists of a full transformer model that generates an answer, the discriminator includes only the encoder part of a transformer model followed by a classifier. To the best of our knowledge, this is the first time a generative Chatbot has been proposed with a transformer embedded in both the generator and the discriminator. Relying on the parallel computation of the transformer model, results on the Cornell Movie-Dialog corpus and the Chit-Chat datasets confirm the superiority of the proposed model over state-of-the-art alternatives across different evaluation metrics.
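    A structural sketch of the discriminator side of this pairing follows: a transformer encoder plus a scalar critic head, as in a Wasserstein GAN, scoring a (question, answer) pair. The generator would be a full encoder/decoder transformer like the one sketched earlier. This is a hedged approximation; the conditioning scheme, the Wasserstein training loop, and all sizes are assumptions, not the authors' implementation.

```python
# Encoder-only transformer critic for a conditional Wasserstein GAN setup.
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.score = nn.Linear(d_model, 1)  # unbounded Wasserstein critic score

    def forward(self, question_ids, answer_ids):
        # Condition on the question by scoring question and answer as one sequence.
        pair = torch.cat([question_ids, answer_ids], dim=1)
        h = self.encoder(self.embed(pair))
        return self.score(h.mean(dim=1))  # one scalar per (question, answer) pair
```

    Training would alternate between pushing the critic's score up on real answers and down on generated ones, with the generator updated to maximize the critic's score on its outputs.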

    A Differentiable Generative Adversarial Network for Open Domain Dialogue

    Get PDF
    Paper presented at IWSDS 2019: International Workshop on Spoken Dialogue Systems Technology, Siracusa, Italy, April 24-26, 2019.
    This work presents a novel methodology for training open-domain neural dialogue systems within the framework of Generative Adversarial Networks using gradient-based optimization methods. We avoid the non-differentiability inherent to text-generating networks by approximating the word vector corresponding to each generated token via a top-k softmax. We show that a weighted average of the word vectors of the most probable tokens, computed from the probabilities resulting from the top-k softmax, leads to a good approximation of the word vector of the generated token. Finally, we demonstrate through a human evaluation that training a neural dialogue system via adversarial learning with this method successfully discourages it from producing generic responses; instead it tends to produce more informative and varied ones.
    This work has been partially funded by the Basque Government under grant PRE_2017_1_0357, by the University of the Basque Country UPV/EHU under grant PIF17/310, and by the H2020 RIA EMPATHIC (Grant No. 769872).
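    The core trick is concrete enough to sketch: rather than a hard argmax over the vocabulary (which blocks gradients), take the k most probable tokens, renormalize their logits with a softmax, and return the probability-weighted average of their word vectors. The function below is an illustrative PyTorch rendering of that idea, not the authors' code.

```python
# Differentiable approximation of a generated token's word vector via a
# top-k softmax: gradients flow through the softmax weights into the logits.
import torch
import torch.nn.functional as F

def soft_token_embedding(logits, embedding_matrix, k=10):
    # logits: (batch, vocab); embedding_matrix: (vocab, dim)
    top_vals, top_idx = logits.topk(k, dim=-1)     # k most probable tokens
    weights = F.softmax(top_vals, dim=-1)          # softmax over the top-k logits
    top_vecs = embedding_matrix[top_idx]           # (batch, k, dim) word vectors
    return (weights.unsqueeze(-1) * top_vecs).sum(dim=1)  # (batch, dim)
```

    When one token dominates the top-k probabilities, this weighted average is close to that token's word vector, which is what makes it a good stand-in during adversarial training.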

    Language models in molecular discovery

    Full text link
    The success of language models, especially transformer-based architectures, has trickled into other domains, giving rise to "scientific language models" that operate on small molecules, proteins, or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle, as evidenced by promising recent findings in early-stage drug discovery. Here, we review the role of language models in molecular discovery, underlining their strength in de novo drug design, property prediction, and reaction chemistry. We highlight valuable open-source software assets, thus lowering the entry barrier to the field of scientific language modeling. Lastly, we sketch a vision for future molecular design that combines a chatbot interface with access to computational chemistry tools. Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery. (Comment: Under review)

    Dialogue systems based on pre-trained language models

    Full text link
    Pre-trained language models (LMs) have been shown to be effective in many NLP tasks. They capture general language regularities from large amounts of text, which are useful for most natural language applications. In this thesis, we study the problem of dialogue, i.e. generating a response to a user's utterance. We exploit pre-trained language models to address different aspects of dialogue systems. First, pre-trained language models have been trained and used in different ways in dialogue systems, and it is unclear which way is most appropriate. For task-oriented dialogue systems, the state-of-the-art framework for Dialogue State Tracking (DST) uses BERT as the encoder and stacks an RNN on top of BERT's outputs as the decoder, so only the encoder benefits from the pre-trained language model. In the first part of the thesis, we investigate a method that uses a single BERT model for both the encoder and the decoder, allowing more effective parameter updating. Our method achieves new state-of-the-art performance. For the task of response generation in generative chatbot systems, we further compare the four commonly used frameworks based on pre-trained LMs, which use different training objectives and attention mechanisms. Through extensive experiments, we observe the impact of two types of discrepancy that have been largely ignored in the literature: the pretrain-finetune discrepancy and the finetune-generation discrepancy (i.e. differences between pre-training and fine-tuning, and between fine-tuning and generation). We show that the impact of these discrepancies surfaces when only a limited amount of training data is available. To alleviate the problem, we propose two methods that reduce the discrepancies, yielding improved performance. Second, even though pre-training-based methods have shown excellent performance in general dialogue generation, we are increasingly faced with the problem of conditioned conversation, i.e. conversation related to some condition (a persona, a topic, etc.). Researchers are also interested in multi-skill chatbot systems, namely equipping a chatbot with the ability to handle different conditioned generation tasks. Therefore, in the second part of the thesis, we investigate the problem of conditioned dialogue generation. First, we propose a general method that leverages not only conditioned dialogue data but also conditioned non-dialogue text data, which is much easier to collect, in order to alleviate the data scarcity issue of conditioned dialogue generation. Second, building on the recently proposed concept of an Adapter, which adapts a general dialogue system to add a specific dialogue skill, we investigate ways to learn such a skill. We show that an Adapter has enough capacity to model a dialogue skill for either loosely-conditioned or strictly-conditioned response generation, while using only 6% more parameters.
    This thesis contains four pieces of work on two general problems in dialogue systems: the inherent architecture of dialogue systems based on pre-trained LMs, and the enhancement of a general dialogue system with specific skills. The studies not only propose new approaches that outperform the current state of the art, but also stress the importance of carefully designing the model architecture to fit the task, instead of simply increasing the amount of training data and raw computational power.
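    The Adapter mentioned above is, in its common form, a small bottleneck module with a residual connection inserted into a frozen pre-trained model, so a skill can be learned by training only a few percent extra parameters. A minimal sketch follows; the sizes and placement are illustrative, and the thesis's specific adapter variants are not reproduced.

```python
# Bottleneck adapter sketch: project down, apply a nonlinearity, project back
# up, and add a residual connection. Only these small layers are trained; the
# pre-trained backbone they are inserted into stays frozen.
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, d_model)    # project back up
        self.act = nn.ReLU()

    def forward(self, hidden):
        # Residual connection: near-identity before training, so the frozen
        # backbone's behavior is preserved until the adapter is learned.
        return hidden + self.up(self.act(self.down(hidden)))
```

    With d_model=768 and a bottleneck of 64, each adapter adds roughly 2 * 768 * 64 weights per insertion point, which is how such designs stay within a small parameter budget relative to the backbone.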