4,428 research outputs found

    Survey on Evaluation Methods for Dialogue Systems

    In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Dialogue systems are often evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the involvement of human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for the dialogue systems and then presenting the evaluation methods for that class.

    Can we Help the Bots? Towards an Evaluation of their Performance and the Creation of Human Enhanced Artifact for Emotions De-escalation

    We propose a hybrid-intelligence socio-technical artifact that identifies the threshold at which a chatbot requires human intervention in order to keep performing at a level adequate to achieve the system's pre-defined objective. We leverage the Yield Shift Theory of Satisfaction, Intervention Theory and Nudge Theory to develop meta-requirements and design principles for this system. We discuss the first iteration of the implementation and evaluation of the artifact's components.

    Understanding and Predicting User Satisfaction with Conversational Recommender Systems

    User satisfaction reflects the effectiveness of a system from the user’s perspective. Understanding and predicting user satisfaction is vital for the design of user-oriented evaluation methods for conversational recommender systems (CRSs). Current approaches rely on turn-level satisfaction ratings to predict a user’s overall satisfaction with a CRS. These methods assume that all users perceive satisfaction similarly, and so fail to capture the broader dialogue aspects that influence overall user satisfaction. We investigate the effect of several dialogue aspects on user satisfaction when interacting with a CRS. To this end, we annotate dialogues on six aspects (relevance, interestingness, understanding, task-completion, interest-arousal, and efficiency) at the turn and dialogue levels. We find that the concept of satisfaction varies per user. At the turn level, a system’s ability to make relevant recommendations is a significant factor in satisfaction. We adopt these aspects as features for predicting response quality and user satisfaction. We achieve an F1-score of 0.80 in classifying dissatisfactory dialogues and a Pearson’s r of 0.73 for turn-level response quality estimation, which demonstrates the effectiveness of the proposed dialogue aspects in predicting user satisfaction and in identifying dialogues where the system is failing.
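
For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of their standard definitions. The abstract does not describe the authors' tooling, so the function names, the choice of label 1 for a dissatisfactory dialogue, and all sample data below are hypothetical.

```python
from math import sqrt


def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical dialogue-level labels: 1 = dissatisfactory, 0 = satisfactory.
gold_labels = [1, 1, 1, 0, 0, 0, 1, 0]
predicted_labels = [1, 1, 0, 0, 0, 1, 1, 0]
print("F1 (dissatisfactory class):", f1_score(gold_labels, predicted_labels))

# Hypothetical turn-level response quality: human ratings vs. model estimates.
human_ratings = [1, 2, 3, 4, 5]
model_estimates = [1.5, 1.8, 3.2, 3.9, 4.6]
print("Pearson's r:", pearson_r(human_ratings, model_estimates))
```

F1 is computed for the dissatisfactory class only, matching the abstract's framing of detecting failing dialogues, while Pearson's r compares continuous estimates with human ratings.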

    Optimising user experience with: conversational Interfaces

    Master's dissertation in Informatics Engineering (Dissertação de Mestrado em Engenharia Informática). User experience (UX) is one of the main factors that keep a customer loyal to cloud-based solutions, or SaaS (Software as a Service). With the rise of natural language processing techniques, the industry is looking at automated chatbot solutions to boost and expand its services. This thesis presents a practical case study of implementing a chatbot to complement a CRM (Customer Relationship Management) software product called FOXAIO, and then quantifies the resulting UX optimisation, following the most appropriate guides and solutions available. To create a robust and scalable solution within the constraints set by the company in the case study, we reviewed the current deep learning techniques, tools and libraries available to support development. The most proven techniques in the field of Natural Language Processing (NLP) are introduced. To achieve the goals of this solution without "reinventing the wheel", we present possible architectures built on top of open-source tools available on the market, with special emphasis on the RASA framework. We also discuss possible techniques for building the intent classifier, and detail how the rasa tensorflow embedding pipeline performed best for this particular case. The conversational system also required a channel to interact with the end user. To that end, we implemented a basic chat interface on top of the socket protocol, which communicates with the conversational system. It could be extended to other channels available on the market, such as Messenger, Slack and Telegram. Finally, we show through a few use cases that it is hypothetically possible to improve the user experience of an existing software system (FOXAIO) by adding a conversational interface on top of it. We also found indications that users prefer the conversational interface because of its simplicity, supported by a better score on the SUS scale (70, against 58 for the traditional UI) and by good indicators from the HEART framework.
    User experience is possibly one of the main factors in retaining a customer on a cloud solution, the so-called SaaS (Software as a Service) solutions. The sharp growth of this type of solution heats up the rivalry between competitors, who increasingly seek to offer the most revolutionary ways of rewarding service quality. With the sharp growth of techniques in NLP (Natural Language Processing), the industry is beginning to look at chatbots as a possible way to automate, boost and expand its offerings. This thesis presents a practical implementation of a chatbot on top of an existing CRM-like (Customer Relationship Management) software product called FOXAIO. With the goal of developing a robust and scalable solution that takes into account the conditions set by the company in question, a long and detailed study was carried out of the various deep learning techniques used in Natural Language Processing (NLP). Particular emphasis was given to recurrent neural networks (RNN) and their Long Short Term Memory (LSTM) extension, which together work very well for solving the problems of an artificial intelligence system such as this one. To implement the chatbot on top of existing software, a small conversational interface had to be developed, with the aim of later adding it to that software's user interface. For this purpose, a channel communicating over the socket protocol was implemented on top of the conversational system, with a class created for it that would later be useful for generating analysis logs.
    During the implementation of the conversational system, several comparisons were made between variants of its modules, from the Dialog Management (DM) to the Intent Classifier, and several architectures were presented and compared in order to arrive at the best possible solution for a chatbot that is, in the first instance, Portuguese-language. A hybrid Dialog Management was chosen because of the domain and the existence of continuous contextual conversations, which are quite difficult to develop under other paradigms. For the Intent Classifier, the rasa tensorflow embedding technique was used; this technique, which trains word representations from scratch, obtained better results for the particular case studied in this thesis (CRM) than, for example, a model with pre-trained word representations. Finally, we were able to show, hypothetically, possible UX improvements from using a conversational interface over a traditional interface, using the various analysis tools available. For example, with the help of the HEART framework (created by Google), we obtained quite satisfactory indications from the 34 people who ran the first tests on the chatbot. Examining the feedback from those same users in a test environment, we obtained a SUS (System Usability Scale) score of 70, whereas the traditional interface scored 58, which suggests that people felt more capable when using the conversational system.
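
The SUS values quoted above (70 versus 58) sit on a 0-100 scale. As a rough illustration, the sketch below applies the standard SUS scoring rule to ten 1-5 ratings per participant and averages across participants. The abstract does not say how the thesis aggregated its questionnaires, so the averaging step, the function name `sus_score` and the sample ratings are assumptions made for illustration only.

```python
from statistics import mean


def sus_score(ratings):
    """Return the SUS score (0-100) for one participant's ten 1-5 ratings."""
    if len(ratings) != 10:
        raise ValueError("SUS needs exactly ten item ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    total = 0
    for index, rating in enumerate(ratings):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution = rating - 1.
            total += rating - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution = 5 - rating.
            total += 5 - rating
    # The ten contributions sum to 0-40; scaling by 2.5 maps them to 0-100.
    return total * 2.5


# Hypothetical questionnaires from two participants (not data from the thesis).
participants = [
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [5, 1, 4, 2, 3, 3, 4, 2, 4, 1],
]
scores = [sus_score(p) for p in participants]
study_score = mean(scores)
print(scores, study_score)
```

Because even-numbered items are reverse-scored, a participant who answers 3 to every item lands at exactly 50, the midpoint of the scale.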