
    Survey on Evaluation Methods for Dialogue Systems

    In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Dialogue systems are often evaluated by means of human evaluations and questionnaires; however, this tends to be very cost- and time-intensive. Thus, much work has gone into finding methods that reduce the involvement of human labour. In this survey, we present the main concepts and methods. To do so, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing its main technologies and then presenting the evaluation methods developed for that class.

    Personalized Memory Transfer for Conversational Recommendation Systems

    Dialogue systems are becoming an increasingly common part of many users' daily routines. Natural language serves as a convenient interface for expressing our preferences to the underlying systems. In this work, we implement a full-fledged Conversational Recommendation System, focusing mainly on learning user preferences through online conversations. Compared to the traditional collaborative filtering setting, where feedback is provided quantitatively, conversational users may only indicate their preferences at a high level, with inexact item mentions, in the form of natural language chit-chat. This makes it harder for the system to correctly interpret user intent and, in turn, provide useful recommendations. To tackle the ambiguities in natural language conversations, we propose Personalized Memory Transfer (PMT), which learns a personalized model in an online manner by leveraging a key-value memory structure to distill user feedback directly from conversations. This memory structure enables the integration of prior knowledge to transfer existing item representations/preferences and natural language representations. We also implement a retrieval-based response generation module, in which the system, in addition to recommending items, also responds to the user, either to elicit more information about user intent or simply for casual chit-chat. Experiments were conducted on two public datasets, and the results demonstrate the effectiveness of the proposed approach.
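
    The key-value memory read at the heart of PMT-style models can be sketched as a simple attention step: score memory keys against a query, then return the attention-weighted sum of the values. This is a minimal toy illustration of the general mechanism, not the paper's implementation; all array shapes and data here are placeholders.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(query, keys, values):
    """Attend over memory keys with the query; return weighted sum of values."""
    scores = keys @ query        # similarity of the query to each key slot
    weights = softmax(scores)    # attention distribution over memory slots
    return weights @ values      # blended value vector

# Toy memory: 3 slots with 4-dimensional keys and values.
rng = np.random.default_rng(0)
keys = rng.standard_normal((3, 4))
values = rng.standard_normal((3, 4))
query = keys[1]                  # a query that resembles slot 1
out = memory_read(query, keys, values)
```

    In a real system the query would be an encoding of the user's utterance and the values would carry item preference information.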

    Detecting Team Conflict From Multiparty Dialogue

    The emergence of online collaboration platforms has dramatically changed the dynamics of human teamwork, creating a veritable army of virtual teams composed of workers in different physical locations. The global world requires a tremendous amount of collaborative problem solving, primarily virtual, making it an excellent domain for computer scientists and team cognition researchers who seek to understand the dynamics involved in collaborative tasks and to provide solutions that can support effective collaboration. Mining and analyzing data from collaborative dialogues can yield insights into virtual teams' thought processes and help develop virtual agents to support collaboration. Good communication is indubitably the foundation of effective collaboration. Over time, teams develop their own communication styles and often exhibit entrainment, a conversational phenomenon in which humans synchronize their linguistic choices. This dissertation presents several technical innovations in the use of machine learning for analyzing, monitoring, and predicting collaboration success from multiparty dialogue, successfully handling the problems of resource scarcity and natural distribution shifts. First, we examine the problem of predicting team performance from embeddings learned from multiparty dialogues such that teams with similar conflict scores lie close to one another in vector space. We extract the embeddings from three types of features: 1) dialogue acts, 2) sentiment polarity, and 3) syntactic entrainment. Although all of these features can be used to predict team performance effectively, their utility varies by teamwork phase. We separate the dialogues of players playing a cooperative game into stages: 1) early (knowledge building), 2) middle (problem-solving), and 3) late (culmination). Unlike syntactic entrainment, both dialogue act and sentiment embeddings effectively classify team performance, even during the initial phase.
    Second, we address the problem of learning generalizable models of collaboration. Machine learning models often suffer from domain shifts; one advantage of encoding semantic features is their adaptability across multiple domains. We evaluate the generalizability of the different embeddings to other goal-oriented teamwork dialogues. Finally, in addition to identifying the features predictive of successful collaboration, we propose a multi-feature embedding (MFeEmb) to improve the generalizability of collaborative task success prediction models under natural distribution shifts and resource scarcity. MFeEmb leverages the strengths of semantic, structural, and textual features of the dialogues by incorporating the most meaningful information from dialogue acts (DAs), sentiment polarities, and the vocabulary of the dialogues. To further enhance the performance of MFeEmb in resource-scarce scenarios, we employ synthetic data generation and few-shot learning. We use the few-shot learning method proposed by Bailey and Chopra (2018) from the FsText python library, replacing the universal embedding with our proposed multi-feature embedding to compare the performance of the two. For data augmentation, we propose synonym replacement from the collaborative dialogue vocabulary instead of synonym replacement from WordNet. The research was conducted on several multiparty dialogue datasets, including ASIST, SwDA, Hate Speech, Diplomacy, Military, SAMSum, AMI, and GitHub. Results show that the proposed multi-feature embedding is an excellent choice for the meta-training stage of few-shot learning, even when it learns from a training set as small as 62 samples. Our proposed data augmentation method also showed significant performance improvement.
    Our research has potential ramifications for the development of conversational agents that facilitate teaming, as well as for the creation of more effective social coding platforms that better support teamwork between software engineers.
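
    The core idea of a multi-feature embedding of the MFeEmb kind is to combine separate feature views of a dialogue into one vector. A minimal sketch, assuming per-view encoders already exist (the toy vectors below are placeholders, not the dissertation's actual features):

```python
import numpy as np

def multi_feature_embedding(da_vec, sent_vec, text_vec):
    """Concatenate dialogue-act, sentiment, and textual feature vectors
    into a single dialogue embedding (component encoders are assumed)."""
    return np.concatenate([da_vec, sent_vec, text_vec])

# Toy per-dialogue feature vectors standing in for real encoders:
da_vec = np.array([0.2, 0.8])                # dialogue-act distribution
sent_vec = np.array([0.6, 0.1, 0.3])         # sentiment polarity proportions
text_vec = np.array([0.5, -0.4, 0.9, 0.0])   # vocabulary/text embedding
emb = multi_feature_embedding(da_vec, sent_vec, text_vec)
```

    The combined vector can then feed any downstream classifier; concatenation keeps each view's information intact, which is what makes the representation robust when one view degrades under domain shift.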

    Artificial Intelligence Chatbots: A Survey of Classical versus Deep Machine Learning Techniques

    Artificial Intelligence (AI) enables machines to be intelligent, most importantly through Machine Learning (ML), in which machines are trained to make better decisions and predictions. In particular, ML-based chatbot systems have been developed to simulate chats with people using Natural Language Processing (NLP) techniques. The adoption of chatbots has increased rapidly in many sectors, including Education, Health Care, Cultural Heritage, Supporting Systems, Marketing, and Entertainment. Chatbots have the potential to improve human interaction with machines, and NLP helps them understand human language more clearly and thus create proper and intelligent responses. In addition to classical ML techniques, Deep Learning (DL) has attracted many researchers to develop chatbots using more sophisticated and accurate techniques. However, while chatbots have widely been developed for English, there is relatively less research on Arabic, mainly due to its complexity and the lack of proper corpora compared to English. Though there have been several survey studies reviewing the state of the art of chatbot systems, these studies (a) did not give a comprehensive overview of how the techniques used for Arabic chatbots differ from those used for English chatbots; and (b) paid little attention to the application of Artificial Neural Networks (ANNs) for developing chatbots. Therefore, in this paper, we conduct a literature survey of chatbot studies to highlight the differences between (1) classical and deep ML techniques for chatbots; and (2) techniques employed for Arabic chatbots versus those for other languages. To this end, we propose various comparison criteria for the techniques, extract data from the collected studies accordingly, and provide insights on the progress of chatbot development for Arabic and what still needs to be done in the future.

    Learning to merge - language and vision: A deep evaluation of the encoder, the role of the two modalities, the role of the training task.

    Most human language understanding is grounded in perception. There is thus growing interest in combining information from language and vision. Multiple models based on Neural Networks have been proposed to merge language and vision information. All of these models share a common backbone consisting of an encoder that learns to merge the two types of representation to perform a specific task. While some models have seemed extremely successful on those tasks, it remains unclear how the reported results should be interpreted and what those models are actually learning. Our contribution is three-fold. We have proposed (a) a new model of Visually Grounded Dialogue; (b) a diagnostic dataset to evaluate the encoder's ability to merge visual and language input; and (c) a method to evaluate the quality of the multimodal representations computed by the encoder as general-purpose representations. We have proposed and analyzed a cognitively plausible architecture in which dialogue system modules are connected through a common grounded dialogue state encoder. Our in-depth analysis of the dialogues shows the importance of going beyond task success in the evaluation of Visual Dialogues: the dialogues themselves should play a crucial role in such evaluation. We have proposed a diagnostic dataset, FOIL, which consists of images associated with incorrect captions that the model has to detect and correct. Finally, we have used FOIL to evaluate the quality of the multimodal representations produced by an encoder trained on different multimodal tasks. We have shown how the training task used affects the stability of the representations, their transferability, and the model's confidence.
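
    The FOIL setup pairs an image with a caption in which one word is wrong, and asks a model to detect and correct it. As a toy stand-in for that task (not the paper's model), a trivial baseline could flag a caption noun that names no object present in the image; every name and data value below is an illustrative assumption.

```python
def detect_foil(caption_tokens, image_objects, candidate_nouns):
    """Return (position, word) of the first caption noun that names an
    object absent from the image, or None if the caption checks out."""
    for i, tok in enumerate(caption_tokens):
        if tok in candidate_nouns and tok not in image_objects:
            return i, tok
    return None

caption = ["a", "dog", "on", "a", "sofa"]
objects = {"cat", "sofa"}          # objects actually in the image
nouns = {"dog", "cat", "sofa"}     # vocabulary of checkable nouns
print(detect_foil(caption, objects, nouns))  # (1, 'dog')
```

    A real FOIL model must of course infer the image objects from pixels and handle synonyms and context; the sketch only makes the structure of the detection task concrete.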

    Deep learning and reinforcement learning methods for grounded goal-oriented dialogue

    While dialogue systems have the potential to fundamentally change human-machine interaction, developing general chatbots with deep learning and reinforcement learning techniques has proven difficult. One challenging aspect is that these systems are expected to operate in broad application domains for which there is no clear measure of evaluation. This thesis investigates goal-oriented dialogue tasks in multi-modal environments because this (i) constrains the scope of the conversation, (ii) comes with a better-defined objective, and (iii) enables enriching language representations by grounding them in perceptual experiences. More specifically, we develop GuessWhat?!, an image-based guessing game in which two agents cooperate to locate an unknown object by asking a sequence of questions. For the subtask of visual question answering, we propose Conditional Batch Normalization layers as a simple but effective conditioning method that adapts the convolutional activations to the specific question at hand. Finally, we investigate the difficulty of dialogue-based navigation by introducing Talk The Walk, a new task in which two agents (a "tourist" and a "guide") collaborate to have the tourist navigate to target locations in the virtual streets of New York City.
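
    The conditioning idea behind Conditional Batch Normalization can be sketched in a few lines: normalize each feature channel, then scale and shift it with parameters predicted from the question embedding. This is a minimal NumPy illustration of the mechanism, with random placeholder weights rather than the thesis's trained layers.

```python
import numpy as np

def conditional_batch_norm(feats, q_emb, W_gamma, W_beta, eps=1e-5):
    """Normalize conv features per channel, then apply a question-conditioned
    per-channel scale (gamma) and shift (beta). feats: (C, H, W); q_emb: (D,)."""
    mean = feats.mean(axis=(1, 2), keepdims=True)
    var = feats.var(axis=(1, 2), keepdims=True)
    normed = (feats - mean) / np.sqrt(var + eps)
    gamma = 1.0 + W_gamma @ q_emb    # per-channel scale, initialized near identity
    beta = W_beta @ q_emb            # per-channel shift
    return gamma[:, None, None] * normed + beta[:, None, None]

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))       # toy feature map: 8 channels, 4x4
q_emb = rng.standard_normal(16)              # toy question embedding
W_gamma = 0.01 * rng.standard_normal((8, 16))  # placeholder projection weights
W_beta = 0.01 * rng.standard_normal((8, 16))
out = conditional_batch_norm(feats, q_emb, W_gamma, W_beta)
```

    Because only the small projection matrices depend on the question, the same pretrained convolutional features can be modulated differently for each question without retraining the vision backbone.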

    Proceedings of the Eighth Italian Conference on Computational Linguistics CliC-it 2021

    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the 2020 edition, which was held fully online due to the health emergency related to Covid-19, CLiC-it 2021 was the first occasion for the Italian Computational Linguistics research community to meet in person after more than a year of full or partial lockdown.

    EDM 2011: 4th international conference on educational data mining : Eindhoven, July 6-8, 2011 : proceedings


    Management: A bibliography for NASA managers

    This bibliography lists 706 reports, articles, and other documents introduced into the NASA scientific and technical information system in 1984. Entries, which include abstracts, are arranged in the following categories: human factors and personnel issues; management theory and techniques; industrial management and manufacturing; robotics and expert systems; computers and information management; research and development; economics, costs, and markets; logistics and operations management; reliability and quality control; and legality, legislation, and policy. Subject, personal author, corporate source, contract number, report number, and accession number indexes are included.