31 research outputs found

    The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings

    Full text link
    We motivate and describe a new freely available human-human dialogue dataset for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. The data has been collected using a novel, character-by-character variant of the DiET chat tool (Healey et al., 2003; Mills and Healey, submitted) with a novel task, where a Learner needs to learn invented visual attribute words (such as "burchak" for square) from a tutor. As such, the text-based interactions closely resemble face-to-face conversation and thus contain many of the linguistic phenomena encountered in natural, spontaneous dialogue. These include self- and other-correction, mid-sentence continuations, interruptions, overlaps, fillers, and hedges. We also present a generic n-gram framework for building user (i.e. tutor) simulations from this type of incremental data, which is freely available to researchers. We show that the simulations produce outputs that are similar to the original data (e.g. 78% turn match similarity). Finally, we train and evaluate a Reinforcement Learning dialogue control agent for learning visually grounded word meanings, trained from the BURCHAK corpus. The learned policy shows performance comparable to a rule-based system built previously. (Comment: 10 pages, The 6th Workshop on Vision and Language, VL'17.)
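    As a rough illustration of the kind of generic n-gram user simulation the abstract describes, the sketch below estimates n-gram continuation counts from event sequences and samples simulated tutor output. This is a minimal sketch under assumed data layouts; the function names, toy corpus, and token-level event granularity are invented here, not taken from the BURCHAK release.

```python
import random
from collections import defaultdict, Counter

def train_ngram(sequences, n=3):
    """Count n-gram continuations over event sequences (tokens here,
    but the same idea applies to character- or act-level events)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = ["<s>"] * (n - 1) + list(seq) + ["</s>"]
        for i in range(len(padded) - n + 1):
            context = tuple(padded[i:i + n - 1])
            counts[context][padded[i + n - 1]] += 1
    return counts

def simulate(counts, n=3, max_len=50):
    """Generate one simulated tutor sequence by sampling continuations."""
    context, out = ("<s>",) * (n - 1), []
    for _ in range(max_len):
        dist = counts.get(context)
        if not dist:
            break
        token = random.choices(list(dist), weights=dist.values())[0]
        if token == "</s>":
            break
        out.append(token)
        context = context[1:] + (token,)
    return out

# Hypothetical toy corpus of tutor turns (token-level events).
corpus = [["is", "it", "a", "burchak", "?"],
          ["yes", ",", "that", "is", "a", "burchak"]]
model = train_ngram(corpus, n=3)
print(" ".join(simulate(model, n=3)))
```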

    Quantifying mutual-understanding in dialogue

    Get PDF
    PhD thesis. There are two components of communication that provide a natural index of mutual-understanding in dialogue. The first is Repair: the ways in which people detect and deal with problems of understanding. The second is Ellipsis/Anaphora: the use of expressions that depend directly on the accessibility of the local context for their interpretation. This thesis explores the use of these two phenomena in systematic comparative analyses of human-human dialogue under different task and media conditions. In order to do this it is necessary to a) develop reliable, valid protocols for coding the different Repair and Ellipsis/Anaphora phenomena, b) establish their baseline patterns of distribution in conversation, and c) model their basic statistical inter-relationships and their predictive value. Two new protocols for coding Repair and Ellipsis/Anaphora phenomena are presented and applied to two dialogue corpora, one of ordinary 'everyday' conversations and one of task-oriented dialogues. These data illustrate that there are significant differences in how understanding is created and negotiated across conditions. Repair is shown to be a ubiquitous feature of all dialogue. The goals of the speaker directly affect the type of Repair used. Giving instructions leads to a higher rate of self-editing; following instructions increases corrections and requests for clarification. Medium and familiarity also influence Repair; when eye contact is not possible there are a greater number of repeats and clarifications. Anaphora are used less frequently in task-oriented dialogue, whereas types of Ellipsis increase. The use of elliptical phrases that check, confirm or acknowledge is higher when there is no eye contact. Familiar pairs use more elliptical expressions, especially endophora and elliptical questions. Following instructions leads to greater use of elliptical (non-sentential) phrases. Medium, task and social norms all have a measurable effect on the components of dialogue that underpin mutual-understanding.
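    The comparative analyses described above ultimately rest on tallying coded phenomena per condition. The sketch below shows one hypothetical way such a tally could be computed, with per-turn repair rates split by condition; the category labels and data layout are illustrative stand-ins, not the thesis's actual coding protocols.

```python
from collections import Counter

# Hypothetical annotated turns; labels are illustrative only.
turns = [
    {"condition": "task-oriented", "repairs": ["self-edit", "clarification"]},
    {"condition": "everyday", "repairs": ["self-edit"]},
    {"condition": "task-oriented", "repairs": []},
]

def repair_rates(turns):
    """Per-turn rate of each repair category, split by condition."""
    per_cond, n_turns = {}, Counter()
    for t in turns:
        n_turns[t["condition"]] += 1
        per_cond.setdefault(t["condition"], Counter()).update(t["repairs"])
    return {cond: {label: k / n_turns[cond] for label, k in c.items()}
            for cond, c in per_cond.items()}

print(repair_rates(turns))
# {'task-oriented': {'self-edit': 0.5, 'clarification': 0.5},
#  'everyday': {'self-edit': 1.0}}
```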

    Robust Dialog Management Through A Context-centric Architecture

    Get PDF
    This dissertation presents and evaluates a method of managing spoken dialog interactions with robust attention to fulfilling the human user’s goals in the presence of speech recognition limitations. Assistive speech-based embodied conversation agents are computer-based entities that interact with humans to help accomplish a certain task or communicate information via spoken input and output. A challenging aspect of this task involves open dialog, where the user is free to converse in an unstructured manner. With this style of input, the machine’s ability to communicate may be hindered by poor reception of utterances, caused by a user’s inadequate command of a language and/or faults in the speech recognition facilities. Since a speech-based input is emphasized, this endeavor involves the fundamental issues associated with natural language processing, automatic speech recognition and dialog system design. Driven by Context-Based Reasoning, the presented dialog manager features a discourse model that implements mixed-initiative conversation with a focus on the user’s assistive needs. The discourse behavior must maintain a sense of generality, where the assistive nature of the system remains constant regardless of its knowledge corpus. The dialog manager was encapsulated into a speech-based embodied conversation agent platform for prototyping and testing purposes. A battery of user trials was performed on this agent to evaluate its performance as a robust, domain-independent, speech-based interaction entity capable of satisfying the needs of its users.
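    The sketch below is a hedged illustration of the general idea of a context-centric dialog manager: the active context constrains interpretation, low recognition confidence triggers a clarification move, and unmatched input prompts the system to re-take initiative. The contexts, threshold, and responses here are invented for illustration, not the dissertation's architecture.

```python
from dataclasses import dataclass

@dataclass
class Context:
    name: str
    keywords: dict  # keyword -> canned system response

class DialogManager:
    """Toy context-centric manager: the active context constrains
    interpretation; low ASR confidence triggers a repair request."""

    def __init__(self, contexts):
        self.contexts = contexts
        self.active = contexts[0]

    def handle(self, utterance, asr_confidence):
        # Robustness to recognition limits: ask for repair below threshold.
        if asr_confidence < 0.5:
            return "Sorry, could you say that again?"
        for kw, response in self.active.keywords.items():
            if kw in utterance.lower():
                return response
        # Mixed initiative: nothing matched, so steer back to the context.
        return f"We were discussing {self.active.name}. What would you like to know?"

dm = DialogManager([Context("campus directions",
                            {"library": "The library is north of the main gate."})])
print(dm.handle("Where is the library?", 0.9))
print(dm.handle("mumble mumble", 0.3))
```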

    REVISITING RECOGNIZING TEXTUAL ENTAILMENT FOR EVALUATING NATURAL LANGUAGE PROCESSING SYSTEMS

    Get PDF
    Recognizing Textual Entailment (RTE) began as a unified framework to evaluate the reasoning capabilities of Natural Language Processing (NLP) models. In recent years, RTE has evolved in the NLP community into a task that researchers focus on developing models for. This thesis revisits the tradition of RTE as an evaluation framework for NLP models, especially in the era of deep learning. Chapter 2 provides an overview of different approaches to evaluating NLP systems, discusses prior RTE datasets, and argues why many of them do not serve as satisfactory tests to evaluate the reasoning capabilities of NLP systems. Chapter 3 presents a new large-scale diverse collection of RTE datasets (DNC) that tests how well NLP systems capture a range of semantic phenomena that are integral to understanding human language. Chapter 4 demonstrates how the DNC can be used to evaluate reasoning capabilities of NLP models. Chapter 5 discusses the limits of RTE as an evaluation framework by illuminating how existing datasets contain biases that may enable crude modeling approaches to perform surprisingly well. The remaining aspects of the thesis focus on issues raised in Chapter 5. Chapter 6 addresses issues in prior RTE datasets focused on paraphrasing and presents a high-quality test set that can be used to analyze how robust RTE systems are to paraphrases. Chapter 7 demonstrates how modeling approaches focused on biases, e.g. adversarial learning, can enable RTE models to overcome the biases discussed in Chapter 5. Chapter 8 applies these methods to the task of discovering emergency needs during disaster events.
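    One crude modeling approach of the kind the bias argument relies on is a hypothesis-only baseline: if a classifier that never sees the premise still beats chance, the dataset leaks label information independent of entailment. The sketch below illustrates this with toy data; the examples and features are stand-ins, not the DNC itself.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy premise/hypothesis/label triples (1 = entailed, 0 = not entailed).
pairs = [
    ("A man is sleeping.", "Nobody is asleep.", 0),
    ("A dog runs outside.", "An animal is outdoors.", 1),
    ("A woman sings.", "Nobody is singing.", 0),
    ("Two kids play.", "Children are playing.", 1),
] * 10  # repeated so cross-validation has enough examples

hypotheses = [h for _, h, _ in pairs]
labels = [y for _, _, y in pairs]

X = CountVectorizer().fit_transform(hypotheses)  # premise deliberately ignored
score = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5).mean()
print(score)  # far above 0.5 here: the word "nobody" alone predicts the label
```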

    Incrementally resolving references in order to identify visually present objects in a situated dialogue setting

    Get PDF
    Kennington C. Incrementally resolving references in order to identify visually present objects in a situated dialogue setting. Bielefeld: Universität Bielefeld; 2016. The primary concern of this thesis is to model the resolution of spoken referring expressions made in order to identify objects; in particular, everyday objects that can be perceived visually and distinctly from other objects. The practical goal of such a model is for it to be implemented as a component for use in a live, interactive, autonomous spoken dialogue system. The requirement of interaction imposes an added complication, one that has been ignored in previous models and approaches to automatic reference resolution: the model must attempt to resolve the reference incrementally as it unfolds, not wait until the end of the referring expression to begin the resolution process. Beyond components in dialogue systems, reference has been a major player in the philosophy of meaning for more than a century. For example, Gottlob Frege (1892) distinguished between Sinn (sense) and Bedeutung (reference), and discussed how they are related and how they relate to the meaning of words and expressions. It has furthermore been argued (e.g., Dahlgren (1976)) that reference to entities in the actual world is not just a fundamental notion of semantic theory, but the fundamental notion; for an individual acquiring a language, understanding the meaning of many words and concepts is done via the task of reference, beginning in early childhood. In this thesis, we pursue an account of word meaning that is based on perception of objects; for example, the meaning of the word red is based on visual features that are selected as distinguishing red objects from non-red ones. This thesis proposes two statistical models of incremental reference resolution. Given examples of referring expressions and visual aspects of the objects to which those expressions referred, both model components learn a functional mapping between the words of the referring expressions and the visual aspects. A generative model, the simple incremental update model, presented in Chapter 5, uses a mediating variable to learn the mapping, whereas a discriminative model, the words-as-classifiers model, presented in Chapter 6, learns the mapping directly and improves over the generative model. Both models have been evaluated in various reference resolution tasks to objects in virtual scenes as well as real, tangible objects. This thesis shows that both models work robustly and are able to resolve referring expressions made in reference to visually present objects despite realistic, noisy conditions of speech and object recognition. A theoretical and practical comparison is also provided. Special emphasis is given to the discriminative model in this thesis because of its simplicity and ability to represent word meanings. The learning and application of this model give credence to the above claim that reference is the fundamental notion for semantic theory, and that learning the meanings of (visual) words is done through experiencing referring expressions made to objects that are visually perceivable.
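    A minimal sketch of the words-as-classifiers idea follows: each word is paired with its own binary classifier over an object's visual features, and a referring expression is resolved incrementally by updating a score for each candidate object as the words arrive. The features, training data, and combination rule below are simplified illustrations, not the thesis's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy visual features per candidate object: [redness, squareness].
objects = {"obj1": np.array([0.9, 0.10]),  # reddish, round
           "obj2": np.array([0.1, 0.95])}  # green, square

# Per-word training data: (visual features, word-applies label).
word_data = {
    "red":    ([[0.9, 0.2], [0.8, 0.7], [0.1, 0.9], [0.2, 0.1]], [1, 1, 0, 0]),
    "square": ([[0.3, 0.9], [0.7, 0.8], [0.9, 0.1], [0.2, 0.2]], [1, 1, 0, 0]),
}
classifiers = {w: LogisticRegression().fit(X, y) for w, (X, y) in word_data.items()}

def resolve_incrementally(words, objects):
    """Update each candidate's score word by word, as the expression unfolds."""
    scores = {name: 1.0 for name in objects}
    for w in words:
        if w not in classifiers:
            continue  # words without classifiers contribute nothing here
        for name, feats in objects.items():
            p = classifiers[w].predict_proba(feats.reshape(1, -1))[0, 1]
            scores[name] *= p  # accumulate word-level evidence per object
    return max(scores, key=scores.get)

print(resolve_incrementally(["the", "red", "one"], objects))     # -> obj1
print(resolve_incrementally(["the", "square", "one"], objects))  # -> obj2
```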

    Semi-Supervised Learning For Identifying Opinions In Web Content

    Get PDF
    Thesis (Ph.D.), Indiana University, Information Science, 2011. Opinions published on the World Wide Web (Web) offer opportunities for detecting personal attitudes regarding topics, products, and services. The opinion detection literature indicates that both a large body of opinions and a wide variety of opinion features are essential for capturing subtle opinion information. Although a large amount of opinion-labeled data is preferable for opinion detection systems, opinion-labeled data is often limited, especially at sub-document levels, and manual annotation is tedious, expensive and error-prone. This shortage of opinion-labeled data is less challenging in some domains (e.g., movie reviews) than in others (e.g., blog posts). While a simple method for improving accuracy in challenging domains is to borrow opinion-labeled data from a non-target data domain, this approach often fails because of the domain transfer problem: opinion detection strategies designed for one data domain generally do not perform well in another domain. However, while it is difficult to obtain opinion-labeled data, unlabeled user-generated opinion data are readily available. Semi-supervised learning (SSL) requires only limited labeled data to automatically label unlabeled data and has achieved promising results in various natural language processing (NLP) tasks, including traditional topic classification; but SSL has been applied in only a few opinion detection studies. This study investigates the application of four different SSL algorithms in three types of Web content: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. SSL algorithms are also evaluated for their effectiveness in sparse data situations and domain adaptation. Research findings suggest that, when there is limited labeled data, SSL is a promising approach for opinion detection in Web content. Although the contributions of SSL varied across data domains, significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based SSL strategy was implemented.
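    The abstract does not name its four SSL algorithms, so the sketch below uses self-training, one classic SSL approach, as a representative stand-in: a classifier trained on the small labeled set repeatedly pseudo-labels its most confident unlabeled examples and retrains on the enlarged set. All data and thresholds are toy illustrations.

```python
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy labeled set (1 = opinionated, 0 = factual) plus unlabeled pool.
labeled = [("i loved this movie", 1), ("the schedule is posted", 0),
           ("what a terrible film", 1), ("the report lists figures", 0)]
unlabeled = ["i really hated the ending", "meeting notes are attached",
             "such a wonderful story", "the table shows results"]

texts, y = zip(*labeled)
vec = TfidfVectorizer().fit(list(texts) + unlabeled)
X, U = vec.transform(texts), vec.transform(unlabeled)
y = np.array(y)

clf = MultinomialNB()
for _ in range(3):                       # a few self-training rounds
    clf.fit(X, y)
    if U.shape[0] == 0:
        break
    proba = clf.predict_proba(U)
    confident = proba.max(axis=1) > 0.6  # pseudo-label only confident items
    if not confident.any():
        break
    X = vstack([X, U[confident]])        # grow the training set
    y = np.concatenate([y, proba[confident].argmax(axis=1)])
    U = U[~confident]                    # shrink the unlabeled pool

print(clf.predict(vec.transform(["an awful experience"])))
```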