
    Towards Finding and Fixing Fragments – Using ML to Identify Non-Sentential Utterances and their Antecedents in Multi-Party Dialogue

    Schlangen D. Towards Finding and Fixing Fragments – Using ML to Identify Non-Sentential Utterances and their Antecedents in Multi-Party Dialogue. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05). Ann Arbor, Michigan: Association for Computational Linguistics; 2005: 247-254.

    Answering questions about archived, annotated meetings

    Retrieving information from archived meetings is a new domain of information retrieval that has received increasing attention in the past few years. Search in spontaneous spoken conversations is recognized as more difficult than text-based document retrieval because meeting discussions contain two levels of information: the content itself, i.e. what topics are discussed, but also the argumentation process, i.e. what conflicts are resolved and what decisions are made. To capture the richness of information in meetings, current research focuses on recording meetings in Smart-Rooms, transcribing meeting discussion into text, and annotating discussion with higher-level semantic structures to allow efficient access to the data. However, it is not yet clear what type of user interface is best suited for searching and browsing such archived, annotated meetings. Content-based retrieval with keyword search is too naive and does not take the semantic annotations on the data into account. The objective of this thesis is to assess the feasibility and usefulness of a natural language interface to meeting archives that allows users to ask complex questions about meetings and retrieve episodes of meeting discussions based on semantic annotations. The particular issues we address are: the need for argumentative annotation to answer questions about meetings; the linguistic and domain-specific natural language understanding techniques required to interpret such questions; and the use of visual overviews of meeting annotations to guide users in formulating questions. To meet these objectives, we annotated meetings with argumentative structure and built a prototype natural language understanding engine that interprets questions based on those annotations. Further, we performed two sets of user experiments to study what questions users ask when faced with a natural language interface to annotated meeting archives. For this we used the Wizard of Oz simulation method, which lets users express questions in their own terms without being influenced by limitations in speech recognition technology. Our experimental results show that it is technically feasible to annotate meetings and implement a deep-linguistic NLU engine for questions about meetings, but in practice users do not consistently take advantage of these features; instead they often search for keywords in meetings. When visual overviews of the available annotations are provided, users refer to those annotations in their questions, but the questions themselves remain simple. Users search with a breadth-first approach, asking a sequence of questions instead of a single complex one. We conclude that natural language interfaces to meeting archives are useful, but that more experimental work is needed to find ways to encourage users to exploit the expressive power of natural language when asking questions about meetings.
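    A minimal sketch of the retrieval idea described above, in Python. The Episode structure and the annotation labels ("decision", "conflict", "proposal") are illustrative assumptions rather than the thesis's actual annotation schema; the fallback to keyword search mirrors the behaviour users showed in the experiments.

```python
# Hypothetical sketch: retrieving annotated meeting episodes from a
# natural-language question. Fields and labels are invented for
# illustration; they are not the thesis's actual schema.
from dataclasses import dataclass

@dataclass
class Episode:
    meeting_id: str
    start: float        # seconds into the meeting
    end: float
    transcript: str
    annotation: str     # e.g. "decision", "conflict", "proposal"

def answer(question: str, episodes: list[Episode]) -> list[Episode]:
    """Interpret a question against the argumentative annotations,
    falling back to plain keyword search when none are mentioned."""
    q = question.lower()
    for label in ("decision", "conflict", "proposal"):
        if label in q:
            return [e for e in episodes if e.annotation == label]
    keywords = {w for w in q.split() if len(w) > 3}
    return [e for e in episodes
            if keywords & set(e.transcript.lower().split())]
```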

    Semi-Supervised Learning For Identifying Opinions In Web Content

    Thesis (Ph.D.), Indiana University, Information Science, 2011.
    Opinions published on the World Wide Web (Web) offer opportunities for detecting personal attitudes regarding topics, products, and services. The opinion detection literature indicates that both a large body of opinions and a wide variety of opinion features are essential for capturing subtle opinion information. Although a large amount of opinion-labeled data is preferable for opinion detection systems, such data is often limited, especially at sub-document levels, and manual annotation is tedious, expensive, and error-prone. This shortage of opinion-labeled data is less challenging in some domains (e.g., movie reviews) than in others (e.g., blog posts). While a simple method for improving accuracy in challenging domains is to borrow opinion-labeled data from a non-target data domain, this approach often fails because of the domain transfer problem: opinion detection strategies designed for one data domain generally do not perform well in another. However, while it is difficult to obtain opinion-labeled data, unlabeled user-generated opinion data are readily available. Semi-supervised learning (SSL) requires only limited labeled data to automatically label unlabeled data and has achieved promising results in various natural language processing (NLP) tasks, including traditional topic classification, but SSL has been applied in only a few opinion detection studies. This study investigates the application of four different SSL algorithms to three types of Web content: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. The SSL algorithms are also evaluated for their effectiveness in sparse data situations and domain adaptation. Research findings suggest that, when labeled data is limited, SSL is a promising approach for opinion detection in Web content. Although the contributions of SSL varied across data domains, significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain-transfer-based SSL strategy was implemented.
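    The abstract does not name the four SSL algorithms, so the sketch below shows only the general self-training pattern, assuming scikit-learn's SelfTrainingClassifier: unlabeled examples are marked with -1, confidently predicted ones are pseudo-labeled, and the classifier is refit. The toy sentences and the threshold are assumptions for illustration, not the study's setup.

```python
# Hedged sketch of semi-supervised opinion detection via self-training.
# The data and labels are toy examples, not the thesis's corpora.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

labeled = ["this movie is wonderful", "the film runs 120 minutes"]
labels = [1, 0]                       # 1 = opinionated, 0 = factual
unlabeled = ["the acting felt forced", "it opened in cinemas in june"]

X = TfidfVectorizer().fit_transform(labeled + unlabeled)
y = labels + [-1] * len(unlabeled)    # -1 marks unlabeled examples

# Fit on the labeled data, pseudo-label unlabeled examples whose
# predicted probability clears the threshold, then refit.
clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.75)
clf.fit(X, y)
print(clf.predict(X[-2:]))            # predictions for the unlabeled pair
```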

    Quantifying mutual-understanding in dialogue

    PhD thesis.
    There are two components of communication that provide a natural index of mutual-understanding in dialogue. The first is Repair: the ways in which people detect and deal with problems of understanding. The second is Ellipsis/Anaphora: the use of expressions that depend directly on the accessibility of the local context for their interpretation. This thesis explores the use of these two phenomena in systematic comparative analyses of human-human dialogue under different task and media conditions. In order to do this it is necessary to a) develop reliable, valid protocols for coding the different Repair and Ellipsis/Anaphora phenomena, b) establish their baseline patterns of distribution in conversation, and c) model their basic statistical inter-relationships and their predictive value. Two new protocols for coding Repair and Ellipsis/Anaphora phenomena are presented and applied to two dialogue corpora, one of ordinary 'everyday' conversations and one of task-oriented dialogues. These data illustrate that there are significant differences in how understanding is created and negotiated across conditions. Repair is shown to be a ubiquitous feature of all dialogue. The goals of the speaker directly affect the type of Repair used: giving instructions leads to a higher rate of self-editing; following instructions increases corrections and requests for clarification. Medium and familiarity also influence Repair; when eye contact is not possible there are a greater number of repeats and clarifications. Anaphora are used less frequently in task-oriented dialogue, whereas types of Ellipsis increase. The use of elliptical phrases that check, confirm or acknowledge is higher when there is no eye contact. Familiar pairs use more elliptical expressions, especially endophora and elliptical questions. Following instructions leads to greater use of elliptical (non-sentential) phrases. Medium, task and social norms all have a measurable effect on the components of dialogue that underpin mutual-understanding.
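    As a concrete illustration of what such a coding protocol operationalizes, here is a minimal machine-assisted pre-coding sketch in Python. The categories and cue lists are invented for illustration; the thesis's protocols are richer and were applied by trained coders, not regular expressions.

```python
# Hypothetical pre-coder that flags candidate Repair phenomena in a
# dialogue turn. Categories and cues are illustrative assumptions.
import re

REPAIR_CUES = {
    "self-edit":     re.compile(r"\b(i mean|sorry|rather|no wait)\b"),
    "clarification": re.compile(r"\b(what do you mean|which one|pardon)\b"),
    "repeat":        re.compile(r"\b(\w+) \1\b"),  # immediate word repeat
}

def precode(turn: str) -> list[str]:
    """Return the candidate Repair categories a turn matches."""
    t = turn.lower()
    return [label for label, cue in REPAIR_CUES.items() if cue.search(t)]

print(precode("Put it on the left, no wait, the the right one"))
# -> ['self-edit', 'repeat']
```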