47 research outputs found

    Dimensionality of Dialogue Act Tagsets: An Empirical Analysis of Large Corpora

    This article compares one-dimensional and multi-dimensional dialogue act tagsets used for automatic labeling of utterances. The influence of tagset dimensionality on tagging accuracy is first discussed theoretically, then based on empirical data from human and automatic annotations of large scale resources, using four existing tagsets: DAMSL, SWBD-DAMSL, ICSI-MRDA and MALTUS. The Dominant Function Approximation proposes that automatic dialogue act taggers could focus initially on finding the main dialogue function of each utterance, which is empirically acceptable and has significant practical relevance

    Surface and Contextual Linguistic Cues in Dialog Act Classification: A Cognitive Science View

    What role do linguistic cues on a surface and contextual level have in identifying the intention behind an utterance? Drawing on the wealth of studies and corpora from the computational task of dialog act classification, we studied this question from a cognitive science perspective. We first reviewed the role of linguistic cues in dialog act classification studies that evaluated model performance on three of the most commonly used English dialog act corpora. Findings show that frequency-based, machine learning, and deep learning methods all yield similar performance. Classification accuracies, moreover, generally do not explain which specific cues yield high performance. Using a cognitive science approach, in two analyses, we systematically investigated the role of cues in the surface structure of the utterance and cues of the surrounding context individually and combined. By comparing the explained variance, rather than the prediction accuracy, of these cues in a logistic regression model, we found that (1) while surface and contextual linguistic cues can complement each other, surface linguistic cues form the backbone of human dialog act identification, (2) word frequency statistics are particularly important for the dialog act, and (3) similar trends hold across corpora, despite differences in the type of dialog, corpus setup, and dialog act tagset. The importance of surface linguistic cues in dialog act classification sheds light on how both computers and humans take advantage of these cues in speech act recognition.

    Recording and transcription of speech and gesture in the narration of Polish adults and children

    In the present paper, the experimental procedure, the details of sound and video recording set-up as well as the system for speech and gesture transcription and coding used in the Polish Cartoon Narration Corpus (PCNC) project are described. The audio-visual data come from a cartoon narration task performed by both children and adults. The recordings are transcribed orthographically and phonemically, and labelled for selected phenomena on a number of levels, including gesture, lexicon, prosody, and dialogue acts.

    An Attempt at Dialogue Act Annotation of the Corpus of Everyday Japanese Conversation: Focusing on Cases Where Tag Selection Is Difficult

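    National Institute for Japanese Language and Linguistics; Waseda University; Chiba University; National Institute for Japanese Language and Linguistics. Conference: Language Resources Workshop 2018. Venue: National Institute for Japanese Language and Linguistics. Dates: September 4-5, 2018. Organizer: Center for Corpus Development, National Institute for Japanese Language and Linguistics. This study reports on an attempt at dialogue act annotation of the Corpus of Everyday Japanese Conversation (CEJC), which covers conversations that arose in daily life and are embedded in concrete contexts. After introducing the framework currently being trialed, we present cases in which the dialogue act proved difficult to judge during actual annotation work, and discuss the contributing factors with reference to the characteristics of CEJC.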

    What Determines Inter-Coder Agreement in Manual Annotations? A Meta-Analytic Investigation

    Recent discussions of annotator agreement have mostly centered around its calculation and interpretation, and the correct choice of indices. Although these discussions are important, they only consider the "back-end" of the story, namely, what to do once the data are collected. Just as important in our opinion is to know how agreement is reached in the first place and what factors influence coder agreement as part of the annotation process or setting, as this knowledge can provide concrete guidelines for the planning and set-up of annotation projects. To investigate whether there are factors that consistently impact annotator agreement we conducted a meta-analytic investigation of annotation studies reporting agreement percentages. Our meta-analysis synthesized factors reported in 96 annotation studies from three domains (word-sense disambiguation, prosodic transcriptions, and phonetic transcriptions) and was based on a total of 346 agreement indices. Our analysis identified seven factors that influence reported agreement values: annotation domain, number of categories in a coding scheme, number of annotators in a project, whether annotators received training, the intensity of annotator training, the annotation purpose, and the method used for the calculation of percentage agreements. Based on our results we develop practical recommendations for the assessment, interpretation, calculation, and reporting of coder agreement. We also briefly discuss theoretical implications for the concept of annotation quality

    Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

    Peer reviewed

    On Making in the Digital Humanities

    On Making in the Digital Humanities fills a gap in our understanding of digital humanities projects and craft by exploring the processes of making as much as the products that arise from it. The volume draws focus to the interwoven layers of human and technological textures that constitute digital humanities scholarship. To do this, it assembles a group of well-known, experienced and emerging scholars in the digital humanities to reflect on various forms of making (we privilege here the creative and applied side of the digital humanities). The volume honours the work of John Bradley, as it is totemic of a practice of making that is deeply informed by critical perspectives. A special chapter also honours the profound contributions that this volume’s co-editor, Stéfan Sinclair, made to the creative, applied and intellectual praxis of making and the digital humanities. Stéfan Sinclair passed away on 6 August 2020. The chapters gathered here are individually important, but together provide a very human view on what it is to do the digital humanities, in the past, present and future. This book will accordingly be of interest to researchers, teachers and students of the digital humanities; creative humanities, including maker spaces and culture; information studies; the history of computing and technology; and the history of science and the humanities

    Exploring Higher Order Dependency Parsers

    Syntactic analysis is one of the most important steps in the computational processing of natural languages. In this thesis we focus on the formalism of dependency grammar, because its main principles, in particular the relation between the governing node and the dependent node, have proved useful for a range of different languages, especially for explaining word order and the relation between surface structure and meaning. Most modern efficient algorithms for dependency parsing are based on factoring dependency trees. In most of these approaches, the parser loses a considerable part of the contextual information during factorization. In this thesis we investigate how syntactic-semantic features affect higher-order discriminative machine learning methods for dependency parsing. We show that linguistic features bring a significant improvement in accuracy in many cases. We first survey several discriminative learning methods for graph-based statistical dependency parsers and explain the concept of higher order, which generalizes the work of (Koo and Collins 2010) and (McDonald et al. 2006). This leads us to the core of the thesis: feature engineering for higher-order dependency parsers. We experiment with several syntactic-semantic features and try to explain their theoretical foundations. We carry out the experiments on two different languages -...
    Most of the recent efficient algorithms for dependency parsing work by factoring the dependency trees. In most of these approaches, the parser loses much of the contextual information during the process of factorization. There have been approaches to build higher order dependency parsers: second order [Carreras 2007] and third order [Koo and Collins 2010]. In the thesis, the approach by Koo and Collins should be further exploited in one or more ways.
    Possible directions of further exploitation include but are not limited to: investigating possibilities of extension of the approach to non-projective parsing; integrating labeled parsing; joining word-senses during the parsing phase [Eisner 2000].
    Institute of Formal and Applied Linguistics (Ústav formální a aplikované lingvistiky), Faculty of Mathematics and Physics

    Towards collaborative dialogue in Minecraft

    This dissertation describes our work in building interactive agents that can communicate with humans to collaboratively solve tasks in grounded scenarios. To investigate the challenges of building such agents, we define a novel instantiation of a situated, Minecraft-based, Collaborative Building Task in which one player (A, the Architect) is shown a target structure, denoted Target, and needs to instruct the other player (B, the Builder) to build a copy of this structure, denoted Built, in a predefined build region. While both players can interact asynchronously via a chat interface, we define the roles to be asymmetric: A can observe B and Target, but is invisible and cannot place blocks; meanwhile, B can freely place and remove blocks, but has no explicit knowledge of the target structure. Each agent requires a different set of abilities in order to be successful at this task: specifically, A's main challenges arise in the task of generating situated instructions by comparing Built and Target, while B's responsibilities focus mainly on comprehending A's situated instructions using both dialogue and world context. Both agents must be able to interact asynchronously in an evolving dialogue context and a dynamic world state within which they are embodied. In this work, we specifically examine how well end-to-end neural models can learn to be instruction givers (i.e., Architects) from a limited amount of real human-human data. In order to examine how humans complete the Collaborative Building Task, as well as use human-human data as a gold standard for training and evaluating models, we present the Minecraft Dialogue Corpus, a collection of 509 conversations and game logs. We then introduce baseline models for the challenging subtask of Architect utterance generation, and evaluate them offline, using both automated metrics and human evaluation. 
We show that while conditioning our model on a simple representation of the world gives our model improved ability to generate correct instructions, there are still many obvious shortcomings, and it is difficult for these models to learn the large variety of abilities needed to be successful Architects in an entirely end-to-end manner. To combat this, we show that including meaningful, structured inputs about the world and discourse state as additional inputs -- specifically, by adding oracle information about the Builder's next actions, as well as enriching our linguistic representation with Architect dialogue acts -- improves the performance of our utterance generation models. We also augment the data with shape information by pretraining 3D shape localization models on synthetically generated block configurations. Finally, we integrate Architect utterance generation models into actual Minecraft agents and evaluate them in a fully interactive setting