6 research outputs found

    Automatic annotation of context and speech acts for dialogue corpora

    Richly annotated dialogue corpora are essential for new research directions in statistical learning approaches to dialogue management, context-sensitive interpretation, and context-sensitive speech recognition. In particular, large dialogue corpora annotated with contextual information and speech acts are urgently required. We explore how existing dialogue corpora (usually consisting of utterance transcriptions) can be automatically processed to yield new corpora where dialogue context and speech acts are accurately represented. We present a conceptual and computational framework for generating such corpora. As an example, we present and evaluate an automatic annotation system which builds ‘Information State Update’ (ISU) representations of dialogue context for the Communicator (2000 and 2001) corpora of human-machine dialogues (2,331 dialogues). The purposes of this annotation are to generate corpora for reinforcement learning of dialogue policies, for building user simulations, for evaluating different dialogue strategies against a baseline, and for training models for context-dependent interpretation and speech recognition. The automatic annotation system parses system and user utterances into speech acts and builds up sequences of dialogue context representations using an ISU dialogue manager. We present the architecture of the automatic annotation system and a detailed example to illustrate how the system components interact to produce the annotations. We also evaluate the annotations, with respect to the task completion metrics of the original corpus and in comparison to hand-annotated data and annotations produced by a baseline automatic system. The automatic annotations perform well and largely outperform the baseline automatic annotations in all measures. The resulting annotated corpus has been used to train high-quality user simulations and to learn successful dialogue strategies. The final corpus will be made publicly available.

    A classification scheme for annotating speech acts in a business email corpus

    This paper reports on the process of manual annotation of speech acts in a corpus of business emails, in the context of the PROBE project (PRagmatics of Business English). The project aims to bring together corpus, computational, and theoretical linguistics by drawing on the insights made available by the annotated corpus. The corpus data sheds light on the linguistic and discourse structures of speech act use in business email communication. This enhanced linguistic description can be compared to theoretical linguistic representations of speech act categories to assess how well traditional distinctions relate to real-world, naturally occurring data. From a computational perspective, the annotated data is required for the development of an automated speech act tagging tool. Central to this research is the creation of a high-quality, manually annotated speech act corpus, using an easily interpretable classification scheme. We discuss the scheme chosen for the project and the training guidelines given to the annotators, and describe the main challenges identified by the annotators.

    Anatomy of dialogue in out-of-hospital cardiac arrest resuscitation

    Research on medical teams consistently recognises the crucial value of communication. Studies on various medical teams, such as surgery and trauma, provide evidence for how communication either affects or is affected by a range of outcomes and variables. Nevertheless, much of this work has focused on in-hospital communication. Less is known about the patterns of communication amongst medical practitioners in high-stakes emergency care outside of the hospital. This thesis presents an investigation of dialogue during pre-hospital resuscitations when paramedics are responding to out-of-hospital cardiac arrest (OHCA). A bespoke dialogue annotation system, called the Dialogue Annotation for Resuscitation coding scheme (DARe), is developed for this purpose. DARe is used to annotate four simulated and 40 real-life OHCA resuscitation attempts by paramedics who are based in Edinburgh, Scotland. We examine (1) the distributions of communicative functions and subject matters (threads); (2) specific statements used by team members to align themselves; (3) the prevalence and forms of mitigated directives; (4) the verbal manners of planning; (5) the occurrence of closed-loop communication and other structures of verbal communication loops; and (6) the prevalence of socioemotionally related utterances. For the real-life resuscitation dialogues, the study additionally investigates (7) the correlations of the distributions of the dialogue patterns with the assessed performance of resuscitation team leaders and with the time taken to successfully deploy a mechanical chest compression device (AutoPulse). Analysis for the simulation dialogues was performed from the start of simulation until the end or near the end of the procedure, whilst analysis for the real-life dialogues concentrated on the first five minutes.
Despite this difference in timing, the results showed that simulated and real-life OHCA dialogues comprised similarly high frequencies of statements, directives, acceptances, and acknowledgments. Both simulated and real-life dialogues also contained sociolinguistic influences from the linguistic context from which they were derived, i.e. Scottish English. In considering the threads across both settings, the largest proportion of threads revolved around planning and execution of tasks, followed by threads on patient history and related instrument/equipment. Dialogues during real-life OHCA resuscitations differed from the simulated resuscitations in the additional presence of two communicative techniques, namely Alerters (used to attract the hearer’s attention) and Affective performatives (used to convey affective or socioemotional statements). Additionally, real-life resuscitation dialogues contained a larger proportion of threads pertaining to patient positioning due to the use of the AutoPulse. Resuscitation team members often used a statement structure called State-awareness to align themselves with one another in terms of their current state or task. Directives were frequently mitigated, with strategies ranging from simple use of softeners (e.g. please) to less straightforward directive structures (e.g. suggestion). Plans were verbalised in temporal clusters, i.e. distinguishable in terms of the immediacy of the task to be performed. Few verbal affective behaviours (e.g. humour, gratitude, compliments) were observed. Team members also used very few exchanges that resembled the standard, three-level closed-loop communication structure typically required from professionals in other high-stakes dialogue environments. Correlation analyses revealed that the frequencies of both the communicative functions and threads were associated with the performance scores of resuscitation team leaders.
Teams led by higher-rated leaders (the ideal score group) showed higher proportions of Alerters, Affective performatives, State-awareness, and Plan of action in their dialogues compared to teams led by lower-rated leaders (the low score group). There were also variations in the concentrations of chest compressions, patient history, and rhythm threads in the two groups, indicating that both discussed the same threads but at different junctures of the procedure. Meanwhile, the time taken to deploy the AutoPulse was positively correlated with the communicative function Acknowledge and the threads Patient history and Movement other than patient, and negatively correlated with the communicative function Open-option and the threads Ventilation and Airway access. Based on these results, several potential measures for optimising OHCA resuscitation are proposed: the use of sewn-on name badges for paramedics; less time dedicated to the extraction of patient history; verbal reports of vital points throughout the procedure; the use of unmitigated or less mitigated directives; and standardisation of resuscitation phrases. Each suggestion is also discussed in terms of anticipated challenges and possible solutions. The results presented in this thesis provide grounds for further research on the features of pre-hospital resuscitation dialogues. DARe has been demonstrated to be useful in discriminating linguistic patterns, suggesting that dialogue annotation analysis can be utilised to further investigate this area and ultimately contribute to resuscitation performance.