Automatic annotation of context and speech acts for dialogue corpora
Richly annotated dialogue corpora are essential for new research directions in statistical learning approaches to dialogue management, context-sensitive interpretation, and context-sensitive speech recognition. In particular, large dialogue corpora annotated with contextual information and speech acts are urgently required. We explore how existing dialogue corpora (usually consisting of utterance transcriptions) can be automatically processed to yield new corpora in which dialogue context and speech acts are accurately represented. We present a conceptual and computational framework for generating such corpora. As an example, we present and evaluate an automatic annotation system which builds ‘Information State Update’ (ISU) representations of dialogue context for the Communicator (2000 and 2001) corpora of human–machine dialogues (2,331 dialogues). The purposes of this annotation are to generate corpora for reinforcement learning of dialogue policies, for building user simulations, for evaluating different dialogue strategies against a baseline, and for training models for context-dependent interpretation and speech recognition. The automatic annotation system parses system and user utterances into speech acts and builds up sequences of dialogue context representations using an ISU dialogue manager. We present the architecture of the automatic annotation system and a detailed example to illustrate how the system components interact to produce the annotations. We also evaluate the annotations with respect to the task completion metrics of the original corpus, and in comparison with hand-annotated data and with annotations produced by a baseline automatic system. The automatic annotations perform well and substantially outperform the baseline automatic annotations on all measures. The resulting annotated corpus has been used to train high-quality user simulations and to learn successful dialogue strategies. The final corpus will be made publicly available.
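To make the annotation cycle more concrete, here is a minimal Python sketch of ISU-style annotation: each speech act parsed from a system or user utterance updates an information state, and the annotator records the resulting sequence of states. The speech-act labels (`request_info`, `provide_info`, `explicit_confirm`, `yes_answer`), the slot name `dest_city`, and the update rules are hypothetical simplifications chosen for illustration; they are not the Communicator annotation scheme or the authors' ISU dialogue manager.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechAct:
    speaker: str                                 # "system" or "user"
    act: str                                     # hypothetical act label
    slots: dict = field(default_factory=dict)    # hypothetical slot -> value


@dataclass
class InfoState:
    filled: dict = field(default_factory=dict)           # slots the user has provided
    confirmed: set = field(default_factory=set)          # slots the user has confirmed
    pending_confirm: list = field(default_factory=list)  # slots the system asked to confirm
    history: list = field(default_factory=list)          # (speaker, act) pairs so far


def update(state: InfoState, sa: SpeechAct) -> InfoState:
    """Return a new information state after one speech act (hypothetical update rules)."""
    new = InfoState(dict(state.filled), set(state.confirmed),
                    list(state.pending_confirm), list(state.history))
    new.history.append((sa.speaker, sa.act))
    if sa.speaker == "user" and sa.act == "provide_info":
        new.filled.update(sa.slots)
    elif sa.speaker == "system" and sa.act == "explicit_confirm":
        new.pending_confirm = list(sa.slots)      # remember which slots await a yes/no
    elif sa.speaker == "user" and sa.act == "yes_answer":
        new.confirmed.update(new.pending_confirm)
        new.pending_confirm = []
    return new


def annotate(dialogue):
    """Return the sequence of information states, one per speech act."""
    state, states = InfoState(), []
    for sa in dialogue:
        state = update(state, sa)
        states.append(state)
    return states


# Usage on a toy (invented) exchange.
dialogue = [
    SpeechAct("system", "request_info", {"dest_city": None}),
    SpeechAct("user", "provide_info", {"dest_city": "Boston"}),
    SpeechAct("system", "explicit_confirm", {"dest_city": "Boston"}),
    SpeechAct("user", "yes_answer"),
]
states = annotate(dialogue)
print(states[-1].filled, states[-1].confirmed)  # {'dest_city': 'Boston'} {'dest_city'}
```

Copying the state on every update keeps each annotated turn's context snapshot intact, which is what a corpus of per-turn context representations needs.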
A classification scheme for annotating speech acts in a business email corpus
This paper reports on the process of manual annotation of speech acts in a corpus of business emails, in the context of the PROBE project (PRagmatics of Business English). The project aims to bring together corpus, computational, and theoretical linguistics by drawing on the insights made available by the annotated corpus. The corpus data sheds light on the linguistic and discourse structures of speech act use in business email communication. This enhanced linguistic description can be compared with theoretical linguistic representations of speech act categories to assess how well traditional distinctions relate to real-world, naturally occurring data. From a computational perspective, the annotated data is required for the development of an automated speech act tagging tool. Central to this research is the creation of a high-quality, manually annotated speech act corpus using an easily interpretable classification scheme. We discuss the scheme chosen for the project and the training guidelines given to the annotators, and describe the main challenges identified by the annotators.
Anatomy of dialogue in out-of-hospital cardiac arrest resuscitation
Research on medical teams consistently recognises the crucial value of communication. Studies of various medical teams, such as surgical and trauma teams, provide evidence of how communication either affects or is affected by a range of outcomes and variables. Nevertheless, much of this work has focused on in-hospital communication. Less is known about the patterns of communication among medical practitioners in high-stakes emergency care outside the hospital. This thesis investigates dialogue during pre-hospital resuscitations, when paramedics respond to out-of-hospital cardiac arrest (OHCA).
A bespoke dialogue annotation system, the Dialogue Annotation for Resuscitation coding scheme (DARe), is developed for this purpose. DARe is used to annotate four simulated and 40 real-life OHCA resuscitation attempts by paramedics based in Edinburgh, Scotland. We examine (1) the distributions of communicative functions and subject matters (threads); (2) specific statements used by team members to align themselves; (3) the prevalence and forms of mitigated directives; (4) the verbal manners of planning; (5) the occurrence of closed-loop communication and other structures of verbal communication loops; and (6) the prevalence of socioemotionally related utterances. For the real-life resuscitation dialogues, the study additionally investigates (7) the correlations of the distributions of these dialogue patterns with the assessed performance of resuscitation team leaders and with the time taken to successfully deploy a mechanical chest compression device (AutoPulse).
Analysis of the simulation dialogues covered the period from the start of the simulation until the end, or near the end, of the procedure, whilst analysis of the real-life dialogues concentrated on the first five minutes. Despite this difference in timing, the results showed that simulated and real-life OHCA dialogues comprised similarly high frequencies of statements, directives, acceptances, and acknowledgments. Both simulated and real-life dialogues also showed sociolinguistic influences from the linguistic context in which they were produced, i.e. Scottish English.
Across both settings, the largest proportion of threads revolved around the planning and execution of tasks, followed by threads on patient history and related instruments and equipment. Dialogues during real-life OHCA resuscitations differed from the simulated resuscitations in the additional presence of two communicative techniques, namely Alerters (used to attract the hearer’s attention) and Affective performatives (used to convey affective or socioemotional statements). Additionally, real-life resuscitation dialogues contained a larger proportion of threads pertaining to patient positioning, due to the use of the AutoPulse.
Resuscitation team members often used a statement structure called State-awareness to align themselves with one another in terms of their current state or task. Directives were frequently mitigated, with strategies ranging from the simple use of softeners (e.g. please) to less straightforward directive structures (e.g. suggestions). Plans were verbalised in temporal clusters, i.e. clusters distinguishable by the immediacy of the task to be performed. Few verbal affective behaviours (e.g. humour, gratitude, compliments) were observed. Team members also used very few exchanges resembling the standard three-level closed-loop communication structure typically required of professionals in other high-stakes dialogue environments.
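As a rough illustration of how closed-loop exchanges might be counted in an annotated transcript, the Python sketch below scans a sequence of (speaker, communicative function) pairs for a three-part pattern: a directive, a check-back from a different speaker, and a closing acknowledgment from the original speaker. The labels `Directive` and `Acknowledge`, the speaker IDs, and the strict adjacency requirement are simplifying assumptions for illustration; they are not the actual DARe criteria for identifying communication loops.

```python
from typing import List, Tuple

# (speaker, communicative function); the label strings are assumed, not DARe's tag set
Utterance = Tuple[str, str]


def find_closed_loops(dialogue: List[Utterance]) -> List[int]:
    """Return the start index of every directive -> check-back -> closing triple.

    Simplified, hypothetical criterion: speaker A issues a Directive, a different
    speaker B immediately responds with Acknowledge, and A immediately closes the
    loop with Acknowledge. Looser or two-part loops are not counted.
    """
    starts = []
    for i in range(len(dialogue) - 2):
        (a, f1), (b, f2), (c, f3) = dialogue[i], dialogue[i + 1], dialogue[i + 2]
        if (f1 == "Directive" and f2 == "Acknowledge" and f3 == "Acknowledge"
                and b != a and c == a):
            starts.append(i)
    return starts


# Toy transcript (invented): one full three-part loop, then an open two-part exchange.
dialogue = [
    ("P1", "Directive"), ("P2", "Acknowledge"), ("P1", "Acknowledge"),
    ("P2", "Statement"), ("P1", "Directive"), ("P2", "Acknowledge"),
]
print(find_closed_loops(dialogue))  # [0]
```

Requiring strict adjacency keeps the sketch simple, but real transcripts interleave overlapping talk, so a practical detector would likely need to allow intervening utterances.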
Correlation analyses revealed that the frequencies of both the communicative functions and the threads were associated with the performance scores of resuscitation team leaders. Teams led by higher-rated leaders (the ideal score group) showed higher proportions of Alerters, Affective performatives, State-awareness, and Plan of action in their dialogues than teams led by lower-rated leaders (the low score group). The two groups also differed in the concentrations of chest compressions, patient history, and rhythm threads, indicating that both groups discussed the same threads but at different junctures of the procedure. Meanwhile, the time taken to deploy the AutoPulse was positively correlated with the communicative function Acknowledge and the threads Patient history and Movement other than patient, and negatively correlated with the communicative function Open-option and the threads Ventilation and Airway access.
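The shape of this analysis can be sketched in Python: compute, for each team, the proportion of utterances carrying a given communicative function, then correlate those proportions with the team leaders' performance scores. This is a minimal sketch under stated assumptions: Pearson's r is used purely for illustration (the abstract does not say which coefficient the thesis used), and the toy labels and scores below are invented, not data from the thesis.

```python
from collections import Counter
from math import sqrt


def proportions(labels, categories):
    """Share of a team's utterances that carry each communicative function."""
    counts = Counter(labels)
    total = len(labels)
    return {c: (counts[c] / total if total else 0.0) for c in categories}


def pearson(xs, ys):
    """Pearson correlation coefficient (chosen here for illustration only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)


def correlate_function_with_score(teams, scores, function):
    """Correlate per-team proportions of one function with leader scores."""
    props = [proportions(team, [function])[function] for team in teams]
    return pearson(props, scores)


# Invented toy data: one list of annotated functions per team, one score per leader.
teams = [
    ["Alerter", "Directive", "Statement", "Statement"],
    ["Directive", "Statement", "Statement", "Statement"],
    ["Alerter", "Alerter", "Directive", "Statement"],
]
scores = [3.0, 1.0, 5.0]
# The toy data is perfectly linear, so r is 1.0 up to floating-point rounding.
print(correlate_function_with_score(teams, scores, "Alerter"))
```

Working with proportions rather than raw counts keeps teams with longer transcripts from dominating the comparison.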
Based on these results, several potential measures for optimising OHCA resuscitation are proposed: the use of sewn-on name badges for paramedics; less time devoted to extracting patient history; verbal reports of vital points throughout the procedure; the use of unmitigated or less mitigated directives; and standardisation of resuscitation phrases. Each suggestion is also discussed in terms of anticipated challenges and possible solutions.
The results presented in this thesis provide grounds for further research on the features of pre-hospital resuscitation dialogues. DARe has been demonstrated to be useful in discriminating linguistic patterns, suggesting that dialogue annotation analysis can be utilised to further investigate this area and ultimately contribute to resuscitation performance