1,343 research outputs found
A Study of Accomodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications
Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each to that of the other(s). Implementation of thisbehavior in spoken dialogue systems is desirable as an improvement on the naturalness of humanmachine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitativedescription of inter-speaker accommodation is required. This thesis proposes a methodology of monitoring accommodation during a human or humancomputer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments.In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speakercontributions in a dialogue frame which circumvents strict attribution of speaker-turns, by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS âturntakingâ behaviour. Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude ofperceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems
Modelling a conversational agent (Botocrates) for promoting critical thinking and argumentation skills
Students in higher education institutions are often advised to think critically, yet without being guided to do so. The study investigated the use of a conversational agent (Botocrates) for supporting critical thinking and academic argumentation skills. The overarching research questions were: can a conversational agent support critical thinking and academic argumentation skills? If so, how?
The study was carried out in two stages: modelling and evaluating Botocrates' prototype. The prototype was a Wizard-of-Oz system where a human plays Botocrates' role by following a set of instructions and knowledge-base to guide generation of responses. Both stages were conducted at the School of Education at the University of Leeds.
In the first stage, the study analysed 13 logs of online seminars in order to define the tasks and dialogue strategies needed to be performed by Botocrates. The study identified two main tasks of Botocrates: providing answers to students' enquiries and engaging students in the argumentation process. Botocratesâ dialogue strategies and contents were built to achieve these two tasks. The novel theoretical framework of the âchallenge to explainâ process and the notion of the âconstructive expansion of exchange structureâ were produced during this stage and incorporated into Botocratesâ prototype. The aim of the âchallenge to explainâ process is to engage users in repeated and constant cycles of reflective thinking processes. The âconstructive expansion of exchange structureâ is the practical application of the âchallenge to explainâ process.
In the second stage, the study used the Wizard-of-Oz (WOZ) experiments and interviews to evaluate Botocratesâ prototype. 7 students participated in the evaluation stage and each participant was immediately interviewed after chatting with Botocrates. The analysis of the data gathered from the WOZ and interviews showed encouraging results in terms of studentsâ engagement in the process of argumentation. As a result of the role of âcriticâ played by Botocrates during the interactions, users actively and positively adopted the roles of explainer, clarifier, and evaluator. However, the results also showed negative experiences that occurred to users during the interaction. Improving Botocratesâ performance and training users could decrease usersâ unsuccessful and negative experiences. The study identified the critical success and failure factors related to achieving the tasks of Botocrates
Accessing spoken interaction through dialogue processing [online]
Zusammenfassung
Unser Leben, unsere Leistungen und unsere Umgebung, alles wird
derzeit durch Schriftsprache dokumentiert. Die rasante
Fortentwicklung der technischen Möglichkeiten Audio, Bilder und
Video aufzunehmen, abzuspeichern und wiederzugeben kann genutzt
werden um die schriftliche Dokumentation von menschlicher
Kommunikation, zum Beispiel Meetings, zu unterstĂŒtzen, zu
ergÀnzen oder gar zu ersetzen. Diese neuen Technologien können
uns in die Lage versetzen Information aufzunehmen, die
anderweitig verloren gehen, die Kosten der Dokumentation zu
senken und hochwertige Dokumente mit audiovisuellem Material
anzureichern. Die Indizierung solcher Aufnahmen stellt die
Kerntechnologie dar um dieses Potential auszuschöpfen. Diese
Arbeit stellt effektive Alternativen zu schlĂŒsselwortbasierten
Indizes vor, die SuchraumeinschrÀnkungen bewirken und teilweise
mit einfachen Mitteln zu berechnen sind.
Die Indizierung von Sprachdokumenten kann auf verschiedenen
Ebenen erfolgen: Ein Dokument gehört stilistisch einer
bestimmten Datenbasis an, welche durch sehr einfache Merkmale
bei hoher Genauigkeit automatisch bestimmt werden kann.
Durch diese Art von Klassifikation kann eine Reduktion des
Suchraumes um einen Faktor der GröĂenordnung 4Â10 erfolgen. Die
Anwendung von thematischen Merkmalen zur Textklassifikation
bei einer Nachrichtendatenbank resultiert in einer Reduktion um
einen Faktor 18. Da Sprachdokumente sehr lang sein können mĂŒssen
sie in thematische Segmente unterteilt werden. Ein neuer
probabilistischer Ansatz sowie neue Merkmale (SprecherinitiaÂ
tive und Stil) liefern vergleichbare oder bessere Resultate als
traditionelle schlĂŒsselwortbasierte AnsĂ€tze. Diese thematische
Segmente können durch die vorherrschende AktivitÀt
charakterisiert werden (erzÀhlen, diskutieren, planen, ...),
die durch ein neuronales Netz detektiert werden kann. Die
Detektionsraten sind allerdings begrenzt da auch Menschen
diese AktivitÀten nur ungenau bestimmen. Eine maximale
Reduktion des Suchraumes um den Faktor 6 ist bei den verwendeten
Daten theoretisch möglich. Eine thematische Klassifikation
dieser Segmente wurde ebenfalls auf einer Datenbasis
durchgefĂŒhrt, die Detektionsraten fĂŒr diesen Index sind jedoch
gering.
Auf der Ebene der einzelnen ĂuĂerungen können Dialogakte wie
Aussagen, Fragen, RĂŒckmeldungen (aha, ach ja, echt?, ...) usw.
mit einem diskriminativ trainierten Hidden Markov Model erkannt
werden. Dieses Verfahren kann um die Erkennung von kurzen Folgen
wie Frage/AntwortÂSpielen erweitert werden (Dialogspiele).
Dialogakte und Âspiele können eingesetzt werden um
Klassifikatoren fĂŒr globale Sprechstile zu bauen. Ebenso
könnte ein Benutzer sich an eine bestimmte Dialogaktsequenz
erinnern und versuchen, diese in einer grafischen
ReprÀsentation wiederzufinden.
In einer Studie mit sehr pessimistischen Annahmen konnten
Benutzer eines aus vier Àhnlichen und gleichwahrscheinlichen
GesprÀchen mit einer Genauigkeit von ~ 43% durch eine graphische
ReprÀsentation von AktivitÀt bestimmt.
Dialogakte könnte in diesem Szenario ebenso nĂŒtzlich sein, die
Benutzerstudie konnte aufgrund der geringen Datenmenge darĂŒber
keinen endgĂŒltigen AufschluĂ geben. Die Studie konnte allerdings
fĂŒr detailierte Basismerkmale wie FormalitĂ€t und
SprecheridentitÀt keinen Effekt zeigen.
Abstract
Written language is one of our primary means for documenting our
lives, achievements, and environment. Our capabilities to
record, store and retrieve audio, still pictures, and video are
undergoing a revolution and may support, supplement or even
replace written documentation. This technology enables us to
record information that would otherwise be lost, lower the cost
of documentation and enhance highÂquality documents with
original audiovisual material.
The indexing of the audio material is the key technology to
realize those benefits. This work presents effective
alternatives to keyword based indices which restrict the search
space and may in part be calculated with very limited resources.
Indexing speech documents can be done at a various levels:
Stylistically a document belongs to a certain database which can
be determined automatically with high accuracy using very simple
features. The resulting factor in search space reduction is in
the order of 4Â10 while topic classification yielded a factor
of 18 in a news domain.
Since documents can be very long they need to be segmented into
topical regions. A new probabilistic segmentation framework as
well as new features (speaker initiative and style) prove to be
very effective compared to traditional keyword based methods. At
the topical segment level activities (storytelling, discussing,
planning, ...) can be detected using a machine learning approach
with limited accuracy; however even human annotators do not
annotate them very reliably. A maximum search space reduction
factor of 6 is theoretically possible on the databases used. A
topical classification of these regions has been attempted
on one database, the detection accuracy for that index, however,
was very low.
At the utterance level dialogue acts such as statements,
questions, backchannels (aha, yeah, ...), etc. are being
recognized using a novel discriminatively trained HMM procedure.
The procedure can be extended to recognize short sequences such
as question/answer pairs, so called dialogue games.
Dialog acts and games are useful for building classifiers for
speaking style. Similarily a user may remember a certain dialog
act sequence and may search for it in a graphical
representation.
In a study with very pessimistic assumptions users are able to
pick one out of four similar and equiprobable meetings correctly
with an accuracy ~ 43% using graphical activity information.
Dialogue acts may be useful in this situation as well but the
sample size did not allow to draw final conclusions. However the
user study fails to show any effect for detailed basic features
such as formality or speaker identity
"Shouldn't I use a polarquestion?" Proper Question Forms Disentangling Inconsistencies in Dialogue Systems
This work reports on the description of a specific class of clarification requests, adopted for the negotiation of pieces of information part of the common ground for argumentation strategies in human-machine interaction. Two studies are carried out to prove the adequateness of a specific form of polar question in a specific pragmatic situation, where a presupposition is contradicted by a new evidence. Whereas the first one proves the appropriateness of the negative form, the second one also demonstrate how the use of such a form, in the aforementioned pragmatic situation, can affect the principle of robustness, in terms of observability and recoverability, important in humanâmachine interaction applications. Given the results obtained in the two studies, dialogue systems with such capabilities are, therefore, a desirable goal, as they are expected to lead to improved usability and naturalness in conversation. For this reason, I present here a system capable of detecting conflicts and of using argumentation strategies to signal them consistently with previous observations
On the Development of Adaptive and User-Centred Interactive Multimodal Interfaces
Multimodal systems have attained increased attention in recent years, which has made possible important
improvements in the technologies for recognition, processing, and generation of multimodal information.
However, there are still many issues related to multimodality which are not clear, for example, the
principles that make it possible to resemble human-human multimodal communication. This chapter
focuses on some of the most important challenges that researchers have recently envisioned for future
multimodal interfaces. It also describes current efforts to develop intelligent, adaptive, proactive, portable
and affective multimodal interfaces
Using contextual information to understand searching and browsing behavior
There is great imbalance in the richness of information on the web and the succinctness and poverty of search requests of web users, making their queries only a partial description of the underlying complex information needs. Finding ways to better leverage contextual information and make search context-aware holds the promise to dramatically improve the search experience of users. We conducted a series of studies to discover, model and utilize contextual information in order to understand and improve users' searching and browsing behavior on the web. Our results capture important aspects of context under the realistic conditions of different online search services, aiming to ensure that our scientific insights and solutions transfer to the operational settings of real world applications
- âŠ