150 research outputs found

    Temporal entrainment in overlapping speech

    Wlodarczak M. Temporal entrainment in overlapping speech. Bielefeld: Bielefeld University; 2014

    A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology for monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time-series) modelling of the behaviour in a way that is meaningful for implementation in spoken dialogue system (SDS) environments.

    In addition, a novel dialogue representation is proposed that provides a point of view complementary to that of TAMA for monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame, which circumvents strict attribution of speaker turns by considering both interlocutors as synchronously active. Both the TAMA and turn-distribution metrics indicate that the correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS turn-taking behaviour. Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. This thesis therefore constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
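
    The core of the TAMA procedure can be sketched in a few lines. The sketch below is only an illustration under assumed parameters (the frame length, step size, and the (start, end, value) utterance format are not the thesis's exact choices); it shows how a prosodic feature is averaged per speaker over overlapping, time-aligned frames so that the two speakers' series can then be correlated.

```python
# Minimal sketch of the Time Aligned Moving Average (TAMA) idea described above.
# Frame length, step size and the input format are illustrative assumptions.
import numpy as np

def tama_series(utterances, frame_len=20.0, step=10.0):
    """Average a prosodic feature per speaker over overlapping, time-aligned frames.

    utterances: list of (start, end, value) tuples for one speaker, where value is
                e.g. the mean pitch of that utterance.
    Returns one weighted average per frame (NaN if the speaker is silent all frame).
    """
    if not utterances:
        return np.array([])
    total = max(end for _, end, _ in utterances)
    frame_starts = np.arange(0.0, total, step)
    out = np.full(len(frame_starts), np.nan)
    for i, f0 in enumerate(frame_starts):
        f1 = f0 + frame_len
        weights, values = [], []
        for start, end, value in utterances:
            overlap = min(end, f1) - max(start, f0)  # utterance time inside the frame
            if overlap > 0:
                weights.append(overlap)
                values.append(value)
        if weights:
            out[i] = np.average(values, weights=weights)
    return out

# Accommodation can then be gauged by correlating the two speakers' frame series,
# e.g. np.corrcoef(tama_series(speaker_a), tama_series(speaker_b)) on the frames
# where both values are defined.
```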

    VOCAL BIOMARKERS OF CLINICAL DEPRESSION: WORKING TOWARDS AN INTEGRATED MODEL OF DEPRESSION AND SPEECH

    Speech output has long been considered a sensitive marker of a person’s mental state. It has previously been examined as a possible biomarker for diagnosis and treatment response in certain mental health conditions, including clinical depression. To date, it has been difficult to draw robust conclusions from past results due to diversity in samples, speech material, investigated parameters, and analytical methods. Within this exploratory study of speech in clinically depressed individuals, articulatory and phonatory behaviours are examined in relation to psychomotor symptom profiles and overall symptom severity. A systematic review situates the work within the existing body of knowledge on the effects of depression on speech and informed the experimental setup. Examinations of vowel space, monophthong, and diphthong productions, as well as a multivariate acoustic analysis of other speech parameters (e.g., F0 range, perturbation measures, composite measures), are undertaken with the goal of creating a working model of the effects of depression on speech. Initial results demonstrate that overall vowel space area was not different between depressed and healthy speakers; on closer inspection, this was due to more specific deficits seen in depressed patients along the first formant (F1) axis. Speakers with depression were more likely to produce vowels centralised along F1 than along F2, and this was more pronounced for low-front vowels, which are more complex given the degree of tongue-jaw coupling required for production. This pattern was seen in both monophthong and diphthong productions. Other articulatory and phonatory measures were inspected in a factor analysis as well, suggesting additional vocal biomarkers for consideration in the diagnosis and treatment assessment of depression, including aperiodicity measures (e.g., higher shimmer and jitter), changes in spectral slope and tilt, and additive noise measures such as increased harmonics-to-noise ratio. Intonation was also affected by diagnostic status, but only for specific speech tasks. These results suggest that laryngeal and articulatory control is reduced by depression. The findings support the clinical utility of combining Ellgring and Scherer’s (1996) psychomotor retardation and social-emotional hypotheses to explain the effects of depression on speech, which suggest that the observed changes are due to a combination of cognitive, psycho-physiological and motoric mechanisms. Ultimately, depressive speech can be modelled along a continuum of hypo- to hyper-speech: depressed individuals assess communicative situations and speech requirements, and then engage in the minimum amount of motoric output necessary to convey their message. As speakers fluctuate with depressive symptoms over the course of their disorder, they move along the hypo-hyper-speech continuum and their speech is affected accordingly. Recommendations for future clinical investigations of the effects of depression on speech are also presented, including suggestions for recording and reporting standards. Results contribute towards cross-disciplinary research into speech analysis between the fields of psychiatry, computer science, and speech science.
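
    As an illustration of the vowel-space measure discussed above, the sketch below computes a speaker's vowel space area as the convex hull of F1/F2 points, plus a separate F1 range. The formant values are invented and only stand in for measured vowel midpoints; this is not the study's analysis pipeline.

```python
# Illustrative sketch of a vowel-space-area comparison: convex hull of F1/F2 points.
# The formant values below are made up; in practice they would be measured
# F1/F2 midpoints of a speaker's monophthongs.
import numpy as np
from scipy.spatial import ConvexHull

def vowel_space_area(f1, f2):
    """Area (Hz^2) of the convex hull spanned by a speaker's vowels in F1-F2 space."""
    points = np.column_stack([f1, f2])
    return ConvexHull(points).volume  # for 2-D points, .volume is the polygon area

# Hypothetical corner vowels for one speaker:
f1 = np.array([300.0, 750.0, 320.0, 660.0])
f2 = np.array([2300.0, 1300.0, 800.0, 1750.0])
print(vowel_space_area(f1, f2))

# The abstract's point is that overall area may not differ between groups, so a
# per-axis check is also informative, e.g. the F1 range alone:
print(f1.max() - f1.min())  # centralisation along F1 shrinks this range
```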

    How Was Your Day? evaluating a conversational companion

    The “How Was Your Day” (HWYD) Companion is an embodied conversational agent that can discuss work-related issues, entering free-form dialogues that lack any clearly defined tasks and goals. The development of this type of Companion technology requires new models of evaluation. Here, we describe a paradigm and methodology for evaluating the main aspects of such functionality in conjunction with overall system behaviour, with respect to three parameters: functional ability (i.e., does it do the ‘right’ thing), content (i.e., does it respond appropriately to the semantic context), and emotional behaviour (i.e., given the emotional input from the user, does it respond in an emotionally appropriate way). We demonstrate the functionality of our evaluation paradigm as a method both for grading current system performance and for targeting areas for particular performance review. We show correlations between, for example, ASR performance and overall system performance (as is expected in systems of this type); beyond this, we show where individual utterances or responses rated as positive or negative elicit an immediate response from the user, and demonstrate how our combined evaluation approach highlights issues (both positive and negative) in the Companion system’s interaction behaviour.
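
    A minimal sketch of how the three evaluation parameters could be recorded per system response is given below. The 1-to-5 scale and field names are illustrative assumptions, not the actual HWYD annotation instrument.

```python
# Sketch of a per-response annotation record for the three evaluation dimensions
# named above. Scale and field names are assumptions for illustration only.
from dataclasses import dataclass
from statistics import mean

@dataclass
class ResponseRating:
    utterance_id: str
    functional: int   # did the system do the 'right' thing? (1-5)
    content: int      # was the response semantically appropriate? (1-5)
    emotional: int    # was the response emotionally appropriate? (1-5)

def dialogue_scores(ratings):
    """Aggregate per-dimension means over a whole dialogue."""
    return {
        "functional": mean(r.functional for r in ratings),
        "content": mean(r.content for r in ratings),
        "emotional": mean(r.emotional for r in ratings),
    }

ratings = [ResponseRating("u1", 4, 3, 5), ResponseRating("u2", 2, 2, 3)]
print(dialogue_scores(ratings))
```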

    A model for mobile, context-aware in-car communication systems to reduce driver distractions

    Driver distraction remains a matter of concern throughout the world, as the number of car accidents caused by distracted driving is still unacceptably high. Industry and academia are working intensively to design new techniques that will address all types of driver distraction, including visual, manual, auditory and cognitive distraction. This research focuses on an existing technology, namely in-car communication systems (ICCS). ICCS allow drivers to interact with their mobile phones without touching or looking at them. Previous research suggests that ICCS have reduced visual and manual distraction. Two problems were identified in this research: existing ICCS are still expensive and only available in a limited range of car models. As a result, only a small number of drivers can obtain a car equipped with an ICCS, especially in developing countries. The second problem is that existing ICCS are not aware of the driving context, which plays a role in distracting drivers. This research project was based on the following thesis statement: a mobile, context-aware model can be designed to reduce driver distraction caused by the use of ICCS. A mobile ICCS is portable and can be used in any car, addressing the first problem. Context-awareness is used to detect situations that contribute to distracting drivers, and the interaction with the mobile ICCS is adapted so as to avert calls and text messages, addressing the second problem. As the driving context is dynamic, drivers may have to deal with critical safety-related tasks while they are using an existing ICCS. The following steps were taken in order to validate the thesis statement. An investigation was conducted into the causes and consequences of driver distraction. A literature review was conducted on context-aware techniques that could potentially be used. The design of a model was proposed, called the Multimodal Interface for Mobile Info-communication with Context (MIMIC), and a preliminary usability evaluation was conducted to assess the feasibility of a speech-based, mobile ICCS. Despite some problems with the speech recognition, the results were satisfactory and showed that the proposed model for mobile ICCS was feasible. Experiments were then conducted to collect data for supervised learning of the driving context, with the aim of selecting the most effective machine learning techniques for this task. Decision-tree and instance-based algorithms were found to perform best. Variables such as speed, acceleration and linear acceleration were found to be the most important according to an analysis of the decision tree. The initial MIMIC model was updated to include several adaptation effects, and the resulting model was implemented as a prototype mobile application, called MIMIC-Prototype.
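
    The supervised-learning step described above can be illustrated with a small decision-tree sketch. The sensor values, context labels and model settings below are synthetic assumptions; only the feature set (speed, acceleration, linear acceleration) follows the abstract.

```python
# Illustrative sketch: a decision tree predicting driving context from sensor
# features, mirroring the kind of classifier the abstract reports as best-performing.
# All data, labels and thresholds here are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([
    [15.0, 0.2, 0.10],   # speed (m/s), acceleration, linear acceleration
    [33.0, 2.5, 2.10],
    [ 8.0, 0.1, 0.05],
    [30.0, 3.0, 2.60],
])
y = np.array(["normal", "critical", "normal", "critical"])

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Feature importances indicate which variables drive the context decision,
# analogous to the decision-tree analysis mentioned above.
for name, imp in zip(["speed", "acceleration", "linear_acceleration"],
                     clf.feature_importances_):
    print(name, round(imp, 2))

# A mobile ICCS could then hold back calls and messages when the predicted
# context is critical:
print(clf.predict([[31.0, 2.8, 2.4]]))
```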