150 research outputs found

    Temporal entrainment in overlapping speech

    Wlodarczak M. Temporal entrainment in overlapping speech. Bielefeld: Bielefeld University; 2014

    A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology for monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time-series) modelling of the behaviour in a way that is meaningful for implementation in spoken dialogue system (SDS) environments.

    In addition, a novel dialogue representation is proposed that provides a point of view complementary to that of TAMA for monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame, which circumvents strict attribution of speaker turns by considering both interlocutors as synchronously active. Both the TAMA and turn-distribution metrics indicate that the correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS turn-taking behaviour. Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. This thesis therefore constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
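
    The core of the TAMA procedure can be sketched in a few lines. The sketch below is only an illustration under assumed parameters (the frame length, step size, and the (start, end, value) utterance format are not the thesis's exact choices); it shows how a prosodic feature is averaged per speaker over overlapping, time-aligned frames so that the two speakers' series can then be correlated.

```python
# Minimal sketch of the Time Aligned Moving Average (TAMA) idea described above.
# Frame length, step size and the input format are illustrative assumptions.
import numpy as np

def tama_series(utterances, frame_len=20.0, step=10.0):
    """Average a prosodic feature per speaker over overlapping, time-aligned frames.

    utterances: list of (start, end, value) tuples for one speaker, where value is
                e.g. the mean pitch of that utterance.
    Returns one weighted average per frame (NaN if the speaker is silent all frame).
    """
    if not utterances:
        return np.array([])
    total = max(end for _, end, _ in utterances)
    frame_starts = np.arange(0.0, total, step)
    out = np.full(len(frame_starts), np.nan)
    for i, f0 in enumerate(frame_starts):
        f1 = f0 + frame_len
        weights, values = [], []
        for start, end, value in utterances:
            overlap = min(end, f1) - max(start, f0)  # utterance time inside the frame
            if overlap > 0:
                weights.append(overlap)
                values.append(value)
        if weights:
            out[i] = np.average(values, weights=weights)
    return out

# Accommodation can then be gauged by correlating the two speakers' frame series,
# e.g. np.corrcoef(tama_series(speaker_a), tama_series(speaker_b)) on the frames
# where both values are defined.
```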

    VOCAL BIOMARKERS OF CLINICAL DEPRESSION: WORKING TOWARDS AN INTEGRATED MODEL OF DEPRESSION AND SPEECH

    Speech output has long been considered a sensitive marker of a person’s mental state. It has previously been examined as a possible biomarker for diagnosis and treatment response in certain mental health conditions, including clinical depression. To date, it has been difficult to draw robust conclusions from past results due to diversity in samples, speech material, investigated parameters, and analytical methods. Within this exploratory study of speech in clinically depressed individuals, articulatory and phonatory behaviours are examined in relation to psychomotor symptom profiles and overall symptom severity. A systematic review situates the work within the existing body of knowledge on the effects of depression on speech and informed the experimental setup. Examinations of vowel space, monophthong, and diphthong productions, as well as a multivariate acoustic analysis of other speech parameters (e.g., F0 range, perturbation measures, composite measures), are undertaken with the goal of creating a working model of the effects of depression on speech. Initial results demonstrate that overall vowel space area was not different between depressed and healthy speakers; on closer inspection, this was due to more specific deficits seen in depressed patients along the first formant (F1) axis. Speakers with depression were more likely to produce vowels centralised along F1 than along F2, and this was more pronounced for low-front vowels, which are more complex given the degree of tongue-jaw coupling required for production. This pattern was seen in both monophthong and diphthong productions. Other articulatory and phonatory measures were inspected in a factor analysis as well, suggesting additional vocal biomarkers for consideration in the diagnosis and treatment assessment of depression, including aperiodicity measures (e.g., higher shimmer and jitter), changes in spectral slope and tilt, and additive noise measures such as increased harmonics-to-noise ratio. Intonation was also affected by diagnostic status, but only for specific speech tasks. These results suggest that laryngeal and articulatory control is reduced by depression. The findings support the clinical utility of combining Ellgring and Scherer’s (1996) psychomotor retardation and social-emotional hypotheses to explain the effects of depression on speech, which suggest that the observed changes are due to a combination of cognitive, psycho-physiological and motoric mechanisms. Ultimately, depressive speech can be modelled along a continuum of hypo- to hyper-speech: depressed individuals assess communicative situations and speech requirements, and then engage in the minimum amount of motoric output necessary to convey their message. As speakers fluctuate with depressive symptoms over the course of their disorder, they move along the hypo-hyper-speech continuum and their speech is affected accordingly. Recommendations for future clinical investigations of the effects of depression on speech are also presented, including suggestions for recording and reporting standards. Results contribute towards cross-disciplinary research into speech analysis between the fields of psychiatry, computer science, and speech science.
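
    As an illustration of the vowel-space measure discussed above, the sketch below computes a speaker's vowel space area as the convex hull of F1/F2 points, plus a separate F1 range. The formant values are invented and only stand in for measured vowel midpoints; this is not the study's analysis pipeline.

```python
# Illustrative sketch of a vowel-space-area comparison: convex hull of F1/F2 points.
# The formant values below are made up; in practice they would be measured
# F1/F2 midpoints of a speaker's monophthongs.
import numpy as np
from scipy.spatial import ConvexHull

def vowel_space_area(f1, f2):
    """Area (Hz^2) of the convex hull spanned by a speaker's vowels in F1-F2 space."""
    points = np.column_stack([f1, f2])
    return ConvexHull(points).volume  # for 2-D points, .volume is the polygon area

# Hypothetical corner vowels for one speaker:
f1 = np.array([300.0, 750.0, 320.0, 660.0])
f2 = np.array([2300.0, 1300.0, 800.0, 1750.0])
print(vowel_space_area(f1, f2))

# The abstract's point is that overall area may not differ between groups, so a
# per-axis check is also informative, e.g. the F1 range alone:
print(f1.max() - f1.min())  # centralisation along F1 shrinks this range
```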

    How Was Your Day? evaluating a conversational companion

    The “How Was Your Day” (HWYD) Companion is an embodied conversational agent that can discuss work-related issues, entering free-form dialogues that lack any clearly defined tasks and goals. The development of this type of Companion technology requires new models of evaluation. Here, we describe a paradigm and methodology for evaluating the main aspects of such functionality in conjunction with overall system behaviour, with respect to three parameters: functional ability (i.e., does it do the ‘right’ thing), content (i.e., does it respond appropriately to the semantic context), and emotional behaviour (i.e., given the emotional input from the user, does it respond in an emotionally appropriate way). We demonstrate the functionality of our evaluation paradigm as a method both for grading current system performance and for targeting areas for particular performance review. We show correlations between, for example, ASR performance and overall system performance (as is expected in systems of this type); beyond this, we show where individual utterances or responses rated as positive or negative elicit an immediate response from the user, and demonstrate how our combined evaluation approach highlights issues (both positive and negative) in the Companion system’s interaction behaviour.
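
    A minimal sketch of how the three evaluation parameters could be recorded per system response is given below. The 1-to-5 scale and field names are illustrative assumptions, not the actual HWYD annotation instrument.

```python
# Sketch of a per-response annotation record for the three evaluation dimensions
# named above. Scale and field names are assumptions for illustration only.
from dataclasses import dataclass
from statistics import mean

@dataclass
class ResponseRating:
    utterance_id: str
    functional: int   # did the system do the 'right' thing? (1-5)
    content: int      # was the response semantically appropriate? (1-5)
    emotional: int    # was the response emotionally appropriate? (1-5)

def dialogue_scores(ratings):
    """Aggregate per-dimension means over a whole dialogue."""
    return {
        "functional": mean(r.functional for r in ratings),
        "content": mean(r.content for r in ratings),
        "emotional": mean(r.emotional for r in ratings),
    }

ratings = [ResponseRating("u1", 4, 3, 5), ResponseRating("u2", 2, 2, 3)]
print(dialogue_scores(ratings))
```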

    A model for mobile, context-aware in-car communication systems to reduce driver distractions

    Driver distraction remains a matter of concern throughout the world, as the number of car accidents caused by distracted driving is still unacceptably high. Industry and academia are working intensively to design new techniques that will address all types of driver distraction, including visual, manual, auditory and cognitive distraction. This research focuses on an existing technology, namely in-car communication systems (ICCS). ICCS allow drivers to interact with their mobile phones without touching or looking at them. Previous research suggests that ICCS have reduced visual and manual distraction. Two problems were identified in this research: existing ICCS are still expensive and only available in a limited range of car models. As a result, only a small number of drivers can obtain a car equipped with an ICCS, especially in developing countries. The second problem is that existing ICCS are not aware of the driving context, which plays a role in distracting drivers. This research project was based on the following thesis statement: a mobile, context-aware model can be designed to reduce driver distraction caused by the use of ICCS. A mobile ICCS is portable and can be used in any car, addressing the first problem. Context-awareness is used to detect situations that contribute to distracting drivers, and the interaction with the mobile ICCS is adapted so as to avert calls and text messages, addressing the second problem. As the driving context is dynamic, drivers may have to deal with critical safety-related tasks while they are using an existing ICCS. The following steps were taken in order to validate the thesis statement. An investigation was conducted into the causes and consequences of driver distraction. A literature review was conducted on context-aware techniques that could potentially be used. The design of a model was proposed, called the Multimodal Interface for Mobile Info-communication with Context (MIMIC), and a preliminary usability evaluation was conducted to assess the feasibility of a speech-based, mobile ICCS. Despite some problems with the speech recognition, the results were satisfactory and showed that the proposed model for mobile ICCS was feasible. Experiments were then conducted to collect data for supervised learning of the driving context, with the aim of selecting the most effective machine learning techniques for this task. Decision-tree and instance-based algorithms were found to perform best. Variables such as speed, acceleration and linear acceleration were found to be the most important according to an analysis of the decision tree. The initial MIMIC model was updated to include several adaptation effects, and the resulting model was implemented as a prototype mobile application, called MIMIC-Prototype.
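
    The supervised-learning step described above can be illustrated with a small decision-tree sketch. The sensor values, context labels and model settings below are synthetic assumptions; only the feature set (speed, acceleration, linear acceleration) follows the abstract.

```python
# Illustrative sketch: a decision tree predicting driving context from sensor
# features, mirroring the kind of classifier the abstract reports as best-performing.
# All data, labels and thresholds here are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([
    [15.0, 0.2, 0.10],   # speed (m/s), acceleration, linear acceleration
    [33.0, 2.5, 2.10],
    [ 8.0, 0.1, 0.05],
    [30.0, 3.0, 2.60],
])
y = np.array(["normal", "critical", "normal", "critical"])

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Feature importances indicate which variables drive the context decision,
# analogous to the decision-tree analysis mentioned above.
for name, imp in zip(["speed", "acceleration", "linear_acceleration"],
                     clf.feature_importances_):
    print(name, round(imp, 2))

# A mobile ICCS could then hold back calls and messages when the predicted
# context is critical:
print(clf.predict([[31.0, 2.8, 2.4]]))
```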