
    Observations on listener responses from multiple perspectives

    Proceedings of the 3rd Nordic Symposium on Multimodal Communication. Editors: Patrizia Paggio, Elisabeth Ahlsén, Jens Allwood, Kristiina Jokinen, Costanza Navarretta. NEALT Proceedings Series, Vol. 15 (2011), 48–55. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/22532

    A Survey on Evaluation Metrics for Backchannel Prediction Models

    In this paper we give an overview of the evaluation metrics used to measure the performance of backchannel prediction models. Both objective and subjective evaluation metrics are discussed. The survey shows that almost every backchannel prediction model is evaluated with a different evaluation metric. This makes comparison between developed models unreliable, even setting aside the other variables in play, such as different corpora, language, conversational setting, amount of data and/or definition of the term backchannel.

    Iterative Perceptual Learning for Social Behavior Synthesis

    We introduce Iterative Perceptual Learning (IPL), a novel approach for learning computational models for social behavior synthesis from corpora of human-human interactions. The IPL approach combines perceptual evaluation with iterative model refinement. Human observers rate the appropriateness of synthesized individual behaviors in the context of a conversation. These ratings are in turn used to refine the machine learning models. As the ratings correspond to those moments in the conversation where the production of a specific social behavior is inappropriate, we can regard features extracted at these moments as negative samples for the training of a machine learning classifier. This is an advantage over traditional corpus-based approaches, in which negative samples are extracted at random from moments in the conversation where the specific social behavior does not occur. We perform a comparison between the IPL approach and the traditional corpus-based approach on the timing of backchannels for a listener in speaker-listener dialogs. While both models perform similarly in terms of precision and recall scores, the results of the IPL model are rated as more appropriate in the perceptual evaluation. We additionally investigate the effect of the amount of available training data and the variation of training data on the outcome of the models.
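    The refinement loop described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the function `ipl_refine`, the stand-in "model" (which predicts a backchannel at every candidate frame not yet marked negative), the simulated raters, and all frame numbers are hypothetical and chosen only to show how observer-flagged moments accumulate as negative samples across rounds.

```python
from typing import Callable, List, Set


def ipl_refine(
    candidates: List[int],
    positives: Set[int],
    rate_inappropriate: Callable[[List[int]], Set[int]],
    rounds: int = 3,
) -> Set[int]:
    """Collect negative samples over several IPL-style rounds (toy sketch)."""
    negatives: Set[int] = set()
    for _ in range(rounds):
        # Stand-in for a trained classifier (assumption): predict a
        # backchannel at every candidate frame not yet known to be negative.
        predicted = [t for t in candidates if t not in negatives]
        # Observers flag the synthesized backchannels they find inappropriate.
        flagged = rate_inappropriate(predicted)
        if not flagged:
            break  # nothing left to refine
        # Flagged moments become negative samples; corpus positives are kept.
        negatives |= flagged - positives
    return negatives


# Hypothetical data: frame indices in one conversation.
candidates = list(range(10))
positives = {2, 5, 8}        # frames where the corpus listener responded
acceptable = {1, 2, 5, 8}    # frames the simulated raters would accept


def simulated_raters(predicted: List[int]) -> Set[int]:
    """Flag every predicted frame outside the acceptable set."""
    return {t for t in predicted if t not in acceptable}


negatives = ipl_refine(candidates, positives, simulated_raters)
remaining = [t for t in candidates if t not in negatives]
```

    In this toy setting the negatives come from moments that raters rejected rather than from random frames without a backchannel, which is the contrast with corpus-based sampling that the abstract draws.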

    Exploring the influence of suprasegmental features of speech on rater judgements of intelligibility

    A thesis submitted to the University of Bedfordshire in partial fulfilment of the requirements for the degree of Doctor of Philosophy. The importance of suprasegmental features of speech to pronunciation proficiency is well known, yet limited research has been undertaken to identify how raters attend to suprasegmental features in the English-language speaking test encounter. Currently, such features appear to be underrepresented in language learning frameworks and are not always satisfactorily incorporated into the analytical rating scales that are used by major language testing organisations. This thesis explores the influence of lexical stress, rhythm and intonation on rater decision making in order to provide insight into their proper place in rating scales and frameworks. Data were collected from 30 raters, half of whom were experienced professional raters and half of whom lacked rater training and a background in language learning or teaching. The raters were initially asked to score 12 test taker performances using a 9-point intelligibility scale. The performances were taken from the long turn of Cambridge English Main Suite exams and were selected on the basis of the inclusion of a range of notable suprasegmental features. Following scoring, the raters took part in a stimulated recall procedure to report the features that influenced their decisions. The resulting scores were quantitatively analysed using many-facet Rasch measurement analysis. Transcriptions of the verbal reports were analysed using qualitative methods. Finally, an integrated analysis of the quantitative and qualitative data was undertaken to develop a series of suprasegmental rating scale descriptors. The results showed that experienced raters do appear to attend to specific suprasegmental features in a reliable way, and that their decisions have a great deal in common with the way non-experienced raters regard such features. This indicates that stress, rhythm, and intonation may be somewhat underrepresented on current speaking proficiency scales and frameworks. The study concludes with the presentation of a series of suprasegmental rating scale descriptors.

    Speaker-adaptive multimodal prediction model for listener responses

    The goal of this paper is to analyze and model the variability in speaking styles in dyadic interactions and build a predictive algorithm for listener responses that is able to adapt to these different styles. The end result of this research will be a virtual human able to automatically respond to a human speaker with proper listener responses (e.g., head nods). Our novel speaker-adaptive prediction model is created from a corpus of dyadic interactions where speaker variability is analyzed to identify a subset of prototypical speaker styles. During a live interaction our prediction model automatically identifies the closest prototypical speaker style and predicts listener responses based on this "communicative style". Central to our approach is the idea of a "speaker profile", which uniquely identifies each speaker and enables the matching between prototypical speakers and new speakers. The paper shows the merits of our speaker-adaptive listener response prediction model by showing improvement over a state-of-the-art approach which does not adapt to the speaker. Besides the merits of speaker-adaptation, our experiments highlight the importance of using multimodal features when comparing speakers to select the closest prototypical speaker style.
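    The matching step in this abstract, selecting the closest prototypical speaker style for a new speaker, can be sketched as a nearest-prototype lookup. This is only an illustration under stated assumptions: the abstract does not specify the profile features or the distance measure, so the two-dimensional profiles, the style names, and the use of Euclidean distance below are all hypothetical.

```python
import math
from typing import Dict, List


def closest_style(profile: List[float], prototypes: Dict[str, List[float]]) -> str:
    """Return the name of the prototype nearest to the given speaker profile.

    Euclidean distance is an assumption made for this sketch.
    """
    return min(prototypes, key=lambda name: math.dist(profile, prototypes[name]))


# Hypothetical speaker profiles, e.g. (speech rate, pause ratio) scaled to [0, 1].
prototypes = {
    "slow_pauser": [0.2, 0.9],
    "fast_talker": [0.8, 0.1],
}
new_speaker = [0.7, 0.2]

# The selected style would then decide which listener-response model is used.
style = closest_style(new_speaker, prototypes)
```

    A real profile would presumably combine features from several modalities, since the abstract reports that multimodal features matter when comparing speakers.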

    Trainee therapist responses to the discussion of trauma in therapy

    Responses to disclosures/discussions of trauma can have lasting impacts on survivors who choose to share their experiences and historically have been categorized as positive, negative, and/or neutral responses with corresponding effects on the survivor. Literature recommends the use of tenets and techniques reminiscent of therapeutic common factors (e.g., listening skills, empathy, support, validation, creating a safe environment and strong therapeutic alliance) when responding to trauma. However, existing research focuses on reactions to survivors' disclosures outside of therapy and there is little research focusing on therapists' responses. Specifically, there are no studies that investigate how therapists or trainees are actually responding in psychotherapy sessions (e.g., frequency and rate of such responses). Accordingly, the purpose of the present study was to qualitatively explore the responses of student therapists in psychotherapy sessions with trauma survivors. A sample of 5 therapist-participants from university-based community counseling centers was selected, and transcribed videotaped sessions in which client- and trainee therapist-participants discussed trauma were analyzed using a qualitative and deductive content analysis. A coding system was created to categorize responses based on extant literature. Results indicated that trainee therapist-participants responded in all proposed categories (positive: validating, supportive, empathic; negative: invalidating, unsupportive, unempathetic; and neutral: clarifying questions, and reflection/summary statements). Of these, neutral responses tended to occur more frequently than positive or negative responses. Overall, positive responses followed as next most frequent and negative responses as least frequent. Other findings included that in 2 of the 5 individual sessions, negative responses were more frequent than positive responses; empathic responses were the least frequent code across all 10 coding categories; and 2 sessions had 0 recorded empathic responses. Finally, there were numerous missed opportunities for positive responding throughout the sessions. It is hoped that this study will raise awareness around the importance of therapeutic responses to trauma survivors' discussions in psychotherapy sessions and provide insight as to how trainee therapists might apply their existing competencies to respond to clients in positive ways. Findings have implications for both future studies and clinical training practices, for example in graduate programs for trainee therapists, an area of study that is currently under-researched.

    Miscommunication in the institutional context of the broadcast news interview : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Psychology at Massey University, Palmerston North, New Zealand

    This study examined the pattern and relative success of linguistic interaction in the Broadcast News Interview (BNI). The BNI is modelled as a genre of institutional communication. The psychological and functional characteristics of the BNI were examined from the viewpoint of how communicative conventions that normally regulate interview performance may, at times, impede effective communication. The BNI is intended to transfer information from an expert witness to an interested, though relatively uninformed audience. The interviewer is supposed to act as both conduit and catalyst. Pragmatic properties of the interlocutors' speech as they orient themselves towards the context of the conversation were analysed in order to reveal the manner in which prior assumptions or beliefs may lead to faulty inferences. The notion of miscommunication is used to describe and explain the faults associated with processes of representing the illocutionary force of an utterance, rather than deficiencies in pronunciation or auditory sensation and perception. Opting for a qualitative analysis, an attempt was made to ground explanations in relevant theoretical models of interpersonal communication and communication failure. Results indicate that the conventions that distinguish the BNI from more mundane types of interaction impede successful communication. The study highlights that participants who wish to attain their communicative goal must be more aware of the functional procedures of the BNI and anticipate impediments to successful communication.