
    Using automatic speech processing for foreign language pronunciation tutoring: Some issues and a prototype


    Multisensory Integration Sites Identified by Perception of Spatial Wavelet Filtered Visual Speech Gesture Information

    Perception of speech is improved when presentation of the audio signal is accompanied by concordant visual speech gesture information. This enhancement is most prevalent when the audio signal is degraded. One potential means by which the brain affords perceptual enhancement is thought to be through the integration of concordant information from multiple sensory channels in a common site of convergence, multisensory integration (MSI) sites. Some studies have identified potential sites in the superior temporal gyrus/sulcus (STG/S) that are responsive to multisensory information from the auditory speech signal and visual speech movement. One limitation of these studies is that they do not control for activity resulting from attentional modulation cued by such things as visual information signaling the onsets and offsets of the acoustic speech signal, as well as activity resulting from MSI of properties of the auditory speech signal with aspects of gross visual motion that are not specific to place of articulation information. This fMRI experiment uses spatial wavelet bandpass filtered Japanese sentences presented with background multispeaker audio noise to discern brain activity reflecting MSI induced by auditory and visual correspondence of place of articulation information that controls for activity resulting from the above-mentioned factors. The experiment consists of a low-frequency (LF) filtered condition containing gross visual motion of the lips, jaw, and head without specific place of articulation information, a midfrequency (MF) filtered condition containing place of articulation information, and an unfiltered (UF) condition. Sites of MSI selectively induced by auditory and visual correspondence of place of articulation information were determined by the presence of activity for both the MF and UF conditions relative to the LF condition. 
Based on these criteria, sites of MSI were found predominantly in the left middle temporal gyrus (MTG) and the left STG/S (including the auditory cortex). By controlling for additional factors that could also induce greater activity resulting from visual motion information, this study identifies potential MSI sites that we believe are involved in improved speech intelligibility.

    A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology of monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame which circumvents strict attribution of speaker-turns, by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turn-taking” behaviour. 
Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
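
The abstract describes TAMA only in outline, so the following is a minimal sketch of the general idea rather than the thesis's actual procedure: each speaker's feature measurements are averaged inside overlapping frames that sit on one shared time grid, which makes the two resulting series directly comparable frame by frame. The function name `tama`, the `(time, value)` sample format, the frame length, the step size and the sample pitch values are all assumptions made for illustration.

```python
from statistics import mean


def tama(samples, frame_len, step, duration):
    """Average (time, value) samples inside overlapping frames.

    Frames start at 0, step, 2*step, ... and each spans frame_len seconds.
    The grid depends only on frame_len, step and duration, so calling this
    once per speaker yields time-aligned series. (Hypothetical helper, not
    taken from the thesis.)
    """
    n_frames = int((duration - frame_len) // step) + 1
    series = []
    for i in range(max(n_frames, 0)):
        start = i * step
        end = start + frame_len
        values = [v for t, v in samples if start <= t < end]
        # A frame with no measurements for this speaker is marked as missing.
        series.append(mean(values) if values else None)
    return series


# Illustrative (made-up) pitch measurements in Hz, keyed by time in seconds.
speaker_a = [(1.0, 200.0), (4.0, 210.0), (9.0, 220.0), (14.0, 230.0)]
speaker_b = [(2.0, 120.0), (12.0, 130.0)]

frames_a = tama(speaker_a, frame_len=10, step=5, duration=20)
frames_b = tama(speaker_b, frame_len=10, step=5, duration=20)

# Pair the two series frame by frame for later time-series analysis.
aligned = list(zip(frames_a, frames_b))
```

Because both calls use the same frame length, step and duration, index `i` in each series refers to the same stretch of the dialogue, which is the property the name "time aligned" points to. Any weighting of samples within a frame is left out here because the abstract does not describe it.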

    Factors influencing the efficacy of delayed auditory feedback in treating dysarthria associated with Parkinson's disease

    Parkinson's disease patients exhibit a high prevalence of speech deficits including excessive speech rate, reduced intelligibility, and disfluencies. The present study examined the effects of delayed auditory feedback (DAF) as a rate control intervention for dysarthric speakers with Parkinson's disease. Adverse reactions to relatively long delay intervals are commonly observed during clinical use of DAF, and seem to result from improper matching of the delayed signal. To facilitate optimal use of DAF, therefore, clinicians must provide instruction, modeling, and feedback. Clinician instruction is frequently used in speech-language therapy, but has not been evaluated during use of DAF-based interventions. Therefore, the primary purpose of the present study was to evaluate the impact of clinician instruction on the effectiveness of DAF in treating speech deficits. A related purpose was to compare the effects of different delay intervals on speech behaviors. An A-B-A-B single-subject design was utilized. The A phases consisted of a sentence reading task using DAF, while the B phases incorporated clinician instruction into the DAF protocol. During each of the 16 experimental sessions, speakers read with four different delay intervals (0 ms, 50 ms, 100 ms, and 150 ms). During the B phases, the experimenter provided verbal feedback and modeling pertaining to how precisely the speaker matched the delayed signal. Dependent variables measured were speech rate, percent intelligible syllables, and percent disfluencies. Three males with Parkinson's disease and an associated dysarthria participated in the study. Results revealed that for all three speakers, DAF significantly reduced reading rate and produced significant improvements in either intelligibility (for Speaker 3) or fluency (for Speakers 1 and 2). 
A delay interval of 150 ms produced the greatest reductions in reading rates for all three speakers, although any of the DAF settings used was sufficient to produce significant improvements in either intelligibility or fluency. In addition, supplementing the DAF intervention with clinician instruction resulted in significantly greater gains achieved with DAF. These findings confirmed the effectiveness of various intervals of DAF in improving speech deficits in Parkinson's disease speakers, particularly when patients are provided with instruction and modeling from the clinician.

    Semi-Automated & Collaborative Online Training Module For Improving Communication Skills

    This paper presents a description and evaluation of the ROC Speak system, a platform that allows ubiquitous access to communication skills training. ROC Speak (available at rocspeak.com) enables anyone to go to a website, record a video, and receive feedback on smile intensity, body movement, volume modulation, filler word usage, unique word usage, and a word cloud of the spoken words, in addition to an overall assessment and subjective comments by peers. Peer comments are automatically ranked and sorted for usefulness and sentiment (i.e., positive vs. negative). We evaluated the system with a diverse group of 56 online participants for a 10-day period. Participants submitted responses to career-oriented prompts every other day. The participants were randomly split into two groups: 1) treatment - full feedback from the ROC Speak system; 2) control - written feedback from online peers. When judged by peers (p<.001) and independent raters (p<.05), participants from the treatment group demonstrated statistically significant improvement in overall speaking skills rating while the control group did not. Furthermore, in terms of speaking attributes, the treatment group showed an improvement in friendliness (p<.001), vocal variety (p<.05) and articulation (p<.01).