1,297 research outputs found

Using automatic speech processing for foreign language pronunciation tutoring: Some issues and a prototype

Author: Eskenazi Maxine
Publication venue: Michigan State University Center for Language Education and Research
Publication date: 01/01/1999

ScholarSpace at University of Hawai'i at Manoa

Multisensory Integration Sites Identified by Perception of Spatial Wavelet Filtered Visual Speech Gesture Information

Author: Callan Akiko M.
Callan Daniel E.
Jones Jeffery A.
Kroos Christian
Munhall Kevin
Vatikiotis-Bateson Eric
Publication venue: Scholars Commons @ Laurier
Publication date: 01/06/2004

Perception of speech is improved when presentation of the audio signal is accompanied by concordant visual speech gesture information. This enhancement is most prevalent when the audio signal is degraded. One potential means by which the brain affords perceptual enhancement is thought to be through the integration of concordant information from multiple sensory channels in a common site of convergence, multisensory integration (MSI) sites. Some studies have identified potential sites in the superior temporal gyrus/sulcus (STG/S) that are responsive to multisensory information from the auditory speech signal and visual speech movement. One limitation of these studies is that they do not control for activity resulting from attentional modulation cued by such things as visual information signaling the onsets and offsets of the acoustic speech signal, as well as activity resulting from MSI of properties of the auditory speech signal with aspects of gross visual motion that are not specific to place of articulation information. This fMRI experiment uses spatial wavelet bandpass filtered Japanese sentences presented with background multispeaker audio noise to discern brain activity reflecting MSI induced by auditory and visual correspondence of place of articulation information that controls for activity resulting from the above-mentioned factors. The experiment consists of a low-frequency (LF) filtered condition containing gross visual motion of the lips, jaw, and head without specific place of articulation information, a midfrequency (MF) filtered condition containing place of articulation information, and an unfiltered (UF) condition. Sites of MSI selectively induced by auditory and visual correspondence of place of articulation information were determined by the presence of activity for both the MF and UF conditions relative to the LF condition. 
Based on these criteria, sites of MSI were found predominantly in the left middle temporal gyrus (MTG), and the left STG/S (including the auditory cortex). By controlling for additional factors that could also induce greater activity resulting from visual motion information, this study identifies potential MSI sites that we believe are involved with improved speech perception intelligibility.

CiteSeerX

Crossref

Wilfrid Laurier University

Analysing patterns of right brain-hemisphere activity prior to speech articulation for identification of system-directed speech

Author: Hayakawa Akira
Campbell Nick
Haider Fasih
Luz Saturnino
Vogel Carl
Publication venue: Elsevier BV
Publication date: 01/02/2019

Edinburgh Research Explorer

A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

Author: Kousidis Spyridon, [Thesis]
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2010

Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each to that of the other(s). Implementation of this behavior in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology of monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame which circumvents strict attribution of speaker-turns, by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turn-taking” behaviour.
Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
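
The time-aligned moving average idea described in this abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `tama`, the frame length, the step size, the unweighted mean, and the pitch values are all assumptions chosen for illustration. The only point it shows is that frames are placed at the same time positions for every speaker, so the resulting series can be compared frame by frame.

```python
from statistics import mean


def tama(observations, frame_len, step, duration):
    """Average one speaker's feature values over overlapping, fixed-position frames.

    observations: list of (time_in_seconds, value) pairs, e.g. per-utterance pitch.
    Frames start at 0, step, 2*step, ... so they line up across speakers.
    Returns one mean per frame, or None where the speaker produced nothing.
    """
    n_frames = int((duration - frame_len) // step) + 1
    series = []
    for i in range(n_frames):
        start = i * step
        # Keep only the observations that fall inside the current frame.
        values = [v for t, v in observations if start <= t < start + frame_len]
        series.append(mean(values) if values else None)
    return series


# Hypothetical pitch values (Hz) for two speakers in a 10-second dialogue.
speaker_a = [(1, 100), (3, 110), (6, 120), (9, 130)]
speaker_b = [(0.5, 200), (5, 220), (7, 210)]

# Same frame length and step for both speakers keeps the frames time-aligned.
series_a = tama(speaker_a, frame_len=4, step=2, duration=10)
series_b = tama(speaker_b, frame_len=4, step=2, duration=10)
```

Because both series share the same frame boundaries, they could then be passed to a time-series or correlation analysis of the kind the abstract mentions.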

Arrow@TUDublin

Detecting autism, emotions and social signals using AdaBoost

Author: Busa-Fekete Róbert
Gosztolya Gábor
Tóth László
Publication venue: Interspeech
Publication date: 01/01/2013

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Factors influencing the efficacy of delayed auditory feedback in treating dysarthria associated with Parkinson's disease

Author: Blanchet Paul Gerard
Publication venue: LSU Digital Commons
Publication date: 01/01/2002

Parkinson's disease patients exhibit a high prevalence of speech deficits including excessive speech rate, reduced intelligibility, and disfluencies. The present study examined the effects of delayed auditory feedback (DAF) as a rate control intervention for dysarthric speakers with Parkinson's disease. Adverse reactions to relatively long delay intervals are commonly observed during clinical use of DAF, and seem to result from improper matching of the delayed signal. To facilitate optimal use of DAF, therefore, clinicians must provide instruction, modeling, and feedback. Clinician instruction is frequently used in speech-language therapy, but has not been evaluated during use of DAF-based interventions. Therefore, the primary purpose of the present study was to evaluate the impact of clinician instruction on the effectiveness of DAF in treating speech deficits. A related purpose was to compare the effects of different delay intervals on speech behaviors. An A-B-A-B single-subject design was utilized. The A phases consisted of a sentence reading task using DAF, while the B phases incorporated clinician instruction into the DAF protocol. During each of the 16 experimental sessions, speakers read with four different delay intervals (0 ms, 50 ms, 100 ms, and 150 ms). During the B phases, the experimenter provided verbal feedback and modeling pertaining to how precisely the speaker matched the delayed signal. Dependent variables measured were speech rate, percent intelligible syllables, and percent disfluencies. Three males with Parkinson's disease and an associated dysarthria participated in the study. Results revealed that for all three speakers, DAF significantly reduced reading rate and produced significant improvements in either intelligibility (for Speaker 3) or fluency (for Speakers 1 and 2).
A delay interval of 150 ms produced the greatest reductions in reading rates for all three speakers, although any of the DAF settings used was sufficient to produce significant improvements in either intelligibility or fluency. In addition, supplementing the DAF intervention with clinician instruction resulted in significantly greater gains achieved with DAF. These findings confirmed the effectiveness of various intervals of DAF in improving speech deficits in Parkinson's disease speakers, particularly when patients are provided with instruction and modeling from the clinician.
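
For readers unfamiliar with the delay intervals listed in this abstract, the sketch below shows what a fixed delay means at the signal level: the speaker's own audio is played back shifted by a set number of milliseconds. It is an offline illustration only, not the study's apparatus. The function name `delay_signal`, the 16 kHz sample rate, and the toy sample values are assumptions.

```python
def delay_signal(samples, delay_ms, sample_rate_hz):
    """Return a copy of `samples` shifted later in time by `delay_ms`.

    The output has the same length as the input: silence (0.0) is inserted
    at the start and the samples pushed past the end are dropped.
    """
    n = min(round(delay_ms * sample_rate_hz / 1000), len(samples))
    return [0.0] * n + list(samples[:len(samples) - n])


# Number of samples each delay setting from the abstract corresponds to,
# at an assumed sample rate of 16 kHz.
delays_in_samples = {ms: round(ms * 16000 / 1000) for ms in (0, 50, 100, 150)}

# Toy signal sampled at 1 kHz, delayed by 2 ms (two samples).
delayed = delay_signal([1, 2, 3, 4, 5], delay_ms=2, sample_rate_hz=1000)
```

A 0 ms setting leaves the signal unchanged, which matches its role as the no-delay condition among the four intervals.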

Louisiana State University

Semi-Automated & Collaborative Online Training Module For Improving Communication Skills

Author: Barbosa Hugo
Ghoshal Gourab
Hoque Mohammed
Li Vivian
Zhao Ru
Publication date: 27/04/2017

This paper presents a description and evaluation of the ROC Speak system, a platform that allows ubiquitous access to communication skills training. ROC Speak (available at rocspeak.com) enables anyone to go to a website, record a video, and receive feedback on smile intensity, body movement, volume modulation, filler word usage, unique word usage, word cloud of the spoken words, in addition to overall assessment and subjective comments by peers. Peer comments are automatically ranked and sorted for usefulness and sentiment (i.e., positive vs. negative). We evaluated the system with a diverse group of 56 online participants for a 10-day period. Participants submitted responses to career oriented prompts every other day. The participants were randomly split into two groups: 1) treatment - full feedback from the ROC Speak system; 2) control - written feedback from online peers. When judged by peers (p<.001) and independent raters (p<.05), participants from the treatment group demonstrated statistically significant improvement in overall speaking skills rating while the control group did not. Furthermore, in terms of speaking attributes, treatment group showed an improvement in friendliness (p<.001), vocal variety (p<.05) and articulation (p<.01).

arXiv.org e-Print Archive

The Challenge of Spoken Language Systems: Research Directions for the Nineties

Author: Atlas Les
Beckman Mary
Biermann Alan
Bush Marcia
Clements Mark
Cohen Jordan
Cole Ron
Garcia Oscar
Hanson Brian
Hermansky Hynek
Hirschman Lynette
Levinson Steve
McKeown Kathleen
Morgan Nelson
Novick David G.
Ostendorf Mari
Oviatt Sharon
Price Patti
Silverman Harvey
Spitz Judy
Waibel Alex
Weinstein Clifford
Zahorian Steve
Zue Victor
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/1995

A spoken language system combines speech recognition, natural language processing and human interface technology. It functions by recognizing the person's words, interpreting the sequence of words to obtain a meaning in terms of the application, and providing an appropriate response back to the user. Potential applications of spoken language systems range from simple tasks, such as retrieving information from an existing database (traffic reports, airline schedules), to interactive problem solving tasks involving complex planning and reasoning (travel planning, traffic routing), to support for multilingual interactions. We examine eight key areas in which basic research is needed to produce spoken language systems: (1) robust speech recognition; (2) automatic training and adaptation; (3) spontaneous speech; (4) dialogue models; (5) natural language response generation; (6) speech synthesis and speech generation; (7) multilingual systems; and (8) interactive multimodal systems. In each area, we identify key research challenges, the infrastructure needed to support research, and the expected benefits. We conclude by reviewing the need for multidisciplinary research, for development of shared corpora and related resources, for computational support and for rapid communication among researchers. The successful development of this technology will increase accessibility of computers to a wide range of users, will facilitate multinational communication and trade, and will create new research specialties and jobs in this rapidly expanding area.

Columbia University Academic Commons