Modeling Sub-Band Information Through Discrete Wavelet Transform to Improve Intelligibility Assessment of Dysarthric Speech
The speech signal within a sub-band varies at a fine level depending on the type and level of dysarthria. The Mel-frequency filterbank used in computing cepstral coefficients smooths out this fine-level information in the higher frequency regions due to the larger bandwidth of its filters. To capture this sub-band information, in this paper a four-level discrete wavelet transform (DWT) decomposition is first performed to decompose the input speech signal into approximation and detail coefficients at each level. For a given input speech signal, five speech signals representing different sub-bands are then reconstructed using the inverse DWT (IDWT). The log filterbank energies are computed by analyzing the short-term discrete Fourier transform magnitude spectra of each reconstructed signal with a 30-channel Mel filterbank. For each analysis frame, the log filterbank energies obtained across all reconstructed speech signals are pooled together, and a discrete cosine transform is applied to produce the cepstral feature, here termed the discrete wavelet transform reconstructed Mel-frequency cepstral coefficient (DWTR-MFCC). The i-vector based dysarthric level assessment system developed on the Universal Access speech corpus shows that the proposed DWTR-MFCC feature outperforms the conventional MFCC and several other cepstral features reported for a similar task. The use of DWTR-MFCC improves the detection accuracy rate (DAR) of the dysarthric level assessment system in the text- and speaker-independent test case from the 56.646% MFCC baseline to 60.094%. Further analysis of the confusion matrices shows that confusion among the dysarthric classes differs considerably between the MFCC and DWTR-MFCC features. Motivated by this observation, a two-stage classification approach employing the discriminating power of both kinds of features is proposed to improve the overall performance of the developed dysarthric level assessment system.
The two-stage classification scheme further improves the DAR to 65.813% in the text- and speaker-independent test case.
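The extraction pipeline described in this abstract (DWT decomposition, per-band IDWT reconstruction, pooled log Mel energies, DCT) can be sketched in a few numpy functions. This is a minimal illustration, not the authors' implementation: the abstract does not name the mother wavelet (a Haar wavelet is substituted here for simplicity), and the 25 ms / 10 ms framing, 512-point FFT, and 13 retained cepstral coefficients are assumed typical values.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt(x, levels):
    """Multi-level Haar DWT; returns [approx_L, detail_L, ..., detail_1]."""
    coeffs, a = [], np.asarray(x, float)
    for _ in range(levels):
        if len(a) % 2:
            a = np.append(a, a[-1])                      # pad to even length
        coeffs.insert(0, (a[0::2] - a[1::2]) / SQRT2)    # detail band
        a = (a[0::2] + a[1::2]) / SQRT2                  # approximation band
    coeffs.insert(0, a)
    return coeffs

def haar_idwt(coeffs):
    """Inverse multi-level Haar DWT."""
    a = coeffs[0]
    for d in coeffs[1:]:
        a = a[:len(d)]                    # drop decomposition padding if any
        up = np.empty(2 * len(d))
        up[0::2] = (a + d) / SQRT2
        up[1::2] = (a - d) / SQRT2
        a = up
    return a

def subband_signals(x, levels=4):
    """One reconstructed time-domain signal per DWT band (levels + 1 in total)."""
    coeffs = haar_dwt(x, levels)
    out = []
    for i in range(len(coeffs)):
        kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        out.append(haar_idwt(kept)[:len(x)])
    return out

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular Mel filterbank, shape (n_mels, n_fft // 2 + 1)."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = mel2hz(np.linspace(0.0, hz2mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def log_mel_energies(x, fb, frame, hop, n_fft):
    """Per-frame log Mel filterbank energies of the STFT magnitude spectrum."""
    win = np.hamming(frame)
    rows = [np.log(fb @ np.abs(np.fft.rfft(x[s:s + frame] * win, n_fft)) + 1e-10)
            for s in range(0, len(x) - frame + 1, hop)]
    return np.array(rows)

def dwtr_mfcc(x, sr=16000, levels=4, n_mels=30, n_ceps=13, n_fft=512):
    """Pool log energies across all sub-band reconstructions, then apply DCT-II."""
    fb = mel_filterbank(n_mels, n_fft, sr)
    frame, hop = int(0.025 * sr), int(0.010 * sr)
    pooled = np.hstack([log_mel_energies(s, fb, frame, hop, n_fft)
                        for s in subband_signals(x, levels)])
    # DCT-II across the pooled 150-dim energy vector of each frame
    N = pooled.shape[1]
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None]
                   * (2 * np.arange(N) + 1) / (2 * N))
    return pooled @ basis.T
```

Because the DWT is linear and the Haar pair is exactly invertible, the five sub-band reconstructions sum back to the original signal, which gives a quick sanity check on the decomposition.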
Multimodal Data Fusion of Electromyography and Acoustic Signals for Thai Syllable Recognition
Speech disorders such as dysarthria are common and frequent after a stroke. Speech rehabilitation performed by a speech-language pathologist is needed to improve and recover speech. However, in Thailand there is a shortage of speech-language pathologists. In this paper, we present a syllable recognition system that can be deployed in a speech rehabilitation system to support the limited number of speech-language pathologists available. The proposed system is based on a multimodal fusion of the acoustic signal and surface electromyography (sEMG) collected from facial muscles. Multimodal data fusion is studied to improve signal collection under noisy conditions while reducing the number of electrodes needed. The signals are collected simultaneously while articulating 12 Thai syllables designed for rehabilitation exercises. Several features are extracted from the sEMG signals, and five channels are studied. The best combination of features and channels is chosen to be fused with the Mel-frequency cepstral coefficients extracted from the acoustic signal. The feature vector from each signal source is projected by a spectral regression extreme learning machine and concatenated. Data from seven healthy subjects were collected for evaluation. Results show that the multimodal fusion outperforms the use of a single signal source, achieving up to 98% accuracy. In other words, an accuracy improvement of up to 5% can be achieved with the proposed multimodal fusion. Moreover, its lower standard deviations in classification accuracy compared to those of the unimodal systems indicate improved robustness of the syllable recognition.
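The feature-level fusion step (project each modality, then concatenate) can be illustrated with a toy sketch. The paper uses a spectral regression extreme learning machine for the projection; the version below substitutes a plain random-hidden-layer (ELM-style) projection as a stand-in, and the feature dimensions are illustrative assumptions rather than the study's actual configuration.

```python
import numpy as np

def elm_projection(X, n_hidden, seed):
    """ELM-style projection: a fixed random hidden layer with tanh activation."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden)) / np.sqrt(X.shape[1])
    b = rng.normal(size=n_hidden)
    return np.tanh(X @ W + b)

def fuse_features(emg_feats, mfcc_feats, n_hidden=64):
    """Project each modality into an equal-size space, then concatenate."""
    return np.hstack([elm_projection(emg_feats, n_hidden, seed=0),
                      elm_projection(mfcc_feats, n_hidden, seed=1)])

# Hypothetical shapes: 10 syllable utterances, 25 sEMG features, 39 MFCC features.
emg = np.random.default_rng(2).normal(size=(10, 25))
mfcc = np.random.default_rng(3).normal(size=(10, 39))
fused = fuse_features(emg, mfcc)     # shape (10, 128), fed to a classifier
```

Projecting both modalities to the same dimensionality before concatenation keeps either signal source from dominating the fused vector simply because it has more raw features.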
Towards a clinical assessment of acquired speech dyspraxia.
No standardised assessment exists for the recognition and quantification of acquired speech dyspraxia (also called apraxia of speech, AS). This thesis aims to work towards the development of such an assessment based on perceptual features. A review of previous features claimed to characterise AS and differentiate it from other acquired pronunciation problems (the dysarthrias; phonemic paraphasia, PP) has proved negative. Reasons for this have been explored. A reconceptualisation of AS is attempted based on physical studies of AS, PP and the dysarthrias; on their position and relationship within coalitional models of speech production; and by comparison with normal action control and other dyspraxias. Contrary to the view of many, it is concluded that AS and PP are dyspraxias (albeit different types). However, due to the interactive nature of speech-language production and the behaviour of the vocal tract as a functional whole, AS is unlikely to be distinguishable in an absolute fashion on the basis of single speech characteristics. Rather, it is predicted that pronunciation-disordered groups will differ relatively on total error profiles and susceptibility to associated effects (variability; propositionality; struggle; length-complexity; latency-utterance times). Using a prototype battery and refined error transcription and analysis procedures, a series of studies tests these predictions on three groups: spastic dysarthric speakers (n = 6), and AS and PP speakers without (n = 12) and with (n = 12) dysphasia. The main conclusions do not support the error profile hypotheses in any straightforward manner. Length-complexity effects and latency-utterance times fail to consistently separate the groups. Variability, propositionality and struggle proved the most reliable indicators. Error profiles remain the closest indicators of speakers' intelligibility and therapeutic goals. The thesis argues for a single-case approach to differential diagnosis and for alternative statistical analyses to capture individual and group differences.
Suggestions for changes to the prototype clinical battery and data management to effect optimal speaker differentiation conclude the work.
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial developments in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.
Voice onset time and vowel formant measures in online testing and laboratory-based testing with(out) surgical face masks
Since the COVID-19 pandemic started, conducting experiments online is increasingly common, and face masks are often used in everyday life. It remains unclear whether phonetic detail in speech production is captured adequately when speech is recorded in internet-based experiments or in experiments conducted with face masks. We tested 55 Spanish–Basque–English trilinguals in picture naming tasks in three conditions: online, laboratory-based with surgical face masks, and laboratory-based without face masks (control). We measured plosive voice onset time (VOT) in each language, the formants and duration of English vowels /iː/ and /ɪ/, and the Spanish/Basque vowel space. Across conditions, there were differences between English and Spanish/Basque VOT and in formants and duration between English /iː/–/ɪ/; between conditions, small differences emerged. Relative to the control condition, the Spanish/Basque vowel space was larger in online testing and smaller in the face mask condition. We conclude that testing online or with face masks is suitable for investigating phonetic detail in within-participant designs although the precise measurements may differ from those in traditional laboratory-based research.
This work was supported by institutional grants from the Basque Government [BERC 2022–2025 program] and the Spanish State Research Agency [BCBL Severo Ochoa excellence accreditation CEX2020-001010/AEI/10.13039/501100011033] awarded to the BCBL. This project has also received funding from the European Union’s H2020 research and innovation program [Marie Skłodowska-Curie grant agreement No 843533 awarded to AS]; the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program [grant agreement No 819093 to CDM]; the Spanish State Research Agency [BES-2017-082500 to CS; PID2020-113926GB-I00 to CDM; PID2021-123578NA-I00/AEI/10.13039/501100011033/FEDER, UE, & FJC2020-044978-I to AS]; and by the Basque Government’s Department of Education [Predoctoral training program for research staff PRE_2021_2_0006 awarded to TT].
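The vowel-space comparison in the abstract above rests on a standard area measure. A minimal numpy sketch, assuming the vowel space is quantified as the area (shoelace formula) of the polygon spanned by each vowel's mean (F1, F2) point; the study's exact metric is not specified in the abstract, and the example formant values are illustrative, not measured data.

```python
import numpy as np

def vowel_space_area(corner_formants):
    """Shoelace area of the polygon over per-vowel (F1, F2) means.

    `corner_formants` lists the vowels' (F1, F2) points in polygon order,
    e.g. /i/, /e/, /a/, /o/, /u/ for a five-vowel Spanish/Basque system.
    """
    pts = np.asarray(corner_formants, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

# Illustrative (hypothetical) formant means in Hz, ordered around the polygon:
vowels = [(300, 2300), (500, 1800), (800, 1300), (500, 900), (320, 800)]
area = vowel_space_area(vowels)      # in Hz^2; compare across conditions
```

A larger area for one recording condition than another (as reported for online vs. face-mask testing) then corresponds directly to a more expanded vowel space in the F1–F2 plane.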
Changes in speech intelligibility and acoustic distinctiveness along a speech rate continuum in Parkinson’s disease
Asking a person to speak slowly is a common technique in speech therapy for people with Parkinson’s disease (PD). Slowed speaking rates are thought to bring about changes in speech production that make it easier for people with speech impairments associated with PD to be understood, but this is not always the case. Furthermore, research suggests that using faster speech does not necessarily lead to decreases in speech intelligibility for some people with PD. Most studies of rate modification in PD have only included one or two rate adjustments to investigate the relationship between speech rate, intelligibility, and acoustic aspects of speech production. The present study adds to this literature and expands it by eliciting a broader range of speech rates than has previously been studied in order to provide a comprehensive description of changes along such a continuum.
Two groups of people with PD and documented speech changes participated: 22 receiving standard pharmaceutical intervention, and 12 who additionally had undergone deep brain stimulation surgery (DBS), a common surgical treatment for PD. DBS is often associated with further speech impairment, but it is unknown to what extent these individuals may benefit from speech rate adjustments. Younger and older healthy control groups were also included. All participants were asked to modify their speech rate along a seven-step continuum from very slow to very fast while reading words, sentences, and responding to prompts. Naïve listeners later heard these speech samples and were asked to either transcribe or rate what they heard.
Results indicated different patterns of speech changes across groups, rates, and tasks. Sentence reading and conversational speech were rated as more intelligible at slow rates and less intelligible at fast rates. All modified rates were found to negatively impact speech sound identification during a novel carrier phrase task. Slower speech was overall associated with greater acoustic contrast and variability, lower intensity, and higher voice quality. Differences in acoustic speech adjustments across the groups and speech rates emerged, however, particularly for the DBS group. Findings point to a complex relationship between speech rate modifications, acoustic distinctiveness, and intelligibility.
PROFOUND AND MULTIPLE LEARNING DISABILITIES AND LANGUAGE: AN INVESTIGATION INTO THE USE OF MEANINGFUL, INTELLIGIBLE SUB-VOCAL UTTERANCES BY CHILDREN AND YOUNG ADULTS WITH PROFOUND AND MULTIPLE LEARNING DISABILITIES.
The aim of this thesis is to investigate the use of meaningful sub-vocal (SV) utterances by 20 children and young adults assessed by their teachers as having profound and multiple learning difficulties (PMLD). People designated PMLD are believed to be incapable of using language beyond a few words or symbols and to operate developmentally between 0 and 24 months, prior to the acquisition of language. Nevertheless, digital recordings captured linguistic sub-vocal utterances, apparently meaningful and intelligible, from the 20 research participants. Consequently, the research hypothesis proposed that: Children and young adults designated PMLD can produce meaningful sub-vocal utterances intelligible to listeners.
The conclusions from the 4 phases of the research were as follows:
1. As proposed by the hypothesis, the 20 research participants designated PMLD can produce meaningful sub-vocal utterances intelligible to listeners.
2. Acoustic phonetic features integral to normal speech and whisper can be identified in SV utterances, including the presence of a ‘speech-like’ event.
3. SV utterances by the participants were intelligible to 40 listeners in both closed and open conditions.
4. The content of SV utterances encompasses developmental and linguistic levels beyond the developmental age of 0–24 months attributed to individuals designated PMLD.
5. The 20 research participants can produce meaningful language as SV utterances, including abstract concepts expressed as views, opinions, and ideas.
Perceptual learning of context-sensitive phonetic detail
Although familiarity with a talker or accent is known to facilitate perception, it is not clear what underlies this phenomenon. Previous research has focused primarily on whether listeners can learn to associate novel phonetic characteristics with low-level units such as features or phonemes. However, this neglects the potential role of phonetic information at many other levels of representation. To address this shortcoming, this thesis investigated perceptual learning of systematic phonetic detail relating to higher levels of linguistic structure, including prosodic, grammatical and morphological contexts. Furthermore, in contrast to many previous studies, this research used relatively natural stimuli and tasks, thus maximising its relevance to perceptual learning in ordinary listening situations.
This research shows that listeners can update their phonetic representations in response to incoming information and its relation to linguistic-structural context. In addition, certain patterns of systematic phonetic detail were more learnable than others. These findings are used to inform an account of how new information is integrated with prior experience in speech processing, within a framework that emphasises the importance of phonetic detail at multiple levels of representation.
This work was funded by an AHRC grant.
Assessing the speech of adults with dysarthria and its speech characteristics: a video-based learning resource
https://www.ester.ee/record=b5255669*es