
    Modeling Sub-Band Information Through Discrete Wavelet Transform to Improve Intelligibility Assessment of Dysarthric Speech

    The speech signal within a sub-band varies at a fine level depending on the type and level of dysarthria. The Mel-frequency filterbank used in computing cepstral coefficients smooths out this fine-level information in the higher frequency regions because of the larger bandwidth of its filters. To capture this sub-band information, a four-level discrete wavelet transform (DWT) is first applied to decompose the input speech signal into approximation and detail coefficients at each level. For a given input speech signal, five speech signals representing different sub-bands are then reconstructed using the inverse DWT (IDWT). The log filterbank energies are computed by analyzing the short-term discrete Fourier transform magnitude spectrum of each reconstructed signal with a 30-channel Mel-filterbank. For each analysis frame, the log filterbank energies obtained across all reconstructed speech signals are pooled together, and a discrete cosine transform is applied to obtain the cepstral feature, here termed the discrete wavelet transform reconstructed Mel-frequency cepstral coefficient (DWTR-MFCC). The i-vector based dysarthric level assessment system developed on the Universal Access speech corpus shows that the proposed DWTR-MFCC feature outperforms the conventional MFCC and several other cepstral features reported for a similar task. Using DWTR-MFCC improves the detection accuracy rate (DAR) of the dysarthric level assessment system in the text- and speaker-independent test case to 60.094% from the 56.646% MFCC baseline. Further analysis of the confusion matrices shows that confusion among the different dysarthric classes differs markedly between the MFCC and DWTR-MFCC features. Motivated by this observation, a two-stage classification approach employing the discriminating power of both kinds of features is proposed to improve the overall performance of the developed dysarthric level assessment system. The two-stage classification scheme further improves the DAR to 65.813% in the text- and speaker-independent test case.
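    The following is a minimal Python sketch of the DWTR-MFCC idea described in the abstract, assuming a mono speech signal `x` sampled at rate `sr`. The wavelet ('db4'), frame settings, and library choices (PyWavelets, librosa, SciPy) are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
import pywt
import librosa
from scipy.fftpack import dct

def dwtr_mfcc(x, sr, wavelet="db4", levels=4, n_mels=30, n_fft=512, hop=160):
    # Four-level DWT: coeffs = [cA4, cD4, cD3, cD2, cD1]
    coeffs = pywt.wavedec(x, wavelet, level=levels)
    # Reconstruct one sub-band signal per coefficient set via inverse DWT,
    # zeroing out all other coefficient sets (five signals in total).
    subband_signals = []
    for i in range(len(coeffs)):
        masked = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        subband_signals.append(pywt.waverec(masked, wavelet)[: len(x)])
    # Log Mel filterbank energies of each reconstructed signal, frame by frame.
    feats = []
    for s in subband_signals:
        mel = librosa.feature.melspectrogram(
            y=s, sr=sr, n_fft=n_fft, hop_length=hop, n_mels=n_mels, power=2.0
        )
        feats.append(np.log(mel + 1e-10))          # shape: (n_mels, n_frames)
    pooled = np.concatenate(feats, axis=0)          # shape: (5 * n_mels, n_frames)
    # A DCT across the pooled log energies gives the cepstral representation.
    return dct(pooled, type=2, axis=0, norm="ortho")
```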

    Multimodal Data Fusion of Electromyography and Acoustic Signals for Thai Syllable Recognition

    Speech disorders such as dysarthria frequently follow a stroke. Speech rehabilitation performed by a speech-language pathologist is needed for improvement and recovery. However, in Thailand, there is a shortage of speech-language pathologists. In this paper, we present a syllable recognition system that can be deployed in a speech rehabilitation system to support the limited number of speech-language pathologists available. The proposed system is based on a multimodal fusion of the acoustic signal and surface electromyography (sEMG) collected from facial muscles. Multimodal data fusion is studied to improve signal collection under noisy conditions while reducing the number of electrodes needed. The signals are collected simultaneously while articulating 12 Thai syllables designed for rehabilitation exercises. Several features are extracted from the sEMG signals and five channels are studied. The best combination of features and channels is chosen to be fused with the mel-frequency cepstral coefficients extracted from the acoustic signal. The feature vector from each signal source is projected by a spectral regression extreme learning machine and concatenated. Data from seven healthy subjects were collected for evaluation purposes. Results show that the multimodal fusion outperforms the use of a single signal source, achieving up to 98% accuracy. In other words, an accuracy improvement of up to 5% can be achieved when using the proposed multimodal fusion. Moreover, its low standard deviations in classification accuracy compared to those of the unimodal systems indicate an improvement in the robustness of the syllable recognition.
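    A minimal sketch of the feature-level fusion described above is given below, with PCA standing in for the spectral regression extreme learning machine projection; the feature matrices, labels, projection dimensionality, and SVM classifier are placeholders, not the authors' setup.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fuse_and_classify(acoustic_feats, semg_feats, labels, dim=30):
    # Project each modality to a common dimensionality (stand-in for SR-ELM).
    proj_acoustic = PCA(n_components=dim).fit_transform(acoustic_feats)
    proj_semg = PCA(n_components=dim).fit_transform(semg_feats)
    # Concatenate the projected vectors into one fused feature per sample.
    fused = np.hstack([proj_acoustic, proj_semg])   # shape: (n_samples, 2 * dim)
    # Train a simple classifier on the fused representation.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(fused, labels)
    return clf
```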

    Towards a clinical assessment of acquired speech dyspraxia.

    No standardised assessment exists for the recognition and quantification of acquired speech dyspraxia (also called apraxia of speech, AS). This thesis aims to work towards the development of such an assessment based on perceptual features. A review of features previously claimed to characterise AS and differentiate it from other acquired pronunciation problems (the dysarthrias; phonemic paraphasia, PP) proved negative. Reasons for this have been explored. A reconceptualisation of AS is attempted based on physical studies of AS, PP and the dysarthrias; on their position and relationship within coalitional models of speech production; and on comparison with normal action control and other dyspraxias. Contrary to the view of many, it is concluded that AS and PP are dyspraxias (albeit of different types). However, due to the interactive nature of speech-language production and the behaviour of the vocal tract as a functional whole, AS is unlikely to be distinguishable in an absolute fashion on the basis of single speech characteristics. Rather, it is predicted that pronunciation-disordered groups will differ relatively on total error profiles and on susceptibility to associated effects (variability; propositionality; struggle; length-complexity; latency-utterance times). Using a prototype battery and refined error transcription and analysis procedures, a series of studies tests these predictions on three groups: spastic dysarthric speakers (n = 6), and AS and PP speakers without (n = 12) and with (n = 12) dysphasia. The main conclusions do not support the error profile hypotheses in any straightforward manner. Length-complexity effects and latency-utterance times fail to consistently separate the groups. Variability, propositionality and struggle proved the most reliable indicators. Error profiles remain the closest indicators of speakers' intelligibility and therapeutic goals. The thesis argues for a single-case approach to differential diagnosis and for alternative statistical analyses to capture individual and group differences. Suggestions for changes to the prototype clinical battery and to data management to effect optimal speaker differentiation conclude the work.

    Models and analysis of vocal emissions for biomedical applications

    This book of proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10–12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Voice onset time and vowel formant measures in online testing and laboratory-based testing with(out) surgical face masks

    Since the COVID-19 pandemic started, conducting experiments online has become increasingly common, and face masks are often used in everyday life. It remains unclear whether phonetic detail in speech production is captured adequately when speech is recorded in internet-based experiments or in experiments conducted with face masks. We tested 55 Spanish–Basque–English trilinguals in picture naming tasks in three conditions: online, laboratory-based with surgical face masks, and laboratory-based without face masks (control). We measured plosive voice onset time (VOT) in each language, the formants and duration of the English vowels /iː/ and /ɪ/, and the Spanish/Basque vowel space. Across conditions, there were differences between English and Spanish/Basque VOT and between English /iː/ and /ɪ/ in formants and duration; between conditions, only small differences emerged. Relative to the control condition, the Spanish/Basque vowel space was larger in online testing and smaller in the face mask condition. We conclude that testing online or with face masks is suitable for investigating phonetic detail in within-participant designs, although the precise measurements may differ from those obtained in traditional laboratory-based research. This work was supported by institutional grants from the Basque Government [BERC 2022–2025 program] and the Spanish State Research Agency [BCBL Severo Ochoa excellence accreditation CEX2020-001010/AEI/10.13039/501100011033] awarded to the BCBL. This project has also received funding from the European Union's H2020 research and innovation program [Marie Skłodowska-Curie grant agreement No 843533 awarded to AS]; the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program [grant agreement No 819093 to CDM]; the Spanish State Research Agency [BES-2017-082500 to CS; PID2020-113926GB-I00 to CDM; PID2021-123578NA-I00/AEI/10.13039/501100011033/FEDER, UE, & FJC2020-044978-I to AS]; and by the Basque Government's Department of Education [Predoctoral training program for research staff PRE_2021_2_0006 awarded to TT].
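    For illustration, here is a minimal sketch of measuring vowel duration and midpoint formants with Praat through the parselmouth library; the file name, vowel boundaries, and analysis settings are hypothetical and not taken from this study.

```python
import parselmouth

def vowel_measures(wav_path, t_start, t_end):
    # Load the recording and run a Burg formant analysis via Praat.
    snd = parselmouth.Sound(wav_path)
    formants = snd.to_formant_burg(time_step=0.01, maximum_formant=5500)
    # Sample F1 and F2 at the vowel midpoint; duration comes from the boundaries.
    t_mid = (t_start + t_end) / 2
    f1 = formants.get_value_at_time(1, t_mid)   # first formant (Hz)
    f2 = formants.get_value_at_time(2, t_mid)   # second formant (Hz)
    return {"duration_s": t_end - t_start, "F1_Hz": f1, "F2_Hz": f2}

# Hypothetical token: the vowel /iː/ between 0.35 s and 0.52 s in "sheep_token.wav".
print(vowel_measures("sheep_token.wav", 0.35, 0.52))
```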

    Changes in speech intelligibility and acoustic distinctiveness along a speech rate continuum in Parkinson’s disease

    Asking a person to speak slowly is a common technique in speech therapy for people with Parkinson’s disease (PD). Slowed speaking rates are thought to bring about changes in speech production that make it easier for people with speech impairments associated with PD to be understood, but this is not always the case. Furthermore, research suggests that using faster speech does not necessarily lead to decreases in speech intelligibility for some people with PD. Most studies of rate modification in PD have included only one or two rate adjustments to investigate the relationship between speech rate, intelligibility, and acoustic aspects of speech production. The present study adds to and expands this literature by eliciting a broader range of speech rates than previously studied, in order to provide a comprehensive description of changes along such a continuum. Two groups of people with PD and documented speech changes participated: 22 receiving standard pharmaceutical intervention, and 12 who had additionally undergone deep brain stimulation surgery (DBS), a common surgical treatment for PD. DBS is often associated with further speech impairment, but it is unknown to what extent these individuals may benefit from speech rate adjustments. Younger and older healthy control groups were also included. All participants were asked to modify their speech rate along a seven-step continuum from very slow to very fast while reading words, sentences, and responding to prompts. Naïve listeners later heard these speech samples and were asked to either transcribe or rate what they heard. Results indicated different patterns of speech changes across groups, rates, and tasks. Sentence reading and conversational speech were rated as more intelligible at slow rates and less intelligible at fast rates. All modified rates were found to negatively impact speech sound identification during a novel carrier phrase task. Slower speech was overall associated with greater acoustic contrast and variability, lower intensity, and higher voice quality. However, differences in acoustic speech adjustments emerged across the groups and speech rates, in particular for the DBS group. The findings point to a complex relationship between speech rate modifications, acoustic distinctiveness, and intelligibility.

    PROFOUND AND MULTIPLE LEARNING DISABILITIES AND LANGUAGE: AN INVESTIGATION INTO THE USE OF MEANINGFUL, INTELLIGIBLE SUB-VOCAL UTTERANCES BY CHILDREN AND YOUNG ADULTS WITH PROFOUND AND MULTIPLE LEARNING DISABILITIES.

    The aim of this thesis is to investigate the use of sub-vocal (SV) meaningful utterances by 20 children and young adults assessed by their teachers as having profound and multiple learning difficulties (PMLD). People designated PMLD are believed to be incapable of using language beyond a few words or symbols and to operate developmentally between 0–24 months, prior to the acquisition of language. Nevertheless, digital recordings captured linguistic sub-vocal utterances by the 20 research participants that appeared meaningful and intelligible. Consequently, the research hypothesis proposed that children and young adults designated PMLD can produce meaningful sub-vocal utterances intelligible to listeners. The conclusions from the four phases of the research were as follows: (1) as proposed by the hypothesis, the 20 research participants designated PMLD can produce meaningful sub-vocal utterances intelligible to listeners; (2) acoustic phonetic features integral to normal speech and whisper can be identified in SV utterances, including the presence of a ‘speech-like’ event; (3) SV utterances by the participants were intelligible to 40 listeners in closed and open conditions; (4) the content of SV utterances encompasses developmental and linguistic levels beyond the developmental age of 0–24 months attributed to individuals designated PMLD; and (5) the 20 research participants can produce meaningful language as SV utterances, including abstract concepts expressed as views, opinions, and ideas.

    Speech assessment and speech characteristics of adults with dysarthria: a video-based learning material

    https://www.ester.ee/record=b5255669*es