
    Automatic speech intelligibility detection for speakers with speech impairments: the identification of significant speech features

    Selection of relevant features is important for discriminating speech in detection-based ASR systems, and thus contributes to improved detector performance. In the context of speech impairments, speech errors can be discriminated from regular speech by adopting discriminative speech features that separate the impaired group from the control group well. However, the identification of suitable discriminative speech features for error detection in impaired speech has not been well investigated in the literature. The characteristics of impaired speech differ substantially from those of regular speech, making existing speech features less effective at recognizing it. To address this gap, speech features based on prosody, pronunciation, and voice quality are analyzed to identify the significant features related to intelligibility deficits. In this research, we investigate how speech impairments due to cerebral palsy and hearing impairment relate to prosody, pronunciation, and voice quality. We then identify the relationship between these speech features and speech intelligibility classification, and determine which features improve the discriminative ability of an automatic speech intelligibility detection system. The findings show that prosody, pronunciation, and voice quality features are all statistically significant for improving the detection of impaired speech, with voice quality identified as the feature set with the greatest discriminative power.
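
    As a concrete illustration of the kind of analysis involved, the sketch below extracts a few simple prosodic descriptors per utterance and tests whether they separate an impaired group from a control group. It is a minimal sketch, not the paper's actual feature set or statistical procedure: librosa's pyin pitch tracker, an RMS energy proxy, and a Mann-Whitney U test stand in for the full prosody, pronunciation, and voice-quality analysis, and the file lists are assumed inputs.

        import numpy as np
        import librosa
        from scipy import stats

        def prosody_features(wav_path):
            """Return a small prosodic feature vector for one utterance."""
            y, sr = librosa.load(wav_path, sr=16000)
            # F0 contour (prosody); pyin marks unvoiced frames as NaN.
            f0, _, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
            f0 = f0[~np.isnan(f0)]
            # Short-time energy, a crude intensity/voice-quality proxy.
            rms = librosa.feature.rms(y=y)[0]
            return np.array([
                f0.mean() if f0.size else 0.0,   # mean pitch
                f0.std() if f0.size else 0.0,    # pitch variability
                rms.mean(),                      # mean energy
                rms.std(),                       # energy variability
            ])

        def feature_significance(impaired_paths, control_paths):
            """Mann-Whitney U test per feature between the two groups."""
            a = np.array([prosody_features(p) for p in impaired_paths])
            b = np.array([prosody_features(p) for p in control_paths])
            for i, name in enumerate(["f0_mean", "f0_std", "rms_mean", "rms_std"]):
                u, p = stats.mannwhitneyu(a[:, i], b[:, i])
                print(f"{name}: U={u:.1f}, p={p:.4f}")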

    CNN-based noise reduction for multi-channel speech enhancement system with discrete wavelet transform (DWT) preprocessing

    In multi-channel speech enhancement (MCSE) systems, speech enhancement algorithms are applied at multiple stages to improve the quality of speech signals in noisy environments. Numerous existing algorithms filter noise in such systems, typically serving as a pre-processor to reduce noise and improve speech quality; however, they can perform poorly at low signal-to-noise ratios (SNRs), and speech devices are exposed to many kinds of environmental noise, including high-frequency noise. The objective of this research is to conduct noise reduction experiments for an MCSE system in stationary and non-stationary noisy environments at varying SNR levels. The experiments examined how well an existing and a proposed MCSE system filter environmental noises from low to high SNRs (−10 dB to 20 dB), using the AURORA and LibriSpeech datasets, which contain different types of environmental noise. The existing MCSE (BAV-MCSE) uses beamforming, adaptive noise reduction, and voice activity detection (BAV) algorithms to filter noise from speech signals. The proposed MCSE (DWT-CNN-MCSE) system combines discrete wavelet transform (DWT) preprocessing with a convolutional neural network (CNN) to denoise the noisy input speech and improve accuracy. Both systems were evaluated using spectrogram analysis and word recognition rate (WRR). The existing BAV-MCSE achieved its highest WRR of 93.77% at a high SNR (20 dB) but only 5.64% on average at a low SNR (−10 dB) across different noises. The proposed DWT-CNN-MCSE system performed well at low SNR, reaching a WRR of 70.55% and the largest improvement (64.91% WRR) at −10 dB SNR.
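
    The following sketch shows how a DWT preprocessing stage can feed a small 1-D CNN denoiser, in the spirit of the DWT-CNN-MCSE pipeline. The wavelet settings, thresholding rule, and network layout are illustrative assumptions, not the architecture from the paper, which is not reproduced in this abstract.

        import numpy as np
        import pywt
        import torch
        import torch.nn as nn

        def dwt_denoise(signal, wavelet="db4", level=3):
            """DWT preprocessing: soft-threshold detail coefficients, reconstruct."""
            coeffs = pywt.wavedec(signal, wavelet, level=level)
            sigma = np.median(np.abs(coeffs[-1])) / 0.6745       # noise estimate
            thr = sigma * np.sqrt(2 * np.log(len(signal)))       # universal threshold
            coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft")
                                    for c in coeffs[1:]]
            return pywt.waverec(coeffs, wavelet)[: len(signal)]

        class DenoiseCNN(nn.Module):
            """Illustrative 1-D CNN that refines the DWT-preprocessed waveform."""
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
                    nn.Conv1d(16, 16, kernel_size=9, padding=4), nn.ReLU(),
                    nn.Conv1d(16, 1, kernel_size=9, padding=4),
                )

            def forward(self, x):              # x: (batch, 1, samples)
                return self.net(x)

        # Usage: DWT stage first, (trained) CNN stage second.
        noisy = np.random.randn(16000).astype(np.float32)        # stand-in signal
        pre = dwt_denoise(noisy).astype(np.float32)
        enhanced = DenoiseCNN()(torch.from_numpy(pre)[None, None, :])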

    An FPN-based classification method for speech intelligibility detection of children with speech impairments

    The inability to speak fluently degrades the quality of life of many individuals, and early intervention in childhood can reduce speech disfluency in adulthood. Traditionally, disfluency in children is diagnosed through speech intelligibility assessments by speech and language pathologists, which can be expensive and time-consuming. Hence, numerous attempts have been made to automate speech intelligibility detection. Current detectors discriminate unintelligible speech by calculating posterior probability scores for each articulatory feature class. Their major drawback, however, is that results depend heavily on the training and input data, leading to inconsistencies in discriminating speech sounds; detector performance therefore remains far below that of humans. To overcome this limitation, a new classification method based on Fuzzy Petri Nets (FPN) is proposed to improve classification accuracy. FPN was chosen for its knowledge representation ability, which supports reasoning with uncertain or ambiguous information. In this research, features of impaired speech from Malay-speaking children are analyzed to identify the significant speech features related to intelligibility deficits. The results showed that FPN discriminates speech sounds more reliably than the baseline classifiers, with improvements in both classification accuracy and precision.
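
    To make the FPN idea concrete, the sketch below implements the generic fuzzy Petri net firing rule: take the minimum truth degree over a transition's input places, compare it with the firing threshold, and scale by the rule's certainty factor. The net structure and the feature-to-class rules here are invented for illustration; the paper's actual net for the children's speech data is not shown in the abstract.

        from dataclasses import dataclass

        @dataclass
        class Transition:
            inputs: list        # input places (antecedent propositions)
            output: str         # output place (consequent proposition)
            threshold: float    # firing threshold
            cf: float           # certainty factor of the rule

        def fpn_infer(degrees, transitions, iterations=5):
            """Propagate fuzzy truth degrees through the net until stable."""
            degrees = dict(degrees)
            for _ in range(iterations):
                for t in transitions:
                    strength = min(degrees.get(p, 0.0) for p in t.inputs)
                    if strength >= t.threshold:
                        degrees[t.output] = max(degrees.get(t.output, 0.0),
                                                strength * t.cf)
            return degrees

        # Invented rule base: feature evidence -> intelligibility class.
        rules = [
            Transition(["low_f0_variation", "slow_rate"], "unintelligible", 0.3, 0.9),
            Transition(["clear_articulation"], "intelligible", 0.3, 0.8),
        ]
        print(fpn_infer({"low_f0_variation": 0.7, "slow_rate": 0.6,
                         "clear_articulation": 0.2}, rules))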

    Speech emotion recognition research: an analysis of research focus

    This article analyses research in speech emotion recognition (SER) from 2006 to 2017 in order to identify the current focus of research and the areas in which research is lacking. The objective is to examine what is being done in this field. Searching on selected keywords, we extracted and analysed 260 articles from well-known online databases. The analysis indicates that SER is an active field, with dozens of articles published each year in journals and conference proceedings. The majority of articles concentrate on three critical aspects of SER: (1) databases, (2) suitable speech features, and (3) classification techniques for maximizing recognition accuracy. Through association analysis of these critical aspects and how they influence SER performance in terms of recognition accuracy, we found that certain combinations of databases, speech features, and classifiers influence the recognition accuracy of SER systems. Based on our review, we also suggest aspects of SER that could be taken into consideration in future work.
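
    The sketch below illustrates the kind of association analysis mentioned: counting how often pairs of tags (database, speech feature, classifier) co-occur with a high reported accuracy, and computing a rule confidence for each pair. The tagged records are invented stand-ins, not the 260 surveyed articles.

        from collections import Counter
        from itertools import combinations

        records = [  # (tags of one surveyed article, reported accuracy band)
            ({"EMO-DB", "MFCC", "SVM"}, "high"),
            ({"EMO-DB", "prosodic", "GMM"}, "mid"),
            ({"IEMOCAP", "MFCC", "DNN"}, "high"),
            ({"IEMOCAP", "MFCC", "SVM"}, "high"),
        ]

        pair_counts, pair_high = Counter(), Counter()
        for tags, band in records:
            for pair in combinations(sorted(tags), 2):
                pair_counts[pair] += 1
                if band == "high":
                    pair_high[pair] += 1

        # Confidence of "pair -> high accuracy" for pairs seen at least twice.
        for pair, n in pair_counts.items():
            if n >= 2:
                print(pair, "->", f"{pair_high[pair] / n:.2f}")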


    Severity-based adaptation with limited data for ASR to aid dysarthric speakers

    Automatic speech recognition (ASR) is used in many assistive technologies, such as helping individuals with speech impairments to communicate. One challenge in ASR for speech-impaired individuals is the difficulty of obtaining a good speech database of impaired speakers for building an effective acoustic model. Because the few existing impaired-speech databases are limited in size, the obvious way to build an acoustic model of impaired speech is to employ adaptation techniques. However, existing studies on adaptation for speech impairment have not addressed two issues: (1) identifying the most effective adaptation technique for impaired speech, and (2) the use of suitable source models to build an effective impaired-speech acoustic model. This research investigates both issues for dysarthria, a type of speech impairment affecting millions of people. We applied both unimpaired and impaired speech as the source model, with well-known adaptation techniques such as maximum likelihood linear regression (MLLR) and constrained MLLR (C-MLLR). The recognition accuracy of each impaired-speech acoustic model is measured in terms of word error rate (WER), with further assessment of phoneme insertion, substitution, and deletion rates. Unimpaired speech, when combined with limited high-quality impaired-speech data, improves the performance of ASR systems in recognising severely impaired dysarthric speech. Based on statistical analysis of the WER, C-MLLR was also found to be better than MLLR at recognising mildly and moderately impaired speech. Phoneme substitution was the largest contributor to WER in dysarthric speech at all severity levels. The results show that acoustic models derived from suitable adaptation techniques improve the performance of ASR systems in recognising impaired speech with limited adaptation data.
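
    Since the evaluation rests on WER broken down into insertions, substitutions, and deletions, the sketch below computes that breakdown with a standard edit-distance alignment. This is generic evaluation code, not code from the study.

        def wer_breakdown(ref, hyp):
            """Return (WER, substitutions, deletions, insertions) for word lists."""
            R, H = len(ref), len(hyp)
            # d[i][j] = (cost, subs, dels, ins) aligning ref[:i] with hyp[:j]
            d = [[None] * (H + 1) for _ in range(R + 1)]
            d[0] = [(j, 0, 0, j) for j in range(H + 1)]
            for i in range(1, R + 1):
                d[i][0] = (i, 0, i, 0)
            for i in range(1, R + 1):
                for j in range(1, H + 1):
                    if ref[i - 1] == hyp[j - 1]:
                        d[i][j] = d[i - 1][j - 1]
                    else:
                        c, s, dl, ins = d[i - 1][j - 1]
                        sub = (c + 1, s + 1, dl, ins)
                        c, s, dl, ins = d[i - 1][j]
                        dele = (c + 1, s, dl + 1, ins)
                        c, s, dl, ins = d[i][j - 1]
                        insr = (c + 1, s, dl, ins + 1)
                        d[i][j] = min(sub, dele, insr)
            cost, s, dl, ins = d[R][H]
            return cost / max(R, 1), s, dl, ins

        print(wer_breakdown("the cat sat".split(), "the cat sat down".split()))
        # -> (0.333..., 0, 0, 1): one insertion against three reference words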

    Understanding User Requirements for a Senior-Friendly Mobile Health Application

    The advancement of mobile technologies has motivated countries around the world to aim for smarter health management to support senior citizens. However, the use of mobile health applications (mHealth apps) among senior citizens appears to be low. Thus, drawing upon user expectations, the present study examined user requirements for a senior-friendly mHealth application. A total of 74 senior citizens were interviewed to explore the difficulties they encounter when using existing mobile apps. The study followed Nielsen’s usability model to identify user requirements from five aspects, namely learnability, efficiency, memorability, errors, and satisfaction. Based on the results, a guideline was proposed covering usability and health-management features. The guideline offers suggestions on mHealth app issues related to phrasing, menus, simplicity, error messages, icons and buttons, navigation, and layout, among others. The study also found that speech recognition technology can help seniors access information quickly. The proposed guideline and findings offer valuable input for software and app developers building more engaging and senior-friendly mHealth apps.

    Fusion of speech and handwritten signatures biometrics for person identification

    Automatic person identification (API) using human biometrics is in high demand compared to traditional identification methods: a person is identified automatically by distinct characteristics such as speech, fingerprints, iris patterns, and handwritten signatures. Fusing more than one biometric produces bimodal and multimodal API systems that normally outperform single-modality systems. This paper presents our work on fusing speech and handwritten signatures into a bimodal API system, where fusion is performed at the decision level because the extracted features differ in type and format. A data set was created containing recordings of usernames and handwritten signatures from 100 persons (50 males and 50 females); each person recorded their username 30 times and provided their handwritten signature 30 times, yielding a total of 3000 utterances and 3000 handwritten signatures. The speech API used Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and Vector Quantization (VQ) for feature training and classification. The handwritten-signature API used global features reflecting the structure of the signature image, such as image area, pure height, pure width, and signature height, with a Multi-Layer Perceptron (MLP) artificial neural network for feature training and classification. Once the best matches for both modalities are produced, fusion takes place at the decision level: the system computes the difference between the two best matches for each modality and selects the modality with the maximum difference. In our experiments, the bimodal API obtained an average recognition rate of 96.40%, whereas the speech API and the handwritten-signature API obtained average recognition rates of 92.60% and 75.20%, respectively. The bimodal API system therefore outperforms the single-modality systems.
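
    The decision-level fusion rule is simple enough to sketch directly: each modality reports its ranked match scores, and the modality whose best and second-best candidates are separated by the larger margin is trusted. Score scales are assumed comparable here; in practice they would be normalised first.

        def fuse_decisions(speech_matches, signature_matches):
            """Each argument: [(person_id, score), ...] sorted best-first.
            Returns the identity from the more confident modality."""
            def margin(matches):
                return matches[0][1] - matches[1][1]   # best minus second best
            if margin(speech_matches) >= margin(signature_matches):
                return speech_matches[0][0]            # trust the speech modality
            return signature_matches[0][0]             # trust the signature modality

        # Usage with illustrative, normalised similarity scores:
        speech = [("alice", 0.91), ("bob", 0.62)]      # clear speech winner
        signature = [("carol", 0.75), ("alice", 0.71)] # ambiguous signatures
        print(fuse_decisions(speech, signature))       # -> alice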