    Learning-Based Reference-Free Speech Quality Assessment for Normal Hearing and Hearing Impaired Applications

    Accurate speech quality measures are highly valuable in the design, fine-tuning, and benchmarking of speech processing algorithms, devices, and communication systems. The telecommunication industry's switch from narrowband to wideband telephony gives users a better speech-quality experience but introduces a number of challenges in speech processing. Noise is the most common distortion of audio signals, and consequently many studies have focused on developing high-performance noise reduction algorithms. Assistive hearing devices are designed to reduce communication difficulties for people with hearing loss. As the algorithms within these devices become more advanced, it becomes increasingly important to develop accurate and robust quality metrics to assess their performance. Objective speech quality measurements are more attractive than subjective assessments because they are cost-effective and eliminate subjective variability. Although there has been extensive research on objective speech quality evaluation for narrowband speech, those methods are unsuitable for wideband telephony. For hearing-impaired applications, objective quality assessment is challenging because it must distinguish between desired modifications that make signals audible and undesired artifacts. In this thesis a model is proposed that extracts two sets of features from the distorted signal only. This approach, called reference-free (non-intrusive) assessment, is attractive because it does not need access to the reference signal. Although this makes non-intrusive assessment suitable for real-time applications, more features must be extracted and intelligently combined to achieve accuracy comparable to that of intrusive metrics. Two feature vectors are proposed to extract information from distorted signals, and their performance is examined in three studies. In the first study, both feature vectors are trained on various portions of a noise reduction database for normal-hearing applications. In the second study, the same investigation is performed on two sets of databases acquired through several hearing aids. The third study examines the generalizability of the proposed metrics by benchmarking four wireless remote microphones in a variety of environmental conditions. Machine learning techniques are deployed to train the models in all three studies. The studies show that one of the feature sets is robust when trained on different portions of data from different databases, and it provides good quality-prediction accuracy for both normal-hearing and hearing-impaired applications.
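    As an illustration of the reference-free setup described above, the sketch below extracts features from the distorted signal alone and maps them to a quality score with a trained regressor. The MFCC statistics, the random-forest regressor, and the placeholder names (train_paths, train_mos, degraded.wav) are illustrative assumptions, not the feature vectors or learning machines proposed in the thesis.

```python
# Minimal sketch of a reference-free (non-intrusive) quality predictor.
# The MFCC statistics and random-forest regressor below are illustrative
# stand-ins, not the thesis's actual feature sets or models.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestRegressor

def extract_features(wav_path, sr=16000, n_mfcc=20):
    """Features computed from the distorted signal alone (no reference needed)."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Summarize frame-level features into one fixed-length utterance vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical training data: degraded recordings with subjective MOS labels.
train_paths = ["clip_000.wav", "clip_001.wav", "clip_002.wav"]
train_mos = np.array([3.2, 4.1, 2.6])

X_train = np.stack([extract_features(p) for p in train_paths])
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, train_mos)

# Prediction requires only the degraded signal, never a clean reference.
score = model.predict(extract_features("degraded.wav")[None, :])
print(score)
```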

    A Speech Quality Classifier based on Tree-CNN Algorithm that Considers Network Degradations

    Many factors can affect users’ quality of experience (QoE) in speech communication services. Impairment factors arise from physical phenomena in the transmission channels of wireless and wired networks, and monitoring users’ QoE is important for service providers. In this context, a non-intrusive speech quality classifier based on the Tree Convolutional Neural Network (Tree-CNN) is proposed. The Tree-CNN is an adaptive network structure composed of hierarchical CNN models, and its main advantage is a reduced training time, which is highly relevant for speech quality assessment methods. In the training phase of the proposed classifier, speech signals impaired by wired and wireless network degradations are used as input. In the network scenario, different modulation schemes and channel degradation intensities, such as packet loss rate, signal-to-noise ratio, and maximum Doppler shift frequency, are implemented. Experimental results demonstrate that the proposed model achieves a significant reduction in training time, reaching a 25% reduction relative to another implementation based on the DRBM. The accuracy reached by the Tree-CNN model is almost 95% for each quality class. Performance assessment results show that the proposed Tree-CNN classifier outperforms both the current standardized algorithm described in ITU-T Rec. P.563 and the speech quality assessment method ViSQOL.
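    To make the building block concrete, the sketch below implements a single convolutional classifier node of the kind a Tree-CNN hierarchy could be assembled from; the adaptive tree-growing and node-routing logic of the paper is omitted, and the input shape, layer sizes, and five quality classes are illustrative assumptions.

```python
# One CNN node of the kind a Tree-CNN hierarchy is built from (PyTorch sketch).
# The adaptive growth of the tree and the routing between nodes are omitted;
# input shape, channel counts, and the five quality classes are assumptions.
import torch
import torch.nn as nn

class CNNNode(nn.Module):
    """A convolutional classifier operating on a speech feature map,
    e.g. a log-mel spectrogram of an impaired speech segment."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes)
        )

    def forward(self, x):                  # x: (batch, 1, n_mels, n_frames)
        return self.classifier(self.features(x))

# In a Tree-CNN, a root node like this routes inputs to child nodes that refine
# the decision, so each node trains on a smaller sub-problem; that division is
# where the reported training-time reduction comes from.
node = CNNNode(n_classes=5)
logits = node(torch.randn(8, 1, 40, 100))  # dummy batch of 8 spectrograms
print(logits.shape)                        # torch.Size([8, 5])
```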

    Monitoring VoIP Speech Quality for Chopped and Clipped Speech

    Objective Estimation of Tracheoesophageal Speech Quality

    Speech quality estimation for pathological voices is becoming an increasingly important research topic. Assessing the quality and degree of severity of disordered speech is important for the clinical treatment and rehabilitation of patients. In particular, patients who have undergone total laryngectomy (larynx removal) produce Tracheoesophageal (TE) speech. In this thesis, we study the problem of TE speech quality estimation using advanced signal processing approaches. Since it is not possible to have a reference (clean) signal corresponding to a given TE speech (disordered) signal, we focus on non-intrusive techniques (also called single-ended or blind approaches) that do not require a reference signal to deduce the speech quality level. First, we develop a novel TE speech quality estimator based on existing double-ended (intrusive) speech quality evaluation techniques such as the Perceptual Evaluation of Speech Quality (PESQ) and the Hearing Aid Speech Quality Index (HASQI). The matching pursuit algorithm (MPA) is used to generate a quasi-clean speech signal from a given disordered TE speech signal. By adequately choosing the parameters of the MPA (atoms, number of iterations, etc.) and using the resulting signal as the reference in the intrusive algorithm, we show that the resulting method correlates well with the subjective scores of two TE speech databases. Second, we investigate the extraction of low-complexity auditory features for speech quality evaluation. An 18th-order Linear Prediction (LP) analysis is performed on each voiced frame of the speech signal. Two sets of evaluation features are extracted, corresponding to higher-order statistics of the LP coefficients and the vocal tract model parameters (cross-sectional tube areas). Using a set of 35 TE speech samples, we perform forward stepwise regression as well as K-fold cross-validation to select the best sets of features for each regression model. The selected features are then fitted to different support vector regression models, yielding high correlations with subjective scores. Finally, we investigate a new approach for estimating TE speech quality using deep neural networks (DNNs). A synthetic dataset of 2173 samples is used to train a DNN model shown to predict TE voice quality. The synthetic dataset was formed by mixing 53 normal speech samples with modulated noise signals that had envelopes similar to the speech samples, at different speech-to-modulation-noise ratios. A validated instrumental speech quality predictor was used to quantify the perceived quality of the speech samples in this database, and these objective quality scores were used to train the DNN model. The DNN model comprised an input layer that accepted sixty relevant features extracted through filterbank and linear prediction analyses of the input speech signal, two hidden layers with 15 neurons each, and an output layer that produced the predicted speech quality score. The DNN trained on the synthetic dataset was subsequently applied to four different databases containing speech samples collected from TE speakers. The DNN-estimated quality scores exhibited a strong correlation with the subjective ratings of the TE samples in all four databases, demonstrating strong robustness compared with the other speech quality metrics developed in this thesis and those from the literature.
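    The sketch below mirrors the DNN topology stated in the abstract (sixty input features, two hidden layers of 15 neurons each, and a single predicted score) using scikit-learn's MLPRegressor. The feature extraction is an abbreviated stand-in that combines 18th-order LP statistics with 22 mel filterbank energies to reach sixty dimensions, and the placeholder training arrays and file name are illustrative, not the synthetic dataset or TE databases used in the thesis.

```python
# Sketch of the described DNN: 60 input features -> two hidden layers of 15
# neurons -> one predicted quality score. The feature extraction below is an
# abbreviated stand-in for the thesis's filterbank + LP analysis, and the
# training arrays are random placeholders for the synthetic dataset.
import numpy as np
import librosa
from sklearn.neural_network import MLPRegressor

def lp_filterbank_features(y, sr=16000, order=18):
    """Illustrative 60-dim feature vector: LP-coefficient statistics (19 + 19)
    plus 22 average log-mel filterbank energies."""
    frames = librosa.util.frame(y, frame_length=512, hop_length=256).T
    frames = frames[frames.std(axis=1) > 1e-4]   # crude stand-in for voiced-frame selection
    lpcs = np.stack([librosa.lpc(f, order=order) for f in frames])
    logmel = np.log(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=22) + 1e-10).mean(axis=1)
    return np.concatenate([lpcs.mean(axis=0), lpcs.std(axis=0), logmel])

# Placeholder training set: 60-dim features with instrumental quality scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))
y_scores = rng.uniform(1.0, 5.0, size=200)

dnn = MLPRegressor(hidden_layer_sizes=(15, 15), activation="relu",
                   max_iter=2000, random_state=0)
dnn.fit(X, y_scores)

# Apply the trained model to a TE recording (hypothetical file path).
y_te, sr = librosa.load("te_sample.wav", sr=16000)
print(dnn.predict(lp_filterbank_features(y_te, sr)[None, :]))
```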