
    New single-ended objective measure for non-intrusive speech quality evaluation

    This article proposes a new output-based method for non-intrusive assessment of the speech quality of voice communication systems and evaluates its performance. The method requires access to the processed (degraded) speech only, and is based on measuring perception-motivated objective auditory distances between the voiced parts of the output speech and appropriately matching references extracted from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech records. The auditory distances are then mapped into objective Mean Opinion listening quality scores. An efficient data-mining tool known as the self-organizing map (SOM) performs the required clustering and mapping/reference matching processes. To obtain a perception-based, speaker-independent parametric representation of the speech, three domain transformation techniques have been investigated. The first is based on a perceptual linear prediction (PLP) model, the second uses Bark spectrum (BS) analysis, and the third uses mel-frequency cepstrum coefficients (MFCC). Reported evaluation results show that the proposed method correlates highly with subjective listening quality scores, yielding accuracy similar to that of ITU-T P.563 while maintaining relatively low computational complexity. Results also demonstrate that the method outperforms PESQ in a number of distortion conditions, such as speech degraded by channel impairments.

    Multi-objective Non-intrusive Hearing-aid Speech Assessment Model

    Because they do not need a clean reference, non-intrusive speech assessment methods have attracted great attention for objective evaluation. While deep learning models have been used to develop non-intrusive speech assessment methods with promising results, there is limited research on hearing-impaired subjects. This study proposes a multi-objective non-intrusive hearing-aid speech assessment model, called HASA-Net Large, which predicts speech quality and intelligibility scores from input speech signals and specified hearing-loss patterns. Our experiments showed that using pre-trained self-supervised learning (SSL) models leads to a significant boost in speech quality and intelligibility predictions compared to using spectrograms as input. Additionally, we examined three distinct fine-tuning approaches that resulted in further performance improvements. Furthermore, we demonstrated that incorporating SSL models resulted in greater transferability to an out-of-distribution (OOD) dataset. HASA-Net Large uses raw waveforms and hearing-loss patterns to predict speech quality and intelligibility levels for individuals with normal and impaired hearing, and demonstrates superior prediction performance and transferability.

    Non-intrusive speech quality assessment using context-aware neural networks

    To meet the human-perceived quality of experience (QoE) when communicating over Voice over Internet Protocol (VoIP) applications such as Google Meet, Microsoft Skype, and Apple FaceTime, a precise speech quality assessment metric is needed. The metric should be able to detect and separate the different types of noise degradation present in the surroundings before measuring and monitoring speech quality in real time. Our research is motivated by the lack of a speech quality metric that first distinguishes between types of noise degradation before predicting speech quality. To that end, this paper presents a novel non-intrusive speech quality assessment metric using context-aware neural networks, in which the noise class (context) of the degraded or noisy speech signal is first identified using a classifier; deep neural network (DNN) based speech quality metrics (SQMs) are then trained and optimized for each noise class to obtain noise class-specific (context-specific) speech quality predictions (MOS scores). The noisy speech signals, that is, clean speech signals degraded by different types of background noise, are taken from the NOIZEUS speech corpus. Results demonstrate that, even with the small number of speech samples available in the NOIZEUS speech corpus, the proposed metric outperforms, across the different contexts, a metric in which the contexts are not classified before speech quality prediction.

    Speech Intelligibility Prediction for Hearing Aid Systems


    Learning-Based Reference-Free Speech Quality Assessment for Normal Hearing and Hearing Impaired Applications

    Accurate speech quality measures are highly attractive and beneficial in the design, fine-tuning, and benchmarking of speech processing algorithms, devices, and communication systems. The switch from narrowband telecommunication to wideband telephony gives users a better speech quality experience but introduces a number of challenges in speech processing. Noise is the most common distortion in audio signals, and as a result there have been many studies on developing high-performance noise reduction algorithms. Assistive hearing devices are designed to decrease communication difficulties for people with hearing loss. As the algorithms within these devices become more advanced, it becomes increasingly crucial to develop accurate and robust quality metrics to assess their performance. Objective speech quality measurements are more attractive than subjective assessments because they are cost-effective and eliminate subjective variability. Although there has been extensive research on objective speech quality evaluation for narrowband speech, those methods are unsuitable for wideband telephony. In hearing-impaired applications, objective quality assessment is challenging because it must distinguish between desired modifications that make signals audible and undesired artifacts. In this thesis, a model is proposed that extracts two sets of features from the distorted signal only. This approach, called reference-free (non-intrusive) assessment, is attractive because it does not need access to the reference signal. Although this benefit makes non-intrusive assessments suitable for real-time applications, more features need to be extracted and smartly combined to provide accuracy comparable to that of intrusive metrics. Two feature vectors are proposed to extract information from distorted signals, and their performance is examined in three studies. In the first study, both feature vectors are trained on various portions of a noise reduction database for normal-hearing applications. In the second study, the same investigation is performed on two sets of databases acquired through several hearing aids. The third study examines the generalizability of the proposed metrics by benchmarking four wireless remote microphones in a variety of environmental conditions. Machine learning techniques are used to train the models in the three studies. The studies show that one of the feature sets is robust when trained on different portions of the data from different databases, and that it also provides good quality prediction accuracy for both normal-hearing and hearing-impaired applications.

    Electroacoustic and Behavioural Evaluation of Hearing Aid Digital Signal Processing Features

    Modern digital hearing aids provide an array of features to improve the user listening experience. As the features become more advanced and interdependent, it becomes increasingly necessary to develop accurate and cost-effective methods to evaluate their performance. Subjective experiments are an accurate way to determine hearing aid performance, but they come at a high monetary and time cost. Four studies that develop and evaluate electroacoustic hearing aid feature evaluation techniques are presented. The first study applies a recent speech quality metric to two bilateral wireless hearing aids with various features enabled in a variety of environmental conditions. The study shows that accurate speech quality predictions are made with a reduced version of the original metric, and that a portion of the original metric does not perform well when applied to a novel subjective speech quality rating database. The second study presents a reference-free (non-intrusive) electroacoustic speech quality metric developed specifically for hearing aid applications and compares its performance to that of a recent intrusive metric. The non-intrusive metric eliminates the need for a shaped reference signal and can be used in real-time applications, but at some cost in prediction accuracy. The third study investigates the digital noise reduction performance of seven recent hearing aid models. An electroacoustic measurement system is presented that allows the noise and speech signals to be separated from hearing aid recordings. It is shown how this can be used to investigate digital noise reduction performance through the application of speech quality and speech intelligibility measures. It is also shown how the system can be used to quantify digital noise reduction attack times. The fourth study presents a turntable-based system to investigate hearing aid directionality performance. Two methods to extract the signal of interest are described. Polar plots are presented for a number of hearing aid models from recordings generated both in the free field and from a head-and-torso simulator. It is expected that the proposed electroacoustic techniques will assist audiologists and hearing researchers in choosing, benchmarking, and fine-tuning hearing aid features.

    Non-intrusive speech quality prediction using modulation energies and LSTM-network

    Many signal processing algorithms have been proposed to improve the quality of speech recorded in the presence of noise and reverberation. Perceptual measures, i.e., listening tests, are usually considered the most reliable way to evaluate the quality of speech processed by such algorithms but are costly and time-consuming. Consequently, speech enhancement algorithms are often evaluated using signal-based measures, which can be either intrusive or non-intrusive. Because intrusive measures require a reference signal, only non-intrusive measures can be used in applications where the clean speech signal is not available. However, many existing non-intrusive measures correlate poorly with perceived speech quality, particularly when applied over a wide range of algorithms or acoustic conditions. In this paper, we propose a novel non-intrusive measure of the quality of processed speech that combines modulation energy features and a recurrent neural network using long short-term memory cells. We collected a dataset of perceptually evaluated signals representing several acoustic conditions and algorithms and used this dataset to train and evaluate the proposed measure. Results show that the proposed measure yields higher correlation with perceptual speech quality than benchmark intrusive and non-intrusive measures when considering various categories of algorithms. Although the proposed measure is sensitive to mismatch between training and testing, results show that it is a useful approach for evaluating specific algorithms over a wide range of acoustic conditions and may thus become particularly useful for real-time selection of speech enhancement algorithm settings.

    Non-Intrusive Speech Intelligibility Prediction
