Search CORE

25 research outputs found

Objective and Subjective Evaluation of Wideband Speech Quality

Author: Pourmand Nazanin
Publication venue: Scholarship@Western
Publication date: 27/03/2013

Traditional landline and cellular communications use a bandwidth of 300 to 3400 Hz for transmitting speech. This narrow bandwidth limits the quality, intelligibility and naturalness of transmitted speech. The telecommunication industry is moving towards wider-bandwidth speech, but the enlarged bandwidth also introduces challenges in speech processing. Echo and noise are two particularly challenging issues in wideband telephony, because users are perceptually more sensitive to them. Subjective and objective measurements of speech quality are important for benchmarking speech processing algorithms and for evaluating the effect of parameters such as noise, echo and delay in wideband telephony. Subjective measures are ratings of speech quality by listeners, whereas objective measures compute a metric from the reference and degraded speech samples. While subjective quality ratings are the “gold standard”, they are also time- and resource-consuming. An objective metric that correlates highly with subjective data is therefore attractive, as it can substitute for subjective quality scores in gauging the performance of different algorithms and devices. This thesis reports results from a series of experiments on subjective and objective speech quality evaluation for wideband telephony applications. First, a custom wideband noise reduction database was created, containing speech samples corrupted by different background noises at different signal-to-noise ratios (SNRs) and processed by six different noise reduction algorithms. Comprehensive subjective evaluation of this database revealed an interaction between algorithm performance, noise type and SNR. Several auditory-based objective metrics, such as the Loudness Pattern Distortion (LPD) measure based on the Moore-Glasberg auditory model, were evaluated for predicting the subjective scores.
In addition, the performance of Bayesian Multivariate Regression Splines (BMLS) was evaluated for mapping the scores computed by the objective metrics to the true quality scores. The combination of LPD and BMLS correlated highly with the subjective scores and was used as a substitute for subjective scores when fine-tuning the noise reduction algorithms. Second, the effect of echo and delay on wideband speech was evaluated in both listening and conversational contexts, through both subjective and objective measures. A database of speech samples corrupted by echo with different delays and frequency-response characteristics was created and later used to collect subjective quality ratings. The LPD-BMLS objective metric was then validated against these subjective scores. Third, to evaluate the effect of echo and delay in a conversational context, a real-time simulator was developed. Pairs of subjects conversed over the simulated system and rated the quality of their conversations, which were degraded by different amounts of echo and delay. Analysis of the quality scores showed that the LPD-BMLS combination was effective in predicting subjective impressions of quality for condition-averaged data.
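The thesis judges objective metrics by how well they correlate with subjective scores, including after averaging over test conditions. The Python sketch below illustrates that evaluation step under stated assumptions: it computes the Pearson correlation, a per-condition average, and a plain least-squares linear mapping. The linear fit is a deliberately simpler stand-in and is not the BMLS mapping used in the thesis; the function names and the (condition, objective, subjective) record layout are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("correlation undefined for constant scores")
    return cov / sqrt(vx * vy)


def condition_average(records):
    """Average objective and subjective scores per test condition.

    records: iterable of (condition_id, objective_score, subjective_score).
    Returns (objective_means, subjective_means) in sorted condition order."""
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # condition -> [sum_obj, sum_subj, count]
    for cond, obj, subj in records:
        entry = acc[cond]
        entry[0] += obj
        entry[1] += subj
        entry[2] += 1
    conds = sorted(acc)
    return ([acc[c][0] / acc[c][2] for c in conds],
            [acc[c][1] / acc[c][2] for c in conds])


def fit_linear(xs, ys):
    """Least-squares line y = slope * x + intercept (a simple stand-in, not BMLS)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

A typical use is `obj, subj = condition_average(records)` followed by `pearson(obj, subj)`. Because Pearson correlation is unchanged by a linear mapping, the fitted line only matters for error measures such as RMSE; a nonlinear mapping such as BMLS can also change the correlation itself.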

Scholarship@Western

Sparseness-controlled adaptive algorithms for supervised and unsupervised system identification

Author: Loganathan Pradeep
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/11/2011

In single-channel hands-free telephony, the acoustic coupling between the loudspeaker and the microphone can be strong, generating echoes that degrade the user experience. Effective acoustic echo cancellation (AEC) is therefore necessary to maintain a stable system and improve the perceived voice quality of a call. Traditionally, acoustic echo cancellers use adaptive filters driven by adaptive algorithms to estimate the acoustic impulse responses (AIRs). The performance of a range of well-known algorithms is studied in the context of both AEC and network echo cancellation (NEC), giving insights into their tracking performance under both time-invariant and time-varying system conditions. In AEC, the level of sparseness in AIRs can vary greatly in a mobile environment, and when the response is strongly sparse, conventional approaches converge poorly. Drawing on techniques originally developed for NEC, a class of time-domain and frequency-domain AEC algorithms is proposed that not only works well in both sparse and dispersive circumstances but also adapts dynamically to the level of sparseness using a new sparseness-controlled approach. Because the early part of the acoustic echo path is shown to be sparse while the late reverberant part is dispersive, a novel adaptive filter structure consisting of two time-domain partition blocks is proposed, so that a different adaptive algorithm can be used for each part. By controlling the mixing parameter of each partitioned block separately, with the block lengths controlled adaptively, the proposed partitioned-block algorithm works well in both sparse and dispersive time-varying circumstances. New insight into the tracking performance of improved proportionate NLMS (IPNLMS) is obtained by deriving an expression for its mean-square error.
Using this framework for both sparse and dispersive time-varying echo paths, the analytical results are validated in practical AEC simulations. Time-domain blind SIMO identification algorithms based on second-order statistics, which exploit the cross-relation method, are also investigated, and a technique with proportionate step-size control is developed for identifying both sparse and dispersive systems.
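To make the proportionate-update idea concrete, here is a minimal Python sketch of the standard IPNLMS update (as introduced by Benesty and Gay), together with a normalized sparseness measure of the form commonly used in sparseness-controlled adaptive filtering. It assumes a single-channel, noise-free identification setup; the sparseness-controlled, frequency-domain and partitioned-block variants proposed in the thesis are not reproduced, and the function names and default parameters (`mu`, `alpha`, `delta`, `eps`) are illustrative choices.

```python
import random
from math import sqrt


def sparseness(h):
    """Normalized sparseness in [0, 1]: 0 for a uniform response, 1 for a single tap."""
    L = len(h)
    l1 = sum(abs(v) for v in h)
    l2 = sqrt(sum(v * v for v in h))
    if L < 2 or l2 == 0.0:
        return 0.0
    return L / (L - sqrt(L)) * (1.0 - l1 / (sqrt(L) * l2))


def ipnlms_identify(x, d, L, mu=0.5, alpha=-0.5, delta=1e-4, eps=1e-6):
    """Identify an L-tap FIR path from input x and desired output d with IPNLMS.

    alpha = -1 reduces to NLMS; alpha close to 1 behaves much like PNLMS."""
    h = [0.0] * L
    buf = [0.0] * L  # buf[k] holds x[n - k]
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        e = dn - sum(hk * bk for hk, bk in zip(h, buf))  # a priori error
        l1 = sum(abs(v) for v in h)
        # Proportionate gains: a uniform (NLMS-like) part plus a part proportional to |h_k|.
        g = [(1 - alpha) / (2 * L) + (1 + alpha) * abs(v) / (2 * l1 + eps) for v in h]
        norm = sum(gk * bk * bk for gk, bk in zip(g, buf)) + delta
        h = [hk + mu * e * gk * bk / norm for hk, gk, bk in zip(h, g, buf)]
        errors.append(e)
    return h, errors


if __name__ == "__main__":
    random.seed(1)
    L = 64
    h_true = [0.0] * L
    h_true[5], h_true[20] = 0.9, -0.4  # hypothetical sparse echo path
    x = [random.gauss(0.0, 1.0) for _ in range(4000)]
    d = [sum(h_true[k] * x[n - k] for k in range(L) if n >= k) for n in range(len(x))]
    h_est, _ = ipnlms_identify(x, d, L)
    mis = sum((a - b) ** 2 for a, b in zip(h_est, h_true)) / sum(b * b for b in h_true)
    print(f"sparseness of true path: {sparseness(h_true):.3f}")
    print(f"normalized misalignment: {mis:.2e}")
```

Setting `alpha=-1.0` turns the update into plain NLMS, which gives a convenient baseline for comparing convergence on sparse versus dispersive paths.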

Spiral - Imperial College Digital Repository

An Algorithm to Evaluate the Echo Signal and the Voice Quality in VoIP Networks

Author: Kauffman Andre Neumann
Publication date: 18/04/2006

Voice over the Internet Protocol (VoIP) has become increasingly popular, but reliability and voice quality remain important factors limiting the widespread adoption of VoIP systems. Providing good voice quality is of major importance for the transition from the PSTN to VoIP networks. There are several non-real-time algorithms that estimate voice quality, such as PESQ and the E-model. In this thesis we propose a real-time fuzzy algorithm to estimate the echo component of voice quality in VoIP networks. Unlike the existing algorithms, the proposed algorithm does not need a reference signal and has low computational complexity. For these reasons, it can be embedded in every VoIP system of a network to monitor live calls, giving the network provider an estimate of the instantaneous voice quality.
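The abstract does not spell out the fuzzy rules, so the following is only a hypothetical illustration of how a zero-order (Sugeno-style) fuzzy estimator could map two echo parameters to a quality score without a reference signal. It assumes echo delay (ms) and echo loss (dB) have already been estimated from the call; every membership breakpoint and rule output below is invented for illustration and is not taken from the thesis.

```python
def ramp_down(v, a, b):
    """Membership that is 1 at or below a, 0 at or above b, linear in between."""
    if v <= a:
        return 1.0
    if v >= b:
        return 0.0
    return (b - v) / (b - a)


def echo_quality(delay_ms, echo_loss_db):
    """Hypothetical fuzzy estimate of echo-related quality on a 1..5 scale."""
    # Hypothetical fuzzy sets for echo delay (ms).
    short = ramp_down(delay_ms, 50.0, 150.0)
    long_ = 1.0 - short
    # Hypothetical fuzzy sets for echo loss (dB): low loss means a loud echo.
    loud = ramp_down(echo_loss_db, 20.0, 50.0)
    quiet = 1.0 - loud
    # Rule base: (firing strength with min as fuzzy AND, crisp rule output).
    rules = [
        (min(short, loud), 4.0),   # short, loud echo: mostly heard as sidetone
        (min(short, quiet), 4.5),
        (min(long_, loud), 1.5),   # long, loud echo: most annoying case
        (min(long_, quiet), 4.0),
    ]
    total = sum(w for w, _ in rules)
    if total == 0.0:
        raise ValueError("no rule fired")
    # Weighted average of rule outputs (zero-order Sugeno defuzzification).
    return sum(w * z for w, z in rules) / total
```

Because each input pair needs only a handful of comparisons and one weighted average, an estimator of this shape is cheap enough to run per call, which is consistent with the low-complexity goal stated in the abstract.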

Digital Repository at the University of Maryland

Beiträge zu breitbandigen Freisprechsystemen und ihrer Evaluation (Contributions to Wideband Hands-Free Systems and Their Evaluation)

Author: Jung Marc-André
Publication venue: Shaker
Publication date: 01/01/2017

This work deals with the advancement of wideband hands-free systems (HFSs) for monophonic and stereophonic applications. Furthermore, innovative contributions are made to the corresponding field of quality evaluation. The proposed HFS approaches are based on frequency-domain adaptive filtering for system identification, making use of Kalman theory and state-space modeling. Functional enhancement modules are developed that improve one or more key quality aspects without harming the others. These modules can therefore be combined flexibly, depending on the needs at hand. The enhanced monophonic HFS is evaluated according to automotive ITU-T recommendations to demonstrate its targeted effectiveness. Furthermore, a novel methodology and technical framework are introduced to improve the prototyping and evaluation process of automotive hands-free (HF) and in-car communication (ICC) systems. The monophonic HFS, in several configurations, acts as the device under test (DUT) and is thoroughly investigated, demonstrating both the DUT's satisfactory performance and the advantages of the proposed development process. As current methods for evaluating HFSs in dynamic conditions often still lack flexibility, reproducibility and accuracy, this work introduces “Car in a Box” (CiaB) as a novel, improved system for this demanding task. CiaB enhances the development process by performing high-resolution system identification of dynamic electro-acoustic systems. The extracted dynamic impulse response trajectories can then be applied to arbitrary input signals in a synthesis operation, making a realistic dynamic automotive auralization of a car cabin interior available for HFS evaluation. It is shown that this system improves evaluation flexibility while guaranteeing reproducibility. In addition, the accuracy of evaluation methods can be increased through access to exact, realistic impulse response trajectories acting as a so-called “ground truth” reference. If CiaB is included in an automotive evaluation setup, no acoustic car interior prototype needs to be present at this stage of development. Hence, CiaB may ease the HFS development process. Dynamic acoustic replicas of an arbitrary number of car cabin interiors may be provided to multiple developers simultaneously. With CiaB, speech enhancement system developers therefore have an evaluation environment at hand that can adequately replace the real environment.

This work deals with the further development of wideband hands-free systems for monophonic and stereophonic applications and makes innovative contributions to their quality measurement. The presented methods are based on frequency-domain adaptive algorithms for system identification according to Kalman theory in a state-space representation. Functional extension modules are developed such that at least one quality requirement is improved without blatantly violating others. These algorithmic extensions, which can be combined flexibly as required, are tested in predominantly automotive test scenarios according to ITU-T recommendations (Rec. P.1110/P.1130), thereby confirming their targeted effectiveness. A collection of methods and a technical system for improved prototyping and evaluation of automotive hands-free and in-car communication systems are presented and applied, by way of example, to the monophonic hands-free system in various stages of expansion. The resulting advantages in the development and test process of speech enhancement systems are described and verified by measurement. Existing methods for measuring the behavior of hands-free systems in time-variant environments have so far often offered only insufficient flexibility, reproducibility and accuracy. Therefore, the “Car in a Box” (CiaB) method is developed and presented here, with which time-variant electro-acoustic systems can be identified. Dynamic impulse responses obtained in this way can be applied in the laboratory to arbitrary input signals in a synthesis operation to generate realistic test signals under dynamic conditions. This approach achieves a high degree of flexibility with guaranteed reproducibility. It is also shown that the accuracy of evaluation methods based on it can be increased, since exact, real impulse responses available at every point in time of the measurement provide a so-called “ground truth” reference. A significant aspect of integrating CiaB into a measurement setup for automotive hands-free systems is that the actual vehicle is no longer required at this stage. It is shown that a dynamic vehicle acoustic environment, as required in the development process of automotive speech enhancement algorithms, can be fully replaced by CiaB, in any number of instances, with at least equivalent quality.
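The synthesis operation described above, applying a time-varying impulse response trajectory to an arbitrary input signal, can be illustrated with a direct-form time-varying FIR filter. This Python sketch assumes the trajectory is stored as a list of equal-length impulse-response snapshots spaced `hop` samples apart and linearly interpolates between neighbouring snapshots; the internals of CiaB are not described in the abstract, so the function names and the interpolation scheme are assumptions.

```python
def interpolated_ir(irs, n, hop):
    """Impulse response at sample n, linearly interpolated between snapshots."""
    pos = n / hop
    i = int(pos)
    if i >= len(irs) - 1:
        return irs[-1]  # hold the last snapshot beyond the end of the trajectory
    frac = pos - i
    return [(1.0 - frac) * a + frac * b for a, b in zip(irs[i], irs[i + 1])]


def time_varying_filter(x, irs, hop):
    """Direct-form time-varying FIR: y[n] = sum_k h_n[k] * x[n - k]."""
    if hop <= 0 or not irs:
        raise ValueError("need hop > 0 and at least one impulse response")
    taps = len(irs[0])
    if any(len(h) != taps for h in irs):
        raise ValueError("all impulse responses must have the same length")
    y = []
    for n in range(len(x)):
        h = interpolated_ir(irs, n, hop)
        y.append(sum(h[k] * x[n - k] for k in range(min(taps, n + 1))))
    return y
```

Direct-form convolution costs O(N·L) operations, so for realistic cabin responses a block-wise FFT convolution with crossfading between snapshots would typically be preferred; the direct form is used here only because it states the time-varying model most plainly.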

Digitale Bibliothek Braunschweig

Perceptual techniques in audio quality assessment

Author: Rix Antony W.
Publication venue: The University of Edinburgh
Publication date: 01/01/2003

Edinburgh Research Archive

Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)

Author: Huck R. W.
Rafferty William
Reekie D. Hugh M.

Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression.

NASA Technical Reports Server

Echo control techniques in public switched telephone networks

Author: J. Jones David

University of Liverpool Repository

Study to determine potential flight applications and human factors design guidelines for voice recognition and synthesis systems

Author: Parks D. L.
White R. W.

A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. First, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Then, numerous potential aircraft and simulator flight deck voice applications were identified, and each proposed application was rated on a number of criteria to obtain an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The first three categories received the ratings showing the most promise of benefiting flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot-selectable, and many were rated as potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those of a recent NASA study of a 1995 transport concept.

NASA Technical Reports Server

Development of algorithms for smart hearing protection devices

Author: Lezzoum Narimene
Publication venue: École de technologie supérieure

In industrial environments, wearing hearing protection devices is required to protect wearers from high noise levels and prevent hearing loss. Besides attenuating excessive noise, however, hearing protectors also block other signals, even useful ones. Wearers who want to communicate and exchange information must therefore remove their hearing protectors, which is inconvenient or even dangerous. To overcome these problems of traditional passive hearing protection devices, this thesis outlines the process followed to develop signal processing algorithms for a hearing protector that both protects against external noise and allows oral communication between wearers. This hearing protector is called the “smart hearing protection device”. It is a traditional hearing protector with an embedded miniature digital signal processor that processes the incoming signals, a miniature microphone that picks up external signals, and a miniature internal loudspeaker that transmits the processed signals to the protected ear. Enabling oral communication without removing the smart hearing protectors requires dedicated signal processing algorithms. The objective of this thesis is therefore to develop a noise-robust voice activity detection algorithm and a noise reduction algorithm that improves the quality and intelligibility of the speech signal. The development methodology has three steps: first, the speech detection and noise reduction algorithms are developed; second, they are evaluated and validated in software; and third, they are implemented on the digital signal processor to validate their feasibility for the intended application.
During the development of the two algorithms, two constraints must be taken into account: the hardware resources of the digital signal processor embedded in the hearing protector (memory, number of operations per second), and the real-time constraint, since the processing time must stay below a threshold to avoid a perceptible delay between the active and passive paths of the hearing protector, or between lip movements and speech perception. From a scientific perspective, the thesis first determines the delay thresholds that the digital signal processor must not exceed to avoid a perceptible delay between the active and passive paths of the hearing protector. These thresholds were obtained from a subjective study, which found that they depend on (a) the degree of attenuation of the hearing protector, (b) the duration of the signal, (c) the level of the background noise, and (d) the type of the background noise. With a shallow hearing protector fit, 20 % of participants began to perceive a delay from 8 ms for a bell sound (transient), 16 ms for a clean speech signal and 22 ms for a speech signal corrupted by babble noise. With a deep fit, the corresponding thresholds were 18 ms for the bell signal and 26 ms for clean speech, and no delay was perceived when speech was corrupted by babble noise, showing that better attenuation allows more time for digital signal processing. Second, this work presents a new voice activity detection algorithm based on a low-complexity speech feature. This feature is the ratio between the signal energy in the frequency region containing the first formant, which characterizes speech, and the energy at low or high frequencies, which characterizes the noise.
Evaluating this algorithm against a benchmark algorithm demonstrated its selectivity: averaged over three signal-to-noise ratios (SNRs) of 10, 5 and 0 dB, it achieved a false positive rate of 4.2 % and a true positive rate of 91.4 %, compared to 29.9 % false positives and 79.0 % true positives for the benchmark algorithm. Third, this work shows that extracting the temporal envelope of a signal to generate a nonlinear, adaptive gain function reduces the background noise, improves the quality of the speech signal and produces the least musical noise compared to three other benchmark algorithms. The development of the speech detection and noise reduction algorithms, their objective and subjective evaluations in different noise environments, and their implementation on digital signal processors validated their efficiency and low complexity for the smart hearing protection application.
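The voice activity feature can be sketched as a band-energy ratio. The Python below is an illustration under stated assumptions, not the thesis implementation: the band edges (250 to 1000 Hz for the first-formant region, below 250 Hz and from 3000 Hz up to half the sampling rate for the noise regions), combining both noise bands in one denominator, the decision threshold of 1.0, and the naive DFT are all hypothetical choices. A helper also computes the true and false positive rates used in the abstract to compare detectors.

```python
import cmath
import math


def band_energy(frame, fs, f_lo, f_hi):
    """Energy of DFT bins with f_lo <= frequency < f_hi (naive O(N^2) DFT)."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        if f_lo <= k * fs / n < f_hi:
            xk = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            energy += abs(xk) ** 2
    return energy


def formant_ratio(frame, fs, eps=1e-12):
    """Hypothetical feature: first-formant band energy over noise-band energy."""
    e_f1 = band_energy(frame, fs, 250.0, 1000.0)     # first-formant region (speech)
    e_low = band_energy(frame, fs, 0.0, 250.0)       # low-frequency noise region
    e_high = band_energy(frame, fs, 3000.0, fs / 2)  # high-frequency noise region
    return e_f1 / (e_low + e_high + eps)


def is_speech(frame, fs, threshold=1.0):
    """Frame-level speech decision with a hypothetical threshold."""
    return formant_ratio(frame, fs) > threshold


def detection_rates(pred, truth):
    """Return (true positive rate, false positive rate) of boolean decisions."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    pos = sum(1 for t in truth if t)
    neg = len(truth) - pos
    return (tp / pos if pos else 0.0), (fp / neg if neg else 0.0)
```

On an embedded processor with the memory and operations-per-second limits mentioned in the abstract, an FFT or a small bank of band-pass filters would likely replace the naive DFT; the feature itself stays the same.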

Espace ÉTS

Proceedings of the Third International Mobile Satellite Conference (IMSC 1993)

Author: Cassingham Randy
Kwan Robert
Rigley Jack

Satellite-based mobile communications systems provide voice and data communications to users over a vast geographic area. The users may communicate via mobile or hand-held terminals, which may also provide access to terrestrial cellular communications services. While the first and second International Mobile Satellite Conferences (IMSC) mostly concentrated on technical advances, this Third IMSC also focuses on the increasing worldwide commercial activities in Mobile Satellite Services. Because of the large service areas provided by such systems, it is important to consider political and regulatory issues in addition to technical and user requirements issues. Topics covered include: the direct broadcast of audio programming from satellites; spacecraft technology; regulatory and policy considerations; advanced system concepts and analysis; propagation; and user requirements and applications.

NASA Technical Reports Server