31 research outputs found

    Evaluation of audio source separation in the context of 3D audio

    The emergence and broader availability of 3D audio systems allow for new possibilities in mixing, post-production and playback of audio content. Used in movie post-production for cinemas, as a special effect by disc jockeys, for example, and even for live concerts, 3D rendering immerses the listener more than ever before. When existing audio material is to be employed, Audio Source Separation (ASS) techniques enable the extraction of single sources from a mixture. Modern mixing approaches for 3D audio do not assign individual gains and delays for each source in every channel. Rather, a sound scene is designed, with individual sources treated as objects to be placed within the scene. The hardware layer is mostly irrelevant for mixing in such a setting. ASS is therefore a valuable tool to "disassemble" a more traditional monophonic, stereophonic, or multichannel mix. However, due to the complexity of the ASS problem, extracted sources are subject to degradations. While state-of-the-art objective measures for ASS quality build on monaural auditory models, they don't take into account binaural listening and the psychoacoustic phenomena that are involved, such as binaural unmasking. In this thesis, an extension to Perceptive Evaluation Methods for Audio Source Separation (PEASS) [41] is proposed with spatial rendering in mind. Additionally, a new binaural model for ASS evaluation in the context of 3D audio is presented. The performance of the basic and extended versions of PEASS, as well as the proposed binaural model, is evaluated in two subjective studies. The first study is conducted with binaural spatialisation presented over headphones, while the second experiment uses a 3D Wave Field Synthesis (WFS) system. A set of artificial ASS degradation algorithms is proposed and used for the stimuli of the subjective studies. Results of the studies indicate a monotonic decrease of the perceived quality as a function of the amount of degradation introduced. 
The most important degradation is found to be target distortion, followed by onset misallocation and musical noise-type artifacts. Additionally, spatialising the extracted target source away from the residue or having it louder than the residue negatively affects the results, indicating a perceived quality degradation. In 3D WFS conditions, results show evidence for monaural and binaural unmasking. The performance of the proposed binaural model is consistently superior to that of the basic or extended PEASS versions. In the binaural spatialisation experiment, a correlation coefficient of 0.60 between subjective and objective results is achieved, versus 0.57 and 0.53 with the extended and basic PEASS versions respectively. For the 3D WFS study, the binaural model achieves 0.67 prediction accuracy whereas both PEASS versions reach 0.57. The perceptual validity of the WFS formulation is also verified in a localisation experiment. Vertical localisation is found to be nearly as good as physical source localisation for an extended listening area, with a localisation precision of 6° to 9°. The response time is also used as an indicator of localisation performance.
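
The correlation figures quoted in this abstract compare listener ratings with model predictions. The abstract does not say which correlation measure was used, so the following is only a minimal sketch assuming a Pearson coefficient; the function name `pearson_r` and the score lists are hypothetical and are not data from the thesis.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical example: mean listener ratings and model predictions
# for five stimuli (illustrative numbers only).
subjective = [82.0, 64.0, 47.0, 35.0, 20.0]
objective = [75.0, 70.0, 40.0, 42.0, 15.0]
r = pearson_r(subjective, objective)
print(f"r = {r:.2f}")
```

A value near 1 would mean the objective model ranks and spaces the stimuli much as the listeners did; values such as 0.60 or 0.67 indicate only moderate agreement.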

    Quality assessment of spherical microphone array auralizations

    The thesis documents a scientific study on quality assessment and quality prediction in Virtual Acoustic Environments (VAEs) based on spherical microphone array data, using binaural synthesis for reproduction. In the experiments, predictive modeling is applied to estimate the influence of the array on the reproduction quality by relating the data derived in perceptual experiments to the output of an auditory model. The experiments address various aspects of the array considered relevant in auralization applications: the influence of system errors as well as the influence of the array configuration employed. The system errors comprise spatial aliasing, measurement noise, and microphone positioning errors, while the array configuration is represented by the sound field order in terms of spherical harmonics, which defines the spatial resolution of the array. Based on array simulations, the experimental data comprise free-field sound fields and two shoe-box shaped rooms, one with weak and another with strong reverberation. Ten audio signals served as test material, e.g., orchestral/pop music, male/female singing voice or single instruments such as castanets. In the perceptual experiments, quantitative methods are used to evaluate the impact of system errors, while a descriptive analysis assesses the array configuration using two quality factors for attribution: Apparent Source Width (ASW) and Listener Envelopment (LEV). Both are quality measures commonly used in concert hall acoustics to describe the spaciousness of a room. The results from the perceptual experiments are subsequently related to the technical data derived from the auditory model in order to build, train, and evaluate a variety of predictive models. Based on classification and regression approaches, these models are applied and investigated for automated quality assessment in order to identify and categorize system errors as well as to estimate their perceptual strength. 
Moreover, the models make it possible to predict the array's influence on ASW and LEV perception and enable the classification of further sound field characteristics, like the reflection properties of the simulated room or the sound field order used. The applied prediction models comprise simple linear regression and decision trees, or more complex models such as support vector machines or artificial neural networks. The results show that the developed prediction models perform well in their classification and regression tasks. Although their functionality is limited to the conditions underlying the conducted experiments, they can still provide a useful tool to assess basic quality-related aspects which are important when developing spherical microphone arrays for auralization applications.
This thesis addresses quality assessment and quality prediction in virtual acoustic environments, in particular room simulations based on spherical array data that are auralized using binaural synthesis. Various prediction methods are applied to predict the influence of the array on reproduction quality automatically, by relating data from listening experiments to the output of an auditory model. The experiments focus on different, practically relevant aspects of the measurement system that influence reproduction quality. Specifically, these are measurement errors, such as spatial aliasing, noise, or microphone positioning errors, and the configuration of the array. The configuration defines the spatial resolution and corresponds to the chosen order of the spherical harmonic decomposition. The experiments are based on spherical array simulations under free-field conditions and in simple simulated rectangular rooms with different reflection properties, one room being dry and the other strongly reflective. Ten test signals that appear relevant in practical applications serve as audio material, e.g. orchestral or pop music, male and female singing, or castanets. In perceptual experiments, the influence of measurement errors is evaluated in a quantitative analysis, and the quality of the synthesis is assessed descriptively with the attributes Apparent Source Width (ASW) and Listener Envelopment (LEV). The resulting data form the basis for quality prediction, with the listening test results serving as observations and the output data of the auditory model as predictors. Different prediction models are trained on the data and their prediction accuracy is then evaluated. The developed models make it possible both to identify and classify measurement errors and to estimate their magnitude. In addition, they make it possible to predict the influence of the array configuration on the perception of ASW and LEV and to identify the order of the sound field decomposition used, as well as the reflection properties of the simulated room. Simple regression models and decision trees are used, as are more complex models such as support vector machines or neural networks. The developed models generally show high accuracy in quality prediction and thus allow basic array properties to be analysed without having to conduct elaborate listening experiments. Although the applicability of the models is limited to the cases examined here, they may prove to be helpful tools in the development of spherical arrays for auralization applications.

    Considering Bluetooth's Subband Codec (SBC) for Wideband Speech and Audio on the Internet

    The Bluetooth Special Interest Group (SIG) has standardized the subband coding (SBC) audio codec to connect headphones via wireless Bluetooth links. SBC compresses audio at high fidelity while having an ultra-low algorithmic delay. To make SBC suitable for the Internet, we extend it with a time and packet loss concealment (PLC) algorithm that is based on ITU's G.711 Appendix I. The design is novel with respect to the interface between codec and speech receiver. We developed a new approach to distributing the functionality of a speech receiver between codec and application. Our approach leads to easier implementations of high-quality VoIP applications. We conducted subjective and objective listening tests of the audio quality of SBC and PLC in order to determine an optimal coding mode and the trade-off between coding mode and packet loss rate. More precisely, we conducted MUSHRA listening tests for selected sample items. These test results are then compared with the results of multiple objective assessment algorithms (ITU P.862 PESQ, ITU BS.1387-1 PEAQ, Creusere's algorithm). We found that a combination of the PEAQ basic and advanced values best matches the subjective MUSHRA results after third-order linear regression. The linear regression has a coefficient of determination of R²=0.907². By comparison, our individual human ratings show a correlation of about R=0.9 with our averaged human rating results. Using the combination of both PEAQ algorithms, we calculate hundreds of thousands of objective audio quality ratings, varying audio content and algorithmic parameters of SBC and PLC. The results show which sets of parameter values are best suited to a bandwidth- and delay-constrained link. The transmission quality of SBC is enhanced significantly by selecting optimal encoding parameters, as compared to the default parameter sets given in the standard. 
Finally, we present preliminary objective test results comparing the audio codecs SBC, CELT, APT-X and ULD for coding speech and audio transmissions. They all allow mono and stereo transmission of music at ultra-low coding delays (<10ms), which is especially useful for distributed ensemble performances over the Internet.
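
The mapping from objective PEAQ values to MUSHRA scores described in this abstract can be illustrated with a small regression sketch. It assumes that "third-order" means a cubic polynomial in a single predictor, whereas the paper combines the PEAQ basic and advanced values; the function names and all data points below are hypothetical.

```python
def fit_polynomial(x, y, degree=3):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c so that y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    """
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct x values")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: objective difference grades (about -4 to 0)
# against mean MUSHRA scores (0 to 100). Not values from the paper.
odg = [-3.8, -3.1, -2.5, -1.9, -1.2, -0.6, -0.1]
mushra = [12.0, 22.0, 35.0, 51.0, 68.0, 84.0, 95.0]
coeffs = fit_polynomial(odg, mushra, degree=3)
r2 = r_squared(mushra, [predict(coeffs, v) for v in odg])
print(f"R^2 = {r2:.3f}")
```

The regression is still "linear" in the sense that the model is linear in its coefficients, which is why ordinary least squares applies even though the fitted curve is cubic.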

    Objective and Subjective Evaluation of Binaural Beamformers in Hearing Aids

    Hearing aids use a variety of noise reduction techniques to enhance the experience of hearing-impaired listeners. One of these techniques is beamforming, which typically aims to preserve sounds coming from the front of the user and suppress those from the sides and back. Recently, hearing aids have begun employing a wireless connection between the left and right devices in order to augment the directionality of the beamformers; such systems are called binaural beamformers. However, the effect of these binaural beamformers on perceived quality and intelligibility has not been thoroughly tested. This thesis investigated the benchmarking of hearing aids that use binaural beamforming algorithms, by means of behavioural testing and computational models. Speech recordings from bilateral pairs of several popular hearing aids were obtained across different processing conditions, and in different noisy and reverberant environments. The quality of these recordings was evaluated subjectively by thirteen hearing-impaired adults. In addition, computational predictors of perceived quality and intelligibility were extracted from the left and right hearing aid recordings. Objective and subjective analyses revealed that binaural beamforming has a generally positive effect on quality and intelligibility, which depended on the directionality of the speech and noise. The ear recording with the better predicted quality score was also found to correlate better with the subjective quality ratings than the average of left and right ear predicted scores. A new weighting function that optimally combines the monaural computational metrics was developed, which was shown to be especially effective in environments where speech and/or noise sources are asymmetrically positioned.
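
The two baseline ways of combining per-ear predictions mentioned in this abstract, taking the better ear or averaging both ears, can be sketched as follows. The sketch assumes that a higher score means better predicted quality; the function names and scores are hypothetical, and the thesis's own optimal weighting function is not reproduced here because the abstract does not specify it.

```python
def better_ear(left, right):
    """Per condition, keep the higher of the left and right predicted scores."""
    if len(left) != len(right):
        raise ValueError("left and right score lists must have equal length")
    return [max(l, r) for l, r in zip(left, right)]


def ear_average(left, right):
    """Per condition, average the left and right predicted scores."""
    if len(left) != len(right):
        raise ValueError("left and right score lists must have equal length")
    return [(l + r) / 2 for l, r in zip(left, right)]


# Hypothetical predicted quality scores for three listening conditions.
left_scores = [0.62, 0.48, 0.71]
right_scores = [0.40, 0.55, 0.69]
print(better_ear(left_scores, right_scores))
print(ear_average(left_scores, right_scores))
```

With asymmetrically placed sources the two ears can differ markedly, which is where the two rules diverge most.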

    Perceptual techniques in audio quality assessment


    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019


    Towards the automatic assessment of spatial quality in the reproduced sound environment

    The research in this thesis describes the creation and development of a method for the prediction of perceived spatial quality. The QESTRAL (Quality Evaluation of Spatial Transmission and Reproduction using an Artificial Listener) model is an objective evaluation model capable of accurately predicting changes to perceived spatial quality. It uses probe signals and a set of objective metrics to measure changes to low-level spatial attributes. A polynomial weighting function derived from regression analysis is used to predict data from listening tests, which employed spatial audio processes (SAPs) proven to stress those low-level attributes. A listening test method was developed for collecting listener judgements of impairments to spatial quality. This involved the creation of a novel test interface to reduce the biases inherent in other similar audio quality assessment tests. Pilot studies were undertaken which established the suitability of the method. Two large scale listening tests were conducted using 31 Tonmeister students from the Institute of Sound Recording (IoSR), University of Surrey. These tests evaluated 48 different SAPs, typically encountered in consumer sound reproduction equipment, when applied to 6 types of programme material. The tests were conducted at two listening positions to determine how perceived spatial quality was changed. Analysis of the data collected from these listening tests showed that the SAPs created a diverse range of judgements that spanned the range of the spatial quality test scale and that listening position, programme material type and listener each had a statistically significant influence upon perceived spatial quality. These factors were incorporated into a database of 308 responses used to calibrate the model. The model was calibrated using partial least-squares regression using target specifications similar to those of audio quality models created by other researchers. 
This resulted in five objective metrics being selected for use in the model. A method of post correction using an exponential equation was used to reduce non-linearity in the predicted results, thought to be caused by the inability of some metrics to scrutinise the highest quality SAPs. The resulting model had a correlation (r) of 0.89 and an error (RMSE) of 11.06% and performs similarly to models developed by other researchers. Statistical analysis also indicated that the model would generalise to a larger population of listeners.
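
The error figure reported for the model is a root-mean-square error given as a percentage. A minimal sketch of that calculation follows, assuming the percentage is taken relative to the range of the rating scale (0 to 100 here); the abstract does not state the normalisation, and the function names and scores are hypothetical.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted scores."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def rmse_percent(observed, predicted, scale_min=0.0, scale_max=100.0):
    """RMSE as a percentage of the rating scale range (an assumed convention)."""
    return 100.0 * rmse(observed, predicted) / (scale_max - scale_min)


# Hypothetical listening-test means and model predictions on a 0-100 scale.
listening_test = [90.0, 72.0, 55.0, 31.0, 12.0]
model_output = [84.0, 78.0, 49.0, 40.0, 20.0]
print(f"RMSE = {rmse_percent(listening_test, model_output):.2f}%")
```

Reporting RMSE alongside the correlation is useful because a model can rank stimuli well (high r) while still being offset or compressed relative to the listener scale.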