30 research outputs found

    Applications of Adaptive Filtering


    Efficient Algorithms for Immersive Audio Rendering Enhancement

    Immersive audio rendering is the process of creating an engaging and realistic sound experience in 3D space. In immersive audio systems, head-related transfer functions (HRTFs) are used for binaural synthesis over headphones since they express how humans localize a sound source. HRTF interpolation algorithms can be introduced to reduce the number of measurement points and to create reliable sound movement. Binaural reproduction can also be performed over loudspeakers. However, the involvement of two or more loudspeakers causes the problem of crosstalk. In this case, crosstalk cancellation (CTC) algorithms are needed to remove the unwanted interference signals. In this thesis, starting from a comparative analysis of HRTF measurement techniques, a binaural rendering system based on HRTF interpolation is proposed and evaluated for real-time applications. The proposed method shows good performance in comparison with a reference technique. The interpolation algorithm is also applied to immersive audio rendering over loudspeakers by adding a fixed crosstalk cancellation algorithm, which assumes that the listener is in a fixed position. In addition, an adaptive crosstalk cancellation system, which includes tracking of the listener's head, is analyzed and a real-time implementation is presented. The adaptive CTC employs a subband structure, and experimental results prove that a higher number of bands improves the performance in terms of total error and convergence rate. The reproduction system and the characteristics of the listening room may affect the performance due to their non-ideal frequency responses. 
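The fixed crosstalk-cancellation step can be illustrated with a minimal frequency-domain sketch: a regularized inversion, per frequency bin, of the 2x2 acoustic plant from the two loudspeakers to the two ears. The function name, the toy plant values, and the Tikhonov constant below are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def ctc_filters(H, beta=1e-3):
    """Frequency-domain crosstalk-cancellation filters for a 2x2 plant.

    H: array of shape (n_bins, 2, 2); H[k, i, j] is the transfer function
    from loudspeaker j to ear i at frequency bin k.
    Returns C with C[k] approximating inv(H[k]), Tikhonov-regularized by beta.
    """
    Hh = np.conj(np.transpose(H, (0, 2, 1)))   # Hermitian transpose per bin
    I = np.eye(2)
    # Regularized inverse: C = (H^H H + beta*I)^-1 H^H, computed per bin
    C = np.linalg.solve(Hh @ H + beta * I, Hh)
    return C

# Toy plant: strong direct paths, weaker cross paths, at 4 frequency bins
H = np.tile(np.array([[1.0, 0.3], [0.3, 1.0]], dtype=complex), (4, 1, 1))
C = ctc_filters(H, beta=1e-6)
E = H @ C                                      # effective ear-signal matrix
print(np.round(np.abs(E[0]), 3))               # close to the identity matrix
```

With a small regularization constant the cascade of plant and canceller is close to the identity, i.e. each ear receives only its intended binaural signal; larger `beta` trades cancellation depth for robustness and loudspeaker effort.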
Audio equalization is used to adjust the balance of different audio frequencies in order to achieve the desired sound characteristics. Equalization can be manual, as in the case of graphic equalization, where the gain of each frequency band can be modified by the user, or automatic, where the equalization curve is calculated after measuring the room impulse response. Room response equalization can also be applied to multichannel systems, which employ two or more loudspeakers, and the equalization zone can be enlarged by measuring the impulse responses at different points of the listening zone. In this thesis, efficient graphic equalizers (GEQs) and an adaptive room response equalization system are presented. In particular, three low-complexity linear-phase and quasi-linear-phase graphic equalizers are proposed and examined in depth. Experiments confirm the effectiveness of the proposed GEQs in terms of accuracy, computational complexity, and latency. Subsequently, a subband adaptive structure is introduced for the development of a multichannel, multiple-position room response equalizer. Experimental results verify the effectiveness of the subband approach in comparison with the single-band case. Finally, a linear-phase crossover network is presented for multichannel systems, showing excellent results in terms of magnitude flatness, cutoff rates, polar diagram, and phase response. Active noise control (ANC) systems can be designed to reduce the effects of noise pollution and can be used simultaneously with an immersive audio system. ANC works by creating a sound wave with opposite phase with respect to the unwanted noise. The additional sound wave creates destructive interference, which reduces the overall sound level. Finally, this thesis presents an ANC system used for noise reduction. 
The proposed approach implements an online secondary-path estimation and is based on cross-update adaptive filters applied to the primary-path estimation that aim at improving the performance of the whole system. The proposed structure achieves a better convergence rate than a reference algorithm.
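The thesis's cross-update subband structure is beyond a short example, but the core ANC idea, adapting an anti-noise filter through an estimate of the secondary path, can be sketched with a plain single-channel FxLMS loop. All paths, filter lengths, and step sizes below are toy values chosen for illustration, and the secondary-path estimate is assumed to be already available rather than estimated online.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy FIR paths: primary P (noise -> error mic), secondary S (speaker -> error mic)
P = np.array([0.6, 0.4, 0.2])
S = np.array([0.5, 0.25])
S_hat = S.copy()             # assume the secondary-path estimate is already known

L = 16                       # adaptive filter length
w = np.zeros(L)
mu = 0.01                    # step size
x = rng.standard_normal(20000)                 # reference noise signal
d = np.convolve(x, P)[:len(x)]                 # disturbance at the error mic

xbuf = np.zeros(L)           # reference history for the control filter
ybuf = np.zeros(len(S))      # anti-noise history (feeds the secondary path)
fxbuf = np.zeros(L)          # filtered-reference history for the update
fx_in = np.zeros(len(S))     # reference history for filtering through S_hat

errs = []
for n in range(len(x)):
    xbuf = np.roll(xbuf, 1); xbuf[0] = x[n]
    y = w @ xbuf                               # anti-noise sample
    ybuf = np.roll(ybuf, 1); ybuf[0] = y
    e = d[n] + S @ ybuf                        # residual at the error mic
    fx_in = np.roll(fx_in, 1); fx_in[0] = x[n]
    fxbuf = np.roll(fxbuf, 1); fxbuf[0] = S_hat @ fx_in   # filtered reference
    w -= mu * e * fxbuf                        # FxLMS update
    errs.append(e)

before = np.mean(np.square(errs[:1000]))
after = np.mean(np.square(errs[-1000:]))
print(f"residual power: {before:.3f} -> {after:.5f}")
```

The filtered-reference step is what distinguishes FxLMS from plain LMS: the update is correlated with the reference *after* it passes through the secondary-path model, which keeps the adaptation stable despite the electro-acoustic path between the control filter output and the error microphone.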

    Doctor of Philosophy

    Hearing aids suffer from acoustic feedback that limits the gain they can provide. Moreover, the output sound quality of hearing aids may be compromised in the presence of background acoustic noise. Digital hearing aids use advanced signal processing to reduce acoustic feedback and background noise and thereby improve the output sound quality. However, it is known that the output sound quality of digital hearing aids deteriorates as the hearing aid gain is increased. Furthermore, the subband or transform-domain digital signal processing popular in modern hearing aids introduces analysis-synthesis delays in the forward path. Long forward-path delays are not desirable because the processed sound combines with the unprocessed sound that arrives at the cochlea through the vent and changes the sound quality. In this dissertation, we employ a variable, frequency-dependent gain function that is lower at frequencies of the incoming signal where the information is perceptually insignificant. In addition, the method of this dissertation automatically identifies and suppresses residual acoustic feedback components at frequencies that have the potential to drive the system to instability. The suppressed frequency components are monitored, and the suppression is removed when such frequencies no longer threaten to drive the hearing aid system into instability. Together, these techniques provide more stable gain than traditional methods by reducing the acoustic coupling between the microphone and the loudspeaker of a hearing aid. In addition, the method of this dissertation performs the necessary hearing aid signal processing with low-delay characteristics. The central idea of the low-delay hearing aid signal processing is a spectral gain shaping method (SGSM) that employs parallel parametric equalization (EQ) filters. 
Parameters of the parametric EQ filters and associated gain values are selected using a least-squares approach to obtain the desired spectral response. Finally, the method of this dissertation switches to a least-squares adaptation scheme with linear complexity at the onset of howling. The method adapts to the altered feedback path quickly and prevents the patient from losing perceivable information. The complexity of the least-squares estimate is reduced by reformulating it as a Toeplitz system and solving it with a direct Toeplitz solver. The increase in stable gain over traditional methods and the output sound quality were evaluated with psychoacoustic experiments on normal-hearing listeners with speech and music signals. The results indicate that the method of this dissertation provides 8 to 12 dB more hearing aid gain than feedback cancelers with traditional fixed gain functions. Furthermore, experimental results obtained with real-world hearing aid gain profiles indicate that the method of this dissertation introduces less distortion in the output sound quality than classical feedback cancelers, enabling the use of more comfortable hearing aid styles for patients with moderate to profound hearing loss. Extensive MATLAB simulations and subjective evaluations of the results indicate that the method of this dissertation exhibits much smaller forward-path delays with superior howling suppression capability.
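The Toeplitz reformulation mentioned above can be illustrated with a short sketch: identifying a toy FIR feedback path by solving the least-squares normal equations with a direct Toeplitz (Levinson-type) solver instead of a general matrix factorization. The path coefficients and signal lengths are hypothetical, not taken from the dissertation.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

rng = np.random.default_rng(1)

# Hypothetical feedback path to identify (coefficients are illustrative)
h_true = np.array([0.4, -0.25, 0.1, 0.05])
L = len(h_true)

x = rng.standard_normal(5000)                  # loudspeaker signal
y = np.convolve(x, h_true)[:len(x)]            # microphone feedback component

# Least squares h = argmin ||y - X h||^2 leads to the normal equations
#   R h = p, where R[i, j] ~= r(|i - j|) is Toeplitz (autocorrelation of x).
r = np.array([x[:len(x) - k] @ x[k:] for k in range(L)])   # autocorrelation lags
p = np.array([y[k:] @ x[:len(x) - k] for k in range(L)])   # cross-correlation lags
h_est = solve_toeplitz((r, r), p)   # Levinson-type solve: O(L^2) instead of O(L^3)
print(np.round(h_est, 3))           # close to h_true
```

Exploiting the Toeplitz structure is what makes the repeated re-estimation at howling onset affordable: a general solver scales cubically in the filter length, while the Levinson recursion scales quadratically.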

    Objective and Subjective Evaluation of Wideband Speech Quality

    Traditional landline and cellular communications use a bandwidth of 300-3400 Hz for transmitting speech. This narrow bandwidth impacts the quality, intelligibility, and naturalness of transmitted speech. There is an impending change within the telecommunication industry towards using wider-bandwidth speech, but the enlarged bandwidth also introduces a few challenges in speech processing. Echo and noise are two challenging issues in wideband telephony, due to the increased perceptual sensitivity of users. Subjective and/or objective measurements of speech quality are important in benchmarking speech processing algorithms and evaluating the effect of parameters like noise, echo, and delay in wideband telephony. Subjective measures include ratings of speech quality by listeners, whereas objective measures compute a metric from the reference and degraded speech samples. While subjective quality ratings are the "gold standard", they are also time- and resource-consuming. An objective metric that correlates highly with subjective data is attractive, as it can act as a substitute for subjective quality scores in gauging the performance of different algorithms and devices. This thesis reports results from a series of experiments on subjective and objective speech quality evaluation for wideband telephony applications. First, a custom wideband noise reduction database was created that contained speech samples corrupted by different background noises at different signal-to-noise ratios (SNRs) and processed by six different noise reduction algorithms. Comprehensive subjective evaluation of this database revealed an interaction between algorithm performance, noise type, and SNR. Several auditory-based objective metrics, such as the Loudness Pattern Distortion (LPD) measure based on the Moore-Glasberg auditory model, were evaluated in predicting the subjective scores. 
In addition, the performance of Bayesian Multivariate Regression Splines (BMLS) was also evaluated in terms of mapping the scores calculated by the objective metrics to the true quality scores. The combination of LPD and BMLS resulted in high correlation with the subjective scores and was used as a substitute for subjective evaluation when fine-tuning the noise reduction algorithms. Second, the effect of echo and delay on wideband speech was evaluated in both listening and conversational contexts, through both subjective and objective measures. A database containing speech samples corrupted by echo with different delay and frequency response characteristics was created and later used to collect subjective quality ratings. The LPD-BMLS objective metric was then validated using the subjective scores. Third, to evaluate the effect of echo and delay in a conversational context, a real-time simulator was developed. Pairs of subjects conversed over the simulated system and rated the quality of their conversations, which were degraded by different amounts of echo and delay. The quality scores were analysed, and the LPD-BMLS combination was found to be effective in predicting subjective impressions of quality for condition-averaged data.
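A minimal sketch of the metric-validation step described above: map raw objective scores onto the subjective scale, then check how well the mapped scores track the subjective ratings. BMLS itself is beyond a few lines, so an ordinary least-squares polynomial stands in for it here, and the score values are invented for illustration.

```python
import numpy as np

# Hypothetical per-condition data: raw objective metric values and subjective MOS
obj = np.array([0.91, 0.85, 0.78, 0.70, 0.62, 0.55, 0.43, 0.30])
mos = np.array([4.6, 4.3, 4.0, 3.6, 3.1, 2.8, 2.2, 1.5])

# Stand-in for BMLS: a 2nd-order least-squares polynomial mapping obj -> MOS
coef = np.polyfit(obj, mos, deg=2)
pred = np.polyval(coef, obj)

# Pearson correlation and RMSE between mapped scores and subjective scores
r = np.corrcoef(pred, mos)[0, 1]
rmse = np.sqrt(np.mean((pred - mos) ** 2))
print(f"correlation {r:.3f}, RMSE {rmse:.3f}")
```

In practice the mapping is fitted on one part of the database and validated on held-out conditions; high correlation on condition-averaged data is what licenses using the objective metric in place of listening tests.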

    PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS

    Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, which is the aim of Immersive Audio Schemes. Several acoustic effects are involved in this goal: 3D spatial sound, room compensation, crosstalk cancellation, and sound source localization, among others. However, high computing capacity is required to achieve any of these effects in a real large-scale system, which represents a considerable limitation for real-time applications. The increase in computational capacity has historically been linked to the number of transistors in a chip. Nowadays, however, improvements in computational capacity come mainly from increasing the number of processing units, i.e., from expanding parallelism in computing. This is the case of Graphics Processing Units (GPUs), which now contain thousands of computing cores. GPUs were traditionally tied to graphics or image applications, but new releases of the GPU programming environments, CUDA and OpenCL, have allowed most applications to be computationally accelerated in fields beyond graphics. This thesis aims to demonstrate that GPUs are fully valid tools for carrying out audio applications that require high computational resources. To this end, different applications in the field of audio processing are studied and implemented using GPUs. This manuscript also analyzes and solves possible limitations of each GPU-based implementation, both from the acoustic and the computational points of view. In this document, we have addressed the following problems: most audio applications are based on massive filtering, so the first implementation to undertake is a fundamental operation in audio processing: the convolution. 
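As a CPU reference for the convolution kernel discussed above, frequency-domain fast convolution is the standard trick a GPU implementation parallelizes: zero-pad both sequences, multiply their FFTs, and inverse-transform. The buffer and filter lengths below are arbitrary examples.

```python
import numpy as np

def fft_convolve(x, h):
    """Fast linear convolution via zero-padded FFTs (CPU stand-in for a GPU kernel)."""
    n = len(x) + len(h) - 1                    # full linear-convolution length
    nfft = 1 << (n - 1).bit_length()           # next power of two, avoids circular wrap
    X = np.fft.rfft(x, nfft)
    H = np.fft.rfft(h, nfft)
    return np.fft.irfft(X * H, nfft)[:n]

rng = np.random.default_rng(4)
x = rng.standard_normal(4096)                  # one audio buffer
h = rng.standard_normal(512)                   # FIR filter (e.g., an HRTF or room response)
y = fft_convolve(x, h)
print(np.allclose(y, np.convolve(x, h)))       # matches direct convolution
```

For massive multichannel filtering, the same structure is applied per channel/filter pair, which is why the buffer-size-versus-filter-size cases mentioned below matter: they determine how the FFT partitioning is organized.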
It has first been developed as a computational kernel and afterwards used for an application that combines multiple convolutions concurrently: generalized crosstalk cancellation and equalization. The proposed implementation can successfully manage two different and common situations: buffer sizes that are much larger than the filter sizes, and buffer sizes that are much smaller than the filter sizes. Two spatial audio applications that use the GPU as a co-processor have been developed from the massive multichannel filtering. The first application deals with binaural audio. Its main feature is that it is able to synthesize sound sources at spatial positions that are not included in the HRTF database and to generate smooth movements of sound sources. Both features were designed after different tests (objective and subjective). The performance, in terms of the number of sound sources that could be rendered in real time, was assessed on GPUs with different architectures. A similar performance analysis is carried out for a Wave Field Synthesis system (the second spatial audio application) that is composed of 96 loudspeakers. The proposed GPU-based implementation is able to reduce the room effects during sound source rendering. A well-known approach for sound source localization in noisy and reverberant environments, the Steered Response Power with Phase Transform (SRP-PHAT) algorithm, is also addressed on a multi-GPU system. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate acoustic localization systems require high computational power. The solutions implemented in this thesis are evaluated from both the localization and the computational performance points of view, taking into account different acoustic environments and always from a real-time implementation perspective. 
Finally, this manuscript also addresses massive multichannel filtering when the filters have an Infinite Impulse Response (IIR). Two cases are analyzed: 1) IIR filters composed of multiple second-order sections, and 2) IIR filters that present an allpass response. Both cases are used to develop and accelerate two different applications: 1) executing multiple equalizations in a WFS system, and 2) reducing the dynamic range of an audio signal. Belloch Rodríguez, JA. (2014). PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651
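The second-order-section case can be sketched on the CPU with SciPy: peaking biquads from the well-known RBJ audio-EQ cookbook assembled into an SOS cascade, the same per-channel structure a GPU implementation would replicate across many channels. The center frequencies, gains, and Q values are arbitrary illustrative choices.

```python
import numpy as np
from scipy.signal import sosfilt, sosfreqz

def peaking_sos(f0, gain_db, q, fs):
    """RBJ audio-EQ-cookbook peaking biquad as one second-order section."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return np.concatenate([b / a[0], a / a[0]])    # [b0 b1 b2 1 a1 a2]

fs = 48000.0
sos = np.vstack([peaking_sos(1000.0, +6.0, 2.0, fs),    # +6 dB boost at 1 kHz
                 peaking_sos(4000.0, -3.0, 2.0, fs)])   # -3 dB cut at 4 kHz

x = np.random.default_rng(2).standard_normal(4096)
y = sosfilt(sos, x)                            # sample-serial IIR cascade

w, h = sosfreqz(sos, worN=4096, fs=fs)
db = 20 * np.log10(np.abs(h))
print(f"gain at 1 kHz: {db[np.argmin(np.abs(w - 1000))]:+.1f} dB")
```

The recursion inside each section is inherently serial in time, which is exactly the GPU challenge the thesis addresses: parallelism must come from processing many channels and sections concurrently rather than from within a single filter.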

    Smart Sound Control in Acoustic Sensor Networks: a Perceptual Perspective

    Audio systems have been extensively developed in recent years thanks to the increase of devices with high-performance processors able to perform more efficient processing. In addition, wireless communications allow the devices in a network to be located in different places without physical limitations. The combination of these technologies has led to the emergence of Acoustic Sensor Networks (ASN). An ASN is composed of nodes equipped with audio transducers, such as microphones or speakers. In the case of acoustic field monitoring, only acoustic sensors need to be incorporated into the ASN nodes. However, in the case of control applications, the nodes must interact with the acoustic field through loudspeakers. An ASN can be implemented with low-cost devices, such as Raspberry Pi boards or mobile devices, capable of managing multiple microphones and loudspeakers and offering good computational capacity. In addition, these devices can communicate through wireless connections, such as Wi-Fi or Bluetooth. Therefore, in this dissertation, an ASN composed of mobile devices connected to wireless speakers through a Bluetooth link is proposed. 
Additionally, the problem of synchronization between the devices in an ASN is one of the main challenges to be addressed, since the audio processing performance is very sensitive to the lack of synchronism. Therefore, an analysis of the synchronization problem between devices connected to wireless speakers in an ASN is also carried out. In this regard, one of the main contributions is the analysis of the audio latency when the acoustic nodes in the ASN are comprised of mobile devices communicating with the corresponding loudspeakers through Bluetooth links. A second significant contribution of this dissertation is the implementation of a method to synchronize the different devices of an ASN, together with a study of its limitations. Finally, the proposed method has been used to implement personal sound zones (PSZ) applications. Therefore, the implementation and analysis of the performance of different audio applications over an ASN composed of mobile devices and wireless speakers is also a significant contribution in the area of ASN. In cases where the acoustic environment negatively affects the perception of the audio signal emitted by the ASN loudspeakers, equalization techniques are used with the objective of enhancing the perception of the audio signal. For this purpose, a smart equalization system is implemented in this dissertation. In this regard, psychoacoustic algorithms are employed to implement smart processing based on the human hearing system, capable of adapting to changes in the environment. Therefore, another important contribution of this thesis is the analysis of the spectral masking between two complex sounds. This analysis allows the masking threshold of one sound over another to be calculated more accurately than with currently used methods. 
This method is used to implement a perceptual equalization application that aims to improve the perception of the audio signal in the presence of ambient noise. To this end, this thesis proposes two different equalization algorithms: 1) pre-equalizing the audio signal so that it is perceived above the masking threshold of the ambient noise, and 2) designing a perceptual control of ambient noise in active noise equalization (ANE) systems, so that the perceived ambient noise level is below the masking threshold of the audio signal. Therefore, the last contribution of this dissertation is the implementation of a perceptual equalization application with the two different embedded equalization algorithms and the analysis of their performance through the testbed built in the GTAC-iTEAM laboratory. This work has received financial support from the following projects: SSPRESING: Smart Sound Processing for the Digital Living (Reference: TEC2015-67387-C4-1-R. Entity: Ministerio de Economia y Empresa, Spain); FPI: Ayudas para contratos predoctorales para la formación de doctores (Reference: BES-2016-077899. Entity: Agencia Estatal de Investigación, Spain); DANCE: Dynamic Acoustic Networks for Changing Environments (Reference: RTI2018-098085-B-C41-AR. Entity: Agencia Estatal de Investigación, Spain); DNOISE: Distributed Network of Active Noise Equalizers for Multi-User Sound Control (Reference: H2020-FETOPEN-4-2016-2017. Entity: I+D Colaborativa competitiva, Comisión de las Comunidades Europeas). Estreder Campos, J. (2022). Smart Sound Control in Acoustic Sensor Networks: a Perceptual Perspective [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/181597
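The first algorithm (pre-equalizing the audio above the noise masking threshold) reduces, per analysis band, to a simple rule: boost any band whose level falls below the noise-derived threshold plus a safety margin. The band levels and thresholds below are invented illustrative numbers, not the output of the thesis's psychoacoustic model.

```python
import numpy as np

# Hypothetical per-band levels in dB SPL (illustrative numbers, not measured data)
bands_hz = np.array([250, 500, 1000, 2000, 4000])
audio_db = np.array([62.0, 60.0, 58.0, 55.0, 50.0])   # audio signal level per band
mask_db = np.array([55.0, 61.0, 63.0, 57.0, 48.0])    # noise masking threshold per band

margin_db = 3.0                                       # keep audio this far above the mask
gain_db = np.maximum(0.0, mask_db + margin_db - audio_db)   # boost only masked bands
eq_db = audio_db + gain_db

for f, g in zip(bands_hz, gain_db):
    print(f"{f:5d} Hz: +{g:.1f} dB")
```

Bands already above the threshold are left untouched, so the equalizer only spends gain (and loudness) where the noise would otherwise mask the audio.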

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    Get PDF
In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short-Time Fourier Transform (STFT) domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user's speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes, for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk.
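The core decomposition step can be sketched in a few lines of numpy: the microphone magnitude spectrogram is factored against the fixed union of the two bases, updating only the activations, and a Wiener-style mask then retains the near-end speaker's share of each time-frequency bin. All dimensions and signals below are random stand-ins, not the thesis's data or settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: F frequency bins, T frames, K basis vectors per source.
F, T, K = 64, 40, 4

# Stand-ins for the two nonnegative bases: in the thesis, the echo basis is
# formed from the incoming far-end speech and the near-end basis is trained
# on speech data a priori.
W_echo = rng.random((F, K))
W_near = rng.random((F, K))
W = np.hstack([W_echo, W_near])      # union of the two bases

# Near-end microphone magnitude spectrogram (random stand-in).
V = rng.random((F, T)) + 1e-3

# Multiplicative updates for the activations H only, with W held fixed,
# minimizing the Euclidean cost ||V - W H||_F (monotonically non-increasing).
H = rng.random((2 * K, T))
err0 = np.linalg.norm(V - W @ H)
for _ in range(200):
    H *= (W.T @ V) / (W.T @ (W @ H) + 1e-12)
err = np.linalg.norm(V - W @ H)

# Wiener-style mask keeps the near-end speaker's share of each TF bin.
V_echo = W_echo @ H[:K]
V_near = W_near @ H[K:]
mask = V_near / (V_echo + V_near + 1e-12)
V_hat = mask * V                     # estimated near-end magnitude spectrogram
```

Because the echo basis tracks the far-end speech directly, nothing in this scheme needs to re-converge after a room change, which is the property exploited in the comparison with AEC.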
Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
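The minimum-phase component of an RIR, whose inverse is discussed above, can be obtained with the standard real-cepstrum (homomorphic) method; a numpy sketch of that textbook technique, not the thesis's implementation:

```python
import numpy as np

def minimum_phase(h, nfft=4096):
    """Return a filter with the same magnitude response as h but with all
    zeros reflected inside the unit circle, via the real-cepstrum method."""
    H = np.fft.fft(h, nfft)
    log_mag = np.log(np.maximum(np.abs(H), 1e-12))
    c = np.fft.ifft(log_mag).real            # real cepstrum of |H|
    # Homomorphic folding: double the causal cepstral part, zero the
    # anticausal part, keep the endpoints.
    w = np.zeros(nfft)
    w[0] = 1.0
    w[1:nfft // 2] = 2.0
    w[nfft // 2] = 1.0
    h_min = np.fft.ifft(np.exp(np.fft.fft(c * w))).real
    return h_min[:len(h)]

# A filter that is already minimum phase (single zero at z = -0.5) is
# returned essentially unchanged.
h_min = minimum_phase(np.array([1.0, 0.5]))
```

Inverting `h_min` compensates the full magnitude response of the RIR but leaves the maximum-phase zeros' all-pass phase untouched, which is exactly where the distortion described above originates; the proposed partial inversion schemes then temper how much of that magnitude response is compensated.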

    Ultrasonic splitting of oil-in-water emulsions

    Get PDF

    On the applicability of models for outdoor sound (A)

    Get PDF