332 research outputs found

Efficient Algorithms for Immersive Audio Rendering Enhancement

Author: BRUSCHI Valeria
Publication venue: country:Italia
Publication date: 19/06/2023

Immersive audio rendering is the process of creating an engaging and realistic sound experience in 3D space. In immersive audio systems, head-related transfer functions (HRTFs) are used for binaural synthesis over headphones, since they express how humans localize a sound source. HRTF interpolation algorithms can be introduced to reduce the number of measurement points and to create reliable sound movement. Binaural reproduction can also be performed over loudspeakers. However, using two or more loudspeakers causes the problem of crosstalk. In this case, crosstalk cancellation (CTC) algorithms are needed to remove unwanted interference signals. In this thesis, starting from a comparative analysis of HRTF measurement techniques, a binaural rendering system based on HRTF interpolation is proposed and evaluated for real-time applications. The proposed method shows good performance in comparison with a reference technique. The interpolation algorithm is also applied to immersive audio rendering over loudspeakers by adding a fixed crosstalk cancellation algorithm, which assumes that the listener is in a fixed position. In addition, an adaptive crosstalk cancellation system, which includes tracking of the listener's head, is analyzed and a real-time implementation is presented. The adaptive CTC implements a subband structure, and experimental results show that a higher number of bands improves performance in terms of total error and convergence rate. The reproduction system and the characteristics of the listening room may affect performance because of their non-ideal frequency response.
Audio equalization is used to adjust the balance of different audio frequencies in order to achieve desired sound characteristics. Equalization can be manual, as in graphic equalization, where the user can modify the gain of each frequency band, or automatic, where the equalization curve is calculated automatically after the room impulse response has been measured. Room response equalization can also be applied to multichannel systems, which employ two or more loudspeakers, and the equalization zone can be enlarged by measuring the impulse responses at different points of the listening zone. In this thesis, efficient graphic equalizers (GEQs) and an adaptive room response equalization system are presented. In particular, three low-complexity linear-phase and quasi-linear-phase graphic equalizers are proposed and examined in depth. Experiments confirm the effectiveness of the proposed GEQs in terms of accuracy, computational complexity, and latency. Subsequently, a subband adaptive structure is introduced for the development of a multichannel, multi-position room response equalizer. Experimental results verify the effectiveness of the subband approach in comparison with the single-band case. Finally, a linear-phase crossover network is presented for multichannel systems, showing very good results in terms of magnitude flatness, cutoff rates, polar diagram, and phase response.
Active noise control (ANC) systems can be designed to reduce the effects of noise pollution and can be used simultaneously with an immersive audio system. ANC works by creating a sound wave in opposite phase to the sound wave of the unwanted noise. The additional sound wave creates destructive interference, which reduces the overall sound level. Finally, this thesis presents an ANC system used for noise reduction. The proposed approach implements an online secondary path estimation and is based on cross-update adaptive filters applied to the primary path estimation, which aim to improve the performance of the whole system. The proposed structure achieves a better convergence rate than a reference algorithm.
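The destructive-interference principle mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's adaptive ANC system: it only sums a pure tone with an anti-phase copy and reports the residual level. The function names, the 100 Hz tone, and the 8 kHz sample rate are illustrative assumptions.

```python
import math

def tone(freq_hz, phase_rad, n_samples, sample_rate_hz):
    """Sampled sine wave with the given frequency and starting phase."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz + phase_rad)
            for n in range(n_samples)]

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))

def residual_rms(phase_error_rad, freq_hz=100.0, sample_rate_hz=8000.0, n_samples=800):
    """RMS of noise plus anti-noise when the anti-noise phase is off by phase_error_rad.

    All default values are hypothetical and chosen so that the buffer
    holds a whole number of periods.
    """
    noise = tone(freq_hz, 0.0, n_samples, sample_rate_hz)
    # Ideal anti-noise is shifted by pi; a real system only approximates this.
    anti_noise = tone(freq_hz, math.pi + phase_error_rad, n_samples, sample_rate_hz)
    return rms([a + b for a, b in zip(noise, anti_noise)])

if __name__ == "__main__":
    for err in (0.0, 0.1, 0.5):
        print(f"phase error {err:.1f} rad -> residual RMS {residual_rms(err):.4f}")
```

With a perfect anti-phase copy the residual is numerically zero, and it grows with the phase error, which is one reason practical systems rely on adaptive filters and path estimation rather than a fixed inversion.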

IRIS Università Politecnica delle Marche

Graafinen ekvalisointi taajuusvarpattujen digitaalisten suotimien avulla [Graphic equalization using frequency-warped digital filters]

Author: Siiskonen Jaakko
Publication date: 24/08/2016

The aim of this thesis is to design a graphic equalizer with frequency-warped digital filters. The proposed design consists of a warped FIR filter for the low-frequency bands and a standard FIR filter for the high-frequency bands. This design is used to implement both an octave and a one-third-octave equalizer in Matlab. Low-frequency equalization with FIR filters requires high filter orders. The frequency resolution of the lowest band of the graphic equalizer requires filter orders that are impractical for real-life applications. With frequency warping, filter orders can be lowered so that a practical graphic equalizer can be designed. This design also avoids the gain build-up problem common to most IIR designs, in which the gain of one equalizer band affects the adjacent bands. The proposed equalizer design is found to be accurate and comparable to previous equalizer designs. The required filter orders are small enough for this design to be used in real-life applications. The gain build-up problem is avoided because several equalizer bands are filtered with a single filter. The computational cost of the design is higher than that of the other compared designs. However, the difference can be reduced if the accuracy requirements are relaxed.
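To illustrate why frequency warping lowers the required filter order at low frequencies, the sketch below evaluates the standard frequency mapping of a first-order allpass section, which is the usual building block of warped FIR filters. It is not taken from the thesis: the function name and the warping coefficient value 0.75 are illustrative assumptions.

```python
import math

def warped_frequency(omega, lam):
    """Map normalized angular frequency omega (0..pi) through a first-order
    allpass with warping coefficient lam (|lam| < 1).

    Uses the standard relation
        nu = omega + 2 * atan(lam * sin(omega) / (1 - lam * cos(omega))).
    """
    return omega + 2.0 * math.atan2(lam * math.sin(omega),
                                    1.0 - lam * math.cos(omega))

if __name__ == "__main__":
    lam = 0.75  # hypothetical coefficient; positive values stretch low frequencies
    for omega in (0.01, 0.1, 0.5, 1.0, 2.0, math.pi):
        print(f"omega = {omega:.3f} rad -> warped = {warped_frequency(omega, lam):.3f} rad")
```

For a positive coefficient, a narrow band near zero frequency is spread over a much wider portion of the warped axis, so a filter of modest order in the warped domain can resolve low-frequency bands that would otherwise demand a very long conventional FIR filter.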

Aaltodoc Publication Archive

Blind estimation of audio effects using an auto-encoder approach and differentiable signal processing

Author: Peeters Geoffroy
Peladeau Côme
Publication date: 18/10/2023

Blind Estimation of Audio Effects (BE-AFX) aims at estimating the Audio Effects (AFXs) applied to an original, unprocessed audio sample based solely on the processed audio sample. To train such a system, traditional approaches optimize a loss between ground-truth and estimated AFX parameters. This requires knowing the exact implementation of the AFXs used for the processing. In this work, we propose an alternative solution that eliminates the requirement for knowing this implementation. Instead, we introduce an auto-encoder approach, which optimizes an audio quality metric. We explore, suggest, and compare various implementations of commonly used mastering AFXs, using differentiable signal processing or neural approximations. Our findings demonstrate that our auto-encoder approach yields superior estimates of the audio quality produced by a chain of AFXs, compared to the traditional parameter-based approach, even if the latter provides a more accurate parameter estimation.

arXiv.org e-Print Archive

Design Considerations for a Digital Audio Equalizer

Author: Young Timothy
Publication venue: University of Central Florida
Publication date: 01/01/1985

The objective of this thesis is to consider a method for designing a digital audio equalizer. The primary design criterion is minimum audible frequency response error between a digital and a reference analog equalizer throughout the entire audio frequency range from 20 Hz to 20 kHz. The first step is to obtain a set of analog filters that suitably represent the reference equalization. From these filters, digital filter coefficients are generated using the bilinear transformation. Then, the digital filters are combined with anti-aliasing and D/A reconstruction filters and a zero-order hold to complete the design. An analysis of methods to minimize frequency-axis warping effects on the response of the high-frequency filters is presented. The problems associated with realizing a filter with a low natural frequency at a very high sample rate are also studied.
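The frequency-axis warping introduced by the bilinear transformation, mentioned in the abstract above, follows a well-known tangent relation. The sketch below is not the thesis's method; it only evaluates that relation and the matching pre-warping step. The function names and the 44.1 kHz sample rate are illustrative assumptions.

```python
import math

def digital_frequency(analog_hz, sample_rate_hz):
    """Frequency at which an analog design frequency lands after the
    bilinear transformation s = 2*fs*(z - 1)/(z + 1), without pre-warping."""
    return (sample_rate_hz / math.pi) * math.atan(math.pi * analog_hz / sample_rate_hz)

def prewarped_frequency(target_hz, sample_rate_hz):
    """Analog design frequency that lands exactly on target_hz after the
    bilinear transformation (valid for target_hz below sample_rate_hz / 2)."""
    return (sample_rate_hz / math.pi) * math.tan(math.pi * target_hz / sample_rate_hz)

if __name__ == "__main__":
    fs = 44100.0  # hypothetical sample rate, not stated in the abstract
    for f in (20.0, 1000.0, 10000.0, 20000.0):
        print(f"{f:8.1f} Hz analog -> {digital_frequency(f, fs):10.2f} Hz digital, "
              f"pre-warped design frequency {prewarped_frequency(f, fs):10.2f} Hz")
```

The mapping is nearly the identity at low frequencies and compresses strongly near half the sample rate, which is consistent with the abstract singling out the high-frequency filters as the ones affected by warping.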

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Powering the future: a comprehensive review of battery energy storage systems

Author: Attique Qamar Muhammad
Domínguez-García José Luis
Filbà Martínez Àlber
Gevorkov Levon
Obrador Rey Sergio
Romero Baena Juan Alberto
Sánchez Roger Xavier
Trilla Romero Lluís
Publication date: 01/09/2023

Global society is significantly speeding up the adoption of renewable energy sources (RESs) and their integration into the existing grid in order to counteract growing environmental problems, particularly the increased carbon dioxide emissions of the last century. Renewable energy sources have tremendous potential to reduce carbon dioxide emissions because they produce practically no carbon dioxide or other pollutants. On the other hand, these energy sources are usually influenced by geographical location, weather, and other factors of a stochastic nature. A battery energy storage system can store the energy produced by RESs so that it can be used regularly and within limits as necessary, lessening the impact of the intermittent nature of renewable energy sources. The main purpose of this review paper is to present the current state of the art of battery energy storage systems and to identify their advantages and disadvantages. At the same time, this helps researchers and engineers in the field to find the most appropriate configuration for a particular application. This study offers a thorough analysis of the battery energy storage system with regard to battery chemistries, power electronics, and management approaches. This paper also offers a detailed analysis of battery energy storage system applications and investigates the shortcomings of the current best battery energy storage system architectures to pinpoint areas that require further study.
This publication is part of the project TED2021-132864A-I00, funded by MCIN/AEI/10.13039/501100011033 and by the European Union “NextGenerationEU”/PRTR.
Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Smart Sound Control in Acoustic Sensor Networks: a Perceptual Perspective

Author: Estreder Campos Juan
Publication venue: Universitat Politècnica de València
Publication date: 28/03/2022

Audio systems have been extensively developed in recent years thanks to the growing number of devices with high-performance processors capable of increasingly efficient processing. In addition, wireless communications allow the devices in a network to be located in different places without physical limitations. The combination of these technologies has led to the emergence of acoustic sensor networks (ASNs). An ASN is composed of nodes equipped with audio transducers, such as microphones or loudspeakers. In the case of acoustic field monitoring, only acoustic sensors need to be incorporated into the ASN nodes. However, in the case of control applications, the nodes must interact with the acoustic field through loudspeakers. An ASN can be implemented with low-cost devices, such as Raspberry Pi boards or mobile devices, which are capable of managing multiple microphones and loudspeakers and offer good computational capacity. In addition, these devices can communicate through wireless connections, such as Wi-Fi or Bluetooth. Therefore, in this dissertation, an ASN composed of mobile devices connected to wireless loudspeakers through a Bluetooth link is proposed.
Additionally, the problem of synchronization between the devices in an ASN is one of the main challenges to be addressed, since audio processing performance is very sensitive to a lack of synchronism. Therefore, an analysis of the synchronization problem between devices connected to wireless loudspeakers in an ASN is also carried out. In this regard, one of the main contributions is the analysis of audio latency when the acoustic nodes in the ASN consist of mobile devices communicating with their corresponding loudspeakers through Bluetooth links. A second significant contribution of this dissertation is the implementation of a method to synchronize the different devices of an ASN, together with a study of its limitations. Finally, the proposed method has been used to implement personal sound zone (PSZ) applications. Therefore, the implementation and performance analysis of different audio applications over an ASN composed of mobile devices and wireless loudspeakers is also a significant contribution in the area of ASNs.
When the acoustic environment negatively affects the perception of the audio signal emitted by the ASN loudspeakers, equalization techniques are used to improve the perception of the audio signal. For this purpose, a smart equalization system is implemented in this dissertation. Psychoacoustic algorithms are employed to implement smart processing based on the human hearing system and capable of adapting to changes in the environment. Another important contribution of this thesis is therefore the analysis of the spectral masking between two complex sounds. This analysis makes it possible to calculate the masking threshold of one sound over the other more accurately than the methods currently in use. This method is used to implement a perceptual equalization application that aims to improve the perception of the audio signal in the presence of ambient noise. To this end, this thesis proposes two different equalization algorithms: 1) pre-equalizing the audio signal so that it is perceived above the masking threshold of the ambient noise, and 2) designing a perceptual control of ambient noise in active noise equalization (ANE) systems, so that the perceived ambient noise level is below the masking threshold of the audio signal. The last contribution of this dissertation is therefore the implementation of a perceptual equalization application with the two equalization algorithms embedded, and the analysis of their performance on the testbed built in the GTAC-iTEAM laboratory.
This work has received financial support from the following projects:
• SSPRESING: Smart Sound Processing for the Digital Living (Reference: TEC2015-67387-C4-1-R. Entity: Ministerio de Economia y Empresa. Spain).
• FPI: Ayudas para contratos predoctorales para la formación de doctores (Reference: BES-2016-077899. Entity: Agencia Estatal de Investigación. Spain).
• DANCE: Dynamic Acoustic Networks for Changing Environments (Reference: RTI2018-098085-B-C41-AR. Entity: Agencia Estatal de Investigación. Spain).
• DNOISE: Distributed Network of Active Noise Equalizers for Multi-User Sound Control (Reference: H2020-FETOPEN-4-2016-2017. Entity: I+D Colaborativa competitiva. Comisión de las comunidades europea).
Estreder Campos, J. (2022). Smart Sound Control in Acoustic Sensor Networks: a Perceptual Perspective [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/181597

RiuNet

Differentiable Artificial Reverberation

Author: Choi Hyeong-Seok
Lee Kyogu
Lee Sungho
Publication date: 09/11/2021

Artificial reverberation (AR) models play a central role in various audio applications. Therefore, estimating the AR model parameters (ARPs) of a target reverberation is a crucial task. Although a few recent deep-learning-based approaches have shown promising performance, their non-end-to-end training scheme prevents them from fully exploiting the potential of deep neural networks. This motivates the introduction of differentiable artificial reverberation (DAR) models, which allow loss gradients to be back-propagated end-to-end. However, implementing the AR models with their difference equations "as is" in a deep-learning framework severely bottlenecks the training speed when executed on a parallel processor such as a GPU, due to their infinite impulse response (IIR) components. We tackle this problem by replacing the IIR filters with finite impulse response (FIR) approximations obtained with the frequency-sampling method (FSM). Using the FSM, we implement three DAR models: differentiable Filtered Velvet Noise (FVN), Advanced Filtered Velvet Noise (AFVN), and Feedback Delay Network (FDN). For each AR model, we train its ARP estimation networks for the analysis-synthesis (RIR-to-ARP) and blind estimation (reverberant-speech-to-ARP) tasks in an end-to-end manner with its DAR model counterpart. Experimental results show that the proposed method achieves consistent performance improvement over the non-end-to-end approaches in both objective metrics and subjective listening test results.
Comment: Manuscript submitted to TASL
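The frequency-sampling idea named in the abstract above can be sketched in a few lines: sample an IIR filter's frequency response on a uniform grid and take the inverse DFT to obtain FIR taps. This is a generic illustration on a one-pole filter, not the authors' implementation; the function names, the pole value 0.5, and the 32-tap length are illustrative assumptions.

```python
import cmath

def one_pole_response(pole, omega):
    """Frequency response of the IIR filter H(z) = 1 / (1 - pole * z^-1)."""
    return 1.0 / (1.0 - pole * cmath.exp(-1j * omega))

def frequency_sampled_fir(pole, n_taps):
    """FIR approximation of the one-pole filter via frequency sampling.

    The response is sampled at n_taps equally spaced frequencies and
    converted to the time domain with a direct inverse DFT.
    """
    samples = [one_pole_response(pole, 2.0 * cmath.pi * k / n_taps)
               for k in range(n_taps)]
    taps = []
    for n in range(n_taps):
        acc = sum(samples[k] * cmath.exp(2j * cmath.pi * k * n / n_taps)
                  for k in range(n_taps))
        # The imaginary part is only rounding noise for a real-coefficient filter.
        taps.append((acc / n_taps).real)
    return taps

if __name__ == "__main__":
    taps = frequency_sampled_fir(pole=0.5, n_taps=32)
    for n in range(5):
        print(f"tap {n}: {taps[n]:.6f}   true impulse response: {0.5 ** n:.6f}")
```

Sampling in frequency corresponds to time-aliasing the infinite impulse response, so the approximation is close when the response has decayed well within the chosen number of taps. Every step is a closed-form evaluation followed by an inverse transform, with no sample-by-sample recursion, which is the property that suits parallel hardware.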

arXiv.org e-Print Archive

Solutions for New Terrestrial Broadcasting Systems Offering Simultaneously Stationary and Mobile Services

Author: Montalbán Sánchez Jon
Publication venue: Servicio Editorial de la Universidad del País Vasco/Euskal Herriko Unibertsitatearen Argitalpen Zerbitzua
Publication date: 12/12/2014

221 p.[EN]Since the first broadcasted TV signal was transmitted in the early decades of the past century, the television broadcasting industry has experienced a series of dramatic changes. Most recently, following the evolution from analogue to digital systems, the digital dividend has become one of the main concerns of the broadcasting industry. In fact, there are many international spectrum authorities reclaiming part of the broadcasting spectrum to satisfy the growing demand of other services, such as broadband wireless services, arguing that the TV services are not very spectrum-efficient. Apart from that, it must be taken into account that, even if up to now the mobile broadcasting has not been considered a major requirement, this will probably change in the near future. In fact, it is expected that the global mobile data traffic will increase 11-fold between 2014 and 2018, and what is more, over two thirds of the data traffic will be video stream by the end of that period. Therefore, the capability to receive HD services anywhere with a mobile device is going to be a mandatory requirement for any new generation broadcasting system. The main objective of this work is to present several technical solutions that answer to these challenges. In particular, the main questions to be solved are the spectrum efficiency issue and the increasing user expectations of receiving high quality mobile services. In other words, the main objective is to provide technical solutions for an efficient and flexible usage of the terrestrial broadcasting spectrum for both stationary and mobile services. The first contributions of this scientific work are closely related to the study of the mobile broadcast reception. Firstly, a comprehensive mathematical analysis of the OFDM signal behaviour over time-varying channels is presented. In order to maximize the channel capacity in mobile environments, channel estimation and equalization are studied in depth. 
First, the most widely implemented equalization solutions in time-varying scenarios are analyzed, and then, based on these existing techniques, a new equalization algorithm is proposed for enhancing the receivers’ performance. An alternative solution for improving the efficiency under mobile channel conditions is treating the inter-carrier interference (ICI) as another noise source. Specifically, after analyzing the ICI impact and the existing solutions for reducing the ICI penalty, a new approach based on the robustness of FEC codes is presented. This new approach employs one-dimensional algorithms at the receiver and entrusts the ICI removal task to the robust forward error correction codes. Finally, another major contribution of this work is the presentation of Layer Division Multiplexing (LDM) as a spectrum-efficient and flexible solution for offering stationary and mobile services simultaneously. The comprehensive theoretical study developed here verifies the improved spectrum efficiency, whereas the included practical validation confirms the feasibility of the system and presents it as a very promising multiplexing technique, which will surely be a strong candidate for next-generation broadcasting services.

[ES] Since the first television signals were transmitted at the beginning of the last century, digital broadcasting has evolved through a series of significant changes. Recently, as a direct consequence of the digitalization of the service, the digital dividend has become one of the main battlegrounds of the broadcasting industry. In fact, quite a few international consortia advocate assigning part of the broadcasting spectrum to other services, such as mobile telephony, citing the low spectral efficiency of current broadcasting technology. Likewise, although mobile services have not been considered essential in the past, this trend will probably change in the near future. In fact, traffic from mobile services is expected to increase elevenfold between 2014 and 2018 and, more importantly, two thirds of mobile traffic is forecast to be video streaming by the end of that period. Therefore, the ability to offer high-definition services on mobile devices is a fundamental requirement for new-generation broadcasting systems. The main objective of this work is to present technical solutions that respond to the challenges set out above. In particular, the main issues to be resolved are spectral inefficiency and the growing number of users demanding higher-quality content for mobile devices. In short, the main objective of this work is to offer a more efficient and flexible solution for the simultaneous transmission of fixed and mobile services. The first relevant contribution of this work concerns the reception of the television signal in motion. First, a complete mathematical analysis of the behaviour of the OFDM signal in time-varying channels is presented. Next, with the aim of maximizing channel capacity, estimation and equalization algorithms are studied in depth. Subsequently, the most widely implemented equalization algorithms are analyzed and, finally, based on these techniques, a new equalization algorithm is proposed to increase receiver performance under such conditions. Similarly, a new approach is put forward to improve the efficiency of mobile services, based on treating inter-carrier interference as a noise source. Specifically, after analyzing the impact of ICI on current receivers, it is suggested that the task of correcting these distortions be delegated to very robust FEC codes. Finally, the last important contribution of this work is the presentation of LDM technology as a more efficient and flexible way to transmit fixed and mobile services simultaneously. The theoretical analysis presented confirms the increase in spectral efficiency, while the practical study validates the feasibility of implementing the system and presents LDM technology as…
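
As a rough illustration of the layered multiplexing idea named in the abstract, the sketch below superimposes a robust upper layer and a lower-power lower layer, then separates them by detecting the upper layer first and subtracting it. It is not the thesis's implementation: the BPSK mapping, the 6 dB injection level, the noise-free channel, and the function names (`bpsk`, `ldm_combine`, `ldm_separate`) are all assumptions chosen for brevity, and error-correction coding is omitted.

```python
import math
import random


def bpsk(bit):
    """Map a bit (0/1) to a BPSK symbol (-1.0/+1.0)."""
    return 1.0 if bit else -1.0


def ldm_combine(upper_bits, lower_bits, injection_db):
    """Superimpose two layers; the lower layer sits `injection_db` dB below the upper."""
    g = 10 ** (-injection_db / 20)   # amplitude ratio of lower to upper layer
    norm = math.sqrt(1 + g * g)      # keeps the total average power at 1
    symbols = [(bpsk(u) + g * bpsk(l)) / norm
               for u, l in zip(upper_bits, lower_bits)]
    return symbols, norm


def ldm_separate(symbols, norm):
    """Detect the upper layer, cancel it, then detect the lower layer."""
    upper_hat = [1 if s > 0 else 0 for s in symbols]
    # What remains after removing the re-modulated upper layer is the lower layer.
    residual = [s * norm - bpsk(u) for s, u in zip(symbols, upper_hat)]
    lower_hat = [1 if r > 0 else 0 for r in residual]
    return upper_hat, lower_hat


# Hypothetical payloads for the two services (assumed data, noise-free channel).
rng = random.Random(1)
upper = [rng.randint(0, 1) for _ in range(64)]
lower = [rng.randint(0, 1) for _ in range(64)]

tx, norm = ldm_combine(upper, lower, injection_db=6.0)
upper_hat, lower_hat = ldm_separate(tx, norm)
```

Because the lower layer is weaker than the upper one, the sign of each combined symbol still follows the upper layer, which is why the upper layer can be detected first in this toy setting.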

Archivo Digital para la Docencia y la Investigación

Automatic Calibration and Equalization of a Line Array System

Author: Vidal Wagner Fernando
Publication venue
Publication date: 24/08/2015
Field of study

This final project presents an automated Public Address (PA) processing unit that adjusts delay and magnitude frequency response. The aim is to achieve a flat frequency response and delay alignment between physically separated speakers at the measuring point, a task that is nowadays usually performed manually by the sound technician. The adjustment is obtained by applying four signal processing operations to the audio signal: time delay adjustment, crossover filtering, gain adjustment, and graphic equalization. The automation lies in the calculation of different parameter sets: estimation of the time delay, selection of a suitable crossover frequency, and calculation of the gains for a third-octave graphic equalizer. These automatic methods reduce time and effort in the calibration of line-array PA systems, since only three sine sweeps must be played through the sound system. To verify the functioning of the system, both simulations and measurements have been conducted. A 1:10 scale model of a line array system has been designed and constructed in an anechoic chamber to test the automatic calibration and equalization methods, and the results are analyzed.
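
The abstract does not say how the time delay is estimated. One common approach, assumed here purely for illustration, is to take the lag that maximizes the cross-correlation between a reference signal and the signal captured at the measuring point. The function name `estimate_delay`, the noise-burst test signal, the 37-sample delay, and the 48 kHz sampling rate are all hypothetical.

```python
import random


def estimate_delay(reference, measured, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the reference with the measured signal advanced by `lag`.
        score = sum(r * m for r, m in zip(reference, measured[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical test data: a noise burst that arrives 37 samples late
# and attenuated by half (assumed values, not from the project).
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(512)]
true_delay = 37
measured = [0.0] * true_delay + [0.5 * x for x in reference]

delay_samples = estimate_delay(reference, measured, max_lag=100)

sample_rate = 48_000  # assumed sampling rate in Hz
delay_ms = 1000.0 * delay_samples / sample_rate
```

The brute-force search is quadratic in the signal length; for long sweeps an FFT-based correlation would be the usual choice, but the direct form keeps the sketch dependency-free.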

Aaltodoc Publication Archive

Smart attendance monitoring system using computer vision.

Author: Mothwa Louis.
Publication venue
Publication date: 01/01/2019
Field of study

Masters Degree. University of KwaZulu-Natal, Durban. Monitoring of students’ attendance remains a fundamental and vital part of any educational institution. The attendance of students at classes can have an impact on their academic performance. With the gradual increase in the number of students, it becomes a challenge for institutions to manage their attendance. The traditional attendance monitoring system requires a considerable amount of time due to manual recording of names and circulation of the paper-based attendance sheet for students to sign their names. The paper-based attendance recording method and some existing automated systems such as mobile applications, Radio Frequency Identification (RFID), Bluetooth, and fingerprint attendance models are prone to fake results and time wasting. The limitations of the traditional attendance monitoring system stimulated the adoption of computer vision to fill the gap. Students’ attendance can be monitored with candidate biometric systems such as iris recognition and face recognition. Among these, face recognition has greater potential because of its non-intrusive nature. Although some automated attendance monitoring systems have been proposed, poor system modelling negatively affects their performance. In order to improve the success of automated systems, this research proposes a smart attendance monitoring system that uses facial recognition to monitor students’ attendance in a classroom. A time-integrated model is provided to monitor students’ attendance throughout the lecture period by registering the attendance information at regular time intervals. A multi-camera system is also proposed to guarantee accurate capturing of students. The proposed multi-camera-based system is tested using a real-time database in an experimental class from the University of KwaZulu-Natal (UKZN). 
The results show that the proposed smart attendance monitoring system is reliable, with an average accuracy rate of 98%. Examiner's copy of thesis.
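
One possible reading of the time-integrated model is sketched below: the set of recognized students is recorded at each regular interval, and a student counts as present if seen in a sufficient share of those snapshots. The thesis's actual decision rule is not given in the abstract, so the 50% threshold, the data layout, and the names `attendance_ratio` and `snapshots` are assumptions for illustration only.

```python
from collections import Counter


def attendance_ratio(snapshots, roster):
    """Fraction of snapshots in which each enrolled student was recognized."""
    counts = Counter()
    for recognized in snapshots:
        # Ignore recognized identities that are not on the class roster.
        for student in set(recognized) & roster:
            counts[student] += 1
    total = len(snapshots)
    return {s: (counts[s] / total if total else 0.0) for s in roster}


# Hypothetical data: identities recognized at four regular intervals.
roster = {"A", "B", "C"}
snapshots = [{"A", "B"}, {"A"}, {"A", "C"}, {"A", "B"}]

ratios = attendance_ratio(snapshots, roster)

threshold = 0.5  # assumed minimum share of snapshots to count as present
present = {s for s, r in ratios.items() if r >= threshold}
```

Sampling throughout the lecture, rather than once at the start, means a student who leaves early or arrives late is reflected in the ratio instead of being marked simply present or absent.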

ResearchSpace@UKZN