247 research outputs found

    Filter Optimization for Personal Sound Zones Systems

    Full text link
    Personal Sound Zones (PSZ) systems deliver different sounds to a number of listeners sharing an acoustic space through the use of loudspeakers together with signal processing techniques. These systems have attracted a lot of attention in recent years because of the wide range of applications that would benefit from the generation of individual listening zones, e.g., domestic or automotive audio. A key aspect of PSZ systems, at least for low and mid frequencies, is the optimization of the filters used to process the sound signals. Different algorithms have been proposed in the literature for computing those filters, each with its own advantages and disadvantages. In this work, the state-of-the-art algorithms for PSZ systems are reviewed and their performance in a reverberant environment is evaluated. Aspects such as the acoustic isolation between zones, the reproduction error, the energy of the filters, and the delay of the system are considered in the evaluations. Furthermore, computationally efficient strategies for obtaining the filters are studied, and their computational complexity is compared. The performance and computational evaluations reveal the main limitations of the state-of-the-art algorithms: the existing solutions cannot offer low computational complexity and, at the same time, good performance for short system delays. Thus, a novel algorithm based on subband filtering that mitigates these limitations is proposed for PSZ systems. In addition, the proposed algorithm offers more versatility than the existing algorithms, since different system configurations, such as different filter lengths or sets of loudspeakers, can be used in each subband. The proposed algorithm is experimentally evaluated in a reverberant environment, and its ability to mitigate the limitations of the existing solutions is demonstrated. Finally, the effect of the target responses on the optimization is discussed, and a novel approach based on windowing the target responses is proposed. The proposed approach is experimentally evaluated in two rooms with different reverberation levels, and the results reveal that an appropriate windowing of the target responses can reduce the interference level between zones.
    Molés Cases, V. (2022). Filter Optimization for Personal Sound Zones Systems [Doctoral thesis]. Universitat Politècnica de València.
https://doi.org/10.4995/Thesis/10251/18611
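
    As background for the filter-optimization problem summarized above, the sketch below shows a minimal single-frequency pressure-matching formulation: the loudspeaker weights are chosen to reproduce a target response at the bright-zone microphones while penalizing energy at the dark-zone microphones, with Tikhonov regularization limiting filter energy. This is only a generic baseline under assumed random transfer functions, zone sizes, and regularization values; it is not the subband algorithm or the target-response windowing proposed in the thesis.

```python
import numpy as np

def psz_pressure_matching(G_bright, G_dark, d_bright, kappa=1e-2, reg=1e-3):
    """Per-frequency pressure-matching weights for a personal sound zones setup.

    G_bright : (M_b, L) acoustic transfer functions to bright-zone mics
    G_dark   : (M_d, L) acoustic transfer functions to dark-zone mics
    d_bright : (M_b,)  target pressures in the bright zone
    kappa    : weight of the dark-zone (interference) term
    reg      : Tikhonov regularization on the loudspeaker weights
    Returns the (L,) complex loudspeaker weights for this frequency bin.
    """
    L = G_bright.shape[1]
    A = (G_bright.conj().T @ G_bright
         + kappa * G_dark.conj().T @ G_dark
         + reg * np.eye(L))
    b = G_bright.conj().T @ d_bright
    return np.linalg.solve(A, b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, M_b, M_d = 8, 4, 4                                 # loudspeakers, bright/dark mics (toy setup)
    G_b = rng.standard_normal((M_b, L)) + 1j * rng.standard_normal((M_b, L))
    G_d = rng.standard_normal((M_d, L)) + 1j * rng.standard_normal((M_d, L))
    d_b = np.exp(-1j * 2 * np.pi * rng.random(M_b))       # toy target responses
    w = psz_pressure_matching(G_b, G_d, d_b)
    print("bright-zone error:", np.linalg.norm(G_b @ w - d_b))
    print("dark-zone energy :", np.linalg.norm(G_d @ w) ** 2)
```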

    Multichannel high resolution NMF for modelling convolutive mixtures of non-stationary signals in the time-frequency domain

    Get PDF
    Several probabilistic models involving latent components have been proposed for modeling time-frequency (TF) representations of audio signals such as spectrograms, notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high-resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. The new model can represent a variety of stationary and non-stationary signals, including autoregressive moving average (ARMA) processes and mixtures of damped sinusoids. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to piano signals, and proves capable of accurately modeling reverberation, restoring missing observations, and separating pure tones with close frequencies.
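
    For context on the modelling framework described above, the snippet below implements plain NMF on a nonnegative magnitude spectrogram with the classic Lee-Seung multiplicative updates for the Euclidean cost. It is only the baseline that HR-NMF extends, and it does not model phase or local correlation within frequency bands as HR-NMF does; the matrix sizes, rank, and random data are illustrative.

```python
import numpy as np

def nmf(V, K, n_iter=200, eps=1e-12, seed=0):
    """Euclidean NMF: V (F x T, nonnegative) ~= W (F x K) @ H (K x T),
    estimated with Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.random((64, 100))          # stand-in for a magnitude spectrogram
    W, H = nmf(V, K=4)
    print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```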

    Efficient Algorithms for Immersive Audio Rendering Enhancement

    Get PDF
    Immersive audio rendering is the process of creating an engaging and realistic sound experience in 3D space. In immersive audio systems, head-related transfer functions (HRTFs) are used for binaural synthesis over headphones, since they express how humans localize a sound source. HRTF interpolation algorithms can be introduced to reduce the number of measurement points and to create a reliable sound movement. Binaural reproduction can also be performed over loudspeakers. However, the involvement of two or more loudspeakers causes the problem of crosstalk. In this case, crosstalk cancellation (CTC) algorithms are needed to remove the unwanted interference signals. In this thesis, starting from a comparative analysis of HRTF measurement techniques, a binaural rendering system based on HRTF interpolation is proposed and evaluated for real-time applications. The proposed method shows good performance in comparison with a reference technique. The interpolation algorithm is also applied to immersive audio rendering over loudspeakers by adding a fixed crosstalk cancellation algorithm, which assumes that the listener is in a fixed position. In addition, an adaptive crosstalk cancellation system, which includes tracking of the listener's head, is analyzed and a real-time implementation is presented. The adaptive CTC implements a subband structure, and experimental results prove that a higher number of bands improves the performance in terms of total error and convergence rate. The reproduction system and the characteristics of the listening room may affect the performance due to their non-ideal frequency response. Audio equalization is used to adjust the balance of different audio frequencies in order to achieve the desired sound characteristics. The equalization can be manual, as in graphic equalization, where the gain of each frequency band can be modified by the user, or automatic, where the equalization curve is computed automatically after measuring the room impulse response. Room response equalization can also be applied to multichannel systems, which employ two or more loudspeakers, and the equalization zone can be enlarged by measuring the impulse responses at different points of the listening zone. In this thesis, efficient graphic equalizers (GEQs) and an adaptive room response equalization system are presented. In particular, three low-complexity linear-phase and quasi-linear-phase graphic equalizers are proposed and examined in depth. Experiments confirm the effectiveness of the proposed GEQs in terms of accuracy, computational complexity, and latency.
    Subsequently, a subband adaptive structure is introduced for the development of a multichannel, multiple-position room response equalizer. Experimental results verify the effectiveness of the subband approach in comparison with the single-band case. A linear-phase crossover network for multichannel systems is also presented, showing very good results in terms of magnitude flatness, cutoff rates, polar diagram, and phase response. Active noise control (ANC) systems can be designed to reduce the effects of noise pollution and can be used simultaneously with an immersive audio system. ANC works by creating a sound wave with opposite phase to the unwanted noise; the resulting destructive interference reduces the overall sound level. Finally, this thesis presents an ANC system used for noise reduction. The proposed approach implements online secondary path estimation and is based on cross-update adaptive filters applied to the primary path estimation, aiming to improve the performance of the whole system. The proposed structure achieves a better convergence rate than a reference algorithm.
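
    To make the crosstalk-cancellation step concrete, the sketch below computes a fixed two-by-two crosstalk canceller by regularized inversion of the loudspeaker-to-ear transfer matrix at each frequency bin, so that each program channel reaches mainly the intended ear. The transfer functions, number of bins, and regularization constant are placeholder assumptions; the adaptive, subband CTC with head tracking developed in the thesis is not reproduced here.

```python
import numpy as np

def fixed_ctc_filters(H, beta=1e-3):
    """Regularized inversion of the plant per frequency bin.

    H    : (F, 2, 2) complex array, H[f, ear, speaker] = transfer function
    beta : Tikhonov regularization to limit filter gain at ill-conditioned bins
    Returns C with shape (F, 2, 2) such that H[f] @ C[f] is close to identity.
    """
    C = np.zeros_like(H)
    I = np.eye(2)
    for f in range(H.shape[0]):
        Hf = H[f]
        C[f] = np.linalg.solve(Hf.conj().T @ Hf + beta * I, Hf.conj().T)
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    F = 257                                     # e.g. bins of a 512-point FFT (illustrative)
    H = rng.standard_normal((F, 2, 2)) + 1j * rng.standard_normal((F, 2, 2))
    C = fixed_ctc_filters(H)
    # Residual crosstalk: off-diagonal energy of H @ C, ideally close to zero.
    HC = np.einsum('fij,fjk->fik', H, C)
    off = np.abs(HC[:, 0, 1]).mean() + np.abs(HC[:, 1, 0]).mean()
    print("mean residual crosstalk magnitude:", off)
```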

    Model-based Filtering of Interfering Signals in Ultrasonic Time Delay Estimations

    Get PDF
    This work presents model-based algorithmic approaches for interference-invariant time delay estimation, which are specifically suited to the estimation of small time delay differences with a required resolution well below the sampling time. The methods can therefore be applied particularly well to transit-time ultrasonic flow measurements, since the problem of interfering signals is especially prominent in this application.
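
    As a concrete reference point for the sub-sample delay estimation discussed above, the snippet below estimates the delay between two signals from the peak of their cross-correlation and refines it with parabolic interpolation around that peak. This is only a standard textbook baseline applied to a synthetic ultrasonic-style pulse, not one of the interference-invariant methods developed in the work.

```python
import numpy as np

def estimate_delay(x, y, fs):
    """Estimate the delay of y relative to x (in seconds) with sub-sample
    resolution using cross-correlation and parabolic peak interpolation."""
    r = np.correlate(y, x, mode="full")              # lag axis: -(N-1) .. (N-1)
    lags = np.arange(-(len(x) - 1), len(y))
    k = np.argmax(r)
    if 0 < k < len(r) - 1:                           # parabolic refinement around the peak
        num = r[k - 1] - r[k + 1]
        den = r[k - 1] - 2.0 * r[k] + r[k + 1]
        delta = 0.5 * num / den if den != 0 else 0.0
    else:
        delta = 0.0
    return (lags[k] + delta) / fs

if __name__ == "__main__":
    fs = 1_000_000.0                                 # 1 MHz sampling rate (illustrative)
    t = np.arange(0, 200e-6, 1 / fs)
    true_delay = 3.4e-6                              # 3.4 samples, i.e. sub-sample precision needed

    def pulse(tt):
        return np.exp(-((tt - 50e-6) / 10e-6) ** 2) * np.sin(2 * np.pi * 200e3 * tt)

    x, y = pulse(t), pulse(t - true_delay)
    print("estimated delay [us]:", 1e6 * estimate_delay(x, y, fs))
```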

    Model-based Filtering of Interfering Signals in Ultrasonic Time Delay Estimations

    Get PDF
    This work presents model-based algorithmic approaches for interference-invariant time delay estimation, which are particularly suited to estimating small time delay differences with a required resolution well below the sampling time. The methods are therefore especially well suited to transit-time ultrasonic flow measurement, since the problem of interfering signals is particularly pronounced in this application. The main focus is on how multiple measurements with different time delays or process parameters can be used to suppress the interfering signals in ultrasonic flow measurements while retaining good robustness against additive white Gaussian noise and a high resolution. To this end, a signal model is assumed that consists of stationary interfering signals, which do not depend on the changing time delays, and of target signals, which contain the measurement effect. First, the signal model of an ultrasonic flow measurement and its dynamic behavior under temperature or time delay variations are examined. The goal is to generate valid simulation data sets with which the developed methods can be tested both under the premise that the data fit the signal model perfectly and under the premise that model errors are present. In the process, the properties of the signal model components, such as bandwidth, stationarity, and temperature dependence, are identified. For this purpose, a new method for modeling the temperature dependence of the interfering signals is presented. After characterizing the complete measurement system, the signal model, adapted to ultrasonic flow measurement, is used as the basis for two new methods whose goal is to reduce the effects of the interfering signals. The first proposed technique extends the signal-dynamics-based approaches in the literature by relaxing the requirements on the necessary variance of the time delays. To this end, a new representation of multiple measurement signals as point clouds is introduced. The point clouds are then processed using principal component analysis and B-splines, yielding either interference-invariant time delay estimates or estimates of the interfering signals. In this context, a novel joint B-spline and registration estimation is developed to increase robustness. The second approach is a regression-based estimation of the time delay differences by learning adapted signal subspaces. These subspaces are computed efficiently by the analytic wavelet packet transform before the resulting coefficients are transformed into features that correlate well with the time delay differences. In addition, a novel unsupervised subspace training approach is proposed and compared with conventional filter- and wrapper-based feature selection methods. Finally, both methods are tested in an experimental ultrasonic flow measurement system with a high level of interfering signals, and they prove superior to the methods from the literature in most cases. The quality of the methods is assessed in terms of the accuracy of the time delay estimates, since the ground truth for the interfering signals cannot be determined reliably. Using various data sets, the dependencies on the hyperparameters, the process conditions, and, in the case of the regression-based method, the training data set are analyzed.
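
    The premise of the signal model described above, a stationary interference that does not move with the target signal, can be illustrated with the toy sketch below: when many measurements are available and their delays vary enough, subtracting the ensemble mean removes the measurement-invariant component while leaving each delayed pulse largely intact. This corresponds to the signal-dynamics idea from the literature whose delay-variance requirement the thesis relaxes; the delay spread, pulse shape, and interference used here are invented for the example and do not reflect the thesis's point-cloud/B-spline or subspace-regression methods.

```python
import numpy as np

def remove_stationary_interference(Y):
    """Estimate the measurement-invariant (stationary) component as the
    ensemble mean over measurements and subtract it from every measurement.

    Y : (K, N) array of K repeated measurements of length N.
    Returns the residuals with the stationary part removed."""
    return Y - Y.mean(axis=0, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    fs, N, K = 1e6, 2000, 50
    t = np.arange(N) / fs
    interference = 0.5 * np.sin(2 * np.pi * 40e3 * t)   # identical in every measurement
    delays = rng.uniform(0, 40e-6, size=K)              # large delay spread (literature premise)

    def pulse(tt):
        return np.exp(-((tt - 400e-6) / 15e-6) ** 2) * np.sin(2 * np.pi * 150e3 * tt)

    Y = np.stack([pulse(t - d) + interference for d in delays])
    R = remove_stationary_interference(Y)
    clean0 = pulse(t - delays[0])
    before = np.std(Y[0] - clean0)   # contamination before removal (the interference itself)
    after = np.std(R[0] - clean0)    # contamination after removal (small smeared ensemble-mean pulse)
    print(f"contamination RMS before: {before:.3f}, after: {after:.3f}")
```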

    System Identification with Applications in Speech Enhancement

    No full text
    With the increasing popularity of hands-free telephony on portable devices and the rapid development of voice over internet protocol, identification of acoustic systems has become desirable for compensating the distortions introduced into speech signals during transmission, and hence for enhancing speech quality. The objective of this research is to develop system identification algorithms for speech enhancement applications including network echo cancellation and speech dereverberation. A supervised adaptive algorithm for sparse system identification is developed for network echo cancellation. Based on the framework of selective-tap updating for the normalized least mean squares algorithm, the MMax and sparse partial-update tap-selection strategies are exploited in the frequency domain to achieve fast convergence with low computational complexity. After demonstrating how the sparseness of the network impulse response varies in the transformed domain, the multidelay filtering structure is incorporated to reduce the algorithmic delay. Blind identification of SIMO acoustic systems for speech dereverberation in the presence of common zeros is then investigated. First, the problem of common zeros is defined and extended to include the presence of near-common zeros. Two clustering algorithms are developed to quantify the number of these zeros and thereby facilitate the study of their effect on blind system identification and speech dereverberation. To mitigate this effect, two algorithms are developed: a two-stage algorithm based on channel decomposition identifies common and non-common zeros sequentially, while a forced spectral diversity approach combines spectral shaping filters and channel undermodelling to derive a modified system that leads to improved dereverberation performance. Additionally, a solution to the scale-factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subband-based dereverberation techniques. Comprehensive simulations and discussions demonstrate the effectiveness of the aforementioned algorithms. A discussion of possible directions for prospective research on system identification techniques concludes this thesis.
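
    To illustrate the MMax tap-selection idea mentioned above, the sketch below is a time-domain MMax partial-update NLMS filter that, at each iteration, updates only the M coefficients whose input samples have the largest magnitudes. The thesis applies this strategy in the frequency domain with a multidelay structure, which this toy version omits; the sparse echo path, filter length, and step size are illustrative.

```python
import numpy as np

def mmax_nlms(x, d, L=64, M=16, mu=0.5, eps=1e-6):
    """MMax partial-update NLMS: identify an FIR system from input x and
    desired signal d, updating only the M taps with the largest |x| entries."""
    w = np.zeros(L)
    e = np.zeros(len(x))
    xbuf = np.zeros(L)
    for n in range(len(x)):
        xbuf = np.roll(xbuf, 1)
        xbuf[0] = x[n]
        e[n] = d[n] - w @ xbuf
        idx = np.argpartition(np.abs(xbuf), -M)[-M:]   # M largest-magnitude taps
        w[idx] += mu * e[n] * xbuf[idx] / (xbuf @ xbuf + eps)
    return w, e

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    h = np.zeros(64)
    h[[3, 17, 40]] = [0.8, -0.5, 0.3]                  # sparse "network echo path"
    x = rng.standard_normal(20000)
    d = np.convolve(x, h)[:len(x)] + 1e-3 * rng.standard_normal(len(x))
    w, e = mmax_nlms(x, d)
    print("misalignment (dB):", 20 * np.log10(np.linalg.norm(w - h) / np.linalg.norm(h)))
```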

    Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods

    Get PDF
    Speech signals radiated in confined spaces are subject to reverberation due to reflections from surrounding walls and obstacles. Reverberation leads to severe degradation of speech intelligibility and can be prohibitive for applications where speech is digitally recorded, such as audio conferencing or hearing aids. Dereverberation of speech is therefore an important field in speech enhancement. Driven by consumer demand, blind speech dereverberation has become a popular field in the research community and has led to many interesting approaches in the literature. However, most existing methods are dictated by their underlying models and hence suffer from assumptions that constrain the approaches to specific subproblems of blind speech dereverberation. For example, many approaches limit the dereverberation to voiced speech sounds, leading to poor results for unvoiced speech. Few approaches tackle single-sensor blind speech dereverberation, and only a very limited subset allows for dereverberation of speech from moving speakers. The aim of this dissertation is therefore the development of a flexible and extendible framework for blind speech dereverberation accommodating different speech sound types, single or multiple sensors, and both stationary and moving speakers. Bayesian methods benefit from, rather than being dictated by, appropriate model choices. The problem of blind speech dereverberation is therefore considered from a Bayesian perspective in this thesis, and a generic sequential Monte Carlo approach accommodating a multitude of models for the speech production mechanism and room transfer function is derived. In this approach, both the anechoic source signal and the reverberant channel are estimated using their optimal estimators by means of Rao-Blackwellisation of the state space of unknown variables. The remaining model parameters are estimated using sequential importance resampling. The proposed approach is implemented for two different speech production models for stationary speakers, demonstrating substantial reduction in reverberation for both unvoiced and voiced speech sounds. Furthermore, the channel model is extended to facilitate blind dereverberation of speech from moving speakers. Due to the structure of the measurement model, both single- and multi-microphone processing are facilitated, accommodating physically constrained scenarios where only a single sensor can be used as well as allowing for the exploitation of spatial diversity in scenarios where the physical size of microphone arrays is of no concern. The dissertation concludes with a survey of possible directions for future research, including the use of switching Markov source models, joint target tracking and enhancement, and an extension to subband processing for improved computational efficiency.
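
    As a minimal illustration of the sequential importance resampling machinery referred to above, the following toy bootstrap particle filter tracks a scalar autoregressive state from noisy observations. It omits the Rao-Blackwellised source and channel estimators that the dissertation builds on top of this machinery, and all model parameters are invented for the example.

```python
import numpy as np

def bootstrap_particle_filter(y, n_particles=500, a=0.95, q=0.1, r=0.5, seed=0):
    """Toy SIR particle filter for x[t] = a*x[t-1] + N(0, q^2),
    y[t] = x[t] + N(0, r^2). Returns the filtered state estimates."""
    rng = np.random.default_rng(seed)
    particles = rng.standard_normal(n_particles)
    estimates = np.zeros(len(y))
    for t, yt in enumerate(y):
        # Propagate particles through the state transition (proposal = prior).
        particles = a * particles + q * rng.standard_normal(n_particles)
        # Weight by the observation likelihood.
        w = np.exp(-0.5 * ((yt - particles) / r) ** 2)
        w /= w.sum()
        estimates[t] = w @ particles
        # Resample (multinomial) to avoid weight degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return estimates

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    T, a, q, r = 200, 0.95, 0.1, 0.5
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = a * x[t - 1] + q * rng.standard_normal()
    y = x + r * rng.standard_normal(T)
    xhat = bootstrap_particle_filter(y)
    print("RMSE filtered:", np.sqrt(np.mean((xhat - x) ** 2)), "vs observation noise:", r)
```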

    Multichannel Online Blind Speech Dereverberation with Marginalization of Static Observation Parameters in a Rao-Blackwellized Particle Filter

    Get PDF
    Room reverberation leads to reduced intelligibility and spectral coloration of audio signals. Enhancement of acoustic signals is thus crucial for high-quality audio and scene analysis applications. Multiple sensors can be used to exploit statistical evidence from multiple observations of the same event to improve enhancement. Whilst traditional beamforming techniques suffer from reverberant reflections interfering with the beam path, other approaches to dereverberation often require at least partial knowledge of the room impulse response, which is not available in practice, or rely on inverse filtering of a channel estimate to obtain a clean speech estimate, resulting in difficulties with non-minimum-phase acoustic impulse responses. This paper proposes a multi-sensor approach to blind dereverberation in which both the source signal and the acoustic channel are directly estimated from the distorted observations using their optimal estimators. The remaining model parameters are sampled from hypothesis distributions using a particle filter, thus facilitating real-time dereverberation. This approach was previously applied successfully to single-sensor blind dereverberation; in this paper, the single-channel approach is extended to multiple sensors. Performance improvements due to the use of multiple sensors are demonstrated on synthetic and baseband speech examples.

    A room acoustics measurement system using non-invasive microphone arrays

    Get PDF
    This thesis summarises research into adaptive room correction for small rooms and pre-recorded material, for example music or films. A measurement system was investigated to predict the sound at a remote location within a room without placing a microphone at that location. This would allow the sound within a room to be adaptively manipulated to ensure that all listeners receive optimum sound, thereby increasing their enjoyment. The solution presented used small microphone arrays mounted on the room's walls. A unique geometry and processing system was designed, incorporating three processing stages: temporal, spatial and spectral. The temporal processing identifies individual reflection arrival times from the recorded data. The spatial processing estimates the angles of arrival of the reflections so that the three-dimensional coordinates of each reflection's origin can be calculated. The spectral processing then estimates the frequency response of each reflection. These estimates allow a mathematical model of the room to be calculated, based on the acoustic measurements made in the actual room. The model can then be used to predict the sound at different locations within the room. A simulated model of a room was produced to allow fast development of the algorithms. Measurements in real rooms were then conducted and analysed to verify the theoretical models developed and to aid further development of the system. Results from these measurements and simulations are presented for each processing stage.
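
    The temporal processing stage described above can be sketched as picking reflection arrival times from the envelope of a room impulse response and converting them to travel distances. The impulse response below is synthetic, and the detection threshold, minimum peak spacing, and sampling rate are assumptions; the thesis's spatial and spectral stages are not reproduced.

```python
import numpy as np

C_SOUND = 343.0  # approximate speed of sound in air, m/s

def reflection_arrival_times(rir, fs, threshold=0.2, min_gap_ms=1.0):
    """Return arrival times (in seconds) of prominent peaks in the impulse
    response envelope, keeping peaks above `threshold` of the maximum and at
    least `min_gap_ms` apart."""
    env = np.abs(rir)
    min_gap = int(min_gap_ms * 1e-3 * fs)
    order = np.argsort(env)[::-1]                        # strongest samples first
    picked = []
    for n in order:
        if env[n] < threshold * env.max():
            break
        if all(abs(n - m) >= min_gap for m in picked):
            picked.append(n)
    return np.sort(np.array(picked)) / fs

if __name__ == "__main__":
    fs = 48000
    rir = np.zeros(fs // 10)                             # 100 ms synthetic impulse response
    for t_ms, amp in [(2.0, 1.0), (8.5, 0.6), (14.2, 0.4), (21.7, 0.3)]:
        rir[int(t_ms * 1e-3 * fs)] = amp                 # direct sound plus three reflections
    rir += 0.01 * np.random.default_rng(6).standard_normal(len(rir))
    for t in reflection_arrival_times(rir, fs):
        print(f"arrival at {1e3 * t:.2f} ms, path length {C_SOUND * t:.2f} m")
```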