10 research outputs found

    Subband adaptive filtering for acoustic echo control using allpass polyphase IIR filterbanks

    No full text

    Design and multiplierless implementation of two-channel biorthogonal IIR filter banks with low system delay

    An efficient method for the design of low-delay, two-channel, perfect reconstruction IIR filter banks is proposed. The design problem is formulated in terms of minimax designs of a general stable IIR filter, which can be obtained using semidefinite programming, and an FIR filter, which can be obtained using the Remez exchange algorithm. A multiplierless implementation of this filter bank is also proposed and investigated.

    Efficient Algorithms for Immersive Audio Rendering Enhancement

    Immersive audio rendering is the process of creating an engaging and realistic sound experience in 3D space. In immersive audio systems, head-related transfer functions (HRTFs) are used for binaural synthesis over headphones since they express how humans localize a sound source. HRTF interpolation algorithms can be introduced to reduce the number of measurement points and to create reliable sound movement. Binaural reproduction can also be performed over loudspeakers. However, the involvement of two or more loudspeakers causes the problem of crosstalk. In this case, crosstalk cancellation (CTC) algorithms are needed to remove unwanted interference signals. In this thesis, starting from a comparative analysis of HRTF measurement techniques, a binaural rendering system based on HRTF interpolation is proposed and evaluated for real-time applications. The proposed method shows good performance in comparison with a reference technique. The interpolation algorithm is also applied to immersive audio rendering over loudspeakers by adding a fixed crosstalk cancellation algorithm, which assumes that the listener is in a fixed position. In addition, an adaptive crosstalk cancellation system, which includes tracking of the listener's head, is analyzed and a real-time implementation is presented. The adaptive CTC implements a subband structure, and experimental results show that a higher number of bands improves the performance in terms of total error and convergence rate. The reproduction system and the characteristics of the listening room may affect the performance due to their non-ideal frequency response.
Audio equalization is used to adjust the balance of different audio frequencies in order to achieve desired sound characteristics. The equalization can be manual, as in the case of graphic equalization, where the gain of each frequency band can be modified by the user, or automatic, where the equalization curve is calculated automatically after the room impulse response measurement. Room response equalization can also be applied to multichannel systems, which employ two or more loudspeakers, and the equalization zone can be enlarged by measuring the impulse responses at different points of the listening zone. In this thesis, efficient graphic equalizers (GEQs) and an adaptive room response equalization system are presented. In particular, three low-complexity linear- and quasi-linear-phase graphic equalizers are proposed and examined in depth. Experiments confirm the effectiveness of the proposed GEQs in terms of accuracy, computational complexity, and latency. Subsequently, a subband adaptive structure is introduced for the development of a multichannel, multiple-position room response equalizer. Experimental results verify the effectiveness of the subband approach in comparison with the single-band case. Finally, a linear-phase crossover network is presented for multichannel systems, showing very good results in terms of magnitude flatness, cutoff rates, polar diagram, and phase response. Active noise control (ANC) systems can be designed to reduce the effects of noise pollution and can be used simultaneously with an immersive audio system. ANC works by creating a sound wave that has opposite phase with respect to the sound wave of the unwanted noise. The additional sound wave creates destructive interference, which reduces the overall sound level. Finally, this thesis presents an ANC system used for noise reduction.
The proposed approach implements an online secondary path estimation and is based on cross-update adaptive filters applied to the primary path estimation, which aim at improving the performance of the whole system. The proposed structure allows for a better convergence rate in comparison with a reference algorithm.
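
The destructive-interference principle described in the abstract above can be illustrated with a minimal sketch. It is not the thesis's ANC system (which relies on adaptive filters and online secondary path estimation); it only sums a tone with an idealized anti-phase copy and reports the residual level. The function name, the 100 Hz tone, and the sample rate are illustrative assumptions.

```python
import math

def residual_db(phase_error_deg, freq_hz=100.0, fs=8000, n=800):
    """Level of noise + anti-noise relative to the noise alone, in dB.

    The anti-noise is an inverted copy of the tone with a phase error;
    a phase error of zero gives perfect cancellation (-inf dB).
    """
    err = math.radians(phase_error_deg)
    # n = 800 samples at 8 kHz covers exactly 10 periods of a 100 Hz tone.
    noise = [math.sin(2 * math.pi * freq_hz * k / fs) for k in range(n)]
    anti = [-math.sin(2 * math.pi * freq_hz * k / fs + err) for k in range(n)]
    p_noise = sum(x * x for x in noise) / n
    p_res = sum((x + y) ** 2 for x, y in zip(noise, anti)) / n
    if p_res == 0.0:
        return float("-inf")
    return 10 * math.log10(p_res / p_noise)

for deg in (0, 1, 10, 180):
    print(deg, residual_db(deg))
```

Under these assumptions the residual amplitude is 2·sin(error/2) times the noise amplitude, so even small phase errors limit the attainable attenuation, and a 180 degree error makes the "anti-noise" reinforce the noise instead.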

    Generalized linear-in-parameter models : theory and audio signal processing applications

    This thesis presents a mathematically oriented perspective on some basic concepts of digital signal processing. A general framework for the development of alternative signal and system representations is attained by defining a generalized linear-in-parameter model (GLM) configuration. The GLM provides a direct view into the origins of many familiar methods in signal processing, implying a variety of generalizations, and it serves as a natural introduction to rational orthonormal model structures. In particular, the conventional division between finite impulse response (FIR) and infinite impulse response (IIR) filtering methods is reconsidered. The latter part of the thesis consists of audio-oriented case studies, including loudspeaker equalization, musical instrument body modeling, and room response modeling. The proposed collection of IIR filter design techniques is applied to challenging modeling tasks. The most important practical contribution of this thesis is the introduction of a procedure for the optimization of rational orthonormal filter structures, called the BU-method. More generally, the BU-method and its variants, including the (complex) warped extension, the (C)WBU-method, can be considered entirely new IIR filter design strategies.

    Channelization for Multi-Standard Software-Defined Radio Base Stations

    As the number of radio standards increases and spectrum resources come under more pressure, it becomes ever less efficient to reserve bands of spectrum for exclusive use by a single radio standard. Therefore, this work focuses on channelization structures compatible with spectrum sharing among multiple wireless standards and dynamic spectrum allocation in particular. A channelizer extracts independent communication channels from a wideband signal, and is one of the most computationally expensive components in a communications receiver. This work specifically focuses on non-uniform channelizers suitable for multi-standard Software-Defined Radio (SDR) base stations in general and public mobile radio base stations in particular. A comprehensive evaluation of non-uniform channelizers (existing and developed during the course of this work) shows that parallel and recombined variants of the Generalised Discrete Fourier Transform Modulated Filter Bank (GDFT-FB) represent the best trade-off between computational load and flexibility for dynamic spectrum allocation. Nevertheless, for base station applications (with many channels) very high filter orders may be required, making the channelizers difficult to physically implement. To mitigate this problem, multi-stage filtering techniques are applied to the GDFT-FB. It is shown that these multi-stage designs can significantly reduce the filter orders and number of operations required by the GDFT-FB. An alternative approach, applying frequency response masking techniques to the GDFT-FB prototype filter design, leads to even bigger reductions in the number of coefficients, but computational load is only reduced for oversampled configurations and then not as much as for the multi-stage designs. Both techniques render the implementation of GDFT-FB based non-uniform channelizers more practical.
Finally, channelization solutions for some real-world spectrum sharing use cases are developed before some final physical implementation issues are considered.

    Unified Theory for Biorthogonal Modulated Filter Banks

    Modulated filter banks (MFBs) are practical signal decomposition tools for M-channel multirate systems. They combine high subfilter selectivity with efficient realization based on polyphase filters and block transforms. Consequently, the O(M^2) computational burden of a general filter bank (FB) is reduced to O(M log2 M), a complexity order comparable with that of FFT-like transforms. Often hidden from plain sight, these versatile digital signal processing tools play an important role in various professional and everyday applications of information and communications technology, including audiovisual communications and media storage (e.g., audio codecs for low-energy music playback in portable devices), as well as communication waveform processing and channelization. The algorithmic efficiency implies low cost, small size, and extended battery life, bringing the devices close to our skins. The main objective of this thesis is to formulate a generalized and unified approach to the MFBs, which includes, in addition to the deep theoretical background behind these banks, both their design using appropriate optimization techniques and efficient algorithmic realizations. The FBs discussed in this thesis are discrete-time time-frequency decomposition/reconstruction, or equivalently, analysis-synthesis systems, where the subfilters are generated through modulation from either a single or two prototype filters. The perfect reconstruction (PR) property is a particularly important characteristic of the MFBs, and it is the core theme of this thesis.
In the presented biorthogonal arbitrary-delay exponentially modulated filter bank (EMFB), the PR property can be maintained also for complex-valued signals. The EMFB concept is quite flexible, since it can respond to the various requirements placed on a subband processing system: low-delay PR prototype design, subfilters having symmetric impulse responses, and efficient algorithms; the definition also covers odd- and even-stacked cosine-modulated FBs as special cases. Oversampling schemes for the subsignals prove to be advantageous in subband processing problems requiring phase information about the localized frequency components. In addition, the MFBs have strong connections with the lapped transform (LT) theory, especially with the class of LTs grounded in parametric window functions.
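
As a rough illustration of the two complexity orders quoted in the abstract above, the sketch below compares M^2 with M log2 M for a few channel counts. These are order-of-magnitude proxies only: constant factors and the polyphase filtering cost are ignored, and the function name is an illustrative assumption.

```python
import math

def complexity_ratio(m):
    """Ratio of the M^2 proxy to the M*log2(M) proxy for M channels."""
    direct = m * m            # general filter bank, order M^2
    fast = m * math.log2(m)   # modulated filter bank, order M*log2(M)
    return direct / fast

for m in (8, 64, 1024):
    print(m, round(complexity_ratio(m), 1))
```

The ratio simplifies to M / log2 M, so the relative saving grows with the number of channels.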

    Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods

    Speech signals radiated in confined spaces are subject to reverberation due to reflections from surrounding walls and obstacles. Reverberation leads to severe degradation of speech intelligibility and can be prohibitive for applications where speech is digitally recorded, such as audio conferencing or hearing aids. Dereverberation of speech is therefore an important field in speech enhancement. Driven by consumer demand, blind speech dereverberation has become a popular field in the research community and has led to many interesting approaches in the literature. However, most existing methods are dictated by their underlying models and hence suffer from assumptions that constrain the approaches to specific subproblems of blind speech dereverberation. For example, many approaches limit the dereverberation to voiced speech sounds, leading to poor results for unvoiced speech. Few approaches tackle single-sensor blind speech dereverberation, and only a very limited subset allows for dereverberation of speech from moving speakers. Therefore, the aim of this dissertation is the development of a flexible and extendible framework for blind speech dereverberation accommodating different speech sound types, single- or multiple-sensor setups, as well as stationary and moving speakers. Bayesian methods benefit from – rather than being dictated by – appropriate model choices. Therefore, the problem of blind speech dereverberation is considered from a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach accommodating a multitude of models for the speech production mechanism and room transfer function is consequently derived. In this approach both the anechoic source signal and reverberant channel are estimated using their optimal estimators by means of Rao-Blackwellisation of the state-space of unknown variables. The remaining model parameters are estimated using sequential importance resampling.
The proposed approach is implemented for two different speech production models for stationary speakers, demonstrating substantial reduction in reverberation for both unvoiced and voiced speech sounds. Furthermore, the channel model is extended to facilitate blind dereverberation of speech from moving speakers. Due to the structure of the measurement model, single- as well as multi-microphone processing is facilitated, accommodating physically constrained scenarios where only a single sensor can be used as well as allowing for the exploitation of spatial diversity in scenarios where the physical size of microphone arrays is of no concern. This dissertation is concluded with a survey of possible directions for future research, including the use of switching Markov source models, joint target tracking and enhancement, as well as an extension to subband processing for improved computational efficiency.
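
The resampling step of the sequential importance resampling mentioned in the abstract above can be sketched generically. This is a minimal sketch of one common variant (systematic resampling), not the dissertation's Rao-Blackwellised algorithm; the function name and the toy particles are illustrative assumptions.

```python
import random

def systematic_resample(particles, weights, rng=random):
    """Draw len(particles) particles with probability proportional to weight."""
    n = len(particles)
    total = sum(weights)
    # One random offset, then n evenly spaced pointers in [0, 1).
    u0 = rng.random() / n
    out = []
    cumulative = weights[0] / total
    i = 0
    for k in range(n):
        u = u0 + k / n
        # Advance to the particle whose cumulative weight covers the pointer.
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        out.append(particles[i])
    return out

# Toy usage: all weight sits on the middle particle.
print(systematic_resample(["a", "b", "c"], [0.0, 1.0, 0.0]))
```

Particles with negligible weight are dropped and heavily weighted ones are duplicated, which is what keeps the particle set concentrated on plausible parameter values between updates.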