Combinations of adaptive filters
Adaptive filters are at the core of many signal processing applications, ranging from acoustic noise suppression to echo cancellation [1], array beamforming [2], and channel equalization [3], to more recent sensor network applications in surveillance, target localization, and tracking. A trending approach in this direction is to resort to in-network distributed processing, in which individual nodes implement adaptation rules and diffuse their estimates to the network [4], [5].
The work of Jerónimo Arenas-García and Luis Azpicueta-Ruiz was partially supported by the Spanish Ministry of Economy and Competitiveness (under projects TEC2011-22480 and PRI-PIBIN-2011-1266). The work of Magno M.T. Silva was partially supported by CNPq under Grant 304275/2014-0 and by FAPESP under Grant 2012/24835-1. The work of Vítor H. Nascimento was partially supported by CNPq under grant 306268/2014-0 and FAPESP under grant 2014/04256-2. The work of Ali Sayed was supported in part by NSF grants CCF-1011918 and ECCS-1407712. We are grateful to the colleagues with whom we have shared discussions and coauthorship of papers along this research line, especially Prof. Aníbal R. Figueiras-Vidal
An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony
In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short-Time Fourier Transform (STFT) domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user's speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk.
Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum-phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum-phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum-phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum-phase filter such that the magnitude response of the maximum-phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
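The speaker-extraction step described in this abstract (decomposing the microphone magnitude spectrogram onto the union of an echo basis and a pre-trained near-end speech basis) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the function name `separate_with_fixed_bases`, the Euclidean-cost multiplicative update, the soft mask, and the random toy "spectrograms" that stand in for real STFT magnitudes.

```python
import numpy as np

def separate_with_fixed_bases(V, W_echo, W_speech, n_iter=500, eps=1e-12):
    """Fit V ~= [W_echo, W_speech] @ H with both bases held fixed.

    Only the activations H are updated, using the standard multiplicative
    rule for the Euclidean cost, which keeps H nonnegative.
    """
    W = np.hstack([W_echo, W_speech])
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    k = W_echo.shape[1]
    echo_part = W_echo @ H[:k]
    speech_part = W_speech @ H[k:]
    # Soft mask: share of each time-frequency bin attributed to the near-end speaker.
    mask = speech_part / (echo_part + speech_part + eps)
    return mask * V, W @ H

# Toy data (assumption): random nonnegative matrices replace real STFT magnitudes.
rng = np.random.default_rng(1)
n_freq, n_frames, k_echo, k_speech = 64, 40, 5, 5
W_echo = rng.random((n_freq, k_echo))      # in practice: formed from the far-end signal
W_speech = rng.random((n_freq, k_speech))  # in practice: trained a priori on speech
echo_true = W_echo @ rng.random((k_echo, n_frames))
speech_true = W_speech @ rng.random((k_speech, n_frames))
V = echo_true + speech_true                # toy model: magnitudes simply add

speech_hat, V_hat = separate_with_fixed_bases(V, W_echo, W_speech)
rel_err = np.linalg.norm(V - V_hat) / np.linalg.norm(V)
err_mixture = np.linalg.norm(V - speech_true)
err_speech = np.linalg.norm(speech_hat - speech_true)
print(f"relative reconstruction error: {rel_err:.3f}")
print(f"speech error: mixture {err_mixture:.1f}, after masking {err_speech:.1f}")
```

Because both bases are fixed and no echo-path filter has to converge, a scheme of this kind needs no adaptation period, which is consistent with the abstract's claim of immediate echo mitigation; the echo estimate `W_echo @ H[:k]` is also the kind of quantity a correlation-based double-talk detector could reuse.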
Dirty RF Signal Processing for Mitigation of Receiver Front-end Non-linearity
Today's wireless communication systems place high requirements on the
radio's hardware that are largely mutually exclusive, such as low power
consumption, wide bandwidth, and high linearity. Achieving a sufficient
linearity, among other analogue characteristics, is a challenging issue in
practical transceiver design. The focus of this thesis is on wideband
receiver RF front-ends for software-defined radio technology, which has become
commercially available in recent years. Practical challenges and
limitations are revealed in real-world experiments with these radios.
One of the main problems is to ensure a sufficient RF performance of the
front-end over a wide bandwidth. The thesis covers the analysis and
mitigation of receiver non-linearity of typical direct-conversion receiver
architectures, among other RF impairments. The main focus is on DSP-based
algorithms for mitigating non-linearly induced interference, an approach
also known as "Dirty RF" signal processing techniques. The conceived
digital feedforward mitigation algorithm is verified through extensive
simulations, RF measurements, and implementation in real hardware. Various
studies are carried out that bridge the gap between theory and practical
applicability of this approach, especially with the aim of integrating that
technique into real devices. To this end, an advanced baseband behavioural
model is developed that matches direct-conversion receiver architectures
as closely as possible, and thus considers all generated distortions at RF
and baseband. In addition, the algorithm's performance is verified under
challenging fading conditions. Moreover, the thesis presents a
resource-efficient real-time implementation of the proposed solution on an
FPGA. Finally, the thesis covers different use cases, including
spectrum monitoring or sensing, GSM downlink reception, and GSM-based
passive radar. It is shown that non-linear distortions can be successfully
mitigated at system level in the digital domain, thereby decreasing the bit
error rate of distorted modulated signals and reducing the amount of
non-linearly induced interference. Finally, the effective linearity of the
front-end is increased substantially. Thus, the proper operation of a
low-cost radio in the presence of receiver non-linearity is possible. Due to
the flexible design, the algorithm is generally applicable to wideband
receivers and is not restricted to software-defined radios.
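The abstract does not spell out the feedforward algorithm, so the following is only a generic sketch of the idea under stated assumptions: a memoryless third-order front-end model, two strong blocker tones, ideal FFT-mask filters standing in for band-split filters, and a single least-squares coefficient in place of an adaptive filter. The bin numbers, the coefficient `a3`, and the helper names `tone` and `band` are all hypothetical.

```python
import numpy as np

n = 4096
t = np.arange(n)
k1, k2 = 400, 520        # blocker FFT bins (hypothetical)
k_im3 = 2 * k1 - k2      # bin where the third-order product of the blockers lands
k_des = k_im3 + 3        # weak desired tone sitting next to that product

def tone(k, amp=1.0):
    """Complex exponential on FFT bin k."""
    return amp * np.exp(2j * np.pi * k * t / n)

def band(x, centre, half_width=8):
    """Ideal FFT-mask band-pass around bin `centre` (stand-in for a band-split filter)."""
    mask = np.abs(np.arange(n) - centre) <= half_width
    return np.fft.ifft(np.fft.fft(x) * mask)

desired = tone(k_des, 0.01)
x = tone(k1) + tone(k2) + desired
a3 = -0.05                               # assumed third-order coefficient
y = x + a3 * x * np.abs(x) ** 2          # memoryless third-order receiver model

# Feedforward path: isolate the blockers, regenerate their distortion digitally,
# keep the part that falls into the desired band, then scale and subtract it.
reference = band(y, k1) + band(y, k2)
regenerated = band(reference * np.abs(reference) ** 2, k_im3)
observed = band(y, k_im3)
coeff = np.vdot(regenerated, observed) / np.vdot(regenerated, regenerated)
cleaned = observed - coeff * regenerated

err_before = np.mean(np.abs(observed - desired) ** 2)
err_after = np.mean(np.abs(cleaned - desired) ** 2)
print(f"in-band error power before: {err_before:.2e}, after: {err_after:.2e}")
```

In this toy setting the regenerated product removes most of the interference that the blockers place on the desired band; the remaining error is dominated by gain compression of the desired tone, which this simple one-coefficient sketch does not correct.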
Integrated Self-Interference Cancellation for Full-Duplex and Frequency-Division Duplexing Wireless Communication Systems
From wirelessly connected robots to car-to-car communications and smart cities, almost every aspect of our lives will benefit from future wireless communications. While promising an exciting future, next-generation wireless communications impose requirements on data rate, spectral efficiency, and latency (among others) that are several orders of magnitude higher than those of today's systems.
Full-duplex wireless, an emergent wireless communications paradigm, breaks the long-held assumption that it is impossible for a wireless device to transmit and receive simultaneously at the same frequency. It has the potential to immediately double network capacity at the physical (PHY) layer and to offer many other benefits (such as reduced latency) at the higher layers. Recently, discrete-component-based demonstrations have established the feasibility of full-duplex wireless. However, the realization of integrated full-duplex radios (compact radios that can fit into smartphones) is fraught with fundamental challenges. In addition, to unleash the full potential of full-duplex communication, a careful redesign of the PHY layer and the medium access control (MAC) layer using a cross-layer approach is required.
The biggest challenge associated with full-duplex wireless is the tremendous amount of transmitter self-interference right on top of the desired signal. In this dissertation, new self-interference-cancellation approaches at both system and circuit levels are presented, contributing towards the realization of full-duplex radios using integrated circuit technology. Specifically, these new approaches involve eliminating the noise and distortion of the cancellation circuitry, enhancing the integrated cancellation bandwidth, and performing joint radio frequency, analog, and digital cancellation to achieve cancellation with nearly one part-per-billion accuracy.
In collaboration with researchers at higher layers of the stack, a cross-layer approach has been used in our full-duplex research and has allowed us to derive power allocation algorithms and to characterize rate-gain improvements for full-duplex wireless networks. To enable experimental characterization of full-duplex MAC layer algorithms, a cross-layered software-defined full-duplex radio testbed has been developed. In collaboration with researchers from the field of micro-electro-mechanical systems, we demonstrate a multi-band frequency-division duplexing system using a cavity-filter-based tunable duplexer and our integrated widely tunable self-interference-cancelling receiver.
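As a rough guide to what "one part-per-billion accuracy" means in decibels, assume (the abstract does not say so explicitly) that the figure refers to the ratio of residual to original self-interference power:

```latex
\[
  \frac{P_{\text{residual}}}{P_{\text{SI}}} \approx 10^{-9}
  \quad\Longrightarrow\quad
  10\log_{10}\!\left(10^{-9}\right) = -90~\text{dB}.
\]
% If the RF, analog, and digital cancellers act in cascade, their
% suppressions add in dB (symbols are illustrative, not from the source):
\[
  C_{\text{total}}\,[\text{dB}] = C_{\text{RF}} + C_{\text{analog}} + C_{\text{digital}} \approx 90~\text{dB}.
\]
```

Under that reading, the three cancellation domains together would need to provide about 90 dB of suppression; how the total is divided among them is not stated in the abstract.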
Vocal fold vibratory and acoustic features in fatigued Karaoke singers
Session 3aMU - Musical Acoustics and Speech Communication: Singing Voice in Asian Cultures
Karaoke is a popular form of singing entertainment, particularly in Asia, and is gaining popularity in the rest of the world. In karaoke, an amateur singer sings along with background music and video (usually guided by the lyric captions on the video screen) played by a karaoke machine, using a microphone and an amplification system. As karaoke singers usually have no formal training, they may be more vulnerable to vocal fatigue because they may overuse and/or misuse their voices during intensive and extensive singing. It is unclear whether vocal fatigue is accompanied by any changes in the vibration pattern or physiology of the vocal folds. In this study, 20 participants aged 18 to 23 years with normal voices were recruited to take part in a prolonged singing task, which induced vocal fatigue. High-speed laryngoscopic images and acoustic signals were recorded before and after the singing task. Images of /i/ phonation were quantitatively analyzed using the High Speed Video Processing (HSVP) program (Yiu et al., 2010). It was found that the glottis became relatively narrower following fatigue, while the acoustic measures were not sensitive to changes following fatigue. © 2012 Acoustical Society of America
Reconfigurable Receiver Front-Ends for Advanced Telecommunication Technologies
The exponential growth of converging technologies, including augmented reality, autonomous vehicles, machine-to-machine and machine-to-human interactions, biomedical and environmental sensory systems, and artificial intelligence, is driving the need for robust infrastructural systems capable of handling vast data volumes between end users and service providers. This demand has prompted a significant evolution in wireless communication, with 5G and subsequent generations requiring exponentially improved spectral and energy efficiency compared to their predecessors. Achieving this entails intricate strategies such as advanced digital modulations, broader channel bandwidths, complex spectrum sharing, and carrier aggregation scenarios. A particularly challenging aspect arises in the form of non-contiguous aggregation of up to six component carriers across frequency range 1 (FR1). This requires receiver front-ends to effectively reject out-of-band (OOB) interference while maintaining high-performance in-band (IB) operation. Reconfigurability becomes pivotal in such dynamic environments, where frequency resource allocation, signal strength, and interference levels continuously change. Software-defined radios (SDRs) and cognitive radios (CRs) emerge as solutions, with direct RF-sampling receivers offering a suitable architecture in which the frequency translation is performed entirely in the digital domain to avoid analog mixing issues. Moreover, direct RF-sampling receivers facilitate spectrum observation, which is crucial to identify free zones and detect interference. Acoustic and distributed filters offer impressive dynamic range and sharp roll-off characteristics, but their bulkiness and lack of electronic adjustment capabilities limit their practicality. Active filters, on the other hand, present opportunities for integration in advanced CMOS technology, addressing size constraints and providing versatile programmability.
However, concerns about power consumption, noise generation, and linearity in active filters require careful consideration.
This thesis primarily focuses on the design and implementation of a low-voltage, low-power RF front-end (RFFE) tailored for direct sampling receivers in 5G FR1 applications. The RFFE consists of a balun low-noise amplifier (LNA), a Q-enhanced filter, and a programmable gain amplifier (PGA). The balun-LNA employs noise cancellation, current reuse, and gm boosting for wideband gain and input impedance matching. Leveraging FD-SOI technology allows for programmable gain and linearity via body biasing. The LNA's operational state ranges between high-performance and high-tolerance modes, which are suited to sensitivity and blocking tests, respectively. The Q-enhanced filter adopts noise-cancelling, current-reuse, and programmable Gm-cells to realize a fourth-order response using two resonators. The fourth-order filter response is achieved by subtracting the individual responses of these resonators. Compared to cascaded and magnetically coupled fourth-order filters, this technique maintains the large dynamic range of second-order resonators. Fabricated in 22-nm FD-SOI technology, the RFFE achieves 1%-40% fractional bandwidth (FBW) adjustability from 1.7 GHz to 6.4 GHz, a 4.6 dB noise figure (NF), and an OOB input third-order intercept point (IIP3) of 22 dBm. Furthermore, to account for implementation uncertainties and potential variations of temperature and supply voltage, design margins have been considered and a hybrid calibration scheme is introduced. A combination of on-chip and off-chip calibration based on noise response is employed to effectively adjust the quality factors, Gm-cells, and resonance frequencies, ensuring the desired bandpass response.
To optimize and accelerate the calibration process, a reinforcement learning (RL) agent is used.
Anticipating future trends, the concept of the Q-enhanced filter extends to a multiple-mode filter for 6G upper mid-band applications. Covering the frequency range from 8 to 20 GHz, this RFFE can be configured as a fourth-order dual-band filter, two bandpass filters (BPFs) with an OOB notch, or a BPF with an IB notch. In cognitive radios, the filter's transmission zeros can be positioned with respect to the carrier frequencies of interfering signals to yield over 50 dB blocker rejection.
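The statement in this abstract that a fourth-order response is obtained by subtracting the responses of two second-order resonators can be checked numerically with an idealized model. The sketch below assumes textbook band-pass resonators of equal bandwidth; the transfer-function form, the centre frequencies, and the bandwidth are illustrative choices, not values from the thesis.

```python
import math

def resonator(f_hz, f0_hz, bw_hz):
    """Ideal second-order band-pass: H(s) = B*s / (s^2 + B*s + w0^2), unity gain at f0."""
    s = 2j * math.pi * f_hz
    w0 = 2 * math.pi * f0_hz
    b = 2 * math.pi * bw_hz
    return b * s / (s * s + b * s + w0 * w0)

def subtractive_filter(f_hz, f1_hz, f2_hz, bw_hz):
    """Difference of two resonators tuned slightly apart (four poles in total)."""
    return resonator(f_hz, f1_hz, bw_hz) - resonator(f_hz, f2_hz, bw_hz)

F1, F2, BW = 3.4e9, 3.6e9, 200e6   # hypothetical tuning
centre, far = 3.5e9, 35e9

centre_gain = abs(subtractive_filter(centre, F1, F2, BW))
single_far = abs(resonator(far, centre, BW))
combined_far = abs(subtractive_filter(far, F1, F2, BW))

print(f"gain at band centre:         {20 * math.log10(centre_gain):6.1f} dB")
print(f"single resonator at 35 GHz:  {20 * math.log10(single_far):6.1f} dB")
print(f"subtractive filter at 35 GHz:{20 * math.log10(combined_far):6.1f} dB")
```

Between the two resonance frequencies the resonator outputs are roughly in antiphase, so the subtraction adds them in-band, while far from resonance they are nearly identical and cancel. With equal bandwidths the difference reduces to B·s·(w2² − w1²) over the product of the two second-order denominators, which is why the far-out rejection is much steeper than that of a single resonator.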