3,051 research outputs found
Zero-Shot Blind Audio Bandwidth Extension
Audio bandwidth extension involves the realistic reconstruction of
high-frequency spectra from bandlimited observations. In cases where the
lowpass degradation is unknown, such as in restoring historical audio
recordings, this becomes a blind problem. This paper introduces a novel method
called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem
in a zero-shot setting, leveraging the generative priors of a pre-trained
unconditional diffusion model. During the inference process, BABE utilizes a
generalized version of diffusion posterior sampling, where the degradation
operator is unknown but parametrized and inferred iteratively. The performance
of the proposed method is evaluated using objective and subjective metrics, and
the results show that BABE surpasses state-of-the-art blind bandwidth extension
baselines and achieves competitive performance compared to non-blind
filter-informed methods when tested with synthetic data. Moreover, BABE
exhibits robust generalization capabilities when enhancing real historical
recordings, effectively reconstructing the missing high-frequency content while
maintaining coherence with the original recording. Subjective preference tests
confirm that BABE significantly improves the audio quality of historical music
recordings. Examples of historical recordings restored with the proposed method
are available on the companion webpage:
(http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/)
Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language
Processing
Advancements of MultiRate Signal processing for Wireless Communication Networks: Current State Of the Art
With the rapid growth of internet access and of voice- and data-centric communications, many access technologies have been developed to meet the stringent demands of high-speed data transmission and to bridge the wide bandwidth gap between the ever-increasing high-data-rate core network and bandwidth-hungry end-user networks. To make efficient use of the limited bandwidth of available access routes and cope with difficult channel environments, several standards have been proposed for a variety of broadband access schemes over different access media (twisted pairs, coaxial cables, optical fibers, and fixed or mobile wireless access). These access media may create different channel impairments and dictate unique sets of signal processing algorithms and techniques to combat specific impairments. In the design and implementation of those systems, many research issues arise. In this paper we present advancements in multirate signal processing methodologies that are motivated by this design trend. It reviews the current state of the literature on interference suppression using multirate signal processing in wireless communication networks
Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function
This paper addresses the problems of blind channel identification and
multichannel equalization for speech dereverberation and noise reduction. The
time-domain cross-relation method is not suitable for blind room impulse
response identification, due to the near-common zeros of the long impulse
responses. We extend the cross-relation method to the short-time Fourier
transform (STFT) domain, in which the time-domain impulse responses are
approximately represented by the convolutive transfer functions (CTFs) with
far fewer coefficients. The CTFs suffer from the common zeros caused by the
oversampled STFT. We propose to identify CTFs based on the STFT with the
oversampled signals and the critically sampled CTFs, which is a good compromise
between the frequency aliasing of the signals and the common zeros problem of
CTFs. In addition, a normalization of the CTFs is proposed to remove the gain
ambiguity across sub-bands. In the STFT domain, the identified CTFs are used for
multichannel equalization, in which the sparsity of speech signals is
exploited. We propose to perform inverse filtering by minimizing the
$\ell_1$-norm of the source signal with the relaxed $\ell_2$-norm fitting error
between the microphone signals and the convolution of the estimated source
signal and the CTFs used as a constraint. This method is advantageous in that
the noise can be reduced by relaxing the $\ell_2$-norm to a tolerance
corresponding to the noise power, and the tolerance can be automatically set.
The experiments confirm the efficiency of the proposed method even under
conditions with high reverberation levels and intense noise.
Comment: 13 pages, 5 figures, 5 tables
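The inverse-filtering step in this abstract can be read as a constrained sparse-recovery problem. The block below is a sketch under assumed notation, not the paper's exact formulation: $\mathbf{x}_i$ denotes the STFT-domain signal of microphone $i$ in one frequency band, $\mathbf{a}_i$ the identified CTF for that microphone, $\mathbf{s}$ the source signal, $\star$ convolution along the frame axis, $I$ the number of microphones, and $\epsilon$ the tolerance tied to the noise power. The sum over channels and the squared norm are assumptions.

```latex
\hat{\mathbf{s}} \;=\; \arg\min_{\mathbf{s}} \; \|\mathbf{s}\|_1
\quad \text{subject to} \quad
\sum_{i=1}^{I} \bigl\| \mathbf{x}_i - \mathbf{a}_i \star \mathbf{s} \bigr\|_2^2 \;\le\; \epsilon
```

Setting $\epsilon = 0$ would force an exact fit to the noisy microphone signals. Allowing $\epsilon > 0$ is what lets the noise be left out of the estimated source.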
An Application of Spectral Translation and Spectral Envelope Extrapolation for High-frequency Bandwidth Extension of Generic Audio Signals
The scope of this work is to introduce a conceptually simple yet effective algorithm for blind high-frequency bandwidth extension of audio signals, a means of improving perceptual quality for sound which has been previously low-pass filtered or downsampled (typically due to storage considerations). The algorithm combines an application of the modulation theorem for the discrete Fourier transform, which regenerates the missing high-frequency end of the signal spectrum, with a linear-regression-driven approach that shapes the spectral envelope of the regenerated band. The results are graphically and acoustically compared to those obtained with existing audio restoration software for a variety of input signals. The source code and Windows binaries of the resulting algorithm implementation are also included
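The two ingredients named in this abstract, spectral translation and a regression-based envelope, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `extend_bandwidth`, the whole-signal FFT (rather than frame-wise processing), the tiling of the band just below the cutoff, and the straight-line fit to the log-magnitude spectrum are all assumptions made for illustration.

```python
import numpy as np

def extend_bandwidth(x, fs, cutoff_hz):
    """Hypothetical sketch: fill the band above cutoff_hz by translating
    low-band spectrum upward, then tilt it along a fitted spectral slope."""
    n = len(x)
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    kc = int(np.searchsorted(freqs, cutoff_hz))  # first bin at or above cutoff
    n_high = len(spec) - kc                      # number of bins to regenerate
    width = min(kc - 1, n_high)                  # source band width, skipping DC
    if width < 1 or n_high < 1:
        raise ValueError("cutoff leaves no band to copy from or to fill")

    # Spectral translation: shifting DFT bins upward is the frequency-domain
    # counterpart of modulating the time signal by a complex exponential.
    reps = int(np.ceil(n_high / width))
    src_bins = np.tile(np.arange(kc - width, kc), reps)[:n_high]
    translated = spec[src_bins]

    # Envelope extrapolation: least-squares line through the low-band
    # log-magnitude spectrum (dB versus Hz), extended past the cutoff.
    mag_db = 20.0 * np.log10(np.abs(spec[1:kc]) + 1e-12)
    slope, _intercept = np.polyfit(freqs[1:kc], mag_db, 1)

    # Scale each translated bin by the change in fitted envelope level
    # between its source frequency and its destination frequency.
    gain_db = slope * (freqs[kc:] - freqs[src_bins])
    out = spec.copy()
    out[kc:] = translated * 10.0 ** (gain_db / 20.0)
    return np.fft.irfft(out, n=n)

# Usage: band-limit white noise to 4 kHz, then regenerate the upper band.
rng = np.random.default_rng(0)
fs, n = 16000, 4096
noise_spec = np.fft.rfft(rng.standard_normal(n))
noise_spec[np.fft.rfftfreq(n, 1.0 / fs) >= 4000.0] = 0.0
lowpassed = np.fft.irfft(noise_spec, n=n)
restored = extend_bandwidth(lowpassed, fs, 4000.0)
```

The sketch leaves the bins below the cutoff untouched, so the original content is preserved and only the missing band is synthesized.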
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
This paper presents a configurable version of Extreme Bandwidth Extension
Network (EBEN), a Generative Adversarial Network (GAN) designed to improve
audio captured with body-conduction microphones. We show that although these
microphones significantly reduce environmental noise, this insensitivity to
ambient noise happens at the expense of the bandwidth of the speech signal
acquired by the wearer of the devices. The obtained captured signals therefore
require the use of signal enhancement techniques to recover the full-bandwidth
speech. EBEN leverages a configurable multiband decomposition of the raw
captured signal. This decomposition allows the data time domain dimensions to
be reduced and the full band signal to be better controlled. The multiband
representation of the captured signal is processed through a U-Net-like model,
which combines feature and adversarial losses to generate an enhanced speech
signal. We also benefit from this original representation in the proposed
configurable discriminators architecture. The configurable EBEN approach can
achieve state-of-the-art enhancement results on synthetic data with a
lightweight generator that allows real-time processing.
Comment: Accepted in IEEE/ACM Transactions on Audio, Speech and Language
Processing on 14/08/202
VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance
Restoring degraded music signals is essential to enhance audio quality for
downstream music manipulation. Recent diffusion-based music restoration methods
have demonstrated impressive performance, and among them, diffusion posterior
sampling (DPS) stands out given its intrinsic properties, making it versatile
across various restoration tasks. In this paper, we identify potential
issues that can degrade the performance of current DPS-based methods and
introduce ways to mitigate them, inspired by diverse diffusion guidance
techniques including the RePaint (RP) strategy and the Pseudoinverse-Guided
Diffusion Models ($\Pi$GDM). We demonstrate our methods for the vocal
declipping and bandwidth extension tasks under various levels of distortion and
cutoff frequency, respectively. In both tasks, our methods outperform the
current DPS-based music restoration benchmarks. We refer to
http://carlosholivan.github.io/demos/audio-restoration-2023.html for
examples of the restored audio samples
Efficient Multiband Algorithms for Blind Source Separation
The problem of blind separation refers to recovering original signals, called source signals, from the mixed signals, called observation signals, in a reverberant environment. The mixture is a function of a sequence of original speech signals mixed in a reverberant room. The objective is to separate mixed signals to obtain the original signals without degradation and without prior information of the features of the sources. The strategy used to achieve this objective is to use multiple bands that work at a lower rate, have less computational cost and a quicker convergence than the conventional scheme. Our motivation is the competitive results of unequal-passbands scheme applications, in terms of the convergence speed. The objective of this research is to improve unequal-passbands schemes by improving the speed of convergence and reducing the computational cost. The first proposed work is a novel maximally decimated unequal-passbands scheme. This scheme uses multiple bands that make it work at a reduced sampling rate, and low computational cost. An adaptation approach is derived with an adaptation step that improved the convergence speed. The performance of the proposed scheme was measured in different ways. First, the mean square errors of various bands are measured and the results are compared to a maximally decimated equal-passbands scheme, which is currently the best performing method. The results show that the proposed scheme has a faster convergence rate than the maximally decimated equal-passbands scheme. Second, when the scheme is tested for white and coloured inputs using a low number of bands, it does not yield good results; but when the number of bands is increased, the speed of convergence is enhanced. Third, the scheme is tested for quick changes. It is shown that the performance of the proposed scheme is similar to that of the equal-passbands scheme. Fourth, the scheme is also tested in a stationary state.
The experimental results confirm the theoretical work. For more challenging scenarios, an unequal-passbands scheme with over-sampled decimation is proposed; the greater the number of bands, the more efficient the separation. The results are compared to the currently best performing method. Second, an experimental comparison is made between the proposed multiband scheme and the conventional scheme. The results show that the convergence speed and the signal-to-interference ratio of the proposed scheme are higher than those of the conventional scheme, and the computation cost is lower than that of the conventional scheme
DESIGN AND EVALUATION OF HARMONIC SPEECH ENHANCEMENT AND BANDWIDTH EXTENSION
Improving the quality and intelligibility of speech signals continues to be an important topic in mobile communications and hearing aid applications. This thesis explored the possibilities of improving the quality of corrupted speech by cascading a log Minimum Mean Square Error (logMMSE) noise reduction system with a Harmonic Speech Enhancement (HSE) system. In HSE, an adaptive comb filter is deployed to harmonically filter the useful speech signal and suppress the noisy components to the noise floor. A Bandwidth Extension (BWE) algorithm was applied to the enhanced speech for further improvements in speech quality. Performance of this algorithm combination was evaluated using objective speech quality metrics across a variety of noisy and reverberant environments. Results showed that the logMMSE and HSE combination enhanced the speech quality in any reverberant environment and in the presence of multi-talker babble. The objective improvements associated with the BWE were found to be minimal
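The comb-filtering idea behind HSE can be illustrated with a fixed-period feedforward comb, which passes components at multiples of the pitch frequency and attenuates what lies between them. This is a minimal sketch only. The thesis uses an adaptive comb filter whose design is not given in the abstract, so the function name `comb_filter`, the fixed integer `period`, and the two-tap averaging structure are assumptions.

```python
import numpy as np

def comb_filter(x, period):
    """Hypothetical two-tap feedforward comb: y[n] = 0.5 * (x[n] + x[n - period]).

    Components whose period divides `period` samples add in phase and pass
    with unit gain; components halfway between those harmonics cancel.
    """
    x = np.asarray(x, dtype=float)
    if not 0 < period < len(x):
        raise ValueError("period must be a positive integer shorter than the signal")
    delayed = np.concatenate([np.zeros(period), x[:-period]])
    return 0.5 * (x + delayed)

# Usage: with fs = 8000 Hz and period = 80 samples the comb is tuned to 100 Hz.
fs, period = 8000, 80
t = np.arange(fs // 2) / fs
harmonic = np.sin(2 * np.pi * 100.0 * t)     # on a comb peak
in_between = np.sin(2 * np.pi * 150.0 * t)   # on a comb notch
kept = comb_filter(harmonic, period)
removed = comb_filter(in_between, period)
```

An adaptive version would update `period` over time from a pitch estimate, which is the part this sketch leaves out.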
- …