Modelling black-box audio effects with time-varying feature modulation
Deep learning approaches for black-box modelling of audio effects have shown
promise; however, most existing work focuses on nonlinear effects
with behaviour on relatively short time-scales, such as guitar amplifiers and
distortion. While recurrent and convolutional architectures can theoretically
be extended to capture behaviour at longer time scales, we show that simply
scaling the width, depth, or dilation factor of existing architectures does not
result in satisfactory performance when modelling audio effects such as fuzz
and dynamic range compression. To address this, we propose the integration of
time-varying feature-wise linear modulation into existing temporal
convolutional backbones, an approach that enables learnable adaptation of the
intermediate activations. We demonstrate that our approach more accurately
captures long-range dependencies for a range of fuzz and compressor
implementations across both time and frequency domain metrics. We provide sound
examples, source code, and pretrained models to facilitate reproducibility.
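At its core, the time-varying feature-wise linear modulation described above scales and shifts each channel of the intermediate activations with time-dependent coefficients. A minimal numpy sketch of that operation (the function name, shapes, and the fixed gamma/beta values are illustrative assumptions, not the paper's code):

```python
import numpy as np

def film_modulate(x, gamma, beta):
    """Time-varying feature-wise linear modulation (FiLM).

    x:     (channels, time) intermediate activations of a TCN layer
    gamma: (channels, time) time-varying scale from a conditioning network
    beta:  (channels, time) time-varying shift from a conditioning network
    """
    return gamma * x + beta

# Toy example: 4 channels, 8 time steps. In the model the gamma/beta
# trajectories are learned; here they are fixed for illustration.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
gamma = np.full((4, 8), 2.0)  # hypothetical learned scale
beta = np.zeros((4, 8))       # hypothetical learned shift
y = film_modulate(x, gamma, beta)
```

Because gamma and beta vary along the time axis, the same backbone layer can behave differently at different points in the signal, which is what allows the network to track behaviour on longer time scales.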
Differentiable Allpass Filters for Phase Response Estimation and Automatic Signal Alignment
Virtual analog (VA) audio effects are increasingly based on neural networks
and deep learning frameworks. Due to the underlying black-box methodology, a
successful model will learn to approximate the data it is presented, including
potential errors such as latency and audio dropouts as well as non-linear
characteristics and frequency-dependent phase shifts produced by the hardware.
The latter is of particular interest as the learned phase-response might cause
unwanted audible artifacts when the effect is used for creative processing
techniques such as dry-wet mixing or parallel compression. To overcome these
artifacts we propose differentiable signal processing tools and deep
optimization structures for automatically tuning all-pass filters to predict
the phase response of different VA simulations, and align processed signals
that are out of phase. The approaches are assessed using objective metrics
while listening tests evaluate their ability to enhance the quality of parallel
path processing techniques. Ultimately, an over-parameterized, BiasNet-based,
all-pass model is proposed for the optimization problem under consideration,
resulting in models that can estimate all-pass filter coefficients to align a
dry signal with its affected, wet, equivalent.
Comment: Collaboration done while interning/employed at Native Instruments. Accepted for publication in Proc. DAFX'23, Copenhagen, Denmark, September 2023. Sound examples at https://abargum.github.io
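A first-order all-pass section illustrates the property these filters exploit: a unit magnitude response with a frequency-dependent phase shift, so a tuned cascade can match a phase response without colouring the magnitude spectrum. A numpy sketch (the coefficient value and signal length are arbitrary choices for illustration, not taken from the paper):

```python
import numpy as np

def first_order_allpass(x, a):
    """First-order all-pass filter, H(z) = (a + z^-1) / (1 + a*z^-1).
    Difference equation: y[n] = a*x[n] + x[n-1] - a*y[n-1]."""
    y = np.zeros(len(x))
    x_prev = 0.0
    y_prev = 0.0
    for n in range(len(x)):
        y[n] = a * x[n] + x_prev - a * y_prev
        x_prev = x[n]
        y_prev = y[n]
    return y

# Impulse response: the magnitude spectrum should be (numerically) flat,
# while the phase varies with frequency.
impulse = np.zeros(256)
impulse[0] = 1.0
h = first_order_allpass(impulse, a=0.5)
mag = np.abs(np.fft.rfft(h))
```

Because the difference equation is differentiable in the coefficient `a`, such sections can sit inside a gradient-based optimization loop, which is the premise behind tuning them to predict a VA model's phase response.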
Adversarial Guitar Amplifier Modelling With Unpaired Data
We propose an audio effects processing framework that learns to emulate a
target electric guitar tone from a recording. We train a deep neural network
using an adversarial approach, with the goal of transforming the timbre of a
guitar into the timbre of another guitar after audio effects processing has
been applied, for example, by a guitar amplifier. The model training requires
no paired data, and the resulting model emulates the target timbre well whilst
being capable of real-time processing on a modern personal computer. To verify
our approach we present two experiments, one which carries out unpaired
training using paired data, allowing us to monitor training via objective
metrics, and another that uses fully unpaired data, corresponding to a
realistic scenario where a user wants to emulate a guitar timbre only using
audio data from a recording. Our listening test results confirm that the models
are perceptually convincing.
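Adversarial training of this kind pits a generator (the effect emulator) against a discriminator that scores target-tone recordings versus generated audio. As one common formulation, a least-squares GAN objective can be sketched in numpy (the paper's exact loss is not reproduced here; this is an illustrative stand-in):

```python
import numpy as np

def lsgan_losses(d_real, d_fake):
    """Least-squares GAN objectives.

    d_real: discriminator scores on recordings with the target timbre
    d_fake: discriminator scores on the generator's processed audio
    """
    d_loss = 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)
    g_loss = 0.5 * np.mean((d_fake - 1.0) ** 2)
    return d_loss, g_loss
```

The discriminator minimises `d_loss` (pushing real scores toward 1 and fake scores toward 0) while the generator minimises `g_loss` (pushing fake scores toward 1); since the discriminator only compares distributions of real and generated audio, no paired input/target examples are required.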
Neural modeling of magnetic tape recorders
The sound of magnetic recording media, such as open-reel and cassette tape recorders, is still sought after by today's sound practitioners due to the imperfections embedded in the physics of the magnetic recording process. This paper proposes a method for digitally emulating this character using neural networks. The signal chain of the proposed system consists of three main components: the hysteretic nonlinearity and filtering jointly produced by the magnetic recording process as well as the record and playback amplifiers, the fluctuating delay originating from the tape transport, and the combined additive noise component from various electromagnetic origins. In our approach, the hysteretic nonlinear block is modeled using a recurrent neural network, while the delay trajectories and the noise component are generated using separate diffusion models, which employ U-net deep convolutional neural networks. According to the conducted objective evaluation, the proposed architecture faithfully captures the character of the magnetic tape recorder. The results of this study can be used to construct virtual replicas of vintage sound recording devices with applications in music production and audio antiquing tasks.
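The fluctuating delay from the tape transport (wow and flutter) can be emulated by reading from the input at a time-varying fractional position. A linear-interpolation sketch in numpy (in the paper the delay trajectories come from a diffusion model; here the trajectory is just an input array, and the function name is illustrative):

```python
import numpy as np

def modulated_delay(x, delay_traj):
    """Apply a time-varying fractional delay with linear interpolation.

    x:          input signal
    delay_traj: delay in samples at each output index (may be fractional)
    """
    y = np.zeros(len(x))
    for n in range(len(x)):
        pos = n - delay_traj[n]       # read position in the input
        i = int(np.floor(pos))
        frac = pos - i
        if 0 <= i and i + 1 < len(x):  # zero outside the valid range
            y[n] = (1.0 - frac) * x[i] + frac * x[i + 1]
    return y
```

Slowly varying `delay_traj` values produce the characteristic pitch wobble, since a changing delay locally compresses or stretches the waveform in time.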
Deep Learning for Black-Box Modeling of Audio Effects
Virtual analog modeling of audio effects consists of emulating the sound of an audio processor reference device. This digital simulation is normally done by designing mathematical models of these systems. It is often difficult because it seeks to accurately model all components within the effect unit, which usually contains various nonlinearities and time-varying components. Most existing methods for audio effects modeling are either simplified or optimized to a very specific circuit or type of audio effect and cannot be efficiently translated to other types of audio effects. Recently, deep neural networks have been explored as black-box modeling strategies to solve this task, i.e., by using only input-output measurements. We analyse different state-of-the-art deep learning models based on convolutional and recurrent neural networks as well as feedforward WaveNet architectures, and we also introduce a new model based on the combination of the aforementioned models. Through objective perceptual-based metrics and subjective listening tests we explore the performance of these models when modeling various analog audio effects. Thus, we show virtual analog models of nonlinear effects, such as a tube preamplifier; nonlinear effects with memory, such as a transistor-based limiter; and nonlinear time-varying effects, such as the rotating horn and rotating woofer of a Leslie speaker cabinet.
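For the feedforward WaveNet-style models mentioned above, how much past context the network sees follows from a simple receptive-field calculation over the dilated convolution stack. A sketch (the layer count, kernel size, and doubling dilation pattern below are typical WaveNet choices, not the paper's exact configuration):

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked dilated causal convolutions:
    rf = 1 + sum over layers of (kernel_size - 1) * dilation."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Example: 10 layers, kernel size 3, dilations doubling 1, 2, 4, ..., 512
rf = receptive_field(3, [2 ** i for i in range(10)])
```

At a 44.1 kHz sample rate, a receptive field of 2047 samples corresponds to roughly 46 ms of context, which bounds the time scales of effect behaviour such a feedforward model can capture and motivates comparing it against recurrent architectures.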