Differentiable Artificial Reverberation
Artificial reverberation (AR) models play a central role in various audio
applications. Therefore, estimating the AR model parameters (ARPs) of a target
reverberation is a crucial task. Although a few recent deep-learning-based
approaches have shown promising performance, their non-end-to-end training
scheme prevents them from fully exploiting the potential of deep neural
networks. This motivates the introduction of differentiable artificial
reverberation (DAR) models, which allow loss gradients to be back-propagated end-to-end.
However, implementing the AR models with their difference equations "as is" in
a deep-learning framework severely bottlenecks training speed when
executed on a parallel processor such as a GPU, due to their infinite impulse
response (IIR) components. We tackle this problem by replacing the IIR filters
with finite impulse response (FIR) approximations obtained with the frequency-sampling
method (FSM). Using the FSM, we implement three DAR models -- differentiable
Filtered Velvet Noise (FVN), Advanced Filtered Velvet Noise (AFVN), and
Feedback Delay Network (FDN). For each AR model, we train its ARP estimation
networks for the analysis-synthesis (RIR-to-ARP) and blind estimation
(reverberant-speech-to-ARP) tasks in an end-to-end manner with its DAR model
counterpart. Experimental results show that the proposed method achieves
consistent performance improvements over the non-end-to-end approaches in both
objective metrics and subjective listening tests.
Comment: Manuscript submitted to TASL
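The FIR-for-IIR substitution at the heart of the method can be sketched in a few lines of NumPy. This is a minimal illustration of the frequency-sampling idea, not the paper's implementation; the function name `fsm_fir` and the one-pole example are ours. Note that sampling the transfer function at `n_taps` points time-aliases the true impulse response, so `n_taps` must be long enough for the IIR tail to have decayed.

```python
import numpy as np

def fsm_fir(b, a, n_taps):
    """Frequency-sampling method: evaluate H(z) = B(z)/A(z) at n_taps
    uniformly spaced points on the unit circle, then inverse-DFT to get
    an n_taps-tap FIR approximation of the IIR filter."""
    H = np.fft.fft(b, n_taps) / np.fft.fft(a, n_taps)
    return np.fft.ifft(H).real

# One-pole IIR lowpass y[n] = x[n] + 0.5*y[n-1]; its true impulse
# response is 0.5**k, which has decayed well within 64 samples.
h = fsm_fir([1.0], [1.0, -0.5], 64)
```

Because the FIR taps are produced by FFT operations, the whole substitution stays differentiable and GPU-friendly, which is what unblocks end-to-end training.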
Frequency domain variant of Velvet noise and its application to acoustic measurements
We propose a new family of test signals for acoustic measurements such as
impulse response, nonlinearity, and the effects of background noise. The
proposed family addresses difficulties in the existing families, namely the Swept-Sine
(SS) and pseudo-random noise such as the maximum length sequence (MLS). The
proposed family uses the frequency domain variant of the Velvet noise (FVN) as
its building block. An FVN is an impulse response of an all-pass filter and
yields the unit impulse when convolved with the time-reversed version of
itself. In this respect, FVN is a member of the time-stretched pulse (TSP) in
the broadest sense. The high degree of freedom in designing an FVN opens a vast
range of applications in acoustic measurement. We introduce the following
applications and their specific procedures, among other possibilities.
a) Spectrum shaping adaptive to background noise. b) Simultaneous
measurement of impulse responses of multiple acoustic paths. c) Simultaneous
measurement of linear and nonlinear components of an acoustic path. d)
An automatic procedure for time-axis alignment of the source and the receiver when
they use independent clocks in acoustic impulse response measurement. We
implemented a reference measurement tool equipped with all these procedures.
The MATLAB source code and related materials are open-sourced and placed in a
GitHub repository.
Comment: 10 pages, 14 figures, APSIPA ASC 2019. arXiv admin note: text overlap
with arXiv:1806.0681
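The defining all-pass property stated above, that the signal convolved with its own time reversal yields the unit impulse, can be checked numerically. The sketch below builds a generic random-phase all-pass sequence rather than an actual velvet-noise construction, so it illustrates only the stated property, not the FVN design itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
# A unit-magnitude, random-phase spectrum with Hermitian symmetry gives
# a real all-pass impulse response (a stand-in for an FVN here).
phase = rng.uniform(-np.pi, np.pi, n // 2 - 1)
H = np.ones(n, dtype=complex)
H[1:n // 2] = np.exp(1j * phase)
H[n // 2 + 1:] = np.conj(H[1:n // 2][::-1])
h = np.fft.ifft(H).real

# Time reversal modulo n, then circular convolution via the DFT:
# |H| = 1 everywhere, so the product collapses to the unit impulse.
rev = np.roll(h[::-1], 1)
out = np.fft.ifft(np.fft.fft(h) * np.fft.fft(rev)).real
```

The same identity is what lets the measurement tool recover an impulse response by convolving the recorded response with the time-reversed test signal.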
Simultaneous Measurement of Multiple Acoustic Attributes Using Structured Periodic Test Signals Including Music and Other Sound Materials
We introduce a general framework for measuring acoustic properties such as the
linear time-invariant (LTI) response, signal-dependent time-invariant (SDTI)
component, and random and time-varying (RTV) component simultaneously using
structured periodic test signals. The framework also enables music pieces and
other sound materials to serve as test signals by "safeguarding" them with slight
deterministic "noise." Measurements using the swept-sine, MLS (Maximum Length
Sequence), and their variants are special cases of the proposed framework. We
implemented interactive and real-time measuring tools based on this framework
and made them open-source. Furthermore, we applied this framework to assess
pitch extractors objectively.
Comment: 8 pages, 17 figures, accepted for APSIPA ASC 202
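The core idea, separating the time-invariant response from the random component by repeating a periodic test signal, can be sketched as follows. The toy system (a short FIR filter plus additive noise) and all variable names are our own illustration, not the paper's framework: averaging the response periods estimates the time-invariant part, and the across-period variance estimates the random component's power.

```python
import numpy as np

rng = np.random.default_rng(1)
period, reps = 256, 32
x = rng.standard_normal(period)             # one period of the test signal
x_rep = np.tile(x, reps)

# Toy system: a short FIR filter (the LTI part) plus additive noise
# (standing in for the random/time-varying part).
h_true = np.array([1.0, 0.5, 0.25])
y = np.convolve(x_rep, h_true)[:period * reps]
y += 0.05 * rng.standard_normal(y.size)

periods = y.reshape(reps, period)
y_lti = periods[1:].mean(axis=0)            # averaging removes the noise
rtv_power = periods[1:].var(axis=0).mean()  # leftover = random power

# In steady state each period equals the circular convolution of x and h:
ref = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h_true, period)).real
```

The first period is skipped because it still contains the filter's transient; every later period carries the identical deterministic response, which is what makes the average/variance split work.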
A pilot study on discriminative power of features of superficial venous pattern in the hand
The goal of the project is to develop an automatic way to identify and represent the superficial vasculature of the back of the hand and to investigate its discriminative power as a biometric feature.
A prototype of a system that extracts the superficial venous pattern from infrared images of the back of the hand is described. Enhancement algorithms are used to compensate for the lack of contrast in the infrared images. To trace the veins, a vessel-tracking technique is applied, producing binary masks of the superficial venous tree. Subsequently, a method to estimate the calibre and length of the blood vessels and the locations and angles of vessel junctions is presented. The discriminative power of these features is studied, both independently and jointly, by considering two feature vectors.
Pattern matching of two vasculature maps is performed to investigate the uniqueness of the vessel network.
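Once a binary mask and a centreline are available, the mean-calibre estimate reduces to simple pixel counting. The toy example below uses a hand-drawn straight vessel with a known centreline; the actual pipeline derives both from image enhancement and vessel tracking, which we do not reproduce here.

```python
import numpy as np

# Synthetic binary mask: a straight "vessel" 3 px wide and 40 px long.
mask = np.zeros((9, 40), dtype=bool)
mask[3:6, :] = True

# With the centreline (skeleton) known, mean calibre follows from
# area / centreline length, the same ratio used on tracked vessels.
skeleton = np.zeros_like(mask)
skeleton[4, :] = True               # centreline of the toy vessel
length = skeleton.sum()             # pixels along the centreline
mean_calibre = mask.sum() / length  # area / length, in pixels
```

Junction locations and angles would be read off the skeleton in the same spirit, by locating pixels with more than two skeleton neighbours.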
Proceedings of the 7th Sound and Music Computing Conference
Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010
Music Production Behaviour Modelling
The new millennium has seen an explosion of computational approaches to the study of music production, due in part to the decreasing cost of computation and the proliferation of digital music production techniques. The rise of digital recording equipment, MIDI, digital audio workstations (DAWs), and software plugins for audio effects led to the digital capture of various processes in music production. This discretization of traditionally analogue methods allowed for the development of intelligent music production, which uses machine learning to numerically characterize and automate portions of the music production process. One algorithm from the field, referred to as "reverse engineering a multitrack mix", can recover the audio effects processing used to transform a multitrack recording into a mixdown in the absence of information about how the mixdown was achieved. This thesis improves on this method of reverse engineering a mix by leveraging recent advancements in machine learning for audio. Using the differentiable digital signal processing paradigm, greybox modules for gain, panning, equalisation, artificial reverberation, memoryless waveshaping distortion, and dynamic range compression are presented. These modules are then connected in a mixing chain and are optimized to learn the effects used in a given mixdown. Both objective and perceptual metrics are presented to measure the performance of these various modules in isolation and within a full mixing chain. Ultimately, a fully differentiable mixing chain is presented that outperforms previously proposed methods to reverse engineer a mix. Directions for future work are proposed to improve the characterization of multitrack mixing behaviours.
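The reverse-engineering idea can be conveyed with the simplest differentiable module, a single gain, optimized to match a given mixdown. This is a sketch of the paradigm only: the gradient is written out by hand, and the thesis optimizes full chains of gain, panning, EQ, reverberation, distortion, and compression modules with automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)   # one "multitrack" stem
mix = 0.7 * x                   # mixdown made with an unknown gain of 0.7

# Differentiable gain module y = g*x with loss L = mean((g*x - mix)**2).
# Its gradient is dL/dg = 2*mean((g*x - mix)*x); descend on it.
g, lr = 1.0, 0.1
for _ in range(200):
    grad = 2.0 * np.mean((g * x - mix) * x)
    g -= lr * grad
```

Because every module in the chain exposes such gradients, the whole mixdown-matching objective can be optimized end-to-end rather than module by module.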
Deep Learning for Audio Effects Modeling
PhD Thesis.
Audio effects modeling is the process of emulating an audio effect unit, seeking
to recreate the sound, behaviour, and main perceptual features of an analog reference
device. Audio effect units are analog or digital signal processing systems
that transform certain characteristics of the sound source. These transformations
can be linear or nonlinear, time-invariant or time-varying, and with short-term or
long-term memory. The most typical audio effect transformations are based on dynamics,
such as compression; tone, such as distortion; frequency, such as equalization;
and time, such as artificial reverberation or modulation-based audio effects.
The digital simulation of these audio processors is normally done by designing
mathematical models of these systems. This is often difficult because it seeks to
accurately model all components within the effect unit, which usually contains
mechanical elements together with nonlinear and time-varying analog electronics.
Most existing methods for audio effects modeling are either simplified or optimized
to a very specific circuit or type of audio effect and cannot be efficiently
translated to other types of audio effects.
This thesis aims to explore deep learning architectures for music signal processing
in the context of audio effects modeling. We investigate deep neural networks
as black-box modeling strategies to solve this task, i.e. by using only input-output
measurements. We propose different DSP-informed deep learning models to emulate
each type of audio effect transformation.
Through objective perceptual-based metrics and subjective listening tests, we
explore the performance of these models when modeling various analog audio effects.
Also, we analyze how the given tasks are accomplished and what the models
are actually learning. We show virtual analog models of nonlinear effects, such as
a tube preamplifier; nonlinear effects with memory, such as a transistor-based limiter;
and electromechanical nonlinear time-varying effects, such as a Leslie speaker
cabinet and plate and spring reverberators.
We report that the proposed deep learning architectures improve on the
state of the art in black-box modeling of audio effects, and directions for
future work are given.
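A minimal black-box experiment in the same spirit, using only input-output measurements: fit a memoryless odd-polynomial waveshaper to samples of an "unknown" nonlinearity. The thesis uses DSP-informed deep networks; the least-squares polynomial below is a deliberately simple stand-in to show that no circuit knowledge is needed, only measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 2000)  # measured inputs to the "unknown" unit
y = np.tanh(2.0 * x)              # measured outputs (a tanh-like clipper)

# Black-box model: odd polynomial waveshaper c1*x + c3*x**3 + c5*x**5,
# fitted by least squares from the input-output pairs alone.
A = np.stack([x, x ** 3, x ** 5], axis=1)
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
rms_err = np.sqrt(np.mean((A @ coef - y) ** 2))
```

A memoryless fit like this fails for effects with memory (limiters, Leslie cabinets, reverberators), which is exactly why the thesis turns to recurrent and convolutional architectures for those cases.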
Proceedings of the Scientific-Practical Conference "Research and Development - 2016"
talent management; sensor arrays; automatic speech recognition; dry separation technology; oil production; oil waste; laser technology