Musical Approaches for Working with Time-Delayed Feedback Networks
(Abstract to follow)
Re-Sonification of Objects, Events, and Environments
Digital sound synthesis allows the creation of a great variety of sounds. Focusing on interesting or ecologically valid sounds for music, simulation, aesthetics, or other purposes limits the otherwise vast digital audio palette. In this work, methods of sound synthesis by re-sonification are considered. Re-sonification, herein, refers to the general process of analyzing, possibly transforming, and resynthesizing or reusing recorded sounds in meaningful ways, to convey information. Applied to soundscapes, re-sonification is presented as a means of conveying activity within an environment. Applied to the sounds of objects, this work examines modeling the perception of objects as well as their physical properties and the ability to simulate interactive events with such objects. To create soundscapes to re-sonify geographic environments, a method of automated soundscape design is presented. Using recorded sounds that are classified based on acoustic, social, semantic, and geographic information, this method produces stochastically generated soundscapes to re-sonify selected geographic areas. Drawing on prior knowledge, local sounds and those deemed similar comprise a locale's soundscape. In the context of re-sonifying events, this work examines processes for modeling and estimating the excitations of sounding objects. These include plucking, striking, rubbing, and any interaction that imparts energy into a system, affecting the resultant sound. A method of estimating a linear system's input, constrained to a signal-subspace, is presented and applied toward improving the estimation of percussive excitations for re-sonification. To work toward robust recording-based modeling and re-sonification of objects, new implementations of banded waveguide (BWG) models are proposed for object modeling and sound synthesis.
Previous implementations of BWGs use arbitrary model parameters and may produce a range of simulations that do not match digital waveguide or modal models of the same design. Subject to linear excitations, some models proposed here behave identically to other equivalently designed physical models. Under nonlinear interactions, such as bowing, many of the proposed implementations exhibit improvements in the attack characteristics of synthesized sounds.
Dissertation/Thesis. Ph.D. Electrical Engineering 201
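The banded-waveguide idea described above can be illustrated with a minimal sketch: each resonant mode of the object gets its own feedback loop, pairing a delay line with a bandpass filter tuned to that mode, and the band outputs are summed. This is an illustrative simplification, not any of the implementations proposed in the thesis; the mode frequencies, pole radius, and loop gains below are arbitrary example values.

```python
import numpy as np

def banded_waveguide(excitation, sr, mode_freqs, loop_gains, r=0.995):
    """Minimal banded-waveguide sketch: one band-limited feedback loop per
    resonant mode, each pairing a delay line with a two-pole bandpass filter."""
    n = len(excitation)
    out = np.zeros(n)
    for f0, g in zip(mode_freqs, loop_gains):
        L = max(2, int(round(sr / f0)))        # loop delay ~ one period of the mode
        w0 = 2.0 * np.pi * f0 / sr
        a1, a2 = -2.0 * r * np.cos(w0), r * r  # two-pole resonator centred on f0
        z = np.exp(1j * w0)
        b0 = abs(1.0 + a1 / z + a2 / z**2)     # normalise to unit gain at f0
        delay = np.zeros(L)
        y1 = y2 = 0.0
        idx = 0
        for i in range(n):
            x = excitation[i] + g * delay[idx]  # close the feedback loop (g < 1)
            y = b0 * x - a1 * y1 - a2 * y2      # bandpass keeps only this mode
            y2, y1 = y1, y
            delay[idx] = y                      # write back into the delay line
            idx = (idx + 1) % L
            out[i] += y
    return out
```

Driving the model with an impulse (a crude "strike") yields a decaying sum of the chosen modes; a linear excitation like this is exactly the case in which, per the abstract, properly designed BWGs should match equivalent modal models.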
Deep Learning for Audio Effects Modeling
PhD Thesis. Audio effects modeling is the process of emulating an audio effect unit: it seeks to recreate the sound, behaviour, and main perceptual features of an analog reference device. Audio effect units are analog or digital signal processing systems that transform certain characteristics of the sound source. These transformations can be linear or nonlinear, time-invariant or time-varying, and with short-term or long-term memory. The most typical audio effect transformations are based on dynamics, such as compression; tone, such as distortion; frequency, such as equalization; and time, such as artificial reverberation or modulation-based audio effects.
The digital simulation of these audio processors is normally done by designing mathematical models of these systems. This is often difficult because it requires accurately modeling all components within the effect unit, which usually contains mechanical elements together with nonlinear and time-varying analog electronics. Most existing methods for audio effects modeling are either simplified or optimized for a very specific circuit or type of audio effect and cannot be efficiently translated to other types of audio effects.
This thesis explores deep learning architectures for music signal processing in the context of audio effects modeling. We investigate deep neural networks as black-box modeling strategies, i.e. using only input-output measurements, and we propose different DSP-informed deep learning models to emulate each type of audio effect transformation.
Through objective perceptual-based metrics and subjective listening tests, we evaluate the performance of these models on various analog audio effects. We also analyze how the given tasks are accomplished and what the models are actually learning. We present virtual analog models of nonlinear effects, such as a tube preamplifier; nonlinear effects with memory, such as a transistor-based limiter; and electromechanical nonlinear time-varying effects, such as a Leslie speaker cabinet and plate and spring reverberators.
We report that the proposed deep learning architectures represent an improvement over the state of the art in black-box modeling of audio effects, and we give directions for future work.
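The black-box idea above — learning an effect purely from input-output measurements — can be sketched in a few lines of NumPy. The example below is not one of the thesis's DSP-informed architectures: it fits a tiny one-hidden-layer network to measurements of a hypothetical tanh-style saturator standing in for an analog device, with a short input window supplying a little short-term memory. The context length, layer size, and learning rate are arbitrary example choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def reference_effect(x):
    """Stand-in 'device under test', observed only through input-output pairs."""
    return np.tanh(3.0 * x)            # memoryless tube-style saturation

# Measurements: sliding windows of the input predict the current output sample.
N, ctx, H = 4096, 4, 16
x = rng.uniform(-1.0, 1.0, N)
y = reference_effect(x)
X = np.stack([x[i:i + ctx] for i in range(N - ctx)])
T = y[ctx - 1:N - 1]                   # target aligned with each window's last sample

# One-hidden-layer MLP trained by full-batch gradient descent on the MSE.
W1 = rng.normal(0.0, 0.5, (ctx, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.5, (H, 1));   b2 = np.zeros(1)
lr = 0.1
for _ in range(3000):
    h = np.tanh(X @ W1 + b1)
    pred = (h @ W2 + b2).ravel()
    g = (2.0 * (pred - T) / len(T))[:, None]   # dMSE/dpred
    gW2 = h.T @ g;  gb2 = g.sum(0)
    gh = (g @ W2.T) * (1.0 - h ** 2)           # backprop through tanh
    gW1 = X.T @ gh; gb1 = gh.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

mse = np.mean((pred - T) ** 2)
```

After training, the network reproduces the saturator's curve on in-range inputs; the thesis's contribution lies in scaling this black-box principle, with DSP-informed architectures, to effects with long-term memory and time-varying behaviour.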
16th Sound and Music Computing Conference SMC 2019 (28–31 May 2019, Malaga, Spain)
The 16th Sound and Music Computing Conference (SMC 2019) took place in Malaga, Spain, 28-31 May 2019, organized by the Application of Information and Communication Technologies (ATIC) research group of the University of Malaga (UMA). The associated SMC 2019 Summer School took place 25-28 May 2019, and the First International Day of Women in Inclusive Engineering, Sound and Music Computing Research (WiSMC 2019) took place on 28 May 2019. The SMC 2019 topics of interest included a wide selection of topics related to acoustics, psychoacoustics, music, technology for music, audio analysis, musicology, sonification, music games, machine learning, serious games, immersive audio, sound synthesis, and more.
UNRAVEL: Acoustic and Electronic Resynthesis
UNRAVEL is a work for alto saxophone and interactive electronics. The accompanying text examines works for saxophone and electro-acoustic music; analyzes modes of interactivity using Robert Rowe's guidelines, with sonogram, score, and programming examples; investigates hybrid serial-parallel signal-processing networks and their potential for timbral transformations; and explores compositional working methods, particularly as related to electro-acoustic music.
Selective harmonic elimination methods for a cascaded H-bridge converter
In recent years there has been an increased demand for integration of renewable energy into the electricity grid. This has increased research into power converter solutions required to integrate renewable technology into the electricity supply. One such converter is a Cascaded H-Bridge (CHB) Multilevel Converter.
Operation of such a topology requires strict control of power flow to ensure that energy is distributed equally across the converter's energy storage components. For operation at high power levels, advanced modulation methods may be required to ensure that losses due to non-ideal semiconductor switching are minimised, whilst not compromising the quality of the voltage waveform produced by the converter.
This thesis presents several low switching frequency modulation methods based on Selective Harmonic Elimination (SHE) in order to address these two operational issues. The methods presented involve manipulating the H-Bridge cell voltages of the CHB converter to control power flow. Simulated results are supported by experimental verification from a seven-level, single-phase CHB converter.
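The classical SHE formulation behind methods like these can be sketched numerically (this is the textbook formulation, not the specific power-flow-controlling methods developed in the thesis). For a quarter-wave-symmetric seven-level stepped waveform with equal cell voltages, the n-th odd harmonic amplitude is proportional to (4/(n·pi))·Σᵢ cos(n·aᵢ), so three switching angles suffice to set the fundamental and null the 5th and 7th harmonics. The Newton solver and starting guess below are example choices.

```python
import numpy as np

def she_angles(m, guess=(0.2, 0.6, 1.0), iters=50):
    """Newton's method for the three SHE switching angles (radians) of a
    seven-level stepped waveform: set the per-cell fundamental to m and
    eliminate the 5th and 7th harmonics (quarter-wave symmetry assumed)."""
    a = np.asarray(guess, dtype=float)
    for _ in range(iters):
        F = np.array([
            np.cos(a).sum() - 3.0 * m,   # fundamental target: sum cos(a_i) = 3m
            np.cos(5.0 * a).sum(),       # 5th harmonic nulled
            np.cos(7.0 * a).sum(),       # 7th harmonic nulled
        ])
        J = np.vstack([
            -np.sin(a),                  # dF1/da_j
            -5.0 * np.sin(5.0 * a),      # dF2/da_j
            -7.0 * np.sin(7.0 * a),      # dF3/da_j
        ])
        a = a - np.linalg.solve(J, F)
        a = np.clip(a, 1e-3, np.pi / 2 - 1e-3)  # stay in the first quarter-cycle
    return np.sort(a)
```

For a modulation index of m = 0.8, this converges to angles of roughly 11.5, 28.7, and 57.1 degrees. Eliminating specific low-order harmonics at such a low switching frequency is what lets a CHB keep switching losses down without degrading the voltage waveform, as the abstract describes.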
An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony
In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user's speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk.
Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
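The core NMF speaker-extraction step described above — decomposing the near-end microphone spectrogram onto the union of an echo basis and a pretrained speech basis — can be sketched as follows. This is illustrative only: the basis sizes, iteration count, and the Euclidean multiplicative update are example choices, and the thesis's actual basis training and STFT processing are not reproduced here.

```python
import numpy as np

def extract_near_end(V, W_echo, W_near, n_iter=500, eps=1e-9):
    """Semi-supervised NMF speaker extraction: fit activations H so that the
    magnitude spectrogram V ~ [W_echo | W_near] @ H with both bases held
    fixed, then keep the near-end part via a Wiener-style spectral mask."""
    W = np.hstack([W_echo, W_near])             # union of the two fixed bases
    H = np.full((W.shape[1], V.shape[1]), 0.5)  # nonnegative activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ (W @ H) + eps)  # Lee-Seung Euclidean update
    k = W_echo.shape[1]
    V_hat = W @ H + eps                         # full mixture reconstruction
    near = W_near @ H[k:]                       # near-end magnitude estimate
    return V * (near / V_hat)                   # mask applied to the mixture
```

On a synthetic mixture whose echo and speech components are exactly spanned by the two bases, the mask recovers the near-end magnitudes closely; real use would build W_echo from the far-end signal frame by frame, as the abstract describes.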