38 research outputs found

A review of differentiable digital signal processing for music and speech synthesis

Author: Fazekas G, Hayes B, McPherson A, Saitis C, Shier J
Publication venue: Frontiers Media
Publication date: 11/01/2024

The term “differentiable digital signal processing” describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music and speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably, which is further supported by a web book containing practical advice on differentiable synthesiser programming (https://intro2ddsp.github.io/). Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research.

A Parametric Sound Object Model for Sound Texture Synthesis

Author: Möhlmann Daniel
Publication date: 01/01/2011

This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.

3rd SC@RUG 2006 proceedings: Student Colloquium 2005-2006

Publication venue: Rijksuniversiteit Groningen. Universiteitsbibliotheek
Publication date: 01/01/2006

Dissertations of the University of Groningen
