436 research outputs found
Synthèse de textures sonores à partir de statistiques temps-fréquence
Sound textures are a wide class of sounds that includes the sound of falling rain, the hubbub of a crowd and the chirping of flocks of birds. All these sounds present an element of unpredictability which is not commonly sought after in sound synthesis, requiring the use of dedicated algorithms. However, the diverse audio properties of sound textures make designing an algorithm able to convincingly recreate varied textures a complex task. This thesis focuses on parametric sound texture synthesis. In this paradigm, a set of summary statistics is extracted from a target texture and iteratively imposed onto white noise. If the set of statistics is appropriate, the white noise is modified until it resembles the target, sounding as if it had been recorded moments later. In a first part, we propose improvements to a perceptual-based parametric method. These improvements aim at improving its synthesis of sharp and salient events, mainly by altering and simplifying its imposition process. In a second part, we adapt a parametric visual texture synthesis method based on statistics extracted by a Convolutional Neural Network (CNN) to work on sound textures. We modify the computation of its statistics to fit the properties of sound signals, alter the architecture of the CNN to best fit the audio elements present in sound textures, and use a time-frequency representation taking both magnitude and phase into account.
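A minimal sketch of the parametric paradigm described in this abstract, under illustrative assumptions (a single toy statistic, the average power spectrum, and a random stand-in target; the thesis itself uses a much richer perceptual statistic set):

```python
import torch

# Illustrative sketch only (not the thesis's method): measure summary
# statistics on a target texture, then impose them on white noise by
# gradient descent until the noise carries the same statistics.

def summary_stats(x, n_fft=512, hop=128):
    # Toy statistic: average power per frequency bin of the STFT.
    S = torch.stft(x, n_fft=n_fft, hop_length=hop,
                   window=torch.hann_window(n_fft), return_complex=True)
    return (S.abs() ** 2).mean(dim=-1)

target = torch.randn(16000)                    # stand-in for a recorded texture
stats_target = summary_stats(target).detach()  # statistics of the target

noise = torch.randn(16000, requires_grad=True) # white-noise initialization
opt = torch.optim.Adam([noise], lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = torch.mean((summary_stats(noise) - stats_target) ** 2)
    loss.backward()
    opt.step()
# After convergence, `noise` shares the target's summary statistics while
# remaining a different signal.
```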
Gradient conversion between time and frequency domains using Wirtinger calculus
Gradient descent algorithms are found in a variety of scientific fields, audio signal processing included. This paper presents a new method for converting any gradient of a cost function with respect to a signal into, or from, a gradient with respect to the spectrum of this signal: gradient descent can thus be performed indiscriminately in the time or frequency domain. For efficiency purposes, and because the gradient of a real function with respect to a complex signal does not formally exist, this work is carried out using Wirtinger calculus. An application to sound texture synthesis then experimentally validates this gradient conversion.
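A minimal numpy sketch of such a conversion, assuming a simple quadratic cost and numpy's unnormalized FFT convention (an illustration of the idea, not the paper's implementation): the Wirtinger cogradient with respect to the spectrum X = fft(x) is mapped to the ordinary gradient with respect to the real time signal x, and checked against finite differences.

```python
import numpy as np

# Cost on the spectrum: J(x) = 0.5 * sum_k |fft(x)_k - T_k|^2 (illustrative choice).
rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)                  # real time-domain signal
T = np.fft.fft(rng.standard_normal(N))      # arbitrary target spectrum

def cost(x):
    return 0.5 * np.sum(np.abs(np.fft.fft(x) - T) ** 2)

# Wirtinger cogradient dJ/d(conj(X)) of the cost seen as a function of X.
X = np.fft.fft(x)
grad_X = 0.5 * (X - T)

# Conversion frequency -> time for numpy's FFT convention:
# grad_x[n] = 2 * Re( sum_k grad_X[k] e^{+2i*pi*k*n/N} ) = 2 * N * Re( ifft(grad_X)[n] )
grad_x = 2.0 * N * np.real(np.fft.ifft(grad_X))

# Sanity check against a central finite-difference gradient.
eps = 1e-6
fd = np.array([(cost(x + eps * np.eye(N)[n]) - cost(x - eps * np.eye(N)[n])) / (2 * eps)
               for n in range(N)])
print(np.max(np.abs(grad_x - fd)))          # small, limited only by finite-difference precision
```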
[Coin: Bronze, Ariassos, Pisidia, Caracalla]
Belongs to the documentary collection: MonnGr
[Coin: Bronze, Pergamon, Mysia, Caracalla]
Belongs to the documentary collection: MonnGr
Sound texture synthesis using Convolutional Neural Networks
The following article introduces a new parametric synthesis algorithm for sound textures, inspired by existing methods used for visual textures. Using a 2D Convolutional Neural Network (CNN), a sound signal is modified until the temporal cross-correlations of the feature maps of its log-spectrogram resemble those of a target texture. We show that the resulting synthesized sound signal is both different from the original and of high quality, while reproducing singular events appearing in the original. This process is performed in the time domain, discarding the harmful phase-recovery step which usually concludes synthesis performed in the time-frequency domain. It is also straightforward and flexible, as it does not require any fine-tuning between several losses when synthesizing diverse sound textures. Synthesized spectrograms and sound signals are showcased, and a way of extending the synthesis in order to produce a sound of any length is also presented. We also discuss the choice of CNN, border effects in our synthesized signals, and possible ways of modifying the algorithm in order to improve its currently long computation time.
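A minimal PyTorch sketch of this kind of synthesis loop, under stated assumptions (a small random CNN as feature extractor, arbitrary STFT settings, a random stand-in target, and a simple channel-wise correlation of feature maps as the matched statistic); it is one plausible reading of the approach, not the authors' implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
sr, n_fft, hop = 16000, 512, 128

def log_spectrogram(x):
    # Log-magnitude STFT, shape (1, 1, freq, time); differentiable in x.
    S = torch.stft(x, n_fft=n_fft, hop_length=hop,
                   window=torch.hann_window(n_fft), return_complex=True)
    return torch.log1p(S.abs())[None, None]

# Small random 2D CNN used only as a frozen feature extractor (assumption).
cnn = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=(11, 11), padding=5), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=(5, 5), padding=2), nn.ReLU(),
)
for p in cnn.parameters():
    p.requires_grad_(False)

def feature_gram(feats):
    # Correlations between feature maps, summed over frequency and time:
    # a simplified stand-in for the cross-correlation statistics.
    b, c, f, t = feats.shape
    F = feats.reshape(c, f * t)
    return (F @ F.t()) / (f * t)

target = torch.randn(sr)                       # stand-in for a recorded texture
with torch.no_grad():
    G_target = feature_gram(cnn(log_spectrogram(target)))

x = torch.randn(sr, requires_grad=True)        # optimized directly in the time domain
opt = torch.optim.Adam([x], lr=1e-2)
for step in range(100):
    opt.zero_grad()
    loss = torch.mean((feature_gram(cnn(log_spectrogram(x))) - G_target) ** 2)
    loss.backward()
    opt.step()
```

Because the signal itself is the optimization variable, no phase-recovery step is needed after the loop: `x` is already a waveform.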
[Coin: Bronze, Antioch, Pisidia, Caracalla]
Belongs to the documentary collection: MonnGr
- âŠ