436 research outputs found
Synthèse de textures sonores à partir de statistiques temps-fréquence
Sound textures are a wide class of sounds that includes the sound of falling rain, the hubbub of a crowd and the chirping of flocks of birds. All these sounds present an element of unpredictability which is not commonly sought after in sound synthesis, requiring the use of dedicated algorithms. However, the diverse audio properties of sound textures make designing an algorithm able to convincingly recreate varied textures a complex task. This thesis focuses on parametric sound texture synthesis. In this paradigm, a set of summary statistics is extracted from a target texture and iteratively imposed onto white noise. If the set of statistics is appropriate, the white noise is modified until it resembles the target, sounding as if it had been recorded moments later. In a first part, we propose improvements to a perceptual-based parametric method. These improvements aim at improving its synthesis of sharp and salient events, mainly by altering and simplifying its imposition process. In a second part, we adapt a parametric visual texture synthesis method based on statistics extracted by a Convolutional Neural Network (CNN) to work on sound textures. We modify the computation of its statistics to fit the properties of sound signals, alter the architecture of the CNN to best fit the audio elements present in sound textures, and use a time-frequency representation taking both magnitude and phase into account.
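A minimal sketch of the parametric paradigm described in this abstract, under illustrative assumptions (a single toy statistic, the average power spectrum, and a random stand-in target; the thesis itself uses a much richer perceptual statistic set):

```python
import torch

# Illustrative sketch only (not the thesis's method): measure summary
# statistics on a target texture, then impose them on white noise by
# gradient descent until the noise carries the same statistics.

def summary_stats(x, n_fft=512, hop=128):
    # Toy statistic: average power per frequency bin of the STFT.
    S = torch.stft(x, n_fft=n_fft, hop_length=hop,
                   window=torch.hann_window(n_fft), return_complex=True)
    return (S.abs() ** 2).mean(dim=-1)

target = torch.randn(16000)                    # stand-in for a recorded texture
stats_target = summary_stats(target).detach()  # statistics of the target

noise = torch.randn(16000, requires_grad=True) # white-noise initialization
opt = torch.optim.Adam([noise], lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = torch.mean((summary_stats(noise) - stats_target) ** 2)
    loss.backward()
    opt.step()
# After convergence, `noise` shares the target's summary statistics while
# remaining a different signal.
```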
Gradient conversion between time and frequency domains using Wirtinger calculus
Gradient descent algorithms are found in a variety of scientific fields, audio signal processing included. This paper presents a new method for converting any gradient of a cost function with respect to a signal into, or from, a gradient with respect to the spectrum of this signal: gradient descent can thus be performed indiscriminately in the time or frequency domain. For efficiency purposes, and because the gradient of a real function with respect to a complex signal does not formally exist, this work is carried out using Wirtinger calculus. An application to sound texture synthesis then experimentally validates this gradient conversion.
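A minimal numpy sketch of such a conversion, assuming a simple quadratic cost and numpy's unnormalized FFT convention (an illustration of the idea, not the paper's implementation): the Wirtinger cogradient with respect to the spectrum X = fft(x) is mapped to the ordinary gradient with respect to the real time signal x, and checked against finite differences.

```python
import numpy as np

# Cost on the spectrum: J(x) = 0.5 * sum_k |fft(x)_k - T_k|^2 (illustrative choice).
rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)                  # real time-domain signal
T = np.fft.fft(rng.standard_normal(N))      # arbitrary target spectrum

def cost(x):
    return 0.5 * np.sum(np.abs(np.fft.fft(x) - T) ** 2)

# Wirtinger cogradient dJ/d(conj(X)) of the cost seen as a function of X.
X = np.fft.fft(x)
grad_X = 0.5 * (X - T)

# Conversion frequency -> time for numpy's FFT convention:
# grad_x[n] = 2 * Re( sum_k grad_X[k] e^{+2i*pi*k*n/N} ) = 2 * N * Re( ifft(grad_X)[n] )
grad_x = 2.0 * N * np.real(np.fft.ifft(grad_X))

# Sanity check against a central finite-difference gradient.
eps = 1e-6
fd = np.array([(cost(x + eps * np.eye(N)[n]) - cost(x - eps * np.eye(N)[n])) / (2 * eps)
               for n in range(N)])
print(np.max(np.abs(grad_x - fd)))          # small, limited only by finite-difference precision
```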
[Coin: Bronze, Ariassos, Pisidia, Caracalla]
Belongs to the documentary collection: MonnGr
[Coin: Bronze, Pergamon, Mysia, Caracalla]
Belongs to the documentary collection: MonnGr
Sound texture synthesis using Convolutional Neural Networks
The following article introduces a new parametric synthesis algorithm for sound textures, inspired by existing methods used for visual textures. Using a 2D Convolutional Neural Network (CNN), a sound signal is modified until the temporal cross-correlations of the feature maps of its log-spectrogram resemble those of a target texture. We show that the resulting synthesized sound signal is both different from the original and of high quality, while reproducing singular events appearing in the original. This process is performed in the time domain, discarding the harmful phase-recovery step which usually concludes synthesis performed in the time-frequency domain. It is also straightforward and flexible, as it does not require any fine-tuning between several losses when synthesizing diverse sound textures. Synthesized spectrograms and sound signals are showcased, and a way of extending the synthesis in order to produce a sound of any length is also presented. We also discuss the choice of CNN, border effects in our synthesized signals, and possible ways of modifying the algorithm in order to improve its currently long computation time.
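A minimal PyTorch sketch of this kind of synthesis loop, under stated assumptions (a small random CNN as feature extractor, arbitrary STFT settings, a random stand-in target, and a simple channel-wise correlation of feature maps as the matched statistic); it is one plausible reading of the approach, not the authors' implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
sr, n_fft, hop = 16000, 512, 128

def log_spectrogram(x):
    # Log-magnitude STFT, shape (1, 1, freq, time); differentiable in x.
    S = torch.stft(x, n_fft=n_fft, hop_length=hop,
                   window=torch.hann_window(n_fft), return_complex=True)
    return torch.log1p(S.abs())[None, None]

# Small random 2D CNN used only as a frozen feature extractor (assumption).
cnn = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=(11, 11), padding=5), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=(5, 5), padding=2), nn.ReLU(),
)
for p in cnn.parameters():
    p.requires_grad_(False)

def feature_gram(feats):
    # Correlations between feature maps, summed over frequency and time:
    # a simplified stand-in for the cross-correlation statistics.
    b, c, f, t = feats.shape
    F = feats.reshape(c, f * t)
    return (F @ F.t()) / (f * t)

target = torch.randn(sr)                       # stand-in for a recorded texture
with torch.no_grad():
    G_target = feature_gram(cnn(log_spectrogram(target)))

x = torch.randn(sr, requires_grad=True)        # optimized directly in the time domain
opt = torch.optim.Adam([x], lr=1e-2)
for step in range(100):
    opt.zero_grad()
    loss = torch.mean((feature_gram(cnn(log_spectrogram(x))) - G_target) ** 2)
    loss.backward()
    opt.step()
```

Because the signal itself is the optimization variable, no phase-recovery step is needed after the loop: `x` is already a waveform.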
[Coin: Bronze, Antioch, Pisidia, Caracalla]
Belongs to the documentary collection: MonnGr
- âŠ