DrumGAN: Synthesis of Drum Sounds With Timbral Feature Conditioning Using Generative Adversarial Networks
Synthetic creation of drum sounds (e.g., in drum machines) is commonly
performed using analog or digital synthesis, allowing a musician to sculpt the
desired timbre by modifying various parameters. Typically, such parameters control
low-level features of the sound and often have no musical meaning or perceptual
correspondence. With the rise of Deep Learning, data-driven processing of audio
emerges as an alternative to traditional signal processing. This new paradigm
allows controlling the synthesis process through learned high-level features or
by conditioning a model on musically relevant information. In this paper, we
apply a Generative Adversarial Network to the task of audio synthesis of drum
sounds. By conditioning the model on perceptual features computed with a
publicly available feature-extractor, intuitive control is gained over the
generation process. The experiments are carried out on a large collection of
kick, snare, and cymbal sounds. We show that, compared to a specific prior work
based on a U-Net architecture, our approach considerably improves the quality
of the generated drum samples, and that the conditional input indeed shapes the
perceptual characteristics of the sounds. Also, we provide audio examples and
release the code used in our experiments.

Comment: 8 pages, 1 figure, 3 tables, accepted in Proc. of the 21st
International Society for Music Information Retrieval Conference (ISMIR 2020).
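To make the conditioning idea concrete, the sketch below shows one generic way a generator can take a perceptual feature vector alongside the latent noise. It is a minimal PyTorch illustration of feature conditioning, not the DrumGAN architecture; the dimensions, the fully connected layers, and the feature names are assumptions made for brevity.

```python
# Minimal sketch of feature-conditioned generation (illustrative only;
# this is NOT the DrumGAN architecture, just the general conditioning idea).
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, noise_dim=128, n_features=7, n_samples=16384):
        super().__init__()
        # The perceptual feature vector is concatenated with the latent noise,
        # so the same noise yields different timbres as the features change.
        self.net = nn.Sequential(
            nn.Linear(noise_dim + n_features, 1024),
            nn.ReLU(),
            nn.Linear(1024, n_samples),
            nn.Tanh(),  # waveform samples in [-1, 1]
        )

    def forward(self, z, features):
        return self.net(torch.cat([z, features], dim=1))

g = ConditionalGenerator()
z = torch.randn(4, 128)      # latent noise for a batch of 4 sounds
feats = torch.rand(4, 7)     # e.g. brightness, boominess, ... (hypothetical)
drums = g(z, feats)          # (4, 16384) raw audio
```

Varying one entry of `feats` while holding `z` fixed is how such a model exposes intuitive, per-feature control over the generated sound.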
Sketching sonic interactions by imitation-driven sound synthesis
Sketching is at the core of every design activity. In visual design, pencil and paper are the preferred tools for producing sketches, thanks to their simplicity and immediacy. Analogue tools for sonic sketching do not exist yet, although voice and gesture are embodied abilities commonly exploited to communicate sound concepts. The EU project SkAT-VG aims to support vocal sketching with computer-aided technologies that can be easily accessed, understood, and controlled through vocal and gestural imitations. This imitation-driven sound synthesis approach is meant to overcome the ephemerality and timbral limitations of the human voice and gesture, allowing designers to produce more refined sonic sketches and to think about sound in a more designerly way. This paper presents two main outcomes of the project: the Sound Design Toolkit, a palette of basic sound synthesis models grounded in ecological perception and the physical description of sound-producing phenomena, and SkAT-Studio, a visual framework based on sound design workflows organized in stages of input, analysis, mapping, synthesis, and output. The integration of these two software packages provides an environment in which sound designers can go from concepts, through exploration and mocking-up, to prototyping in sonic interaction design, taking advantage of all the possibilities offered by vocal and gestural imitations at every step of the process.
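The staged workflow that SkAT-Studio organizes (input, analysis, mapping, synthesis, output) can be pictured as a simple pipeline that threads a shared context through each stage. The sketch below is purely illustrative and assumes nothing about the actual SkAT-Studio software; every name in it is hypothetical.

```python
# Purely illustrative pipeline mirroring an input -> analysis -> mapping ->
# synthesis -> output stage organization; no names here come from SkAT-Studio.
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

def run_pipeline(stages: List[Stage], ctx: Dict) -> Dict:
    """Thread a shared context dict through each stage in order."""
    for stage in stages:
        ctx = stage(ctx)
    return ctx

stages: List[Stage] = [
    lambda c: {**c, "audio": c["vocal_imitation"]},            # input
    lambda c: {**c, "pitch_hz": 220.0},                        # analysis (stub)
    lambda c: {**c, "model_freq": c["pitch_hz"] * 2.0},        # mapping
    lambda c: {**c, "render": f"impact@{c['model_freq']}Hz"},  # synthesis (stub)
    lambda c: c,                                               # output (no-op)
]

print(run_pipeline(stages, {"vocal_imitation": [0.0, 0.1, -0.1]}))
```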
Audio style transfer
'Style transfer' among images has recently emerged as a very active research
topic, fuelled by the power of convolutional neural networks (CNNs), and has
quickly become a very popular technology in social media. This paper investigates
the analogous problem in the audio domain: How to transfer the style of a
reference audio signal to a target audio content? We propose a flexible
framework for the task, which uses a sound texture model to extract statistics
characterizing the reference audio style, followed by an optimization-based
audio texture synthesis to modify the target content. In contrast to mainstream
optimization-based visual style transfer methods, the proposed process is initialized
by the target content instead of random noise and the optimized loss is only
about texture, not structure. These differences proved key for audio style
transfer in our experiments. In order to extract features of interest, we
investigate different architectures, whether pre-trained on other tasks, as
done in image style transfer, or engineered based on the human auditory system.
Experimental results on different types of audio signals confirm the potential
of the proposed approach.

Comment: ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), Apr 2018, Calgary, Canada. IEEE
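The two design choices highlighted above (initializing from the target content rather than random noise, and optimizing a texture-only loss) can be sketched in a few lines. The PyTorch snippet below uses a fixed random 1-D convolutional filterbank as a stand-in feature extractor and Gram matrices as texture statistics; it illustrates the general optimization procedure, not the paper's actual networks or loss.

```python
# Minimal sketch of optimization-based audio texture matching, following two
# key choices from the paper: initialize from the target CONTENT (not noise)
# and optimize a texture-only (Gram matrix) loss. The random filterbank is a
# stand-in assumption, not the paper's feature extractor.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
filters = torch.randn(64, 1, 11)           # fixed random 1-D conv filterbank

def features(x):
    return F.relu(F.conv1d(x, filters))    # (1, 64, T') feature maps

def gram(f):
    f = f.squeeze(0)
    return (f @ f.t()) / f.shape[1]        # channel correlations = "texture"

style = torch.randn(1, 1, 4096)            # reference style audio (dummy)
content = torch.randn(1, 1, 4096)          # target content audio (dummy)

target_gram = gram(features(style)).detach()
x = content.clone().requires_grad_(True)   # init from content, not noise
opt = torch.optim.Adam([x], lr=1e-2)

for _ in range(200):
    opt.zero_grad()
    loss = F.mse_loss(gram(features(x)), target_gram)  # texture only
    loss.backward()
    opt.step()
# x now keeps the content's structure while matching the style's texture stats.
```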
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.

Comment: 15 pages, 2 PDF figures
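As a pointer to what the dominant log-mel representation mentioned above looks like in practice, here is a minimal extraction sketch using librosa (assumed installed); the parameter values are common defaults, not ones prescribed by the survey.

```python
# Minimal log-mel spectrogram extraction, a dominant input representation for
# deep audio models. Assumes librosa is installed; any mono audio file works.
import librosa
import numpy as np

y, sr = librosa.load(librosa.example("trumpet"))   # bundled example clip
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=128
)
log_mel = librosa.power_to_db(mel, ref=np.max)     # (n_mels, n_frames) in dB
print(log_mel.shape)
```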