
    PerformanceNet: Score-to-Audio Music Generation with Multi-Band Convolutional Residual Network

    Full text link
    Music creation typically involves two parts: composing the musical score, and then performing the score on instruments to produce sound. While recent work has made much progress in automatic music generation in the symbolic domain, few attempts have been made to build an AI model that can render realistic music audio from musical scores. Directly synthesizing audio with sound-sample libraries often leads to mechanical and deadpan results, since musical scores do not contain performance-level information such as subtle changes in timing and dynamics. Moreover, while the task may sound like a text-to-speech synthesis problem, there are fundamental differences, since music audio is richly polyphonic. To build such an AI performer, we propose in this paper a deep convolutional model that learns, in an end-to-end manner, the score-to-audio mapping between a symbolic representation of music (the piano roll) and an audio representation of music (the spectrogram). The model consists of two subnets: the ContourNet, which uses a U-Net structure to learn the correspondence between piano rolls and spectrograms and to produce an initial result; and the TextureNet, which uses a multi-band residual network to refine that result by adding the spectral texture of overtones and timbre. We train the model to generate music clips of the violin, cello, and flute with a dataset of moderate size. We also present the results of a user study showing that our model achieves a higher mean opinion score (MOS) in naturalness and emotional expressivity than a WaveNet-based model and two commercial sound libraries. Our source code is available at https://github.com/bwang514/PerformanceNet
    Comment: 8 pages, 6 figures, AAAI 2019 camera-ready version
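The two representations and the two-stage idea in this abstract can be illustrated with a toy NumPy sketch: a piano roll (binary pitch-by-time matrix) is mapped to a spectrogram-like matrix by a coarse stage that places energy at each note's fundamental, which a second stage refines by adding overtone energy. This is only an illustration of the data flow, not the paper's trained networks; all names and sizes here are made up for the example.

```python
import numpy as np

# Toy sketch (not PerformanceNet itself): a piano roll is a binary
# pitch-by-time matrix; a spectrogram is a frequency-by-time magnitude matrix.
N_PITCH, N_FRAMES, N_BINS = 88, 16, 512
FMAX = 8000.0  # assumed top of the linear frequency axis

def midi_to_bin(pitch):
    """Map a MIDI pitch number to the nearest linear-frequency bin."""
    f = 440.0 * 2 ** ((pitch - 69) / 12.0)
    return min(int(round(f / FMAX * (N_BINS - 1))), N_BINS - 1)

def contour_stage(piano_roll):
    """Coarse stage: place energy at each active note's fundamental."""
    spec = np.zeros((N_BINS, piano_roll.shape[1]))
    for p, t in zip(*np.nonzero(piano_roll)):
        spec[midi_to_bin(p + 21), t] += 1.0  # roll row 0 = MIDI 21 (A0)
    return spec

def texture_stage(spec, piano_roll, n_harmonics=5, decay=0.5):
    """Refinement stage: add decaying overtone energy above each fundamental."""
    out = spec.copy()
    for p, t in zip(*np.nonzero(piano_roll)):
        f0 = 440.0 * 2 ** ((p + 21 - 69) / 12.0)
        for h in range(2, n_harmonics + 1):
            b = min(int(round(h * f0 / FMAX * (N_BINS - 1))), N_BINS - 1)
            out[b, t] += decay ** (h - 1)
    return out

roll = np.zeros((N_PITCH, N_FRAMES))
roll[48, :8] = 1  # one held note (row 48 = MIDI 69 = A4) for 8 frames
coarse = contour_stage(roll)
refined = texture_stage(coarse, roll)
```

In the real model both stages are learned convolutional networks; the sketch only shows why a refinement stage is needed: the coarse output lacks the overtone structure that gives an instrument its timbre.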

    Discriminating music performers by timbre: On the relation between instrumental gesture, tone quality and perception in classical cello performance

    Get PDF
    Classical music performers use instruments to transform the symbolic notation of the score into sound, which is ultimately perceived by a listener. For acoustic instruments, the timbre of the resulting sound is assumed to be strongly linked to the physical and acoustical properties of the instrument itself. However, rather little is known about how much influence the player has over the timbre of the sound: is it possible to discriminate music performers by timbre? This thesis explores player-dependent aspects of timbre as an individual means of musical expression. With the research scope narrowed to the analysis of solo cello recordings, the differences in tone quality of six performers who played the same musical excerpts on the same cello are investigated from three perspectives: perceptual, acoustical, and gestural. To understand how the physical actions a performer exerts on an instrument affect the spectro-temporal features of the sound produced, which can then be perceived as the player's unique tone quality, a series of experiments is conducted, starting with the creation of dedicated multi-modal cello recordings extended with performance-gesture information (bowing control parameters). In the first study, selected tone samples of six cellists are perceptually evaluated across various musical contexts via timbre-dissimilarity and verbal-attribute ratings. Spectro-temporal analysis follows in the second experiment, with the aim of identifying the acoustic features which best describe the varying timbral characteristics of the players. Finally, in the third study, individual combinations of bowing controls are examined in search of bowing patterns which might characterise each cellist regardless of the music being performed. The results show that the different players can be discriminated perceptually, by timbre, and that this perceptual discrimination can be projected back through the acoustical and gestural domains.
    By extending current understanding of human-instrument dependencies in qualitative tone production, this research may have further applications in computer-aided musical training and performer-informed instrumental sound synthesis. This work was supported by a UK EPSRC DTA studentship EP/P505054/1 and the EPSRC-funded OMRAS2 project EP/E017614/1.
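A minimal example of the kind of spectro-temporal descriptor such an acoustic analysis relies on is the spectral centroid, the amplitude-weighted mean frequency of a frame, which correlates with perceived brightness. This sketch is illustrative only and is not the thesis's actual feature set.

```python
import numpy as np

def spectral_centroid(frame, sr=44100):
    """Amplitude-weighted mean frequency (Hz) of one audio frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float((freqs * mag).sum() / (mag.sum() + 1e-12))

sr = 44100
t = np.arange(2048) / sr
# Two synthetic "tones": same fundamental, different overtone content.
bright = np.sin(2 * np.pi * 440 * t) + 0.8 * np.sin(2 * np.pi * 3520 * t)
dark = np.sin(2 * np.pi * 440 * t)  # fundamental only
```

A player who consistently produces more upper-partial energy would show a consistently higher centroid, which is the sense in which such features can separate performers.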

    Timbre from Sound Synthesis and High-level Control Perspectives

    Get PDF
    Exploring the many surprising facets of timbre through sound manipulation has been a common practice among composers and instrument makers of all times. The digital era radically changed the approach to sound thanks to the unlimited possibilities offered by computers, which made it possible to investigate sounds without physical constraints. In this chapter we describe investigations of timbre based on the analysis-by-synthesis approach, which consists in using digital synthesis algorithms to reproduce sounds and then modifying the parameters of the algorithms to investigate their perceptual relevance. In the first part of the chapter, timbre is investigated in a musical context. An examination of the sound quality of different wood species for xylophone making is presented first. Then the influence of instrumental control on timbre is described in the case of clarinet and cello performances. In the second part of the chapter, we focus mainly on the identification of sound morphologies, the so-called invariant sound structures responsible for the evocations induced by environmental sounds, by relating basic signal descriptors and timbre descriptors to evocations in the case of car-door noises, motor noises, solid objects, and their interactions.
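Analysis-by-synthesis can be shown in miniature: resynthesise a tone from a few harmonics, then vary one synthesis parameter and observe its effect on the spectrum. The parameter names and values below are invented for the example, not taken from the chapter.

```python
import numpy as np

def additive_tone(f0=220.0, n_harmonics=8, decay=0.6, sr=16000, dur=0.25):
    """Additive synthesis: sum of harmonics with geometrically decaying amplitude."""
    t = np.arange(int(sr * dur)) / sr
    return sum(decay ** (k - 1) * np.sin(2 * np.pi * k * f0 * t)
               for k in range(1, n_harmonics + 1))

# Vary a single parameter (spectral decay) and compare the results.
bright = additive_tone(decay=0.9)  # slow decay -> strong upper partials
mellow = additive_tone(decay=0.3)  # fast decay -> energy near the fundamental
```

Listening tests on such parameter sweeps are what connect a synthesis parameter to its perceptual relevance; here the `decay` parameter directly controls perceived brightness.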

    An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms

    Get PDF
    In audio-processing applications, there is high demand for generating expressive sounds from high-level representations. These representations can be used to manipulate timbre and influence the synthesis of creative instrumental notes. Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on musical-instrument timbre compression. Unsupervised deep-learning methods can achieve audio compression by training a network to learn a mapping from waveforms or spectrograms to low-dimensional representations. This study investigates the use of stacked convolutional autoencoders for the compression of time-frequency audio representations for a variety of instruments at a single pitch. Further exploration of hyper-parameters and regularization techniques is presented to enhance the performance of the initial design. In an unsupervised manner, the network is able to reconstruct a monophonic, harmonic sound from latent representations. In addition, we introduce an evaluation metric to measure the similarity between the original and reconstructed samples. Evaluating a deep generative model for sound synthesis is a challenging task. Our approach is based on the accuracy of the generated frequencies, as this is a significant metric for the perception of harmonic sounds. This work is expected to accelerate future experiments on audio compression using neural autoencoders.
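The shape bookkeeping of such a stacked autoencoder can be sketched without any training: a stack of stride-2 stages compresses a log-mel spectrogram to a small latent, and a mirror-image decoder upsamples it back, after which a frequency-based check (in the spirit of the paper's metric) asks whether the dominant mel band survives. This is an untrained stand-in for illustration, not the study's network.

```python
import numpy as np

def pool2(x):
    """2x2 average pooling: the downsampling step of one encoder stage."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling: one decoder stage."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def autoencode(logmel, depth=3):
    """Compress by `depth` stacked stages, then mirror back up."""
    z = logmel
    for _ in range(depth):  # "encoder": stacked downsampling stages
        z = pool2(z)
    recon = z
    for _ in range(depth):  # "decoder": mirror-image upsampling
        recon = upsample2(recon)
    return z, recon

# Synthetic 128-mel x 64-frame clip with one dominant band (a held harmonic).
logmel = np.full((128, 64), 0.1)
logmel[40] = 3.0
latent, recon = autoencode(logmel)

# Frequency-accuracy style check: does the dominant mel band survive?
dom_in = logmel.mean(axis=1).argmax()
dom_out = recon.mean(axis=1).argmax()
```

With `depth=3` the 128x64 input compresses to a 16x8 latent, an 8x reduction per axis; a real convolutional autoencoder learns filters at each stage instead of plain averaging, but the compression geometry is the same.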

    An Interactive Music Synthesizer for Gait Training in Neurorehabilitation

    Get PDF
    (Abstract to follow.)

    Embodied gestures

    Get PDF
    This is a book about musical gestures: multiple ways to design instruments, compose musical performances, analyze sound objects and represent sonic ideas through the central notion of ‘gesture’. The writers share knowledge on major research projects, musical compositions and methodological tools developed among different disciplines, such as sound art, embodied music cognition, human-computer interaction, performative studies and artificial intelligence. They visualize how similar and compatible are the notions of embodied music cognition and the artistic discourses proposed by musicians working with ‘gesture’ as their compositional material. The authors and editors hope to contribute to the ongoing discussion around creative technologies and music, expressive musical interface design, the debate around the use of AI technology in music practice, as well as presenting a new way of thinking about musical instruments, composing and performing with them

    DMRN+16: Digital Music Research Network One-day Workshop 2021

    Get PDF
    DMRN+16: Digital Music Research Network One-day Workshop 2021, Queen Mary University of London, Tuesday 21st December 2021.

    Keynote 1: Prof. Sophie Scott, Director, Institute of Cognitive Neuroscience, UCL. Title: "Sound on the brain - insights from functional neuroimaging and neuroanatomy". Abstract: In this talk I will use functional imaging and models of primate neuroanatomy to explore how sound is processed in the human brain. I will demonstrate that sound is represented cortically in different parallel streams. I will expand this to show how this can impact on the concept of auditory perception, which arguably incorporates multiple kinds of distinct perceptual processes. I will address the roles that subcortical processes play in this, and also the contributions from hemispheric asymmetries.

    Keynote 2: Prof. Gus Xia, Assistant Professor at NYU Shanghai. Title: "Learning interpretable music representations: from human stupidity to artificial intelligence". Abstract: Gus has been leading the Music X Lab in developing intelligent systems that help people better compose and learn music. In this talk, he will show us the importance of music representation for both humans and machines, and how to learn better music representations via the design of inductive bias. Once we have interpretable music representations, the potential applications are limitless.