Automated Rhythmic Transformation of Drum Recordings

Abstract

Within the creative industries, music information retrieval techniques are now being applied in a variety of music creation and production applications. Audio artists incorporate techniques from music informatics and machine learning (e.g., beat and metre detection) for generative content creation and manipulation systems within the music production setting. Here musicians, desiring a certain sound or aesthetic influenced by the style of artists they admire, may change or replace the rhythmic pattern and sound characteristics (i.e., timbre) of drums in their recordings with those from an idealised recording (e.g., in processes of redrumming and mashup creation). Automated transformation systems for rhythm and timbre can be powerful tools for music producers, allowing them to quickly and easily adjust the different elements of a drum recording to fit the overall style of a song. The aim of this thesis is to develop systems for automated transformation of rhythmic patterns of drum recordings using a subset of techniques from deep learning called deep generative models (DGM) for neural audio synthesis. DGMs such as autoencoders and generative adversarial networks have been shown to be effective for transforming musical signals in a variety of genres as well as for learning the underlying structure of datasets for generation of new audio examples. To this end, modular deep learning-based systems are presented in this thesis with evaluations which measure the extent of the rhythmic modifications generated by different modes of transformation, which include audio style transfer, drum translation and latent space manipulation. The evaluation results underscore both the strengths and constraints of DGMs for transformation of rhythmic patterns as well as neural synthesis of drum sounds within a variety of musical genres. New audio style transfer (AST) functions were specifically designed for mashup-oriented drum recording transformation. The designed loss objectives lowered the computational demands of the AST algorithm and offered rhythmic transformation capabilities which adhere to a larger rhythmic structure of the input to generate music that is both creative and realistic. To extend the transformation possibilities of DGMs, systems based on adversarial autoencoders (AAE) were proposed for drum translation and continuous rhythmic transformation of bar-length patterns. The evaluations which investigated the lower dimensional representations of the latent space of the proposed system based on AAEs with a Gaussian mixture prior (AAE-GM) highlighted the importance of the structure of the disentangled latent distributions of AAE-GM. Furthermore, the proposed system demonstrated improved performance, as evidenced by higher reconstruction metrics, when compared to traditional autoencoder models. This implies that the system can more accurately recreate complex drum sounds, ensuring that the produced rhythmic transformation maintains richness of the source material. For music producers, this means heightened fidelity in drum synthesis and the potential for more expressive and varied drum tracks, enhancing the creativity in music production. This work also enhances neural drum synthesis by introducing a new, diverse dataset of kick, snare, and hi-hat drum samples, along with multiple drum loop datasets for model training and evaluation. Overall, the work in this thesis increased the profile of the field and hopefully will attract more attention and resources to the area, which will help drive future research and development of neural rhythmic transformation systems

    Similar works