18 research outputs found

    Creating Latent Spaces for Modern Music Genre Rhythms Using Minimal Training Data

    Get PDF
    In this paper we present R-VAE, a system designed for the exploration of latent spaces of musical rhythms. Unlike most previous work in rhythm modeling, R-VAE can be trained with small datasets, enabling rapid customization and exploration by individual users. R-VAE employs a data representation that encodes simple and compound meter rhythms. To the best of our knowledge, this is the first time that a network architecture has been used to encode rhythms with these characteristics, which are common in some modern popular music genres.
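
    The core representational idea, a single grid fine enough to hold both binary and ternary subdivisions, can be pictured in a few lines. The Python below is a minimal sketch of that general idea, not R-VAE's actual data format; the instrument count and the 12-ticks-per-beat resolution are illustrative assumptions.

        import numpy as np

        TICKS_PER_BEAT = 12   # lcm of 4 (sixteenths) and 3 (triplets), so both feels align
        BEATS_PER_BAR = 4
        N_STEPS = TICKS_PER_BEAT * BEATS_PER_BAR   # 48 grid steps per 4/4 bar
        N_INSTRUMENTS = 3                          # e.g. kick, snare, hi-hat (assumed)

        # One bar as a binary onset matrix: instruments x grid steps.
        bar = np.zeros((N_INSTRUMENTS, N_STEPS), dtype=np.float32)
        bar[0, ::TICKS_PER_BEAT] = 1.0                      # kick on every quarter note
        bar[1, TICKS_PER_BEAT::2 * TICKS_PER_BEAT] = 1.0    # snare on beats 2 and 4
        bar[2, ::TICKS_PER_BEAT // 3] = 1.0                 # hi-hat on triplet eighths (compound feel)

        x = bar.flatten()   # 144-dim vector, ready to feed to a model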

    Steering latent audio models through interactive machine learning

    Full text link
    In this paper, we present a proof-of-concept mechanism for steering latent audio models through interactive machine learning. Our approach involves mapping the human-performance space to the high-dimensional, computer-generated latent space of a neural audio model by utilizing a regressive model learned from a set of demonstrative actions. By implementing this method in ideation, exploration, and sound and music performance, we have observed its efficiency, flexibility, and immediacy of control over generative audio processes.
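
    The mapping itself can be as simple as a small regressor fitted on demonstration pairs. Below is a minimal Python sketch of that idea, assuming a 2-D controller and a 128-dimensional latent space; the shapes, the MLP choice, and the placeholder demonstration data are illustrative, not the authors' implementation.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        PERF_DIM, LATENT_DIM = 2, 128   # assumed controller and latent dimensions

        # Demonstrative actions: controller poses paired with latent points the
        # user chose while listening (random stand-ins here).
        demo_poses = np.random.rand(20, PERF_DIM)
        demo_latents = np.random.randn(20, LATENT_DIM)

        mapper = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000)
        mapper.fit(demo_poses, demo_latents)

        def controller_to_latent(pose):
            """Map a live controller pose to a latent vector for the audio model."""
            return mapper.predict(np.asarray(pose, dtype=float).reshape(1, -1))[0]

        z = controller_to_latent([0.3, 0.7])   # hand z to the model's decoder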

    Interacting with neural audio synthesis models through interactive machine learning

    Full text link
    Recent advances in neural audio synthesis have made it possible to generate audio signals in real time, enabling their use in musical performance applications. However, exploring and playing with their high-dimensional spaces remains challenging, as the axes do not necessarily correlate to clear musical labels and may vary from model to model. In this paper, we present a proof-of-concept mechanism for steering latent audio models through interactive machine learning. Our approach involves mapping the human-performance space to the high-dimensional, computer-generated latent space of a neural audio model by utilizing a regressive model learned from a set of demonstrative actions. By implementing this method in ideation, exploration, and sound and music performance, we have observed its efficiency, flexibility, and immediacy of control over generative audio processes.
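
    To complement the training-side sketch above, the loop below shows where such a mapping might sit during performance: poll the controller, regress to a latent vector, and decode one block of audio. Here read_controller, mapper, audio_model.decode, and out_stream are hypothetical placeholders, not any real library's API.

        import numpy as np

        def performance_loop(audio_model, mapper, read_controller, out_stream):
            """Hypothetical real-time loop: gesture -> latent -> audio block."""
            while True:
                pose = read_controller()                             # low-dimensional gesture input
                z = mapper.predict(np.asarray(pose).reshape(1, -1))  # into latent space
                block = audio_model.decode(z)                        # one block of audio samples
                out_stream.write(block)                              # send to the soundcard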

    A Small-Data Mindset for Generative AI Creative Work

    Full text link
    In this paper, we argue that working with small-scale datasets is an often-overlooked but powerful mechanism for enabling greater human influence over generative AI (GenAI) systems in creative contexts. We describe some of the benefits of working with small-scale data, and we argue that conventional ways of thinking about the value of large data, such as preventing overfitting, are not always well-matched to creative aims. We discuss how models built with small-scale data can facilitate meaningful creative work, providing examples from text, image, and sound.

    Generation and visualization of rhythmic latent spaces

    Full text link
    In this paper we extend R-VAE, a system designed for the modeling and exploration of latent spaces of musical rhythms. R-VAE employs a data representation that encodes simple and compound meter rhythms, common in some contemporary popular music genres. It can be trained with small datasets, enabling rapid customization and exploration by individual users. To facilitate the exploration of the latent space, we provide R-VAE with a web-based visualizer designed for the dynamic representation of rhythmic latent spaces. To the best of our knowledge, this is the first time that a dynamic visualization has been implemented to observe a latent space learned from rhythmic patterns.
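
    One plausible way to prepare such a visualization, sketched below under the assumption of a 2-D latent space: decode a regular grid of latent points into onset patterns and hand the result to the front end. decoder is a placeholder for the trained VAE decoder; the grid bounds and resolution are illustrative.

        import numpy as np

        def decode_latent_grid(decoder, extent=2.0, resolution=16):
            """Decode a resolution x resolution grid spanning [-extent, extent]^2."""
            axis = np.linspace(-extent, extent, resolution)
            patterns = {}
            for i, zx in enumerate(axis):
                for j, zy in enumerate(axis):
                    z = np.array([[zx, zy]], dtype=np.float32)
                    patterns[(i, j)] = decoder(z)   # onset probabilities per grid step
            return patterns   # e.g. serialized to JSON for the web visualizer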

    R-VAE: Live latent space drum rhythm generation from minimal-size datasets

    Get PDF
    In this article, we present R-VAE, a system designed for the modeling and exploration of latent spaces learned from rhythms encoded in MIDI clips. The system is based on a variational autoencoder neural network, uses a data structure that is capable of encoding rhythms in simple and compound meter, and can learn models from little training data. To facilitate the exploration of models, we implemented a visualizer that relies on the dynamic nature of the pulsing rhythmic patterns. To test our system in real-life musical practice, we collected small-scale datasets of contemporary music genre rhythms and trained models with them. We found that the non-linearities of the learned latent spaces, coupled with tactile interfaces to interact with the models, were very expressive and led to unexpected places in musical composition and live performance settings. A music album was recorded and premiered at a major music festival, with the VAE latent space played live on stage.
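
    For readers who want the shape of such a model, here is a compact PyTorch sketch of a variational autoencoder over flattened binary rhythm vectors. The layer sizes, the 2-D latent, and the loss weighting are assumptions for illustration; they are not R-VAE's published architecture.

        import torch
        import torch.nn as nn

        class RhythmVAE(nn.Module):
            def __init__(self, input_dim=144, hidden=64, latent=2):
                super().__init__()
                self.enc = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
                self.mu = nn.Linear(hidden, latent)
                self.logvar = nn.Linear(hidden, latent)
                self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                         nn.Linear(hidden, input_dim), nn.Sigmoid())

            def forward(self, x):
                h = self.enc(x)
                mu, logvar = self.mu(h), self.logvar(h)
                z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
                return self.dec(z), mu, logvar

        def vae_loss(x, x_hat, mu, logvar):
            """Bernoulli reconstruction term plus the standard KL regularizer."""
            bce = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
            kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
            return bce + kld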

    IIIF-based lyric and neume editor for square-notation manuscripts

    Get PDF
    In this paper we introduce a set of improvements to Neon, an online square-notation music editor based on the International Image Interoperability Framework (IIIF) and the Music Encoding Initiative (MEI) file format. The enhancements extend the functionality of Neon to lyric editing and to single-session editing of entire manuscripts. We describe a scheme for managing and processing the information necessary for visualizing and editing full manuscripts. A method of concurrently editing the position and content of lyrics is also discussed. We expect these improvements will provide a better user experience when correcting the output of automated optical music recognition workflows.
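
    The bookkeeping for full-manuscript editing can be pictured as pairing each IIIF canvas (page image) with the file holding its music encoding. The sketch below walks a IIIF Presentation 2.x manifest to build that pairing; the MEI-naming convention here is a hypothetical assumption, not Neon's actual scheme.

        import json
        from urllib.request import urlopen

        def manifest_pages(manifest_url):
            """List (canvas, MEI file) pairs for every page in a IIIF manifest."""
            with urlopen(manifest_url) as resp:
                manifest = json.load(resp)
            pages = []
            for n, canvas in enumerate(manifest["sequences"][0]["canvases"]):
                pages.append({
                    "canvas_uri": canvas["@id"],
                    "label": canvas.get("label", f"page {n + 1}"),
                    "mei_file": f"page_{n + 1:03d}.mei",  # assumed naming convention
                })
            return pages   # lets the editor load one page's image and MEI at a time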

    Interactive Machine Learning for Generative Models

    Full text link
    Effective control of generative media models remains a challenge for specialised generation tasks, including cases where no suitable dataset exists to train a contrastive language model. We describe a new approach that enables users to interactively create bespoke text-to-media mappings for arbitrary media generation models, using a small number of examples. This approach, quite distinct from contrastive language pretraining, facilitates new strategies for using language to drive media creation in creative contexts not well served by existing methods.
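
    One way such a bespoke mapping could be built from a handful of examples, sketched below: embed each example phrase, then fit a regularized regression from embedding space to the generator's input space. embed_text is a hypothetical placeholder for any sentence-embedding function; the paper's actual models and feature spaces may differ.

        import numpy as np
        from sklearn.linear_model import Ridge

        def fit_text_mapping(phrases, latents, embed_text):
            """phrases: list of example strings; latents: (n, d) matching generator inputs."""
            X = np.stack([embed_text(p) for p in phrases])
            mapper = Ridge(alpha=1.0).fit(X, latents)   # few examples, so regularize
            return lambda prompt: mapper.predict(embed_text(prompt).reshape(1, -1))[0]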

    SoundCatcher: Explorations in Audio-Looping and Time-Freezing Using an Open-Air Gestural Controller

    No full text
    SoundCatcher is an open-air gestural controller designed to control a looper and time-freezing sound patch. It uses ultrasonic sensors to measure the distance of the performer’s hands from the device, which is mounted on a microphone stand. Tactile and visual feedback, provided by a pair of vibrating motors and LEDs, informs the performer when she is inside the sensed space. In addition, the rotational speed of the motors is scaled according to each hand’s distance from the microphone stand, providing tactile cues about hand position.
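
    The distance-to-vibration scaling can be pictured as a simple linear ramp, as in the Python sketch below: full intensity at the near edge of the sensed range, fading to none at the far edge. The range bounds and the inversion are illustrative assumptions about SoundCatcher's behavior, not values from the paper.

        NEAR_CM, FAR_CM = 5.0, 80.0   # assumed sensing range of the ultrasonic sensors

        def motor_intensity(distance_cm):
            """Return a 0..1 vibration level for one hand's distance reading."""
            if distance_cm >= FAR_CM:
                return 0.0                    # hand outside the sensed space
            clamped = max(distance_cm, NEAR_CM)
            return 1.0 - (clamped - NEAR_CM) / (FAR_CM - NEAR_CM)

        left, right = motor_intensity(20.0), motor_intensity(60.0)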