2,946 research outputs found
Multi-label Ferns for Efficient Recognition of Musical Instruments in Recordings
In this paper we introduce multi-label ferns and apply this technique to the
automatic classification of musical instruments in audio recordings. We compare
the performance of our proposed method to a set of binary random ferns, using
jazz recordings as input data. Our main result is much faster classification
and a higher F-score. We also achieve a substantial reduction in model
size.
Frequency shifting approach towards textual transcription of heartbeat sounds
Auscultation is an approach for diagnosing many cardiovascular problems. Automatic analysis of heartbeat sounds and extraction of their audio features can assist physicians in diagnosing diseases. Textual transcription allows a continuous heart sound stream to be recorded in a text format, which requires very little memory compared with audio formats. In addition, text-based data allows indexing and searching techniques to be applied to access critical events. Hence, transcribed heartbeat sounds provide useful information for monitoring the behavior of a patient over a long period of time. This paper proposes a frequency shifting method to improve the performance of the transcription. The main objective of this study is to transfer the heartbeat sounds to the music domain. The proposed technique is tested with 100 samples recorded from different heart disease categories. The observed results show that the proposed shifting method significantly improves the performance of the transcription.
Invariances and Data Augmentation for Supervised Music Transcription
This paper explores a variety of models for frame-based music transcription,
with an emphasis on the methods needed to reach state-of-the-art on human
recordings. The translation-invariant network discussed in this paper, which
combines a traditional filterbank with a convolutional neural network, was the
top-performing model in the 2017 MIREX Multiple Fundamental Frequency
Estimation evaluation. This class of models shares parameters in the
log-frequency domain, which exploits the frequency invariance of music to
reduce the number of model parameters and avoid overfitting to the training
data. All models in this paper were trained with supervision by labeled data
from the MusicNet dataset, augmented by random label-preserving pitch-shift
transformations.
Comment: 6 pages
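The pitch-shift augmentation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each training example is a log-frequency frame paired with a binary note-label vector, and that shifting both by the same number of semitones keeps the pair consistent. The function names, the `bins_per_semitone` parameter, and the default shift range are hypothetical and are not taken from the paper.

```python
import random


def shift_bins(values, offset):
    """Move every entry `offset` positions up; entries shifted out are dropped."""
    out = [0] * len(values)
    for i, v in enumerate(values):
        j = i + offset
        if 0 <= j < len(values):
            out[j] = v
    return out


def pitch_shift(frame, labels, semitones, bins_per_semitone=1):
    """Shift a log-frequency frame and its note labels by the same interval.

    Assumption: `frame` has `bins_per_semitone` bins per semitone and
    `labels` has one entry per semitone, so a translation of the frame
    corresponds to a translation of the labels.
    """
    return (
        shift_bins(frame, semitones * bins_per_semitone),
        shift_bins(labels, semitones),
    )


def augment(frame, labels, max_shift=5, bins_per_semitone=1, rng=random):
    """Apply a random shift in [-max_shift, max_shift] semitones (hypothetical range)."""
    semitones = rng.randint(-max_shift, max_shift)
    return pitch_shift(frame, labels, semitones, bins_per_semitone)
```

Because the frame and labels move together on a log-frequency axis, a translation-invariant model sees the same pattern at a different position, which is the property the parameter sharing described in the abstract relies on.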
VGM-RNN: Recurrent Neural Networks for Video Game Music Generation
The recent explosion of interest in deep neural networks has affected and in some cases reinvigorated work in fields as diverse as natural language processing, image recognition, speech recognition and many more. For sequence learning tasks, recurrent neural networks and in particular LSTM-based networks have shown promising results. Recently there has been interest – for example in the research by Google’s Magenta team – in applying so-called “language modeling” recurrent neural networks to musical tasks, including the automatic generation of original music. In this work we demonstrate our own LSTM-based music language modeling recurrent network. We show that it is able to learn musical features from a MIDI dataset and generate output that is musically interesting while demonstrating features of melody, harmony and rhythm. We source our dataset from VGMusic.com, a collection of user-submitted MIDI transcriptions of video game songs, and attempt to generate output which emulates this kind of music.
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment
Generating music has a few notable differences from generating images and
videos. First, music is an art of time, necessitating a temporal model. Second,
music is usually composed of multiple instruments/tracks with their own
temporal dynamics, but collectively they unfold over time interdependently.
Lastly, musical notes are often grouped into chords, arpeggios or melodies in
polyphonic music, so imposing a chronological ordering of notes is
not naturally suitable. In this paper, we propose three models for symbolic
multi-track music generation under the framework of generative adversarial
networks (GANs). The three models, which differ in the underlying assumptions
and accordingly the network architectures, are referred to as the jamming
model, the composer model and the hybrid model. We trained the proposed models
on a dataset of over one hundred thousand bars of rock music and applied them
to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings.
A few intra-track and inter-track objective metrics are also proposed to
evaluate the generative results, in addition to a subjective user study. We
show that our models can generate coherent music of four bars right from
scratch (i.e. without human inputs). We also extend our models to human-AI
cooperative music generation: given a specific track composed by a human, we can
generate four additional tracks to accompany it. All code, the dataset and the
rendered audio samples are available at https://salu133445.github.io/musegan/ .
Comment: to appear at AAAI 201