The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation
With recent breakthroughs in artificial neural networks, deep generative
models have become one of the leading techniques for computational creativity.
Despite very promising progress on image and short-sequence generation,
symbolic music generation remains a challenging problem, since the structure
of a composition is usually complicated. In this study, we attempt to solve
the melody generation problem constrained by a given chord progression. This
music meta-creation problem can also be incorporated into a plan recognition
system with user inputs and predictive structural outputs. In particular, we
explore the effect of explicit architectural encoding of musical structure via
comparing two sequential generative models: LSTM (a type of RNN) and WaveNet
(dilated temporal-CNN). As far as we know, this is the first study of applying
WaveNet to symbolic music generation, as well as the first systematic
comparison between a temporal CNN and an RNN for music generation. We evaluate
the generated melodies through a listener survey and apply the Variable Markov
Oracle for music pattern discovery. Experimental results show that encoding
structure more explicitly with a stack of dilated convolution layers
significantly improves performance, and that globally encoding the underlying
chord progression into the generation procedure yields a further gain.
Comment: 8 pages, 13 figures
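The "stack of dilated convolution layers" the abstract credits is the WaveNet-style architecture in which dilation doubles at each layer, so the receptive field grows exponentially with depth. A minimal sketch of that receptive-field arithmetic (the doubling dilation schedule is the standard WaveNet choice, not a detail confirmed by this abstract):

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in time steps) of a stack of dilated causal
    convolutions: each layer adds (kernel_size - 1) * dilation steps."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Four layers with kernel size 2 and doubling dilations 1, 2, 4, 8
# cover 16 time steps; an undilated stack of the same depth covers only 5.
print(receptive_field(2, [1, 2, 4, 8]))   # → 16
print(receptive_field(2, [1, 1, 1, 1]))   # → 5
```

This is why a shallow dilated stack can model long-range musical structure that a comparably deep ordinary CNN cannot.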
Creating Musical Scores Inspired by the Intersection of Human Speech and Music Through Model-Based Cross Synthesis
This research addresses the development of machine learning techniques used to create musical scores and performances inspired by the intersection of speech and music. Machine learning models are trained on MIDI files transcribed from datasets of musical audio recordings and human speech audio recordings. The creation of succinct models makes model-based cross synthesis possible: models trained on musical MIDI data are asked to replicate MIDI data that approximates human speech, and, conversely, models trained on MIDI data that approximates speech are asked to replicate musical MIDI data. The product of these techniques is a collection of piano music, Seven Piano Etudes Speaks the Moody Machine. These etudes are intended to be performed on one Yamaha Disklavier piano by two performers: one human pianist and one machine player piano.
The Sticky Riff: Quantifying the Melodic Identities of Medieval Modes
Andrew Hughes' Late Medieval Liturgical Offices afforded chant scholarship more melodies than it knew what to do with. Until now, chant scholarship involving 'Big Data' has usually meant comparing individual feasts to the whole corpus or looking at general trends with respect to 'word painting' or stereotyped cadences. New research presented here, using n-gram analysis, networks, and Recurrent Neural Networks (RNNs), looks at the nature of the gestural components of the melodies themselves. By isolating the notes preceding, and proceeding from, the naturally occurring semitones in the medieval church modes, we find significant recurrence of particular phrases, or riffs, which we propose could have been used to help 'build modes' from the inside out. Special care needed to be given to the question of assumed B-flats that were not written explicitly in the manuscripts represented in Hughes' work. Understanding modes not as 'scales' but as collections of associated smaller musical gestures has yielded a set of recurring riffs that serve as identifiers of their larger contexts, confirming the influence of an earlier, oral/aural culture on these late medieval chants, where musical literacy was expected.
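The n-gram analysis described above amounts to counting every contiguous run of n notes and looking for gestures that recur far more often than chance. A minimal sketch, using a hypothetical chant fragment around the E-F semitone (the melody, the gesture length, and the note names are illustrative assumptions, not data from the study):

```python
from collections import Counter

def ngrams(notes, n):
    """All contiguous n-note gestures in a melody."""
    return [tuple(notes[i:i + n]) for i in range(len(notes) - n + 1)]

# Hypothetical fragment circling the naturally occurring E-F semitone.
melody = ["D", "E", "F", "E", "D", "E", "F", "G", "F", "E"]

trigrams = Counter(ngrams(melody, 3))
print(trigrams.most_common(1))  # → [(('D', 'E', 'F'), 2)]
```

A recurring trigram like D-E-F is the kind of "sticky riff" the study proposes as a building block of modal identity.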
An Industry Driven Genre Classification Application using Natural Language Processing
With the advent of digitized music, many online streaming companies such as Spotify have capitalized on a listener's need for a common streaming platform. An essential component of such a platform is the recommender system that suggests related tracks, albums, and artists to the constituent user base. To sustain such a recommender system, it is essential to label data with the genre it belongs to. Most recent academic publications that deal with music genre classification focus on deep neural networks developed and applied within the music genre classification domain. This thesis instead applies highly sophisticated techniques from the text classification domain, such as Hierarchical Attention Networks, to classify tracks of different genres. To do this, the music is first separated into individual tracks (drums, vocals, bass, and accompaniment) and converted into symbolic text data. Using a distributed machine learning system (five computers, each with a graphics processing unit more powerful than a GTX 1070), the approach classifies contemporary genres with an impressive peak accuracy of over 93% when compared with competing classifiers. It is also argued that, through the use of text classification, the expert domain knowledge of musicians and people familiar with musicological techniques can be attracted to improving recommender systems within the music information retrieval research domain.
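The step of converting separated tracks into "symbolic text data" can be pictured as serializing note events into word-like tokens that a text classifier can consume. A minimal sketch under assumed conventions (the token format, the helper name, and the example bass line are illustrative, not the thesis's actual encoding):

```python
def notes_to_tokens(notes):
    """Serialize (MIDI pitch, duration-in-ticks) events into word-like
    tokens, so text classifiers such as Hierarchical Attention Networks
    can treat a track as a 'document'."""
    return " ".join(f"p{pitch}_d{dur}" for pitch, dur in notes)

# Hypothetical bass line: (MIDI pitch, duration in ticks).
bass_line = [(36, 480), (43, 240), (36, 480)]
print(notes_to_tokens(bass_line))  # → p36_d480 p43_d240 p36_d480
```

Once each stem is a token sequence, genre labels can be learned with the same hierarchical word/sentence attention machinery used for ordinary documents.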
Don't hide in the frames: Note- and pattern-based evaluation of automated melody extraction algorithms