192 research outputs found

    The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation

    With recent breakthroughs in artificial neural networks, deep generative models have become one of the leading techniques for computational creativity. Despite very promising progress on image and short-sequence generation, symbolic music generation remains a challenging problem because the structure of a composition is usually complicated. In this study, we attempt to solve the melody generation problem constrained by a given chord progression. This music meta-creation problem can also be incorporated into a plan recognition system with user inputs and predictive structural outputs. In particular, we explore the effect of explicit architectural encoding of musical structure by comparing two sequential generative models: LSTM (a type of RNN) and WaveNet (a dilated temporal CNN). To our knowledge, this is the first study to apply WaveNet to symbolic music generation, as well as the first systematic comparison between temporal CNNs and RNNs for music generation. We conducted a survey to evaluate the generated melodies and applied the Variable Markov Oracle for music pattern discovery. Experimental results show that encoding structure more explicitly with a stack of dilated convolution layers significantly improves performance, and that globally encoding the underlying chord progression into the generation procedure improves it further. (Comment: 8 pages, 13 figures)
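    The comparison lends itself to a short illustration. Below is a minimal sketch, not the authors' code, of the two ideas the abstract contrasts: a WaveNet-style stack of dilated 1-D convolutions over melody tokens, with the chord progression supplied as a global conditioning vector. All names, layer sizes, and the 128-token pitch vocabulary are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DilatedMelodyNet(nn.Module):
        def __init__(self, vocab=128, chord_dim=24, channels=64, layers=6):
            super().__init__()
            self.embed = nn.Embedding(vocab, channels)
            self.chord_proj = nn.Linear(chord_dim, channels)
            # Doubling the dilation each layer grows the receptive field
            # exponentially: the "explicit structure encoding" under test.
            self.convs = nn.ModuleList(
                nn.Conv1d(channels, channels, kernel_size=2, dilation=2 ** i)
                for i in range(layers)
            )
            self.out = nn.Conv1d(channels, vocab, kernel_size=1)

        def forward(self, melody, chords):
            # melody: (batch, time) token ids; chords: (batch, chord_dim) global code
            x = self.embed(melody).transpose(1, 2)         # (batch, channels, time)
            x = x + self.chord_proj(chords).unsqueeze(-1)  # broadcast chord conditioning
            for conv in self.convs:
                pad = conv.dilation[0]                     # left-pad to keep it causal
                x = x + torch.relu(conv(nn.functional.pad(x, (pad, 0))))
            return self.out(x)                             # per-step logits over pitches

    net = DilatedMelodyNet()
    logits = net(torch.randint(0, 128, (2, 64)), torch.randn(2, 24))  # (2, 128, 64)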

    Creating Musical Scores Inspired by the Intersection of Human Speech and Music Through Model-Based Cross Synthesis

    This research develops machine learning techniques for creating musical scores and performances inspired by the intersection of speech and music. Machine learning models are trained on MIDI files transcribed from datasets of musical audio recordings and of human speech recordings. The creation of succinct models makes model-based cross synthesis possible: models trained on musical MIDI data are asked to replicate MIDI data that approximates human speech, and, conversely, models trained on speech-approximating MIDI data are asked to replicate musical MIDI data. The product of these techniques is a collection of piano music, Seven Piano Etudes Speaks the Moody Machine. These etudes are intended to be performed on one Yamaha Disklavier piano by two performers: one human pianist and one machine player piano.
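    To make the cross-synthesis idea concrete, here is a minimal sketch in which a simple first-order Markov model stands in for the trained models described above; the thesis itself uses learned generative models, and everything below (the function names, the toy corpus, the nearest-pitch heuristic) is an illustrative assumption.

    from collections import defaultdict

    def fit_markov(sequences):
        """Count pitch-to-pitch transitions over a corpus of pitch lists."""
        table = defaultdict(lambda: defaultdict(int))
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                table[a][b] += 1
        return table

    def cross_synthesize(model, target):
        """Follow the target melody, but only through transitions the model knows."""
        out = [target[0]]
        for want in target[1:]:
            choices = model.get(out[-1], {})
            if not choices:
                out.append(want)  # model has no continuation; fall back to the target
                continue
            # pick the known continuation closest in pitch to the target note
            out.append(min(choices, key=lambda p: (abs(p - want), -choices[p])))
        return out

    music_model = fit_markov([[60, 62, 64, 65, 67, 65, 64, 62, 60]])
    speech_pitches = [60, 63, 66, 64, 59]          # e.g. a MIDI approximation of speech
    print(cross_synthesize(music_model, speech_pitches))   # [60, 62, 64, 65, 64]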

    The Sticky Riff: Quantifying the Melodic Identities of Medieval Modes

    Andrew Hughes' Late Medieval Liturgical Offices afforded chant scholarship more melodies than it knew what to do with. Until now, chant scholarship involving 'Big Data' has usually meant comparing individual feasts to the whole corpus or looking at general trends with respect to 'word painting' or stereotyped cadences. The new research presented here, using n-gram analysis, networks, and Recurrent Neural Networks (RNNs), looks to the nature of the gestural components of the melodies themselves. By isolating the notes preceding, and proceeding from, the naturally occurring semitones in the medieval church modes, we find significant recurrence of particular phrases, or riffs, which we propose could have been used to help 'build modes' from the inside out. Special care was needed for the question of assumed B-flats that were not given explicitly in the manuscripts represented in Hughes' work. Understanding modes not as 'scales' but as collections of associated smaller musical gestures has yielded a set of recurring riffs that serve as identifiers of their larger contexts, confirming the influence of an earlier, oral/aural culture on these late medieval chants, where musical literacy was expected.
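    As an illustration of the n-gram step, the following sketch (ours, not the study's code) isolates the gesture around each naturally occurring semitone: it finds every semitone step in a chant melody, keeps the notes on either side as an n-gram, and counts recurrences across a toy corpus. Pitches are MIDI numbers, and the window size is an assumption.

    from collections import Counter

    def semitone_ngrams(melody, context=2):
        """Yield the window of notes around every semitone step in a melody."""
        for i in range(len(melody) - 1):
            if abs(melody[i + 1] - melody[i]) == 1:        # e.g. E-F or B-C
                lo, hi = max(0, i - context + 1), i + 1 + context
                yield tuple(melody[lo:hi])

    corpus = [
        [62, 64, 65, 64, 62, 60],   # D E F E D C: semitone step E-F
        [60, 62, 64, 65, 64, 62],   # the same E-F gesture in another chant
    ]
    riffs = Counter(g for chant in corpus for g in semitone_ngrams(chant))
    print(riffs.most_common(3))     # recurring riffs around the mode's semitone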

    An Industry Driven Genre Classification Application using Natural Language Processing

    With the advent of digitized music, online streaming companies such as Spotify have capitalized on a listener’s need for a common streaming platform. An essential component of such a platform is the recommender system that suggests related tracks, albums, and artists to the constituent user base. To sustain such a recommender system, data labeled by genre is essential. Most recent academic publications on music genre classification focus on deep neural networks developed and applied within that domain. This thesis instead applies highly sophisticated techniques from the text classification domain, such as Hierarchical Attention Networks, to classify tracks of different genres. To do this, the music is first separated into different tracks (drums, vocals, bass, and accompaniment) and converted into symbolic text data. Backed by a distributed machine learning system (over five computers, each with a graphics processing unit more powerful than a GTX 1070), the approach classifies contemporary genres with an impressive peak accuracy of over 93% when its results are compared with those of competing classifiers. It is also argued that, through the use of text classification, the expert domain knowledge of musicians and those versed in musicological techniques can be attracted to improving recommender systems within the music information retrieval research domain.
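    A compact sketch of the pipeline the abstract describes follows; it is an assumption-laden illustration rather than the thesis code. Each separated stem is tokenized into symbolic "words", and a Hierarchical Attention Network attends first over tokens within a stem, then over stems, before predicting a genre. The vocabulary size, dimensions, and genre count are placeholders.

    import torch
    import torch.nn as nn

    class Attention(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

        def forward(self, h):                    # h: (batch, steps, dim)
            w = torch.softmax(self.score(h), dim=1)
            return (w * h).sum(dim=1)            # attention-weighted summary

    class HANGenreClassifier(nn.Module):
        def __init__(self, vocab=500, dim=64, n_genres=10):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.token_rnn = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
            self.token_attn = Attention(dim)
            self.track_rnn = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
            self.track_attn = Attention(dim)
            self.head = nn.Linear(dim, n_genres)

        def forward(self, tokens):               # tokens: (batch, tracks, steps) ids
            b, t, s = tokens.shape
            x = self.embed(tokens.view(b * t, s))
            x, _ = self.token_rnn(x)
            tracks = self.token_attn(x).view(b, t, -1)   # one vector per stem
            tracks, _ = self.track_rnn(tracks)
            return self.head(self.track_attn(tracks))    # genre logits

    # e.g. 4 stems (drums, vocals, bass, accompaniment), 128 tokens each
    logits = HANGenreClassifier()(torch.randint(0, 500, (2, 4, 128)))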