MorpheuS: Generating Structured Music with Constrained Patterns and Tension
Automatic music generation systems have gained in popularity and
sophistication as advances in cloud computing have enabled large-scale, complex
computations such as deep models and optimization algorithms on personal
devices. Yet, they still face an important challenge, that of long-term
structure, which is key to conveying a sense of musical coherence. We present
the MorpheuS music generation system designed to tackle this problem. MorpheuS'
novel framework has the ability to generate polyphonic pieces with a given
tension profile and long- and short-term repeated pattern structures. A
mathematical model for tonal tension quantifies the tension profile and
state-of-the-art pattern detection algorithms extract repeated patterns in a
template piece. An efficient optimization metaheuristic, variable neighborhood
search, generates music by assigning pitches that best fit the prescribed
tension profile to the template rhythm while hard constraining long-term
structure through the detected patterns. This ability to generate affective
music with a specific tension profile and long-term structure is particularly
useful in a game or film music context. Music generated by the MorpheuS system
has been performed live in concerts.
Comment: IEEE Transactions on Affective Computing, PP(99).
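As a rough illustration of the optimization step described above, the following is a minimal variable neighborhood search sketch. It assigns one pitch per slot of a fixed rhythm template so that the piece tracks a target tension profile. The `tension` function here is a toy stand-in (normalized pitch height) for the paper's tonal-tension model, and the repeated-pattern hard constraints are omitted entirely.

```python
import random

def vns_assign_pitches(rhythm_len, tension_profile, pitch_range=(60, 84),
                       max_iters=2000, seed=0):
    """Toy variable neighborhood search: assign MIDI pitches to rhythm slots
    so that a simplified 'tension' curve matches a target profile."""
    rng = random.Random(seed)
    lo, hi = pitch_range

    def tension(pitches):
        # Toy stand-in for a tonal-tension model: normalized pitch height.
        return [(p - lo) / (hi - lo) for p in pitches]

    def cost(pitches):
        # Total deviation from the prescribed tension profile.
        return sum(abs(t, ) if False else abs(t - g)
                   for t, g in zip(tension(pitches), tension_profile))

    current = [rng.randint(lo, hi) for _ in range(rhythm_len)]
    best_cost = cost(current)
    k = 1  # neighborhood size: how many slots are perturbed at once
    for _ in range(max_iters):
        cand = current[:]
        for i in rng.sample(range(rhythm_len), k):
            cand[i] = rng.randint(lo, hi)
        c = cost(cand)
        if c < best_cost:          # improvement: accept and reset to k = 1
            current, best_cost, k = cand, c, 1
        else:                      # no improvement: widen the neighborhood
            k = min(k + 1, rhythm_len)
    return current, best_cost
```

The characteristic VNS move is the last branch: on failure the search widens its neighborhood (perturbing more slots) instead of restarting, then snaps back to the smallest neighborhood after any improvement.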
Generating structured music for bagana using quality metrics based on Markov models.
This research was partially supported by the project Lrn2Cre8, which acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET Grant No. 610859.
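One way to realize a quality metric of the kind named in this title is to score a candidate sequence by its average log-probability under a first-order Markov model trained on a corpus. This is a hypothetical simplification for illustration, not the paper's actual metric:

```python
import math
from collections import defaultdict

def markov_quality(corpus, sequence):
    """Average log-probability of `sequence` under a first-order Markov
    model with transition counts estimated from `corpus` (a list of
    sequences). Higher scores mean the sequence is more corpus-like."""
    counts = defaultdict(lambda: defaultdict(int))
    for piece in corpus:
        for a, b in zip(piece, piece[1:]):
            counts[a][b] += 1

    def prob(a, b):
        total = sum(counts[a].values())
        if total == 0:
            return 1e-6  # unseen context: small floor probability
        return max(counts[a][b] / total, 1e-6)

    logp = sum(math.log(prob(a, b)) for a, b in zip(sequence, sequence[1:]))
    return logp / max(len(sequence) - 1, 1)
```

A sequence made only of transitions seen in the corpus scores near 0, while unseen transitions are heavily penalized by the probability floor.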
A Functional Taxonomy of Music Generation Systems
Digital advances have transformed the face of automatic music generation
since its beginnings at the dawn of computing. Despite the many breakthroughs,
issues such as the musical tasks targeted by different machines and the degree
to which they succeed remain open questions. We present a functional taxonomy
for music generation systems with reference to existing systems. The taxonomy
organizes systems according to the purposes for which they were designed. It
also reveals the inter-relatedness amongst the systems. This design-centered
approach contrasts with predominant methods-based surveys and facilitates the
identification of grand challenges to set the stage for new breakthroughs.
Comment: survey, music generation, taxonomy, functional survey, automatic composition, algorithmic composition.
The effect of spectrogram reconstructions on automatic music transcription: an alternative approach to improve transcription accuracy
Most state-of-the-art automatic music transcription (AMT) models break the main transcription task into sub-tasks such as onset prediction and offset prediction and train them with onset and offset labels. These predictions are then concatenated together and used as the input to train another model with the pitch labels to obtain the final transcription. We attempt to use only the pitch labels (together with a spectrogram reconstruction loss) and explore how far this model can go without introducing supervised sub-tasks. In this paper, we do not aim at achieving state-of-the-art transcription accuracy; instead, we explore the effect that spectrogram reconstruction has on our AMT model. Our proposed model consists of two U-nets: the first U-net transcribes the spectrogram into a posteriorgram, and the second U-net transforms the posteriorgram back into a spectrogram. A reconstruction loss is applied between the original spectrogram and the reconstructed spectrogram to constrain the second U-net to focus only on reconstruction. We train our model on three different datasets: MAPS, MAESTRO, and MusicNet. Our experiments show that adding the reconstruction loss generally improves note-level transcription accuracy compared to the same model without the reconstruction part. Moreover, it can also boost frame-level precision above that of state-of-the-art models. The feature maps learned by our U-net contain grid-like structures (not present in the baseline model), which implies that, with the reconstruction loss present, the model is probably trying to count along both the time and frequency axes, resulting in higher note-level transcription accuracy.
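The training objective described above can be sketched as a combined loss. This is a minimal NumPy sketch under stated assumptions: the two U-nets are replaced by arbitrary callables, and binary cross-entropy plus mean-squared reconstruction error stand in for the paper's exact loss formulation:

```python
import numpy as np

def combined_amt_loss(spec, pitch_labels, transcribe, reconstruct):
    """Combined objective sketch: pitch-label loss on the posteriorgram
    plus a reconstruction loss on the spectrogram. `transcribe` and
    `reconstruct` are placeholder callables standing in for the two
    U-nets; only the pitch labels provide supervision."""
    post = transcribe(spec)        # posteriorgram, values in (0, 1)
    recon = reconstruct(post)      # reconstructed spectrogram
    eps = 1e-7
    # Binary cross-entropy between posteriorgram and pitch labels.
    bce = -np.mean(pitch_labels * np.log(post + eps)
                   + (1 - pitch_labels) * np.log(1 - post + eps))
    # Reconstruction loss constrains the second network.
    mse = np.mean((spec - recon) ** 2)
    return bce + mse
```

In the actual system both callables would be U-nets trained jointly; the key point the sketch captures is that the reconstruction term needs no labels beyond the spectrogram itself.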
- …