
    Performance Following: Real-Time Prediction of Musical Sequences Without a Score


    VGM-RNN: Recurrent Neural Networks for Video Game Music Generation

    The recent explosion of interest in deep neural networks has affected, and in some cases reinvigorated, work in fields as diverse as natural language processing, image recognition, and speech recognition. For sequence learning tasks, recurrent neural networks, and in particular LSTM-based networks, have shown promising results. Recently there has been interest, for example from Google’s Magenta team, in applying so-called “language modeling” recurrent neural networks to musical tasks, including the automatic generation of original music. In this work we demonstrate our own LSTM-based music language modeling recurrent network. We show that it is able to learn musical features from a MIDI dataset and generate output that is musically interesting while exhibiting features of melody, harmony, and rhythm. We source our dataset from VGMusic.com, a collection of user-submitted MIDI transcriptions of video game songs, and attempt to generate output that emulates this kind of music.
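    As a rough illustration of the approach described above (not the authors' actual architecture), a minimal LSTM "language model" over tokenized MIDI events can be sketched as follows; the vocabulary size, layer sizes, and sampling loop are assumptions made purely for the example.

    import torch
    import torch.nn as nn

    NOTE_VOCAB = 388  # assumed size of the MIDI event vocabulary (note-on/off, time shifts, ...)

    class MusicLanguageModel(nn.Module):
        """Predicts the next MIDI event token from the sequence so far."""
        def __init__(self, vocab=NOTE_VOCAB, embed=256, hidden=512, layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab, embed)
            self.lstm = nn.LSTM(embed, hidden, num_layers=layers, batch_first=True)
            self.head = nn.Linear(hidden, vocab)

        def forward(self, tokens, state=None):
            out, state = self.lstm(self.embed(tokens), state)
            return self.head(out), state  # logits over the next event

    @torch.no_grad()
    def generate(model, prime, steps=200, temperature=1.0):
        """Autoregressive sampling: seed with a prime sequence, feed predictions back in."""
        logits, state = model(prime)
        pieces = [prime]
        for _ in range(steps):
            probs = torch.softmax(logits[:, -1] / temperature, dim=-1)
            next_tok = torch.multinomial(probs, 1)
            pieces.append(next_tok)
            logits, state = model(next_tok, state)
        return torch.cat(pieces, dim=1)

    Training such a model would minimise the cross-entropy between the predicted logits and the next event in the MIDI-derived token stream.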

    Mustango: Toward Controllable Text-to-Music Generation

    With recent advancements in text-to-audio and text-to-music generation based on latent diffusion models, the quality of generated content has been reaching new heights. The controllability of musical aspects, however, has not yet been explicitly explored in text-to-music systems. In this paper, we present Mustango, a music-domain-knowledge-inspired text-to-music system based on diffusion that expands the Tango text-to-audio model. Mustango aims to control the generated music not only with general text captions, but with richer captions that can include specific instructions related to chords, beats, tempo, and key. As part of Mustango, we propose MuNet, a Music-Domain-Knowledge-Informed UNet sub-module that integrates these music-specific features, which we predict from the text prompt, together with the general text embedding into the diffusion denoising process. To overcome the limited availability of open datasets of music with text captions, we propose a novel data augmentation method that alters the harmonic, rhythmic, and dynamic aspects of music audio and uses state-of-the-art Music Information Retrieval methods to extract the music features, which are then appended to the existing descriptions in text format. We release the resulting MusicBench dataset, which contains over 52K instances and includes music-theory-based descriptions in the caption text. Through extensive experiments, we show that the quality of the music generated by Mustango is state-of-the-art, and that its controllability through music-specific text prompts greatly outperforms other models in terms of desired chords, beats, key, and tempo on multiple datasets.
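    To make the caption-augmentation idea above concrete, a minimal sketch of enriching a text caption with MIR-derived attributes is shown below. The librosa-based tempo and key estimators and the sentence templates are illustrative assumptions, not the feature extractors or caption format actually used for MusicBench.

    import librosa
    import numpy as np

    PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

    def augment_caption(audio_path, caption):
        """Append rough tempo and key-centre descriptions to an existing text caption."""
        y, sr = librosa.load(audio_path, sr=None, mono=True)

        # Global tempo estimate in beats per minute.
        tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

        # Crude key-centre guess: the strongest average chroma bin.
        chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
        key = PITCH_CLASSES[int(np.argmax(chroma.mean(axis=1)))]

        return f"{caption} The tempo is around {float(tempo):.0f} BPM. The key centre is {key}."

    # Example: augment_caption("clip.wav", "A mellow acoustic guitar piece.")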

    Automatic Transcription of Bass Guitar Tracks applied for Music Genre Classification and Sound Synthesis

    Music recordings most often consist of multiple instrument signals which overlap in time and frequency. In the field of Music Information Retrieval (MIR), existing algorithms for the automatic transcription and analysis of music recordings aim to extract semantic information from these mixed audio signals. In recent years it has frequently been observed that algorithm performance is limited by the signal interference and the resulting loss of information. One common approach to this problem is to first apply source separation algorithms that isolate the individual instrument signals before analyzing them; however, the performance of source separation algorithms strongly depends on the number of instruments as well as on the amount of spectral overlap.
    In this thesis, isolated instrumental tracks are analyzed in order to circumvent the challenges of source separation. The focus is on the development of instrument-centered signal processing algorithms for music transcription, musical analysis, and sound synthesis. The electric bass guitar is chosen as the example instrument; its sound production principles are closely investigated and reflected in the algorithmic design.
    In the first part of this thesis, an automatic music transcription algorithm for electric bass guitar recordings is presented. The audio signal is interpreted as a sequence of sound events described by various parameters. In addition to the conventional score-level parameters of note onset, duration, loudness, and pitch, instrument-specific parameters such as the applied playing techniques and the geometric position on the instrument fretboard are extracted. Evaluation experiments on two newly created audio datasets confirmed that the proposed transcription algorithm outperforms three state-of-the-art bass transcription algorithms on realistic bass guitar recordings. The estimation of the instrument-level parameters works with high accuracy, in particular for isolated note samples.
    In the second part of the thesis, it is investigated whether analyzing only the bassline of a music piece allows its music genre to be classified automatically. Score-based audio features are proposed that quantify tonal, rhythmic, and structural properties of basslines. Based on a novel dataset of 520 bassline transcriptions from 13 different music genres, three approaches to automatic genre classification were compared. A rule-based classification system achieved a mean class accuracy of 64.8 % using only features extracted from the bassline of a music piece.
    The re-synthesis of bass guitar recordings from the previously extracted note parameters is studied in the third part of the thesis. Based on the physical modeling of string instruments, a novel sound synthesis algorithm tailored to the electric bass guitar is presented. The algorithm mimics different aspects of the instrument’s sound production mechanism, such as string excitation, string damping, string-fret collision, and the influence of the electro-magnetic pickup.
    Furthermore, a parametric audio coding approach is discussed that allows bass guitar tracks to be encoded and transmitted at a significantly lower bit rate than conventional audio coding algorithms require. The results of several listening tests confirmed that a higher perceptual quality is achieved when the original bass guitar recordings are encoded and re-synthesized with the proposed parametric audio codec than when they are encoded with conventional audio codecs at very low bit rate settings.
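    The physical-modeling synthesis described above is considerably more elaborate than a textbook string model, but the underlying idea can be illustrated with a basic Karplus-Strong plucked-string sketch: a noise burst (the pluck) circulates through a delay line whose length sets the pitch, while a simple averaging filter provides the string damping. All parameters below are assumptions for illustration only and do not reproduce the thesis's bass guitar model (string-fret collision, pickup response, playing techniques).

    import numpy as np

    def karplus_strong(freq_hz=41.2, duration_s=2.0, sr=44100, damping=0.996):
        """Very small plucked-string sketch: delay line with averaging (lowpass) feedback."""
        n = int(sr * duration_s)
        delay = int(round(sr / freq_hz))            # delay-line length determines the pitch
        buf = np.random.uniform(-1.0, 1.0, delay)   # noise burst models the pluck excitation
        out = np.empty(n)
        for i in range(n):
            out[i] = buf[i % delay]
            # averaging two successive samples acts as the string-damping lowpass filter
            buf[i % delay] = damping * 0.5 * (buf[i % delay] + buf[(i + 1) % delay])
        return out

    # Example: samples = karplus_strong(freq_hz=41.2)  # roughly the low E of a 4-string bass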

    Cortico-cerebellar audio-motor regions coordinate self and other in musical joint action

    Joint music performance requires flexible sensorimotor coordination between self and other. Cognitive and sensory parameters of joint action, such as shared knowledge or temporal (a)synchrony, influence this coordination by shifting the balance between self-other segregation and integration. To investigate the neural bases of these parameters and their interaction during joint action, we asked pianists to play on an MR-compatible piano, in duet with a partner outside of the scanner room. Motor knowledge of the partner’s musical part and the temporal compatibility of the partner’s action feedback were manipulated. First, we found stronger activity and functional connectivity within cortico-cerebellar audio-motor networks when pianists had practiced their partner’s part beforehand. This indicates that they simulated and anticipated the auditory feedback of the partner by virtue of an internal model. Second, we observed stronger cerebellar activity and reduced behavioral adaptation when pianists encountered subtle asynchronies between these model-based anticipations and the perceived sensory outcome of (familiar) partner actions, indicating a shift towards self-other segregation. These combined findings demonstrate that cortico-cerebellar audio-motor networks link motor knowledge and other-produced sounds depending on cognitive and sensory factors of the joint performance, and play a crucial role in balancing self-other integration and segregation.

    Virtuosity in Computationally Creative Musical Performance for Bass Guitar

    This thesis focuses on the development and implementation of a theory for a computationally creative musical performance system aimed at producing virtuosic interpretations of musical pieces for performance on bass guitar. This theory has been developed and formalised using Wiggins’ Creative Systems Framework (CSF) and uses case-based reasoning (CBR) and an engagement-reflection cycle to adorn monophonic musical note sequences with explicit performance directions, selected to maximise the virtuosity when performed on a bass guitar. A survey of the playing competences of 497 bass players was conducted and used to develop a playing complexity rating for adorned musical pieces. The measures of musical similarity used within the case-based reasoning were assessed in a listening test with 12 participants. A study into the perceived difficulty of bass performances was also conducted and an appropriate model of perceived bass playing difficulty determined. The complexity rating and perceived playing difficulties are used within the heuristic by which the system determines which performances are considered virtuosic. The output of the system was rendered on a digital waveguide model of an electric bass guitar, extended with newly developed digital waveguide synthesis methods for advanced bass guitar playing techniques. These audio renderings were evaluated in a perceptual study with 60 participants, the results of which were used to validate the heuristic used within the system. This research contributes to the fields of Computational Creativity (CC), AI Music Creativity, Music Information Retrieval, and Musicology. It demonstrates how the CSF can be used as a tool to aid the design of computationally creative musical performance systems, provides a method for assessing the musical complexity and perceived difficulty of bass guitar performances, tests a suitable musical similarity measure for use within creative systems, and advances bass guitar digital waveguide synthesis methods.
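    As a purely illustrative sketch of the case-retrieval step in such a CBR adornment system, the snippet below finds the stored phrase whose pitch contour is most similar to a query phrase and returns it so that its performance directions can be reused; the interval-based edit distance is an assumed stand-in for the similarity measures evaluated in the thesis.

    from dataclasses import dataclass

    @dataclass
    class Case:
        pitches: list        # MIDI note numbers of the stored phrase
        adornments: list     # performance directions per note, e.g. ["slap", "hammer-on", ...]

    def interval_profile(pitches):
        """Describe a phrase by its pitch intervals so similarity is transposition-invariant."""
        return [b - a for a, b in zip(pitches, pitches[1:])]

    def edit_distance(a, b):
        """Standard dynamic-programming edit distance over two interval sequences."""
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            dp[i][0] = i
        for j in range(len(b) + 1):
            dp[0][j] = j
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                dp[i][j] = min(dp[i - 1][j] + 1,
                               dp[i][j - 1] + 1,
                               dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
        return dp[len(a)][len(b)]

    def retrieve(query_pitches, case_base):
        """Return the most similar stored case; its adornments seed the adaptation step."""
        q = interval_profile(query_pitches)
        return min(case_base, key=lambda c: edit_distance(q, interval_profile(c.pitches)))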

    Networks of Liveness in Singer-Songwriting: A practice-based enquiry into developing audio-visual interactive systems and creative strategies for composition and performance.

    This enquiry explores the creation and use of computer-based, real-time interactive audio-visual systems for the composition and performance of popular music by solo artists. Using a practice-based methodology, research questions are identified that relate to the impact of incorporating interactive systems into the songwriting process and to the liveness of performances with them. Four approaches to the creation of interactive systems are identified: creating explorative-generative tools, multiple tools for guitar/vocal pieces, typing systems, and audio-visual metaphors. A portfolio of ten pieces that use these approaches was developed for live performance. A model of the songwriting process is presented that incorporates system-building, and strategies are identified for reconciling the indeterminate, electronic audio output of the system with composed popular-music features and instrumental/vocal output. The four system approaches and ten pieces are compared in terms of four aspects of liveness derived from current theories. It was found that, in terms of overall liveness, a unified approach to system design facilitated both technological and aesthetic connections between the composition, the system processes, and the audio and visual outputs. However, there was considerable variation between the four system approaches in terms of the different aspects of liveness. The enquiry concludes by identifying strategies for maximising liveness in the different system approaches and by discussing the connections between liveness and the songwriting process.