
    Deep Learning Techniques for Music Generation -- A Survey

    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis:

    Objective - What musical content is to be generated? Examples are: melody, polyphony, accompaniment or counterpoint. For what destination and for what use? To be performed by a human (in the case of a musical score) or by a machine (in the case of an audio file).
    Representation - What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter and beat. What format is to be used? Examples are: MIDI, piano roll or text. How will the representation be encoded? Examples are: scalar, one-hot or many-hot.
    Architecture - What type(s) of deep neural network are to be used? Examples are: feedforward network, recurrent network, autoencoder or generative adversarial network.
    Challenge - What are the limitations and open challenges? Examples are: variability, interactivity and creativity.
    Strategy - How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling or input manipulation.

    For each dimension, we conduct a comparative analysis of various models and techniques, and we propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and are used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects.

    Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
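The encoding choices named under the Representation dimension (one-hot, many-hot, piano roll) can be sketched in a few lines. This is a minimal illustration, not code from the survey; the vocabulary size and helper names are assumptions.

```python
import numpy as np

N_PITCHES = 128  # assumed MIDI pitch range 0..127

def one_hot(pitch: int) -> np.ndarray:
    """One-hot encoding of a single MIDI pitch."""
    v = np.zeros(N_PITCHES)
    v[pitch] = 1.0
    return v

def many_hot(pitches: list) -> np.ndarray:
    """Many-hot encoding of a chord (several simultaneous pitches)."""
    v = np.zeros(N_PITCHES)
    v[pitches] = 1.0
    return v

# A piano roll stacks such vectors over discrete time steps:
# a C major chord held for two steps, then two melody notes.
roll = np.stack([
    many_hot([60, 64, 67]),  # C-E-G
    many_hot([60, 64, 67]),
    one_hot(72),             # C5
    one_hot(74),             # D5
])
print(roll.shape)  # (time steps, pitch vocabulary)
```

A scalar encoding would instead store the pitch number itself; one-hot and many-hot trade compactness for a form that feeds directly into the network architectures listed above.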

    Music Encoding Conference Proceedings 2021, 19–22 July, 2021 University of Alicante (Spain): Onsite & Online

    This volume includes the papers and posters presented at the Music Encoding Conference 2021, held in Alicante from 19 to 22 July 2021. Funded by project Multiscore, MCIN/AEI/10.13039/50110001103

    Dynamical and topological tools for (modern) music analysis

    Is it possible to represent the horizontal motions of the melodic strands of a contrapuntal composition, or the main ideas of a jazz standard, as mathematical entities? In this work, we suggest a collection of novel models for the representation of music that are endowed with two main features. First, they originate from a topological and geometrical inspiration; second, their low dimensionality allows us to build simple and informative visualisations. We tackle the problem of music representation along three non-orthogonal directions.

    First, we suggest a formalisation of the concept of voice leading (the assignment of an instrument to each voice in a sequence of chords), offering a horizontal viewpoint on music, constituted by the simultaneous motions of superposed melodies. This formalisation naturally leads to the interpretation of counterpoint as a multivariate time series of partial permutation matrices, whose observations are characterised by a degree of complexity. After providing both a static and a dynamic representation of counterpoint, voice leadings are reinterpreted as a special class of partial singular braids (paths in Euclidean space), and their main features are visualised as geometric configurations of collections of 3-dimensional strands.

    Thereafter, we set aside this time-related information in order to reduce the problem to the study of vertical musical entities. The model we propose is derived from a topological interpretation of the Tonnetz (a graph commonly used in computational musicology) and the deformation of its vertices induced by a harmonic and a consonance-oriented function, respectively. The 3-dimensional shapes derived from these deformations are classified using the formalism of persistent homology. This powerful topological technique computes a fingerprint of a shape that reflects its persistent geometrical and topological properties. Furthermore, it is possible to compute a distance between these fingerprints and hence study their hierarchical organisation. This feature allows us to tackle the problem of automatic music classification in an innovative way; the resulting representation of music is evaluated on a collection of heterogeneous musical datasets.

    Finally, a combination of the two aforementioned approaches is proposed. A model at the crossroads of signal-based and symbolic analysis of music uses multiple sequence alignment to provide an encompassing, novel viewpoint on the transfer of musical inspiration among compositions belonging to different artists, genres and times. To conclude, we represent music as a time series of topological fingerprints, whose metric nature allows us to compare pairs of time-varying shapes in both topological and musical terms. In particular, the dissimilarity scores computed by aligning such sequences are applied both to the analysis and the classification of music.
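The idea of counterpoint as a time series of partial permutation matrices can be made concrete with a small sketch. This is a hypothetical illustration, not the thesis's code: each observation is a 0/1 matrix recording which voice of one chord moves to which voice of the next, and a partial permutation lets a voice drop out.

```python
import numpy as np

def voice_leading_matrix(chord_a, chord_b, assignment):
    """assignment[i] = j means voice i of chord_a moves to voice j of
    chord_b; None means the voice drops out (partial permutation)."""
    M = np.zeros((len(chord_a), len(chord_b)))
    for i, j in enumerate(assignment):
        if j is not None:
            M[i, j] = 1.0
    return M

# C major -> G major with the top voice dropping out.
c_maj = [60, 64, 67]   # C, E, G
g_maj = [59, 62, 67]   # B, D, G
M = voice_leading_matrix(c_maj, g_maj, [0, 1, None])

# Rows and columns of a partial permutation matrix sum to at most 1.
assert M.sum(axis=1).max() <= 1 and M.sum(axis=0).max() <= 1
print(M)
```

A piece of counterpoint then becomes a sequence of such matrices, one per chord transition, which is the multivariate time series the abstract describes.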

    Getting aligned on representational alignment

    Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the extent to which the representations formed by these diverse systems agree? Do similarities in representations then translate into similar behavior? How can a system's representations be modified to better match those of another system? These questions pertaining to the study of representational alignment are at the heart of some of the most active research areas in cognitive science, neuroscience, and machine learning. For example, cognitive scientists measure the representational alignment of multiple individuals to identify shared cognitive priors, neuroscientists align fMRI responses from multiple individuals into a shared representational space for group-level analyses, and ML researchers distill knowledge from teacher models into student models by increasing their alignment. Unfortunately, there is limited knowledge transfer between research communities interested in representational alignment, so progress in one field often ends up being rediscovered independently in another. To improve communication between these fields, we propose a unifying framework that can serve as a common language between researchers studying representational alignment. We survey the literature from all three fields and demonstrate how prior work fits into this framework. Finally, we lay out open problems in representational alignment where progress can benefit all three fields. We hope that our work can catalyze cross-disciplinary collaboration and accelerate progress for all communities studying and developing information processing systems. We note that this is a working paper and encourage readers to reach out with their suggestions for future revisions.

    Comment: Working paper, changes to be made in upcoming revision
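One concrete way to measure how far two systems' representations agree is linear centered kernel alignment (CKA), a metric used in the ML literature. The sketch below is an illustration of that metric, not the framework proposed by the paper.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representations of the same n stimuli.
    X: (n, d1), Y: (n, d2). Returns a similarity in [0, 1]."""
    X = X - X.mean(axis=0)          # center features
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))            # system A: 50 stimuli, 10 features
R = rng.normal(size=(10, 10))            # arbitrary linear map
Y = X @ R + 0.5 * rng.normal(size=(50, 10))  # related but noisy system B

print(linear_cka(X, X))  # identical representations align fully (1.0)
print(linear_cka(X, Y))  # partial alignment, strictly between 0 and 1
```

The same pattern covers the examples in the abstract: comparing two individuals' fMRI response matrices over shared stimuli, or a teacher's and a student's hidden activations over a shared batch.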

    Separation of musical sources and structure from single-channel polyphonic recordings

    EThOS - Electronic Theses Online Service, United Kingdom

    A User-assisted Approach to Multiple Instrument Music Transcription

    PhD thesis. The task of automatic music transcription has been studied for several decades and is regarded as an enabling technology for a multitude of applications such as music retrieval and discovery, intelligent music processing and large-scale musicological analyses. It refers to the process of identifying the musical content of a performance and representing it in a symbolic format. Despite its long research history, fully automatic music transcription systems are still error-prone and often fail when more complex polyphonic music is analysed. This raises the question of how human knowledge can be incorporated in the transcription process.

    This thesis investigates ways to involve a human user in the transcription process. More specifically, it investigates how user input can be employed to derive timbre models for the instruments in a music recording, which are then employed to obtain instrument-specific (parts-based) transcriptions. A first investigation studies different types of user input in order to derive instrument models by means of a non-negative matrix factorisation framework. The transcription accuracy of the different models is evaluated, and a method is proposed that refines the models by allowing each pitch of each instrument to be represented by multiple basis functions. A second study aims at limiting the amount of user input to make the method more applicable in practice. Different methods are considered to estimate missing non-negative basis functions when only a subset of basis functions can be extracted from the user information. A method is proposed to track the pitches of individual instruments over time by means of a Viterbi framework in which the states at each time frame contain several candidate instrument-pitch combinations. A transition probability is employed that combines three criteria: the frame-wise reconstruction error of each combination, a pitch continuity measure that favours similar pitches in consecutive frames, and an explicit activity model for each instrument. The method is shown to outperform other state-of-the-art multi-instrument tracking methods.

    Finally, the extraction of instrument models that include phase information is investigated as a step towards complex matrix decomposition. The phase relations between the partials of harmonic sounds are explored as a time-invariant property that can be employed to form complex-valued basis functions. The application of the model to a user-assisted transcription task is illustrated with a saxophone example.
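The parts-based factorisation at the core of such a system can be sketched generically: once spectral basis functions W (one column per instrument-pitch, here random stand-ins for the user-derived timbre models) are fixed, transcription reduces to estimating non-negative activations H in V ≈ WH. This sketch uses the standard multiplicative update for the Euclidean cost, an assumption rather than the thesis's exact algorithm.

```python
import numpy as np

def nmf_activations(V, W, n_iter=500, eps=1e-9):
    """Estimate H >= 0 such that V ~= W @ H, with W held fixed."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        # Multiplicative update for the Frobenius cost; preserves H >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

rng = np.random.default_rng(1)
W = rng.random((100, 4))        # 4 instrument-pitch templates, 100 freq bins
H_true = rng.random((4, 30))    # true activations over 30 time frames
V = W @ H_true                  # synthetic magnitude spectrogram
H = nmf_activations(V, W)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(round(err, 4))            # small relative reconstruction error
```

Each row of H is then a pitch-activity curve for one instrument-pitch pair, which is the input the Viterbi tracking stage described above would operate on.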

    Audio source separation for music in low-latency and high-latency scenarios

    This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.
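Why Tikhonov regularization suits the low-latency setting can be seen from its closed form: given a dictionary B of spectral templates and an observed spectrum s, the activations minimising ||Bx - s||² + λ||x||² are x = (BᵀB + λI)⁻¹Bᵀs, a single linear solve per frame rather than an iterative update. The sketch below assumes synthetic templates and is an illustration, not the thesis's implementation.

```python
import numpy as np

def tikhonov_decompose(B, s, lam=1e-3):
    """Solve min ||B x - s||^2 + lam ||x||^2 in closed form."""
    d = B.shape[1]
    return np.linalg.solve(B.T @ B + lam * np.eye(d), B.T @ s)

rng = np.random.default_rng(0)
B = rng.random((257, 12))        # 12 pitch templates over 257 freq bins
x_true = np.zeros(12)
x_true[[3, 7]] = [1.0, 0.5]      # two active pitches in this frame
s = B @ x_true                   # observed magnitude spectrum
x = tikhonov_decompose(B, s)
print(np.argsort(x)[-2:])        # indices of the two largest activations
```

Unlike non-negative matrix factorisation, this solve can return small negative activations; the regularizer λ trades a slight bias for stability, which is the cost of the one-shot, low-latency formulation.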

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

    Music Encoding Conference Proceedings

    UIDB/00693/2020; UIDP/00693/2020