2,524 research outputs found

    Deep Learning Techniques for Music Generation -- A Survey

    Full text link
    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology for our analysis based on five dimensions:
    - Objective: What musical content is to be generated (e.g., melody, polyphony, accompaniment, or counterpoint)? For what destination and use: to be performed by humans (in the case of a musical score) or by a machine (in the case of an audio file)?
    - Representation: What concepts are to be manipulated (e.g., waveform, spectrogram, note, chord, meter, beat)? What format is to be used (e.g., MIDI, piano roll, or text)? How will the representation be encoded (e.g., scalar, one-hot, or many-hot)?
    - Architecture: What type(s) of deep neural network are to be used (e.g., feedforward network, recurrent network, autoencoder, or generative adversarial network)?
    - Challenge: What are the limitations and open challenges (e.g., variability, interactivity, and creativity)?
    - Strategy: How do we model and control the generation process (e.g., single-step feedforward, iterative feedforward, sampling, or input manipulation)?
    For each dimension, we conduct a comparative analysis of various models and techniques and propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based music generation systems selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge, and strategy. The last section includes some discussion and prospects.
    Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
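    The encoding choices the survey lists (scalar, one-hot, many-hot) can be illustrated with a short sketch. This is a generic illustration, not code from the survey; the MIDI pitch range and the C-major-triad example are assumptions for demonstration.

```python
# Illustrative sketch of note-encoding choices: a one-hot vector for a
# single melody note vs. a many-hot vector for a chord (polyphony).
# The 128-slot MIDI pitch range is an assumed, conventional choice.

NUM_PITCHES = 128  # MIDI pitches 0-127

def one_hot(pitch: int) -> list[int]:
    """Encode a single pitch as a one-hot vector (exactly one 1)."""
    vec = [0] * NUM_PITCHES
    vec[pitch] = 1
    return vec

def many_hot(pitches: list[int]) -> list[int]:
    """Encode simultaneous pitches (a chord) as a many-hot vector."""
    vec = [0] * NUM_PITCHES
    for p in pitches:
        vec[p] = 1
    return vec

# C major triad (C4=60, E4=64, G4=67) as a many-hot vector
chord = many_hot([60, 64, 67])
assert sum(chord) == 3 and chord[60] == chord[64] == chord[67] == 1
```

    A scalar encoding would instead store the pitch number 60 directly; one-hot and many-hot trade compactness for a form that feeds directly into a network's input layer.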

    A Deep Learning Based Tool For Ear Training

    Get PDF
    The primary objective of this thesis is to utilize deep learning techniques to develop a tool capable of generating meaningful melodic dictation exercises for music teachers and their students to use and practice with. A meaningful exercise is one that follows a chord progression and has a certain degree of musicality. To achieve this, the Nottingham dataset was carefully preprocessed and formatted so that only the most relevant characteristics were retained before being fed into the model. After fine-tuning the hyperparameters of a model with an architecture resembling that of a GPT (Generative Pre-trained Transformer) model, the best-performing models were evaluated using synthetic metrics as well as by teachers with a background in music. Although the results did not meet the desired expectations due to certain decisions made during the preprocessing phase, the project offers valuable insights and recommendations for continued progress in developing a system that effectively meets the needs of music teachers and students.
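    Feeding a melody-with-chords dataset into a GPT-style model requires serializing each tune into a flat token sequence. The sketch below is a hypothetical illustration of that step, not the thesis's actual preprocessing; the event tuples and token names are assumptions.

```python
# Hypothetical tokenization of a chord-annotated melody into a flat
# sequence for next-token prediction. Event format and vocabulary are
# illustrative assumptions, not the thesis's actual pipeline.

def to_tokens(events):
    """events: list of ('chord', name) or ('note', pitch, duration) tuples."""
    tokens = []
    for ev in events:
        if ev[0] == "chord":
            tokens.append(f"CHORD_{ev[1]}")
        else:
            _, pitch, dur = ev
            tokens.append(f"NOTE_{pitch}")
            tokens.append(f"DUR_{dur}")
    return tokens

melody = [("chord", "G"), ("note", 67, 8), ("note", 71, 8),
          ("chord", "D7"), ("note", 69, 16)]
print(to_tokens(melody))
# ['CHORD_G', 'NOTE_67', 'DUR_8', 'NOTE_71', 'DUR_8', 'CHORD_D7', 'NOTE_69', 'DUR_16']
```

    A model trained on such sequences can then be prompted with chord tokens so that the generated melody follows the intended progression, which is the "meaningful exercise" property described above.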

    Incrementar la presencia en entornos virtuales en primera persona a través de interfaces auditivas: un acercamiento analítico al sonido y la música adaptativos

    Get PDF
    Thesis, Universidad Complutense de Madrid, Facultad de Informática, defended 25-11-2019.
    The popularisation of virtual reality devices has brought with it an increased need for telepresence and player immersion in video games. These goals are often pursued through more realistic computer graphics and sound; however, invasive graphical user interfaces are still present in industry-standard VR products, even though previous research has advised against them in order to achieve better immersion. Non-visual, multimodal communication channels are explored throughout this thesis as a means of reducing the number of graphical elements needed in head-up displays while increasing telepresence. Thus, the main goals of this research are to find the optimal channels that allow for semantic communication without resorting to visual interfaces, while reducing the overall number of extra-diegetic elements in a video game, and to develop a total of six software applications in order to validate the obtained knowledge in real-life scenarios. The central piece of software produced as a result of this process is called LitSens, an adaptive music generator which takes human emotions as inputs...
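    An adaptive music generator driven by emotions, as described for LitSens, needs some mapping from an emotion estimate to musical parameters. The sketch below is a minimal illustration assuming a valence/arousal emotion model and an invented tempo/mode mapping; it is not the thesis's actual system.

```python
# Minimal sketch of emotion-driven music adaptation: map a valence/
# arousal estimate (both in [-1, 1]) to a tempo and a mode. The specific
# ranges and the major/minor rule are assumptions for illustration.

def music_params(valence: float, arousal: float) -> dict:
    """Return tempo (BPM) and mode for an emotion estimate."""
    tempo = 80 + 60 * (arousal + 1) / 2       # 80-140 BPM, rising with arousal
    mode = "major" if valence >= 0 else "minor"
    return {"tempo_bpm": round(tempo), "mode": mode}

print(music_params(0.5, 0.8))   # excited-positive -> fast, major
print(music_params(-0.7, -0.3))  # sad-calm -> slower, minor
```

    Because the mapping is continuous in arousal, the generator can crossfade tempo smoothly as the player's emotional state changes, rather than switching tracks abruptly.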

    16th Sound and Music Computing Conference SMC 2019 (28–31 May 2019, Malaga, Spain)

    Get PDF
    The 16th Sound and Music Computing Conference (SMC 2019) took place in Malaga, Spain, 28-31 May 2019, and was organized by the Application of Information and Communication Technologies (ATIC) research group of the University of Malaga (UMA). The associated SMC 2019 Summer School took place 25-28 May 2019. The First International Day of Women in Inclusive Engineering, Sound and Music Computing Research (WiSMC 2019) took place on 28 May 2019. The SMC 2019 topics of interest included a wide selection of topics related to acoustics, psychoacoustics, music, technology for music, audio analysis, musicology, sonification, music games, machine learning, serious games, immersive audio, sound synthesis, etc.

    The Creation of an Expert System for Teaching Piano Lessons

    Get PDF
    Combining the arts with science and technology has had many beneficial results, and computers and music have been connected for many years. Computers have been used in music composition, electronic keyboards, music publishing, and digital sound processing, while artificial intelligence has been used to create expert systems for training people in various fields. This research report proposes tying these threads together: developing an expert system, drawing on both music and artificial intelligence, to teach piano lessons. While existing technology makes an electronic keyboard the logical choice of instrument, using an acoustic piano will also be addressed.
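    An expert system of the kind proposed here would encode pedagogical knowledge as explicit rules applied to the student's playing. The fragment below is a hypothetical sketch assuming MIDI input from an electronic keyboard; the rule set, event format, and timing threshold are invented for illustration.

```python
# Hypothetical rule-based feedback for a piano lesson: compare the
# student's played notes (e.g., from a keyboard's MIDI output) against
# the score. Rules and the 0.25-beat timing tolerance are assumptions.

def evaluate(expected, played):
    """expected/played: lists of (pitch, onset_beat) tuples, same length."""
    feedback = []
    for (ep, et), (pp, pt) in zip(expected, played):
        if pp != ep:
            feedback.append(f"Wrong note at beat {et}: expected {ep}, got {pp}")
        elif abs(pt - et) > 0.25:
            feedback.append(f"Timing off at beat {et}: played at {pt}")
    if not feedback:
        feedback.append("Well done!")
    return feedback

print(evaluate([(60, 0.0), (64, 1.0)], [(60, 0.0), (65, 1.0)]))
```

    With an acoustic piano, the same rules would apply, but the `played` events would first have to be recovered from audio by pitch detection rather than read directly from MIDI.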

    Intelligent assistant for music practice

    Get PDF
    Generally, the present disclosure is directed to techniques for automatically providing feedback and suggestions to musicians. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to provide real-time feedback to musicians based on audio and/or video of the musician playing music. The techniques of this disclosure use various input features (e.g., the musician's practice piece, references from a database of musical scores, and data from different sensors such as microphones and cameras) to analyze the musician's playing and provide real-time feedback or suggestions for corrections, e.g., changing the tempo, flagging a sharp or flat note (acting as an intelligent tuner), or suggesting practice pieces.
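    The "intelligent tuner" behavior mentioned above can be sketched from first principles: given a detected fundamental frequency, find the nearest equal-tempered pitch and report the deviation in cents. This is a generic illustration, not the disclosure's implementation; the 10-cent tolerance is an assumed threshold.

```python
import math

# Sketch of sharp/flat detection for real-time tuner feedback.
# Converts a frequency to a fractional MIDI number relative to A4=440 Hz
# and reports the deviation from the nearest equal-tempered pitch.

A4 = 440.0

def tuner_feedback(freq_hz: float, tolerance_cents: float = 10.0) -> str:
    midi = 69 + 12 * math.log2(freq_hz / A4)   # fractional MIDI number
    nearest = round(midi)
    cents = (midi - nearest) * 100             # 100 cents per semitone
    if abs(cents) <= tolerance_cents:
        return f"in tune (MIDI {nearest}, {cents:+.1f} cents)"
    direction = "sharp" if cents > 0 else "flat"
    return f"{abs(cents):.1f} cents {direction} of MIDI {nearest}"

print(tuner_feedback(440.0))   # A4 exactly
print(tuner_feedback(446.0))   # noticeably sharp of A4
```

    A full assistant would run this per detected note and combine it with tempo and score-alignment checks against the reference piece.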

    Music similarity analysis using the big data framework spark

    Get PDF
    A parameterizable recommender system based on the Big Data processing framework Spark is introduced, which takes multiple tonal properties of music into account and is capable of recommending music based on a user's personal preferences. The implemented system is fully scalable: more songs can be added to the dataset, the cluster size can be increased, and different kinds of audio features and more state-of-the-art similarity measurements can be added. This thesis also deals with the extraction of the required audio features in parallel on a computer cluster. The extracted features are then processed by the Spark-based recommender system, and song recommendations for a dataset of approximately 114,000 songs are retrieved in less than 12 seconds on a 16-node Spark cluster, combining eight different audio feature types and similarity measurements.
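    The core step the thesis parallelizes (combining several per-feature similarities into one score) can be shown on a single machine. The sketch below is an assumption-laden illustration: the feature names, vectors, weights, and the choice of cosine similarity are invented for demonstration, and the Spark distribution is omitted.

```python
import math

# Single-machine sketch of a weighted combination of per-feature audio
# similarities. Feature names and weights are illustrative assumptions;
# the thesis distributes this computation over a Spark cluster.

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def combined_similarity(song_a, song_b, weights):
    """song_*: dict feature_name -> vector; weights: feature_name -> weight."""
    total = sum(weights.values())
    return sum(w * cosine(song_a[f], song_b[f])
               for f, w in weights.items()) / total

a = {"chroma": [1.0, 0.0, 0.5], "timbre": [0.2, 0.8]}
b = {"chroma": [0.9, 0.1, 0.4], "timbre": [0.3, 0.7]}
score = combined_similarity(a, b, {"chroma": 2.0, "timbre": 1.0})
assert 0.0 <= score <= 1.0
```

    The parameterizable part of the system corresponds to the `weights` argument: emphasizing chroma over timbre, for example, biases recommendations toward harmonic rather than textural similarity.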

    AXMEDIS 2007 Conference Proceedings

    Get PDF
    The AXMEDIS International Conference series has been held since 2005 and focuses on research, developments, and applications in the cross-media domain, exploring innovative technologies to meet the challenges of the sector. AXMEDIS 2007 deals with all subjects and topics related to cross-media and digital-media content production, processing, management, standards, representation, sharing, interoperability, protection, and rights management. It addresses the latest developments and future trends of the technologies and their applications, their impact, and their exploitation within academic, business, and industrial communities.