
    Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision

    We tackle the problem of audio-visual scene analysis for weakly labeled data. To this end, we build upon our previous audio-visual representation learning framework to perform object classification in noisy acoustic environments and to integrate audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization for the audio modality. Our approach is founded on the multiple instance learning paradigm. Its effectiveness is established through experiments over a challenging dataset of musical instrument performance videos. We also show encouraging visual object localization results.
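    As a rough illustration of the non-negative matrix factorization step mentioned above, the sketch below factors a magnitude spectrogram into spectral templates and temporal activations, then reconstructs a subset of components as a crude source enhancement. The library choice (scikit-learn), the shapes, and the component indices are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch: decomposing a magnitude spectrogram with NMF.
# The spectrogram here is random stand-in data; in practice it
# would come from an STFT of the audio track.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((513, 200))          # |STFT|: freq bins x time frames (stand-in)

model = NMF(n_components=8, init="nndsvd", max_iter=500)
W = model.fit_transform(V)          # (513, 8) spectral templates
H = model.components_               # (8, 200) temporal activations

# A crude "enhancement": keep only the components believed to
# belong to the target source, then reconstruct its spectrogram.
keep = [0, 1, 2]                    # hypothetical component indices
V_target = W[:, keep] @ H[keep, :]  # approximate target-source spectrogram
print(V_target.shape)
```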

    MorpheuS: Generating Structured Music with Constrained Patterns and Tension

    Automatic music generation systems have gained in popularity and sophistication as advances in cloud computing have enabled large-scale complex computations, such as deep models and optimization algorithms, on personal devices. Yet they still face an important challenge: long-term structure, which is key to conveying a sense of musical coherence. We present the MorpheuS music generation system, designed to tackle this problem. MorpheuS's novel framework can generate polyphonic pieces with a given tension profile and long- and short-term repeated pattern structures. A mathematical model for tonal tension quantifies the tension profile, and state-of-the-art pattern detection algorithms extract repeated patterns in a template piece. An efficient optimization metaheuristic, variable neighborhood search, generates music by assigning pitches that best fit the prescribed tension profile to the template rhythm, while enforcing long-term structure as hard constraints through the detected patterns. This ability to generate affective music with a specific tension profile and long-term structure is particularly useful in a game or film music context. Music generated by the MorpheuS system has been performed live in concerts.
    Comment: IEEE Transactions on Affective Computing, PP(99).
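    Variable neighborhood search is a standard metaheuristic, and its generic shake / local-search / move loop can be sketched in a few lines. The objective below is a toy stand-in (distance of an integer vector from a target "profile"); MorpheuS's actual cost compares a piece's tonal tension to a target tension profile under pattern constraints, which is not reproduced here.

```python
# Generic variable neighborhood search (VNS) skeleton on a toy objective.
import random

TARGET = [3, 7, 2, 9, 4]                       # hypothetical target profile

def cost(x):
    return sum(abs(a - b) for a, b in zip(x, TARGET))

def shake(x, k):
    # Perturb k randomly chosen positions.
    x = list(x)
    for i in random.sample(range(len(x)), k):
        x[i] = random.randint(0, 11)
    return x

def local_search(x):
    # First-improvement search over single-position changes.
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            for v in range(12):
                cand = x[:i] + [v] + x[i + 1:]
                if cost(cand) < cost(x):
                    x, improved = cand, True
    return x

def vns(x, k_max=3, iters=100):
    for _ in range(iters):
        k = 1
        while k <= k_max:
            cand = local_search(shake(x, k))
            if cost(cand) < cost(x):
                x, k = cand, 1                 # move and restart neighborhoods
            else:
                k += 1
    return x

random.seed(1)
best = vns([0, 0, 0, 0, 0])
print(best, cost(best))
```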

    Deep Learning Techniques for Music Generation -- A Survey

    This paper is a survey and analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis:
    - Objective: What musical content is to be generated (e.g., melody, polyphony, accompaniment or counterpoint)? For what destination and what use: to be performed by humans (a musical score) or by a machine (an audio file)?
    - Representation: What are the concepts to be manipulated (e.g., waveform, spectrogram, note, chord, meter, beat)? What format is to be used (e.g., MIDI, piano roll or text)? How will the representation be encoded (e.g., scalar, one-hot or many-hot)?
    - Architecture: What type(s) of deep neural network are to be used (e.g., feedforward network, recurrent network, autoencoder or generative adversarial network)?
    - Challenge: What are the limitations and open challenges (e.g., variability, interactivity and creativity)?
    - Strategy: How do we model and control the process of generation (e.g., single-step feedforward, iterative feedforward, sampling or input manipulation)?
    For each dimension, we conduct a comparative analysis of various models and techniques, and we propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and prospects.
    Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
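    To make the representation dimension concrete, the sketch below encodes a short melody both as a piano-roll matrix and as a one-hot sequence over a restricted vocabulary. The melody, time step, and vocabulary are invented for illustration; they are not examples from the survey.

```python
# Two representation choices surveyed above: piano roll and one-hot.
# Pitches are MIDI note numbers; one note per (made-up) time step.
import numpy as np

melody = [60, 62, 64, 65]             # C4 D4 E4 F4

# Piano roll: (time steps x 128 MIDI pitches), 1 where a note sounds.
roll = np.zeros((len(melody), 128), dtype=np.int8)
for t, pitch in enumerate(melody):
    roll[t, pitch] = 1

# One-hot over a restricted vocabulary instead of the full MIDI range.
vocab = sorted(set(melody))           # [60, 62, 64, 65]
one_hot = np.eye(len(vocab), dtype=np.int8)[[vocab.index(p) for p in melody]]

print(roll.shape, one_hot.shape)      # (4, 128) (4, 4)
```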

    A peça musical como uma instância: ensaios acerca da análise musical assistida por computador [The musical piece as an instance: essays on computer-aided music analysis]

    Advisors: Jônatas Manzolli, Moreno Andreatta. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Artes, and Université Pierre et Marie Curie (France).
    From a musicological interpretation of the scientific notion of "modeling and simulation", this thesis presents an approach to computer-aided analysis in which musical scores are reconstructed from algorithmic processes and then simulated with different sets of parameters, from which neighboring variants, called instances, are generated. Studying a musical piece by modeling and simulation means understanding the work by (re)composing it, blurring the boundaries between analytical and creative work. This approach is applied to three case studies: an isolated technique, Pierre Boulez's (1925–2016) chord multiplication, explored through the prism formed by the theories of H. Hanson, S. Heinemann and L. Koblyakov and by its computational implementation; the piece Spectral Canon for Conlon Nancarrow (1974) by the American composer James Tenney (1934–2006), for which computational simulation from different sets of parameters is taken to its ultimate consequences when a "space of instances" is created and strategies of visualization and exploration are devised; and finally Désordre (1985), the first piano étude by the Austro-Hungarian composer György Ligeti (1923–2006), in which the concepts of "combinatorial tonality" and "decomposition of a number (duration) into two prime numbers" are used to maximize the potential of the model to produce different variations of the original piece.
    Doctorate: Música, Teoria, Criação e Prática (Doutor em Música). Grant 2014/08525-8, FAPESP.
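    Chord multiplication, the first case study, admits a compact computational reading. Under one common formalization (transposing one pitch-class set to each pitch class of the other and taking the union, equivalently all pairwise sums mod 12), the product of two pitch-class sets can be computed as below; the theories of Hanson, Heinemann and Koblyakov that the thesis compares differ in their details, so treat this as one reading only.

```python
# Minimal sketch of Boulez-style chord multiplication under one
# common formalization: all pairwise sums of pitch classes mod 12.
def multiply(a, b):
    """Chord multiplication of pitch-class sets a and b (mod 12)."""
    return sorted({(x + y) % 12 for x in a for y in b})

# Example: multiplying a major third {0, 4} by a perfect fifth {0, 7}.
print(multiply({0, 4}, {0, 7}))   # [0, 4, 7, 11]
```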

    Soundscape Generation Using Web Audio Archives

    The large and growing archives of audio content on the web have been transforming sound design practice. In this context, sampling, a fundamental sound design tool, has shifted from mechanical recording to the realm of copying and cutting on the computer. Effectively browsing these large archives and retrieving content has become a well-identified problem in Music Information Retrieval, namely through the adoption of audio content-based methodologies. Despite their robustness and effectiveness, current technological solutions rely mostly on (statistical) signal processing methods, whose terminology does not attain a level of user-centered explanatory adequacy. This dissertation advances a novel semantically oriented strategy for browsing and retrieving audio content, in particular environmental sounds, from large web audio archives. Ultimately, we aim to streamline the retrieval of user-defined queries to foster fluid generation of soundscapes. In our work, querying web audio archives is done by affective dimensions that relate to emotional states (e.g., low arousal and low valence) and semantic audio source descriptions (e.g., rain). To this end, we map human annotations of affective dimensions to spectral audio-content descriptors extracted from the signal. Retrieving new sounds from web archives is then done by specifying a query that combines a point in a two-dimensional affective plane with semantic tags. A prototype application, MScaper, implements the method in the Ableton Live environment.
    An evaluation of our research assesses the perceptual soundness of the spectral audio-content descriptors in capturing affective dimensions, and the usability of MScaper. The results show that spectral audio features significantly capture affective dimensions and that expert users perceived MScaper as having excellent usability.
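    The querying mechanism described above can be sketched as a nearest-neighbor lookup in the valence-arousal plane combined with tag filtering. The tiny in-memory catalogue, field names, and coordinates below are invented stand-ins for illustration; the real system maps affective dimensions to spectral descriptors and queries a web audio archive.

```python
# Sketch: query sounds by a point in a 2-D affective plane plus tags.
import math

CATALOGUE = [
    {"name": "rain_light.wav",  "valence": -0.3, "arousal": -0.6, "tags": {"rain"}},
    {"name": "storm_heavy.wav", "valence": -0.7, "arousal":  0.8, "tags": {"rain", "thunder"}},
    {"name": "birds_dawn.wav",  "valence":  0.6, "arousal": -0.2, "tags": {"birds"}},
]

def query(valence, arousal, tags, k=2):
    # Keep only sounds carrying all requested tags, then rank by
    # Euclidean distance to the query point in the affective plane.
    candidates = [s for s in CATALOGUE if tags <= s["tags"]]
    return sorted(
        candidates,
        key=lambda s: math.hypot(s["valence"] - valence, s["arousal"] - arousal),
    )[:k]

# Low arousal, low valence, tagged "rain", per the example in the abstract.
for sound in query(-0.5, -0.5, {"rain"}):
    print(sound["name"])
```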