Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision
We tackle the problem of audiovisual scene analysis for weakly-labeled data.
To this end, we build upon our previous audiovisual representation learning
framework to perform object classification in noisy acoustic environments and
integrate audio source enhancement capability. This is made possible by a novel
use of non-negative matrix factorization for the audio modality. Our approach
is founded on the multiple instance learning paradigm. Its effectiveness is
established through experiments over a challenging dataset of music instrument
performance videos. We also show encouraging visual object localization
results.
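The abstract does not specify which NMF variant is used for the audio modality; as a minimal sketch of how non-negative matrix factorization can decompose a magnitude spectrogram into spectral templates and time activations (standard Lee-Seung multiplicative updates on a toy matrix; all names and values here are illustrative, not the paper's actual method):

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9):
    """Factor a non-negative matrix V (freq x time) into W @ H using
    Lee-Seung multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(0)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update spectral templates
    return W, H

# Toy "spectrogram": two spectral templates active at different times.
templates = np.array([[1.0, 0.0], [0.8, 0.1], [0.0, 1.0], [0.1, 0.9]])  # freq x source
activations = np.array([[1, 1, 0, 0, 1], [0, 0, 1, 1, 1]], float)       # source x time
V = templates @ activations
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In a source-enhancement setting, each learned template-activation pair `W[:, k] @ H[k, :]` gives one separated component of the mixture.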
MorpheuS: Generating Structured Music with Constrained Patterns and Tension
Automatic music generation systems have gained in popularity and
sophistication as advances in cloud computing have enabled large-scale complex
computations such as deep models and optimization algorithms on personal
devices. Yet, they still face an important challenge, that of long-term
structure, which is key to conveying a sense of musical coherence. We present
the MorpheuS music generation system designed to tackle this problem. MorpheuS'
novel framework has the ability to generate polyphonic pieces with a given
tension profile and long- and short-term repeated pattern structures. A
mathematical model for tonal tension quantifies the tension profile and
state-of-the-art pattern detection algorithms extract repeated patterns in a
template piece. An efficient optimization metaheuristic, variable neighborhood
search, generates music by assigning pitches that best fit the prescribed
tension profile to the template rhythm while hard constraining long-term
structure through the detected patterns. This ability to generate affective
music with specific tension profile and long-term structure is particularly
useful in a game or film music context. Music generated by the MorpheuS system
has been performed live in concerts.
Comment: IEEE Transactions on Affective Computing, PP(99
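MorpheuS' actual objective function (the tonal tension model) and neighborhood moves are not detailed in the abstract; the skeleton of a variable neighborhood search can be sketched as follows, with a hypothetical stand-in objective that scores pitch assignments against a target profile (all names and values are illustrative):

```python
import random

def vns(init, objective, neighborhoods, max_iter=500, seed=0):
    """Basic variable neighborhood search: cycle through increasingly
    disruptive moves, returning to the smallest one on any improvement."""
    rng = random.Random(seed)
    best = list(init)
    best_cost = objective(best)
    for _ in range(max_iter):
        k = 0
        while k < len(neighborhoods):
            cand = neighborhoods[k](best, rng)
            cost = objective(cand)
            if cost < best_cost:
                best, best_cost = cand, cost
                k = 0          # improvement: restart from smallest neighborhood
            else:
                k += 1         # no improvement: try a larger neighborhood
    return best, best_cost

# Hypothetical stand-in for the tension model: per-step distance of the
# assigned pitches from a target profile over the template rhythm.
target = [0, 2, 5, 9, 5, 2, 0]
objective = lambda p: sum(abs(a - b) for a, b in zip(p, target))

def change_one(p, rng):    # N1: re-assign one pitch
    q = list(p); q[rng.randrange(len(q))] = rng.randrange(12); return q

def swap_two(p, rng):      # N2: swap two positions
    q = list(p); i, j = rng.randrange(len(q)), rng.randrange(len(q))
    q[i], q[j] = q[j], q[i]; return q

pitches, cost = vns([0] * len(target), objective, [change_one, swap_two])
```

Hard constraints such as the detected repeated patterns would be enforced inside the neighborhood moves, so that every candidate already respects the long-term structure.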
Deep Learning Techniques for Music Generation -- A Survey
This paper is a survey and an analysis of different ways of using deep
learning (deep artificial neural networks) to generate musical content. We
propose a methodology based on five dimensions for our analysis:
Objective - What musical content is to be generated? Examples are: melody,
polyphony, accompaniment or counterpoint. - For what destination and for what
use? To be performed by a human (in the case of a musical score) or by a
machine (in the case of an audio file).
Representation - What are the concepts to be manipulated? Examples are:
waveform, spectrogram, note, chord, meter and beat. - What format is to be
used? Examples are: MIDI, piano roll or text. - How will the representation be
encoded? Examples are: scalar, one-hot or many-hot.
Architecture - What type(s) of deep neural network is (are) to be used?
Examples are: feedforward network, recurrent network, autoencoder or
generative adversarial network.
Challenge - What are the limitations and open challenges? Examples are:
variability, interactivity and creativity.
Strategy - How do we model and control the process of generation? Examples
are: single-step feedforward, iterative feedforward, sampling or input
manipulation.
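The encoding choices under the Representation dimension can be illustrated concretely (illustrative values; the survey covers many more encodings): a one-hot encoding activates exactly one pitch slot per time step, while a many-hot (piano-roll) encoding activates several at once.

```python
import numpy as np

# Hypothetical melody as MIDI note numbers over equal time steps.
melody = [60, 62, 64, 65, 67]          # C4 D4 E4 F4 G4
PITCH_RANGE = 128                      # full MIDI pitch range

# One-hot: each time step activates exactly one pitch slot.
one_hot = np.zeros((len(melody), PITCH_RANGE), dtype=np.int8)
one_hot[np.arange(len(melody)), melody] = 1

# Many-hot (one piano-roll step): a chord activates several pitches.
chord = [60, 64, 67]                   # C major triad
many_hot = np.zeros(PITCH_RANGE, dtype=np.int8)
many_hot[chord] = 1
```

The one-hot form suits architectures that predict a single next note (e.g. with a softmax output), while the many-hot form is the natural target for polyphonic generation.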
For each dimension, we conduct a comparative analysis of various models and
techniques and propose a tentative multidimensional typology. This
typology is bottom-up, based on the analysis of many existing deep-learning
based systems for music generation selected from the relevant literature. These
systems are described and are used to exemplify the various choices of
objective, representation, architecture, challenge and strategy. The last
section includes some discussion and some prospects.
Comment: 209 pages. This paper is a simplified version of the book: J.-P.
Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music
Generation, Computational Synthesis and Creative Systems, Springer, 201
A peça musical como uma instância: ensaios acerca da análise musical assistida por computador (The musical piece as an instance: essays on computer-aided musical analysis)
Advisors: Jônatas Manzolli, Moreno Andreatta. Doctoral thesis - Universidade Estadual de Campinas, Instituto de Artes, and Université Pierre et Marie Curie (France).
Abstract: From a musicological interpretation of the scientific notion of "modeling and simulation", this thesis presents an approach for computer-aided musical analysis in which scores are reconstructed from algorithmic processes and then simulated with different sets of parameter values, generating variations called "instances". Studying a musical work through modeling and simulation means understanding it by (re)composing it, blurring the boundaries between analytical and creative perspectives. This approach is applied to three case studies: an isolated technique, the chord multiplication used by the French composer Pierre Boulez (1925-2016), explored through the prism formed by the theories of H. Hanson, S. Heinemann and L. Koblyakov and by its computational implementation; the piece Spectral Canon for Conlon Nancarrow (1974) by the American composer James Tenney (1934-2006), for which computational simulation from different sets of parameters is taken to its ultimate consequences when a "space of instances" is created and strategies of visualization and exploration are devised; and finally Désordre (1985), the first piano étude by the Austro-Hungarian composer György Ligeti (1923-2006), in which the concepts of "combinatorial tonality" and "decomposition of an integer (duration) into two primes" are used to maximize the potential of the model to produce different variations of the original piece.
Doctorate in Music, Theory, Creation and Practice. Doctor of Music. Grant 2014/08525-8, FAPES
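Boulez's chord multiplication admits a simplified pitch-class reading (the theoretical treatments by Hanson, Heinemann and Koblyakov differ in their handling of register and reference pitch; this sketch uses only the basic set-theoretic form): the product of two pitch-class sets is the union of one set transposed by every element of the other.

```python
def multiply(a, b):
    """Simplified pitch-class chord multiplication: transpose set `a`
    by every pitch class in set `b` and take the union (mod 12)."""
    return sorted({(x + y) % 12 for x in a for y in b})

# Example: a major-third dyad multiplied by a perfect-fourth dyad.
product = multiply([0, 4], [0, 5])   # -> [0, 4, 5, 9]
```

Because the operation is defined on sets, repeated sums collapse, so the product can have fewer elements than |a| * |b|.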
Soundscape Generation Using Web Audio Archives
The large and growing archives of audio content on the web have been transforming sound design practice. In this context, sampling -- a fundamental sound design tool -- has shifted from mechanical recording to the realm of copying and cutting on the computer. Effectively browsing these large archives and retrieving content has become a well-identified problem in Music Information Retrieval, namely through the adoption of audio content-based methodologies. Despite their robustness and effectiveness, current technological solutions rely mostly on (statistical) signal-processing methods, whose terminology does not attain a user-centered level of explanatory adequacy. This dissertation advances a novel semantically oriented strategy for browsing and retrieving audio content, in particular environmental sounds, from large web audio archives. Ultimately, we aim to streamline the retrieval of user-defined queries to foster a fluid generation of soundscapes. In our work, querying web audio archives is done by affective dimensions that relate to emotional states (e.g., low arousal and low valence) and semantic audio source descriptions (e.g., rain). To this end, we map human annotations of affective dimensions to spectral audio-content descriptors extracted from the signal. New sounds are then retrieved from web archives by specifying a query that combines a point in a two-dimensional affective plane with semantic tags. A prototype application, MScaper, implements the method in the Ableton Live environment. An evaluation of our research assesses the perceptual soundness of the spectral audio-content descriptors in capturing affective dimensions, as well as the usability of MScaper. The results show that spectral audio features significantly capture affective dimensions and that MScaper was perceived by expert users as having excellent usability.