181 research outputs found

    Interactive Music Generation with Positional Constraints using Anticipation-RNNs

    Full text link
    Recurrent Neural Networks (RNNS) are now widely used on sequence generation tasks due to their ability to learn long-range dependencies and to generate sequences of arbitrary length. However, their left-to-right generation procedure only allows a limited control from a potential user which makes them unsuitable for interactive and creative usages such as interactive music generation. This paper introduces a novel architecture called Anticipation-RNN which possesses the assets of the RNN-based generative models while allowing to enforce user-defined positional constraints. We demonstrate its efficiency on the task of generating melodies satisfying positional constraints in the style of the soprano parts of the J.S. Bach chorale harmonizations. Sampling using the Anticipation-RNN is of the same order of complexity than sampling from the traditional RNN model. This fast and interactive generation of musical sequences opens ways to devise real-time systems that could be used for creative purposes.Comment: 9 pages, 7 figure

    Sparse and structured decomposition of audio signals on hybrid dictionaries using musical priors

    No full text
    International audienceThis paper investigates the use of musical priors for sparse expansion of audio signals of music, on an overcomplete dual-resolution dictionary taken from the union of two orthonormal bases that can describe both transient and tonal components of a music audio signal. More specifically, chord and metrical structure information are used to build a structured model that takes into account dependencies between coefficients of the decomposition, both for the tonal and for the transient layer. The denoising task application is used to provide a proof of concept of the proposed musical priors. Several configurations of the model are analyzed. Evaluation on monophonic and complex polyphonic excerpts of real music signals shows that the proposed approach provides results whose quality measured by the signal-to-noise ratio is competitive with state-of-the-art approaches, and more coherent with the semantic content of the signal. A detailed analysis of the model in terms of sparsity and in terms of interpretability of the representation is also provided, and shows that the model is capable of giving a relevant and legible representation of Western tonal music audio signals

    Bayesian Interpolation and Parameter Estimation in a Dynamic Sinusoidal Model

    Get PDF
    In this paper, we propose a method for restoring the missing or corrupted observations of nonstationary sinusoidal signals which are often encountered in music and speech applications. To model nonstationary signals, we use a time-varying sinusoidal model which is obtained by extending the static sinusoidal model into a dynamic sinusoidal model. In this model, the in-phase and quadrature components of the sinusoids are modeled as first-order Gauss–Markov processes. The inference scheme for the model parameters and missing observations is formulated in a Bayesian framework and is based on a Markov chain Monte Carlo method known as Gibbs sampler. We focus on the parameter estimation in the dynamic sinusoidal model since this constitutes the core of model-based interpolation. In the simulations, we first investigate the applicability of the model and then demonstrate the inference scheme by applying it to the restoration of lost audio packets on a packet-based network. The results show that the proposed method is a reasonable inference scheme for estimating unknown signal parameters and interpolating gaps consisting of missing/corrupted signal segments

    A sticky HDP-HMM with application to speaker diarization

    Get PDF
    We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566--1581]. Although the basic HDP-HMM tends to over-segment the audio data---creating redundant states and rapidly switching among them---we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS395 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Deep Recurrent Music Writer: Memory-enhanced Variational Autoencoder-based Musical Score Composition and an Objective Measure

    Get PDF
    Abstract: In recent years, there has been an increasing interest in music generation using machine learning techniques typically used for classification or regression tasks. This is a field still in its infancy, and most attempts are still characterized by the imposition of many restrictions to the music composition process in order to favor the creation of “interesting” outputs. Furthermore, and most importantly, none of the past attempts has focused on developing objective measures to evaluate the music composed, which would allow to evaluate the pieces composed against a predetermined standard as well as permitting to fine-tune models for better “performance” and music composition goals. In this work, we intend to advance state-of-the-art in this area by introducing and evaluating a new metric for an objective assessment of the quality of the generated pieces. We will use this measure to evaluate the outputs of a truly generative model based on Variational Autoencoders that we apply here to automated music composition. Using our metric, we demonstrate that our model can generate music pieces that follow general stylistic characteristics of a given composer or musical genre. Additionally, we use this measure to investigate the impact of various parameters and model architectures on the compositional process and output
    • …
    corecore