    A Generative Product-of-Filters Model of Audio

    We propose the product-of-filters (PoF) model, a generative model that decomposes audio spectra as sparse linear combinations of "filters" in the log-spectral domain. PoF makes similar assumptions to those used in the classic homomorphic filtering approach to signal processing, but replaces hand-designed decompositions built of basic signal processing operations with a learned decomposition based on statistical inference. This paper formulates the PoF model and derives a mean-field method for posterior inference and a variational EM algorithm to estimate the model's free parameters. We demonstrate PoF's potential for audio processing on a bandwidth expansion task, and show that PoF can serve as an effective unsupervised feature extractor for a speaker identification task. Comment: ICLR 2014 conference-track submission. Added link to the source code.
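The core idea in this abstract, a linear combination of filters in the log-spectral domain, is equivalent to a product of exponentiated filters in the linear domain. The following is a minimal sketch of that relationship only. It is not the authors' implementation: it leaves out the priors, the noise model, and the inference procedure, and the names `pof_spectrum`, `filters`, and `activations` are hypothetical.

```python
import math

def pof_spectrum(filters, activations):
    """Combine log-domain filters into one linear-domain spectrum.

    filters:     list of L filters, each a list of F log-spectral values
    activations: list of L weights (sparse in the PoF setting: mostly zero)
    """
    n_bins = len(filters[0])
    log_spec = [0.0] * n_bins
    # Linear combination in the log-spectral domain.
    for filt, weight in zip(filters, activations):
        for f in range(n_bins):
            log_spec[f] += weight * filt[f]
    # Exponentiating the sum gives the product of the exponentiated terms,
    # which is where the "product of filters" name comes from.
    return [math.exp(v) for v in log_spec]

# Two toy filters over two frequency bins (illustrative values only).
toy_filters = [[0.0, math.log(2.0)], [math.log(3.0), 0.0]]
toy_activations = [1.0, 2.0]
print(pof_spectrum(toy_filters, toy_activations))
```

With these toy values the result is approximately `[9.0, 2.0]`: the first bin is `3.0 ** 2` and the second is `2.0 ** 1`.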

    Creating music by listening

    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2005. Includes bibliographical references (p. 127-139). Machines have the power and potential to make expressive music on their own. This thesis aims to computationally model the process of creating music using experience from listening to examples. Our unbiased signal-based solution models the life cycle of listening, composing, and performing, turning the machine into an active musician, instead of simply an instrument. We accomplish this through an analysis-synthesis technique that combines perceptual and structural modeling of the musical surface, which leads to a minimal data representation. We introduce a music cognition framework that results from the interaction of psychoacoustically grounded causal listening, a time-lag embedded feature representation, and perceptual similarity clustering. Our bottom-up analysis intends to be generic and uniform by recursively revealing metrical hierarchies and structures of pitch, rhythm, and timbre. Training is suggested for top-down unbiased supervision, and is demonstrated with the prediction of downbeats. This musical intelligence enables a range of original manipulations including song alignment, music restoration, cross-synthesis or song morphing, and ultimately the synthesis of original pieces. By Tristan Jehan. Ph.D.

    Bayesian Interpolation and Parameter Estimation in a Dynamic Sinusoidal Model

    In this paper, we propose a method for restoring the missing or corrupted observations of nonstationary sinusoidal signals, which are often encountered in music and speech applications. To model nonstationary signals, we use a time-varying sinusoidal model which is obtained by extending the static sinusoidal model into a dynamic sinusoidal model. In this model, the in-phase and quadrature components of the sinusoids are modeled as first-order Gauss–Markov processes. The inference scheme for the model parameters and missing observations is formulated in a Bayesian framework and is based on a Markov chain Monte Carlo method known as the Gibbs sampler. We focus on the parameter estimation in the dynamic sinusoidal model since this constitutes the core of model-based interpolation. In the simulations, we first investigate the applicability of the model and then demonstrate the inference scheme by applying it to the restoration of lost audio packets on a packet-based network. The results show that the proposed method is a reasonable inference scheme for estimating unknown signal parameters and interpolating gaps consisting of missing/corrupted signal segments.
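To make the phrase "in-phase and quadrature components modeled as first-order Gauss–Markov processes" concrete, here is a small simulation sketch for a single sinusoid. It assumes one plausible parameterization: each component follows `a[n] = rho * a[n-1] + Gaussian noise`. The paper's exact state-space form may differ, and the function name and all parameters are hypothetical. The sketch covers the generative model only, not the Gibbs sampler.

```python
import math
import random

def simulate_dynamic_sinusoid(n_samples, omega, rho=0.99, innovation_std=0.05,
                              a0=1.0, b0=0.0, seed=0):
    """Simulate x[n] = a[n]*cos(omega*n) + b[n]*sin(omega*n).

    a[n] (in-phase) and b[n] (quadrature) each evolve as a first-order
    Gauss-Markov process: next = rho * current + N(0, innovation_std^2).
    This parameterization is an assumption made for illustration.
    """
    rng = random.Random(seed)
    a, b = a0, b0
    signal = []
    for n in range(n_samples):
        signal.append(a * math.cos(omega * n) + b * math.sin(omega * n))
        # Slowly drifting amplitudes make the sinusoid nonstationary.
        a = rho * a + rng.gauss(0.0, innovation_std)
        b = rho * b + rng.gauss(0.0, innovation_std)
    return signal

x = simulate_dynamic_sinusoid(n_samples=200, omega=0.2)
print(len(x), x[:3])
```

Setting `rho=1.0` and `innovation_std=0.0` freezes both components, which recovers the static sinusoidal model that the abstract describes as the starting point.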

    Some New Results on the Estimation of Sinusoids in Noise

    Bayes meets Bach: applications of Bayesian statistics to audio restoration

    Memoryless nonlinear distortion can be introduced into audio signals at any stage from recording to reproduction: poor-quality or improperly operated equipment, physically degraded media, and low-quality playback devices are some examples where nonlinearities can naturally appear. Another common defect in old recordings is the long pulse, generally caused by playing back disks with deep scratches or severely degraded magnetic tapes. Such defects are characterized by an initial discontinuity in the waveform, followed by a low-frequency transient of long duration. In both cases audible artifacts can be created, causing an unpleasant experience for the listener. It is therefore important to develop techniques that mitigate such defects and recover the original signal, given only the degraded signal. In this thesis, techniques to deal with both problems are presented: the restoration of nonlinearly degraded recordings is tackled in a Bayesian context, considering both autoregressive models and sparsity in the DCT domain for the original signal, as well as through a deterministic solution also based on sparsity; for the suppression of long pulses, a parametric approach is revisited with the addition of an efficient initialization procedure, and a nonparametric model based on Gaussian processes is also presented.
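As a rough illustration of the term "memoryless nonlinear distortion", the sketch below applies a soft-clipping curve sample by sample and then inverts it. It assumes a `tanh` curve with a known gain, which is a simplification chosen for illustration: the thesis addresses the harder case where only the degraded signal is available and the nonlinearity must be estimated. The function names and the `gain` parameter are hypothetical.

```python
import math

def distort(signal, gain=3.0):
    """Memoryless distortion: each output sample depends only on the
    input sample at the same instant (here, tanh soft clipping)."""
    return [math.tanh(gain * x) for x in signal]

def undistort(distorted, gain=3.0):
    """Invert the curve, assuming it is known exactly.

    math.atanh raises ValueError for |y| >= 1, so samples that saturated
    to exactly +/-1 in floating point cannot be recovered this way.
    """
    return [math.atanh(y) / gain for y in distorted]

clean = [0.4 * math.sin(0.1 * n) for n in range(100)]
restored = undistort(distort(clean))
print(max(abs(c - r) for c, r in zip(clean, restored)))
```

The printed value is the largest reconstruction error, which should be near floating-point precision when the curve and gain are known.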

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.