
    Sparse Decomposition of Audio Signals Using a Perceptual Measure of Distortion: Application to Lossy Audio Coding

    State-of-the-art audio codecs use time-frequency transforms derived from cosine bases, followed by a quantization stage. The quantization steps are set according to perceptual considerations. In the last decade, several studies applied adaptive sparse time-frequency transforms to audio coding, e.g. on unions of cosine bases using a Matching-Pursuit-derived algorithm. This was shown to significantly improve coding efficiency. We propose another approach based on a variational algorithm, i.e. the optimization of a cost function taking into account both a perceptual distortion measure derived from a hearing model and a sparsity constraint, which favors coding efficiency. In this early version, we show that, using a coding scheme without perceptual control of quantization, our method outperforms a codec from the literature with the same quantization scheme. In future work, a more sophisticated quantization scheme would probably allow our method to challenge standard codecs, e.g. AAC.
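The variational idea described in this abstract can be pictured generically: minimize a perceptually weighted distortion plus an L1 sparsity penalty. The snippet below is a hypothetical illustration, not the authors' algorithm; it runs a plain iterative soft-thresholding (ISTA) loop, with a random orthonormal basis standing in for a union of cosine bases and a flat weighting standing in for a real hearing model.

```python
import numpy as np

def ista_perceptual(x, D, w, lam, n_iter=200):
    """ISTA for J(c) = ||W(x - Dc)||^2 + lam*||c||_1, with W = diag(w).

    w is a stand-in for perceptual weights derived from a hearing model.
    """
    Dw = D * w[:, None]                # weighted dictionary W D
    xw = x * w                         # weighted target W x
    L = np.linalg.norm(Dw, 2) ** 2     # spectral norm^2: step-size control
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = Dw.T @ (Dw @ c - xw)    # (half) gradient of the smooth term
        z = c - grad / L               # gradient step
        # soft threshold = proximal operator of the L1 penalty
        c = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)
    return c

rng = np.random.default_rng(0)
D = np.linalg.qr(rng.standard_normal((64, 64)))[0]   # random orthonormal basis
c_true = np.zeros(64)
c_true[[3, 17, 40]] = [2.0, -1.5, 1.0]               # sparse ground truth
x = D @ c_true
w = np.ones(64)                                      # flat "perceptual" weights
c_hat = ista_perceptual(x, D, w, lam=0.01)
```

With an orthonormal dictionary and flat weights this reduces to soft-thresholding the analysis coefficients, so the three active coefficients are recovered up to a small shrinkage bias.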

    A quasi-orthogonal, invertible, and perceptually relevant time-frequency transform for audio coding

    We describe ERB-MDCT, an invertible real-valued time-frequency transform based on the MDCT, which is widely used in audio coding (e.g. MP3 and AAC). ERB-MDCT was designed similarly to ERBLet, a recent invertible transform whose resolution evolves across frequency to match the perceptual ERB frequency scale, whereas the frequency scale of most invertible transforms (e.g. the MDCT) is uniform. ERB-MDCT has mostly the same frequency scale as ERBLet, but the main improvement is that its atoms are quasi-orthogonal, i.e. its redundancy is close to 1. Furthermore, the energy is sparser in the time-frequency plane. Thus, it is more suitable for audio coding than ERBLet.
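For context, the perceptual ERB scale that ERB-MDCT targets can be computed from Glasberg and Moore's formula, ERB number = 21.4 log10(1 + 0.00437 f). The sketch below (not the authors' code) derives band edges uniformly spaced on the ERB scale, so bandwidth grows with center frequency instead of staying uniform as in a plain MDCT.

```python
import numpy as np

def hz_to_erb(f_hz):
    """Glasberg & Moore: frequency in Hz -> ERB number."""
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_to_hz(erb):
    """Inverse mapping: ERB number -> frequency in Hz."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def erb_band_edges(f_min, f_max, n_bands):
    """n_bands+1 edges, equally spaced on the ERB scale."""
    erbs = np.linspace(hz_to_erb(f_min), hz_to_erb(f_max), n_bands + 1)
    return erb_to_hz(erbs)

# 40 bands covering the audible range (band count chosen for illustration)
edges = erb_band_edges(50.0, 20000.0, 40)
widths = np.diff(edges)
```

Low-frequency bands come out narrow (fine frequency resolution) and high-frequency bands wide, matching the auditory resolution the transform is built around.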

    Towards a Hybrid Audio Coder

    The main features of a novel approach to audio signal encoding are described. The approach combines non-linear transform coding and structured approximation techniques, together with hybrid modeling of the signal class under consideration. Essentially, several different components of the signal are estimated and transform coded using an appropriately chosen orthonormal basis. Different models and estimation procedures are discussed, and numerical results are provided.

    Sparse signal decomposition on hybrid dictionaries using musical priors

    This paper investigates the use of musical priors for sparse expansion of audio signals of music on overcomplete dictionaries taken from the union of two orthonormal bases. More specifically, chord information is used to build a structured model that takes into account dependencies between coefficients of the decomposition. Evaluation on various music signals shows that our approach provides results whose quality, measured by the signal-to-noise ratio, is on par with state-of-the-art approaches, and shows that our model is relevant for representing audio signals of Western tonal music and opens new perspectives.

    Analysis, Visualization, and Transformation of Audio Signals Using Dictionary-based Methods


    Sparse and structured decomposition of audio signals on hybrid dictionaries using musical priors

    This paper investigates the use of musical priors for sparse expansion of audio signals of music on an overcomplete dual-resolution dictionary taken from the union of two orthonormal bases that can describe both the transient and tonal components of a music audio signal. More specifically, chord and metrical structure information are used to build a structured model that takes into account dependencies between coefficients of the decomposition, both for the tonal and for the transient layer. A denoising task is used to provide a proof of concept of the proposed musical priors. Several configurations of the model are analyzed. Evaluation on monophonic and complex polyphonic excerpts of real music signals shows that the proposed approach provides results whose quality, measured by the signal-to-noise ratio, is competitive with state-of-the-art approaches and more coherent with the semantic content of the signal. A detailed analysis of the model in terms of sparsity and of interpretability of the representation is also provided, and shows that the model is capable of giving a relevant and legible representation of Western tonal music audio signals.
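One way to picture a chord prior, as an illustrative toy rather than the paper's structured model: give coefficients on chord-tone frequency bins of a tonal (DCT) basis a lower sparsification threshold, so chord tones survive more easily than equally strong non-chord partials. The bin indices, amplitudes, and thresholds below are arbitrary.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II analysis matrix (rows are basis atoms)."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * np.outer(n, n + 0.5) / N)
    C[0] /= np.sqrt(2.0)
    return C

def sparsify_with_chord_prior(x, C, chord_bins, thr, prior_gain=0.1):
    """Soft-threshold DCT coefficients; chord bins get a smaller threshold."""
    c = C @ x
    t = np.full_like(c, thr)
    t[chord_bins] = thr * prior_gain      # prior: cheaper to keep chord tones
    c_s = np.sign(c) * np.maximum(np.abs(c) - t, 0.0)
    return C.T @ c_s, c_s

N = 64
C = dct_matrix(N)
amps = np.zeros(N)
amps[[5, 12, 20]] = 0.3        # a "chord" of three tonal atoms
amps[33] = 0.3                 # non-chord partial of equal amplitude
x = C.T @ amps                 # synthesize the test signal
x_hat, c_s = sparsify_with_chord_prior(x, C, [5, 12, 20], thr=0.5)
```

With equal amplitudes, the chord-tone coefficients are retained while the non-chord one falls below its (higher) threshold and is discarded, which is the qualitative effect a musical prior imposes on the decomposition.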

    Audio Source Separation Using Sparse Representations

    This is the author's final version of the article, first published as A. Nesbit, M. G. Jafari, E. Vincent and M. D. Plumbley, "Audio Source Separation Using Sparse Representations", in W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems, Chapter 10, pp. 246-264, IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch010. The authors address the problem of audio source separation, namely the recovery of audio signals from recordings of mixtures of those signals. The sparse component analysis framework is a powerful method for achieving this. Sparse orthogonal transforms, in which only a few transform coefficients differ significantly from zero, are developed; once the signal has been transformed, energy is apportioned from each transform coefficient to each estimated source, and, finally, the signal is reconstructed using the inverse transform. The overriding aim of this chapter is to demonstrate how this framework, as exemplified here by two different decomposition methods that adapt to the signal to represent it sparsely, can be used to solve different problems in different mixing scenarios. To address the instantaneous (neither delays nor echoes) and underdetermined (more sources than mixtures) mixing model, a lapped orthogonal transform is adapted to the signal by selecting a basis from a library of predetermined bases. This method is closely related to the windowing methods used in the MPEG audio coding framework. In considering the anechoic (delays but no echoes) and determined (equal number of sources and mixtures) mixing case, a greedy adaptive transform is used, based on orthogonal basis functions that are learned from the observed data instead of being selected from a predetermined library of bases. This is found to encode the signal characteristics by introducing a feedback system between the bases and the observed data. Experiments on mixtures of speech and music signals demonstrate that these methods give good signal approximations and separation performance, and indicate promising directions for future research.
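A toy rendering of the sparse component analysis pipeline for the instantaneous underdetermined case (2 mixtures, 3 sources): transform each channel, apportion each coefficient to a single source by comparing its 2-D mixture vector with the mixing directions, and invert. This sketch uses a fixed orthonormal DCT rather than the adaptive lapped transform of the chapter, and the sources are single atoms with disjoint sparse supports, so recovery is exact by construction.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II analysis matrix (rows are basis atoms)."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * np.outer(n, n + 0.5) / N)
    C[0] /= np.sqrt(2.0)
    return C

N = 256
C = dct_matrix(N)
# Three toy sources, each a single DCT atom -> disjoint sparse supports.
S = np.zeros((3, N))
S[0], S[1], S[2] = C[10], C[40], C[90]
A = np.array([[1.0, 0.6, 0.2],
              [0.2, 0.6, 1.0]])          # instantaneous mixing, 2 x 3
X = A @ S                                 # observed mixtures

Cx = X @ C.T                              # DCT of each channel (2, N)
dirs = A / np.linalg.norm(A, axis=0)      # unit mixing directions
scores = np.abs(dirs.T @ Cx)              # (3, N) direction-alignment scores
mask = scores == scores.max(axis=0)       # winner takes the whole coefficient
proj = (dirs.T @ Cx) / np.linalg.norm(A, axis=0)[:, None]  # scalar estimates
Cs = np.where(mask, proj, 0.0)            # masked sparse source coefficients
S_hat = Cs @ C                            # inverse transform per source
```

Because the supports do not overlap, each nonzero mixture coefficient points exactly along one mixing column and binary masking recovers the sources perfectly; real signals only approximate this disjointness, which is why the adaptive transforms in the chapter matter.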

    Sparsity and persistence in time-frequency sound representations

    It is well known that the time-frequency domain is very well adapted to representing audio signals. The two main features of time-frequency representations of many classes of audio signals are sparsity (signals are generally well approximated using a small number of coefficients) and persistence (significant coefficients are not isolated, and tend to form clusters). This contribution presents signal approximation algorithms that exploit these properties, in the framework of hierarchical probabilistic models. Given a time-frequency frame (i.e. a Gabor frame, or a union of several Gabor frames or time-frequency bases), coefficients are first gathered into groups. A group of coefficients is then modeled as a random vector whose distribution is governed by a hidden state associated with the group. Algorithms for parameter inference and hidden state estimation from analysis coefficients are described. The role of the chosen dictionary, and more particularly its structure, is also investigated. The proposed approach bears some resemblance to variational approaches previously proposed by the authors, in particular the variational approach exploiting mixed-norm-based regularization terms. In the framework of audio signal applications, the time-frequency frame under consideration is a union of two MDCT bases or two Gabor frames, in order to generate estimates for tonal and transient layers. Groups corresponding to tonal (resp. transient) coefficients are constant-frequency (resp. constant-time) time-frequency coefficients of a frequency-selective (resp. time-selective) MDCT basis or Gabor frame.
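The persistence idea can be sketched minimally, leaving out the probabilistic machinery: gather coefficients into fixed groups (e.g. constant-frequency lines for a tonal layer) and let a binary hidden state per group decide whether the whole group is kept. The group size, threshold, and signal below are arbitrary choices for the demo.

```python
import numpy as np

def group_threshold(coeffs, group_size, tau):
    """Keep or kill whole groups of coefficients based on group energy.

    The boolean `state` plays the role of the hidden on/off state that the
    hierarchical model would infer probabilistically.
    """
    groups = coeffs.reshape(-1, group_size)
    energy = np.sum(groups ** 2, axis=1)
    state = energy > tau                      # binary hidden state per group
    kept = (groups * state[:, None]).reshape(-1)
    return kept, state

rng = np.random.default_rng(0)
coeffs = 0.05 * rng.standard_normal(64)       # isolated background coefficients
# one persistent cluster of significant coefficients (indices 16..23)
coeffs[16:24] += np.array([1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 0.7])
kept, state = group_threshold(coeffs, group_size=8, tau=0.5)
```

Thresholding by group rather than per coefficient preserves the whole cluster while suppressing isolated small coefficients, which is exactly the structure that coefficient-wise sparsity alone would not capture.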

    Audio Signal Representations for Factorization in the Sparse Domain

    In this paper, a new class of audio representations is introduced, together with a corresponding fast decomposition algorithm. The main feature of these representations is that they are both sparse and approximately shift-invariant, which allows similarity search in a sparse domain. The common sparse support of detected similar patterns is then used to factorize their representations. The potential of this method for simultaneous structural analysis and compression tasks is illustrated by preliminary experiments on simple musical data.
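As a hedged illustration of the shift-invariance property, rather than the paper's representation: DFT magnitudes are invariant to circular time shifts, so keeping only the top-k magnitude bins yields a sparse signature that is unchanged when a pattern recurs at a different position, allowing similar patterns to be matched by comparing sparse supports.

```python
import numpy as np

def sparse_signature(frame, k=5):
    """Support of the top-k DFT magnitudes: sparse, shift-invariant."""
    mag = np.abs(np.fft.rfft(frame))
    return set(np.argsort(mag)[-k:].tolist())

rng = np.random.default_rng(1)
pattern = rng.standard_normal(128)
shifted = np.roll(pattern, 17)                 # circularly shifted recurrence
sig_a = sparse_signature(pattern)
sig_b = sparse_signature(shifted)
overlap = len(sig_a & sig_b)                   # shared sparse support
```

The shifted copy shares (essentially) the full sparse support with the original, so comparing supports detects the repetition without aligning the patterns in time, in the spirit of the similarity search described above.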