
    Towards a Hybrid Audio Coder

    The main features of a novel approach for audio signal encoding are described. The approach combines non-linear transform coding and structured approximation techniques, together with hybrid modeling of the signal class under consideration. Essentially, several different components of the signal are estimated and transform coded using an appropriately chosen orthonormal basis. Different models and estimation procedures are discussed, and numerical results are provided.

    Sparse and structured decomposition of audio signals on hybrid dictionaries using musical priors

    This paper investigates the use of musical priors for sparse expansion of audio signals of music, on an overcomplete dual-resolution dictionary taken from the union of two orthonormal bases that can describe both transient and tonal components of a music audio signal. More specifically, chord and metrical structure information are used to build a structured model that takes into account dependencies between coefficients of the decomposition, both for the tonal and for the transient layer. The denoising task application is used to provide a proof of concept of the proposed musical priors. Several configurations of the model are analyzed. Evaluation on monophonic and complex polyphonic excerpts of real music signals shows that the proposed approach provides results whose quality measured by the signal-to-noise ratio is competitive with state-of-the-art approaches, and more coherent with the semantic content of the signal. A detailed analysis of the model in terms of sparsity and in terms of interpretability of the representation is also provided, and shows that the model is capable of giving a relevant and legible representation of Western tonal music audio signals.

    Speech Enhancement using Transient Speech Components

    We believe that the auditory system, like the visual system, may be sensitive to abrupt stimulus changes and the transient component in speech may be particularly critical to speech perception. If this component can be identified and selectively amplified, improved speech perception in background noise may be possible. This project describes a method to decompose speech into tonal, transient, and residual components. The modified discrete cosine transform (MDCT) and the wavelet transform are used to capture tonal and transient features in speech. The tonal and transient components were identified by using a small number of MDCT and wavelet coefficients, respectively. In previous studies, all of the MDCT and all of the wavelet coefficients were assumed to be independent, and identifications of the significant MDCT and the significant wavelet coefficients were achieved by thresholds. However, an appropriate threshold is not known and the MDCT and the wavelet coefficients show statistical dependencies, described by the clustering and persistence properties. In this work, the hidden Markov chain (HMC) model and the hidden Markov tree (HMT) model were applied to describe the clustering and persistence properties between the MDCT coefficients and between the wavelet coefficients. The MDCT coefficients in each frequency index were modeled as a two-state mixture of two univariate Gaussian distributions. The wavelet coefficients in each scale of each tree were modeled as a two-state mixture of two univariate Gaussian distributions. The initial parameters of Gaussian mixtures were estimated by the greedy EM algorithm. By using the Viterbi and the MAP algorithms to find the optimal state distribution, the significant MDCT and the significant wavelet coefficients were determined without relying on a threshold. The transient component isolated by our method was selectively amplified and recombined with the original speech to generate enhanced speech, with energy adjusted to equal the energy of the original speech. The intelligibility of the original and enhanced speech was evaluated in eleven human subjects using the modified rhyme protocol. Word recognition rate results show that the enhanced speech can improve speech intelligibility at low SNR levels (8% at -15 dB, 14% at -20 dB, and 18% at -25 dB).
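
    The threshold-free selection described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a chain of coefficients with two hidden states (0 = small, 1 = significant), zero-mean Gaussian emissions, and hand-picked values for `sigmas`, `stay_prob`, and `prior`, whereas the abstract says the mixture parameters were initialized with a greedy EM algorithm. The function name `viterbi_two_state` is hypothetical.

```python
import math


def gaussian_logpdf(x, sigma):
    """Log-density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - (x * x) / (2.0 * sigma * sigma)


def viterbi_two_state(coeffs, sigmas=(0.1, 1.0), stay_prob=0.9, prior=(0.5, 0.5)):
    """Most likely state path (0 = small, 1 = significant) for a coefficient chain.

    All parameter values are illustrative assumptions, not values from the thesis.
    """
    # Symmetric transition matrix: a high stay_prob encodes persistence.
    log_trans = [
        [math.log(stay_prob), math.log(1.0 - stay_prob)],
        [math.log(1.0 - stay_prob), math.log(stay_prob)],
    ]
    # Initialise with the prior and the first emission.
    score = [math.log(prior[s]) + gaussian_logpdf(coeffs[0], sigmas[s]) for s in (0, 1)]
    backptr = []
    for x in coeffs[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            # Pick the best predecessor state for state s.
            cands = [score[p] + log_trans[p][s] for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + gaussian_logpdf(x, sigmas[s]))
        score = new_score
        backptr.append(ptr)
    # Backtrack from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


coeffs = [0.02, -0.05, 0.01, 1.3, -0.9, 1.1, 0.03, -0.02]
states = viterbi_two_state(coeffs)
```

    In this sketch a coefficient is kept because the decoded state path marks it as significant, not because its magnitude exceeds a cutoff. With the assumed `stay_prob`, a single small coefficient sandwiched between large ones stays in the significant state, which is one way to read the persistence property mentioned above.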

    Automatic music transcription: challenges and future directions

    Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects

    Data Discovery and Anomaly Detection using Atypicality.

    Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017