157 research outputs found
Score extraction usign MPEG-4 T/F partial encoding
This paper describes the preliminary work in the development of an MPEG-4 audio transcoder between the time/frequency (T/F) and the structured audio (SA) formats. Our approach consists in not going from T/F format through to waveform data and back again to SA, but extracting the score information from an intermediate stage. For this intermediate form we have chosen the input of the filterbank and block switching tool, which consists of frequency data. This data is the result of windowing and applying the modified discrete cosine transform (MDCT) to the signal. The size of the window to be used is determined in a frame-by-frame basis by a psychoacoustics analysis of the data. In this paper we show that this approach is feasible by developing a system which extracts the score information from the filterbank and block switching tool output in a MPEG-4 T/F encoder by adapting and fine-tuning some existing processing techniques.Peer ReviewedPostprint (published version
Coding overcomplete representations of audio using the MCLT
We propose a system for audio coding using the modulated complex
lapped transform (MCLT). In general, it is difficult to encode signals using
overcomplete representations without avoiding a penalty in rate-distortion
performance. We show that the penalty can be significantly reduced for
MCLT-based representations, without the need for iterative methods of
sparsity reduction. We achieve that via a magnitude-phase polar quantization
and the use of magnitude and phase prediction. Compared to systems based
on quantization of orthogonal representations such as the modulated lapped
transform (MLT), the new system allows for reduced warbling artifacts and
more precise computation of frequency-domain auditory masking functions
Analysis, Visualization, and Transformation of Audio Signals Using Dictionary-based Methods
date-added: 2014-01-07 09:15:58 +0000 date-modified: 2014-01-07 09:15:58 +0000date-added: 2014-01-07 09:15:58 +0000 date-modified: 2014-01-07 09:15:58 +000
Audio Inpainting
(c) 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Published version: IEEE Transactions on Audio, Speech and Language Processing 20(3): 922-932, Mar 2012. DOI: 10.1090/TASL.2011.2168211
Scalable and perceptual audio compression
This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable to lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform matching manner as well as a psychoacoustic manner. In order to measure the psychoacoustic scalability of the systems investigated in this thesis, the similarity between the original signal\u27s psychoacoustic parameters and that of the synthesized signal are compared. The psychoacoustic parameters used are loudness, sharpness, tonahty and roughness. This analysis technique is a novel method used in this thesis and it allows an insight into the perceptual distortion that has been introduced by any coder analyzed in this manner
- …