Turbo-detected unequal protection audio and speech transceivers using serially concatenated convolutional codes, trellis coded modulation and space-time trellis coding
The MPEG-4 TwinVQ audio codec and the AMR-WB speech codec are investigated in the context of a jointly optimised turbo transceiver capable of providing unequal error protection. The transceiver advocated consists of serially concatenated Space-Time Trellis Coding (STTC), Trellis Coded Modulation (TCM) and two different-rate Non-Systematic Convolutional codes (NSCs) used for unequal error protection. A benchmarker scheme combining STTC and a single-class protection NSC is used for comparison with the proposed scheme. The audio and speech performance of both schemes is evaluated when communicating over uncorrelated Rayleigh fading channels. An Eb/N0 value of about 2.5 (3.5) dB is required for near-unimpaired audio (speech) transmission, which is about 3.07 (4.2) dB from the capacity of the system.
Wavenet based low rate speech coding
Traditional parametric coding of speech facilitates low rate but provides
poor reconstruction quality because of the inadequacy of the model used. We
describe how a WaveNet generative speech model can be used to generate high
quality speech from the bit stream of a standard parametric coder operating at
2.4 kb/s. We compare this parametric coder with a waveform coder based on the
same generative model and show that approximating the signal waveform incurs a
large rate penalty. Our experiments confirm the high performance of the WaveNet
based coder and show that the speech produced by the system is able to
additionally perform implicit bandwidth extension and does not significantly
impair recognition of the original speaker for the human listener, even when
that speaker has not been used during the training of the generative model.
Performance and Complexity Co-Evaluations of MPEG4-ALS Compression Standard for Low-Latency Music Compression
This thesis analyzes the compression ratio and latency of various classical music tracks under different encoder options of MPEG-4 ALS. The tracks are tested with the MPEG-4 ALS coder under different options to find the parameter values that maximize the compression ratio while minimizing CPU time (encoder and decoder time). The frame length at which the compression ratio saturates for music audio is determined by encoding different classical music tracks with various frame lengths. Music tracks with varying sampling rates are also tested, and the relationship of compression ratio and latency to sampling rate is analyzed and plotted. The compression gain rate is found to be higher when the codec complexity is lower; joint channel correlation and long-term correlations are not significant, and the latency trade-off makes the more complex codec options unsuitable for applications where latency is critical. When the two entropy coding options, Rice code and BGMC (Block Gilbert-Moore Codes), are applied to various classical music tracks, the Rice code proves more suitable for low-latency applications than the more complex BGMC coding: BGMC improves compression performance at the expense of latency, making it unsuitable for real-time applications.
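The low complexity of the Rice code mentioned in this abstract can be illustrated with a minimal sketch. It assumes the textbook Golomb-Rice construction (unary quotient plus a k-bit remainder) with a zigzag mapping for signed residuals; it is not the exact bitstream syntax of MPEG-4 ALS, and the function names and the string-of-bits representation are hypothetical choices made for readability.

```python
def zigzag(x):
    """Map a signed residual to a non-negative integer: 0,-1,1,-2,... -> 0,1,2,3,..."""
    return 2 * x if x >= 0 else -2 * x - 1


def unzigzag(u):
    """Inverse of zigzag()."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(values, k):
    """Encode signed integers with Rice parameter k; returns a string of '0'/'1'."""
    bits = []
    for x in values:
        u = zigzag(x)
        q = u >> k                 # quotient, sent in unary
        r = u & ((1 << k) - 1)     # remainder, sent in k plain bits
        bits.append("1" * q + "0")
        if k:
            bits.append(format(r, "0{}b".format(k)))
    return "".join(bits)


def rice_decode(bits, k, count):
    """Decode `count` signed integers from a bit string produced by rice_encode()."""
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":    # read the unary quotient
            q += 1
            pos += 1
        pos += 1                   # skip the terminating '0'
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unzigzag((q << k) | r))
    return out


if __name__ == "__main__":
    residuals = [0, -1, 3, -4, 2]  # made-up prediction residuals
    code = rice_encode(residuals, k=1)
    print(code, len(code), "bits")
    print(rice_decode(code, k=1, count=len(residuals)))
```

Encoding and decoding need only shifts, masks and a short loop per sample, which is consistent with the abstract's observation that the Rice code suits low-latency use better than the more complex BGMC.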
Cross modal perception of body size in domestic dogs (Canis familiaris)
While the perception of size-related acoustic variation in animal vocalisations is well documented, little attention has been given to how this information might be integrated with corresponding visual information. Using a cross-modal design, we tested the ability of domestic dogs to match growls resynthesised to be typical of either a large or a small dog to size-matched models. Subjects looked at the size-matched model significantly more often and for a significantly longer duration than at the incorrect model, showing that they have the ability to relate information about body size from the acoustic domain to the appropriate visual category. Our study suggests that the perceptual and cognitive mechanisms at the basis of size assessment in mammals have a multisensory nature, and calls for further investigations of the multimodal processing of size information across animal species.
Frequency-warped autoregressive modeling and filtering
This thesis consists of an introduction and nine articles. The articles are related to the application of frequency-warping techniques to audio signal processing, and in particular, predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles.
Frequency-warping techniques, or simply warping techniques, are based on modifying a conventional signal processing system so that the inherent frequency representation in the system is changed. It is demonstrated that this may be done for basically all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications.
The majority of the articles study warped linear prediction (WLP) and its use in wideband audio coding. It is proposed that warped linear prediction would be a particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction of a class of new implementation techniques for recursive filters in one of the articles. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired an article that introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using an almost logarithmic frequency representation.
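As a rough illustration of how a changed frequency representation can arise, the sketch below assumes the common realization in which each unit delay is replaced by a first-order allpass section D(z) = (z^-1 - lam) / (1 - lam z^-1). The abstract does not state this construction, so treat it as an assumption; the parameter name `lam` and its value are illustrative only. The negated phase of D(z) gives the warped frequency seen by the modified system.

```python
import cmath
import math


def warped_frequency(omega, lam):
    """Warped frequency (rad/sample) for normalized frequency omega, with |lam| < 1.

    Closed form of the negated phase of the first-order allpass section.
    """
    return omega + 2.0 * math.atan2(lam * math.sin(omega), 1.0 - lam * math.cos(omega))


def allpass_response(omega, lam):
    """Frequency response of D(z) = (z^-1 - lam) / (1 - lam z^-1) at z = exp(j*omega)."""
    z_inv = cmath.exp(-1j * omega)
    return (z_inv - lam) / (1.0 - lam * z_inv)


if __name__ == "__main__":
    lam = 0.75  # illustrative value; positive lam expands the low-frequency region
    for omega in (0.05, 0.1, 0.5, 1.0, 2.0, 3.0):
        closed_form = warped_frequency(omega, lam)
        from_phase = -cmath.phase(allpass_response(omega, lam))
        print(f"omega={omega:4.2f}  warped={closed_form:6.4f}  from allpass phase={from_phase:6.4f}")
```

With a positive `lam`, low frequencies are spread over a larger share of the warped axis, which is the kind of mapping that can bring the representation closer to that of human hearing; `lam = 0` leaves the frequency axis unchanged.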