11,805 research outputs found

    Turbo-detected unequal protection audio and speech transceivers using serially concatenated convolutional codes, trellis coded modulation and space-time trellis coding

    The MPEG-4 TwinVQ audio codec and the AMR-WB speech codec are investigated in the context of a jointly optimised turbo transceiver capable of providing unequal error protection. The transceiver advocated consists of serially concatenated Space-Time Trellis Coding (STTC), Trellis Coded Modulation (TCM) and two different-rate Non-Systematic Convolutional codes (NSCs) used for unequal error protection. A benchmarker scheme combining STTC and a single-class protection NSC is used for comparison with the proposed scheme. The audio and speech performance of both schemes is evaluated, when communicating over uncorrelated Rayleigh fading channels. An $E_b/N_0$ value of about 2.5 (3.5) dB is required for near-unimpaired audio (speech) transmission, which is about 3.07 (4.2) dB from the capacity of the system.

    Wavenet based low rate speech coding

    Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the speech produced by the system is able to additionally perform implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker has not been used during the training of the generative model. Comment: 5 pages, 2 figures

    Performance and Complexity Co-Evaluations of MPEG4-ALS Compression Standard for Low-Latency Music Compression

    In this thesis, the compression ratio and latency of various classical music tracks are analyzed under different encoder options of MPEG-4 ALS. The tracks are encoded with different option settings to find the parameter values that give the maximum compression ratio with the minimum CPU time (encoder and decoder time). The optimum frame length, at which the compression ratio saturates for music audio, is determined by encoding the tracks with various frame lengths. Tracks with different sampling rates are also tested, and the relationship of compression ratio and latency to sampling rate is analyzed and plotted. It is found that the compression gain rate is higher when the codec complexity is lower, that joint channel correlation and long-term correlation are not significant, and that the latency trade-off makes the more complex codec options unsuitable for applications where latency is critical. When the two entropy coding options, Rice code and BGMC (Block Gilbert-Moore Codes), are applied to various classical music tracks, the Rice code proves more suitable for low-latency applications than the more complex BGMC coding: BGMC improves compression performance at the expense of latency, making it unsuitable for real-time applications.
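
To illustrate why Rice coding is the cheaper of the two entropy coders, here is a minimal sketch of a generic Golomb-Rice coder for signed prediction residuals. It is not the MPEG-4 ALS bitstream format: the function names, the zigzag mapping of signed values, and the bit-string output are assumptions made for illustration only.

```python
def zigzag(n: int) -> int:
    """Map a signed residual to a non-negative integer (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...)."""
    return 2 * n if n >= 0 else -2 * n - 1


def rice_encode(value: int, k: int) -> str:
    """Encode one signed residual with Rice parameter k as a string of '0'/'1' characters."""
    u = zigzag(value)
    quotient, remainder = u >> k, u & ((1 << k) - 1)
    unary = "1" * quotient + "0"          # quotient in unary, terminated by a zero
    binary = format(remainder, f"0{k}b") if k > 0 else ""  # remainder in exactly k bits
    return unary + binary


def rice_decode(bits: str, k: int) -> int:
    """Decode a single codeword produced by rice_encode with the same k."""
    quotient = bits.index("0")            # number of leading ones
    remainder = int(bits[quotient + 1 : quotient + 1 + k], 2) if k > 0 else 0
    u = (quotient << k) | remainder
    return u // 2 if u % 2 == 0 else -(u + 1) // 2   # undo the zigzag mapping


if __name__ == "__main__":
    for residual in (0, 5, -3, 12):
        print(residual, rice_encode(residual, 2))
```

Each codeword needs only a shift, a mask, and a unary run, which is consistent with the abstract's observation that the Rice code suits low-latency use better than the more complex BGMC.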

    Cross modal perception of body size in domestic dogs (Canis familiaris)

    While the perception of size-related acoustic variation in animal vocalisations is well documented, little attention has been given to how this information might be integrated with corresponding visual information. Using a cross-modal design, we tested the ability of domestic dogs to match growls resynthesised to be typical of either a large or a small dog to size-matched models. Subjects looked at the size-matched model significantly more often and for a significantly longer duration than at the incorrect model, showing that they have the ability to relate information about body size from the acoustic domain to the appropriate visual category. Our study suggests that the perceptual and cognitive mechanisms at the basis of size assessment in mammals have a multisensory nature, and calls for further investigations of the multimodal processing of size information across animal species.

    Frequency-warped autoregressive modeling and filtering

    This thesis consists of an introduction and nine articles. The articles are related to the application of frequency-warping techniques to audio signal processing, and in particular, predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles. Frequency-warping, or simply warping, techniques are based on modifying a conventional signal processing system so that the inherent frequency representation in the system is changed. It is demonstrated that this may be done for basically all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications. The majority of the articles study warped linear prediction (WLP) and its use in wideband audio coding. It is proposed that warped linear prediction would be a particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction, in one of the articles, of a class of new implementation techniques for recursive filters. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired an article that introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using an almost logarithmic frequency representation.
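
The abstract does not say how the frequency representation is changed. A common formulation, assumed here, replaces each unit delay with a first-order allpass section $D(z) = (z^{-1} - \lambda)/(1 - \lambda z^{-1})$, whose phase maps a frequency $\omega$ to a warped frequency $\nu(\omega) = \omega + 2\arctan\big(\lambda \sin\omega / (1 - \lambda\cos\omega)\big)$. The sketch below evaluates that mapping; the function name and the value of `lam` are illustrative assumptions, not taken from the thesis.

```python
import math


def warped_frequency(omega: float, lam: float) -> float:
    """Warped frequency (radians) seen through a first-order allpass with coefficient lam.

    omega is the original normalized angular frequency in [0, pi];
    lam is the warping coefficient, assumed to satisfy -1 < lam < 1.
    """
    if not -1.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between -1 and 1")
    # The denominator 1 - lam*cos(omega) is positive for |lam| < 1, so atan2 is safe.
    return omega + 2.0 * math.atan2(lam * math.sin(omega), 1.0 - lam * math.cos(omega))


if __name__ == "__main__":
    lam = 0.7  # illustrative value only
    for i in range(5):
        omega = i * math.pi / 4
        print(f"omega = {omega:.3f} rad, warped = {warped_frequency(omega, lam):.3f} rad")
```

With a positive `lam`, low frequencies occupy a larger share of the warped axis, which is the sense in which the representation can be moved closer to that of human hearing; `lam = 0` leaves the frequency axis unchanged.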