10 research outputs found

    Frequency Domain Linear Prediction for QMF Sub-bands and Applications to Audio Coding

    This paper proposes an analysis technique for wide-band audio applications based on the predictability of the temporal evolution of Quadrature Mirror Filter (QMF) sub-band signals. The input audio signal is first decomposed into 64 sub-band signals using QMF decomposition. The temporal envelopes in critically sampled QMF sub-bands are approximated using frequency domain linear prediction applied over relatively long time segments (e.g. 1000 ms). Line Spectral Frequency parameters related to autoregressive models are computed and quantized in each frequency sub-band. The sub-band residuals are quantized in the frequency domain using a combination of split Vector Quantization (VQ) (for magnitudes) and uniform scalar quantization (for phases). In the decoder, the sub-band signal is reconstructed using the quantized residual and the corresponding quantized envelope. Finally, application of inverse QMF reconstructs the audio signal. Even with simple quantization techniques and without any sophisticated modules, the proposed audio coder provides encouraging results in objective quality tests. Also, the proposed coder is easily scalable across a wide range of bit-rates
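    The envelope-estimation step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single full-band segment rather than 64 QMF sub-bands, omits quantization entirely, and the function names (`levinson`, `fdlp_envelope`) and the model order are hypothetical choices made for illustration. The idea is that linear prediction applied to the DCT of a segment yields an autoregressive model whose "spectrum", read along the time axis, approximates the temporal envelope.

```python
import numpy as np


def levinson(r, order):
    """Levinson-Durbin recursion: autocorrelation r[0..order] -> AR coefficients."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k  # updated prediction error power
    return a, err


def fdlp_envelope(x, order=20):
    """Approximate the temporal envelope of x by linear prediction on its DCT.

    Illustrative sketch only: one full-band segment, no sub-bands, no quantization.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    # Unnormalized DCT-II computed with an explicit basis (fine for short segments).
    basis = np.cos(np.pi * np.outer(np.arange(N), n + 0.5) / N)
    c = basis @ x
    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[:N - m], c[m:]) for m in range(order + 1)])
    a, err = levinson(r, order)
    # Evaluate the AR model on the angles that correspond to time indices.
    omega = np.pi * (n + 0.5) / N
    A = np.exp(-1j * np.outer(omega, np.arange(order + 1))) @ a
    return err / np.abs(A) ** 2
```

    Calling `fdlp_envelope(segment, order=20)` returns one envelope value per input sample. In a coder along the lines of the abstract, the AR coefficients (rather than the envelope itself) would be converted to Line Spectral Frequencies and quantized.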

    Scalable Wide-band Audio Codec based on Frequency Domain Linear Prediction

    This paper proposes a technique for wide-band audio applications based on the predictability of the temporal evolution of Quadrature Mirror Filter (QMF) sub-band signals. An input audio signal is first decomposed into 64 frequency sub-band signals using QMF decomposition. The temporal envelopes in critically sampled QMF sub-bands are approximated using frequency domain linear prediction applied over relatively long time segments (e.g. 1000 ms). Line Spectral Frequency parameters related to autoregressive models are computed and quantized in each frequency sub-band. The sub-band residual signals are quantized in the frequency domain using a split Vector Quantization (VQ) technique. In the decoder, the sub-band signal is reconstructed using the quantized residual and the corresponding quantized envelope. Finally, application of inverse QMF reconstructs the audio signal. Even with simple quantization techniques and without any psychoacoustic model, the proposed audio coder provides encouraging results on objective quality tests

    Audio quality and capacity issues in network design

    Data networks are an integral part of the professional audio environment. The emphasis in audio networking is moving towards the real-time transfer of audio data. This requires the timely and reliable delivery of audio data. This paper examines a number of quality and capacity issues surrounding the transfer of high quality audio data in real-time over data networks. Areas such as noise, jitter and other types of errors are examined in relation to audio transfer. The paper also includes an overview of the hearing process, compression schemes and a review of perceptual encoding techniques. Important network tools such as Quality of Service and Error Correction are also examined in the context of transferring audio data

    Scalable Speech Coding for IP Networks

    The emergence of Voice over Internet Protocol (VoIP) has posed new challenges to the development of speech codecs. The key issue in transporting real-time voice packets over IP networks is the lack of any guarantee of reasonable speech quality in the presence of packet delay or loss. Most of the widely used narrowband codecs depend on the Code Excited Linear Prediction (CELP) coding technique. CELP uses long-term prediction across frame boundaries, so packet loss causes error propagation, and redundant information must be transmitted to mitigate the problem. The internet Low Bit-rate Codec (iLBC) employs frame-independent coding and is therefore inherently robust to packet loss. However, the original iLBC lacks some key features of speech codecs for IP networks: rate flexibility, scalability, and wideband support. This dissertation presents novel scalable narrowband and wideband speech codecs for IP networks using a frame-independent coding scheme based on the iLBC. Rate flexibility is added to the iLBC by employing the discrete cosine transform (DCT) and scalable algebraic vector quantization (AVQ), and by allocating different numbers of bits to the AVQ. Bit-rate scalability is obtained by adding an enhancement layer to the core layer of the multi-rate iLBC. The enhancement layer encodes the weighted iLBC coding error in the modified DCT (MDCT) domain. The proposed wideband codec employs a bandwidth extension technique to extend the capabilities of existing narrowband codecs to provide wideband coding functionality. The wavelet transform is also used to further enhance the performance of the proposed codec. The performance evaluation results show that the proposed codec provides high robustness to packet loss and achieves equivalent or higher speech quality than state-of-the-art codecs under the clean channel condition

    MPEG-4's BIFS-Anim protocol: using MPEG-4 for streaming of 3D animations

    This thesis explores issues related to the generation and animation of synthetic objects within the context of MPEG-4. MPEG-4 was designed to provide a standard that will deliver rich multimedia content on many different platforms and networks. MPEG-4 should be viewed as a toolbox rather than as a monolithic standard, as each implementer of the standard will pick the tools adequate to their needs, likely a small subset of those available. The subset of MPEG-4 examined here comprises the tools relating to the generation of 3D scenes and to the animation of those scenes. A comparison with the most popular 3D standard, Virtual Reality Modeling Language (VRML), will be included. An overview of the MPEG-4 standard will be given, describing the basic concepts. MPEG-4 uses a scene description language called Binary Format for Scenes (BIFS) for the composition of scenes; this language will be described. The potential for the technology used in BIFS to provide low bitrate streaming 3D animations will be analysed and some examples of the possible uses of this technology will be given. A tool for the encoding of streaming 3D animations will be described, and results will be shown indicating that MPEG-4 provides a more efficient way of encoding 3D data than VRML. Finally, a look will be taken at the future of 3D content on the Internet

    Object coding of music using expressive MIDI

    PhD. Structured audio uses a high-level representation of a signal to produce audio output. When it was first introduced in 1998, creating a structured audio representation from an audio signal was beyond the state-of-the-art. Inspired by object coding and structured audio, we present a system to reproduce audio using Expressive MIDI, high-level parameters being used to represent pitch expression from an audio signal. This allows a low bit-rate MIDI sketch of the original audio to be produced. We examine optimisation techniques which may be suitable for inferring Expressive MIDI parameters from estimated pitch trajectories, considering the effect of data codings on the difficulty of optimisation. We look at some less common Gray codes and examine their effect on algorithm performance on standard test problems. We build an expressive MIDI system, estimating parameters from audio and synthesising output from those parameters. When the parameter estimation succeeds, we find that the system produces note pitch trajectories which match source audio to within 10 pitch cents. We consider the quality of the system in terms of both parameter estimation and the final output, finding that improvements to core components (audio segmentation and pitch estimation, both active research fields) would produce a better system. We examine the current state-of-the-art in pitch estimation, and find that some estimators produce high precision estimates but are prone to harmonic errors, whilst other estimators produce fewer harmonic errors but are less precise. Inspired by this, we produce a novel pitch estimator combining the output of existing estimators

    Frequency-domain bandwidth extension for low-delay audio coding applications

    MPEG-4 Spectral Band Replication (SBR) is a sophisticated high-frequency reconstruction (HFR) tool for speech and natural audio which, when used in conjunction with an audio codec, delivers a broadband high-quality signal at a bit rate of 48 kbps or even below. The major drawback of this technique is that it significantly increases the delay of the underlying core codec. The idea of synthetic signal reconstruction is also of particular interest in real-time communications. There, an HFR method can be employed to further loosen the channel capacity requirements. In this thesis a delay-optimized derivative of SBR is elaborated, which can be used together with a low-delay speech and audio coder like the Fraunhofer ULD. The presented approach is based on a short-time subband representation of an acoustic signal of natural or artificial origin, and as such it utilizes a filter bank for the extraction and the manipulation of sound characteristics. The system delay for a combination of the ULD coder with the proposed low-delay bandwidth extension (LD-BWE) tool adds up to 12 ms at a sampling rate of 48 kHz. At the present stage, LD-BWE generates a subjectively confirmed excellent-quality highband replica at a simulated mean data rate of 12.8 kbps. Ilmenau, Techn. Univ., Master's thesis, 201

    Scalable and perceptual audio compression

    This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable-to-lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform matching manner and in a psychoacoustic manner. In order to measure the psychoacoustic scalability of the systems investigated in this thesis, the original signal's psychoacoustic parameters are compared with those of the synthesized signal. The psychoacoustic parameters used are loudness, sharpness, tonality and roughness. This analysis technique is a novel method used in this thesis, and it allows an insight into the perceptual distortion that has been introduced by any coder analyzed in this manner

    MPEG-4 natural audio coding

    MPEG-4 audio represents a new kind of audio coding standard. Unlike its predecessors, MPEG-1 and MPEG-2 high-quality audio coding, and unlike the speech coding standards which have been completed by the ITU-T, it describes not a single or small set of highly efficient compression schemes but a complete toolbox to do everything from low bit-rate speech coding to high-quality audio coding or music synthesis. The natural coding part within MPEG-4 audio describes traditional type speech and high-quality audio coding algorithms and their combination to enable new functionalities like scalability (hierarchical coding) across the boundaries of coding algorithms. This paper gives an overview of the basic algorithms and how they can be combined