93 research outputs found

    A constrained joint source/channel coder design and vector quantization of nonstationary sources

    The emergence of broadband ISDN as the network for the future brings with it the promise of integrating all proposed services in a flexible environment. To achieve this flexibility, asynchronous transfer mode (ATM) has been proposed as the transfer technique. During this reporting period, a study was conducted on bridging network transmission performance and video coding. The successful transmission of variable bit rate video over ATM networks relies on the interaction between the video coding algorithm and the ATM network. Two aspects of the network that determine the efficiency of video transmission are the resource allocation algorithm and the congestion control algorithm; both are explained in this report. Vector quantization (VQ) is one of the more popular compression techniques to appear in the last twenty years, and numerous compression techniques that incorporate VQ have been proposed. While the LBG VQ provides excellent compression, LBG quantizers also have several drawbacks, including search complexity, memory requirements, and a mismatch between the codebook and the inputs. The mismatch mainly stems from the fact that the VQ is generally designed for a specific rate and a specific class of inputs. In this work, an adaptive technique is proposed for vector quantization of images and video sequences. This technique is an extension of the recursively indexed scalar quantization (RISQ) algorithm.
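
    As a point of reference for the adaptive scheme described above, the sketch below shows the baseline vector quantization step that an LBG-style coder performs: non-overlapping image blocks are matched against a pre-trained codebook by exhaustive nearest-neighbour search. It is an illustration only, assuming a codebook trained offline; the function name quantize_blocks and the 4x4 block size are placeholders, not part of the report's RISQ-based algorithm.

        # Illustrative baseline vector quantization of an image: encode fixed-size
        # blocks against a pre-trained codebook by exhaustive nearest-neighbour
        # search. This is the plain LBG-style VQ step, not the report's adaptive
        # RISQ-based scheme.
        import numpy as np

        def quantize_blocks(image, codebook, block=4):
            """image: 2-D array whose sides are multiples of `block`;
            codebook: (K, block*block) array of codewords.
            Returns (codeword indices, reconstructed image)."""
            h, w = image.shape
            # Gather non-overlapping block x block patches as row vectors.
            patches = (image.reshape(h // block, block, w // block, block)
                            .swapaxes(1, 2)
                            .reshape(-1, block * block)
                            .astype(np.float64))
            # Squared Euclidean distance from every patch to every codeword.
            d2 = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            idx = d2.argmin(axis=1)                      # indices to transmit
            recon = (codebook[idx]
                     .reshape(h // block, w // block, block, block)
                     .swapaxes(1, 2)
                     .reshape(h, w))
            return idx, recon

    The exhaustive distance computation in the middle is precisely the search complexity that the abstract lists among the drawbacks of LBG quantizers.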

    On Multiple Description Coding of Sources with Memory

    Study and simulation of low rate video coding schemes

    The semiannual report is included. Topics covered include communication, information science, data compression, remote sensing, color-mapped images, a robust coding scheme for packet video, recursively indexed differential pulse code modulation, an image compression technique for use on token ring networks, and joint source/channel coder design.

    Quantisation mechanisms in multi-prototype waveform coding

    Prototype Waveform Coding is one of the most promising methods for speech coding at low bit rates over telecommunications networks. This thesis investigates quantisation mechanisms in Multi-Prototype Waveform (MPW) coding, and two prototype waveform quantisation algorithms for speech coding at bit rates of 2.4 kb/s are proposed. Speech coders based on these algorithms have been found to be capable of producing coded speech with perceptual quality equivalent to that generated by the US Federal Standard 1016 CELP algorithm at 4.8 kb/s. The two proposed prototype waveform quantisation algorithms are based on Prototype Waveform Interpolation (PWI). The first algorithm uses an open loop architecture (Open Loop Quantisation). In this algorithm, the speech residual is represented as a series of prototype waveforms (PWs). The PWs are extracted in both voiced and unvoiced speech, time aligned and quantised, and, at the receiver, the excitation is reconstructed by smooth interpolation between them. For low bit rate coding, the PW is decomposed into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW). The SEW is coded using vector quantisation of both the magnitude and phase spectra. The SEW codebook search selects the codebook vector that best matches the SEW. The REW phase spectrum is not quantised; it is recovered using Gaussian noise. The REW magnitude spectrum, on the other hand, can either be quantised at a certain update rate or derived from the behaviour of the SEW.
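
    To make the SEW/REW decomposition concrete, the following sketch low-pass filters a matrix of time-aligned prototype waveforms along the evolution (frame) axis to obtain the SEW and takes the remainder as the REW. The moving-average filter and the function name sew_rew_split are illustrative assumptions; the thesis's actual filter design and the subsequent magnitude/phase quantisation are not reproduced here.

        # Illustrative SEW/REW split used in prototype-waveform coders: low-pass
        # filter the sequence of time-aligned prototype waveforms along the
        # evolution (frame) axis to get the slowly evolving waveform (SEW); the
        # remainder is the rapidly evolving waveform (REW). A moving average
        # stands in for the coder's actual low-pass filter.
        import numpy as np

        def sew_rew_split(prototypes, span=5):
            """prototypes: (n_frames, n_samples) array of time-aligned PWs;
            span: odd moving-average length along the frame axis.
            Returns (sew, rew), each shaped like `prototypes`."""
            assert span % 2 == 1, "use an odd filter length"
            kernel = np.ones(span) / span
            pad = span // 2
            # Pad at both ends so the filtered surface keeps its length.
            padded = np.pad(prototypes, ((pad, pad), (0, 0)), mode='edge')
            sew = np.stack([np.convolve(padded[:, k], kernel, mode='valid')
                            for k in range(prototypes.shape[1])], axis=1)
            rew = prototypes - sew
            return sew, rew

    In the coder described above, the SEW magnitude and phase spectra would then be vector quantised, while the REW is reconstructed largely from its magnitude envelope and Gaussian-noise phase.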

    Bag-of-words representations for computer audition

    Computer audition is omnipresent in everyday life, in applications ranging from personalised virtual agents to health care. From a technical point of view, the goal is to robustly classify the content of an audio signal in terms of a defined set of labels, such as the acoustic scene, a medical diagnosis, or, in the case of speech, what is said or how it is said. Typical approaches employ machine learning (ML), which means that task-specific models are trained by means of examples. Despite recent successes in neural network-based end-to-end learning, which takes the raw audio signal as input, models relying on hand-crafted acoustic features are still superior in some domains, especially for tasks where data is scarce. One major issue, nevertheless, is that a sequence of acoustic low-level descriptors (LLDs) cannot be fed directly into many ML algorithms, as they require a static, fixed-length input. Moreover, even for dynamic classifiers, compressing the information of the LLDs over a temporal block by summarising them can be beneficial. However, the type of instance-level representation has a fundamental impact on the performance of the model. In this thesis, the so-called bag-of-audio-words (BoAW) representation is investigated as an alternative to the standard approach of statistical functionals. BoAW is an unsupervised method of representation learning, inspired by the bag-of-words method in natural language processing, which forms a histogram of the terms present in a document. The toolkit openXBOW is introduced, enabling systematic learning and optimisation of these feature representations, unified across arbitrary modalities of numeric or symbolic descriptors. A number of experiments on BoAW are presented and discussed, focussing on a large number of potential applications and corresponding databases, ranging from emotion recognition in speech to medical diagnosis. The evaluations include a comparison of different acoustic LLD sets and configurations of the BoAW generation process. The key findings are that BoAW features are a meaningful alternative to statistical functionals, offering certain benefits while preserving the advantages of functionals, such as data independence. Furthermore, it is shown that both representations are complementary and that their fusion improves the performance of a machine listening system.
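
    A bare-bones version of the BoAW pipeline described above might look as follows: a codebook is learned from pooled LLD frames, and each recording is then represented by a normalised histogram of nearest-codeword assignments. Random sampling of training frames stands in for codebook generation, and the function names are illustrative; this is not the openXBOW API.

        # Bare-bones bag-of-audio-words (BoAW): learn a codebook from pooled
        # low-level descriptor (LLD) frames, then represent each recording as a
        # normalised histogram of nearest-codeword assignments.
        import numpy as np

        def learn_codebook(train_frames, size=500, seed=0):
            """train_frames: (n_frames, n_features) LLDs pooled over the training set.
            Random sampling of frames stands in for real codebook generation."""
            rng = np.random.default_rng(seed)
            pick = rng.choice(len(train_frames), size=size, replace=False)
            return train_frames[pick].astype(np.float64)

        def bag_of_audio_words(lld_frames, codebook):
            """Return the BoAW vector (histogram over codewords) for one recording."""
            d2 = ((lld_frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            assignments = d2.argmin(axis=1)              # nearest codeword per frame
            hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
            return hist / max(hist.sum(), 1.0)           # term-frequency normalisation

    Refinements such as assigning each frame to several codewords or applying logarithmic term weighting are left out for brevity.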

    Some new developments in image compression

    This study is divided into two parts. The first part involves an investigation of near-lossless compression of digitized images using the entropy-coded DPCM method with a large number of quantization levels. Through the investigation, a new scheme is developed that combines both lossy and lossless DPCM methods into a common framework. This new scheme uses known results on the design of predictors and quantizers that incorporate properties of human visual perception. To enhance the compression performance of the scheme, an adaptively generated source model with multiple contexts is employed for the coding of the quantized prediction errors, rather than the memoryless model of the conventional DPCM method. Experiments show that the scheme can provide compression ratios in the range of 4 to 11 with a peak SNR of about 50 dB for 8-bit medical images. The use of multiple contexts is also found to improve compression performance by about 25% to 35%. The second part of the study is devoted to the problem of lossy image compression using tree-structured vector quantization. As a result of the study, a new design method for codebook generation is developed, together with four different implementation algorithms. In the new method, an unbalanced tree-structured vector codebook is designed in a greedy fashion under a rate-distortion trade-off constraint, and it can then be used to implement a variable-rate compression system. Experiments show that the new method achieves very good rate-distortion performance while being computationally efficient. Also, due to the tree structure of the codebook, the new method is amenable to progressive transmission applications.
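
    The sketch below illustrates the kind of near-lossless DPCM loop the first part builds on: each pixel is predicted from already reconstructed neighbours, the prediction error is uniformly quantised with step 2*delta + 1 so the reconstruction error never exceeds delta, and the quantised residuals are the symbols that a context-based entropy coder would then compress. The simple left/top-mean predictor is a placeholder, not the perceptually tuned predictor of the study.

        # Minimal near-lossless DPCM loop: predict each pixel from already
        # reconstructed neighbours, uniformly quantise the prediction error with
        # step 2*delta + 1 so the reconstruction error never exceeds delta, and
        # return the quantised residuals (the symbols a context-based entropy
        # coder would then compress). The left/top-mean predictor is a placeholder.
        import numpy as np

        def dpcm_near_lossless(image, delta=2):
            img = image.astype(np.int64)
            h, w = img.shape
            recon = np.zeros_like(img)
            residuals = np.zeros_like(img)
            step = 2 * delta + 1
            for y in range(h):
                for x in range(w):
                    left = recon[y, x - 1] if x > 0 else 0
                    top = recon[y - 1, x] if y > 0 else 0
                    pred = (left + top) // 2 if (x > 0 and y > 0) else left + top
                    err = img[y, x] - pred
                    # Mid-tread quantiser: |err - q*step| <= delta by construction.
                    q = (err + delta) // step if err >= 0 else -((-err + delta) // step)
                    residuals[y, x] = q
                    recon[y, x] = pred + q * step   # matches the decoder's reconstruction
            return residuals, recon

    With delta = 0 the same loop degenerates to lossless DPCM, which mirrors the common framework for lossy and lossless coding described in the abstract.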

    Optimal soft-decoding combined trellis-coded quantization/modulation.

    Chei Kwok-hung. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 66-73). Abstracts in English and Chinese.
    Chapter 1 Introduction
        1.1 Typical Digital Communication Systems
            1.1.1 Source coding
            1.1.2 Channel coding
        1.2 Joint Source-Channel Coding System
        1.3 Thesis Organization
    Chapter 2 Trellis Coding
        2.1 Convolutional Codes
        2.2 Trellis-Coded Modulation
            2.2.1 Set Partitioning
        2.3 Trellis-Coded Quantization
        2.4 Joint TCQ/TCM System
            2.4.1 The Combined Receiver
            2.4.2 Viterbi Decoding
            2.4.3 Sequence MAP Decoding
            2.4.4 Sliding Window Decoding
            2.4.5 Block-Based Decoding
    Chapter 3 Soft Decoding Joint TCQ/TCM over AWGN Channel
        3.1 System Model
        3.2 TCQ with Optimal Soft-Decoder
        3.3 Gaussian Memoryless Source
            3.3.1 Theorem Limit
            3.3.2 Performance on PAM Constellations
            3.3.3 Performance on PSK Constellations
        3.4 Uniform Memoryless Source
            3.4.1 Theorem Limit
            3.4.2 Performance on PAM Constellations
            3.4.3 Performance on PSK Constellations
    Chapter 4 Soft Decoding Joint TCQ/TCM System over Rayleigh Fading Channel
        4.1 Wireless Channel
        4.2 Rayleigh Fading Channel
        4.3 Idea Interleaving
        4.4 Receiver Structure
        4.5 Numerical Results
            4.5.1 Performance on 4-PAM Constellations
            4.5.2 Performance on 8-PAM Constellations
            4.5.3 Performance on 16-PAM Constellations
    Chapter 5 Joint TCVQ/TCM System
        5.1 Trellis-Coded Vector Quantization
            5.1.1 Set Partitioning in TCVQ
        5.2 Joint TCVQ/TCM
            5.2.1 Set Partitioning and Index Assignments
            5.2.2 Gaussian-Markov Sources
        5.3 Simulation Results and Discussion
    Chapter 6 Conclusion and Future Work
        6.1 Conclusion
        6.2 Future Works
    Bibliography
    Appendix: Publications

    A robust low bit rate quad-band excitation LSP vocoder.

    By Chiu Kim Ming. Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. Includes bibliographical references (leaves 103-108).
    Chapter 1 Introduction
        1.1 Speech production
        1.2 Low bit rate speech coding
    Chapter 2 Speech analysis & synthesis
        2.1 Linear prediction of speech signal
        2.2 LPC vocoder
            2.2.1 Pitch and voiced/unvoiced decision
            2.2.2 Spectral envelope representation
        2.3 Excitation
            2.3.1 Regular pulse excitation and Multipulse excitation
            2.3.2 Coded excitation and vector sum excitation
        2.4 Multiband excitation
        2.5 Multiband excitation vocoder
    Chapter 3 Dual-band and Quad-band excitation
        3.1 Dual-band excitation
        3.2 Quad-band excitation
        3.3 Parameters determination
            3.3.1 Pitch detection
            3.3.2 Voiced/unvoiced pattern generation
        3.4 Excitation generation
    Chapter 4 A low bit rate Quad-Band Excitation LSP Vocoder
        4.1 Architecture of QBELSP vocoder
        4.2 Coding of excitation parameters
            4.2.1 Coding of pitch value
            4.2.2 Coding of voiced/unvoiced pattern
        4.3 Spectral envelope estimation and coding
            4.3.1 Spectral envelope & the gain value
            4.3.2 Line Spectral Pairs (LSP)
            4.3.3 Coding of LSP frequencies
            4.3.4 Coding of gain value
    Chapter 5 Performance evaluation
        5.1 Spectral analysis
        5.2 Subjective listening test
            5.2.1 Mean Opinion Score (MOS)
            5.2.2 Diagnostic Rhyme Test (DRT)
    Chapter 6 Conclusions and discussions
    References
    Appendix A Subroutine of pitch detection
    Appendix B Subroutine of voiced/unvoiced decision
    Appendix C Subroutine of LPC coefficients calculation using Durbin's recursive method
    Appendix D Subroutine of LSP calculation using Chebyshev Polynomials
    Appendix E Single syllable word pairs for Diagnostic Rhyme Test