    Error resilience and concealment techniques for high-efficiency video coding

    This thesis investigates the problem of robust coding and error concealment in High Efficiency Video Coding (HEVC). After a review of the current state of the art, a simulation study of error robustness revealed that HEVC offers weak protection against network losses, with a significant impact on decoded video quality. Based on this evidence, the first contribution of this work is a new method that reduces the temporal dependencies between motion vectors, improving the decoded video quality without compromising compression efficiency. The second contribution is a two-stage approach for reducing the mismatch of temporal predictions when video streams are received with errors or lost data. At the encoding stage, the reference pictures are dynamically distributed based on a constrained Lagrangian rate-distortion optimisation to reduce the number of predictions from a single reference. At the streaming stage, a prioritisation algorithm based on spatial dependencies selects a reduced set of motion vectors to be transmitted, as side information, to reduce mismatched motion predictions at the decoder. The problem of error-concealment-aware video coding is also investigated to enhance overall error robustness. A new approach based on scalable coding and optimal error concealment selection is proposed, in which the optimal error concealment modes are found by simulating transmission losses, followed by a saliency-weighted optimisation. Moreover, recovery residual information is encoded using a rate-controlled enhancement layer. Both are transmitted to the decoder for use in case of data loss. Finally, an adaptive error resilience scheme is proposed to dynamically predict which video stream achieves the highest decoded quality for a particular loss case. A neural network selects among several video streams, encoded with different levels of compression efficiency and error protection, based on information from the video signal, the coded stream, and the transmission network. Overall, the robust video coding methods investigated in this thesis yield consistent quality gains over existing methods, including those implemented in the HEVC reference software, and also achieve a better trade-off between coding efficiency and error robustness.
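    The dynamic reference-picture distribution above builds on Lagrangian rate-distortion optimisation. As a minimal sketch, the standard (unconstrained) form picks, for each block, the coding mode minimising a weighted sum of distortion and rate; the constrained variant the abstract describes can be thought of as adding a penalty on over-use of a single reference picture. The second equation below, with multiplier $\mu$ and cost $C(m)$, is illustrative only and is not the thesis's exact formulation:

```latex
% Standard Lagrangian rate-distortion mode decision:
%   D(m): distortion of mode m, R(m): its bit cost, \lambda: Lagrange multiplier
\[
  m^{*} \;=\; \arg\min_{m \in \mathcal{M}} J(m),
  \qquad
  J(m) \;=\; D(m) \;+\; \lambda\, R(m)
\]
% Illustrative constrained variant: a second multiplier \mu on a
% reference-usage cost C(m) discourages too many predictions from one reference.
\[
  J_{c}(m) \;=\; D(m) \;+\; \lambda\, R(m) \;+\; \mu\, C(m)
\]
```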

    Compression and interoperable representation of genomic information


    Distributed learning and inference in deep models

    In recent years, the size of deep learning problems has grown significantly, both in the number of available training samples and in the number of parameters and the complexity of the models. This thesis considers the challenges encountered in training and inference of large deep models, especially on nodes with limited computational power and capacity. We studied two related classes of problems: 1) distributed training of deep models, and 2) compression and restructuring of deep models for efficient distributed and parallel execution to reduce inference times. In particular, we considered the communication bottleneck in distributed training and inference of deep models. Data compression is a viable tool to mitigate this bottleneck; however, existing methods suffer from drawbacks such as increased variance of the stochastic gradients (SG), slower convergence rates, or bias added to the SG. This research addresses these challenges from three different perspectives: 1) information theory and the CEO problem, 2) indirect SG compression via matrix factorization, and 3) quantized compressive sampling. We showed, both theoretically and via simulations, that the proposed methods achieve a smaller MSE than other unbiased compression methods at lower communication bit rates, resulting in superior convergence rates. Next, we considered federated learning over wireless multiple access channels (MAC). Efficient communication requires the communication algorithm to satisfy the constraints imposed by the nodes in the network and by the communication medium. To satisfy these constraints and exploit the over-the-air computation inherent in a MAC, we proposed a framework based on random linear coding and developed efficient power management and channel usage techniques to manage the trade-off between power consumption and communication bit rate. The second part of this thesis considers the distributed parallel implementation of an already-trained deep model on multiple workers. Since latency due to synchronization and data transfer among workers adversely affects the performance of a parallel implementation, it is desirable to minimize the interdependency among the parallel sub-models. To achieve this goal, we developed and analyzed RePurpose, an efficient algorithm that rearranges the neurons of a neural network and partitions them (without changing the network's general topology) so that the interdependency among sub-models is minimized under the computation and communication constraints of the workers.
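    The unbiased-compression idea the abstract refers to can be made concrete with a standard technique from this literature. The sketch below is not the thesis's own scheme; it is a generic QSGD-style unbiased stochastic quantizer (the names `quantize_gradient` and `levels` are illustrative), showing how stochastic rounding keeps the compressed gradient unbiased at the cost of extra variance:

```python
import numpy as np

def quantize_gradient(v, levels=8, rng=None):
    """QSGD-style unbiased stochastic quantizer (illustrative, not the
    thesis's method). Each coordinate is rounded to one of `levels`
    uniform levels with probabilities chosen so that E[output] == v."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * levels                  # each entry in [0, levels]
    lower = np.floor(scaled)
    round_up = rng.random(v.shape) < (scaled - lower)   # stochastic rounding
    return np.sign(v) * (lower + round_up) * norm / levels

# Averaging many independently quantized copies converges back to v,
# which is exactly the unbiasedness property discussed above:
rng = np.random.default_rng(0)
g = rng.standard_normal(1000)
avg = np.mean([quantize_gradient(g, rng=rng) for _ in range(200)], axis=0)
print(np.linalg.norm(avg - g) / np.linalg.norm(g))      # small relative error
```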

    Increasing temporal, structural, and spectral resolution in images using exemplar-based priors

    In the past decade, camera manufacturers have offered smaller form factors, smaller pixel sizes (leading to higher-resolution images), and faster processing chips to increase the performance of consumer cameras. However, these conventional approaches have failed to capitalize on the spatio-temporal redundancy inherent in images, nor have they adequately provided a solution for finding 3D point correspondences for cameras sampling different bands of the visible spectrum. In this thesis, we pose the following question: given the repetitious nature of image patches and appropriate camera architectures, can statistical models be used to increase temporal, structural, or spectral resolution? While many techniques have been suggested to tackle individual aspects of this question, the proposed solutions require prohibitively expensive hardware modifications, rely on overly simplistic assumptions about the geometry of the scene, or both. We propose a two-stage solution to facilitate image reconstruction: 1) design a linear camera system that optically encodes scene information, and 2) recover the full scene information using prior models learned from the statistics of natural images. By leveraging the tendency of small regions to repeat throughout an image or video, we are able to learn prior models from patches drawn from exemplar images. The quality of this approach is demonstrated in two application domains: using low-speed video cameras for high-speed video acquisition, and multi-spectral fusion using an array of cameras. We also investigate a conventional approach for finding 3D correspondences that enables a generalized assorted array of cameras to operate in multiple modalities, including multi-spectral, high dynamic range, and polarization imaging of dynamic scenes.
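    As a toy illustration of the exemplar-based prior idea (not the thesis's reconstruction pipeline, which couples the prior with an optically coded linear camera model), the Python sketch below builds a "dictionary" of patches from an exemplar image and reconstructs a degraded image by nearest-neighbour patch replacement; all function and parameter names are hypothetical:

```python
import numpy as np

def extract_patches(exemplar, size=8, stride=4):
    """Collect overlapping patches from an exemplar image as prior 'atoms'."""
    patches = []
    for i in range(0, exemplar.shape[0] - size + 1, stride):
        for j in range(0, exemplar.shape[1] - size + 1, stride):
            patches.append(exemplar[i:i + size, j:j + size].ravel())
    return np.array(patches)

def reconstruct(degraded, dictionary, size=8):
    """Replace each non-overlapping patch of the degraded image with its
    nearest exemplar patch (Euclidean distance), the simplest possible
    use of an exemplar-based prior."""
    out = degraded.copy()
    for i in range(0, degraded.shape[0] - size + 1, size):
        for j in range(0, degraded.shape[1] - size + 1, size):
            p = degraded[i:i + size, j:j + size].ravel()
            best = np.argmin(((dictionary - p) ** 2).sum(axis=1))
            out[i:i + size, j:j + size] = dictionary[best].reshape(size, size)
    return out
```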

    Quality aspects of Internet telephony

    Internet telephony has had a tremendous impact on how people communicate, and many now maintain contact using some form of it. The motivation for this work has therefore been to address the quality aspects of real-world Internet telephony for both fixed and wireless telecommunication. The focus has been on the quality of voice communication, since poor quality often leads to user dissatisfaction. The scope of the work is broad in order to address the main factors within IP-based voice communication. The first four chapters of this dissertation constitute the background material. The first chapter outlines where Internet telephony is deployed today and motivates the topics and techniques used in this research. The second chapter provides background on Internet telephony, including signalling, speech coding, and voice internetworking. The third chapter focuses solely on quality measures for packetised voice systems, and the fourth chapter is devoted to the history of voice research. The appendix of this dissertation constitutes the research contributions. It includes an examination of the access network, focusing on how calls are multiplexed in wired and wireless systems. Subsequently, in the wireless case, we consider how to hand over calls from 802.11 networks to the cellular infrastructure. We then consider the Internet backbone, where most of our work is devoted to measurements specifically for Internet telephony. These measurements have been applied to estimating telephony arrival processes, measuring call quality, and quantifying the trend in Internet telephony quality over several years. We also consider the end systems, since they are responsible for reconstructing a voice stream under loss and delay constraints. Finally, we estimate voice quality using the ITU proposal PESQ and the packet loss process. The main contribution of this work is a systematic examination of Internet telephony. We describe several methods that enable adaptable solutions for maintaining consistent voice quality, and we have found that relatively small technical changes can lead to substantial improvements in user quality. A second contribution of this work is a suite of software tools designed to ascertain voice quality in IP networks; some of these tools are in use within commercial systems today.
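    The "packet loss process" mentioned above is commonly modelled in the VoIP-measurement literature with a two-state Gilbert model, which captures the burstiness of losses. The sketch below is a generic illustration (not the dissertation's own tooling; parameter names are ours), simulating such a trace and reporting the loss rate and mean burst length, the two statistics a quality estimator would typically consume:

```python
import numpy as np

def gilbert_trace(n_packets, p_good_to_bad=0.02, p_bad_to_good=0.5, rng=None):
    """Two-state Gilbert model of bursty packet loss (True = packet lost)."""
    rng = rng or np.random.default_rng()
    lost = np.zeros(n_packets, dtype=bool)
    bad = False
    for k in range(n_packets):
        if bad:
            bad = rng.random() >= p_bad_to_good   # remain in the bad state?
        else:
            bad = rng.random() < p_good_to_bad    # fall into the bad state?
        lost[k] = bad
    return lost

trace = gilbert_trace(100_000, rng=np.random.default_rng(1))
# Mean burst length: lost packets divided by the number of loss bursts
# (a burst starts at each not-lost -> lost transition).
bursts = np.count_nonzero(trace[1:] & ~trace[:-1]) + int(trace[0])
print(f"loss rate {trace.mean():.3%}, "
      f"mean burst {trace.sum() / max(bursts, 1):.2f} packets")
```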

    Perceptual models in speech quality assessment and coding

    The ever-increasing demand for good communications/toll-quality speech has created renewed interest in the perceptual impact of rate compression. Two general areas are investigated in this work: speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed which simulates the processing stages of the peripheral auditory system. At the output of the model, a "running" auditory spectrum is obtained; this represents the auditory (spectral) equivalent of any acoustic sound, such as speech. Auditory spectra from coded speech segments serve as inputs to a second model, which simulates the information centre in the brain that performs the speech quality assessment. [Continues.]
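    One common ingredient of such peripheral auditory models is grouping a frame's power spectrum into critical bands. The Python sketch below is a generic illustration under our own assumptions (Bark scale via Traunmüller's approximation, 8 kHz sampling, log compression of band energies), not the specific model developed in this thesis:

```python
import numpy as np

def hz_to_bark(f):
    """Traunmueller's approximation of the Bark critical-band scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def auditory_spectrum(frame, fs=8000):
    """Crude peripheral-auditory spectrum: windowed power spectrum grouped
    into Bark-scale critical bands, then log-compressed (dB)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    bands = np.maximum(np.floor(hz_to_bark(freqs)).astype(int), 0)
    band_energy = np.bincount(bands, weights=spectrum)
    return 10.0 * np.log10(band_energy + 1e-12)

# A "running" auditory spectrum is then this quantity computed frame by
# frame, e.g. over 20 ms windows with 10 ms overlap.
```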

    Residual-excited linear predictive (RELP) vocoder system with TMS320C6711 DSK and vowel characterization

    The area of speech recognition by machine is one of the most popular and complicated subjects in the current multimedia field. Linear predictive coding (LPC) is a useful technique for voice coding in speech analysis and synthesis. The first objective of this research was to establish a prototype of the residual-excited linear predictive (RELP) vocoder system in a real-time environment. Although its transmission rate is higher, the quality of the RELP vocoder's synthesized speech is superior to that of other vocoders, and it is comparatively simple and robust to implement. The RELP vocoder uses residual signals as excitation rather than periodic pulses or white noise. It was implemented in C on a Texas Instruments TMS320C6711 DSP starter kit (DSK). Identifying vowel sounds is an important element of recognizing speech content. The second objective of this research was therefore to explore a method of characterizing vowels by means of parameters extracted by the RELP vocoder, a technique not previously known to have been used in speech recognition. Five English vowels were chosen as the experimental sample. Utterances of individual vowel sounds and of the vowel sounds in one-syllable words were recorded and saved as WAVE files, and a large sample of 20-ms vowel segments was obtained from these utterances. The presented method uses 20 samples of a segment's LPC frequency response, taken at equal intervals on a logarithmic frequency scale, as a feature vector. The average of each vowel's vectors was calculated, and the Euclidean distances between the five vowels' average vectors and an unknown vector were compared to assign the unknown vector to a vowel group. The results indicate that, when a vowel is uttered alone, the distance to its own average vector is smaller than the distances to the other vowels' average vectors; by examining a given vowel's frequency response against all known vowels' average vectors, one can determine to which vowel group it belongs. When a vowel is uttered with consonants, however, variances and covariances increase, and in some cases no distinct difference can be recognized between the distance to a vowel's own average vector and the distances to the other vowels' average vectors. Overall, the results of vowel characterization did indicate an ability of the RELP vocoder to identify and classify single vowel sounds.
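    The classification procedure described above (20 log-spaced samples of the LPC frequency response, nearest average vector by Euclidean distance) can be sketched as follows. This is an illustrative Python reimplementation under assumed settings (8 kHz sampling, 10th-order LPC, 100 Hz lower band edge), not the thesis's TMS320C6711 C code:

```python
import numpy as np

def lpc(frame, order=10):
    """LPC coefficients a[0..order] (a[0] = 1) via autocorrelation and
    the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] += k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def vowel_feature(frame, fs=8000, order=10, n_points=20):
    """20 log-magnitude samples of the LPC spectrum, log-spaced in frequency."""
    a = lpc(frame * np.hamming(len(frame)), order)
    w = 2 * np.pi * np.logspace(np.log10(100), np.log10(fs / 2), n_points) / fs
    A = np.polyval(a[::-1], np.exp(-1j * w))    # A(e^{jw}); H(e^{jw}) = 1/A
    return -20.0 * np.log10(np.abs(A) + 1e-12)

def classify(feature, vowel_means):
    """Assign the segment to the vowel whose average vector is nearest
    in Euclidean distance, e.g. vowel_means = {'a': vec_a, 'e': vec_e, ...}
    with each vector the mean of vowel_feature() over that vowel's segments."""
    return min(vowel_means, key=lambda v: np.linalg.norm(feature - vowel_means[v]))
```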