
    Rate-distortion analysis and traffic modeling of scalable video coders

    In this work, we focus on two important goals in the transmission of scalable video over the Internet. The first is to provide high-quality video to end users; the second is to properly design networks and predict network performance for video transmission based on the characteristics of existing video traffic. Rate-distortion (R-D) based schemes are often applied to improve and stabilize video quality; however, the lack of R-D modeling of scalable coders limits their application to scalable streaming. Thus, in the first part of this work, we analyze R-D curves of scalable video coders and propose a novel operational R-D model. We evaluate and demonstrate the accuracy of our R-D function for various scalable coders, such as Fine Granular Scalable (FGS) and Progressive FGS coders. Furthermore, owing to the time-constrained nature of Internet streaming, we propose another operational R-D model, which is accurate yet computationally inexpensive, and apply it to streaming applications for quality-control purposes. The Internet is a changing environment; however, most quality-control approaches consider only constant bit rate (CBR) channels, and no specific studies have been conducted on quality control in variable bit rate (VBR) channels. To fill this void, we examine an asymptotically stable congestion control mechanism and combine it with our R-D model to present smooth visual quality to end users under various network conditions. Our second focus in this work concerns the modeling and analysis of video traffic, which is crucial for protocol design and efficient network utilization in video transmission. Although scalable video traffic is expected to be an important source for the Internet, little work has been done on analyzing or modeling it. In this regard, we develop a frame-level hybrid framework for modeling multi-layer VBR video traffic. In the proposed framework, the base layer is modeled using a combination of wavelet and time-domain methods, and the enhancement layer is linearly predicted from the base layer using the cross-layer correlation.
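
    As a rough illustration of the operational R-D modeling idea above, the sketch below fits a generic exponential-decay R-D curve to hypothetical (rate, distortion) samples and inverts it for quality control. The model form, the data, and the use of scipy's curve_fit are illustrative assumptions, not the specific model proposed in the work.

```python
# Minimal sketch: fitting a generic operational R-D curve to sampled
# (rate, distortion) points from a scalable bitstream. The model form
# D(R) = a * 2^(-b*R) + c is a common textbook choice used only for
# illustration; it is not the specific model proposed in this work.
import numpy as np
from scipy.optimize import curve_fit

def rd_model(rate, a, b, c):
    """Exponential-decay distortion model in rate (bits per pixel)."""
    return a * np.power(2.0, -b * rate) + c

# Hypothetical measurements: MSE at several truncation points of an
# FGS-style embedded bitstream.
rates = np.array([0.1, 0.25, 0.5, 1.0, 2.0])   # bits per pixel
mses  = np.array([95.0, 60.0, 32.0, 12.0, 4.0])

(a, b, c), _ = curve_fit(rd_model, rates, mses, p0=(100.0, 1.0, 1.0))

# Quality control: invert the fitted curve to find the rate that meets
# a target distortion under the current channel budget.
target_mse = 20.0
rate_needed = -np.log2((target_mse - c) / a) / b
print(f"fit: a={a:.1f}, b={b:.2f}, c={c:.2f}; rate for MSE<=20: {rate_needed:.2f} bpp")
```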

    Novel Motion Anchoring Strategies for Wavelet-based Highly Scalable Video Compression

    This thesis investigates new motion anchoring strategies that are targeted at wavelet-based highly scalable video compression (WSVC). We depart from two practices that are deeply ingrained in existing video compression systems. Instead of the commonly used block motion, which has poor scalability attributes, we employ piecewise-smooth motion together with a highly scalable motion boundary description. The combination of this more “physical” motion description with motion discontinuity information allows us to change the conventional strategy of anchoring motion at target frames to anchoring motion at reference frames, which improves motion inference across time. In the proposed reference-based motion anchoring strategies, motion fields are mapped from reference to target frames, where they serve as prediction references; during this mapping process, disoccluded regions are readily discovered. Observing that motion discontinuities displace with foreground objects, we propose motion-discontinuity-driven motion mapping operations that handle traditionally challenging regions around moving objects. The reference-based motion anchoring exposes an intricate connection between temporal frame interpolation (TFI) and video compression: when employed in a compression system, all anchoring strategies explored in this thesis perform TFI once all residual information is quantized to zero at a given temporal level. The interpolation performance is evaluated on both natural and synthetic sequences, where we show favourable comparisons with state-of-the-art TFI schemes. We explore three reference-based motion anchoring strategies. In the first, the motion anchoring is “flipped” with respect to a hierarchical B-frame structure. We develop an analytical model to determine the weights of the different spatio-temporal subbands, and assess the suitability and benefits of this reference-based WSVC for (highly scalable) video compression. Reduced motion coding cost and improved frame prediction, especially around moving objects, result in improved rate-distortion performance compared to a target-based WSVC. As the thesis evolves, the motion anchoring is progressively simplified to one where all motion is anchored at a single base frame; this central motion organization facilitates the incorporation of higher-order motion models, which improve prediction performance in regions undergoing motion with non-constant velocity.
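
    The following toy sketch illustrates the core mechanic of reference-based anchoring described above: a motion field stored at the reference frame is forward-mapped to the target frame, and target pixels that receive no mapped sample are flagged as disoccluded. The dense per-pixel field and nearest-pixel splatting are simplifying assumptions; the thesis uses piecewise-smooth motion with explicit discontinuity handling.

```python
# Toy sketch of reference-anchored motion mapping: a dense motion field
# stored at the reference frame is forward-mapped ("splatted") to the
# target frame; target pixels that no reference pixel lands on are
# flagged as disoccluded. Nearest-pixel splatting is a simplification.
import numpy as np

def map_motion_to_target(motion):
    """motion: (H, W, 2) displacements anchored at the reference frame.
    Returns motion re-anchored at the target and a disocclusion mask."""
    H, W, _ = motion.shape
    mapped = np.zeros((H, W, 2))
    hit = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            dy, dx = motion[y, x]
            ty, tx = int(round(y + dy)), int(round(x + dx))
            if 0 <= ty < H and 0 <= tx < W:
                mapped[ty, tx] = -motion[y, x]  # point back at the reference
                hit[ty, tx] = True
    return mapped, ~hit  # ~hit marks disoccluded target regions

# A uniform 1-pixel rightward shift leaves the leftmost column disoccluded.
field = np.zeros((4, 4, 2))
field[..., 1] = 1.0
_, disoccluded = map_motion_to_target(field)
print(disoccluded[:, 0], disoccluded[:, 1])  # column 0 True, column 1 False
```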

    Discrete Wavelet Transforms

    Discrete wavelet transform (DWT) algorithms have a firm position in the processing of signals in several areas of research and industry. As the DWT provides both octave-scale frequency and spatial timing of the analyzed signal, it is constantly used to solve and treat more and more advanced problems. The present book, Discrete Wavelet Transforms: Algorithms and Applications, reviews recent progress in DWT algorithms and applications. The book covers a wide range of methods (e.g. lifting, shift invariance, multi-scale analysis) for constructing DWTs. The book chapters are organized into four major parts. Part I describes progress in hardware implementations of DWT algorithms; applications include multitone modulation for ADSL and equalization techniques, a scalable architecture for FPGA implementation, a lifting-based algorithm for VLSI implementation, a comparison between DWT- and FFT-based OFDM, and a modified SPIHT codec. Part II addresses image processing algorithms such as a multiresolution approach to edge detection, low-bit-rate image compression, low-complexity implementation of CQF wavelets, and compression of multi-component images. Part III focuses on watermarking DWT algorithms. Finally, Part IV describes shift-invariant DWTs, the DC lossless property, DWT-based analysis and estimation of colored noise, and an application of the wavelet Galerkin method. The chapters of the present book consist of both tutorial and highly advanced material. The book is therefore intended as a reference text for graduate students and researchers seeking state-of-the-art knowledge on specific applications.
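
    As a minimal, self-contained example of the lifting construction mentioned above, the sketch below implements a one-level Haar DWT as a predict step followed by an update step, with perfect reconstruction. It is a toy illustration, not drawn from any particular chapter of the book.

```python
# Minimal sketch of the lifting scheme: the DWT is factored into
# in-place predict/update steps, which is what makes the hardware
# implementations discussed in Part I attractive. The Haar wavelet is
# the simplest possible case and is used here purely for illustration.
import numpy as np

def haar_lifting_forward(x):
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2].copy(), x[1::2].copy()
    detail = odd - even            # predict: odd samples from even ones
    approx = even + detail / 2.0   # update: preserve the signal mean
    return approx, detail

def haar_lifting_inverse(approx, detail):
    even = approx - detail / 2.0   # undo the update step
    odd = detail + even            # undo the predict step
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

sig = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a, d = haar_lifting_forward(sig)
assert np.allclose(haar_lifting_inverse(a, d), sig)  # perfect reconstruction
print("approx:", a, "detail:", d)
```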

    Scalable video compression with optimized visual performance and random accessibility

    This thesis is concerned with maximizing the coding efficiency, random accessibility and visual performance of scalable compressed video. The unifying theme behind this work is the use of finely embedded localized coding structures, which govern the extent to which these goals may be jointly achieved. The first part focuses on scalable volumetric image compression. We investigate 3D transform and coding techniques which exploit inter-slice statistical redundancies without compromising slice accessibility. Our study shows that the motion-compensated temporal discrete wavelet transform (MC-TDWT) practically achieves an upper bound to the compression efficiency of slice transforms. From a video coding perspective, we find that most of the coding gain is attributed to offsetting the learning penalty in adaptive arithmetic coding through 3D code-block extension, rather than inter-frame context modelling. The second aspect of this thesis examines random accessibility. Accessibility refers to the ease with which a region of interest is accessed (subband samples needed for reconstruction are retrieved) from a compressed video bitstream, subject to spatiotemporal code-block constraints. We investigate the fundamental implications of motion compensation for random access efficiency and the compression performance of scalable interactive video. We demonstrate that inclusion of motion compensation operators within the lifting steps of a temporal subband transform incurs a random access penalty which depends on the characteristics of the motion field. The final aspect of this thesis aims to minimize the perceptual impact of visible distortion in scalable reconstructed video. We present a visual optimization strategy based on distortion scaling which raises the distortion-length slope of perceptually significant samples. This alters the codestream embedding order during post-compression rate-distortion optimization, thus allowing visually sensitive sites to be encoded with higher fidelity at a given bit-rate. For visual sensitivity analysis, we propose a contrast perception model that incorporates an adaptive masking slope. This versatile feature provides a context which models perceptual significance. It enables scene structures that otherwise suffer significant degradation to be preserved at lower bit-rates. The novelty in our approach derives from a set of "perceptual mappings" which account for quantization noise shaping effects induced by motion-compensated temporal synthesis. The proposed technique reduces wavelet compression artefacts and improves the perceptual quality of video.
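
    A minimal sketch of the distortion-scaling idea is given below: each coding pass contributes a (rate, distortion) increment, and scaling the distortion of perceptually significant blocks raises their distortion-length slopes, moving them earlier in the embedding order. The blocks, weights, and numbers are hypothetical, and the per-block slope convexity constraints of real post-compression rate-distortion optimization are ignored.

```python
# Minimal sketch of distortion scaling: perceptually significant
# code-blocks get their distortion contributions scaled up, so their
# distortion-length slopes rise and their passes embed earlier. The
# weights and pass data are invented for illustration only.
def embedding_order(passes, weights):
    """passes: dict block -> list of (delta_rate, delta_distortion).
    weights: dict block -> perceptual scaling factor (>1 = more visible)."""
    slopes = []
    for block, contributions in passes.items():
        for i, (dr, dd) in enumerate(contributions):
            slope = (weights[block] * dd) / dr  # scaled distortion-length slope
            slopes.append((slope, block, i))
    # Higher slope = more (scaled) distortion removed per bit: embed first.
    return sorted(slopes, reverse=True)

passes = {"sky": [(100, 500.0), (120, 200.0)],
          "face": [(80, 300.0), (90, 150.0)]}
weights = {"sky": 0.8, "face": 2.0}   # faces assumed perceptually sensitive
for slope, block, idx in embedding_order(passes, weights):
    print(f"pass {idx} of block '{block}' at slope {slope:.2f}")
```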

    Real-time scalable video coding for surveillance applications on embedded architectures


    Recent Advances in Signal Processing

    Signal processing is a critical task in the majority of new technological inventions and challenges, across a variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian, and have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand: image processing, speech processing, communication systems, time-series analysis, and educational packages, in that order. The book has the advantage of providing a collection of applications that are completely independent and self-contained; the interested reader can thus choose any chapter and skip to another without losing continuity.

    Quality of service technologies for multimedia applications in next generation networks

    Next Generation Networks are constantly evolving towards solutions that allow the operator to provide advanced multimedia applications with QoS guarantees in heterogeneous, multi-domain and multi-service networks. Beyond the unquestionable advantages inherent in the ability to simultaneously handle traffic flows at different QoS levels, these architectures require management systems that efficiently enforce quality guarantees and network resource utilization. These issues have been addressed in this thesis. DiffServ-aware Traffic Engineering (DS-TE) has been considered as the reference architecture for the deployment of the quality management systems, as it represents the most advanced technology for accomplishing both network scalability and service granularity goals. On the basis of DS-TE features, a methodology for traffic and network resource management has been defined. It provides rules for QoS service characterization and allows Traffic Engineering policies to be implemented with a class-based approach. A set of basic parameters for quality evaluation, the Key Performance Indicators, has been defined; mathematical models to derive the statistical nature of traffic have been analyzed; and an algorithm has been developed to improve the fulfillment of quality-of-service targets and to optimize network resource utilization. The algorithm aims at reducing the complexity inherent in the setting of some of the key parameters in NGN architectures. Multi-domain scenarios with technologies different from DS-TE have also been evaluated, defining methodologies for network interoperability. Simulations with Opnet Modeler confirmed the efficacy of the proposed system in computing network configurations with QoS targets. With regard to QoS performance at the application level, video streaming applications in wireless domains have been particularly addressed. A rate control algorithm that adjusts the rate on a per-window basis has been defined; it makes use of a short-term prediction of the network delay to keep the probability of playback buffer starvation below a desired threshold during each window. Finally, a framework for mutual authentication in web applications has been proposed and evaluated. It integrates an IBA password technique with a challenge-response scheme based on a shared secret key for image scrambling. The proposed system mainly addresses the wireless environment and tries to overcome the severe constraints on security, data transmission capability and user friendliness imposed by such an environment.
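
    As an illustration of the per-window rate control described above, the sketch below uses an EWMA short-term delay prediction to choose a send rate that keeps the predicted playback-buffer level above a safety margin. The EWMA predictor, the constants, and the buffer model are assumptions made for illustration; they are not the thesis algorithm.

```python
# Minimal sketch of per-window rate control: predict near-term network
# delay from recent samples, then pick the send rate for the next window
# so the playback buffer is predicted to stay above a safety margin.
# All constants and the EWMA predictor are illustrative assumptions.
def next_window_rate(delay_samples, buffer_level, playout_rate,
                     window_s=2.0, margin_s=0.5, alpha=0.3):
    """delay_samples: recent one-way delays (s); buffer_level: media
    seconds currently buffered; playout_rate: media bitrate (kbps)."""
    # EWMA short-term delay prediction.
    pred = delay_samples[0]
    for d in delay_samples[1:]:
        pred = alpha * d + (1 - alpha) * pred
    # Media seconds drained during the window plus the predicted delay.
    drain = window_s + pred
    # Media seconds that must arrive to keep the buffer above the margin.
    needed = max(drain - (buffer_level - margin_s), 0.0)
    return playout_rate * needed / window_s  # send rate for this window (kbps)

print(next_window_rate([0.08, 0.12, 0.15], buffer_level=1.0,
                       playout_rate=800.0, window_s=2.0))
```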

    Discontinuity-Aware Base-Mesh Modeling of Depth for Scalable Multiview Image Synthesis and Compression

    This thesis is concerned with the challenge of deriving disparity from sparsely communicated depth for performing disparity-compensated view synthesis for the compression and rendering of multiview images. The modeling of depth is essential for deducing disparity at view locations where depth is not available, and is also critical for visibility reasoning and occlusion handling. This thesis first explores disparity derivation methods and disparity-compensated view synthesis approaches. Investigations reveal the merits of adopting a piecewise-continuous mesh description of depth for deriving disparity at target view locations to enable disparity-compensated backward warping of texture. Visibility information can be reasoned about thanks to the correspondence relationship between views that a mesh model provides, while the connectivity of a mesh model assists in resolving depth occlusion. The recent JPEG 2000 Part-17 extension defines tools for scalable coding of discontinuous media using a breakpoint-dependent DWT, where breakpoints describe discontinuity boundary geometry. This thesis proposes a method to efficiently reconstruct depth coded using JPEG 2000 Part-17 as a piecewise-continuous mesh whose discontinuities are driven by the encoded breakpoints. Results show that the proposed mesh can accurately represent decoded depth while its complexity scales with decoded depth quality. The piecewise-continuous mesh model anchored at a single viewpoint, or base view, can be augmented to form a multi-layered structure in which the underlying layers carry depth information for regions that are occluded at the base view. Such a consolidated mesh representation is termed a base-mesh model and can be projected to many viewpoints to deduce complete, inherently consistent disparity fields between any pair of views. Experimental results demonstrate the superior performance of the base-mesh model in multiview synthesis and compression compared to other state-of-the-art methods, including the JPEG Pleno light field codec. The proposed base-mesh model departs greatly from the conventional pixel-wise or block-wise depth models, and the forward depth mapping for deriving disparity, that are ingrained in existing multiview processing systems. When performing disparity-compensated view synthesis, there can be regions for which reference texture is unavailable and inpainting is required. A new depth-guided texture inpainting algorithm is proposed to restore occluded texture in regions where depth information is either available or can be inferred using the base-mesh model.
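
    A small sketch of the geometric relation the base-mesh model builds on follows: for rectified views, disparity is inversely proportional to depth (disparity = f·B/Z), so per-vertex depths anchored at the base view give per-vertex disparities that the mesh interpolates affinely inside each triangle. The camera parameters and the barycentric interpolation helper are hypothetical.

```python
# Minimal sketch of depth-to-disparity mapping for a mesh anchored at a
# base view: vertex depths become vertex disparities via d = f*B/Z, and
# the mesh interpolates disparity affinely within each triangle. The
# focal length and baseline here are hypothetical.
import numpy as np

def vertex_disparities(depths, focal_px, baseline_m):
    """Map per-vertex depth (metres) to disparity (pixels) for a target
    view displaced by `baseline_m` from the base view."""
    return focal_px * baseline_m / np.asarray(depths, dtype=float)

def disparity_in_triangle(bary, tri_disp):
    """Affine interpolation of disparity at barycentric coords `bary`
    inside a triangle with per-vertex disparities `tri_disp`."""
    return float(np.dot(bary, tri_disp))

d = vertex_disparities([2.0, 4.0, 8.0], focal_px=1000.0, baseline_m=0.1)
print(d)                                          # [50.  25.  12.5]
print(disparity_in_triangle([0.2, 0.3, 0.5], d))  # blended disparity 23.75
```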

    Exposing a waveform interface to the wireless channel for scalable video broadcast

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 157-167). Video broadcast and mobile video challenge the conventional wireless design. In broadcast and mobile scenarios the bit rate supported by the channel differs across receivers and varies quickly over time. The conventional design, however, forces the source to pick a single bit rate and degrades sharply when the channel cannot support it. This thesis presents SoftCast, a clean-slate design for wireless video in which the source transmits one video stream that each receiver decodes to a video quality commensurate with its specific instantaneous channel quality. To do so, SoftCast ensures that the samples of the digital video signal transmitted on the channel are linearly related to the pixels' luminance. Thus, when channel noise perturbs the transmitted signal samples, the perturbation naturally translates into approximation of the original video pixels. Hence, a receiver with a good channel (low noise) obtains a high-fidelity video, and a receiver with a bad channel (high noise) obtains a low-fidelity video. SoftCast's linear design in essence resembles the traditional analog approach to communication, which was abandoned in most major communication systems because it enjoys neither the theoretical optimality of the digital separate design in point-to-point channels nor its effectiveness at compressing the source data. In this thesis, I show that, in combination with the decorrelating transforms common to modern digital video compression, the analog approach can achieve performance competitive with the prevalent digital design for a wide variety of practical point-to-point scenarios, and outperforms it in broadcast and mobile scenarios. Since the conventional bit-pipe interface of the wireless physical layer (PHY) forces the separation of source and channel coding, architectural changes to the wireless PHY are necessary to realize SoftCast. This thesis discusses the design of RawPHY, a reorganization of the PHY which exposes a waveform interface to the channel while shielding the designers of the higher layers from much of the perplexity of the wireless channel. I implement SoftCast and RawPHY using the GNURadio software and the USRP platform. Results from a 20-node testbed show that SoftCast improves the average video quality (i.e., PSNR) across diverse broadcast receivers in our testbed by up to 5.5 dB in comparison to conventional single- or multi-layer video. Even for a single receiver, it eliminates video glitches caused by mobility and increases robustness to packet loss by an order of magnitude. by Szymon Kazimierz Jakubczak. Ph.D.
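
    The sketch below illustrates SoftCast's linear principle in a few lines: decorrelate pixels with a 2-D DCT, transmit scaled coefficients as raw channel symbols over an AWGN channel, and decode linearly, so reconstruction quality degrades gracefully with SNR. Power allocation, packaging, and the real PHY are omitted; the frame, the scaling, and the SNR values are illustrative assumptions, not the actual system.

```python
# Minimal sketch of the linear principle behind SoftCast: pixels are
# decorrelated with a 2-D DCT, the coefficients are sent as scaled
# "analog" channel symbols, and channel noise becomes a graceful loss
# of pixel fidelity rather than a cliff. Illustration only.
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
frame = rng.uniform(0.0, 255.0, size=(8, 8))   # stand-in for luminance

coeffs = dctn(frame, norm="ortho")             # decorrelating transform
scale = np.sqrt(np.mean(coeffs ** 2))          # normalize transmit power
tx = coeffs / scale                            # raw channel symbols

for snr_db in (5.0, 25.0):                     # bad vs. good receiver
    noise_std = 10 ** (-snr_db / 20.0)         # unit signal power assumed
    rx = tx + rng.normal(0.0, noise_std, tx.shape)
    recon = idctn(rx * scale, norm="ortho")    # linear decoder
    mse = np.mean((recon - frame) ** 2)
    print(f"SNR {snr_db:>4} dB -> MSE {mse:8.2f}")
```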

    Semantic and effective communications

    Shannon and Weaver categorized communications into three levels of problems: the technical problem, which tries to answer the question "how accurately can the symbols of communication be transmitted?"; the semantic problem, which asks "how precisely do the transmitted symbols convey the desired meaning?"; and the effectiveness problem, which strives to answer "how effectively does the received meaning affect conduct in the desired way?". Traditionally, communication technologies have mainly addressed the technical problem, ignoring the semantic and effectiveness problems. Recently, there has been increasing interest in addressing the higher-level semantic and effectiveness problems, with proposals ranging from semantic to goal-oriented communications. In this thesis, we propose to formulate the semantic problem as a joint source-channel coding (JSCC) problem and the effectiveness problem as a multi-agent partially observable Markov decision process (MA-POMDP). For the semantic problem, we propose DeepWiVe, the first-ever end-to-end JSCC video transmission scheme that leverages the power of deep neural networks (DNNs) to directly map video signals to channel symbols, combining the video compression, channel coding, and modulation steps into a single neural transform. We further show that it is possible to use predefined constellation designs, as well as to secure the physical-layer communication against eavesdroppers, for deep learning (DL) driven JSCC schemes, making such schemes much more viable for deployment in the real world. For the effectiveness problem, we propose a novel formulation by considering multiple agents communicating over a noisy channel in order to achieve better coordination and cooperation in a multi-agent reinforcement learning (MARL) framework. Specifically, we consider a MA-POMDP in which the agents, in addition to interacting with the environment, can also communicate with each other over a noisy communication channel. The noisy communication channel is considered explicitly as part of the dynamics of the environment, and the message each agent sends is part of the action that the agent can take. As a result, the agents learn not only to collaborate with each other but also to communicate "effectively" over a noisy channel. Moreover, we show that this framework generalizes both the semantic and technical problems. In both instances, we show that the resultant communication scheme is superior to one where the communication is considered separately from the underlying semantics or goal of the problem.
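
    To make the message-as-action formulation concrete, the toy sketch below treats each agent's action as a (move, message) pair and applies a binary symmetric channel inside the environment's step function, so the channel is part of the environment dynamics as described above. The environment, the flip probability, and the agent names are invented for illustration.

```python
# Toy sketch of the MA-POMDP framing: the message is part of the action,
# and the noisy channel is part of the environment dynamics, so message
# corruption happens inside step(). Everything here is a toy assumption.
import random

def noisy_channel(bits, flip_p=0.1):
    """Binary symmetric channel applied to the message part of an action."""
    return [int(b ^ (random.random() < flip_p)) for b in bits]

def step(state, actions, flip_p=0.1):
    """actions: {agent: (move, message_bits)}. Each agent's corrupted
    message appears in the other agent's next observation."""
    observations = {}
    for agent, (move, msg) in actions.items():
        other = "b" if agent == "a" else "a"
        observations[other] = {"heard": noisy_channel(msg, flip_p)}
        state[agent] += move
    return state, observations

state, obs = step({"a": 0, "b": 0},
                  {"a": (1, [1, 0, 1]), "b": (-1, [0, 1])})
print(state, obs)
```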