274 research outputs found

    Rate Control for Low Delay Video Communication of H.264 Standard

    Get PDF

    Complexity management of H.264/AVC video compression.

    Get PDF
    The H. 264/AVC video coding standard offers significantly improved compression efficiency and flexibility compared to previous standards. However, the high computational complexity of H. 264/AVC is a problem for codecs running on low-power hand held devices and general purpose computers. This thesis presents new techniques to reduce, control and manage the computational complexity of an H. 264/AVC codec. A new complexity reduction algorithm for H. 264/AVC is developed. This algorithm predicts "skipped" macroblocks prior to motion estimation by estimating a Lagrange ratedistortion cost function. Complexity savings are achieved by not processing the macroblocks that are predicted as "skipped". The Lagrange multiplier is adaptively modelled as a function of the quantisation parameter and video sequence statistics. Simulation results show that this algorithm achieves significant complexity savings with a negligible loss in rate-distortion performance. The complexity reduction algorithm is further developed to achieve complexity-scalable control of the encoding process. The Lagrangian cost estimation is extended to incorporate computational complexity. A target level of complexity is maintained by using a feedback algorithm to update the Lagrange multiplier associated with complexity. Results indicate that scalable complexity control of the encoding process can be achieved whilst maintaining near optimal complexity-rate-distortion performance. A complexity management framework is proposed for maximising the perceptual quality of coded video in a real-time processing-power constrained environment. A real-time frame-level control algorithm and a per-frame complexity control algorithm are combined in order to manage the encoding process such that a high frame rate is maintained without significantly losing frame quality. Subjective evaluations show that the managed complexity approach results in higher perceptual quality compared to a reference encoder that drops frames in computationally constrained situations. These novel algorithms are likely to be useful in implementing real-time H. 264/AVC standard encoders in computationally constrained environments such as low-power mobile devices and general purpose computers

    Platforms for handling and development of audiovisual data

    Get PDF
    Estágio realizado na MOG Solutions e orientado por Vítor TeixeiraTese de mestrado integrado. Engenharia Informátca e Computação. Faculdade de Engenharia. Universidade do Porto. 200

    Acquisition, compression and rendering of depth and texture for multi-view video

    Get PDF
    Three-dimensional (3D) video and imaging technologies is an emerging trend in the development of digital video systems, as we presently witness the appearance of 3D displays, coding systems, and 3D camera setups. Three-dimensional multi-view video is typically obtained from a set of synchronized cameras, which are capturing the same scene from different viewpoints. This technique especially enables applications such as freeviewpoint video or 3D-TV. Free-viewpoint video applications provide the feature to interactively select and render a virtual viewpoint of the scene. A 3D experience such as for example in 3D-TV is obtained if the data representation and display enable to distinguish the relief of the scene, i.e., the depth within the scene. With 3D-TV, the depth of the scene can be perceived using a multi-view display that renders simultaneously several views of the same scene. To render these multiple views on a remote display, an efficient transmission, and thus compression of the multi-view video is necessary. However, a major problem when dealing with multiview video is the intrinsically large amount of data to be compressed, decompressed and rendered. We aim at an efficient and flexible multi-view video system, and explore three different aspects. First, we develop an algorithm for acquiring a depth signal from a multi-view setup. Second, we present efficient 3D rendering algorithms for a multi-view signal. Third, we propose coding techniques for 3D multi-view signals, based on the use of an explicit depth signal. This motivates that the thesis is divided in three parts. The first part (Chapter 3) addresses the problem of 3D multi-view video acquisition. Multi-view video acquisition refers to the task of estimating and recording a 3D geometric description of the scene. A 3D description of the scene can be represented by a so-called depth image, which can be estimated by triangulation of the corresponding pixels in the multiple views. Initially, we focus on the problem of depth estimation using two views, and present the basic geometric model that enables the triangulation of corresponding pixels across the views. Next, we review two calculation/optimization strategies for determining corresponding pixels: a local and a one-dimensional optimization strategy. Second, to generalize from the two-view case, we introduce a simple geometric model for estimating the depth using multiple views simultaneously. Based on this geometric model, we propose a new multi-view depth-estimation technique, employing a one-dimensional optimization strategy that (1) reduces the noise level in the estimated depth images and (2) enforces consistent depth images across the views. The second part (Chapter 4) details the problem of multi-view image rendering. Multi-view image rendering refers to the process of generating synthetic images using multiple views. Two different rendering techniques are initially explored: a 3D image warping and a mesh-based rendering technique. Each of these methods has its limitations and suffers from either high computational complexity or low image rendering quality. As a consequence, we present two image-based rendering algorithms that improves the balance on the aforementioned issues. First, we derive an alternative formulation of the relief texture algorithm which was extented to the geometry of multiple views. The proposed technique features two advantages: it avoids rendering artifacts ("holes") in the synthetic image and it is suitable for execution on a standard Graphics Processor Unit (GPU). Second, we propose an inverse mapping rendering technique that allows a simple and accurate re-sampling of synthetic pixels. Experimental comparisons with 3D image warping show an improvement of rendering quality of 3.8 dB for the relief texture mapping and 3.0 dB for the inverse mapping rendering technique. The third part concentrates on the compression problem of multi-view texture and depth video (Chapters 5–7). In Chapter 5, we extend the standard H.264/MPEG-4 AVC video compression algorithm for handling the compression of multi-view video. As opposed to the Multi-view Video Coding (MVC) standard that encodes only the multi-view texture data, the proposed encoder peforms the compression of both the texture and the depth multi-view sequences. The proposed extension is based on exploiting the correlation between the multiple camera views. To this end, two different approaches for predictive coding of views have been investigated: a block-based disparity-compensated prediction technique and a View Synthesis Prediction (VSP) scheme. Whereas VSP relies on an accurate depth image, the block-based disparity-compensated prediction scheme can be performed without any geometry information. Our encoder adaptively selects the most appropriate prediction scheme using a rate-distortion criterion for an optimal prediction-mode selection. We present experimental results for several texture and depth multi-view sequences, yielding a quality improvement of up to 0.6 dB for the texture and 3.2 dB for the depth, when compared to solely performing H.264/MPEG-4AVC disparitycompensated prediction. Additionally, we discuss the trade-off between the random-access to a user-selected view and the coding efficiency. Experimental results illustrating and quantifying this trade-off are provided. In Chapter 6, we focus on the compression of a depth signal. We present a novel depth image coding algorithm which concentrates on the special characteristics of depth images: smooth regions delineated by sharp edges. The algorithm models these smooth regions using parameterized piecewiselinear functions and sharp edges by a straight line, so that it is more efficient than a conventional transform-based encoder. To optimize the quality of the coding system for a given bit rate, a special global rate-distortion optimization balances the rate against the accuracy of the signal representation. For typical bit rates, i.e., between 0.01 and 0.25 bit/pixel, experiments have revealed that the coder outperforms a standard JPEG-2000 encoder by 0.6-3.0 dB. Preliminary results were published in the Proceedings of 26th Symposium on Information Theory in the Benelux. In Chapter 7, we propose a novel joint depth-texture bit-allocation algorithm for the joint compression of texture and depth images. The described algorithm combines the depth and texture Rate-Distortion (R-D) curves, to obtain a single R-D surface that allows the optimization of the joint bit-allocation in relation to the obtained rendering quality. Experimental results show an estimated gain of 1 dB compared to a compression performed without joint bit-allocation optimization. Besides this, our joint R-D model can be readily integrated into an multi-view H.264/MPEG-4 AVC coder because it yields the optimal compression setting with a limited computation effort

    Analysis of coding tools and improvement of text readability for screen content

    Full text link
    Abstract—Current video coding standards perform well for video sequences captured by a real camera. The aperture of the camera’s optical system smooths the content and attenuates higher frequencies. New application scenarios, enabled by the growing number of high bit rate internet gateways, however, make it necessary to take a closer look at the efficiency of such standards in handling artificial content. Remote desktop appli-cations for example often include text parts. As a consequence, these content types contain sharp edges or high frequencies, which are considered less important in natural video and are therefore treated less carefully. The frequent result is an increased occurrence of artefacts or the loss of information that is actually important to the user. This paper gives an analysis of such artificially created video sequences, evaluates the performance of current coding tools for this type of content and proposes a simple, yet effective way to maintain readability of text within video material using only well considered encoder control and without the need of large additional modules. I

    Serviços Web para o processamento e gestão de conteúdo A/V em ambientes profissionais de TV

    Get PDF
    Estágio realizado na MOG Solutions, S. ATese de mestrado integrado. Engenharia Electrotécnica e de Computadores - Major Telecommunications. Faculdade de Engenharia. Universidade do Porto. 200

    Seminario sullo Standard MPEG-4: utilizzo ed aspetti implementativi

    Get PDF
    Una delle tecnologie chiave che hanno permesso il grande sviluppo della televisione digitale è la compressione video. La tecnologia di codifica video nota come MPEG-2, sviluppata nei primi anni novanta, è diventata lo standard di trasmissione DTV (Digital TV) sia satellitare sia terrestre in quasi tutti i paesi del mondo. Da allora la velocità dei microprocessori e le capacità di memoria dei dispositivi hardware per la codifica e la decodifica sono migliorate significativamente rendendo possibile lo sviluppo e l’implementazione di algoritmi di codifica innovativi in grado di abbattere significativamente i limiti di compressione dello standard MPEG-2. Tali innovazioni, sfociate nel 2003 nello standard MPEG-4 AVC (Advanced Video Coding), non hanno permesso di mantenere la compatibilità all’indietro con l’MPEG-2, e questo ha inizialmente costituito un limite alla loro introduzione nei sistemi di trasmissione DTV. Tuttavia, negli ultimi anni la codifica MPEG-4 AVC si è diffusa rapidamente, è stata adottata dal progetto DVB, recentemente dall’ATSC, ed è lo standard di codifica nell’IPTV. L’obiettivo di questo seminario, che si articola in due giornate, è quello di presentare lo standard di codifica MPEG-4 AVC con particolare attenzione agli aspetti implementativi del livello di codifica video.2008-11-18Sardegna Ricerche, Edificio 2, Località Piscinamanna 09010 Pula (CA) - ItaliaSeminario sullo Standard MPEG-4: utilizzo ed aspetti implementativ

    Understanding user experience of mobile video: Framework, measurement, and optimization

    Get PDF
    Since users have become the focus of product/service design in last decade, the term User eXperience (UX) has been frequently used in the field of Human-Computer-Interaction (HCI). Research on UX facilitates a better understanding of the various aspects of the user’s interaction with the product or service. Mobile video, as a new and promising service and research field, has attracted great attention. Due to the significance of UX in the success of mobile video (Jordan, 2002), many researchers have centered on this area, examining users’ expectations, motivations, requirements, and usage context. As a result, many influencing factors have been explored (Buchinger, Kriglstein, Brandt & Hlavacs, 2011; Buchinger, Kriglstein & Hlavacs, 2009). However, a general framework for specific mobile video service is lacking for structuring such a great number of factors. To measure user experience of multimedia services such as mobile video, quality of experience (QoE) has recently become a prominent concept. In contrast to the traditionally used concept quality of service (QoS), QoE not only involves objectively measuring the delivered service but also takes into account user’s needs and desires when using the service, emphasizing the user’s overall acceptability on the service. Many QoE metrics are able to estimate the user perceived quality or acceptability of mobile video, but may be not enough accurate for the overall UX prediction due to the complexity of UX. Only a few frameworks of QoE have addressed more aspects of UX for mobile multimedia applications but need be transformed into practical measures. The challenge of optimizing UX remains adaptations to the resource constrains (e.g., network conditions, mobile device capabilities, and heterogeneous usage contexts) as well as meeting complicated user requirements (e.g., usage purposes and personal preferences). In this chapter, we investigate the existing important UX frameworks, compare their similarities and discuss some important features that fit in the mobile video service. Based on the previous research, we propose a simple UX framework for mobile video application by mapping a variety of influencing factors of UX upon a typical mobile video delivery system. Each component and its factors are explored with comprehensive literature reviews. The proposed framework may benefit in user-centred design of mobile video through taking a complete consideration of UX influences and in improvement of mobile videoservice quality by adjusting the values of certain factors to produce a positive user experience. It may also facilitate relative research in the way of locating important issues to study, clarifying research scopes, and setting up proper study procedures. We then review a great deal of research on UX measurement, including QoE metrics and QoE frameworks of mobile multimedia. Finally, we discuss how to achieve an optimal quality of user experience by focusing on the issues of various aspects of UX of mobile video. In the conclusion, we suggest some open issues for future study

    Description-driven Adaptation of Media Resources

    Get PDF
    The current multimedia landscape is characterized by a significant diversity in terms of available media formats, network technologies, and device properties. This heterogeneity has resulted in a number of new challenges, such as providing universal access to multimedia content. A solution for this diversity is the use of scalable bit streams, as well as the deployment of a complementary system that is capable of adapting scalable bit streams to the constraints imposed by a particular usage environment (e.g., the limited screen resolution of a mobile device). This dissertation investigates the use of an XML-driven (Extensible Markup Language) framework for the format-independent adaptation of scalable bit streams. Using this approach, the structure of a bit stream is first translated into an XML description. In a next step, the resulting XML description is transformed to reflect a desired adaptation of the bit stream. Finally, the transformed XML description is used to create an adapted bit stream that is suited for playback in the targeted usage environment. The main contribution of this dissertation is BFlavor, a new tool for exposing the syntax of binary media resources as an XML description. Its development was inspired by two other technologies, i.e. MPEG-21 BSDL (Bitstream Syntax Description Language) and XFlavor (Formal Language for Audio-Visual Object Representation, extended with XML features). Although created from a different point of view, both languages offer solutions for translating the syntax of a media resource into an XML representation for further processing. BFlavor (BSDL+XFlavor) harmonizes the two technologies by combining their strengths and eliminating their weaknesses. The expressive power and performance of a BFlavor-based content adaptation chain, compared to tool chains entirely based on either BSDL or XFlavor, were investigated by several experiments. One series of experiments targeted the exploitation of multi-layered temporal scalability in H.264/AVC, paying particular attention to the use of sub-sequences and hierarchical coding patterns, as well as to the use of metadata messages to communicate the bit stream structure to the adaptation logic. BFlavor was the only tool to offer an elegant and practical solution for XML-driven adaptation of H.264/AVC bit streams in the temporal domain
    • …
    corecore