192 research outputs found

    XML-driven exploitation of combined scalability in scalable H.264/AVC bitstreams

    Get PDF
    The heterogeneity in the contemporary multimedia environments requires a format-agnostic adaptation framework for the consumption of digital video content. Scalable bitstreams can be used in order to satisfy as many circumstances as possible. In this paper, the scalable extension on the H.264/AVC specification is used to obtain the parent bitstreams. The adaptation along the combined scalability axis of the bitstreams is done in a format-independent manner. Therefore, an abstraction layer of the bitstream is needed. In this paper, XML descriptions are used representing the high-level structure of the bitstreams by relying on the MPEG-21 Bitstream Syntax Description Language standard. The exploitation of the combined scalability is executed in the XML domain by implementing the adaptation process in a Streaming Transformation for XML (STX) stylesheet. The algorithm used in the transformation of the XML description is discussed in detail in this paper. From the performance measurements, one can conclude that the STX transformation in the XML domain and the generation of the corresponding adapted bitstream can be realized in real time

    Optimized algorithms for multimedia streaming

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    On the impact of the GOP size in a temporal H.264/AVC-to-SVC transcoder in baseline and main profile

    Get PDF
    Scalable video coding is a recent extension of the advanced video coding H.264/AVC standard developed jointly by ISO/IEC and ITU-T, which allows adapting the bitstream easily by dropping parts of it named layers. This adaptation makes it possible for a single bitstream to meet the requirements for reliable delivery of video to diverse clients over heterogeneous networks using temporal, spatial or quality scalability, combined or separately. Since the scalable video coding design requires scalability to be provided at the encoder side, existing content cannot benefit from it. Efficient techniques for converting contents without scalability to a scalable format are desirable. In this paper, an approach for temporal scalability transcoding from H.264/AVC to scalable video coding in baseline and main profile is presented and the impact of the GOP size is analyzed. Independently of the GOP size chosen, time savings of around 63 % for baseline profile and 60 % for main profile are achieved while maintaining the coding efficiency

    Personalizing quality aspects for video communication in constrained heterogeneous environments

    Get PDF
    The world of multimedia communication is drastically evolving since a few years. Advanced compression formats for audiovisual information arise, new types of wired and wireless networks are developed, and a broad range of different types of devices capable of multimedia communication appear on the market. The era where multimedia applications available on the Internet were the exclusive domain of PC users has passed. The next generation multimedia applications will be characterized by heterogeneity: differences in terms of the networks, devices and user expectations. This heterogeneity causes some new challenges: transparent consumption of multimedia content is needed in order to be able to reach a broad audience. Recently, two important technologies have appeared that can assist in realizing such transparent Universal Multimedia Access. The first technology consists of new scalable or layered content representation schemes. Such schemes are needed in order to make it possible that a multimedia stream can be consumed by devices with different capabilities and transmitted over network connections with different characteristics. The second technology does not focus on the content representation itself, but rather on linking information about the content, so-called metadata, to the content itself. One of the possible uses of metadata is in the automatic selection and adaptation of multimedia presentations. This is one of the main goals of the MPEG-21 Multimedia Framework. Within the MPEG-21 standard, two formats were developed that can be used for bitstream descriptions. Such descriptions can act as an intermediate layer between a scalable bitstream and the adaptation process. This way, format-independent bitstream adaptation engines can be built. Furthermore, it is straightforward to add metadata information to the bitstream description, and use this information later on during the adaptation process. Because of the efforts spent on bitstream descriptions during our research, a lot of attention is devoted to this topic in this thesis. We describe both frameworks for bitstream descriptions that were standardized by MPEG. Furthermore, we focus on our own contributions in this domain: we developed a number of bitstream schemas and transformation examples for different types of multimedia content. The most important objective of this thesis is to describe a content negotiation process that uses scalable bitstreams in a generic way. In order to be able to express such an application, we felt the need for a better understanding of the data structures, in particular scalable bitstreams, on which this content negotiation process operates. Therefore, this thesis introduces a formal model we developed capable of describing the fundamental concepts of scalable bitstreams and their relations. Apart from the definition of the theoretical model itself, we demonstrate its correctness by applying it to a number of existing formats for scalable bitstreams. Furthermore, we attempt to formulate a content negotiation process as a constrained optimization problem, by means of the notations defined in the abstract model. In certain scenarios, the representation of a content negotiation process as a constrained optimization problem does not sufficiently reflect reality, especially when scalable bitstreams with multiple quality dimensions are involved. In such case, several versions of the same original bitstream can meet all constraints imposed by the system. Sometimes one version clearly offers a better quality towards the end user than another one, but in some cases, it is not possible to objectively compare two versions without additional information. In such a situation, a trade-off will have to be made between the different quality aspects. We use Pareto's theory of multi-criteria optimization for formally describing the characteristics of a content negotiation process for scalable bitstreams with multiple quality dimensions. This way, we can modify our definition of a content negotiation process into a multi-criteria optimization problem. One of the most important problems with multi-criteria optimization problems is that multiple candidate optimal solutions may exist. Additional information, e.g. user preferences, is needed if a single optimal solution has to be selected. Such multi-criteria optimization problems are not new. Unfortunately, existing solutions for selecting one optimal version are not suitable in a content negotiation scenario, because they expect detailed understanding of the problem from the decision maker, in our case the end user. In this thesis, we propose a scenario in which a so-called content negotiation agent would give some sample video sequences to the end user, asking him to select which sequence he liked the most. This information would be used for training the agent: a model would be built representing the preferences of the end user, and this model can be used later on for selecting one solution from a set of candidate optimal solutions. Based on a literature study, we propose two candidate algorithms in this thesis that can be used in such a content negotiation agent. It is possible to use these algorithms for constructing a model of the user's preferences by means of a number of examples, and to use this model when selecting an optimal version. The first algorithm considers the quality of a video sequence as a weighted sum of a number of independent quality aspects, and derives a system of linear inequalities from the example decisions. The second algorithm, called 1ARC, is actually a nearest-neighbor approach, where predictions are made based on the similarity with the example decisions entered by the user. This thesis analyzes the strengths and weaknesses of both algorithms from multiple points of view. The computational complexity of both algorithms is discussed, possible parameters that can influence the reliability of the algorithm, and the reliability itself. For measuring this kind of performance, we set up a test in which human subjects are asked to make a number of pairwise decisions between two versions of the same original video sequence. The reliability of the two algorithms we proposed is tested by selecting a part of these decisions for training a model, and by observing if this model is able to predict other decisions entered by the same user. We not only compare both algorithms, but we also observe the result of modifying several parameters on both algorithms. Ultimately, we conclude that the 1ARC algorithm has an acceptable performance, certainly if the training set is sufficiently large. The reliability is better than what would be theoretically achievable by any other algorithm that selects one optimal version from a set of candidate versions, but does not try to capture the user's preferences. Still, the results that we achieve are not as good as what we initially hoped. One possible cause may be the fact that the algorithms we proposed currently do not take sequence characteristics (e.g. the amount of motion) into account. Other improvements may be possible by means of a more accurate description of the quality aspects that we take into account, in particular the spatial resolution, the amount of distortion and the smoothness of a video sequence. Despite the limitations of the algorithms we proposed, in their performance as well as in their application area, we think that this thesis contains an initial and original contribution to the emerging objective of realizing Quality of Experience in multimedia applications

    Layered Wyner-Ziv video coding: a new approach to video compression and delivery

    Get PDF
    Following recent theoretical works on successive Wyner-Ziv coding, we propose a practical layered Wyner-Ziv video coder using the DCT, nested scalar quantiza- tion, and irregular LDPC code based Slepian-Wolf coding (or lossless source coding with side information at the decoder). Our main novelty is to use the base layer of a standard scalable video coder (e.g., MPEG-4/H.26L FGS or H.263+) as the decoder side information and perform layered Wyner-Ziv coding for quality enhance- ment. Similar to FGS coding, there is no performance di®erence between layered and monolithic Wyner-Ziv coding when the enhancement bitstream is generated in our proposed coder. Using an H.26L coded version as the base layer, experiments indicate that Wyner-Ziv coding gives slightly worse performance than FGS coding when the channel (for both the base and enhancement layers) is noiseless. However, when the channel is noisy, extensive simulations of video transmission over wireless networks conforming to the CDMA2000 1X standard show that H.26L base layer coding plus Wyner-Ziv enhancement layer coding are more robust against channel errors than H.26L FGS coding. These results demonstrate that layered Wyner-Ziv video coding is a promising new technique for video streaming over wireless networks. For scalable video transmission over the Internet and 3G wireless networks, we propose a system for receiver-driven layered multicast based on layered Wyner-Ziv video coding and digital fountain coding. Digital fountain codes are near-capacity erasure codes that are ideally suited for multicast applications because of their rate- less property. By combining an error-resilient Wyner-Ziv video coder and rateless fountain codes, our system allows reliable multicast of high-quality video to an arbi- trary number of heterogeneous receivers without the requirement of feedback chan- nels. Extending this work on separate source-channel coding, we consider distributed joint source-channel coding by using a single channel code for both video compression (via Slepian-Wolf coding) and packet loss protection. We choose Raptor codes - the best approximation to a digital fountain - and address in detail both encoder and de- coder designs. Simulation results show that, compared to one separate design using Slepian-Wolf compression plus erasure protection and another based on FGS coding plus erasure protection, the proposed joint design provides better video quality at the same number of transmitted packets

    DYNAMIC RESOURCE ALLOCATION FOR MULTIUSER VIDEO STREAMING

    Get PDF
    With the advancement of video compression technology and wide deployment of wired/wireless networks, there is an increasing demand of multiuser video communication services. A multiuser video transmission system should consider not only the reconstructed video quality in the individual-user level but also the service objectives among all users on the network level. There are many design challenges to support multiuser video communication services, such as fading channels, limited radio resources of wireless networks, heterogeneity of video content complexity, delay and decoding dependency constraints of video bitstreams, and mixed integer optimization. To overcome these challenges, a general strategy is to dynamically allocate resources according to the changing environments and requirements, so as to improve the overall system performance and ensure quality of service (QoS) for each user. In this dissertation, we address the aforementioned design challenges from a resource-allocation point of view and two aspects of system and algorithm designs, namely, a cross-layer design that jointly optimizes resource utilization from physical layer to application layer, and multiuser diversity that explores the source and channel heterogeneity among different users. We also address the impacts on systems caused by dynamic environment along time domain and consider the time-heterogeneity of video sources and time-varying characteristics of channel conditions. To achieve the desired service objectives, a general resource allocation framework is formulated in terms of constrained optimization problems to dynamically allocate resources and control the quality of multiple video bitstreams. Based on the design methodology of multiuser cross-layer optimization, we propose several systems to efficiently transmit multiple video streams, encoded by current and emerging video codecs, over major types of wireless networks such as 3G cellular system, Wireless Local Area Network, 4G cellular system, and future Wireless Metropolitan Area Networks. Owing to the integer nature of some system parameters, the formulated optimization problems are often integer or mixed integer programming problem and involve high computation to search the optimal solutions. Fast algorithms are proposed to provide real-time services. We demonstrate the advantages of dynamic and joint resource allocation for multiple video sources compared to static strategy. We also show the improvement of exploring diversity on frequency, time, and transmission path, and the benefits from multiuser cross-layer optimization
    corecore