
    Perceptually-Driven Video Coding with the Daala Video Codec

    The Daala project is a royalty-free video codec that attempts to compete with the best patent-encumbered codecs. Part of our strategy is to replace core tools of traditional video codecs with alternative approaches, many of them designed to take perceptual aspects into account rather than optimizing for simple metrics like PSNR. This paper documents some of our experiences with these tools: which ones worked and which did not. We evaluate which tools are easy to integrate into a more traditional codec design, and show results in the context of the codec being developed by the Alliance for Open Media. Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital Image Processing (ADIP), 201
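
    The abstract contrasts perceptually driven design with optimizing simple metrics such as PSNR. For reference, the short sketch below (a minimal illustration using NumPy, not taken from the Daala code base; the random frames are placeholders) shows how PSNR is computed between a reference and a reconstructed 8-bit frame.

```python
import numpy as np

def psnr(reference: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for two 8-bit frames of identical shape."""
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10((peak ** 2) / mse)

# Toy usage with synthetic "frames" standing in for decoded video planes.
ref = np.random.randint(0, 256, size=(720, 1280), dtype=np.uint8)
rec = np.clip(ref.astype(np.int16) + np.random.randint(-3, 4, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(ref, rec):.2f} dB")
```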

    An Analysis of VP8, a new video codec for the web

    Video is an increasingly ubiquitous part of our lives. Fast and efficient video codecs are necessary to satisfy the increasing demand for video on the web and mobile devices. However, open standards and patent grants are paramount to the adoption of video codecs across different platforms and browsers. Google (via its acquisition of On2) released VP8 in May 2010 to compete with H.264, the dominant video codec standard, complete with source code, a specification, and a perpetual patent grant. As the amount of video being created every day is growing rapidly, the decision of which codec to encode this video with is critical; if a low quality codec or a restrictively licensed codec is used, the video recorded might be of little to no use. We sought to study VP8 and its quality versus its resource consumption compared to H.264 -- the most popular current video codec -- so that readers may make an informed decision for themselves or for their organizations about whether to use H.264 or VP8, or something else entirely. We examined VP8 in detail, compared its theoretical complexity to H.264 and measured the efficiency of its current implementation. VP8 shares many facets of its design with H.264 and other Discrete Cosine Transform (DCT) based video codecs. However, VP8 is both simpler and less feature rich than H.264, which may allow for rapid hardware and software implementations. As it was designed for the Internet and newer mobile devices, it contains fewer legacy features, such as interlacing, than H.264 supports. To perform quality measurements, the open source VP8 implementation libvpx, which is the reference implementation, was used. For H.264, the open source encoder x264 was used. This encoder has very high performance and is often rated at the top of its field in efficiency. The JM reference encoder was used to establish a baseline quality for H.264. Our findings indicate that VP8 performs very well at low bitrates, at resolutions at and below CIF. VP8 may be able to successfully displace H.264 Baseline in the mobile streaming video domain. It offers higher quality at a lower bitrate for low resolution images due to its high-performing entropy coder and non-contiguous macroblock segmentation. At higher resolutions, VP8 still outperforms H.264 Baseline, but H.264 High profile leads. At HD resolution (720p and above), H.264 is significantly better than VP8 due to its superior motion estimation and adaptive coding. There is little significant difference in intra-coding performance between H.264 and VP8. VP8's in-loop deblocking filter outperforms H.264's version. H.264's inter-coding, with full support for B frames and weighted prediction, outperforms VP8's alternate reference scheme, although this may improve in the future. On average, VP8's features are less complex than their H.264 equivalents, which, along with its open source implementation, may spur development in the future. These findings indicate that VP8 has strong fundamentals when compared with H.264, but that it lacks optimization and maturity. It will likely improve as engineers optimize VP8's reference implementation, or when a competing implementation is developed. We recommend several areas that the VP8 developers should focus on in the future.
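
    The thesis performed its measurements with libvpx, x264, and the JM reference encoder directly. As a rough way to reproduce a comparison in the same spirit with commonly available tooling, the hedged sketch below drives ffmpeg (assuming a build with libvpx and libx264 support; the input file name and target bitrate are placeholders) to encode the same source with both codecs and report PSNR against the original.

```python
import subprocess

SOURCE = "input.y4m"   # hypothetical raw test sequence
BITRATE = "500k"       # placeholder target bitrate

def encode(codec: str, out: str) -> None:
    # Single-pass encode at a fixed target bitrate; a rigorous comparison would pin more settings.
    subprocess.run(["ffmpeg", "-y", "-i", SOURCE, "-c:v", codec, "-b:v", BITRATE, out], check=True)

def psnr_against_source(encoded: str) -> None:
    # ffmpeg's psnr filter prints average PSNR statistics to stderr.
    subprocess.run(["ffmpeg", "-i", encoded, "-i", SOURCE, "-lavfi", "psnr", "-f", "null", "-"], check=True)

encode("libvpx", "vp8.webm")    # VP8 via libvpx
encode("libx264", "h264.mp4")   # H.264 via x264
psnr_against_source("vp8.webm")
psnr_against_source("h264.mp4")
```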

    Orchestrating Service Migration for Low Power MEC-Enabled IoT Devices

    Multi-Access Edge Computing (MEC) is a key enabling technology for Fifth Generation (5G) mobile networks. MEC provides distributed cloud computing capabilities and an information technology service environment for applications and services at the edges of mobile networks. This architectural modification serves to reduce congestion and latency and to improve the performance of edge-colocated applications and devices. In this paper, we demonstrate how reactive service migration can be orchestrated for low-power MEC-enabled Internet of Things (IoT) devices, using open-source Kubernetes as the container orchestration system. Our demo is based on a traditional client-server setup, running from the user equipment (UE) over Long Term Evolution (LTE) to the MEC server. As the use case scenario, we post-process live video received over web real-time communication (WebRTC). We then integrate Kubernetes orchestration with S1 handovers, demonstrating a MEC-based software-defined network (SDN). Edge applications can thus reactively follow the UE within the radio access network (RAN), keeping latency low. The collected data is used to analyze the benefits of the low-power MEC-enabled IoT device scheme, in which both the end-to-end (E2E) latency and the power requirements of the UE are improved. We further discuss the challenges of implementing such schemes and future research directions therein.
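
    To make the E2E latency aspect concrete, here is a minimal, self-contained sketch of the kind of round-trip probe a UE-side client could run before and after a service migration to observe the latency change. The edge host, port, and the UDP echo service it expects are hypothetical and are not part of the paper's testbed.

```python
import socket
import statistics
import time

EDGE_HOST = "192.0.2.10"   # placeholder MEC/edge server address (TEST-NET-1 range)
EDGE_PORT = 9000           # placeholder port of an assumed UDP echo service

def rtt_samples(count: int = 50, timeout: float = 1.0) -> list[float]:
    """Send small UDP probes to the echo service and record round-trip times in milliseconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    samples = []
    for i in range(count):
        payload = f"probe-{i}".encode()
        start = time.perf_counter()
        sock.sendto(payload, (EDGE_HOST, EDGE_PORT))
        try:
            sock.recvfrom(2048)
            samples.append((time.perf_counter() - start) * 1000.0)
        except socket.timeout:
            pass  # lost probe; skip it
    sock.close()
    return samples

if __name__ == "__main__":
    rtts = rtt_samples()
    if rtts:
        print(f"median RTT: {statistics.median(rtts):.2f} ms over {len(rtts)} replies")
```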

    Video QoS/QoE over IEEE802.11n/ac: A Contemporary Survey

    The demand for video applications over wireless networks has increased tremendously, and the IEEE 802.11 standards have provided progressively better support for video transmission. However, providing Quality of Service (QoS) and Quality of Experience (QoE) for video over WLAN is still a challenge, due to the error sensitivity of compressed video and the dynamic nature of wireless channels. This thesis presents a contemporary survey of video QoS/QoE issues and solutions over WLAN. The objective of the study is to provide an overview of the issues by first conducting a background study of video codecs and their features and characteristics, followed by a study of QoS and QoE support in the IEEE 802.11 standards. Since IEEE 802.11n is the most widely deployed standard worldwide and IEEE 802.11ac is the upcoming standard, this survey investigates the most recent video QoS/QoE solutions based on these two standards. The solutions are divided into two broad categories: academic solutions and vendor solutions. Academic solutions mostly target three layers, namely the Application, Media Access Control (MAC) and Physical (PHY) layers, and are further divided into two major categories: single-layer solutions and cross-layer solutions. Single-layer solutions focus on a single layer to enhance video transmission performance over WLAN, whereas cross-layer solutions involve two or more layers to provide a single QoS solution for video over WLAN. This thesis also presents and technically analyzes QoS solutions from three popular vendors. It concludes that single-layer solutions are not directly related to video QoS/QoE, and that cross-layer solutions perform better than single-layer solutions but are much more complicated and harder to implement. Most vendors rely on their network infrastructure to provide QoS for multimedia applications. Each has its own techniques and mechanisms, but the concept of providing QoS/QoE for video is almost the same across vendors, because they use the same standards and rely on Wi-Fi Multimedia (WMM) to provide QoS.

    Video CODEC with adaptive frame rate control for intelligent transportation system applications

    Video cameras are an important type of device in Intelligent Transportation Systems (ITS). Camera images are practical, widely deployable, and beneficial for traffic management and congestion control. Advances in image processing have enabled several applications based on ITS camera images, including vehicle detection, weather monitoring, and smart work zones. Unlike digital video entertainment applications, ITS applications require high video image quality but usually not a high video frame rate. Traditional block-based video compression standards, which were developed primarily with the video entertainment industry in mind, rely on adaptive rate control algorithms to control video quality and frame rate. Modern rate control algorithms range from simple frame skipping to complicated adaptive algorithms based on optimal rate-distortion theory. In this dissertation, I present an innovative video frame rate control scheme based on adaptive frame dropping. Video transmission schemes are also discussed, and a new strategy to reduce video traffic on the network is presented. Experimental results in a variety of network scenarios show that the proposed technique can improve video quality in both the temporal and spatial dimensions, as quantified by standard video metrics (gains of up to 6 percent in PSNR, 5 percent in SSIM, and 10 percent in VQM compared to the original video). Another benefit of the proposed technique is that video traffic and network congestion are generally reduced. Both FPGA and embedded Linux implementations are considered for video encoder development.
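
    As a rough illustration of the adaptive frame-dropping idea (a sketch under assumed inputs, not the dissertation's actual algorithm; the frame rates, bandwidth budget, and frame sizes are made up), the following keeps only as many frames per second as an estimated bandwidth budget affords.

```python
from dataclasses import dataclass, field

@dataclass
class FrameDropper:
    target_fps: float              # nominal camera frame rate
    min_fps: float                 # floor below which no further dropping is done
    _acc: float = field(default=0.0, init=False)

    def affordable_fps(self, bandwidth_kbps: float, avg_frame_kbits: float) -> float:
        """Frames per second that fit the bandwidth budget, clamped to [min_fps, target_fps]."""
        fps = bandwidth_kbps / max(avg_frame_kbits, 1e-6)
        return max(self.min_fps, min(self.target_fps, fps))

    def should_drop(self, bandwidth_kbps: float, avg_frame_kbits: float) -> bool:
        """Accumulator-based pacing: keep frames at roughly the affordable rate."""
        self._acc += self.affordable_fps(bandwidth_kbps, avg_frame_kbits) / self.target_fps
        if self._acc >= 1.0:
            self._acc -= 1.0
            return False   # keep this frame
        return True        # drop this frame

dropper = FrameDropper(target_fps=30.0, min_fps=5.0)
kept = sum(not dropper.should_drop(bandwidth_kbps=800.0, avg_frame_kbits=40.0) for _ in range(60))
print(f"kept {kept} of 60 frames")  # roughly 2/3 of frames at this budget
```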

    Compare multimedia frameworks in mobile platforms

    Multimedia features are currently among the most important features of mobile devices. Many modern mobile platforms use a centralized software stack, called a multimedia framework, to handle multimedia requirements. A multimedia framework belongs to the middleware layer of a mobile operating system. It can be considered a bridge that connects the mobile operating system kernel and hardware drivers with UI applications. It supplies high-level APIs that offer simple and easy solutions for complicated multimedia tasks to UI application developers. A multimedia framework also manages and utilizes low-level system software and hardware in an efficient manner, offering a centralized mediation between high-level demands and low-level system resources. In this M.Sc. thesis project we have studied, analyzed and compared the open source GStreamer, Android Stagefright and Microsoft Silverlight Media Framework from several perspectives, including architecture, supported use cases, extensibility, implementation language and programming language support (bindings), developer support, and legal status. One of the main contributions of this thesis is a detailed clarification of the strengths and weaknesses of each framework. Furthermore, the thesis should serve as decision-making guidance when one needs to select a multimedia framework for a project. Finally, to give a more concrete impression of the three multimedia frameworks, a basic media player implementation is demonstrated with source code in the thesis.
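
    Since the thesis closes with a basic media player demonstration, a comparable minimal player using GStreamer's Python bindings (PyGObject) is sketched below. The file URI is a placeholder, and this is an illustrative snippet rather than the thesis's own source code.

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# playbin builds a complete decode-and-render pipeline from a single URI.
pipeline = Gst.parse_launch("playbin uri=file:///path/to/sample.webm")  # placeholder URI
pipeline.set_state(Gst.State.PLAYING)

# Block until playback finishes or an error occurs.
bus = pipeline.get_bus()
msg = bus.timed_pop_filtered(
    Gst.CLOCK_TIME_NONE,
    Gst.MessageType.EOS | Gst.MessageType.ERROR,
)
if msg and msg.type == Gst.MessageType.ERROR:
    err, debug = msg.parse_error()
    print(f"Playback error: {err.message}")

pipeline.set_state(Gst.State.NULL)
```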

    Fast Algorithm Designs of Multiple-Mode Discrete Integer Transforms with Cost-Effective and Hardware-Sharing Architectures for Multistandard Video Coding Applications

    In this chapter, we first give a brief overview of transform-based video coding. Second, we describe the basic matrix decomposition scheme used for fast-algorithm and hardware-sharing-based integer transform design. Finally, we present two case studies of fast-algorithm and hardware-sharing-based architecture designs for discrete integer transforms: one for a single-standard, multiple-mode video transform-coding application, and the other for a multiple-standard, multiple-mode video transform-coding application.
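
    As a concrete example of the kind of transform these designs target, the sketch below applies the 4×4 forward integer core transform of H.264/AVC (Y = C X C^T, with the scaling normally folded into quantization) to a block of residuals. It is a numerical illustration only, not the chapter's hardware-sharing architecture; the residual block values are arbitrary.

```python
import numpy as np

# H.264/AVC 4x4 forward core transform matrix (integer approximation of the DCT).
C = np.array([
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
], dtype=np.int32)

def forward_transform(block: np.ndarray) -> np.ndarray:
    """Y = C * X * C^T; the post-scaling step is normally merged into quantization."""
    return C @ block @ C.T

residual = np.array([
    [ 5, 11,  8, 10],
    [ 9,  8,  4, 12],
    [ 1, 10, 11,  4],
    [19,  6, 15,  7],
], dtype=np.int32)

print(forward_transform(residual))
```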

    Hardware-accelerated high-resolution video coding in Virtual Network Functions

    Network Function Virtualization (NFV) has become a widely acclaimed approach to facilitate the management and orchestration of network services. However, after rapidly achieving widespread success, NFV is now challenged by the overwhelming demand for computing power originating from the never-ending growth of innovative applications coming from the Internet world. To overcome this problem, the use of hardware acceleration combined with NFV has been proposed. This way, the computing performance of commodity servers can be greatly enhanced without losing the advantages offered by NFV in service management. In this paper, to demonstrate the potential of NFV and hardware acceleration, a Virtual Network Function for video coding (video Transcoding Unit, vTU) is presented. The vTU is accelerated by a general-purpose GPU and is based on open source software packages for media processing. The vTU architecture is first described in detail. A thorough characterization of its computing performance is then reported, and the obtained results are compared to those achieved with non-accelerated and/or non-virtualized versions of the vTU itself. In addition, the performance of an original, GPU-accelerated version of the VP8 encoder is presented. The activities described in this paper were carried out within the EU FP7 T-NOVA project.
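
    The paper's GPU-accelerated VP8 encoder is not reproduced here. As a stand-in illustration of wrapping a hardware-accelerated transcode behind a small software interface, the sketch below invokes ffmpeg's NVENC H.264 encoder from Python; it assumes an NVENC-capable ffmpeg build and GPU, and the file names and bitrate are placeholders.

```python
import subprocess

def transcode(src: str, dst: str, bitrate: str = "4M") -> None:
    """Transcode src to H.264 using NVIDIA's NVENC hardware encoder via ffmpeg."""
    cmd = [
        "ffmpeg", "-y",
        "-i", src,               # input file or stream (placeholder path)
        "-c:v", "h264_nvenc",    # GPU encoder; requires an NVENC-capable ffmpeg build
        "-b:v", bitrate,         # target video bitrate
        "-c:a", "copy",          # pass audio through untouched
        dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    transcode("input_1080p.mp4", "output_1080p_h264.mp4")
```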

    Re-encoding Resistance: Towards Robust Covert Channels over WebRTC Video Streaming

    Internet censorship is an ongoing phenomenon in which state-level agents attempt to control free access to information on the internet, for purposes such as suppressing dissent. In response, research has been dedicated to proposing and implementing censorship circumvention solutions. One approach to circumvention involves steganography, the process of embedding a hidden message into a cover medium (e.g., an image, video, or audio file) so that sensitive or restricted information can be exchanged without a censoring agent being able to detect the exchange. Stegozoa, one such steganography tool, uses WebRTC video conferencing as the embedding channel, allowing a party within a restricted area to freely receive information from a party located outside of it. Stegozoa is itself an extension of an earlier implementation and assumes a stronger threat model, in which WebRTC connections are not peer-to-peer but are instead mediated by a gateway server that may be controlled, or influenced, by the censoring agent. In this threat model, it is argued that an attacker (or censor) may inspect the data being transmitted directly, but has no incentive to change the video data. With our work, we seek to challenge this last assumption, since many applications using this WebRTC architecture can and will in fact modify the video, likely for non-malicious purposes. By implementing our own test WebRTC application, we have shown that re-encoding the transmitted video (that is, decoding a VP8 video into raw format and encoding it back) is enough to render an implementation like Stegozoa inoperable. We argue that re-encoding is commonly a non-malicious operation, which may be justified by the application setup (for example, to perform video filtering, integrity checks, or other computer vision operations) and which does not affect a regular, non-Stegozoa user. For this reason, we propose that robustness to re-encoding is a necessary feature for such steganographic systems. To this end, we first performed characterization experiments on a popular WebRTC video codec (VP8) to understand the effects of re-encoding. Similarly, we tested the effects of this operation when a hidden message is embedded in a fashion similar to Stegozoa. We were able to show that the Discrete Cosine Transform (DCT) coefficients commonly used as the target for message embedding change enough during re-encoding to cause loss of message integrity when no error correction is used. Our experiments showed that higher-frequency DCT coefficients are more likely to remain stable enough for message embedding after re-encoding. We also showed that a dynamically calculated embedding space (that is, the set of coefficients that may actually be used for embedding), akin to Stegozoa's implementation, is very likely to differ after re-encoding, which creates a mismatch between sender and receiver. With these observations, we then sought to test a more robust embedding implementation. To do so, we combined error correction (in the form of Reed-Solomon codes) with a static embedding space. We showed that message re-transmission (that is, embedding in multiple frames) together with error correction is enough to send a message that will be received correctly. Our experiments showed that this can be used as a low-bandwidth, non-time-sensitive channel for covert communications. Finally, we combined our results into a set of guidelines that we believe are needed to implement a WebRTC-based, VP8-encoded censorship circumvention tool.
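
    To illustrate the embedding-plus-redundancy idea at a very small scale, the toy sketch below hides message bits in the least significant bits of a fixed (static) set of quantized DCT coefficients, repeats the message across several simulated frames, and recovers it by majority vote. It is a simplified stand-in for the Reed-Solomon-protected scheme evaluated in the project: all arrays are synthetic and the noise model only loosely mimics re-encoding.

```python
import numpy as np

rng = np.random.default_rng(0)

MESSAGE_BITS = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.int64)
EMBED_POSITIONS = np.arange(8)   # static embedding space: first 8 coefficients (toy choice)
NUM_FRAMES = 5                   # the message is re-transmitted in every frame

def embed(coeffs: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Force the LSB of the selected quantized coefficients to the message bits."""
    out = coeffs.copy()
    out[EMBED_POSITIONS] = (out[EMBED_POSITIONS] & ~1) | bits
    return out

def extract(coeffs: np.ndarray) -> np.ndarray:
    """Read back the LSBs at the static embedding positions."""
    return coeffs[EMBED_POSITIONS] & 1

# Simulate NUM_FRAMES frames: embed, then flip a few coefficient LSBs to mimic re-encoding noise.
votes = np.zeros_like(MESSAGE_BITS)
for _ in range(NUM_FRAMES):
    coeffs = rng.integers(-40, 40, size=64)     # one block of quantized DCT coefficients
    coeffs = embed(coeffs, MESSAGE_BITS)
    noisy = coeffs.copy()
    flip = rng.random(64) < 0.05                # 5% of coefficients perturbed
    noisy[flip] ^= 1
    votes += extract(noisy)

recovered = (votes > NUM_FRAMES // 2).astype(np.int64)
print("recovered:", recovered, "matches:", bool(np.array_equal(recovered, MESSAGE_BITS)))
```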

    Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction

    In the VP9 video codec, the sizes of blocks are decided during encoding by recursively partitioning 64×64 superblocks using rate-distortion optimization (RDO). This process is computationally intensive because of the combinatorial search space of possible partitions of a superblock. Here, we propose a deep learning based alternative framework to predict the intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN). We created a large database of VP9 superblocks and the corresponding partitions to train an H-FCN model, which was subsequently integrated with the VP9 encoder to reduce the intra-mode encoding time. The experimental results establish that our approach speeds up intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in the Bjontegaard-Delta bitrate (BD-rate). While VP9 provides several built-in speed levels which are designed to provide faster encoding at the expense of decreased rate-distortion performance, we find that our model is able to outperform the fastest recommended speed level of the reference VP9 encoder for the good quality intra encoding configuration, in terms of both speedup and BD-rate.
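
    A drastically simplified sketch in the spirit of a hierarchical partition predictor is shown below (PyTorch, with made-up layer sizes; it is not the paper's H-FCN architecture). It takes a 64×64 luma superblock and emits split/no-split logits at each of the four partition levels.

```python
import torch
import torch.nn as nn

class TinyHierarchicalPartitionNet(nn.Module):
    """Toy hierarchical predictor: split/no-split logits at 64, 32, 16 and 8 pixel granularity."""

    def __init__(self) -> None:
        super().__init__()
        self.trunk = nn.Sequential(                 # 1x64x64 luma block -> 64x8x8 features
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # One pooling + 1x1-conv head per partition level; the output grid matches
        # the number of blocks at that level (1x1, 2x2, 4x4, 8x8).
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(2 ** level) for level in range(4))
        self.heads = nn.ModuleList(nn.Conv2d(64, 2, kernel_size=1) for _ in range(4))

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        feats = self.trunk(x)
        return [head(pool(feats)) for head, pool in zip(self.heads, self.pools)]

model = TinyHierarchicalPartitionNet()
superblock = torch.rand(1, 1, 64, 64)               # synthetic 64x64 luma superblock
for level, logits in enumerate(model(superblock)):
    print(f"level {level}: decision grid {tuple(logits.shape[-2:])}")
```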