113 research outputs found
Study of Compression Statistics and Prediction of Rate-Distortion Curves for Video Texture
Encoding textural content remains a challenge for current standardised video
codecs. It is therefore beneficial to understand video textures in terms of
both their spatio-temporal characteristics and their encoding statistics in
order to optimize encoding performance. In this paper, we analyse the
spatio-temporal features and statistics of video textures, explore the
rate-quality performance of different texture types and investigate models to
mathematically describe them. For all considered theoretical models, we employ
machine-learning regression to predict the rate-quality curves based solely on
selected spatio-temporal features extracted from uncompressed content. All
experiments were performed on homogeneous video textures to ensure validity of
the observations. The results of the regression indicate that using an
exponential model we can more accurately predict the expected rate-quality
curve (with a mean Bjøntegaard Delta rate of 0.46% over the considered
dataset) while maintaining a low relative complexity. This model is expected to be
adopted by in-loop processes for faster encoding decisions such as
rate-distortion optimisation, adaptive quantization, partitioning, etc.
Comment: 17 page
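As a rough illustration of what fitting an exponential rate-quality model can look like, the sketch below fits rate = a * exp(b * quality) to measured points by linear least squares on the logarithm of the rate. The parametric form, the function names and the sample values are assumptions made for illustration; the abstract does not state the exact model, and the paper predicts the curve from spatio-temporal features rather than from measured points.

```python
import math


def fit_exponential_rate_model(qualities, rates):
    """Fit rate = a * exp(b * quality) by least squares on ln(rate).

    Assumed model form for illustration only; not taken from the paper.
    """
    n = len(qualities)
    if n < 2 or n != len(rates):
        raise ValueError("need at least two matched (quality, rate) points")
    log_rates = [math.log(r) for r in rates]  # rates must be positive
    mean_q = sum(qualities) / n
    mean_l = sum(log_rates) / n
    sxx = sum((q - mean_q) ** 2 for q in qualities)
    if sxx == 0:
        raise ValueError("quality values must not all be identical")
    sxy = sum((q - mean_q) * (l - mean_l)
              for q, l in zip(qualities, log_rates))
    b = sxy / sxx                        # slope of ln(rate) against quality
    a = math.exp(mean_l - b * mean_q)    # intercept mapped back from log space
    return a, b


def predict_rate(a, b, quality):
    """Evaluate the fitted model at a given quality level."""
    return a * math.exp(b * quality)


# Synthetic, noise-free points generated from a known curve (hypothetical values).
sample_qualities = [30.0, 34.0, 38.0, 42.0]
sample_rates = [2.0 * math.exp(0.15 * q) for q in sample_qualities]
a_hat, b_hat = fit_exponential_rate_model(sample_qualities, sample_rates)
```

With noise-free input the fit recovers the generating parameters; with real encoder measurements the residuals would indicate how well an exponential form suits the content.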
Dynamically Reconfigurable Architectures and Systems for Time-varying Image Constraints (DRASTIC) for Image and Video Compression
In the current information-booming era, image and video consumption is ubiquitous. The associated image and video coding operations require significant computing resources on both small-scale computing systems and larger network systems. For different scenarios, power, bitrate and image quality can impose significant time-varying constraints. For example, mobile devices (e.g., phones, tablets, laptops, UAVs) come with significant constraints on energy and power. Similarly, computer networks provide time-varying bandwidth that can depend on signal strength (e.g., wireless networks) or network traffic conditions. Alternatively, the users can impose different constraints on image quality based on their interests. Traditional image and video coding systems have focused on rate-distortion optimization. More recently, distortion measures (e.g., PSNR) are being replaced by more sophisticated image quality metrics. However, these systems are based on fixed hardware configurations that provide limited options over power consumption. The use of dynamic partial reconfiguration with Field Programmable Gate Arrays (FPGAs) provides an opportunity to effectively control dynamic power consumption by jointly considering software-hardware configurations. This dissertation extends traditional rate-distortion optimization to rate-quality-power/energy optimization and demonstrates a wide variety of applications in both image and video compression. In each application, a family of Pareto-optimal configurations is developed that allows fine control in the rate-quality-power/energy optimization space. The term Dynamically Reconfigurable Architectures and Systems for Time-varying Image Constraints (DRASTIC) is used to describe the derived systems.
DRASTIC covers both software-only and software-hardware configurations to achieve fine optimization over a set of general modes that include: (i) maximum image quality, (ii) minimum dynamic power/energy, (iii) minimum bitrate, and (iv) typical mode over a set of opposing constraints to guarantee satisfactory performance. In joint software-hardware configurations, DRASTIC provides an effective approach for dynamic power optimization. For software configurations, DRASTIC provides an effective method for energy consumption optimization by controlling processing times. The dissertation provides several applications. First, stochastic methods are given for computing quantization tables that are optimal in the rate-quality space and demonstrated on standard JPEG compression. Second, a DRASTIC implementation of the DCT is used to demonstrate the effectiveness of the approach on motion JPEG. Third, a reconfigurable deblocking filter system is investigated for use in the current H.264/AVC systems. Fourth, the dissertation develops DRASTIC for all 35 intra-prediction modes as well as intra-encoding for the emerging High Efficiency Video Coding standard (HEVC).
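The idea of a family of Pareto-optimal configurations in the rate-quality-power space can be sketched as a simple dominance filter. This is a minimal illustration, not the dissertation's method: the `Config` type, the configuration names and the numbers are all hypothetical, and the sketch assumes bitrate and power are to be minimised while quality is to be maximised.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """One candidate encoder configuration (hypothetical fields and units)."""
    name: str
    bitrate: float   # lower is better
    quality: float   # higher is better
    power: float     # lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on every objective and better on one."""
    no_worse = (a.bitrate <= b.bitrate
                and a.quality >= b.quality
                and a.power <= b.power)
    strictly_better = (a.bitrate < b.bitrate
                       or a.quality > b.quality
                       or a.power < b.power)
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep only configurations that no other configuration dominates."""
    return [c for c in configs
            if not any(dominates(other, c) for other in configs)]


# Made-up measurements, for illustration only.
candidates = [
    Config("A", bitrate=1.0, quality=35.0, power=2.0),
    Config("B", bitrate=1.2, quality=34.0, power=2.5),  # dominated by A
    Config("C", bitrate=0.8, quality=30.0, power=1.5),
    Config("D", bitrate=1.5, quality=40.0, power=3.0),
]
front_names = [c.name for c in pareto_front(candidates)]
```

A mode such as "minimum bitrate" then amounts to picking the point on this front with the lowest bitrate that still satisfies the quality and power constraints in force at that moment.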
Transform domain distributed video coding using larger transform blocks
Distributed Video Coding (DVC) displays promising performance at low spatial resolutions but begins to struggle as the resolution increases. One of the limiting aspects is its 4x4 Discrete Cosine Transform (DCT) block size, which is often impractical at higher resolutions. This paper investigates the impact of exploiting larger DCT block sizes on the performance of transform domain DVC at higher spatial resolutions. To utilize a larger block size in DVC, appropriate quantisers have to be selected; this is addressed by incorporating a content-aware quantisation mechanism that generates an image-specific quantisation matrix for any DCT block size. Experimental results confirm that the larger 8x8 block size consistently exhibits superior RD performance for CIF resolution sequences compared to the smaller 4x4 block size. A significant PSNR improvement is observed for the 16x16 block size at 4CIF resolution, with up to 1.78dB average PSNR gain compared to its smaller block alternatives.
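To make the notion of a block-size-independent transform concrete, the following sketch computes an orthonormal 2D DCT-II for any square block size N (4, 8, 16, and so on). It is a plain reference implementation written for illustration; the helper names are hypothetical, and it does not reproduce the paper's codec or its content-aware quantisation step.

```python
import math


def dct_matrix(n):
    """Build the n x n orthonormal DCT-II basis matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def dct2(block):
    """Apply the 2D DCT to a square block: C * X * C^T."""
    n = len(block)
    if any(len(row) != n for row in block):
        raise ValueError("block must be square")
    c = dct_matrix(n)
    return matmul(matmul(c, block), transpose(c))


# A flat 8x8 block: all energy should land in the DC coefficient.
flat_block = [[1.0] * 8 for _ in range(8)]
coeffs = dct2(flat_block)
```

For a flat block of ones, the DC coefficient equals N and every other coefficient is zero up to rounding, which is one way to see why larger blocks compact smooth high-resolution content into fewer coefficients.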
Video Stream Adaptation In Computer Vision Systems
Computer Vision (CV) has been deployed recently in a wide range of applications, including surveillance and automotive industries. According to a recent report, the market for CV technologies will grow to $33.3 billion by 2019. Surveillance and automotive industries share over 20% of this market. This dissertation considers the design of real-time CV systems with live video streaming, especially those over wireless and mobile networks. Such systems include video cameras/sensors and monitoring stations. The cameras should adapt their captured videos based on the events and/or available resources and time requirements. The monitoring station receives video streams from all cameras and runs CV algorithms for decisions, warnings, control, and/or other actions. Real-time CV systems have constraints in power, computational, and communicational resources. Most video adaptation techniques consider video distortion as the primary metric. In CV systems, however, the main objective is enhancing the event/object detection/recognition/tracking accuracy. The accuracy can essentially be thought of as the quality perceived by machines, as opposed to the human perceptual quality. High-Efficiency Video Coding (HEVC) is a recent encoding standard that seeks to address the limited communication bandwidth problem that results from the popularity of High Definition (HD) videos. Unfortunately, HEVC adopts algorithms that greatly slow down the encoding process, which complicates its use in real-time systems.
This dissertation presents a method for adapting live video streams to limited and varying network bandwidth and energy resources. It analyzes and compares the rate-accuracy and rate-energy characteristics of various video stream adaptation techniques in CV systems. We model the video capturing, encoding, and transmission aspects and then provide an overall model of the power consumed by the video cameras and/or sensors. In addition to modeling the power consumption, we model the achieved bitrate of video encoding. We validate and analyze the power consumption models of each phase as well as the aggregate power consumption model through extensive experiments. The analysis includes examining individual parameters separately and examining the impacts of changing more than one parameter at a time. For HEVC, we develop an algorithm that predicts the size of the block without iterating through the exhaustive Rate Distortion Optimization (RDO) method. We demonstrate the effectiveness of the proposed algorithm in comparison with existing algorithms. The proposed algorithm achieves approximately 5 times the encoding speed of the RDO algorithm and 1.42 times the encoding speed of the fastest analyzed algorithm.
LDMIC: Learning-based Distributed Multi-view Image Coding
Multi-view image compression plays a critical role in 3D-related
applications. Existing methods adopt a predictive coding architecture, which
requires joint encoding to compress the corresponding disparity as well as
residual information. This demands collaboration among cameras and enforces the
epipolar geometric constraint between different views, which makes it
challenging to deploy these methods in distributed camera systems with randomly
overlapping fields of view. Meanwhile, distributed source coding theory
indicates that efficient data compression of correlated sources can be achieved
by independent encoding and joint decoding, which motivates us to design a
learning-based distributed multi-view image coding (LDMIC) framework. With
independent encoders, LDMIC introduces a simple yet effective joint context
transfer module based on the cross-attention mechanism at the decoder to
effectively capture the global inter-view correlations, which is insensitive to
the geometric relationships between images. Experimental results show that
LDMIC significantly outperforms both traditional and learning-based MIC methods
while enjoying fast encoding speed. Code will be released at
https://github.com/Xinjie-Q/LDMIC.
Comment: Accepted by ICLR 202
Video QoS/QoE over IEEE802.11n/ac: A Contemporary Survey
The demand for video applications over wireless networks has increased tremendously, and IEEE 802.11 standards have provided improved support for video transmission. However, providing Quality of Service (QoS) and Quality of Experience (QoE) for video over WLAN is still a challenge due to the error sensitivity of compressed video and dynamic channels. This thesis presents a contemporary survey study on video QoS/QoE over WLAN issues and solutions. The objective of the study is to provide an overview of the issues by conducting a background study on the video codecs and their features and characteristics, followed by studying QoS and QoE support in IEEE 802.11 standards. Since IEEE 802.11n is the most widely deployed current standard and IEEE 802.11ac is the upcoming standard, this survey study aims to investigate the most recent video QoS/QoE solutions based on these two standards. The solutions are divided into two broad categories: academic solutions and vendor solutions. Academic solutions are mostly based on three main layers, namely Application, Media Access Control (MAC) and Physical (PHY), and are further divided into two major categories: single-layer solutions and cross-layer solutions. Single-layer solutions are those which focus on a single layer to enhance the video transmission performance over WLAN. Cross-layer solutions involve two or more layers to provide a single QoS solution for video over WLAN. This thesis also presents and technically analyzes QoS solutions by three popular vendors. This thesis concludes that single-layer solutions are not directly related to video QoS/QoE, and that cross-layer solutions perform better than single-layer solutions but are much more complicated and not easy to implement. Most vendors rely on their network infrastructure to provide QoS for multimedia applications. 
They have their own techniques and mechanisms, but the concept of providing QoS/QoE for video is almost the same because they use the same standards and rely on Wi-Fi Multimedia (WMM) to provide QoS.