Advanced heterogeneous video transcoding
PhD
Video transcoding is an essential tool for promoting interoperability
between different video communication systems. This thesis presents
two novel video transcoders, both operating on bitstreams of the current
H.264/AVC standard. The first transcoder converts H.264/AVC
bitstreams to a Wavelet Scalable Video Codec (W-SVC), while the second targets the emerging High Efficiency Video Coding (HEVC) standard.
Scalable Video Coding (SVC) enables low-complexity adaptation
of compressed video, providing an efficient solution for content delivery
through heterogeneous networks. The transcoder proposed here aims at
exploiting the advantages offered by SVC technology when dealing with
conventional coders and legacy video, efficiently reusing information
found in the H.264/AVC bitstream to achieve a high rate-distortion
performance at a low complexity cost. Its main features include new
mode mapping algorithms that exploit the larger macroblock sizes of
W-SVC, and a new state-of-the-art motion vector composition algorithm
that is able to tackle different coding configurations in the H.264/AVC
bitstream, including IPP or IBBP with multiple reference frames.
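Motion vector composition is needed when a transcoder must point a block at a different reference frame than the one used in the H.264/AVC bitstream. The thesis's own algorithm is not described in this abstract; as a minimal sketch, the block below shows one well-known baseline, dominant vector selection, in which a two-hop vector (current frame to reference, then reference to its own reference) is composed by adding the vector of the reference-frame block that the displaced block overlaps most. The square block grid, integer-pel vectors and the `ref_field` dictionary layout are assumptions for illustration.

```python
from typing import Dict, Tuple

MV = Tuple[int, int]  # (dx, dy) in integer pixels; real codecs use quarter-pel units


def overlap_area(ax: int, ay: int, bx: int, by: int, size: int) -> int:
    """Overlap area of two size x size blocks with top-left corners (ax, ay) and (bx, by)."""
    w = max(0, size - abs(ax - bx))
    h = max(0, size - abs(ay - by))
    return w * h


def dominant_vector(x: int, y: int, mv: MV,
                    ref_field: Dict[Tuple[int, int], MV], size: int) -> MV:
    """MV of the reference-frame block that the displaced block (x, y) + mv overlaps most."""
    px, py = x + mv[0], y + mv[1]
    gx, gy = (px // size) * size, (py // size) * size
    best, best_area = (0, 0), 0  # zero vector if nothing overlaps (e.g. outside the frame)
    # Only the up to four grid-aligned blocks around (px, py) can overlap the displaced block.
    for bx in (gx, gx + size):
        for by in (gy, gy + size):
            area = overlap_area(px, py, bx, by, size)
            if (bx, by) in ref_field and area > best_area:
                best, best_area = ref_field[(bx, by)], area
    return best


def compose(x: int, y: int, mv: MV,
            ref_field: Dict[Tuple[int, int], MV], size: int = 16) -> MV:
    """Compose current->ref and ref->ref-of-ref vectors into one current->ref-of-ref vector."""
    d = dominant_vector(x, y, mv, ref_field, size)
    return (mv[0] + d[0], mv[1] + d[1])
```

For example, with 16x16 blocks, a block at (0, 0) whose vector (12, 0) lands mostly on the reference block at (16, 0) inherits that block's vector. Area-weighted averaging of all four candidates is a common alternative to picking the single dominant one.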
The emerging video coding standard, HEVC, is currently approaching the final stage of development prior to standardization. This thesis
proposes and evaluates several transcoding algorithms for the HEVC
codec. In particular, a transcoder is proposed that is based on a new complexity-scalable method, trading off rate-distortion performance
for complexity reduction. Furthermore, other transcoding solutions are explored that build on a novel content-based modeling
approach, in which the transcoder adapts its parameters to the
contents of the sequence being encoded.
Finally, the application of this research is not constrained to these
transcoders, as many of the techniques developed aim to contribute
to advancing research in this field, and have the potential to be
incorporated into different video transcoding architectures.
End to end Multi-Objective Optimisation of H.264 and HEVC Codecs
All multimedia devices now incorporate video CODECs that comply with international video coding standards such as H.264/MPEG-4 AVC and the new High Efficiency Video Coding standard (HEVC), otherwise known as H.265. Although the standard CODECs have been designed to include algorithms with optimal efficiency, a large number of coding parameters can be used to fine-tune their operation within known constraints, e.g., available computational power, bandwidth and consumer QoS requirements. With so many parameters involved, determining which of them play a significant role in providing optimal quality of service within the given constraints is a challenge in itself. A further important question is how to select the values of these significant parameters so that the CODEC performs optimally under the given constraints.
This thesis proposes a framework that uses machine learning algorithms to model the performance of a video CODEC based on its significant coding parameters. Means of modelling both encoder and decoder performance are proposed. We define objective functions that can be used to model the performance-related properties of a CODEC, i.e., video quality, bit-rate and CPU time. We show that these objective functions can be practically utilised in video encoder/decoder designs, in particular in their performance optimisation within given operational and practical constraints. A multi-objective optimisation framework based on Genetic Algorithms is thus proposed to optimise the performance of a video codec. The framework is designed to jointly minimise CPU time and bit-rate while maximising the quality of the compressed video stream. The thesis presents the use of this framework in the performance modelling and multi-objective optimisation of H.264, the most widely used video coding standard in practice at present, and of the latest video coding standard, H.265/HEVC.
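The abstract does not name the specific genetic algorithm. Multi-objective GAs such as NSGA-II rank candidate configurations by Pareto dominance, so a minimal sketch of that core step is shown below, assuming each candidate parameter set has already been scored (e.g., by the learned CODEC models) for CPU time, bit-rate and PSNR. The `Result` fields and units are hypothetical.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    cpu_time: float  # seconds, to be minimised
    bitrate: float   # kbit/s, to be minimised
    psnr: float      # dB, to be maximised


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    no_worse = a.cpu_time <= b.cpu_time and a.bitrate <= b.bitrate and a.psnr >= b.psnr
    better = a.cpu_time < b.cpu_time or a.bitrate < b.bitrate or a.psnr > b.psnr
    return no_worse and better


def pareto_front(results: List[Result]) -> List[Result]:
    """Keep only the configurations that no other configuration dominates."""
    # dominates(r, r) is always False, so comparing r with itself is harmless.
    return [r for r in results if not any(dominates(o, r) for o in results)]
```

A GA would apply this ranking to each generation, keeping non-dominated configurations as parents; the final front is the set of trade-offs among which an operator picks according to the deployment constraints.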
When a communication network is used to transmit video, the performance-related parameters of the communication channel affect the end-to-end performance of the video CODEC. Network delays and packet loss affect the quality of the video received at the decoder via the communication channel; i.e., even if a video CODEC is optimally configured, network conditions can make the experience sub-optimal. Given the above, the thesis proposes the design, integration and testing of a novel approach to simulating a wired network that uses the UDP protocol to transmit video data. This network is subsequently used to simulate the impact of packet loss and network delays on video optimally coded with the framework previously proposed for the modelling and optimisation of video CODECs. The quality of received video under different levels of packet loss and network delay is simulated, and conclusions are drawn about the impact on transmitted video according to its content and features.
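The abstract does not specify the loss model used in the simulated network. As a sketch only, the block below substitutes the widely used two-state Gilbert-Elliott model, which produces the bursty losses typical of congested links, plus a uniform jitter term for delay. The function name, parameters and default values are hypothetical.

```python
import random
from typing import List, Optional


def simulate_channel(n_packets: int, p_good_to_bad: float, p_bad_to_good: float,
                     loss_bad: float = 1.0, base_delay_ms: float = 20.0,
                     jitter_ms: float = 5.0, seed: int = 0) -> List[Optional[float]]:
    """Return the one-way delay (ms) of each packet, or None if the packet is lost.

    Gilbert-Elliott model: packets are never lost in the Good state and are lost
    with probability `loss_bad` in the Bad state.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    bad = False
    out: List[Optional[float]] = []
    for _ in range(n_packets):
        # Markov state transition before each packet is sent.
        if bad:
            bad = rng.random() >= p_bad_to_good
        else:
            bad = rng.random() < p_good_to_bad
        if bad and rng.random() < loss_bad:
            out.append(None)
        else:
            out.append(base_delay_ms + rng.uniform(0.0, jitter_ms))
    return out
```

The mean burst length is roughly 1 / `p_bad_to_good` packets, so the two transition probabilities let the same average loss rate be tested with short or long bursts, which matters for how far errors propagate through predicted frames.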
Filling the gaps in video transcoder deployment in the cloud
Cloud-based deployment of content production and broadcast workflows has
continued to disrupt the industry after the pandemic. The key tools required
for unlocking cloud workflows, e.g., transcoding, metadata parsing, and
streaming playback, are increasingly commoditized. However, as video traffic
continues to increase there is a need to consider tools which offer
opportunities for further bitrate/quality gains as well as those which
facilitate cloud deployment. In this paper we consider preprocessing,
rate/distortion optimisation and cloud cost prediction tools which are only
just emerging from the research community. These tools are posed as part of the
per-clip optimisation approach to transcoding which has been adopted by large
streaming media processing entities but has yet to be made more widely
available to the industry.
Comment: Camera-ready version of BEIT Conference at NAB 202
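The paper does not detail its per-clip method. The general per-clip idea is that each clip gets its own bitrate ladder instead of a fixed one; a minimal sketch, assuming a set of trial encodes with measured bitrate and quality is already available for one clip, picks the best-quality encode that fits under each ladder rung. The `Encode` type, labels and rung values are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Encode(NamedTuple):
    label: str       # hypothetical naming, e.g. "1080p_qp30"
    kbps: float      # measured bitrate of this trial encode
    quality: float   # e.g. VMAF or PSNR; higher is better


def per_clip_ladder(encodes: List[Encode], rungs_kbps: List[float]) -> Dict[float, Encode]:
    """For each ladder rung, pick this clip's best-quality encode whose bitrate fits the rung."""
    ladder: Dict[float, Encode] = {}
    for cap in rungs_kbps:
        fitting = [e for e in encodes if e.kbps <= cap]
        if fitting:  # rungs that no trial encode fits are left out
            ladder[cap] = max(fitting, key=lambda e: e.quality)
    return ladder
```

Because the choice is made per clip, a low-motion clip may reach its top rung at a lower resolution or bitrate than a high-motion one; the cost of producing the trial encodes is what cloud cost prediction tools would help to budget.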
Adaptive intra refresh for robust wireless multi-view video
This thesis was submitted for the award of PhD and was awarded by Brunel University London.
Mobile wireless communication technology is a fast-developing field, and new mobile communication techniques and means become available every day. In this thesis, multi-view video (MVV) is also referred to as 3D video. 3D video signals delivered over wireless links are shaping the telecommunication industry and academia. However, wireless channels are prone to high levels of bit and burst errors that severely degrade the quality of service (QoS). Noise along the wireless transmission path can introduce distortion or cause a compressed bitstream to lose vital information. Because of prediction, an error caused by noise progressively spreads to subsequent frames and across multiple views. This error may force the receiver to pause momentarily and wait for the next INTRA picture before it can continue decoding. Such pauses in the video stream degrade the user's Quality of Experience (QoE). An error resilience strategy is therefore needed to protect the compressed bitstream against transmission errors. This thesis focuses on the Adaptive Intra Refresh (AIR) error resilience technique. The AIR method is developed to make compressed 3D video more robust to channel errors. The process involves the periodic injection of intra-coded macroblocks in a cyclic pattern using the H.264/AVC standard. The algorithm takes into account the individual features of each macroblock and the feedback information sent by the decoder about the channel condition in order to generate an MVV-AIR map. MVV-AIR map generation regulates the order of packet arrival and identifies the motion activity in each macroblock. Based on the level of motion activity contained in each macroblock, the MVV-AIR map classifies the macroblocks of each frame as high- or low-motion. A proxy MVV-AIR transcoder is used to validate the efficiency of the generated MVV-AIR map.
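The abstract describes the MVV-AIR map only at a high level. The sketch below is a hypothetical single-view simplification: macroblocks are classified by the L1 norm of their motion vector, high-motion macroblocks are selected for intra refresh in a cyclic order so that successive frames refresh different ones, and a decoder loss report doubles the refresh budget. The threshold, the budget rule and the function names are assumptions, not the thesis's algorithm.

```python
from typing import List, Set, Tuple

MV = Tuple[int, int]


def motion_activity(mv: MV) -> int:
    """Hypothetical activity measure: L1 norm of a macroblock's motion vector."""
    return abs(mv[0]) + abs(mv[1])


def air_map(mvs: List[MV], threshold: int, budget: int,
            start: int, loss_reported: bool) -> Tuple[Set[int], int]:
    """Pick macroblock indices to intra-code in the current frame.

    Returns the chosen indices and the cursor to pass as `start` for the next frame.
    """
    if loss_reported:
        budget *= 2  # hypothetical feedback rule: refresh more after a reported loss
    # Classification: only high-motion macroblocks are candidates for refresh.
    high = [i for i, mv in enumerate(mvs) if motion_activity(mv) > threshold]
    if not high or budget <= 0:
        return set(), start
    # Cyclic pattern: resume at the first high-motion macroblock at or after `start`.
    pos = next((j for j, i in enumerate(high) if i >= start), 0)
    chosen = (high[pos:] + high[:pos])[:budget]
    return set(chosen), chosen[-1] + 1
```

Refreshing high-motion macroblocks first targets the regions where a lost prediction would cause the most visible drift, while the cursor keeps the refresh spread over successive frames instead of concentrating it in one frame.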
The MVV-AIR transcoding algorithm uses a spatial and view downscaling scheme to convert MVV to single-view video. Various experimental results indicate that the proposed error-resilient MVV-AIR transcoder technique effectively improves the quality of reconstructed 3D video in wireless networks. A comparison of the MVV-AIR transcoder algorithm with some traditional error resilience techniques demonstrates that the MVV-AIR algorithm performs better in an error-prone channel. Simulation results revealed significant improvements in both objective and subjective quality. The scheme introduces no additional computational complexity, while the QoS and QoE requirements are still fully met.
Tertiary Institution Trust Fund (TETFund) of Nigeria
A Bayesian Approach to Block Structure Inference in AV1-based Multi-rate Video Encoding
Due to differences in frame structure, existing multi-rate video encoding
algorithms cannot be directly adapted to encoders utilizing special reference
frames such as AV1 without introducing substantial rate-distortion loss. To
tackle this problem, we propose a novel Bayesian block structure inference
model inspired by a modification to an HEVC-based algorithm. It estimates the
posterior probability distributions of block partitioning, and adapts early
terminations in the RDO procedure accordingly. Experimental results show that
the proposed method provides flexibility for controlling the tradeoff between
speed and coding efficiency, and can achieve an average time saving of 36.1%
(up to 50.6%) with negligible bitrate cost.
Comment: published in IEEE Data Compression Conference, 201
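The abstract does not give the model's exact form. As a minimal sketch, assuming the block-partition decision of the already-encoded reference rate is used as the conditioning context, a Beta-Bernoulli model yields a posterior probability that a block at the dependent rate will be split; the RDO search for the split is skipped when that probability falls below a threshold, which acts as the speed/efficiency knob the abstract mentions. The class, the context tuple and the prior values are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class SplitPosterior:
    """Beta-Bernoulli estimate of P(split | context).

    The context is, for illustration, the co-located block's decision in the
    reference-rate encode together with the block size.
    """

    def __init__(self, alpha: float = 1.0, beta: float = 1.0) -> None:
        self.alpha, self.beta = alpha, beta  # Beta prior pseudo-counts
        # Per context: [number of observed splits, number of observed non-splits]
        self.counts: DefaultDict[Tuple, List[int]] = defaultdict(lambda: [0, 0])

    def update(self, context: Tuple, split: bool) -> None:
        """Record the decision actually taken by full RDO for a block in this context."""
        self.counts[context][0 if split else 1] += 1

    def p_split(self, context: Tuple) -> float:
        """Posterior mean of Beta(alpha + splits, beta + non-splits)."""
        s, n = self.counts[context]
        return (self.alpha + s) / (self.alpha + self.beta + s + n)

    def should_try_split(self, context: Tuple, threshold: float) -> bool:
        """Early termination: skip evaluating the split when it is unlikely.

        A higher threshold skips more often, saving time at a higher bitrate cost.
        """
        return self.p_split(context) >= threshold
```

Updating the counts with the decisions that full RDO actually takes keeps the posterior adapted to the current sequence, which is one way to obtain the controllable trade-off between encoding time and rate-distortion loss.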
A Survey on Energy Consumption and Environmental Impact of Video Streaming
Climate change challenges require a notable decrease in worldwide greenhouse
gas (GHG) emissions across technology sectors. Digital technologies, especially
video streaming, which accounts for most Internet traffic, are no exception. Video
streaming demand increases with remote working, multimedia communication
services (e.g., WhatsApp, Skype), video streaming content (e.g., YouTube,
Netflix), video resolution (4K/8K, 50 fps/60 fps), and multi-view video, making
energy consumption and environmental footprint critical. This survey
contributes to a better understanding of sustainable and efficient video
streaming technologies by providing insights into the state-of-the-art and
potential future directions for researchers, developers, and engineers, service
providers, hosting platforms, and consumers. We widen this survey's focus to
content provisioning and content consumption based on the observation that
continuously active network equipment underneath video streaming consumes
substantial energy independent of the transmitted data type. We propose a
taxonomy of factors that affect the energy consumption in video streaming, such
as encoding schemes, resource requirements, storage, content retrieval,
decoding, and display. We identify notable weaknesses in video streaming that
require further research for improved energy efficiency: (1) fixed bitrate
ladders in HTTP live streaming; (2) inefficient hardware utilization of
existing video players; (3) the lack of a comprehensive open energy measurement
dataset covering various device types and coding parameters for reproducible
research.
Receiver-Driven Video Adaptation
In the span of a single generation, video technology has made an incredible impact on daily life. Modern use cases for video are wildly diverse, including teleconferencing, live streaming, virtual reality, home entertainment, social networking, surveillance, body cameras, cloud gaming, and autonomous driving. As these applications continue to grow more sophisticated and heterogeneous, a single representation of video data can no longer satisfy all receivers. Instead, the initial encoding must be adapted to each receiver's unique needs. Existing adaptation strategies are fundamentally flawed, however, because they discard the video's initial representation and force the content to be re-encoded from scratch. This process is computationally expensive, does not scale well with the number of videos produced, and throws away important information embedded in the initial encoding. Therefore, a compelling need exists for the development of new strategies that can adapt video content without fully re-encoding it. To better support the unique needs of smart receivers, diverse displays, and advanced applications, general-use video systems should produce and offer receivers a more flexible compressed representation that supports top-down adaptation strategies from an original, compressed-domain ground truth. This dissertation proposes an alternate model for video adaptation that addresses these challenges. The key idea is to treat the initial compressed representation of a video as the ground truth, and allow receivers to drive adaptation by dynamically selecting which subsets of the captured data to receive. In support of this model, three strategies for top-down, receiver-driven adaptation are proposed. First, a novel, content-agnostic entropy coding technique is implemented in which symbols are selectively dropped from an input abstract symbol stream based on their estimated probability distributions to hit a target bit rate. 
Receivers are able to guide the symbol dropping process by supplying the encoder with an appropriate rate controller algorithm that fits their application needs and available bandwidths. Next, a domain-specific adaptation strategy is implemented for H.265/HEVC coded video in which the prediction data from the original source is reused directly in the adapted stream, but the residual data is recomputed as directed by the receiver. By tracking the changes made to the residual, the encoder can compensate for decoder drift to achieve near-optimal rate-distortion performance. Finally, a fully receiver-driven strategy is proposed in which the syntax elements of a pre-coded video are cataloged and exposed directly to clients through an HTTP API. Instead of requesting the entire stream at once, clients identify the exact syntax elements they wish to receive using a carefully designed query language. Although an implementation of this concept is not provided, an initial analysis shows that such a system could save bandwidth and computation when used by certain targeted applications.
Doctor of Philosophy
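The symbol-dropping strategy described in this dissertation abstract can be sketched as follows, assuming an ideal entropy coder in which a symbol predicted with probability p costs -log2(p) bits. The rate controller is passed in as a function, mirroring the receiver-supplied controller; the default `drop_costliest` policy (discard the least probable, most expensive symbols first) is a hypothetical choice, not necessarily the dissertation's. Probabilities must lie in (0, 1].

```python
import math
from typing import Callable, List, Sequence

RateController = Callable[[Sequence[float], float], List[bool]]


def symbol_cost_bits(p: float) -> float:
    """Ideal entropy-coded cost, in bits, of a symbol the model predicted with probability p."""
    return -math.log2(p)


def drop_costliest(costs: Sequence[float], budget_bits: float) -> List[bool]:
    """Hypothetical rate controller: drop the most expensive symbols until the rest fit."""
    keep = [True] * len(costs)
    total = sum(costs)
    for i in sorted(range(len(costs)), key=lambda k: costs[k], reverse=True):
        if total <= budget_bits:
            break
        keep[i] = False
        total -= costs[i]
    return keep


def adapt(probs: Sequence[float], budget_bits: float,
          rate_controller: RateController = drop_costliest) -> List[int]:
    """Return the indices of the symbols kept in the adapted stream."""
    costs = [symbol_cost_bits(p) for p in probs]
    mask = rate_controller(costs, budget_bits)
    return [i for i, k in enumerate(mask) if k]
```

For example, probabilities [0.5, 0.25, 0.125, 0.5] cost 1, 2, 3 and 1 bits; with a 4-bit budget the 3-bit symbol is dropped. A receiver that cares more about preserving surprising symbols could plug in a controller that drops the cheapest ones instead.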
- …