81 research outputs found
Dyadic spatial resolution reduction transcoding for H.264/AVC
In this paper, we examine spatial resolution downscaling transcoding for H.264/AVC video coding. A number of advanced coding tools limit the applicability of techniques, which were developed for previous video coding standards. We present a spatial resolution reduction transcoding architecture for H.264/AVC, which extends open-loop transcoding with a low-complexity compensation technique in the reduced-resolution domain. The proposed architecture tackles the problems in H.264/AVC and avoids visual artifacts in the transcoded sequence, while keeping complexity significantly lower than more traditional cascaded decoder-encoder architectures. The refinement step of the proposed architecture can be used to further improve rate-distortion performance, at the cost of additional complexity. In this way, a dynamic-complexity transcoder is rendered possible. We present a thorough investigation of the problems related to motion and residual data mapping, leading to a transcoding solution resulting in fully compliant reduced-size H.264/AVC bitstreams
An Efficient Motion Estimation Method for H.264-Based Video Transcoding with Arbitrary Spatial Resolution Conversion
As wireless and wired network connectivity is rapidly expanding
and the number of network users is steadily increasing, it has become more
and more important to support universal access of multimedia
content over the whole network. A big challenge, however, is
the great diversity of network devices from full screen computers
to small smart phones. This leads to research on transcoding,
which involves in efficiently reformatting compressed data from
its original high resolution to a desired spatial resolution
supported by the displaying device. Particularly, there is a
great momentum in the multimedia industry for H.264-based
transcoding as H.264 has been widely employed as a mandatory
player feature in applications ranging from television broadcast
to video for mobile devices.
While H.264 contains many new features for effective video
coding with excellent rate distortion (RD) performance, a major issue
for transcoding H.264 compressed video from one spatial resolution
to another is the computational complexity. Specifically, it is
the motion compensated prediction (MCP) part. MCP is the main
contributor to the excellent RD performance
of H.264 video compression, yet it is very time consuming. In general,
a brute-force search is used to find the best motion vectors for MCP.
In the scenario of transcoding, however, an immediate idea for
improving the MCP efficiency for the re-encoding procedure is to
utilize the motion vectors in the original compressed stream.
Intuitively, motion in the high resolution scene is highly related
to that in the down-scaled scene.
In this thesis, we study homogeneous video transcoding from H.264
to H.264. Specifically, for the video transcoding with arbitrary
spatial resolution conversion, we propose a motion vector estimation
algorithm based on a multiple linear regression model, which
systematically utilizes the motion information in the original scenes.
We also propose a practical solution for efficiently determining a
reference frame to take the advantage of the new feature of multiple
references in H.264. The performance of the algorithm was assessed
in an H.264 transcoder. Experimental results show that, as compared
with a benchmark solution, the proposed method significantly reduces
the transcoding complexity without degrading much the video quality
DCT-based video downscaling transcoder using split and merge technique
2005-2006 > Academic research: refereed > Publication in refereed journalVersion of RecordPublishe
Algorithms & implementation of advanced video coding standards
Advanced video coding standards have become widely deployed coding techniques used in numerous products, such as broadcast, video conference, mobile television and blu-ray disc, etc. New compression techniques are gradually included in video coding standards so that a 50% compression rate reduction is achievable every five years. However, the trend also has brought many problems, such as, dramatically increased computational complexity, co-existing multiple standards and gradually increased development time. To solve the above problems, this thesis intends to investigate efficient algorithms for the latest video coding standard, H.264/AVC. Two aspects of H.264/AVC standard are inspected in this thesis: (1) Speeding up intra4x4 prediction with parallel architecture. (2) Applying an efficient rate control algorithm based on deviation measure to intra frame. Another aim of this thesis is to work on low-complexity algorithms for MPEG-2 to H.264/AVC transcoder. Three main mapping algorithms and a computational complexity reduction algorithm are focused by this thesis: motion vector mapping, block mapping, field-frame mapping and efficient modes ranking algorithms. Finally, a new video coding framework methodology to reduce development time is examined. This thesis explores the implementation of MPEG-4 simple profile with the RVC framework. A key technique of automatically generating variable length decoder table is solved in this thesis. Moreover, another important video coding standard, DV/DVCPRO, is further modeled by RVC framework. Consequently, besides the available MPEG-4 simple profile and China audio/video standard, a new member is therefore added into the RVC framework family. A part of the research work presented in this thesis is targeted algorithms and implementation of video coding standards. In the wide topic, three main problems are investigated. The results show that the methodologies presented in this thesis are efficient and encourage
On transcoding a B-frame to a P-frame in the compressed domain
2007-2008 > Academic research: refereed > Publication in refereed journalVersion of RecordPublishe
Image and Video Coding/Transcoding: A Rate Distortion Approach
Due to the lossy nature of image/video compression and the expensive bandwidth and computation resources in a multimedia system, one of the key design issues for image and video coding/transcoding is to optimize trade-off among distortion, rate, and/or complexity. This thesis studies the application of rate distortion (RD) optimization approaches to image and video coding/transcoding for exploring the best RD performance of a video codec compatible to the newest video coding standard H.264 and for designing computationally efficient down-sampling algorithms with high visual fidelity in the discrete Cosine transform (DCT) domain.
RD optimization for video coding in this thesis considers two objectives, i.e., to achieve the best encoding efficiency in terms of minimizing the actual RD cost and to maintain decoding compatibility with the newest video coding standard H.264. By the actual RD cost, we mean a cost based on the final reconstruction error and the entire coding rate. Specifically, an operational RD method is proposed based on a soft decision quantization (SDQ) mechanism, which has its root in a fundamental RD theoretic study on fixed-slope lossy data compression. Using SDQ instead of hard decision quantization, we establish a general framework in which motion prediction, quantization, and entropy coding in a hybrid video coding scheme such as H.264 are jointly designed to minimize the actual RD cost on a frame basis. The proposed framework is applicable to optimize any hybrid video coding scheme, provided that specific algorithms are designed corresponding to coding syntaxes of a given standard codec, so as to maintain compatibility with the standard.
Corresponding to the baseline profile syntaxes and the main profile syntaxes of H.264, respectively, we have proposed three RD algorithms---a graph-based algorithm for SDQ given motion prediction and quantization step sizes, an algorithm for residual coding optimization given motion prediction, and an iterative overall algorithm for jointly optimizing motion prediction, quantization, and entropy coding---with them embedded in the indicated order. Among the three algorithms, the SDQ design is the core, which is developed based on a given entropy coding method. Specifically, two SDQ algorithms have been developed based on the context adaptive variable length coding (CAVLC) in H.264 baseline profile and the context adaptive binary arithmetic coding (CABAC) in H.264 main profile, respectively.
Experimental results for the H.264 baseline codec optimization show that for a set of typical testing sequences, the proposed RD method for H.264 baseline coding achieves a better trade-off between rate and distortion, i.e., 12\% rate reduction on average at the same distortion (ranging from 30dB to 38dB by PSNR) when compared with the RD optimization method implemented in H.264 baseline reference codec. Experimental results for optimizing H.264 main profile coding with CABAC show 10\% rate reduction over a main profile reference codec using CABAC, which also suggests 20\% rate reduction over the RD optimization method implemented in H.264 baseline reference codec, leading to our claim of having developed the best codec in terms of RD performance, while maintaining the compatibility with H.264.
By investigating trade-off between distortion and complexity, we have also proposed a designing framework for image/video transcoding with spatial resolution reduction, i.e., to down-sample compressed images/video with an arbitrary ratio in the DCT domain. First, we derive a set of DCT-domain down-sampling methods, which can be represented by a linear transform with double-sided matrix multiplication (LTDS) in the DCT domain. Then, for a pre-selected pixel-domain down-sampling method, we formulate an optimization problem for finding an LTDS to approximate the given pixel-domain method to achieve the best trade-off between visual quality and computational complexity. The problem is then solved by modeling an LTDS with a multi-layer perceptron network and using a structural learning with forgetting algorithm for training the network. Finally, by selecting a pixel-domain reference method with the popular Butterworth lowpass filtering and cubic B-spline interpolation, the proposed framework discovers an LTDS with better visual quality and lower computational complexity when compared with state-of-the-art methods in the literature
- …