VLSI architectures design for encoders of High Efficiency Video Coding (HEVC) standard
The growing popularity of high-resolution video and the continuously increasing demand for high-quality video on mobile devices create a strong need for more efficient video encoders. To address these demands, HEVC, the newest video coding standard, was developed by a joint team formed by ISO/IEC MPEG and ITU-T VCEG. Its design goal is to achieve a 50% compression gain over its predecessor H.264 with equal or even higher perceptual video quality. Motion estimation (ME), one of the most critical modules in video coding, contributes roughly 50%-70% of the computational complexity of a video encoder. This heavy consumption of computational resources limits encoder performance, especially for full-HD or ultra-HD videos, in terms of coding speed, bit rate, and video quality. The major part of this work therefore concentrates on reducing the computational complexity and improving the timing performance of motion estimation algorithms for the HEVC standard.
First, a new strategy for calculating the SAD (sum of absolute differences) for motion estimation is designed, based on statistical properties of the pixel data of video sequences. These statistics demonstrate that the ordering of the sums of two sets of pixels is strongly determined by the distribution of the orderings of the individual pixels from the two sets. Taking advantage of this observation, only a small proportion of the pixels needs to be involved in the SAD calculation. Simulations show that the amount of computation required by the full-search algorithm is reduced by about 58% on average and by up to 70% in the best case.
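The subsampling idea in this abstract can be illustrated with a short sketch (the block layout, sampling pattern, and rescaling below are invented for illustration and are not the thesis's exact scheme):

```python
def sad(cur, ref):
    """Full SAD between two equally sized pixel blocks (flat lists)."""
    return sum(abs(c - r) for c, r in zip(cur, ref))


def partial_sad(cur, ref, step=4):
    """Approximate SAD using every `step`-th pixel, scaled back up.

    The observation exploited here: if most sampled pixel pairs favor
    one candidate block, the full SAD is very likely to favor it too,
    so a small subset of pixels suffices for the comparison.
    """
    s = sum(abs(cur[i] - ref[i]) for i in range(0, len(cur), step))
    return s * step  # rescale so values stay comparable to the full SAD
```

A candidate whose subsampled SAD is clearly smaller than another's almost always also wins under the full SAD, which is what makes the computation reduction safe for most comparisons.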
Secondly, from the perspective of parallelization, an enhanced TZ search for HEVC is proposed using novel schemes of multiple MVPs (motion vector predictors) and a shared MVP. Specifically, with multiple MVPs the initial search is performed in parallel at multiple search centers, and the ME processing engines for the PUs within one CU (coding unit) are parallelized based on a CU-level MVP-sharing scheme. Moreover, the SAD module of the ME engine is also implemented in parallel for the 32×32 PU size. Experiments indicate that it achieves an appreciable improvement in the throughput and coding efficiency of the HEVC video encoder.
In addition, the other part of this thesis is devoted to VLSI architectures for finding the first W maximum/minimum values, targeting high speed and low hardware cost. The architecture based on a novel bit-wise AND scheme has only half the area of the best reference solution, with a critical-path delay comparable to other implementations. The FPCG (full parallel comparison grid) architecture, which uses an optimized comparator-based structure, is 3.6 times faster on average, and up to 5.2 times faster in the best case, than the reference architectures. Finally, the architecture using a partial-sorting strategy reaches a good balance between timing performance and area: its speed is slightly lower than or comparable to that of the FPCG architecture, at an acceptable hardware cost.
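A software analogue of the first-W-maximum selection problem can make the partial-selection idea concrete (the heap-based sketch below only mirrors the spirit of the partial-sorting strategy; the thesis's contributions are hardware architectures):

```python
import heapq


def top_w(values, w):
    """Return the W largest values in descending order.

    A W-entry min-heap tracks the current best W candidates, so each
    new value is compared only against the smallest retained one
    instead of fully sorting the input.
    """
    heap = []
    for v in values:
        if len(heap) < w:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)  # evict the smallest of the best W
    return sorted(heap, reverse=True)
```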
Spatial Correlation-Based Motion-Vector Prediction for Video-Coding Efficiency Improvement
H.265/HEVC achieves an average bitrate reduction of 50% at fixed video quality compared with the H.264/AVC standard, but its computational complexity is significantly higher. The purpose of this work is to improve coding efficiency for the next-generation video-coding standards. To this end, a novel spatial neighborhood subset is developed, and an efficient spatial correlation-based motion vector prediction (MVP) with a coding-unit (CU) depth-prediction algorithm is proposed to improve coding efficiency. Firstly, by exploiting the reliability of neighboring candidate motion vectors (MVs), the spatial candidate MVs are used to determine the optimized MVP for motion-data coding. Secondly, a spatial correlation-based coding-unit depth prediction is presented to achieve a better trade-off between coding efficiency and computational complexity for inter prediction. This approach satisfies an extreme requirement for high coding efficiency with only moderate requirements for real-time processing. The simulation results demonstrate that overall bitrates can be reduced by 5.35% on average, and by up to 9.89%, compared with the H.265/HEVC reference software in terms of the Bjontegaard metric.
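The MVP-selection step can be caricatured in a few lines (the L1 cost below is an illustrative stand-in for the actual rate cost, and the candidate set is invented):

```python
def select_mvp(candidates, found_mv):
    """Pick the spatial-candidate MV closest to the MV found by motion
    search.

    Coding the motion-vector difference (MVD) against a better
    predictor costs fewer bits; here the 'cost' is simply the L1
    distance between a candidate and the searched MV.
    """
    def cost(mv):
        return abs(mv[0] - found_mv[0]) + abs(mv[1] - found_mv[1])

    return min(candidates, key=cost)
```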
Efficient HEVC-based video adaptation using transcoding
In a video transmission system, it is important to take into account the great diversity of the network/end-user constraints. On the one hand, video content is typically streamed over a network that is characterized by different bandwidth capacities. In many cases, the bandwidth is insufficient to transfer the video at its original quality. On the other hand, a single video is often played by multiple devices like PCs, laptops, and cell phones. Obviously, a single video would not satisfy their different constraints.
This diversity of network and device capacities leads to the need for video adaptation techniques, e.g., a reduction of the bit rate or spatial resolution. Video transcoding, which modifies a property of the video without changing the coding format, is well known as an efficient adaptation solution. However, this approach comes with a high computational complexity, resulting in high energy consumption in the network and possible network latency.
This presentation provides several optimization strategies for the transcoding of HEVC (High Efficiency Video Coding, the latest standard) video streams. First, the computational complexity of a bit-rate transcoder (transrater) is reduced: several techniques speed up the encoder of the transrater, notably a machine-learning-based approach and a novel coding-mode evaluation strategy. Moreover, the motion estimation process of the encoder is optimized using decision theory and the proposed fast search patterns. Second, the issues and challenges of a spatial transcoder are addressed using machine-learning algorithms. Thanks to their strong performance, the proposed techniques are expected to help HEVC gain popularity in a wide range of modern multimedia applications.
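The machine-learning speed-up amounts to predicting, per coding unit, whether further split depths can be skipped. A single-feature decision stump over an invented "residual energy" feature shows the mechanism (the feature and training rule are assumptions, not the presentation's actual model):

```python
def train_stump(samples):
    """Learn a single threshold on a scalar feature that best separates
    'stop splitting' (label 0) from 'keep splitting' (label 1).

    `samples` is a list of (feature, label) pairs; every observed
    feature value is tried as a candidate threshold.
    """
    best_thr, best_errs = None, None
    for thr, _ in samples:
        errs = sum((feat > thr) != bool(label) for feat, label in samples)
        if best_errs is None or errs < best_errs:
            best_thr, best_errs = thr, errs
    return best_thr


def keep_splitting(feature, thr):
    """Decision rule applied per CU: evaluate deeper splits or stop."""
    return feature > thr
```

During transrating, CUs classified as "stop" skip the expensive evaluation of deeper partitions, which is where the complexity reduction comes from.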
Hardware-based High-Accuracy Integer Motion Estimation and Merge Mode Estimation
Doctoral dissertation, Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, August 2017. Advisor: Hyuk-Jae Lee.
HEVC delivers twice the compression efficiency of H.264/AVC, but the many coding tools it employs greatly increase the computational complexity on the encoder side. Although much research has sought to reduce HEVC's high computational complexity, most studies merely extended complexity-reduction methods designed for H.264/AVC; they either achieved unsatisfactory complexity reduction or incurred excessively large compression-efficiency losses, failing to draw out HEVC's maximum compression performance. In particular, previous work on hardware-based encoders prioritized the realization of real-time encoding, sacrificing a great deal of compression efficiency. This dissertation therefore proposes hardware architectures that accelerate hardware-based inter prediction while minimizing the loss of HEVC's compression performance and enabling real-time coding.
The proposed bottom-up MV prediction method predicts MVs not from spatially and temporally adjacent PUs, as conventional methods do, but from hierarchically adjacent PUs in HEVC's block hierarchy, which improves MV-prediction accuracy by a large margin. As a result, the computational complexity of integer motion estimation (IME) is reduced by 67% with no change in compression efficiency. This work also proposes a hardware-based IME capable of real-time operation that applies the proposed bottom-up IME algorithm. Because the stage-by-stage dependencies of fast IME algorithms cause idle cycles and reference-data access problems, previous hardware IMEs either abandoned fast IME algorithms or modified them to suit the hardware, so their compression-efficiency losses were very large, amounting to several percent or more. In contrast, this work adopts the fast TZS algorithm and proposes a hardware-based IME that does not compromise the complexity-reduction performance of TZS. Three techniques make the fast IME algorithm suitable for hardware. First, the chronic idle-cycle problem of fast IME algorithms is resolved through context switching, which interleaves IME for different reference pictures and different depths. Second, a multi-bank SRAM structure exploiting the locality of the reference data is proposed for fast and flexible access to the reference data. Third, to avoid the large number of switching muxes that excessively flexible reference-data access would require, a limited amount of dynamic reference-data access around the search center is proposed. The resulting IME hardware supports all HEVC block sizes and, using four reference pictures, processes 4K UHD video at 60 fps with an almost negligible compression-efficiency loss of 0.11%, at a cost of 1.27M gates.
Merge mode estimation (MME), newly adopted in HEVC, is highly effective at improving compression efficiency, but its computational complexity fluctuates widely from PU to PU, so hardware implementations waste considerable resources. This work therefore proposes an efficient hardware-oriented MME method together with its hardware architecture. In conventional MME, whether the interpolation filter is applied is determined by neighboring PUs, so the utilization of the interpolation filter stays below 50%; the hardware, however, is dimensioned for the case in which the filter is used, so hardware utilization is low. This work proposes an MME hardware architecture with two data paths in which the vertical interpolation filter, the largest consumer of hardware resources, is reduced to half size, together with a merge-candidate allocation algorithm that maintains high hardware utilization while minimizing compression-efficiency loss. The result is a new hardware-based MME that uses 24% fewer hardware resources than previous hardware-based MME designs while running 7.4% faster; it occupies 460.8K gates and processes 4K UHD video at 30 fps.
Chapter 1 Introduction
1.1 Research Background
1.2 Research Contents
1.3 Common Experimental Environment
1.4 Organization of the Thesis
Chapter 2 Related Work
2.1 The HEVC Standard
2.1.1 Quad-tree-based Hierarchical Block Structure
2.1.2 Inter Prediction in HEVC
2.2 Previous Work on Accelerating Inter Prediction
2.2.1 Fast Integer Motion Estimation Algorithms
2.2.2 Fast Merge Mode Estimation Algorithms
2.3 Previous Work on Inter-Prediction Hardware Architectures
2.3.1 Hardware-based Integer Motion Estimation
2.3.2 Hardware-based Merge Mode Estimation
Chapter 3 Bottom-up Integer Motion Estimation
3.1 Observation of Motion-Vector Relations between Hierarchy Levels
3.1.1 Analysis of Motion-Vector Relations between Hierarchy Levels
3.1.2 Analysis of Motion-Vector Relations in the Top-down and Bottom-up Directions
3.2 Bottom-up Motion Vector Prediction
3.3 Bottom-up Integer Motion Estimation
3.3.1 Bottom-up Integer Motion Estimation - Single MVP
3.3.2 Bottom-up Integer Motion Estimation - Multiple MVP
3.4 Experimental Results
Chapter 4 Hardware-based Integer Motion Estimation
4.1 Applying Bottom-up Integer Motion Estimation in Hardware
4.2 Modified Test Zone Search for Hardware
4.2.1 Parallel Processing of PUs within a CU Using a SAD Tree
4.2.2 Grid-based Sampled Raster Search
4.2.3 Elimination of Redundant Computation between Different PUs
4.3 Five-stage Pipeline Schedule with Reduced Idle Cycles
4.3.1 Operation of Each Pipeline Stage
4.3.2 Idle Cycles Caused by Dependencies in the Test Zone Search
4.3.3 Idle-Cycle Reduction through Context Switching
4.4 Reference-Data Supply for High-Speed Operation
4.4.1 Reference-Data Access Patterns and Problems Caused by Access Latency
4.4.2 Reference-Data Access Exploiting the Locality of Search Points
4.4.3 Multi-Bank Memory Architecture for Single-Cycle Reference-Data Access
4.4.4 Reducing Switching Complexity through Modified Control of Reference-Data Access
4.5 Hardware Architecture
4.5.1 Overall Hardware Architecture
4.5.2 Detailed Hardware Schedule
4.6 Hardware Implementation and Experimental Results
4.6.1 Hardware Implementation Results
4.6.2 Execution Time and Compression Efficiency
4.6.3 Performance Changes at Each Stage of Applying the Proposed Methods
4.6.4 Comparison with Previous Work
Chapter 5 Hardware-based Merge Mode Estimation
5.1 A Hardware Perspective on Conventional Merge Mode Estimation
5.1.1 Conventional Merge Mode Estimation
5.1.2 Conventional Merge Mode Estimation Hardware Architectures and Their Analysis
5.1.3 The Low-Hardware-Utilization Problem of Conventional Merge Mode Estimation
5.2 A New Merge Mode Estimation with Reduced Variation in Computational Load
5.3 Hardware Implementation of the New Merge Mode Estimation
5.3.1 Hardware Architecture with an Independent Path per Candidate Type
5.3.2 Adaptive Candidate Allocation for Higher Hardware Utilization
5.3.3 Hardware Schedule with Adaptive Candidate Allocation
5.4 Experimental and Hardware Implementation Results
5.4.1 Changes in Execution Time and Compression Efficiency
5.4.2 Hardware Implementation Results
Chapter 6 Overall Inter Prediction
6.1 Three-stage Pipelined Inter Prediction at the CTU Level
6.2 Two-way Encoding Order
6.2.1 Top-down and Bottom-up Encoding Orders
6.2.2 A Two-way Encoding Order Compatible with Existing Fast Algorithms
6.2.3 Combination with Existing Fast Algorithms and Comparative Results
Chapter 7 Extension to Next-Generation Video Coding
7.1 Extension of Bottom-up Motion Vector Prediction
7.2 Extension of Bottom-up Integer Motion Estimation
Chapter 8 Conclusion
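The bottom-up MV prediction at the core of this dissertation predicts a block's MV from hierarchically adjacent PUs rather than from spatial or temporal neighbours. A minimal sketch, assuming integer MVs and a per-component median combination of the four child blocks (the combination rule is illustrative, not the dissertation's exact formula):

```python
def bottom_up_mvp(child_mvs):
    """Predict a parent block's MV from its four child blocks' MVs.

    Because motion estimation at the smaller depth has already run,
    the children's MVs are a far more accurate predictor for the
    parent than spatially neighbouring blocks would be.
    """
    def median4(vals):
        s = sorted(vals)
        return (s[1] + s[2]) // 2  # average of the middle pair

    return (median4([mv[0] for mv in child_mvs]),
            median4([mv[1] for mv in child_mvs]))
```

Note how a single outlier child (e.g. one covering a different object) does not pull the prediction away from the majority motion.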
Error control strategies in H.265|HEVC video transmission
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. With the rapid development of video coding technologies in the last decade, high-resolution video delivery suffers from packet loss due to unreliable transmission channels with time-varying characteristics. Error-resilience approaches at the channel-coding level are less efficient to implement in real-time video transmission because the encoded video samples have variable code lengths. Therefore, error resilience in the video coding standard plays a vital role in reducing the effect of error propagation and improving the perceived visual quality. The main work in this thesis is to develop an efficient error-resilience mechanism for the H.265|HEVC video coding standard that reduces the effects of error propagation in error-prone conditions. Two error-resilience algorithms are proposed. The first is the Adaptive Slice Encoding (ASE) algorithm, whose concept is to extract and protect the most active slices in the coded bitstream based on an adaptive search window. This algorithm can be applied in low-delay video transmission with or without a feedback channel, and is designed to be compatible with the reference software (HM16) for the H.265|HEVC coding standard. The second is a joint encoder-decoder algorithm called Error Resilience based on Supplemental Enhancement Information (ERSEI). A feedback message carrying the decoded-picture-hash status notifies the encoder to adaptively start encoding a clean random-access picture; at the same time, the decoder starts the error-concealment process while waiting to receive correct video data. A recovery-point message on the decoder feedback channel updates the encoder with the error status.
In this thesis, extensive experimental work, evaluation, and comparison with state-of-the-art algorithms have been conducted to assess the proposed algorithms. Furthermore, the best trade-off between the coding efficiency of the proposed error-resilience algorithms and their error-resilience performance was considered at the design stage. The evaluation covers both encoding conditions, error-free and error-prone. The experimental results show significant improvements in Y-PSNR and in the subjective quality of the decoded bitstream when the proposed algorithms are used in error-prone conditions with a variety of packet loss rates.
Moreover, experimental work was conducted to test the algorithms' complexity in terms of the processing execution time required at both the encoding and decoding stages. Additionally, the performance of both the H.264|AVC and H.265|HEVC coding standards is evaluated in error-free and error-prone environments.
For the ASE algorithm, compared with the improved region of interest (IROI) and region of interest (ROI) algorithms, a significant improvement in visual quality was the most obvious finding, with packet loss rates (PLRs) of 2-18%.
For the ERSEI algorithm, compared with the default HM16 with pixel-copy concealment and motion-compensated error concealment (MCEC), the evaluation results indicate a clear visual-quality enhancement under different packet loss rates (1, 2, 6, 8)%.
Funding: The Ministry of Higher Education and Scientific Research in Iraq.
Error resilience and concealment techniques for high-efficiency video coding
This thesis investigates the problem of robust coding and error concealment in High Efficiency Video Coding (HEVC). After a review of the current state of the art, a simulation study of error robustness revealed that HEVC has weak protection against network losses, with a significant impact on video-quality degradation. Based on this evidence, the first contribution of this work is a new method that reduces the temporal dependencies between motion vectors, improving the decoded video quality without compromising compression efficiency. The second contribution is a two-stage approach for reducing the mismatch of temporal predictions when video streams are received with errors or lost data. At the encoding stage, the reference pictures are dynamically distributed based on a constrained Lagrangian rate-distortion optimisation to reduce the number of predictions from a single reference. At the streaming stage, a prioritisation algorithm based on spatial dependencies selects a reduced set of motion vectors to be transmitted as side information, to reduce mismatched motion predictions at the decoder. The problem of error-concealment-aware video coding is also investigated to enhance overall error robustness. A new approach based on scalable coding and optimal error-concealment selection is proposed, where the optimal error-concealment modes are found by simulating transmission losses, followed by a saliency-weighted optimisation. Moreover, recovery residual information is encoded using a rate-controlled enhancement layer. Both are transmitted to the decoder for use in case of data loss. Finally, an adaptive error-resilience scheme is proposed to dynamically predict the video stream that achieves the highest decoded quality for a particular loss case.
A neural network selects among the various video streams, encoded with different levels of compression efficiency and error protection, based on information from the video signal, the coded stream, and the transmission network. Overall, the new robust video coding methods investigated in this thesis yield consistent quality gains over existing methods, including those implemented in the HEVC reference software. Furthermore, the trade-off between coding efficiency and error robustness is also better with the proposed methods.
Visual Saliency Estimation Via HEVC Bitstream Analysis
Since information technology began to develop dramatically in the 1950s, digital images and video have become ubiquitous. In the last decade, image and video processing has become more and more popular in biomedical, industrial, artistic, and other fields, and progress has been made in the display, storage, and transmission of visual information such as images and video. The attendant problem is that video-processing tasks in the time domain have become particularly arduous.
Based on a study of existing compressed-domain video-saliency detection models, a new saliency-estimation model for video based on High Efficiency Video Coding (HEVC) is presented. First, the relevant features are extracted from the HEVC-encoded bitstream. A naive Bayesian model is used to train and test the features against the original YUV videos and ground truth. The intra-frame saliency map is obtained after training and testing the intra features, and inter-frame saliency is obtained by combining intra saliency with motion vectors. The ROC score of the proposed intra mode is 0.9561. Other classification methods, such as the support vector machine (SVM), k-nearest neighbours (KNN), and the decision tree, are evaluated for comparison. The effect of varying the compression ratio on the saliency has also been analysed.
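The training/testing pipeline can be illustrated with a minimal Gaussian naive Bayes over made-up two-dimensional feature vectors (the feature values and labels below are placeholders, not the paper's actual bitstream features):

```python
import math
from collections import defaultdict


class GaussianNB:
    """Minimal Gaussian naive Bayes: per class, fit a mean and variance
    for each feature; predict by maximum log-posterior."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        self.stats = {}
        for cls, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            varis = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
            self.stats[cls] = (math.log(len(rows) / len(X)), means, varis)
        return self

    def predict(self, x):
        def log_post(cls):
            prior, means, varis = self.stats[cls]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, varis))
        return max(self.stats, key=log_post)
```

Features extracted per block from the bitstream (e.g. bit count, MV magnitude) would play the role of the input vectors, with ground-truth saliency as the label.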
The impact of Tiles on video coding performance: a case study on HEVC and AV1 video coding standards
Receiver-Driven Video Adaptation
In the span of a single generation, video technology has made an incredible impact on daily life. Modern use cases for video are wildly diverse, including teleconferencing, live streaming, virtual reality, home entertainment, social networking, surveillance, body cameras, cloud gaming, and autonomous driving. As these applications continue to grow more sophisticated and heterogeneous, a single representation of video data can no longer satisfy all receivers. Instead, the initial encoding must be adapted to each receiver's unique needs. Existing adaptation strategies are fundamentally flawed, however, because they discard the video's initial representation and force the content to be re-encoded from scratch. This process is computationally expensive, does not scale well with the number of videos produced, and throws away important information embedded in the initial encoding. Therefore, a compelling need exists for the development of new strategies that can adapt video content without fully re-encoding it. To better support the unique needs of smart receivers, diverse displays, and advanced applications, general-use video systems should produce and offer receivers a more flexible compressed representation that supports top-down adaptation strategies from an original, compressed-domain ground truth. This dissertation proposes an alternate model for video adaptation that addresses these challenges. The key idea is to treat the initial compressed representation of a video as the ground truth, and allow receivers to drive adaptation by dynamically selecting which subsets of the captured data to receive. In support of this model, three strategies for top-down, receiver-driven adaptation are proposed. First, a novel, content-agnostic entropy coding technique is implemented in which symbols are selectively dropped from an input abstract symbol stream based on their estimated probability distributions to hit a target bit rate. 
Receivers are able to guide the symbol dropping process by supplying the encoder with an appropriate rate controller algorithm that fits their application needs and available bandwidths. Next, a domain-specific adaptation strategy is implemented for H.265/HEVC coded video in which the prediction data from the original source is reused directly in the adapted stream, but the residual data is recomputed as directed by the receiver. By tracking the changes made to the residual, the encoder can compensate for decoder drift to achieve near-optimal rate-distortion performance. Finally, a fully receiver-driven strategy is proposed in which the syntax elements of a pre-coded video are cataloged and exposed directly to clients through an HTTP API. Instead of requesting the entire stream at once, clients identify the exact syntax elements they wish to receive using a carefully designed query language. Although an implementation of this concept is not provided, an initial analysis shows that such a system could save bandwidth and computation when used by certain targeted applications.
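The content-agnostic symbol-dropping idea can be caricatured as follows (the threshold policy and probabilities are invented; in the described system, symbols are dropped inside the entropy coder under a receiver-supplied rate controller):

```python
import math


def drop_symbols(stream, probs, bit_budget):
    """Thin a symbol stream to fit a bit budget by dropping the most
    expensive (least probable) symbols first.

    Each symbol's ideal code length is -log2(p(symbol)); the function
    finds the largest cost threshold such that the kept symbols, in
    their original order, fit within the budget.
    """
    def cost(s):
        return -math.log2(probs[s])

    # Try thresholds from most permissive to most aggressive.
    for thr in sorted({cost(s) for s in stream}, reverse=True):
        kept = [s for s in stream if cost(s) <= thr]
        if sum(cost(s) for s in kept) <= bit_budget:
            return kept
    return []
```

A receiver's rate controller would effectively choose the budget (and hence the threshold) to match its available bandwidth.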