
    Complexity Analysis Of Next-Generation VVC Encoding and Decoding

    While the next-generation video compression standard, Versatile Video Coding (VVC), provides superior compression efficiency, its computational complexity increases dramatically. This paper thoroughly analyzes this complexity for both the encoder and decoder of VVC Test Model 6, quantifying the complexity breakdown of each coding tool and measuring the complexity and memory requirements of VVC encoding/decoding. These extensive analyses are performed on six video sequences at 720p, 1080p, and 2160p under Low-Delay (LD), Random-Access (RA), and All-Intra (AI) conditions (a total of 320 encodings/decodings). Results indicate that the VVC encoder and decoder are 5x and 1.5x more complex than HEVC in LD, and 31x and 1.8x in AI, respectively. Detailed analysis of the coding tools reveals that, on average in LD, motion estimation tools (53%), transformation and quantization (22%), and entropy coding (7%) dominate the encoding complexity. In decoding, loop filters (30%), motion compensation (20%), and entropy decoding (16%) are the most complex modules. Moreover, the memory bandwidth required for VVC encoding/decoding is measured through memory profiling and amounts to 30x and 3x that of HEVC, respectively. The reported results and insights are a guide for future research and implementations of energy-efficient VVC encoders and decoders. Comment: IEEE ICIP 202
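    The per-module percentages above are the kind of roll-up a profiler produces. As a minimal illustration (the module names and cycle counts below are made-up placeholders, not measurements from the paper), per-module profiling samples can be aggregated into such a breakdown like this:

```python
# Illustrative only: aggregate per-module profiling samples into a percentage
# breakdown of the kind reported in the abstract above. All numbers are fake.
from collections import Counter

def complexity_breakdown(samples):
    """samples: iterable of (module_name, cycles) pairs emitted by a profiler."""
    totals = Counter()
    for module, cycles in samples:
        totals[module] += cycles
    grand_total = sum(totals.values())
    return {m: 100.0 * c / grand_total for m, c in totals.items()}

if __name__ == "__main__":
    fake_profile = [("motion_estimation", 5300), ("transform_quant", 2200),
                    ("entropy_coding", 700), ("other", 1800)]
    for module, share in sorted(complexity_breakdown(fake_profile).items(),
                                key=lambda kv: -kv[1]):
        print(f"{module:>20s}: {share:5.1f}%")
```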

    An 'on-demand' Data Communication Architecture for Supplying Multiple Applications from a Single Data Source: An Industrial Application Case Study

    A key aspect of automation is the manipulation of feedback sensor data for the automated control of particular process actuators. In practice, this data can often be reused for other applications, such as the live update of a graphical user interface, a fault detection application, or a real-time business intelligence process performance engine. For this data to be reused effectively, an appropriate data communication architecture must be utilised to provide such functionality. This architecture must accommodate the dependencies of the system and sustain the required data transmission speed to ensure stability and data integrity. Such an architecture is presented in this paper, which shows how the data needs of multiple applications are satisfied from a single source of data. It also shows how the flexibility of this architecture enables the integration of additional data sources as the data dependencies grow. This research is based on the development of a fully integrated automation system for the testing of fuel controls used on civil transport aircraft engines.
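    As a rough sketch of the "single data source, multiple applications" idea (not the paper's implementation; all class and application names below are illustrative), one sensor feed can be fanned out to independent per-application queues so that, for example, a GUI updater and a fault-detection task each consume at their own pace without disturbing the control loop:

```python
# Minimal fan-out sketch: one data source, several independent consumer queues.
import queue
import threading
import time

class SensorFanOut:
    def __init__(self):
        self._subscribers = []          # one queue per consuming application
        self._lock = threading.Lock()

    def subscribe(self, maxsize=100):
        q = queue.Queue(maxsize=maxsize)
        with self._lock:
            self._subscribers.append(q)
        return q

    def publish(self, sample):
        with self._lock:
            for q in self._subscribers:
                try:
                    q.put_nowait(sample)    # never block the control loop
                except queue.Full:
                    pass                    # a slow consumer only loses its own data

def consumer(name, q, count):
    for _ in range(count):
        sample = q.get()
        print(f"{name} received {sample}")

if __name__ == "__main__":
    bus = SensorFanOut()
    gui_q, fault_q = bus.subscribe(), bus.subscribe()
    threading.Thread(target=consumer, args=("gui", gui_q, 3), daemon=True).start()
    threading.Thread(target=consumer, args=("fault-detect", fault_q, 3), daemon=True).start()
    for i in range(3):
        bus.publish({"t": time.time(), "pressure": 1.0 + 0.01 * i})
    time.sleep(0.2)                         # let the daemon consumers drain
```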

    A high performance hardware architecture for one bit transform based motion estimation

    Motion Estimation (ME) is the most computationally intensive part of video compression and video enhancement systems. One-bit transform (1BT) based ME algorithms have low computational complexity. Therefore, in this paper, we propose a high performance systolic hardware architecture for 1BT based ME. The proposed hardware performs full search ME for 4 macroblocks in parallel and is the fastest 1BT based ME hardware reported in the literature. In addition, it uses less on-chip memory than previous 1BT based ME hardware by employing a novel data reuse scheme and memory organization. The proposed hardware is implemented in Verilog HDL. It consumes 34% of the slices in a Xilinx XC2VP30-7 FPGA, works at 115 MHz in the same FPGA, and is capable of processing 50 full High Definition (1920x1080) frames per second. Therefore, it can be used in consumer electronics products that require real-time video processing or compression.
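    For readers unfamiliar with 1BT matching, the sketch below illustrates the general idea behind such algorithms (a software approximation, not the proposed systolic hardware): each pixel is reduced to a single bit by comparison against a local filter output, and the block-matching cost becomes the number of non-matching bits. The original 1BT uses a multi-band-pass kernel; a local mean is used here as a simplified stand-in, and the window size, block size, and test frames are arbitrary.

```python
# Rough software sketch of 1BT-based block matching (XOR + popcount style cost).

def one_bit_transform(frame, k=4):
    """frame: 2-D list of ints. Returns a 2-D list of 0/1 values.
    Each pixel is compared against its local mean (stand-in for the 1BT filter)."""
    h, w = len(frame), len(frame[0])
    bits = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - k), min(h, y + k + 1)
            x0, x1 = max(0, x - k), min(w, x + k + 1)
            window = [frame[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            bits[y][x] = 1 if frame[y][x] >= sum(window) / len(window) else 0
    return bits

def nnmb(cur_bits, ref_bits, bx, by, dx, dy, n=16):
    """Number of non-matching bits for an n x n block at (bx, by) displaced by (dx, dy).
    The caller must keep the displaced block inside the reference frame."""
    return sum(cur_bits[by + j][bx + i] ^ ref_bits[by + dy + j][bx + dx + i]
               for j in range(n) for i in range(n))

if __name__ == "__main__":
    import random
    random.seed(0)
    cur = [[random.randint(0, 255) for _ in range(48)] for _ in range(48)]
    ref = [[cur[y][x] for x in range(48)] for y in range(48)]   # identical frame
    cb, rb = one_bit_transform(cur), one_bit_transform(ref)
    print(nnmb(cb, rb, 16, 16, 0, 0))   # 0 non-matching bits for zero motion
```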

    High-Efficient Video Transmission for HDTV Broadcasting

    Before broadcasting a video signal, redundant data should be removed from it. This redundancy removal can be performed using video coding standards such as H.264/Advanced Video Coding (AVC) and H.265/High Efficiency Video Coding (HEVC). Although both standards deliver excellent video quality, a great deal of data is still considered redundant. The most exhaustive stage of the video encoding process is Motion Estimation (ME). The higher the resolution of the transmitted video signal, the more video data must be fetched from main memory, which increases the memory access time required for the Motion Estimation process. In this chapter, a smart ME coprocessor architecture that greatly reduces the memory access time is presented. A data reuse algorithm is used to minimize the memory access time. The discussed coprocessor effectively reuses the data of the search area to minimize the overall memory access time (I/O memory bandwidth) while fully using all resources and hardware, thereby speeding up the video broadcasting process. For a search range of 32 × 32 and a block size of 16 × 16, the architecture can perform Motion Estimation for HDTV video at 30 fps and easily outperforms many fast full-search architectures.
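    The benefit of reusing the search area can be estimated with simple arithmetic. The sketch below assumes a 16x16 block, a +/-16 search range, and row-wise reuse between horizontally adjacent blocks (a common "Level-C" style scheme, used here for illustration rather than as the chapter's exact method), and compares the pixels fetched per block with and without reuse:

```python
# Back-of-the-envelope estimate of search-area data reuse savings.
BLOCK = 16          # block size in pixels (assumption)
RANGE = 16          # search range of +/-16 pixels (assumption)

search_area = (BLOCK + 2 * RANGE) ** 2          # full window covering all candidates
new_stripe  = BLOCK * (BLOCK + 2 * RANGE)       # only the non-overlapping stripe per block

print(f"fetch per block, no reuse : {search_area} pixels")
print(f"fetch per block, reuse    : {new_stripe} pixels")
print(f"I/O bandwidth reduction   : {search_area / new_stripe:.1f}x")
```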

    Low Power Architectures for MPEG-4 AVC/H.264 Video Compression


    Hardware acceleration of the trace transform for vision applications

    Computer vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images, in their pixels and bits, serve no purpose of their own. It is only by interpreting the data and extracting higher level information that a scene can be understood. The algorithms that enable this process are often complex and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real-time processing. The Trace transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined, allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform to real-time performance, while also allowing an element of flexibility that suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a fully flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters is presented, as required by a number of Trace transform implementations. Finally, an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration.
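    To fix ideas about the computation being accelerated, the following software-only sketch evaluates a Trace transform with a single simple functional (a plain sum along each traced line, i.e. a Radon-like projection); the hardware discussed in the thesis evaluates this kind of per-line functional in parallel and with configurable functionals. The angle count, offset count, and sampling step below are arbitrary choices for illustration.

```python
# Software-only Trace transform sketch: result[angle][offset] = T(samples along line).
import math

def trace_transform(img, n_angles=45, n_offsets=32, functional=sum):
    h, w = len(img), len(img[0])
    cx, cy = w / 2.0, h / 2.0
    half_diag = math.hypot(cx, cy)
    result = []
    for a in range(n_angles):
        theta = math.pi * a / n_angles
        c, s = math.cos(theta), math.sin(theta)
        row = []
        for o in range(n_offsets):
            # signed distance of the line from the image centre
            rho = (o / (n_offsets - 1) - 0.5) * 2 * half_diag
            samples = []
            t = -half_diag
            while t <= half_diag:           # walk along the line in 1-pixel steps
                x = int(round(cx + rho * c - t * s))
                y = int(round(cy + rho * s + t * c))
                if 0 <= x < w and 0 <= y < h:
                    samples.append(img[y][x])
                t += 1.0
            row.append(functional(samples) if samples else 0)
        result.append(row)
    return result

if __name__ == "__main__":
    img = [[(x > 8) * 255 for x in range(16)] for y in range(16)]   # toy test image
    t_matrix = trace_transform(img, n_angles=8, n_offsets=8)
    print(len(t_matrix), "angles x", len(t_matrix[0]), "offsets")
```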

    Exploring Processor and Memory Architectures for Multimedia

    Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges, ranging from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. Projecting where we are heading, these issues become ever more challenging with increased mobility as well as advancements in multimedia content, such as the introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come partly from an ongoing step-up in resolution, from QVGA (320x240) to Full HD (1920x1080), a 27x increase in less than half a decade. On top of this, codec evolution (MPEG-2 to H.264/AVC) adds to the computational load. To meet these performance challenges, there have been processing and memory architecture advances (SIMD, out-of-order superscalar execution, multicore processing, and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200 MHz to 2 GHz) and on-chip memory sizes (128 KB to 2-3 MB). At the same time, requirements for mobility keep increasing, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000 mAh). This leaves a negative net result in terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations of mobile systems, an overall approach to how these systems are best utilized is needed. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal of this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements of a mobile system. The second is to propose an automated methodology for optimally mapping multimedia data and instructions onto a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming to solve a multidimensional optimization problem. Results from this work indicate that using today's most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis, lower total power consumption can be achieved while performance for multimedia applications is improved, through enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. The automated method shows high accuracy, up to 90%, in predicting multimedia memory accesses for a given architecture.
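    The memory-mapping step can be pictured with a toy version of the optimisation: assign each data object to a memory level so that capacity constraints hold and the total access cost is minimised. The thesis uses constraint programming for the full multidimensional problem; the brute-force sketch below, with made-up object sizes, access counts, and memory parameters, only shows the shape of that problem:

```python
# Toy data-to-memory mapping: minimise access energy subject to capacity limits.
from itertools import product

# (name, size in KB, accesses per frame) -- hypothetical values
objects = [("luma_refs", 96, 40000), ("mv_field", 8, 9000),
           ("coeff_buf", 32, 15000), ("bitstream", 64, 2000)]
# (name, capacity in KB, energy per access in nJ) -- hypothetical values
memories = [("L1_scratchpad", 64, 0.2), ("L2_sram", 256, 0.8), ("ext_dram", 10**6, 6.0)]

best_cost, best_map = float("inf"), None
for assign in product(range(len(memories)), repeat=len(objects)):
    used = [0] * len(memories)
    cost = 0.0
    for (name, size, accesses), m in zip(objects, assign):
        used[m] += size
        cost += accesses * memories[m][2]
    if all(used[i] <= memories[i][1] for i in range(len(memories))) and cost < best_cost:
        best_cost, best_map = cost, assign

for (name, _, _), m in zip(objects, best_map):
    print(f"{name} -> {memories[m][0]}")
print(f"total access energy: {best_cost:.0f} nJ per frame")
```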

    Hardware based High Accuracy Integer Motion Estimation and Merge Mode Estimation

    Thesis (Ph.D.) -- Department of Electrical and Computer Engineering, College of Engineering, Graduate School, Seoul National University, August 2017. Hyuk-Jae Lee.
    HEVC achieves twice the compression efficiency of H.264/AVC, but the many coding tools it employs greatly increase the computational complexity on the encoder side. Many studies have tried to reduce this high complexity, but most of them merely extend complexity-reduction methods developed for H.264/AVC; they either deliver unsatisfactory complexity reduction or incur excessive compression-efficiency loss, and thus fail to draw out the full compression performance of HEVC. In particular, previously studied hardware-based encoders prioritize real-time operation and sacrifice a great deal of compression efficiency. This work therefore proposes hardware architectures that accelerate hardware-based inter prediction, minimize the loss of HEVC's compression performance, and enable real-time coding.
    The proposed bottom-up MV prediction method predicts the MV not from spatially and temporally neighboring PUs, as conventional methods do, but from hierarchically neighboring PUs in HEVC's block structure, which greatly improves MV prediction accuracy. As a result, the computational complexity of integer motion estimation (IME) is reduced by 67% with no change in compression efficiency. In addition, this work proposes a hardware-based IME capable of real-time operation that applies the proposed bottom-up IME algorithm. Previous hardware-based IME designs either avoid fast IME algorithms or modify them to fit the hardware, because of the idle cycles caused by the stage-by-stage dependencies of fast IME algorithms and because of reference-data access problems; their compression-efficiency loss is therefore very large, at several percent or more. In contrast, this work adopts the fast Test Zone Search (TZS) algorithm and proposes a hardware-based IME that preserves the complexity-reduction performance of TZS. To use the fast IME algorithm in hardware, three techniques are proposed and applied. First, the chronic idle-cycle problem of fast IME algorithms is resolved by context switching between IME operations for different reference pictures and different depths. Second, for fast and flexible access to reference data, a multi-bank SRAM structure exploiting the locality of the reference data is proposed. Third, to avoid the large number of switching multiplexers that completely unrestricted reference-data access would require, reference-data access with limited freedom around the search center is proposed. The resulting IME hardware supports all HEVC block sizes and processes 4K UHD video at 60 fps with four reference pictures, with an almost negligible compression-efficiency loss of 0.11%, at a cost of 1.27M gates.
    Merge mode estimation (MME), newly adopted in HEVC, is highly effective at improving compression efficiency, but its computational load varies widely from PU to PU, which wastes hardware resources when it is implemented in hardware. This work therefore also proposes an efficient hardware-oriented MME method together with its hardware architecture. In the conventional MME approach, whether the interpolation filter is applied is determined by neighboring PUs, so the interpolation filter is used less than 50% of the time; nevertheless, hardware has been designed for the case in which the filter is used, resulting in low utilization of hardware resources. This work proposes an MME hardware architecture with two data paths in which the vertical interpolation filter, the most resource-hungry component, is reduced to half size, together with a merge-candidate allocation algorithm that minimizes compression-efficiency loss while maintaining high hardware utilization. The resulting hardware-based MME uses 24% fewer hardware resources than previous hardware-based MME designs while achieving a 7.4% shorter execution time. It uses 460.8K gates and processes 4K UHD video at 30 fps.
    Contents:
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Content
      1.3 Common Experimental Environment
      1.4 Thesis Organization
    Chapter 2 Related Work
      2.1 The HEVC Standard
        2.1.1 Quad-tree Based Hierarchical Block Structure
        2.1.2 Inter Prediction in HEVC
      2.2 Previous Work on Accelerating Inter Prediction
        2.2.1 Fast Integer Motion Estimation Algorithms
        2.2.2 Fast Merge Mode Estimation Algorithms
      2.3 Previous Work on Inter Prediction Hardware Architectures
        2.3.1 Hardware-based Integer Motion Estimation
        2.3.2 Hardware-based Merge Mode Estimation
    Chapter 3 Bottom-up Integer Motion Estimation
      3.1 Observation of Motion Vector Relationships between Layers
        3.1.1 Analysis of Motion Vector Relationships between Layers
        3.1.2 Analysis of Motion Vector Relationships in the Top-down and Bottom-up Directions
      3.2 Bottom-up Motion Vector Prediction
      3.3 Bottom-up Integer Motion Estimation
        3.3.1 Bottom-up Integer Motion Estimation - Single MVP
        3.3.2 Bottom-up Integer Motion Estimation - Multiple MVP
      3.4 Experimental Results
    Chapter 4 Hardware-based Integer Motion Estimation
      4.1 Hardware Application of Bottom-up Integer Motion Estimation
      4.2 Modified Test Zone Search for Hardware
        4.2.1 Parallel Processing of PUs within a CU Using a SAD Tree
        4.2.2 Grid-based Sampled Raster Search
        4.2.3 Elimination of Redundant Computation between Different PUs
      4.3 5-stage Pipeline Schedule with Reduced Idle Cycles
        4.3.1 Operation of Each Pipeline Stage
        4.3.2 Idle Cycles Introduced by Test Zone Search Dependencies
        4.3.3 Idle Cycle Reduction through Context Switching
      4.4 Reference Data Supply for High-speed Operation
        4.4.1 Reference Data Access Patterns and Problems Caused by Access Latency
        4.4.2 Reference Data Access Exploiting the Locality of Search Points
        4.4.3 Multi-bank Memory Structure for Single-cycle Reference Data Access
        4.4.4 Reducing Switching Complexity by Limiting the Freedom of Reference Data Access
      4.5 Hardware Architecture
        4.5.1 Overall Hardware Architecture
        4.5.2 Detailed Hardware Schedule
      4.6 Hardware Implementation and Experimental Results
        4.6.1 Hardware Implementation Results
        4.6.2 Execution Time and Compression Efficiency
        4.6.3 Performance Changes for Each Applied Technique
        4.6.4 Comparison with Previous Work
    Chapter 5 Hardware-based Merge Mode Estimation
      5.1 A Hardware Perspective on Conventional Merge Mode Estimation
        5.1.1 Conventional Merge Mode Estimation
        5.1.2 Conventional Merge Mode Estimation Hardware Architectures and Analysis
        5.1.3 Low Hardware Utilization in Conventional Merge Mode Estimation
      5.2 New Merge Mode Estimation with Reduced Variation in Computational Load
      5.3 Hardware Implementation of the New Merge Mode Estimation
        5.3.1 Hardware Architecture with an Independent Path per Candidate Type
        5.3.2 Adaptive Candidate Allocation for Higher Hardware Utilization
        5.3.3 Hardware Schedule with Adaptive Candidate Allocation
      5.4 Experimental and Hardware Implementation Results
        5.4.1 Changes in Execution Time and Compression Efficiency
        5.4.2 Hardware Implementation Results
    Chapter 6 Overall Inter Prediction
      6.1 CTU-level 3-stage Pipelined Inter Prediction
      6.2 Two-way Encoding Order
        6.2.1 Top-down and Bottom-up Encoding Orders
        6.2.2 Two-way Encoding Order Compatible with Existing Fast Algorithms
        6.2.3 Results of Combining and Comparing with Existing Fast Algorithms
    Chapter 7 Extension to Next-Generation Video Coding
      7.1 Extension of Bottom-up Motion Vector Prediction
      7.2 Extension of Bottom-up Integer Motion Estimation
    Chapter 8 Conclusion
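    The bottom-up MV prediction idea described in the abstract above can be pictured roughly as follows (a loose illustration, not the thesis's exact algorithm): the MVs already estimated for the sub-blocks one depth below are combined into a search-center prediction for the parent block, which then needs only a small refinement search. The median combination, refinement radius, and toy cost function below are all illustrative choices.

```python
# Loose sketch of bottom-up MV prediction plus a tiny refinement search.

def median(values):
    v = sorted(values)
    return v[len(v) // 2]

def bottom_up_mvp(child_mvs):
    """child_mvs: list of (mvx, mvy) already estimated for the sub-blocks one depth below."""
    return (median([mv[0] for mv in child_mvs]),
            median([mv[1] for mv in child_mvs]))

def refine(sad, center, radius=2):
    """Small full search of +/-radius around the predicted center.
    sad(mv) is assumed to return the matching cost of candidate mv."""
    best_mv, best_cost = center, sad(center)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = (center[0] + dx, center[1] + dy)
            cost = sad(cand)
            if cost < best_cost:
                best_mv, best_cost = cand, cost
    return best_mv

if __name__ == "__main__":
    children = [(5, -2), (6, -2), (5, -1), (7, -3)]
    center = bottom_up_mvp(children)
    mv = refine(lambda mv: abs(mv[0] - 6) + abs(mv[1] + 2), center)  # toy cost model
    print(center, mv)
```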