
    Improvement and Implementation of a Rate Control Algorithm for HEVC Hardware Using Full RDO

    Thesis (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2015. Advisor: Soo-Ik Chae.

Applying the coding tree unit (CTU)-level rate control of the HM encoder degrades coding efficiency compared with encoding without rate control. The Bjøntegaard-delta rate (BD rate) rises by about 4.14%. The HM encoder also implements its rate control algorithm in floating point, which makes it unsuitable for hardware (HW) implementation. This dissertation therefore first describes how the coding efficiency of the rate control algorithm in the HM encoder, the HEVC reference software, was improved. It then explains the changes that make the algorithm suitable for HW and describes the HW implementation of the modified algorithm. The contributions are:
- an improved picture-level bit allocation method,
- the use of a full RD cost suited to HW implementation,
- a logarithmic log R-log λ model,
- a HW implementation of the improved rate control algorithm.

For some image sequences, the picture-level bit allocation of the HM rate control runs short of bits toward the end of the sequence. Picture peak signal-to-noise ratio (PSNR) then drops sharply. To mitigate this, a modified picture-level bit allocation algorithm is proposed. It allocates slightly fewer target bits at the beginning of the sequence so that more bits remain for its latter part. The encoder is also assumed to perform rate distortion optimization (RDO) with the full RD cost in the pipeline stage for transform, full RDO and reconstruction. Two techniques were adopted to implement the full RD cost computation of this stage in HW.
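The abstract states only the idea behind the modified picture-level allocation: spend slightly less early so that bits are left for the end. It gives no formula. As a hedged illustration, the Python sketch below uses a smoothing-window GOP target of the kind found in the R-λ rate control of the HM encoder, with a per-picture budget R/f and an assumed window of 40 pictures. It then adds a hypothetical `reserve` factor that withholds a fading fraction of each early target. `gop_target_bits`, `simulate`, `reserve` and `SMOOTH_WINDOW` are illustrative names. This is not the dissertation's bit-saving algorithm.

```python
SMOOTH_WINDOW = 40  # assumed smoothing-window length, in pictures


def gop_target_bits(target_bps, fps, gop_size, n_coded, bits_coded,
                    n_total, reserve=0.0):
    """Target bits for the next GOP (sketch).

    The base term aims to be back on budget within SMOOTH_WINDOW pictures.
    `reserve` is hypothetical: it withholds a fraction of the target that
    shrinks linearly to zero as the sequence progresses.
    """
    avg_pic = target_bps / fps  # average bits per picture
    base = (avg_pic * (n_coded + SMOOTH_WINDOW) - bits_coded) * gop_size / SMOOTH_WINDOW
    progress = n_coded / n_total
    return base * (1.0 - reserve * (1.0 - progress))


def simulate(target_bps, fps, gop_size, n_total, reserve):
    """Per-GOP targets, assuming an idealized encoder that spends exactly its target."""
    n_coded, bits_coded, targets = 0, 0.0, []
    while n_coded < n_total:
        t = gop_target_bits(target_bps, fps, gop_size, n_coded, bits_coded,
                            n_total, reserve)
        targets.append(t)
        n_coded += gop_size
        bits_coded += t
    return targets


flat = simulate(1_000_000, 30, 8, 80, reserve=0.0)    # every GOP gets 8 * R/f
tilted = simulate(1_000_000, 30, 8, 80, reserve=0.1)  # fewer bits early, more late
```

The window term compares the bits already spent with the running budget. Anything withheld early therefore flows back into later GOP targets automatically. A real encoder would also clamp the targets and would not hit them exactly.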
์ฒซ์งธ๋กœ rate control์˜ ์ฝ”๋”ฉ ํšจ์œจ์„ ๋†’์ด๊ธฐ ์œ„ํ•ด์„œ, CTU๋ณ„ ฮป๊ฐ€ ์•„๋‹Œ picture์˜ ํ‰๊ท  ฮป๋ฅผ ์ด์šฉํ•˜์—ฌ ์ธ์ฝ”๋”ฉ์„ ์ˆ˜ํ–‰ํ•˜์˜€๋‹ค. ๋‘˜์งธ๋กœ full RD cost ๊ณ„์‚ฐ์˜ HW ๋ณต์žก๋„๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•ด์„œ quantization step size (Qstep)์˜ ์ œ๊ณฑ์œผ๋กœ ๋‚˜๋ˆˆ normalized full RD cost๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ full RD cost์˜ dynamic range๋ฅผ ํฌ๊ฒŒ ์ค„์˜€๋‹ค. HM ์ธ์ฝ”๋”์—์„œ rate control์˜ R-ฮป model์€ floating point๋กœ ๊ตฌํ˜„์ด ๋˜์–ด ์žˆ๊ณ  ์ง€์ˆ˜ ์—ฐ์‚ฐ์„ ์ด์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— HW ๊ตฌํ˜„์— ์ ํ•ฉํ•˜์ง€ ์•Š๋‹ค. ๊ทธ๋ž˜์„œ R-ฮป model์„, ์„ ํ˜• ์—ฐ์‚ฐ์„ ์ด์šฉํ•  ์ˆ˜ ์žˆ๊ณ  HW ๊ตฌํ˜„์— ์ ํ•ฉํ•˜๋„๋ก, log๋ฅผ ์ทจํ•˜์—ฌ log R-log ฮป model๋กœ ๋ณ€ํ˜•ํ•˜์˜€๋‹ค. HM ์ธ์ฝ”๋”์—์„œ ์‚ฌ์šฉํ•˜๋Š” R-D model์ธ hyperbolic model์˜ parameter updateํ•  ๋•Œ log๋ฅผ ์ทจํ•œ model parameter์˜ update์˜ ๊ทผ์‚ฌ์ด๊ธฐ ๋•Œ๋ฌธ์— log R-log ฮป model์„ ์ด์šฉํ•˜์˜€์„ ๋•Œ ์ฝ”๋”ฉ ํšจ์œจ์ด ์˜คํžˆ๋ ค ์กฐ๊ธˆ ์ข‹์•„์กŒ๋‹ค. ๊ทธ๋ฆฌ๊ณ  rate๊ณผ ๊ด€๋ จ๋œ ๋ณ€์ˆ˜๋“ค์˜ log domain๊ณผ real domain์—์„œ์˜ ๊ฐ’ ๋ณ€ํ™˜์„ ์œ„ํ•ด์„œ look-up table (LUT)์„ ์ด์šฉํ•œ log2์™€ anti-log2๋ฅผ ๊ตฌํ˜„ํ•˜์˜€๋‹ค. ๋˜ํ•œ ๋‚˜๋ˆ—์…ˆ ์—ฐ์‚ฐ๋„ LUT์„ ์ด์šฉํ•˜์—ฌ HW์˜ ๋ณต์žก๋„๋ฅผ ์ค„์—ฌ ๊ตฌํ˜„ํ•˜์˜€๋‹ค. ์ œ์•ˆํ•˜๋Š” rate control ๋ฐฉ๋ฒ•์˜ ํšจ์šฉ์„ฑ์„ 5๊ฐœ์˜ 1080p ์ด๋ฏธ์ง€ ์‹œํ€€์Šค Kimono, ParkScene, Cactus, BasketballDrive, BQTerrace์— ๋Œ€ํ•˜์—ฌ ์ธ์ฝ”๋”ฉ ๊ฒฐ๊ณผ๋กœ ํŒ๋‹จํ–ˆ๋‹ค. ์ธ์ฝ”๋”ฉ ํ™˜๊ฒฝ์€ common test condition์˜ random access (RA) configuration์œผ๋กœ TU split์„ ์ง€์›ํ•˜์ง€ ์•Š๋„๋ก ํ•˜์—ฌ maximum TU depth๋ฅผ 1๋กœ ์„ค์ •ํ•˜์˜€๋‹ค. Rate control์˜ target rate์€ rate control์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ  QP 22, 27, 32, 37๋กœ ์ธ์ฝ”๋”ฉํ•œ ๊ฒฝ์šฐ์— ๋ฐœ์ƒํ•œ rate๋“ค๋กœ ์ •ํ•˜์˜€๋‹ค. ์ด ์กฐ๊ฑด์—์„œ ๊ฐœ์„ ํ•œ rate control ์•Œ๊ณ ๋ฆฌ๋“ฌ์€ HM ์ธ์ฝ”๋”์— ์ ์šฉ๋œ CTU-level rate control์˜ Y-BD rate 4.14 %๋ฅผ 1.99 %๋กœ ๊ฐ์†Œ์‹œํ‚จ๋‹ค. 
๊ทธ๋ฆฌ๊ณ  ํ›„๋ฐ˜์— PSNR์ด ๋–จ์–ด์ง€๋Š” ํ˜„์ƒ์„ ์ค„์—ฌ์„œ minimum PSNR์„ ํ‰๊ท  0.11 dB ํ–ฅ์ƒ ์‹œ์ผฐ๊ณ  ํŠนํžˆ ParkScene ์ด๋ฏธ์ง€ ์‹œํ€€์Šค์—์„œ๋Š” ์ตœ๋Œ€ 1.58 dB๊นŒ์ง€ ํ–ฅ์ƒ์‹œ์ผฐ๋‹ค. ์ œ์•ˆํ•œ rate control algorithm์„ HW๋กœ GOP, picture, CTU level์„ ๋ชจ๋‘ ์ง€์›ํ•˜๋„๋ก ๊ตฌํ˜„ํ–ˆ๋Š”๋ฐ, ๊ทธ ์ „์ฒด ๋ณต์žก๋„๋Š” 27.5 kgate์ด๊ณ  ์ถ”๊ฐ€๋กœ 32 KB์˜ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํ•„์š”ํ•˜๋‹ค. Rate control์˜ ์ˆ˜ํ–‰์— ํ•„์š”ํ•œ cycle budget์€ CTU๋‹น 4 cycle๋กœ 4K 30 fps๋ฅผ 400 MHz์— ์ˆ˜ํ–‰ํ•œ๋‹ค๊ณ  ํ•˜์˜€์„ ๊ฒฝ์šฐ์— 0.06 %์˜ overhead์— ํ•ด๋‹นํ•˜๋ฉฐ ์ „์ฒด ์ธ์ฝ”๋”ฉ ๊ณผ์ •์˜ ์˜ํ–ฅ์„ ๊ฑฐ์˜ ์ฃผ์ง€ ์•Š๋Š” ์ˆ˜์ค€์ด๋‹ค.์ œ 1 ์žฅ ์„œ ๋ก  1 1.1 ์—ฐ๊ตฌ์˜ ๋ฐฐ๊ฒฝ 2 1.2 ๊ด€๋ จ ์—ฐ๊ตฌ 7 1.3 ์ „์ฒด ๋…ผ๋ฌธ์˜ ๊ตฌ์„ฑ 12 ์ œ 2 ์žฅ HEVC HW ์ธ์ฝ”๋”์˜ pipeline ๊ตฌ์„ฑ 13 2.1 ๊ฐ€์ •ํ•˜๋Š” HEVC HW ์ธ์ฝ”๋”์˜ pipeline ๊ตฌ์„ฑ 13 2.2 ๊ฐ€์ •ํ•˜๋Š” HW ๊ตฌ์กฐ์˜ ์ฝ”๋”ฉ ํšจ์œจ ์ €ํ•˜ 17 2.3 Full RD cost ์˜ˆ์ธก๊ธฐ HW ๊ตฌํ˜„์˜ ๊ฐœ์š” 20 ์ œ 3 ์žฅ HEVC์˜ CTU-level Rate control์˜ ์•Œ๊ณ ๋ฆฌ๋“ฌ ์„ค๋ช… 23 3.1 HM ์ธ์ฝ”๋”์˜ CTU-level rate control ์ „์ฒด ๊ณผ์ • 23 3.2 Target bit allocation 25 3.3 ฮป and QP calculation 32 3.4 Encoding 35 3.5 Model parameter update 35 ์ œ 4 ์žฅ HEVC์˜ CTU-level Rate control์˜ ์•Œ๊ณ ๋ฆฌ๋“ฌ์˜ ์ฝ”๋”ฉ ํšจ์œจ ๊ฐœ์„  39 4.1 Rate control์˜ ์‹คํ—˜ ํ™˜๊ฒฝ๊ณผ HM ์ธ์ฝ”๋”์˜ ์‹คํ—˜ ๊ฒฐ๊ณผ 39 4.2 Bit saving์„ ์ด์šฉํ•œ Picture-level bit allocation 43 4.3 Picture์˜ ํ‰๊ท  ฮป๋ฅผ ์ด์šฉํ•œ rate control์˜ ์ฝ”๋”ฉ ํšจ์œจ ๊ฐœ์„  50 4.4 Full RDO์—์„œ ์ด์šฉํ•˜๋Š” normalized RD cost 51 ์ œ 5 ์žฅ HEVC์˜ CTU-level Rate control์˜ HW ๊ตฌํ˜„ 57 5.1 HW ๊ตฌํ˜„์„ ์œ„ํ•œ log๋ฅผ ์ทจํ•œ log R-log ฮป model 58 5.2 HW ๊ตฌํ˜„์„ ์œ„ํ•œ GOP์˜ picture๋ณ„ target rate ๊ณ„์‚ฐ ๋ฐฉ๋ฒ• 62 5.3 HW ๊ตฌํ˜„์„ ์œ„ํ•œ model parameter update 68 5.4 Rate control์˜ HW ๊ตฌํ˜„์„ ์œ„ํ•œ fixed point ์—ฐ์‚ฐ๊ณผ LUT ์‚ฌ์šฉ 70 5.5 Rate control์˜ HW ๊ตฌํ˜„๊ณผ HW ์ธ์ฝ”๋”์—์„œ์˜ rate control์˜ ๋™์ž‘ 75 ์ œ 6 ์žฅ ๊ฒฐ ๋ก  81 ์ฐธ๊ณ  ๋ฌธํ—Œ 83 Abstract 87Docto

    Design of microprocessor-based hardware for number theoretic transform implementation

    Number Theoretic Transforms (NTTs) are defined in a finite ring of integers Z_M, where M is the modulus, and all arithmetic operations are carried out modulo M. NTTs are similar in structure to DFTs, so fast FFT-type algorithms can compute them efficiently. A major advantage of the NTT is that it can compute error-free convolutions: unlike the FFT, it is not subject to round-off and truncation errors. In 1976 Winograd proposed a set of short-length DFT algorithms. They use fewer multiplications than the Cooley-Tukey FFT algorithm and about the same number of additions, at the expense of greater algorithmic complexity. These short-length algorithms can be combined to perform longer transforms. The Winograd Fourier Transform Algorithm (WFTA) was implemented on a TMS9900 microprocessor to compute NTTs. Multiplication modulo M is very time-consuming, so a special-purpose external hardware modular multiplier was designed, built and interfaced with the TMS9900. This reduced the transform execution time. Using several microprocessors can reduce computation time further. To exploit the inherent parallelism of the WFTA, a dedicated parallel microprocessor system was designed and built to implement a 15-point WFTA in parallel. Benchmark programs were written to choose a suitable microprocessor for this system. A master (host) microprocessor controls the parallel microprocessor system and provides an interface to the outside world. An analogue-to-digital (A/D) and a digital-to-analogue (D/A) converter allow real-time digital signal processing
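A 15-point NTT built from 3- and 5-point modules illustrates two points from the abstract: NTT arithmetic is exact modulo M, and short transforms combine into longer ones. The sketch below assumes the prime modulus M = 241; since 240 is divisible by 15, a 15th root of unity exists. It combines the modules with the Good-Thomas prime-factor index mapping, the indexing on which the WFTA is built. The 3- and 5-point modules here are plain direct sums, not Winograd's reduced-multiplication algorithms. All function names are illustrative.

```python
M = 241           # assumed prime modulus; 15 divides M - 1 = 240
N1, N2 = 3, 5     # coprime factors of the transform length
N = N1 * N2       # 15-point transform


def prime_factors(n):
    """Distinct prime factors of n (trial division)."""
    ps, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            ps.add(d)
            n //= d
        d += 1
    if n > 1:
        ps.add(n)
    return ps


def root_of_unity(n, m):
    """An element of multiplicative order exactly n modulo the prime m."""
    for g in range(2, m):
        w = pow(g, (m - 1) // n, m)  # w**n == 1 by Fermat's little theorem
        if all(pow(w, n // p, m) != 1 for p in prime_factors(n)):
            return w
    raise ValueError("no root of unity of that order")


def dft_mod(x, w, m):
    """Direct transform over Z_m with root w (short module and reference)."""
    n = len(x)
    return [sum(x[j] * pow(w, j * k, m) for j in range(n)) % m for k in range(n)]


def ntt15_pfa(x, w, m=M):
    """15-point NTT from 3- and 5-point modules via Good-Thomas indexing (no twiddles)."""
    # Input map n = (N2*n1 + N1*n2) mod N arranges the data as a 3 x 5 array.
    a = [[x[(N2 * n1 + N1 * n2) % N] for n2 in range(N2)] for n1 in range(N1)]
    w3, w5 = pow(w, N2, m), pow(w, N1, m)  # roots of order 3 and 5
    # 3-point transforms down each column, then 5-point transforms along each row.
    cols = [dft_mod([a[n1][n2] for n1 in range(N1)], w3, m) for n2 in range(N2)]
    rows = [dft_mod([cols[n2][k1] for n2 in range(N2)], w5, m) for k1 in range(N1)]
    # CRT output map: e1 = 1 (mod 3), 0 (mod 5); e2 = 0 (mod 3), 1 (mod 5).
    e1, e2 = N2 * pow(N2, -1, N1), N1 * pow(N1, -1, N2)
    X = [0] * N
    for k1 in range(N1):
        for k2 in range(N2):
            X[(e1 * k1 + e2 * k2) % N] = rows[k1][k2]
    return X


def cyclic_convolution_ntt(x, h, w, m=M):
    """Exact 15-point cyclic convolution, valid while every true output is < m."""
    Y = [(p * q) % m for p, q in zip(ntt15_pfa(x, w, m), ntt15_pfa(h, w, m))]
    n_inv = pow(N, -1, m)
    # Inverse NTT: the same transform with w^-1, then scaling by N^-1 mod m.
    return [(v * n_inv) % m for v in ntt15_pfa(Y, pow(w, -1, m), m)]
```

The convolution is exact only while every true output lies in [0, M). Practical NTT designs therefore choose much larger moduli, such as Fermat numbers, to obtain enough dynamic range. The code needs Python 3.8 or later for modular inverses via `pow(x, -1, m)`.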