216 research outputs found

    ํŠน์ง• ํ˜ผํ•ฉ ๋„คํŠธ์›Œํฌ๋ฅผ ์ด์šฉํ•œ ์˜์ƒ ์ •ํ•ฉ ๊ธฐ๋ฒ•๊ณผ ๊ณ  ๋ช…์•”๋น„ ์˜์ƒ๋ฒ• ๋ฐ ๋น„๋””์˜ค ๊ณ  ํ•ด์ƒํ™”์—์„œ์˜ ์‘์šฉ

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2020. 8. ์กฐ๋‚จ์ต.This dissertation presents a deep end-to-end network for high dynamic range (HDR) imaging of dynamic scenes with background and foreground motions. Generating an HDR image from a sequence of multi-exposure images is a challenging process when the images have misalignments by being taken in a dynamic situation. Hence, recent methods first align the multi-exposure images to the reference by using patch matching, optical flow, homography transformation, or attention module before the merging. In this dissertation, a deep network that synthesizes the aligned images as a result of blending the information from multi-exposure images is proposed, because explicitly aligning photos with different exposures is inherently a difficult problem. Specifically, the proposed network generates under/over-exposure images that are structurally aligned to the reference, by blending all the information from the dynamic multi-exposure images. The primary idea is that blending two images in the deep-feature-domain is effective for synthesizing multi-exposure images that are structurally aligned to the reference, resulting in better-aligned images than the pixel-domain blending or geometric transformation methods. Specifically, the proposed alignment network consists of a two-way encoder for extracting features from two images separately, several convolution layers for blending deep features, and a decoder for constructing the aligned images. The proposed network is shown to generate the aligned images with a wide range of exposure differences very well and thus can be effectively used for the HDR imaging of dynamic scenes. Moreover, by adding a simple merging network after the alignment network and training the overall system end-to-end, a performance gain compared to the recent state-of-the-art methods is obtained. This dissertation also presents a deep end-to-end network for video super-resolution (VSR) of frames with motions. To reconstruct an HR frame from a sequence of adjacent frames is a challenging process when the images have misalignments. Hence, recent methods first align the adjacent frames to the reference by using optical flow or adding spatial transformer network (STN). In this dissertation, a deep network that synthesizes the aligned frames as a result of blending the information from adjacent frames is proposed, because explicitly aligning frames is inherently a difficult problem. Specifically, the proposed network generates adjacent frames that are structurally aligned to the reference, by blending all the information from the neighbor frames. The primary idea is that blending two images in the deep-feature-domain is effective for synthesizing frames that are structurally aligned to the reference, resulting in better-aligned images than the pixel-domain blending or geometric transformation methods. Specifically, the proposed alignment network consists of a two-way encoder for extracting features from two images separately, several convolution layers for blending deep features, and a decoder for constructing the aligned images. The proposed network is shown to generate the aligned frames very well and thus can be effectively used for the VSR. Moreover, by adding a simple reconstruction network after the alignment network and training the overall system end-to-end, A performance gain compared to the recent state-of-the-art methods is obtained. 
    In addition to the individual HDR imaging and VSR networks, this dissertation presents a deep end-to-end network for joint HDR-SR of dynamic scenes with background and foreground motions. The proposed HDR imaging and VSR networks enhance the dynamic range and the resolution of images, respectively, but both can be enhanced simultaneously by a single network. To this end, a network with the same structure as the proposed VSR network is employed and is shown to reconstruct final results with both a higher dynamic range and a higher resolution. Compared with several methods assembled from existing HDR imaging and VSR networks, it produces qualitatively and quantitatively better results.

    Contents:
    1 Introduction
    2 Related Work
      2.1 High Dynamic Range Imaging
        2.1.1 Rejecting Regions with Motions
        2.1.2 Alignment Before Merging
        2.1.3 Patch-based Reconstruction
        2.1.4 Deep-learning-based Methods
        2.1.5 Single-Image HDRI
      2.2 Video Super-resolution
        2.2.1 Deep Single Image Super-resolution
        2.2.2 Deep Video Super-resolution
    3 High Dynamic Range Imaging
      3.1 Motivation
      3.2 Proposed Method
        3.2.1 Overall Pipeline
        3.2.2 Alignment Network
        3.2.3 Merging Network
        3.2.4 Integrated HDR Imaging Network
      3.3 Datasets
        3.3.1 Kalantari Dataset and Ground Truth Aligned Images
        3.3.2 Preprocessing
        3.3.3 Patch Generation
      3.4 Experimental Results
        3.4.1 Evaluation Metrics
        3.4.2 Ablation Studies
        3.4.3 Comparisons with State-of-the-Art Methods
        3.4.4 Application to the Case of More Numbers of Exposures
        3.4.5 Pre-processing for Other HDR Imaging Methods
    4 Video Super-resolution
      4.1 Motivation
      4.2 Proposed Method
        4.2.1 Overall Pipeline
        4.2.2 Alignment Network
        4.2.3 Reconstruction Network
        4.2.4 Integrated VSR Network
      4.3 Experimental Results
        4.3.1 Dataset
        4.3.2 Ablation Study
        4.3.3 Capability of DSBN for Alignment
        4.3.4 Comparisons with State-of-the-Art Methods
    5 Joint HDR and SR
      5.1 Proposed Method
        5.1.1 Feature Blending Network
        5.1.2 Joint HDR-SR Network
        5.1.3 Existing VSR Network
        5.1.4 Existing HDR Network
      5.2 Experimental Results
    6 Conclusion
    Abstract (In Korean)

    Alignment-free HDR Deghosting with Semantics Consistent Transformer

    Full text link
    High dynamic range (HDR) imaging aims to retrieve information from multiple low-dynamic-range inputs to generate a realistic output. The essence is to leverage the contextual information, including both dynamic and static semantics, for better image generation. Existing methods often focus on the spatial misalignment across input frames caused by foreground and/or camera motion, but no prior work leverages the dynamic and static context jointly and simultaneously. To address this, we propose a novel alignment-free network, the Semantics Consistent Transformer (SCTNet), with both spatial and channel attention modules. The spatial attention handles the intra-image correlation to model dynamic motion, while the channel attention enables inter-image intertwining to enhance semantic consistency across frames. In addition, we introduce a novel realistic HDR dataset with more variation in foreground objects and environmental factors, and with larger motions. Extensive comparisons on both conventional datasets and ours validate the effectiveness of our method, which achieves the best trade-off between performance and computational cost.
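    As a rough illustration of the two attention types the abstract distinguishes, here is a hedged PyTorch sketch: spatial attention over positions within an image and a channel-wise gate across feature channels. The module name, shapes, and the squeeze-and-excite-style channel gate are assumptions for illustration only; SCTNet's actual transformer blocks are more elaborate.

```python
# Sketch of spatial (intra-image) attention followed by channel gating.
import torch
import torch.nn as nn

class SpatialChannelAttention(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.spatial = nn.MultiheadAttention(ch, num_heads=4, batch_first=True)
        self.channel_fc = nn.Sequential(  # squeeze-and-excite style gating
            nn.Linear(ch, ch // 4), nn.ReLU(inplace=True),
            nn.Linear(ch // 4, ch), nn.Sigmoid(),
        )

    def forward(self, x):                 # x: (B, C, H, W) fused features
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)  # (B, H*W, C) tokens over positions
        t, _ = self.spatial(t, t, t)      # intra-image (dynamic) correlation
        x = t.transpose(1, 2).view(b, c, h, w)
        g = self.channel_fc(x.mean(dim=(2, 3)))  # per-channel consistency gate
        return x * g.view(b, c, 1, 1)

out = SpatialChannelAttention()(torch.rand(2, 64, 32, 32))
```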

    HDRfeat: A Feature-Rich Network for High Dynamic Range Image Reconstruction

    Full text link
    A major challenge for high dynamic range (HDR) image reconstruction from multi-exposed low dynamic range (LDR) images, especially of dynamic scenes, is extracting and merging the relevant contextual features so as to suppress ghosting and blurring artifacts from moving objects. To tackle this, we propose a novel network for HDR reconstruction with deep and rich feature extraction layers, including residual attention blocks with sequential channel and spatial attention. For compressing the rich features to the HDR domain, an architecture based on residual feature distillation blocks (RFDB) is adopted. In contrast to earlier deep-learning methods for HDR, these contributions shift the focus from merging/compression to feature extraction, and we demonstrate their added value with ablation experiments. We present qualitative and quantitative comparisons on a public benchmark dataset, showing that our proposed method outperforms the state of the art. Comment: 4 pages, 5 figures
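    The abstract names residual attention blocks with sequential channel and spatial attention. A minimal sketch of such a block, assuming CBAM-style gating (the paper's exact block design may differ), could look like this:

```python
# Residual block with channel attention applied before spatial attention.
import torch
import torch.nn as nn

class ResidualAttentionBlock(nn.Module):
    def __init__(self, ch=64, r=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.ca = nn.Sequential(          # channel attention via global pooling
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid(),
        )
        self.sa = nn.Sequential(          # spatial attention: per-pixel gate
            nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.body(x)
        y = y * self.ca(y)                # channel gating first ...
        y = y * self.sa(y)                # ... then spatial gating
        return x + y                      # residual connection

out = ResidualAttentionBlock()(torch.rand(1, 64, 32, 32))
```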

    LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction

    Full text link
    As demands for high-quality videos continue to rise, high-resolution and high dynamic range (HDR) imaging techniques are drawing attention. To generate an HDR video from low dynamic range (LDR) images, one of the critical steps is motion compensation between LDR frames, for which most existing works employ optical flow. However, these methods suffer from flow-estimation errors when saturation or complicated motion is present. In this paper, we propose an end-to-end HDR video composition framework that aligns LDR frames in the feature space and then merges the aligned features into an HDR frame, without relying on pixel-domain optical flow. Specifically, we propose a luminance-based alignment network for HDR (LAN-HDR) consisting of an alignment module and a hallucination module. The alignment module aligns a frame to the adjacent reference by evaluating luminance-based attention, excluding color information. The hallucination module generates sharp details, especially for areas washed out by saturation. The aligned and hallucinated features are then blended adaptively to complement each other, and finally merged to generate the HDR frame. In training, we adopt a temporal loss, in addition to frame reconstruction losses, to enhance temporal consistency and thus reduce flickering. Extensive experiments demonstrate that our method performs better than or comparably to state-of-the-art methods on several benchmarks. Comment: ICCV 2023
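    A crude sketch of luminance-based attention for alignment, following the abstract's description: attention weights are computed from luminance only (color excluded) and used to pull the neighbor's features toward the reference. LAN-HDR itself evaluates attention over learned luminance-derived features; using raw BT.709 luma directly as query/key here, and all names and shapes, are simplifying assumptions.

```python
# Aligning a neighbor frame's features to the reference via luminance attention.
import torch

def luminance(rgb):
    # ITU-R BT.709 luma; discards color information as the abstract describes
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def luminance_attention_align(ref_rgb, nbr_rgb, nbr_feat):
    """Warp nbr_feat toward the reference via attention over luminance tokens."""
    b, c, h, w = nbr_feat.shape
    q = luminance(ref_rgb).flatten(2).transpose(1, 2)    # (B, H*W, 1)
    k = luminance(nbr_rgb).flatten(2).transpose(1, 2)    # (B, H*W, 1)
    v = nbr_feat.flatten(2).transpose(1, 2)              # (B, H*W, C)
    attn = torch.softmax(q @ k.transpose(1, 2), dim=-1)  # luminance similarity
    return (attn @ v).transpose(1, 2).view(b, c, h, w)   # aligned features

aligned = luminance_attention_align(torch.rand(1, 3, 16, 16),
                                    torch.rand(1, 3, 16, 16),
                                    torch.rand(1, 8, 16, 16))
```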
    • โ€ฆ
    corecore