9 research outputs found

    Multiexposure and multifocus image fusion with multidimensional camera shake compensation

    Multiexposure image fusion algorithms enhance the perceptual quality of an image captured by sensors of limited dynamic range. This is achieved by rendering a single scene from multiple images captured at different exposure times. Similarly, multifocus image fusion is used when the limited depth of focus at a selected focus setting of a camera leaves parts of an image out of focus. The solution adopted is to fuse a number of multifocus images into an image that is focused throughout. A single algorithm that can perform both multifocus and multiexposure image fusion is proposed. In this new approach, a set of unregistered multiexposure or multifocus images is first registered, to compensate for the possible presence of camera shake, before being fused. Registration starts by identifying matching key-points in the constituent images using the scale-invariant feature transform (SIFT). The random sample consensus (RANSAC) algorithm then identifies the inliers among the SIFT key-points, removing outliers that can cause errors in the registration process. Finally, the coherent point drift algorithm is used to register the images, preparing them for the subsequent fusion stage. For the fusion of images, a new approach based on an improved version of a wavelet-based contourlet transform is used. The experimental results and the detailed analysis presented show that the proposed algorithm is capable of producing high-dynamic-range (HDR) or multifocus images by registering and fusing a set of multiexposure or multifocus images taken in the presence of camera shake. Further, a comparison of the performance of the proposed algorithm with a number of state-of-the-art algorithms and commercial software packages is provided. In particular, our literature review has revealed that this is one of the first attempts where the compensation of camera shake, a very likely practical problem in HDR image capture using handheld devices, has been addressed as part of a multifocus and multiexposure image enhancement system. © 2013 Society of Photo-Optical Instrumentation Engineers (SPIE)
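
The outlier-rejection idea in the abstract above can be illustrated with a minimal sketch. It assumes that the camera shake is a pure 2-D translation and that key-point matches are already available as coordinate pairs. The function name `ransac_translation`, the pixel tolerance, and the synthetic matches are illustrative assumptions, not taken from the paper; SIFT extraction and coherent point drift are not implemented here.

```python
import random


def ransac_translation(matches, tol=2.0, iterations=200, seed=0):
    """Estimate a 2-D shift (dx, dy) from matches [((x1, y1), (x2, y2)), ...].

    Each iteration hypothesises the shift from one randomly chosen match and
    collects the matches that agree with it within `tol` pixels (the inliers).
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        inliers = [
            (a, b) for a, b in matches
            if abs((b[0] - a[0]) - dx) <= tol and abs((b[1] - a[1]) - dy) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the estimate by averaging over the consensus set only.
    n = len(best_inliers)
    dx = sum(b[0] - a[0] for a, b in best_inliers) / n
    dy = sum(b[1] - a[1] for a, b in best_inliers) / n
    return (dx, dy), best_inliers


# Synthetic matches (assumed data): true shift (5, -3) plus two gross mismatches.
points = [(10, 10), (40, 25), (70, 80), (15, 60), (90, 30)]
good = [((x, y), (x + 5, y - 3)) for x, y in points]
bad = [((20, 20), (90, 5)), ((50, 50), (3, 77))]
shift, inliers = ransac_translation(good + bad)
print(shift, len(inliers))
```

A full registration would fit a richer motion model (for example a homography or a non-rigid warp) to the inliers; the translation model is used here only to keep the consensus step visible.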

    Image alignment using a feature blending network and its application to high dynamic range imaging and video super-resolution

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, 2020. 8. 조남익 (Nam Ik Cho).
    This dissertation presents a deep end-to-end network for high dynamic range (HDR) imaging of dynamic scenes with background and foreground motions. Generating an HDR image from a sequence of multi-exposure images is challenging when the images are misaligned because they were taken in a dynamic situation. Hence, recent methods first align the multi-exposure images to the reference by using patch matching, optical flow, homography transformation, or an attention module before merging. Because explicitly aligning photos with different exposures is inherently difficult, this dissertation proposes a deep network that synthesizes the aligned images by blending the information from the multi-exposure images. Specifically, the proposed network generates under- and over-exposed images that are structurally aligned to the reference by blending all the information from the dynamic multi-exposure images. The primary idea is that blending two images in the deep-feature domain is effective for synthesizing multi-exposure images that are structurally aligned to the reference, resulting in better-aligned images than pixel-domain blending or geometric transformation methods. The proposed alignment network consists of a two-way encoder for extracting features from two images separately, several convolution layers for blending deep features, and a decoder for constructing the aligned images. The proposed network is shown to generate aligned images well over a wide range of exposure differences and thus can be effectively used for the HDR imaging of dynamic scenes. Moreover, by adding a simple merging network after the alignment network and training the overall system end-to-end, a performance gain over recent state-of-the-art methods is obtained.
    This dissertation also presents a deep end-to-end network for video super-resolution (VSR) of frames with motions. Reconstructing a high-resolution (HR) frame from a sequence of adjacent frames is challenging when the frames are misaligned. Hence, recent methods first align the adjacent frames to the reference by using optical flow or by adding a spatial transformer network (STN). Because explicitly aligning frames is inherently difficult, this dissertation proposes a deep network that synthesizes the aligned frames by blending the information from adjacent frames. Specifically, the proposed network generates adjacent frames that are structurally aligned to the reference by blending all the information from the neighbouring frames. As in the HDR case, the primary idea is that blending two images in the deep-feature domain yields better-aligned frames than pixel-domain blending or geometric transformation methods. The alignment network again consists of a two-way encoder for extracting features from two images separately, several convolution layers for blending deep features, and a decoder for constructing the aligned images. The proposed network is shown to generate the aligned frames well and thus can be effectively used for VSR. Moreover, by adding a simple reconstruction network after the alignment network and training the overall system end-to-end, a performance gain over recent state-of-the-art methods is obtained.
    In addition to the separate HDR imaging and VSR networks, this dissertation presents a deep end-to-end network for joint HDR-SR of dynamic scenes with background and foreground motions. The proposed HDR imaging and VSR networks enhance the dynamic range and the resolution of images, respectively. However, both can be enhanced simultaneously by a single network. For this purpose, a network with the same structure as the proposed VSR network is proposed. The network is shown to reconstruct final results that have higher dynamic range and resolution. It is compared with several methods built from existing HDR imaging and VSR networks, and shows better results both qualitatively and quantitatively.
    Contents:
    1 Introduction
    2 Related Work
        2.1 High Dynamic Range Imaging: 2.1.1 Rejecting Regions with Motions; 2.1.2 Alignment Before Merging; 2.1.3 Patch-based Reconstruction; 2.1.4 Deep-learning-based Methods; 2.1.5 Single-Image HDRI
        2.2 Video Super-resolution: 2.2.1 Deep Single Image Super-resolution; 2.2.2 Deep Video Super-resolution
    3 High Dynamic Range Imaging
        3.1 Motivation
        3.2 Proposed Method: 3.2.1 Overall Pipeline; 3.2.2 Alignment Network; 3.2.3 Merging Network; 3.2.4 Integrated HDR imaging network
        3.3 Datasets: 3.3.1 Kalantari Dataset and Ground Truth Aligned Images; 3.3.2 Preprocessing; 3.3.3 Patch Generation
        3.4 Experimental Results: 3.4.1 Evaluation Metrics; 3.4.2 Ablation Studies; 3.4.3 Comparisons with State-of-the-Art Methods; 3.4.4 Application to the Case of More Numbers of Exposures; 3.4.5 Pre-processing for other HDR imaging methods
    4 Video Super-resolution
        4.1 Motivation
        4.2 Proposed Method: 4.2.1 Overall Pipeline; 4.2.2 Alignment Network; 4.2.3 Reconstruction Network; 4.2.4 Integrated VSR network
        4.3 Experimental Results: 4.3.1 Dataset; 4.3.2 Ablation Study; 4.3.3 Capability of DSBN for alignment; 4.3.4 Comparisons with State-of-the-Art Methods
    5 Joint HDR and SR
        5.1 Proposed Method: 5.1.1 Feature Blending Network; 5.1.2 Joint HDR-SR Network; 5.1.3 Existing VSR Network; 5.1.4 Existing HDR Network
        5.2 Experimental Results
    6 Conclusion
    Abstract (In Korean)
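
For context on what the merging stage in the entry above has to produce, here is a minimal sketch of a classical, hand-crafted merge of already aligned exposures. It is not the learned merging network of the dissertation. It assumes a linear sensor response, pixel values normalised to [0, 1], and known exposure times; `hat_weight`, `merge_exposures`, and the sample data are hypothetical names and values chosen for illustration.

```python
def hat_weight(z):
    """Trust mid-tones most; the weight is zero at 0 (black) and 1 (saturated)."""
    return 1.0 - abs(2.0 * z - 1.0)


def merge_exposures(images, exposure_times, eps=1e-6):
    """Merge aligned exposures (lists of values in [0, 1]) into relative radiance.

    Assumes a linear response, i.e. value = min(1, radiance * exposure_time).
    """
    radiance = []
    for i in range(len(images[0])):
        num = den = 0.0
        for img, t in zip(images, exposure_times):
            w = hat_weight(img[i])
            num += w * img[i] / t  # per-exposure radiance estimate, weighted
            den += w
        if den > eps:
            radiance.append(num / den)
        else:
            # Every exposure is clipped at this pixel: fall back to the first one.
            radiance.append(images[0][i] / exposure_times[0])
    return radiance


# Assumed scene with three pixels, captured at three exposure times.
scene = [0.05, 0.4, 3.0]
times = [0.25, 1.0, 4.0]
captures = [[min(1.0, r * t) for r in scene] for t in times]
hdr = merge_exposures(captures, times)
print(hdr)
```

The weighting discards saturated samples, so the bright pixel is recovered from the shortest exposure alone. With scene motion this per-pixel average produces ghosting, which is why the methods in the entry align or synthesize aligned images first.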

    The state of the art in HDR image deghosting and an objective deghosting quality metric for HDR images

    Despite the emergence of new HDR acquisition methods, the multiple exposure technique (MET) is still the most popular one. Applying MET to dynamic scenes is challenging because of the diversity of motion patterns and uncontrollable factors such as sensor noise, scene occlusion, and performance concerns on platforms with limited computational capability. More than 50 deghosting algorithms have already been proposed for artifact-free HDR imaging of dynamic scenes, and this number is expected to grow. With so many algorithms, conducting subjective experiments to benchmark recently proposed algorithms is difficult and time-consuming. In this thesis, first, a taxonomy of HDR deghosting methods and the key characteristics of each group of algorithms are introduced. Next, the artifacts frequently observed in the outputs of HDR deghosting algorithms are defined and an objective HDR image deghosting quality metric is presented. The proposed metric is found to correlate well with human preferences, and it may be used as a reference for benchmarking current and future HDR image deghosting algorithms. Ph.D. - Doctoral Program

    Algorithms for the enhancement of dynamic range and colour constancy of digital images & video

    One of the main objectives in digital imaging is to mimic the capabilities of the human eye and, perhaps, go beyond them in certain aspects. However, the human visual system is so versatile, complex, and only partially understood that no current imaging technology has been able to reproduce its capabilities accurately. This gap to the extraordinary capabilities of the human eye has become a crucial shortcoming in digital imaging, since digital photography, video recording, and computer vision applications continue to demand more realistic and accurate image reproduction and analytic capabilities. Over decades, researchers have tried to solve the colour constancy problem and to extend the dynamic range of digital imaging devices by proposing a number of algorithms and instrumentation approaches. Nevertheless, no unique solution has been identified; this is partially due to the wide range of computer vision applications that require colour constancy and high dynamic range imaging, and to the complexity with which the human visual system achieves effective colour constancy and dynamic range capabilities. The aim of the research presented in this thesis is to enhance the overall image quality within the image signal processor of digital cameras by achieving colour constancy and extending dynamic range capabilities. This is achieved by developing a set of advanced image-processing algorithms that are robust to a number of practical challenges and feasible to implement within an image signal processor used in consumer electronics imaging devices. The experiments conducted in this research show that the proposed algorithms outperform state-of-the-art methods in the fields of dynamic range and colour constancy. Moreover, this set of image-processing algorithms shows that, when used within an image signal processor, they enable digital camera devices to mimic the human visual system's dynamic range and colour constancy capabilities, the ultimate goal of any state-of-the-art technique or commercial imaging device.

    Variational image fusion

    The main goal of this work is the fusion of multiple images into a single composite that offers more information than the individual input images. We approach these fusion tasks within a variational framework. First, we present iterative schemes that are well suited for such variational problems and related tasks. They lead to efficient algorithms that are simple to implement and parallelise well. Next, we design a general fusion technique that aims for an image with optimal local contrast. This is the key to a versatile method that performs well in many application areas such as multispectral imaging, decolourisation, and exposure fusion. To handle motion within an exposure set, we present the following two-step approach: First, we introduce the complete rank transform to design an optic flow approach that is robust against severe illumination changes. Second, we eliminate remaining misalignments by means of brightness transfer functions that relate the brightness values between frames. Additional knowledge about the exposure set enables us to propose the first fully coupled method that jointly computes an aligned high dynamic range image and dense displacement fields. Finally, we present a technique that infers depth information from differently focused images. In this context, we additionally introduce a novel second-order regulariser that adapts to the image structure in an anisotropic way.
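
The robustness to illumination changes claimed for the complete rank transform in the abstract above can be illustrated with a small sketch. It assumes the usual formulation in which every pixel of a neighbourhood is replaced by its rank within that neighbourhood; the strict-inequality tie handling, the function name, and the sample patch are assumptions made for illustration and are not taken from the thesis.

```python
def complete_rank_transform(patch):
    """Replace each value by its rank: the number of strictly smaller values
    in the same neighbourhood. Equal values share the same rank."""
    return [sum(other < value for other in patch) for value in patch]


# A 3x3 neighbourhood listed row by row (assumed grey values).
patch = [12, 40, 7, 40, 25, 3, 18, 30, 9]

# A hypothetical exposure change: any strictly increasing grey-value mapping.
brighter = [3 * v + 10 for v in patch]

signature = complete_rank_transform(patch)
print(signature)
print(complete_rank_transform(brighter) == signature)
```

Because only the local intensity order enters the descriptor, a data term built on such signatures compares structure rather than brightness, which is the property an optic flow method needs when matching frames of an exposure series.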

    Advanced editing methods for image and video sequences

    In the context of image and video editing, this thesis proposes methods for modifying the semantic content of a recorded scene. Two different editing problems are addressed: first, the removal of ghosting artifacts from high dynamic range (HDR) images recovered from exposure sequences, and second, the removal of objects from video sequences recorded with and without camera motion. These edits need to be performed so that the result looks plausible to humans, but without having to recover detailed models of the content of the scene, e.g. its geometry, reflectance, or illumination. The proposed editing methods add new key ingredients, such as camera noise models and global optimization frameworks, that help achieve results that surpass the capabilities of state-of-the-art methods. Using these ingredients, each proposed method defines local visual properties that closely approximate the specific editing requirements of each task. These properties are then encoded into an energy function that, when globally minimized, produces the required editing results. The optimization of such energy functions corresponds to Bayesian inference problems that are solved efficiently using graph cuts. The proposed methods are demonstrated to outperform other state-of-the-art methods. Furthermore, they are demonstrated to work well on complex real-world scenarios that have not previously been addressed in the literature, i.e., highly cluttered scenes for HDR deghosting, and highly dynamic scenes and unconstrained camera motion for object removal from videos.