Image Alignment Using a Feature Blending Network and Its Application to High Dynamic Range Imaging and Video Super-Resolution
Dissertation (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2020. 8. Nam Ik Cho.
This dissertation presents a deep end-to-end network for high dynamic range (HDR) imaging of dynamic scenes with background and foreground motion. Generating an HDR image from a sequence of multi-exposure images is challenging when the images are misaligned because they were captured in a dynamic scene. Hence, recent methods first align the multi-exposure images to the reference using patch matching, optical flow, homography transformation, or an attention module before merging. Because explicitly aligning photos with different exposures is inherently difficult, this dissertation instead proposes a deep network that synthesizes the aligned images by blending information from the multi-exposure images. Specifically, the proposed network generates under- and over-exposed images that are structurally aligned to the reference by blending all the information from the dynamic multi-exposure images. The primary idea is that blending two images in the deep-feature domain is effective for synthesizing multi-exposure images that are structurally aligned to the reference, yielding better-aligned images than pixel-domain blending or geometric transformation methods. The proposed alignment network consists of a two-way encoder that extracts features from the two images separately, several convolution layers that blend the deep features, and a decoder that constructs the aligned images. The proposed network is shown to generate aligned images well across a wide range of exposure differences and can thus be used effectively for HDR imaging of dynamic scenes. Moreover, adding a simple merging network after the alignment network and training the overall system end-to-end yields a performance gain over recent state-of-the-art methods.
This dissertation also presents a deep end-to-end network for video super-resolution (VSR) of frames with motion. Reconstructing a high-resolution (HR) frame from a sequence of adjacent frames is challenging when the frames are misaligned. Hence, recent methods first align the adjacent frames to the reference using optical flow or by adding a spatial transformer network (STN). Because explicitly aligning frames is inherently difficult, this dissertation instead proposes a deep network that synthesizes the aligned frames by blending information from adjacent frames. Specifically, the proposed network generates adjacent frames that are structurally aligned to the reference by blending all the information from the neighboring frames. The primary idea is that blending two images in the deep-feature domain is effective for synthesizing frames that are structurally aligned to the reference, yielding better-aligned images than pixel-domain blending or geometric transformation methods. The proposed alignment network consists of a two-way encoder that extracts features from the two images separately, several convolution layers that blend the deep features, and a decoder that constructs the aligned images. The proposed network is shown to generate well-aligned frames and can thus be used effectively for VSR. Moreover, adding a simple reconstruction network after the alignment network and training the overall system end-to-end yields a performance gain over recent state-of-the-art methods.
In addition to the separate HDR imaging and VSR networks, this dissertation presents a deep end-to-end network for joint HDR-SR of dynamic scenes with background and foreground motion. The proposed HDR imaging and VSR networks enhance the dynamic range and the resolution of images, respectively. However, both can be enhanced simultaneously by a single network. This dissertation therefore proposes a network with the same structure as the proposed VSR network. The network is shown to reconstruct final results with higher dynamic range and resolution. It is compared with several methods built from existing HDR imaging and VSR networks and shows better results both qualitatively and quantitatively.
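As a point of reference for the merging step mentioned in the abstract above, the following is a minimal sketch of a classical, non-learned weighted merge of exposures that are already aligned. It is not the dissertation's merging network, which is learned end-to-end. The function names, the triangle weighting, and the assumption of a linear camera response with pixel values in [0, 1] are illustrative choices, not details from the source.

```python
def hat_weight(z):
    """Triangle weight: trusts mid-tones, distrusts near-black and near-white values.

    Assumes z is a linear pixel value in [0, 1] (an assumption of this sketch).
    """
    return 1.0 - abs(2.0 * z - 1.0)


def merge_exposures(pixels, exposure_times, eps=1e-8):
    """Merge one pixel location across aligned exposures into a radiance estimate.

    pixels: linear pixel values in [0, 1], one per exposure.
    exposure_times: matching exposure times (any consistent unit).
    """
    num = 0.0
    den = 0.0
    for z, t in zip(pixels, exposure_times):
        w = hat_weight(z)
        num += w * (z / t)  # z / t is this exposure's radiance estimate
        den += w
    if den < eps:
        # Every sample is clipped, so all weights are zero:
        # fall back to an unweighted average of the per-exposure estimates.
        return sum(z / t for z, t in zip(pixels, exposure_times)) / len(pixels)
    return num / den


# A scene radiance of 0.2 seen at exposure times 1, 2 and 4.
radiance = merge_exposures([0.2, 0.4, 0.8], [1.0, 2.0, 4.0])
```

The merge only recovers a sensible value when all samples at a pixel location show the same scene point, which is why misaligned exposures must be aligned (or, as in the abstract, synthesized in aligned form) before merging.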
1 Introduction
2 Related Work
2.1 High Dynamic Range Imaging
2.1.1 Rejecting Regions with Motions
2.1.2 Alignment Before Merging
2.1.3 Patch-based Reconstruction
2.1.4 Deep-learning-based Methods
2.1.5 Single-Image HDRI
2.2 Video Super-resolution
2.2.1 Deep Single Image Super-resolution
2.2.2 Deep Video Super-resolution
3 High Dynamic Range Imaging
3.1 Motivation
3.2 Proposed Method
3.2.1 Overall Pipeline
3.2.2 Alignment Network
3.2.3 Merging Network
3.2.4 Integrated HDR imaging network
3.3 Datasets
3.3.1 Kalantari Dataset and Ground Truth Aligned Images
3.3.2 Preprocessing
3.3.3 Patch Generation
3.4 Experimental Results
3.4.1 Evaluation Metrics
3.4.2 Ablation Studies
3.4.3 Comparisons with State-of-the-Art Methods
3.4.4 Application to the Case of More Numbers of Exposures
3.4.5 Pre-processing for other HDR imaging methods
4 Video Super-resolution
4.1 Motivation
4.2 Proposed Method
4.2.1 Overall Pipeline
4.2.2 Alignment Network
4.2.3 Reconstruction Network
4.2.4 Integrated VSR network
4.3 Experimental Results
4.3.1 Dataset
4.3.2 Ablation Study
4.3.3 Capability of DSBN for alignment
4.3.4 Comparisons with State-of-the-Art Methods
5 Joint HDR and SR
5.1 Proposed Method
5.1.1 Feature Blending Network
5.1.2 Joint HDR-SR Network
5.1.3 Existing VSR Network
5.1.4 Existing HDR Network
5.2 Experimental Results
6 Conclusion
Abstract (In Korean)
Pix2HDR -- A pixel-wise acquisition and deep learning-based synthesis approach for high-speed HDR videos
Accurately capturing dynamic scenes with wide-ranging motion and light
intensity is crucial for many vision applications. However, acquiring
high-speed high dynamic range (HDR) video is challenging because the camera's
frame rate restricts its dynamic range. Existing methods sacrifice speed to
acquire multi-exposure frames. Yet, misaligned motion in these frames can still
pose complications for HDR fusion algorithms, resulting in artifacts. Instead
of frame-based exposures, we sample the videos using individual pixels at
varying exposures and phase offsets. Implemented on a pixel-wise programmable
image sensor, our sampling pattern simultaneously captures fast motion at a
high dynamic range. We then transform pixel-wise outputs into an HDR video
using end-to-end learned weights from deep neural networks, achieving high
spatiotemporal resolution with minimized motion blurring. We demonstrate
aliasing-free HDR video acquisition at 1000 FPS, resolving fast motion under
low-light conditions and against bright backgrounds - both challenging
conditions for conventional cameras. By combining the versatility of pixel-wise
sampling patterns with the strength of deep neural networks at decoding complex
scenes, our method greatly enhances the vision system's adaptability and
performance in dynamic conditions.
Comment: 14 pages, 14 figures
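To make the idea of sampling "individual pixels at varying exposures and phase offsets" more concrete, here is a small sketch of a per-pixel readout schedule. The 2x2 tile, the exposure lengths, and the phase offsets are invented for illustration and are not taken from the paper; the names `TILE`, `readout_times`, and `pixel_schedule` are likewise hypothetical.

```python
def readout_times(exposure, phase, horizon):
    """Time steps at which a pixel finishes an exposure and is read out.

    The pixel starts integrating at `phase` and is read out every
    `exposure` steps after that, up to and including `horizon`.
    """
    times = []
    t = phase + exposure
    while t <= horizon:
        times.append(t)
        t += exposure
    return times


# Hypothetical repeating 2x2 tile of (exposure length, phase offset),
# both in time steps. These values are assumptions of this sketch.
TILE = [
    [(4, 0), (4, 2)],
    [(1, 0), (2, 1)],
]


def pixel_schedule(y, x, horizon):
    """Readout times for the pixel at row y, column x under the tiled pattern."""
    exposure, phase = TILE[y % 2][x % 2]
    return readout_times(exposure, phase, horizon)


long_a = pixel_schedule(0, 0, 8)   # long exposure, no offset
long_b = pixel_schedule(0, 1, 8)   # same exposure, shifted in phase
short = pixel_schedule(1, 0, 4)    # short exposure, read out every step
```

In such a layout, neighboring pixels with long exposures gather more light while short-exposure neighbors sample fast motion more densely, and the phase offsets stagger the readouts in time. How the samples are then decoded into HDR video is the learned part of the method and is not sketched here.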
Burst Denoising with Kernel Prediction Networks
We present a technique for jointly denoising bursts of images taken from a
handheld camera. In particular, we propose a convolutional neural network
architecture for predicting spatially varying kernels that can both align and
denoise frames, a synthetic data generation approach based on a realistic noise
formation model, and an optimization guided by an annealed loss function to
avoid undesirable local minima. Our model matches or outperforms the
state-of-the-art across a wide range of noise levels on both real and synthetic
data.
Comment: To appear in CVPR 2018 (spotlight). Project page: http://people.eecs.berkeley.edu/~bmild/kpn
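The abstract above describes spatially varying kernels that can both align and denoise frames. The sketch below shows one way such per-pixel kernels could be applied to a burst: each frame is filtered with its own kernel at every pixel, and the filtered frames are averaged. The kernels here are written by hand, whereas in the paper they are predicted by a convolutional network; the function name, the list-based data layout, and the edge-replication border handling are assumptions of this sketch.

```python
def apply_per_pixel_kernels(burst, kernels):
    """Filter each frame with its own per-pixel kernel, then average the frames.

    burst: list of N frames, each an H x W list of floats.
    kernels: kernels[i][y][x] is a K x K list of weights (K odd) applied to
             frame i around pixel (y, x).
    Returns an H x W list of floats.
    """
    n = len(burst)
    h, w = len(burst[0]), len(burst[0][0])
    out = [[0.0] * w for _ in range(h)]
    for i, frame in enumerate(burst):
        for y in range(h):
            for x in range(w):
                kernel = kernels[i][y][x]
                r = len(kernel) // 2
                acc = 0.0
                for dy in range(-r, r + 1):
                    for dx in range(-r, r + 1):
                        # Clamp coordinates at the border (edge replication).
                        yy = min(max(y + dy, 0), h - 1)
                        xx = min(max(x + dx, 0), w - 1)
                        acc += kernel[dy + r][dx + r] * frame[yy][xx]
                out[y][x] += acc / n
    return out


# Two 1 x 4 frames: the second is the first shifted right by one pixel.
base = [[1.0, 2.0, 3.0, 4.0]]
shifted = [[1.0, 1.0, 2.0, 3.0]]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
take_right = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
kernels = [
    [[identity] * 4],    # frame 0: keep each pixel as is
    [[take_right] * 4],  # frame 1: read the right neighbor, undoing the shift
]
merged = apply_per_pixel_kernels([base, shifted], kernels)
```

In this toy case the kernel for the second frame acts purely as a one-pixel shift, so the average matches the reference frame at interior pixels. A kernel with several non-zero weights would additionally smooth noise, which is how a single kernel can cover both alignment and denoising.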
mFish Alpha Pilot: Building a Roadmap for Effective Mobile Technology to Sustain Fisheries and Improve Fisher Livelihoods.
In June 2014 at the Our Ocean Conference in Washington, DC, United States Secretary of State John Kerry announced the ambitious goal of ending overfishing by 2020. To support that goal, the Secretary's Office of Global Partnerships launched mFish, a public-private partnership to harness the power of mobile technology to improve fisher livelihoods and increase the sustainability of fisheries around the world. The US Department of State provided a grant to 50in10 to create a pilot of mFish that would allow for the identification of behaviors and incentives that might drive more fishers to adopt novel technology. In May 2015 50in10 and Future of Fish designed a pilot to evaluate how to improve adoption of a new mobile technology platform aimed at improving fisheries data capture and fisher livelihoods. Full report
Towards Predictive Rendering in Virtual Reality
The quest to generate predictive images, i.e., images representing radiometrically correct renditions of reality, has been a longstanding problem in computer graphics. The exactness of such images is extremely important for Virtual Reality applications like Virtual Prototyping, where users need to make decisions impacting large investments based on the simulated images. Unfortunately, generation of predictive imagery is still an unsolved problem for several reasons, especially if real-time restrictions apply. First, existing scenes used for rendering are not modeled accurately enough to create predictive images. Second, even with huge computational effort, existing rendering algorithms are not able to produce radiometrically correct images. Third, current display devices need to convert rendered images into some low-dimensional color space, which prohibits display of radiometrically correct images. Overcoming these limitations is the focus of current state-of-the-art research. This thesis also contributes to this task. First, it briefly introduces the necessary background and identifies the steps required for real-time predictive image generation. Then, existing techniques targeting these steps are presented and their limitations are pointed out. To solve some of the remaining problems, novel techniques are proposed. They cover various steps in the predictive image generation process, ranging from accurate scene modeling through efficient data representation to high-quality, real-time rendering. A special focus of this thesis lies on real-time generation of predictive images using bidirectional texture functions (BTFs), i.e., very accurate representations for spatially varying surface materials.
The techniques proposed by this thesis enable efficient handling of BTFs by compressing the huge amount of data contained in this material representation, applying them to geometric surfaces using texture and BTF synthesis techniques, and rendering BTF-covered objects in real time. Further approaches proposed in this thesis target the inclusion of real-time global illumination effects or more efficient rendering using novel level-of-detail representations for geometric objects. Finally, this thesis assesses the rendering quality achievable with BTF materials, indicating a significant increase in realism but also confirming that problems remain to be solved before truly predictive image generation is achieved.