5 research outputs found

    Video Upright Adjustment and Stabilization

    Keywords: Upright adjustment, Video stabilization, Camera path
    We propose a novel video upright adjustment method that can reliably correct slanted video contents that are often found in casual videos. Our approach combines deep learning and Bayesian inference to estimate accurate rotation angles from video frames. We train a convolutional neural network to obtain initial estimates of the rotation angles of input video frames. The initial estimates from the network are temporally inconsistent and inaccurate. To resolve this, we use Bayesian inference. We analyze estimation errors of the network, and derive an error model. We then use the error model to formulate video upright adjustment as a maximum a posteriori problem where we estimate consistent rotation angles from the initial estimates, while respecting relative rotations between consecutive frames. Finally, we propose a joint approach to video stabilization and upright adjustment, which minimizes information loss caused by separately handling stabilization and upright adjustment. Experimental results show that our video upright adjustment method can effectively correct slanted video contents, and its combination with video stabilization can achieve visually pleasing results from shaky and slanted videos.
    Contents: I. Introduction; 1.1. Related work; II. Rotation Estimation Network; III. Error Analysis; IV. Video Upright Adjustment; 4.1. Initial angle estimation; 4.2. Robust angle estimation; 4.3. Optimization; 4.4. Warping; V. Joint Upright Adjustment and Stabilization; 5.1. Bundled camera paths for video stabilization; 5.2. Joint approach; VI. Experiments; VII. Conclusion; References
    Master

    Assembling convolution neural networks for automatic viewing transformation

    Images taken under different camera poses are rotated or distorted, which leads to poor perception experiences. This paper proposes a new framework to automatically transform the images to the conformable view setting by assembling different convolution neural networks. Specifically, a referential 3D ground plane is first derived from the RGB image, and a novel projection mapping algorithm is developed to achieve automatic viewing transformation. Extensive experimental results demonstrate that the proposed method outperforms the state-of-the-art vanishing-point-based methods by a large margin in terms of accuracy and robustness.

    Gradient Domain Methods for Image-based Reconstruction and Rendering

    This thesis describes new approaches in image-based 3D reconstruction and rendering. In contrast to previous work, our algorithms focus on image gradients instead of pixel values, which allows us to avoid many of the disadvantages of traditional techniques. A single pixel only carries very local information about the image content. A gradient, on the other hand, reveals information about the magnitude and the direction in which the image content changes. Our techniques use this additional information to adapt dynamically to the image content. Especially in image regions without strong gradients, we can employ more suitable reconstruction models and render images with fewer artifacts. Overall, we present more accurate and robust results (both 3D models and renderings) compared to previous methods. First, we present a multi-view stereo algorithm that combines traditional stereo reconstruction and shading-based reconstruction models in a single optimization scheme. By defining a gradient-based trade-off, our model removes the need for an explicit regularization and can handle shading information without the need for an explicit albedo model. This effectively combines the strengths of both reconstruction approaches and cancels out their weaknesses. Our second method is an image-based rendering technique that directly renders gradients instead of pixels. The final image is then generated by integrating over the rendered gradients. We present a detailed description of how gradients can be moved directly in the image during rendering, which allows us to create a fast approximation that improves the quality and speed of the integration step. Our method also handles occlusions, and compared to traditional approaches we can achieve better results that are especially robust for scenes with reflective or textureless areas. Finally, we also present a new model for image warping. Here we apply different types of regularization constraints based on the gradients in the image. Especially when used for direct real-time rendering, this can handle larger distortions compared to traditional methods that use only a single type of regularization. Overall, the results of this thesis show how shifting the focus from image pixels to image gradients can improve various aspects of image-based reconstruction and rendering. Some of the most challenging aspects, such as textureless areas in rendering and spatially varying albedo in reconstruction, are handled implicitly by our formulations, which also leads to more effective algorithms.

    Mitigating Distortion to Enable 360° Computer Vision

    For tasks on central-perspective images, convolutional neural networks have been a revolutionary innovation. However, their performance degrades as the amount of geometric image distortion increases. This limitation is particularly evident for 360° images. These images capture a 180° x 360° field of view by replacing the imaging plane with the concept of an imaging sphere. Because there is no isometric mapping from this spherical capture format to a planar image representation, all 360° images necessarily suffer from some degree of geometric image distortion, which manifests as local content deformation. This corruptive effect hinders the ability of these groundbreaking computer vision algorithms to enable 360° computer vision, resulting in a performance gap between networks applied to central-perspective images and those applied to spherical images. This dissertation seeks to better understand the impact that geometric distortion has on convolutional neural networks and to identify spherical image representations that can mitigate its effect. This work argues that there are three requisite properties of any general solution: distortion-mitigation, transferability, and scalability. Bridging the performance gap requires reducing distortion in the image representation, developing tools to directly apply central-perspective image algorithms to spherical data, and ensuring that these algorithms can efficiently process high resolution spherical images. Drawing insight from the field of cartography, the subdivided regular icosahedron is proposed as a low-distortion alternative to the commonly used equirectangular and cube map spherical image formats. To address the non-Euclidean nature of this representation, a generalization of the standard convolution operation is proposed to map the standard convolutional kernel grid to the structure of any spherical representation. Finally, a new representation is proposed. Derived from the icosahedron, it represents a spherical image as a set of square, oriented, planar pixel grids rendered tangent to the sphere at the center of each face of the icosahedron. These "tangent images" satisfy all three requisite properties, offering a promising, general solution to the spherical image problem.
    Doctor of Philosophy
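    To make the idea of a planar grid "rendered tangent to the sphere" concrete, the sketch below uses the gnomonic projection, the standard cartographic projection onto a plane tangent to the unit sphere. The abstract does not say which projection the dissertation uses, so this choice is an assumption, and the function names `sphere_to_tangent` and `tangent_to_sphere` are hypothetical. Angles are in radians; `lon0` and `lat0` give the tangent point, which would be a face center in the icosahedral setting.

```python
import math


def sphere_to_tangent(lon, lat, lon0, lat0):
    """Project a sphere point onto the plane tangent at (lon0, lat0).

    Gnomonic projection on the unit sphere. Valid only for points on the
    hemisphere facing the tangent point (cos_c > 0).
    """
    dlon = lon - lon0
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(dlon))
    if cos_c <= 0.0:
        raise ValueError("point is not on the visible hemisphere")
    x = math.cos(lat) * math.sin(dlon) / cos_c
    y = (math.cos(lat0) * math.sin(lat)
         - math.sin(lat0) * math.cos(lat) * math.cos(dlon)) / cos_c
    return x, y


def tangent_to_sphere(x, y, lon0, lat0):
    """Inverse gnomonic projection: plane coordinates back to (lon, lat)."""
    rho = math.hypot(x, y)
    if rho == 0.0:
        return lon0, lat0  # the tangent point itself
    c = math.atan(rho)
    sin_lat = (math.cos(c) * math.sin(lat0)
               + y * math.sin(c) * math.cos(lat0) / rho)
    lat = math.asin(max(-1.0, min(1.0, sin_lat)))  # clamp rounding error
    lon = lon0 + math.atan2(
        x * math.sin(c),
        rho * math.cos(lat0) * math.cos(c) - y * math.sin(lat0) * math.sin(c),
    )
    return lon, lat


# Round trip for a point near a tangent plane centred at (0.1, 0.5).
px, py = sphere_to_tangent(0.3, 0.2, 0.1, 0.5)
back_lon, back_lat = tangent_to_sphere(px, py, 0.1, 0.5)
```

    Sampling a regular pixel grid in `(x, y)` and mapping it through `tangent_to_sphere` would give the spherical coordinates to look up in a source image. Distortion grows with distance from the tangent point, which is consistent with using many small planes rather than one large one.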