125 research outputs found
GPU Accelerated Color Correction and Frame Warping for Real-time Video Stitching
Traditional image stitching focuses on a single panorama frame without
considering the spatial-temporal consistency in videos. The straightforward
image stitching approach causes temporal flickering and color inconsistency
when it is applied to the video stitching task. Besides, inaccurate camera
parameters will cause artifacts in the image warping. In this paper, we propose
a real-time system to stitch multiple video sequences into a panoramic video,
which is based on GPU accelerated color correction and frame warping without
accurate camera parameters. We extend the traditional 2D-Matrix (2D-M) color
correction approach and present a spatio-temporal 3D-Matrix (3D-M) color
correction method for the overlap local regions with online color balancing
using a piecewise function on global frames. Furthermore, we use pairwise
homography matrices given by coarse camera calibration for global warping
followed by accurate local warping based on the optical flow. Experimental
results show that our system can generate high-quality panorama videos in real
time.
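The global warping step in the abstract above relies on pairwise homography matrices. As a minimal sketch of what applying one such matrix to a pixel looks like (the function name `apply_homography` and the example matrix `H_shift` are illustrative assumptions, not taken from the paper, and the paper's GPU implementation is not shown here):

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H given as nested lists."""
    # Multiply H by the homogeneous point (x, y, 1).
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return xh / w, yh / w


# Hypothetical homography: a pure translation by (100, 20) pixels.
H_shift = [[1.0, 0.0, 100.0],
           [0.0, 1.0, 20.0],
           [0.0, 0.0, 1.0]]

print(apply_homography(H_shift, 10.0, 5.0))
```

A coarse homography of this kind only aligns frames globally; the abstract describes a further local warp based on optical flow, which this sketch does not attempt.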
Improving Dynamic HDR Imaging with Fusion Transformer
Reconstructing a High Dynamic Range (HDR) image from several Low Dynamic Range (LDR) images with different exposures is a challenging task, especially in the presence of camera and object motion. Though existing models using convolutional neural networks (CNNs) have made great progress, challenges still exist, e.g., ghosting artifacts. Transformers, originating from the field of natural language processing, have shown success in computer vision tasks, due to their ability to cover a large receptive field even within a single layer. In this paper, we propose a transformer model for HDR imaging. Our pipeline includes three steps: alignment, fusion, and reconstruction. The key component is the HDR transformer module. Through experiments and ablation studies, we demonstrate that our model outperforms the state-of-the-art by large margins on several popular public datasets.
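For context on what merging differently exposed LDR images involves, here is a minimal sketch of a classical weighted merge for a single pixel. It is not the transformer model proposed in the abstract above. It assumes a linear camera response, frames that are already aligned, and intensities normalized to [0, 1]; the function name `merge_hdr_pixel` is illustrative.

```python
def merge_hdr_pixel(values, exposure_times):
    """Estimate relative radiance for one pixel from aligned LDR samples.

    values: LDR intensities in [0, 1], one per exposure (assumed linear).
    exposure_times: matching exposure times in seconds.
    """
    num = 0.0
    den = 0.0
    for z, t in zip(values, exposure_times):
        # Hat weight: 0 at the clipped ends (0 or 1), 1 at mid-gray.
        w = 1.0 - abs(2.0 * z - 1.0)
        num += w * z / t
        den += w
    if den == 0.0:
        # Every sample is clipped: fall back to the shortest exposure.
        z, t = min(zip(values, exposure_times), key=lambda p: p[1])
        return z / t
    return num / den


# Two samples of the same radiance: 0.25 at 1 s and 0.5 at 2 s.
print(merge_hdr_pixel([0.25, 0.5], [1.0, 2.0]))
```

With camera or object motion the per-pixel samples no longer correspond, which is where ghosting artifacts come from and why the alignment step matters.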
PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline Panoramas
Achieving an immersive experience enabling users to explore virtual
environments with six degrees of freedom (6DoF) is essential for various
applications such as virtual reality (VR). Wide-baseline panoramas are commonly
used in these applications to reduce network bandwidth and storage
requirements. However, synthesizing novel views from these panoramas remains a
key challenge. Although existing neural radiance field methods can produce
photorealistic views under narrow-baseline and dense image captures, they tend
to overfit the training views when dealing with wide-baseline panoramas
due to the difficulty in learning accurate geometry from sparse
views. To address this problem, we propose PanoGRF, Generalizable Spherical
Radiance Fields for Wide-baseline Panoramas, which constructs spherical radiance
fields incorporating scene priors. Unlike generalizable radiance
fields trained on perspective images, PanoGRF avoids the information loss from
panorama-to-perspective conversion and directly aggregates geometry and
appearance features of 3D sample points from each panoramic view based on
spherical projection. Moreover, as some regions of the panorama are only
visible from one view while invisible from others under wide baseline settings,
PanoGRF incorporates monocular depth priors into spherical depth
estimation to improve the geometry features. Experimental results on multiple
panoramic datasets demonstrate that PanoGRF significantly outperforms
state-of-the-art generalizable view synthesis methods for wide-baseline
panoramas (e.g., OmniSyn) and perspective images (e.g., IBRNet, NeuRay).
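The spherical projection mentioned in the abstract above maps a 3D sample point onto a panoramic image. A minimal sketch for an equirectangular panorama follows. The axis convention (+z forward, +x right, +y down), the pixel mapping, and the function name `project_to_equirect` are assumptions chosen for illustration and may differ from the convention PanoGRF uses.

```python
import math


def project_to_equirect(x, y, z, width, height):
    """Project a 3D point in the camera frame to equirectangular pixels.

    Assumed convention: +z forward, +x right, +y down.
    Longitude lies in [-pi, pi], latitude in [-pi/2, pi/2].
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the camera center has no projection")
    lon = math.atan2(x, z)   # angle around the vertical axis
    lat = math.asin(y / r)   # elevation above or below the horizon
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (lat / math.pi + 0.5) * height
    return u, v


# A point straight ahead lands at the image center.
print(project_to_equirect(0.0, 0.0, 1.0, 1024, 512))
```

Because every viewing direction has a pixel, features can be sampled from the panorama directly, without first cropping it into perspective views.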
TwinTex: Geometry-aware Texture Generation for Abstracted 3D Architectural Models
Coarse architectural models are often generated at scales ranging from
individual buildings to scenes for downstream applications such as Digital Twin
City, Metaverse, LODs, etc. Such piece-wise planar models can be abstracted as
twins from 3D dense reconstructions. However, these models typically lack
realistic texture relative to the real building or scene, making them
unsuitable for vivid display or direct reference. In this paper, we present
TwinTex, the first automatic texture mapping framework to generate a
photo-realistic texture for a piece-wise planar proxy. Our method addresses
most challenges occurring in such twin texture generation. Specifically, for
each primitive plane, we first select a small set of photos with greedy
heuristics considering photometric quality, perspective quality and facade
texture completeness. Then, different levels of line features (LoLs) are
extracted from the set of selected photos to generate guidance for later steps.
With LoLs, we employ optimization algorithms to align texture with geometry
from local to global. Finally, we fine-tune a diffusion model with a multi-mask
initialization component and a new dataset to inpaint the missing region.
Experimental results on many buildings, indoor scenes and man-made objects of
varying complexity demonstrate the generalization ability of our algorithm. Our
approach surpasses state-of-the-art texture mapping methods in terms of
high-fidelity quality and reaches a human-expert production level with much
less effort. Project page: https://vcc.tech/research/2023/TwinTex.
Comment: Accepted to SIGGRAPH ASIA 202
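The photo selection step in the abstract above is described only as a greedy heuristic over photometric quality, perspective quality and facade texture completeness. As a rough sketch of how a greedy selection of this general kind can work (the single combined quality score, the coverage cells, and the name `greedy_select` are all hypothetical and not the TwinTex formulation):

```python
def greedy_select(photos, max_photos=3):
    """Greedily pick photos that add the most quality-weighted coverage.

    photos: dict mapping a photo name to (quality, cells), where quality is
    a hypothetical combined score in [0, 1] and cells is the set of plane
    cells that photo covers.
    """
    covered = set()
    chosen = []
    remaining = dict(photos)
    while remaining and len(chosen) < max_photos:

        def gain(name):
            quality, cells = remaining[name]
            # Only cells that are not yet covered count toward the gain.
            return quality * len(cells - covered)

        # Iterate in sorted order so ties resolve deterministically.
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # nothing left adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)[1]
    return chosen, covered


photos = {
    "a": (0.9, {0, 1, 2, 3}),
    "b": (0.8, {2, 3, 4, 5}),
    "c": (0.5, {0, 1}),
}
print(greedy_select(photos))
```

Here photo "c" is never chosen because "a" already covers its cells, which is the behavior that keeps the selected set small.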
Digital Stack Photography and Its Applications
This work centers on digital stack photography and its applications. A stack of images refers, in a broader sense, to an ensemble of associated images taken with variation in one or more parameters of the system configuration or setting. An image stack captures and contains potentially more information than any of the constituent images. Digital stack photography (DST) techniques exploit this rich information to render a synthesized image that exceeds the limits of a digital camera's capabilities.
This work considers in particular two basic DST problems, which have been challenging, and their applications. One is high-dynamic-range (HDR) imaging of non-stationary dynamic scenes, in which the stacked images vary in exposure conditions. The other is large-scale panorama composition from multiple images. In this case, the image components are related to each other by the spatial relation among the subdomains of the same scene they cover and capture jointly. We consider the non-conventional, practical and challenging situations where the spatial overlap among the sub-images is sparse (S), irregular in geometry and imprecise relative to the designed geometry (I), and the captured data over the overlap zones are noisy (N) or lack features. We refer to these conditions simply as the S.I.N. conditions.
There are common challenging issues with both problems. For example, both face the dominant problem of image alignment for seamless and artifact-free image composition. Our solutions to the common problems are manifested differently in each of the particular problems, as a result of adaptation to the specific properties of each type of image ensemble.
For the exposure stack, existing alignment approaches struggle to overcome three main challenges: inconsistency in brightness, large displacement in dynamic scenes and pixel saturation. We pursue solutions in the following three aspects. First, we introduce a model that admits changes in both geometric configurations and optical conditions, while following the traditional optical flow description. Previous models treated these two types of changes as mutually exclusive, handling one or the other. Second, we extend the pixel-based optical flow model to a patch-based model. The advantages are two-fold. A patch has texture and local content that individual pixels fail to present. It also offers opportunities for faster processing, such as via two-scale or multiple-scale processing. The extended model is then solved efficiently with an EM-like algorithm, which is reliable in the presence of large displacement. Third, we present a generative model for reducing or eliminating the typical artifacts that arise as a side effect of inadequate alignment of clipped pixels. A patch-based texture synthesis is combined with the patch-based alignment to achieve an artifact-free result.
For large-scale panorama composition under the S.I.N. conditions, we have developed an effective solution scheme that significantly reduces both processing time and artifacts. Existing approaches can be roughly categorized as either geometry-based composition or feature-based composition. The former relies on precise knowledge of the system geometry, by design and/or calibration. It works well with a far-away scene, in which case there is only limited variation in projective geometry among the sub-images. However, the system geometry is not invariant to physical conditions such as thermal variation, stress variation, etc. The composition with this approach is typically done in the spatial domain. The other approach is more robust to geometric and optical conditions. It works surprisingly well with feature-rich and stationary scenes, but not well in the absence of recognizable features. The composition based on feature matching is typically done in the spatial gradient domain. In short, both approaches are challenged by the S.I.N. conditions. With certain snapshot data sets obtained and contributed by Brady et al., these methods either fail in composition or render images with visually disturbing artifacts. To overcome the S.I.N. conditions, we have reconciled these two approaches and made successful and complementary use of both a priori, approximate information about the geometric system configuration and the feature information from the image data. We also designed and developed a software architecture with careful extraction of primitive function modules that can be efficiently implemented and executed in parallel. In addition to a much faster processing speed, the resulting images are clearer and sharper at the overlapping zones, without typical ghosting artifacts.
Dissertatio