Novel View Synthesis from a Single RGBD Image for Indoor Scenes
In this paper, we propose an approach for synthesizing novel view images from
a single RGBD (Red Green Blue-Depth) input. Novel view synthesis (NVS) is an
interesting computer vision task with extensive applications. Methods using
multiple images have been well studied; exemplary ones include training
scene-specific Neural Radiance Fields (NeRF) or leveraging multi-view stereo
(MVS) and 3D rendering pipelines. However, these are either computationally
intensive or non-generalizable across different scenes, limiting their
practical value. Conversely, the depth information embedded in RGBD images
unlocks 3D potential from a singular view, simplifying NVS. The widespread
availability of compact, affordable stereo cameras, and even LiDARs in
contemporary devices like smartphones, makes capturing RGBD images more
accessible than ever. In our method, we convert an RGBD image into a point
cloud and render it from a different viewpoint, then formulate the NVS task
as an image translation problem. We leverage generative adversarial networks
to style-transfer the rendered image, achieving a result similar to a
photograph taken from the new perspective. We explore both unsupervised
learning using CycleGAN and supervised learning with Pix2Pix, and demonstrate
the qualitative results. Our method circumvents the limitations of traditional
multi-image techniques, holding significant promise for practical, real-time
applications in NVS.
Comment: 2nd International Conference on Image Processing, Computer Vision and
Machine Learning, November 202
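The point-cloud step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a target pose given by a rotation `R` and translation `t`, and a simple per-point z-buffer. The function names `backproject` and `render_from` are hypothetical. A render of this kind typically contains holes where the new view sees surfaces that the source view did not, and that rendered image is what the abstract hands to the image-translation network.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map of shape (H, W) to camera-space points of shape (H*W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def render_from(points, colors, R, t, fx, fy, cx, cy, h, w):
    """Project colored points into a new camera (R, t), keeping the nearest point per pixel."""
    p = points @ R.T + t                      # source camera frame -> target camera frame
    image = np.zeros((h, w, 3), dtype=colors.dtype)
    zbuf = np.full((h, w), np.inf)
    valid = p[:, 2] > 1e-6                    # drop points behind the target camera
    p, c = p[valid], colors[valid]
    u = np.round(p[:, 0] * fx / p[:, 2] + cx).astype(int)
    v = np.round(p[:, 1] * fy / p[:, 2] + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for ui, vi, zi, ci in zip(u[inside], v[inside], p[inside, 2], c[inside]):
        if zi < zbuf[vi, ui]:                 # z-buffer test: nearest surface wins
            zbuf[vi, ui] = zi
            image[vi, ui] = ci
    return image
```

Pixels that receive no point stay black in this sketch; a practical renderer would also splat points over more than one pixel to reduce gaps.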
Coupled Depth Learning
In this paper we propose a method for estimating depth from a single image
using a coarse to fine approach. We argue that modeling the fine depth details
is easier after a coarse depth map has been computed. We express a global
(coarse) depth map of an image as a linear combination of a depth basis learned
from training examples. The depth basis captures spatial and statistical
regularities and reduces the problem of global depth estimation to the task of
predicting the input-specific coefficients in the linear combination. This is
formulated as a regression problem from a holistic representation of the image.
Crucially, the depth basis and the regression function are coupled and
jointly optimized by our learning scheme. We demonstrate that this results in a
significant improvement in accuracy compared to direct regression of depth
pixel values or approaches learning the depth basis disjointly from the
regression function. The global depth estimate is then used as guidance by a
local refinement method that introduces depth details that were not captured at
the global level. Experiments on the NYUv2 and KITTI datasets show that our
method outperforms the existing state-of-the-art at a considerably lower
computational cost for both training and testing.
Comment: 10 pages, 3 Figures, 4 Tables with quantitative evaluation
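The coarse stage described above, a depth map written as a linear combination of basis maps whose coefficients are regressed from the image, can be sketched as follows. This is only an illustration under stated assumptions: the regressor is taken to be a linear map `W`, the loss is a squared error, and the joint update is a plain gradient step. None of these choices is given in the abstract, and the names `coarse_depth` and `joint_step` are hypothetical.

```python
import numpy as np

def coarse_depth(features, W, B):
    """features: (F,) image descriptor; W: (K, F) regressor; B: (K, P) depth basis.

    Returns the coarse depth map flattened to P pixels.
    """
    coeffs = W @ features          # input-specific coefficients, shape (K,)
    return coeffs @ B              # linear combination of the K basis maps

def joint_step(features, target, W, B, lr=1e-2):
    """One gradient step on 0.5 * ||coeffs @ B - target||^2 with respect to both W and B."""
    coeffs = W @ features
    residual = coeffs @ B - target                  # (P,)
    grad_B = np.outer(coeffs, residual)             # dL/dB, shape (K, P)
    grad_W = np.outer(B @ residual, features)       # dL/dW, shape (K, F)
    return W - lr * grad_W, B - lr * grad_B
```

Updating `W` and `B` against the same residual is what "coupled" means in this sketch: the basis adapts to what the regressor can predict, and the regressor adapts to the basis.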
Simplex-like sequential methods for a class of generalized fractional programs
A sequential method for a class of generalized fractional programming problems is proposed. The objective function considered is the ratio of powers of affine functions, and the feasible region is a polyhedron, not necessarily bounded. Theoretical properties of the optimization problem are first established, and the maximal domains of pseudoconcavity are characterized. When the objective function is pseudoconcave on the feasible region, the proposed algorithm exploits the favorable optimization properties of pseudoconcave functions; the particular structure of the objective function makes it possible to provide a simplex-like algorithm even when the objective function is not pseudoconcave. Computational results confirm the good performance of the proposed algorithm.
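One plausible reading of the problem class, written out for concreteness. The symbols $c, c_0, d, d_0, p, q, A, b$ are assumptions introduced here, as are the maximization sense and the positivity conditions; the abstract does not fix the notation or the admissible exponents.

```latex
\max_{x \in X} \; f(x) = \frac{\left(c^{\top} x + c_0\right)^{p}}{\left(d^{\top} x + d_0\right)^{q}},
\qquad
X = \{\, x \in \mathbb{R}^{n} : A x \le b \,\},
\qquad
c^{\top} x + c_0 > 0,\;\; d^{\top} x + d_0 > 0 \ \text{ on } X .
```

With $p = q = 1$ this reduces to the classical linear fractional program.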
Indoor Scene Reconstruction with Fine-Grained Details Using Hybrid Representation and Normal Prior Enhancement
The reconstruction of indoor scenes from multi-view RGB images is challenging
due to the coexistence of flat and texture-less regions alongside delicate and
fine-grained regions. Recent methods leverage neural radiance fields aided by
predicted surface normal priors to recover the scene geometry. These methods
excel in producing complete and smooth results for floor and wall areas.
However, they struggle to capture complex surfaces with high-frequency
structures due to the inadequate neural representation and the inaccurately
predicted normal priors. To improve the capacity of the implicit
representation, we propose a hybrid architecture to represent low-frequency and
high-frequency regions separately. To enhance the normal priors, we introduce a
simple yet effective image sharpening and denoising technique, coupled with a
network that estimates the pixel-wise uncertainty of the predicted surface
normal vectors. Identifying such uncertainty can prevent our model from being
misled by unreliable surface normal supervisions that hinder the accurate
reconstruction of intricate geometries. Experiments on the benchmark datasets
show that our method significantly outperforms existing methods in terms of
reconstruction quality.
A Multi-scale Generalized Shrinkage Threshold Network for Image Blind Deblurring in Remote Sensing
Remote sensing images are essential for many earth science applications, but
their quality can be degraded due to limitations in sensor technology and
complex imaging environments. To address this, various remote sensing image
deblurring methods have been developed to restore sharp, high-quality images
from degraded observational data. However, most traditional model-based
deblurring methods require predefined hand-crafted prior assumptions,
which are difficult to handle in complex applications, and most deep
learning-based deblurring methods are designed as a black box, lacking
transparency and interpretability. In this work, we propose a novel blind
deblurring learning framework based on alternating shrinkage-threshold
iterations that update the blur kernel and the image in turn, which gives the
network design a theoretical foundation. Additionally, we propose a learnable
blur kernel proximal mapping module to improve the blur kernel evaluation in
the kernel domain. Then, we propose a deep proximal mapping module in the
image domain, which combines a generalized shrinkage threshold operator and a
multi-scale prior feature extraction block. This module also introduces an
attention mechanism to adaptively adjust the prior importance, thus avoiding
the drawbacks of hand-crafted image prior terms. Thus, a novel multi-scale
generalized shrinkage threshold network (MGSTNet) is designed to specifically
focus on learning deep geometric prior features to enhance image restoration.
Experiments demonstrate the superiority of our MGSTNet framework on remote
sensing image datasets compared to existing deblurring methods.
Comment: 12 pages
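For readers unfamiliar with shrinkage-threshold iterations, the classical building block is sketched below. It is not the MGSTNet module: it shows only the standard soft-threshold operator (the proximal map of the L1 norm) and one iteration of the textbook ISTA scheme for a generic linear operator `A`. The learned, generalized operator and the kernel update described in the abstract are not reproduced, and the names `soft_threshold` and `ista_step` are chosen here for illustration.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal map of tau * ||x||_1: shrink each entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista_step(x, A, y, lam, step):
    """One ISTA iteration for min_x 0.5 * ||A x - y||^2 + lam * ||x||_1."""
    grad = A.T @ (A @ x - y)                    # gradient of the data-fidelity term
    return soft_threshold(x - step * grad, step * lam)
```

Unrolled networks of the kind the abstract describes commonly replace the fixed threshold and prior with learned modules while keeping this gradient-then-shrink structure.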
InSocialNet: Interactive visual analytics for role-event videos
Role–event videos are rich in information but challenging to understand at the story level. The social roles and behavior patterns of characters largely depend on the interactions among characters and on the background events. Understanding them requires analyzing the video content over a long duration, which is beyond the ability of current algorithms designed for short-time dynamics. In this paper, we propose InSocialNet, an interactive video analytics tool for analyzing the content of role–event videos. It automatically and dynamically constructs social networks from role–event videos using face and expression recognition, and provides a visual interface for interactive analysis of video content. Together with social network analysis at the back end, InSocialNet helps users investigate characters, their relationships, social roles, factions, and events in the input video. We conduct case studies to demonstrate the effectiveness of InSocialNet in helping users extract rich information from role–event videos. We believe the current prototype implementation can be extended to applications beyond movie analysis, e.g., social psychology experiments that help understand crowd social behaviors.
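A minimal sketch of how a social network might be assembled from per-shot face-recognition output. The construction rule is an assumption made for illustration: an edge weight counts the shots in which two characters appear together. The abstract does not say how InSocialNet weights its edges, and the function names below are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_network(shots):
    """shots: iterable of lists of character names recognized in each shot.

    Returns a Counter mapping an alphabetically ordered pair (a, b)
    to the number of shots in which both characters appear.
    """
    edges = Counter()
    for characters in shots:
        for a, b in combinations(sorted(set(characters)), 2):
            edges[(a, b)] += 1
    return edges

def weighted_degree(edges):
    """Sum of edge weights per character, a simple proxy for how central a role is."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree
```

Rebuilding the counter over a sliding window of shots would give a network that changes over time, in the spirit of the dynamic construction the abstract mentions.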