15 research outputs found

    Novel View Synthesis from a Single RGBD Image for Indoor Scenes

    Full text link
    In this paper, we propose an approach for synthesizing novel view images from a single RGBD (Red Green Blue-Depth) input. Novel view synthesis (NVS) is an interesting computer vision task with extensive applications. Methods using multiple images have been well studied; exemplary ones include training scene-specific Neural Radiance Fields (NeRF) or leveraging multi-view stereo (MVS) and 3D rendering pipelines. However, both are either computationally intensive or non-generalizable across different scenes, limiting their practical value. Conversely, the depth information embedded in RGBD images unlocks 3D potential from a single view, simplifying NVS. The widespread availability of compact, affordable stereo cameras, and even LiDARs in contemporary devices like smartphones, makes capturing RGBD images more accessible than ever. In our method, we convert an RGBD image into a point cloud, render it from a different viewpoint, and thereby formulate the NVS task as an image translation problem. We leverage generative adversarial networks to style-transfer the rendered image, achieving a result similar to a photograph taken from the new perspective. We explore both unsupervised learning using CycleGAN and supervised learning with Pix2Pix, and demonstrate qualitative results. Our method circumvents the limitations of traditional multi-image techniques and holds significant promise for practical, real-time NVS applications.
    Comment: 2nd International Conference on Image Processing, Computer Vision and Machine Learning, November 202
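    To make the point-cloud-and-render stage concrete, here is a minimal numpy sketch of the geometric half of such a pipeline: back-projecting an RGBD image through a pinhole camera model and re-rendering the points from a new pose. The intrinsics and pose handling are illustrative placeholders, and the paper's GAN-based refinement stage is not included.

```python
# Sketch only: back-project RGBD to a point cloud, then splat the points
# into a new view with a painter's-algorithm render. Hole filling and the
# CycleGAN/Pix2Pix translation step from the paper are omitted.
import numpy as np

def rgbd_to_points(rgb, depth, fx, fy, cx, cy):
    """Back-project every pixel with positive depth into camera space."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    cols = rgb.reshape(-1, 3)
    valid = pts[:, 2] > 0
    return pts[valid], cols[valid]

def render_from_pose(pts, cols, R, t, fx, fy, cx, cy, h, w):
    """Project points into the new camera; the nearest point wins per pixel."""
    cam = pts @ R.T + t                       # rigid transform into new view
    z = cam[:, 2]
    keep = z > 1e-6                           # drop points behind the camera
    cam, cols, z = cam[keep], cols[keep], z[keep]
    u = np.round(fx * cam[:, 0] / z + cx).astype(int)
    v = np.round(fy * cam[:, 1] / z + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, cols = u[inside], v[inside], z[inside], cols[inside]
    img = np.zeros((h, w, 3), dtype=cols.dtype)
    order = np.argsort(-z)                    # far to near: nearest written last
    img[v[order], u[order]] = cols[order]
    return img
```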

    Coupled Depth Learning

    Full text link
    In this paper we propose a method for estimating depth from a single image using a coarse-to-fine approach. We argue that modeling the fine depth details is easier after a coarse depth map has been computed. We express a global (coarse) depth map of an image as a linear combination of a depth basis learned from training examples. The depth basis captures spatial and statistical regularities and reduces the problem of global depth estimation to the task of predicting the input-specific coefficients in the linear combination. This is formulated as a regression problem from a holistic representation of the image. Crucially, the depth basis and the regression function are coupled and jointly optimized by our learning scheme. We demonstrate that this results in a significant improvement in accuracy compared to direct regression of depth pixel values or approaches learning the depth basis disjointly from the regression function. The global depth estimate is then used as guidance by a local refinement method that introduces depth details that were not captured at the global level. Experiments on the NYUv2 and KITTI datasets show that our method outperforms the existing state of the art at a considerably lower computational cost for both training and testing.
    Comment: 10 pages, 3 figures, 4 tables with quantitative evaluation
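    The decomposition described here is easy to see in code. The sketch below is the *disjoint* baseline the abstract compares against (a PCA depth basis plus a ridge regressor learned separately, on random placeholder data); the paper's contribution is to optimize the basis and the regressor jointly, which this sketch does not do.

```python
# Sketch of "global depth = linear combination of a learned basis" with
# coefficients regressed from a holistic image feature. Data are random
# placeholders; basis and regressor are fit disjointly, unlike the paper.
import numpy as np

rng = np.random.default_rng(0)
n_train, feat_dim, n_pix, n_basis = 200, 512, 64 * 48, 32

feats = rng.normal(size=(n_train, feat_dim))      # holistic image features
depths = rng.normal(size=(n_train, n_pix))        # flattened depth maps

# Learn a depth basis from training depth maps (PCA via SVD).
mean_d = depths.mean(axis=0)
_, _, vt = np.linalg.svd(depths - mean_d, full_matrices=False)
basis = vt[:n_basis]                              # (n_basis, n_pix)

# Regress basis coefficients from image features (ridge regression).
targets = (depths - mean_d) @ basis.T             # per-image coefficients
lam = 1e-2
w = np.linalg.solve(feats.T @ feats + lam * np.eye(feat_dim),
                    feats.T @ targets)            # (feat_dim, n_basis)

def predict_global_depth(feat):
    """Coarse depth for one image: regress coefficients, combine the basis."""
    coeffs = feat @ w                             # (n_basis,)
    return mean_d + coeffs @ basis
```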

    Simplex-like sequential methods for a class of generalized fractional programs

    Get PDF
    A sequential method for a class of generalized fractional programming problems is proposed. The objective function considered is a ratio of powers of affine functions, and the feasible region is a polyhedron, not necessarily bounded. Theoretical properties of the optimization problem are first established, and the maximal domains of pseudoconcavity are characterized. When the objective function is pseudoconcave on the feasible region, the proposed algorithm takes advantage of the favorable optimization properties of pseudoconcave functions; the particular structure of the objective function makes it possible to provide a simplex-like algorithm even when the objective function is not pseudoconcave. Computational results confirm the good performance of the proposed algorithm.
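    For readers unfamiliar with the problem class, one common instance of "a ratio of powers of affine functions over a polyhedron" can be written as below; the exact exponent structure and sign conditions used in the paper may differ.

```latex
% Illustrative formulation (assumed, not taken from the paper):
\max_{x \in X} \; f(x) \;=\;
  \frac{\left(a^{\top}x + a_{0}\right)^{p}}
       {\left(b^{\top}x + b_{0}\right)^{q}},
\qquad
X = \{\, x \in \mathbb{R}^{n} : Ax \le c \,\},\quad p, q > 0,
% with both affine terms assumed positive on X.
```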

    Indoor Scene Reconstruction with Fine-Grained Details Using Hybrid Representation and Normal Prior Enhancement

    Full text link
    The reconstruction of indoor scenes from multi-view RGB images is challenging due to the coexistence of flat, texture-less regions and delicate, fine-grained regions. Recent methods leverage neural radiance fields aided by predicted surface normal priors to recover the scene geometry. These methods excel at producing complete and smooth results for floor and wall areas. However, they struggle to capture complex surfaces with high-frequency structures, owing to inadequate neural representations and inaccurately predicted normal priors. To improve the capacity of the implicit representation, we propose a hybrid architecture that represents low-frequency and high-frequency regions separately. To enhance the normal priors, we introduce a simple yet effective image sharpening and denoising technique, coupled with a network that estimates the pixel-wise uncertainty of the predicted surface normal vectors. Identifying such uncertainty prevents our model from being misled by unreliable surface normal supervision that would hinder the accurate reconstruction of intricate geometries. Experiments on benchmark datasets show that our method significantly outperforms existing methods in terms of reconstruction quality.
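    One standard way to use a pixel-wise uncertainty to down-weight unreliable normal priors is a heteroscedastic-style loss, sketched below. This is an assumption about the general mechanism, not the paper's exact loss: pixels with high predicted uncertainty contribute less, and the additive log-variance term stops the network from inflating uncertainty everywhere.

```python
# Sketch of uncertainty-weighted normal supervision (assumed form).
import torch

def uncertain_normal_loss(pred_normals, prior_normals, log_var):
    """pred/prior: (N, 3) unit normals; log_var: (N,) per-pixel uncertainty."""
    ang_err = 1.0 - torch.sum(pred_normals * prior_normals, dim=-1)  # 1 - cos
    # exp(-log_var) attenuates unreliable priors; + log_var penalizes
    # declaring every prior unreliable.
    return torch.mean(ang_err * torch.exp(-log_var) + log_var)
```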

    A Multi-scale Generalized Shrinkage Threshold Network for Image Blind Deblurring in Remote Sensing

    Full text link
    Remote sensing images are essential for many earth science applications, but their quality can be degraded by limitations in sensor technology and complex imaging environments. To address this, various remote sensing image deblurring methods have been developed to restore sharp, high-quality images from degraded observations. However, most traditional model-based deblurring methods require predefined hand-crafted prior assumptions, which are difficult to design for complex applications, while most deep learning-based deblurring methods are designed as black boxes, lacking transparency and interpretability. In this work, we propose a novel blind deblurring learning framework based on alternating iterations of shrinkage thresholds, alternately updating the blur kernel and the image, which provides a theoretical foundation for the network design. We also propose a learnable blur kernel proximal mapping module to improve blur kernel estimation in the kernel domain. We then propose a deep proximal mapping module in the image domain, which combines a generalized shrinkage threshold operator with a multi-scale prior feature extraction block. This module also introduces an attention mechanism to adaptively adjust the importance of the priors, avoiding the drawbacks of hand-crafted image prior terms. Together, these form a novel multi-scale generalized shrinkage threshold network (MGSTNet) designed specifically to learn deep geometric prior features that enhance image restoration. Experiments demonstrate the superiority of our MGSTNet framework on remote sensing image datasets compared to existing deblurring methods.
    Comment: 12 pages
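    The classical, non-learned version of the alternating shrinkage-threshold iteration the framework unrolls looks roughly like the sketch below: gradient steps on the image and the blur kernel, each followed by a fixed proximal step (soft-thresholding for the image, a nonnegative sum-to-one projection for the kernel). MGSTNet replaces these hand-written proximal operators with learned modules; this is only the assumed starting point, not the paper's network.

```python
# Sketch of classic alternating ISTA-style blind deblurring (assumed baseline).
import numpy as np

def soft_threshold(x, tau):
    """Shrinkage operator: prox of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def blur(img, ker):
    """Circular convolution via FFT; kernel zero-padded to image size."""
    K = np.fft.fft2(ker, s=img.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * K))

def blur_adjoint(img, ker):
    """Adjoint of the circular convolution above."""
    K = np.fft.fft2(ker, s=img.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.conj(K)))

def blind_ista(y, ksize=9, iters=100, step=0.2, tau=1e-3):
    x = y.copy()                                   # image estimate
    k = np.full((ksize, ksize), 1.0 / ksize**2)    # flat initial kernel
    for _ in range(iters):
        # Image update: gradient step on 0.5||k*x - y||^2, then shrinkage.
        r = blur(x, k) - y
        x = soft_threshold(x - step * blur_adjoint(r, k), tau)
        # Kernel update: correlate residual with image (cropped to the
        # kernel support), then project to nonnegative, sum-to-one.
        r = blur(x, k) - y
        gk = np.real(np.fft.ifft2(np.conj(np.fft.fft2(x)) *
                                  np.fft.fft2(r)))[:ksize, :ksize]
        k = np.maximum(k - step * gk, 0.0)
        k /= max(k.sum(), 1e-12)
    return x, k
```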

    InSocialNet: Interactive visual analytics for role-event videos

    Get PDF
    Role–event videos are rich in information but challenging to understand at the story level. The social roles and behavior patterns of characters largely depend on the interactions among characters and the background events. Understanding them requires analyzing the video content over a long duration, which is beyond the ability of current algorithms designed for analyzing short-time dynamics. In this paper, we propose InSocialNet, an interactive video analytics tool for analyzing the content of role–event videos. It automatically and dynamically constructs social networks from role–event videos using face and expression recognition, and provides a visual interface for interactive analysis of video content. Together with social network analysis at the back end, InSocialNet enables users to investigate characters, their relationships, social roles, factions, and events in the input video. We conduct case studies that demonstrate the effectiveness of InSocialNet in helping users harvest rich information from role–event videos. We believe the current prototype can be extended to applications beyond movie analysis, e.g., social psychology experiments that help in understanding crowd social behaviors.
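    A minimal sketch of the kind of graph construction such a back end implies: characters recognized together in the same shot are linked, and repeated co-appearances strengthen the edge. This is an illustration under that assumption; the actual InSocialNet pipeline also incorporates expression recognition and event context.

```python
# Sketch: weighted co-appearance graph from per-shot character detections.
from collections import Counter
from itertools import combinations

def build_social_network(shots):
    """shots: iterable of sets of character names seen together in a shot."""
    edges = Counter()
    for chars in shots:
        for a, b in combinations(sorted(chars), 2):
            edges[(a, b)] += 1                     # co-appearance count
    return edges

# Hypothetical example: three shots from a role-event video.
shots = [{"Alice", "Bob"}, {"Alice", "Bob", "Carol"}, {"Bob", "Carol"}]
print(build_social_network(shots))
# Counter({('Alice', 'Bob'): 2, ('Bob', 'Carol'): 2, ('Alice', 'Carol'): 1})
```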