Leveraging Photogrammetric Mesh Models for Aerial-Ground Feature Point Matching Toward Integrated 3D Reconstruction
Integrating aerial and ground images has proved to be an efficient approach to enhancing surface reconstruction in urban environments. However,
as the first step, the feature point matching between aerial and ground images
is remarkably difficult, due to the large differences in viewpoint and
illumination conditions. Previous studies based on geometry-aware image
rectification have alleviated this problem, but the performance and convenience of that strategy are limited by several flaws, e.g., a quadratic number of image pairs, segregated extraction of descriptors, and occlusions. To address these problems,
we propose a novel approach: leveraging photogrammetric mesh models for
aerial-ground image matching. The proposed methods have linear time complexity in the number of images, can explicitly handle low overlap using multi-view images, and can be directly injected into off-the-shelf
structure-from-motion (SfM) and multi-view stereo (MVS) solutions. First,
aerial and ground images are reconstructed separately and initially
co-registered through weak georeferencing data. Second, the aerial models are rendered into the initial ground views to obtain color, depth, and normal images. Then, the synthesized color images and the corresponding
ground images are matched by comparing the descriptors, filtered by local
geometrical information, and then propagated to the aerial views using depth
images and patch-based matching. Experimental evaluations using various
datasets confirm the superior performance of the proposed methods in
aerial-ground image matching. In addition, incorporating these methods into existing SfM and MVS solutions enables more complete and accurate models to be obtained directly.
Comment: Accepted for publication in ISPRS Journal of Photogrammetry and Remote Sensing.
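The propagation step described above, lifting a match found in a synthesized ground view to the aerial views, reduces to back-projection through the rendered depth image followed by re-projection. Below is a minimal numpy sketch of that geometric step only; the function names, intrinsics, and poses are illustrative assumptions, not the authors' code:

```python
import numpy as np

def backproject(u, v, depth, K, R, t):
    """Lift pixel (u, v) with its rendered depth to a 3D world point.
    (R, t) is the camera-to-world pose of the synthesized ground view."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return R @ (ray * depth) + t

def project(X, K, R, t):
    """Project world point X into a view with world-to-camera pose (R, t)."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

# Illustrative calibration; real values come from SfM co-registration.
K = np.array([[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]])
X = backproject(400, 300, depth=25.0, K=K, R=np.eye(3), t=np.zeros(3))
uv_aerial = project(X, K, np.eye(3), np.array([0.0, 0.0, 5.0]))
print(uv_aerial)  # match hypothesis in the aerial view, to be refined by patch matching
```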
Light Field Retargeting for Multi-Panel Displays
Light fields preserve angular information which can be retargeted to
multi-panel depth displays. Due to limited aperture size and constrained
spatial-angular sampling of many light field capture systems, the displayed
light fields provide only a narrow viewing zone in which parallax views can be
supported. In addition, multi-panel displays typically have a small number of panels that can only coarsely sample depth content, resulting in a layered appearance of displayed light fields. We propose a light field retargeting technique for
multi-panel displays that enhances the perceived parallax and achieves seamless transitions across different depths and viewing angles. This is accomplished by
slicing the captured light fields according to their depth content, boosting
the parallax, and blending the results across the panels. Displayed views are
synthesized and aligned dynamically according to the position of the viewer.
The proposed technique is outlined, simulated and verified experimentally on a
three-panel aerial display.
Comment: 16 pages.
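The slice-boost-blend idea can be illustrated with a toy single-view retargeting function: layers are cut by depth, each layer is shifted by a parallax term scaled by a boost factor, and the layers are composited back-to-front. All names and constants below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def retarget_view(color, depth, layer_edges, boost, view_offset):
    """Slice one view by depth, shift each layer by a boosted parallax,
    and composite back-to-front (nearer layers overwrite farther ones)."""
    out = np.zeros_like(color)
    layers = list(zip(layer_edges[:-1], layer_edges[1:]))
    for near, far in reversed(layers):  # paint far layers first
        mask = (depth >= near) & (depth < far)
        # Parallax is inversely proportional to depth; `boost` amplifies it.
        disparity = int(round(boost * view_offset / (0.5 * (near + far))))
        shifted = np.roll(color * mask[..., None], disparity, axis=1)
        visible = shifted.any(axis=2, keepdims=True)
        out = np.where(visible, shifted, out)
    return out

# Toy data: a random image with depths between 1 and 10 units.
rng = np.random.default_rng(0)
color = rng.random((120, 160, 3))
depth = rng.uniform(1.0, 10.0, size=(120, 160))
panel = retarget_view(color, depth, layer_edges=[1.0, 5.0, 10.01],
                      boost=2.0, view_offset=4.0)
```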
Self-Supervised Human Depth Estimation from Monocular Videos
Previous methods for estimating detailed human depth often require supervised training with 'ground truth' depth data. This paper presents a self-supervised
method that can be trained on YouTube videos without known depth, which makes
training data collection simple and improves the generalization of the learned
network. The self-supervised learning is achieved by minimizing a
photo-consistency loss, which is evaluated between a video frame and its
neighboring frames warped according to the estimated depth and the 3D non-rigid
motion of the human body. To solve for this non-rigid motion, we first estimate a
rough SMPL model at each video frame and compute the non-rigid body motion
accordingly, which enables self-supervised learning on estimating the shape
details. Experiments demonstrate that our method enjoys better generalization
and performs much better on data in the wild.
Comment: Accepted by the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
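The photo-consistency loss has a simple geometric core: project each pixel of a frame into a neighboring frame using the estimated depth and motion, sample the neighbor there, and penalize the color difference. A minimal numpy/scipy sketch, simplified to a rigid warp in place of the paper's non-rigid SMPL-driven motion:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def photo_consistency_loss(frame, neighbor, depth, K, R, t):
    """Warp `neighbor` into `frame`'s view using the estimated per-pixel
    depth and a relative pose (R, t), then take an L1 color difference.
    A rigid warp stands in for the paper's non-rigid body motion."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    rays = np.linalg.inv(K) @ np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    pts = rays * depth.ravel()               # 3D points in `frame`'s camera
    proj = K @ (R @ pts + t[:, None])        # projected into `neighbor`
    un, vn = proj[0] / proj[2], proj[1] / proj[2]
    warped = np.stack([map_coordinates(neighbor[..., c], [vn, un], order=1)
                       for c in range(3)], axis=-1).reshape(h, w, 3)
    return np.mean(np.abs(frame - warped))   # L1 photo-consistency
```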
Deep Learning Guided Building Reconstruction from Satellite Imagery-derived Point Clouds
3D urban reconstruction of buildings from remotely sensed imagery has drawn
significant attention during the past two decades. While aerial imagery and
LiDAR provide higher resolution, satellite imagery is cheaper and more
efficient to acquire for large-scale needs. However, the high orbital altitude of satellite observation brings intrinsic challenges, such as unpredictable atmospheric effects, multiple view angles, significant radiometric differences across the necessary multiple views, diverse land cover and urban structures in a scene, and a small base-to-height ratio or narrow field of view, all of which may
degrade 3D reconstruction quality. To address these major challenges, we
present a reliable and effective approach for building model reconstruction
from the point clouds generated from multi-view satellite images. We utilize
multiple types of primitive shapes to fit the input point cloud. Specifically,
a deep-learning approach is adopted to distinguish the shape of building roofs
in complex and noisy scenes. For points that belong to the same roof shape,
a multi-cue, hierarchical RANSAC approach is proposed to efficiently and reliably segment and reconstruct the building point cloud. Experimental
results over four selected urban areas (0.34 to 2.04 sq km in size) demonstrate
the proposed method can generate detailed roof structures under noisy data
environments. The average success rate for building shape recognition is 83.0%, while the overall completeness and correctness are over 70% with respect to ground truth created from airborne LiDAR. As a first effort to address the public need for large-scale city model generation, the development is deployed as open-source software.
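The primitive-fitting step at the heart of the pipeline can be sketched with vanilla RANSAC plane fitting; the paper's multi-cue, hierarchical variant adds cues and recursion on top of this basic loop. A minimal numpy version, with illustrative constants:

```python
import numpy as np

def ransac_plane(points, n_iters=500, threshold=0.05, rng=None):
    """Fit a dominant plane to roof points with vanilla RANSAC.
    Returns a boolean inlier mask for the best plane found."""
    rng = rng or np.random.default_rng()
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue  # degenerate (collinear) sample
        normal /= norm
        dist = np.abs((points - sample[0]) @ normal)  # point-to-plane distance
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```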
Underwater Stereo using Refraction-free Image Synthesized from Light Field Camera
There is strong demand for capturing underwater scenes without the distortions
caused by refraction. Since a light field camera can capture several light rays
at each point of an image plane from various directions, if geometrically
correct rays are chosen, it is possible to synthesize a refraction-free image.
In this paper, we propose a novel technique to efficiently select such rays to
synthesize a refraction-free image from an underwater image captured by a light
field camera. In addition, we propose a stereo technique to reconstruct 3D
shapes using a pair of our refraction-free images, which follow a central projection model. In the experiments, we captured several underwater scenes with two light field cameras, synthesized refraction-free images, and applied the stereo technique to reconstruct 3D shapes. The results are compared with previous approximation-based techniques, showing the strength of our method.
Comment: Accepted at the 2019 IEEE International Conference on Image Processing (ICIP).
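Ray selection hinges on Snell's law at the (assumed flat) air-water interface: for each pixel of the desired central-projection image, one picks the captured ray whose in-water direction refracts into the target in-air direction. A minimal sketch of that angular relation; the refractive indices are nominal values, not the paper's calibration:

```python
import numpy as np

def refract_angle(theta_air, n_air=1.0, n_water=1.333):
    """Snell's law at a flat air-water interface: the in-water angle
    corresponding to a desired in-air angle (radians from the normal)."""
    return np.arcsin(n_air * np.sin(theta_air) / n_water)

# A ray leaving the water toward the camera at 30 degrees in air
# travels at about 22 degrees below the interface.
theta_air = np.deg2rad(30.0)
print(np.rad2deg(refract_angle(theta_air)))  # ~22.0
```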
Neural Inverse Rendering for General Reflectance Photometric Stereo
We present a novel convolutional neural network architecture for photometric
stereo (Woodham, 1980), a problem of recovering 3D object surface normals from
multiple images observed under varying illuminations. Despite its long history
in computer vision, the problem still shows fundamental challenges for surfaces
with unknown general reflectance properties (BRDFs). Leveraging deep neural
networks to learn complicated reflectance models is promising, but studies in
this direction are very limited due to difficulties in acquiring accurate
ground truth for training and also in designing networks invariant to
permutation of input images. In order to address these challenges, we propose a
physics-based unsupervised learning framework where surface normals and BRDFs
are predicted by the network and fed into the rendering equation to synthesize
observed images. The network weights are optimized during testing by minimizing
reconstruction loss between observed and synthesized images. Thus, our learning
process does not require ground truth normals or even pre-training on external
images. Our method is shown to achieve the state-of-the-art performance on a
challenging real-world scene benchmark.
Comment: To appear in the International Conference on Machine Learning 2018 (ICML 2018). 10 pages + 20 pages (appendices).
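The test-time optimization can be made concrete with the Lambertian special case of the rendering step: predicted normals and albedo are rendered under the known lights and compared with the observations. A minimal numpy sketch; the paper's network handles general BRDFs, so Lambertian shading here is only an illustrative stand-in:

```python
import numpy as np

def render_lambertian(normals, albedo, lights):
    """Lambertian rendering: I = albedo * max(0, n . l) per pixel and light."""
    shading = np.clip(normals @ lights.T, 0.0, None)   # (pixels, lights)
    return albedo[:, None] * shading

def reconstruction_loss(observed, normals, albedo, lights):
    """The loss minimized at test time: rendered vs. observed intensities."""
    return np.mean((render_lambertian(normals, albedo, lights) - observed) ** 2)

# Toy example: 4 pixels facing the camera, 3 light directions.
normals = np.tile([0.0, 0.0, 1.0], (4, 1))
lights = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 0.866], [0.0, 0.5, 0.866]])
albedo = np.full(4, 0.8)
obs = render_lambertian(normals, albedo, lights)
print(reconstruction_loss(obs, normals, albedo, lights))  # 0.0 at the optimum
```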
DeepHuman: 3D Human Reconstruction from a Single Image
We propose DeepHuman, an image-guided volume-to-volume translation CNN for 3D
human reconstruction from a single RGB image. To reduce the ambiguities
associated with the surface geometry reconstruction, even for the
reconstruction of invisible areas, we propose and leverage a dense semantic
representation generated from the SMPL model as an additional input. One key
feature of our network is that it fuses different scales of image features into
the 3D space through volumetric feature transformation, which helps to recover
accurate surface geometry. The visible surface details are further refined
through a normal refinement network, which can be concatenated with the volume
generation network using our proposed volumetric normal projection layer. We
also contribute THuman, a 3D real-world human model dataset containing about
7000 models. The network is trained using training data generated from the
dataset. Overall, due to the specific design of our network and the diversity
in our dataset, our method enables 3D human model estimation given only a
single image and outperforms state-of-the-art approaches.
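The volumetric feature transformation can be approximated in its simplest form: project each voxel center into the image and gather the 2D feature found there. The sketch below uses nearest-neighbor sampling at a single scale, a deliberate simplification of the paper's learned multi-scale fusion; all names and values are illustrative:

```python
import numpy as np

def unproject_features(feat, K, voxel_centers):
    """Sample 2D image features at the projections of 3D voxel centers,
    producing per-voxel feature vectors for a 3D decoder."""
    proj = K @ voxel_centers.T
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    h, w, c = feat.shape
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (proj[2] > 0)
    vol = np.zeros((len(voxel_centers), c))
    vol[valid] = feat[v[valid], u[valid]]  # nearest-neighbor lookup
    return vol

# Toy usage: an 8-channel feature map and two voxels in front of the camera.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
feat = np.ones((480, 640, 8))
voxels = np.array([[0.0, 0.0, 2.0], [0.1, -0.2, 3.0]])
vol = unproject_features(feat, K, voxels)
```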
A ParaBoost Stereoscopic Image Quality Assessment (PBSIQA) System
The problem of stereoscopic image quality assessment, which finds
applications in 3D visual content delivery such as 3DTV, is investigated in
this work. Specifically, we propose a new ParaBoost (parallel-boosting)
stereoscopic image quality assessment (PBSIQA) system. The system consists of
two stages. In the first stage, various distortions are classified into a few
types, and individual quality scorers, each targeting a specific distortion type, are developed. These scorers offer complementary performance on a database consisting of heterogeneous distortion types. In the second stage,
scores from multiple quality scorers are fused to achieve the best overall
performance, where the fuser is designed based on the parallel boosting idea
borrowed from machine learning. Extensive experiments are conducted to
compare the performance of the proposed PBSIQA system with those of existing
stereo image quality assessment (SIQA) metrics. The developed quality metric
can serve as an objective function to optimize the performance of a 3D content
delivery system.
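The two-stage structure can be sketched by training a boosted regressor on the outputs of the distortion-specific scorers. Gradient boosting from scikit-learn stands in for the paper's ParaBoost fuser here, and the data are synthetic, purely for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each row: scores from five distortion-specific quality scorers for one
# stereo image pair; target: a subjective mean opinion score (MOS).
rng = np.random.default_rng(0)
scorer_outputs = rng.random((200, 5))
mos = scorer_outputs @ [0.4, 0.1, 0.2, 0.2, 0.1] + 0.05 * rng.standard_normal(200)

# Second stage: fuse the complementary scorers into one overall score.
fuser = GradientBoostingRegressor(n_estimators=100)
fuser.fit(scorer_outputs[:150], mos[:150])
predicted_quality = fuser.predict(scorer_outputs[150:])
```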
Local Activity-tuned Image Filtering for Noise Removal and Image Smoothing
In this paper, two local activity-tuned filtering frameworks are proposed for
noise removal and image smoothing, where the local activity measurement is
given by the clipped and normalized local variance or standard deviation. The
first framework is a modified anisotropic diffusion for noise removal on piecewise-smooth images. The second framework is a local activity-tuned
Relative Total Variation (LAT-RTV) method for image smoothing. Both frameworks
divide the gradient by the local activity measurement to achieve noise removal. In addition, to better capture local information, the proposed LAT-RTV uses the product of the gradient and the local activity measurement to boost the performance of image smoothing. Experimental results are presented to
demonstrate the efficiency of the proposed methods on various applications,
including depth image filtering, clip-art compression artifact removal, image
smoothing, and image denoising.
Comment: 13 pages, 9 figures.
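The shared ingredient of both frameworks, the clipped and normalized local variance, is easy to compute with box filters, and a single explicit diffusion step shows how it can tune smoothing strength. The sketch below is a simplified reading of the first framework, not the paper's exact conductance function; all constants are illustrative:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_activity(img, size=5, clip=3.0):
    """Clipped, normalized local variance in [0, 1]:
    ~0 on flat regions, ~1 on busy texture and edges."""
    mean = uniform_filter(img, size)
    var = uniform_filter(img ** 2, size) - mean ** 2
    return np.clip(var / (var.mean() + 1e-12), 0.0, clip) / clip

def diffusion_step(img, dt=0.1):
    """One explicit step of activity-tuned diffusion: smooth strongly
    where activity is low, preserve detail where it is high."""
    act = local_activity(img)
    gy, gx = np.gradient(img)
    flux_x, flux_y = (1.0 - act) * gx, (1.0 - act) * gy
    div = np.gradient(flux_x, axis=1) + np.gradient(flux_y, axis=0)
    return img + dt * div
```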
4D Visualization of Dynamic Events from Unconstrained Multi-View Videos
We present a data-driven approach for 4D space-time visualization of dynamic
events from videos captured by hand-held multiple cameras. Key to our approach
is the use of self-supervised neural networks specific to the scene to compose
static and dynamic aspects of an event. Though the event is captured from discrete viewpoints, the model enables us to move around its space-time continuously and to create virtual cameras that facilitate:
(1) freezing the time and exploring views; (2) freezing a view and moving
through time; and (3) simultaneously changing both time and view. We can also
edit the videos and reveal objects occluded in a given view if they are visible in any of the other views. We validate our approach on challenging in-the-wild
events captured using up to 15 mobile cameras.
Comment: Project page: http://www.cs.cmu.edu/~aayushb/Open4D
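Virtual-camera navigation between captured viewpoints reduces, at its simplest, to interpolating camera poses: spherical interpolation for rotation and linear interpolation for position. A minimal scipy sketch of that step only; the neural compositing that actually renders the in-between view is beyond this snippet, and all names are illustrative:

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def virtual_camera(pose_a, pose_b, alpha):
    """Interpolate a virtual camera between two captured viewpoints:
    slerp for the rotation, lerp for the position."""
    R_a, t_a = pose_a
    R_b, t_b = pose_b
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix([R_a, R_b]))
    R_v = slerp(alpha).as_matrix()
    t_v = (1.0 - alpha) * t_a + alpha * t_b
    return R_v, t_v  # pose fed to the scene model to render the novel view

# Example: halfway between two cameras 1 m apart with a 20-degree yaw change.
R0 = np.eye(3)
R1 = Rotation.from_euler("y", 20, degrees=True).as_matrix()
Rv, tv = virtual_camera((R0, np.zeros(3)), (R1, np.array([1.0, 0.0, 0.0])), 0.5)
```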