Adorym: A multi-platform generic x-ray image reconstruction framework based on automatic differentiation
We describe and demonstrate an optimization-based x-ray image reconstruction
framework called Adorym. Our framework provides a generic forward model,
allowing one code framework to be used for a wide range of imaging methods
ranging from near-field holography to fly-scan ptychographic tomography. By
using automatic differentiation for optimization, Adorym has the flexibility to
refine experimental parameters including probe positions, multiple hologram
alignment, and object tilts. It is written with strong support for parallel
processing, allowing large datasets to be processed on high-performance
computing systems. We demonstrate its use on several experimental datasets to
show improved image quality through parameter refinement.
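The key idea above, that a differentiable forward model lets gradient descent refine experimental parameters, can be illustrated with a minimal sketch. This is not Adorym's API: the dual-number class, the toy forward model, and the "probe shift" parameter are all illustrative stand-ins.

```python
# Minimal sketch (not Adorym's code): forward-mode automatic
# differentiation via dual numbers, used to refine one hypothetical
# experimental parameter (a "probe shift") by gradient descent.

class Dual:
    """Dual number a + b*eps; .grad carries the derivative."""
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._wrap(other)
        return Dual(self.val + other.val, self.grad + other.grad)

    def __sub__(self, other):
        other = self._wrap(other)
        return Dual(self.val - other.val, self.grad - other.grad)

    def __mul__(self, other):
        other = self._wrap(other)
        return Dual(self.val * other.val,
                    self.val * other.grad + self.grad * other.val)

def forward_model(shift, x):
    # Toy differentiable forward model: intensity = (x - shift)^2
    d = Dual(x) - shift
    return d * d

def loss(shift, data):
    # Sum of squared residuals between simulated and measured intensity
    total = Dual(0.0)
    for x, y in data:
        r = forward_model(shift, x) - y
        total = total + r * r
    return total

# Synthetic measurements generated with a true shift of 0.5
data = [(x, (x - 0.5) ** 2) for x in (-1.0, 0.0, 1.0, 2.0)]

shift = 0.0
for _ in range(300):
    # Seeding grad=1.0 makes .grad equal dL/d(shift)
    g = loss(Dual(shift, 1.0), data).grad
    shift -= 0.01 * g
```

The same mechanism, applied by a real autodiff engine to a full physical forward model, is what lets probe positions, hologram alignments, and object tilts be refined jointly with the reconstruction.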
Vision-based Situational Graphs Generating Optimizable 3D Scene Representations
3D scene graphs offer a more efficient representation of the environment by
hierarchically organizing diverse semantic entities and the topological
relationships among them. Fiducial markers, on the other hand, offer a valuable
mechanism for encoding comprehensive information pertaining to environments and
the objects within them. In the context of Visual SLAM (VSLAM), especially when
the reconstructed maps are enriched with practical semantic information, these
markers have the potential to enhance the map by augmenting valuable semantic
information and fostering meaningful connections among the semantic objects. In
this regard, this paper exploits fiducial markers to equip a VSLAM framework
with hierarchical representations, generating optimizable multi-layered
vision-based situational graphs. The framework
comprises a conventional VSLAM system with low-level feature tracking and
mapping capabilities bolstered by the incorporation of a fiducial marker map.
The fiducial markers aid in identifying walls and doors in the environment,
subsequently establishing meaningful associations with high-level entities,
including corridors and rooms. Experiments are conducted on a real-world
dataset collected using various legged robots and benchmarked against a Light
Detection And Ranging (LiDAR)-based framework (S-Graphs) as the ground truth.
Our framework not only produces a richer, multi-layered hierarchical map of
the environment but also improves robot pose accuracy compared with
state-of-the-art methods.
Comment: 7 pages, 6 figures, 2 tables
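The multi-layered structure described above can be sketched as a small graph data type. All names here (`SituationalGraph`, the layer labels, the marker ids) are hypothetical illustrations of the idea, not the paper's implementation: keyframes from VSLAM tracking link to marker-identified walls and doors, which in turn link to rooms.

```python
# Illustrative sketch (hypothetical names, not the paper's API): a
# multi-layered situational graph linking low-level keyframes to
# marker-detected walls/doors and on up to high-level rooms.
from collections import defaultdict

class SituationalGraph:
    def __init__(self):
        self.nodes = {}                 # node_id -> (layer, payload)
        self.edges = defaultdict(set)   # node_id -> connected node_ids

    def add_node(self, node_id, layer, payload=None):
        self.nodes[node_id] = (layer, payload)

    def add_edge(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def layer(self, name):
        # All nodes belonging to one hierarchy level
        return [n for n, (l, _) in self.nodes.items() if l == name]

g = SituationalGraph()
# Keyframe layer (from conventional VSLAM tracking)
g.add_node("kf1", "keyframe"); g.add_node("kf2", "keyframe")
# Semantic layer: a wall and a door identified via fiducial markers
g.add_node("wall_a", "wall", payload={"marker_id": 17})
g.add_node("door_b", "door", payload={"marker_id": 23})
# High-level layer: a room aggregating its bounding walls and doors
g.add_node("room_1", "room")
for kf in ("kf1", "kf2"):
    g.add_edge(kf, "wall_a")
g.add_edge("wall_a", "room_1"); g.add_edge("door_b", "room_1")
```

Because every layer lives in one graph, a pose-graph optimizer can treat all edges as constraints, which is what makes the representation "optimizable".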
SelfNeRF: Fast Training NeRF for Human from Monocular Self-rotating Video
In this paper, we propose SelfNeRF, an efficient neural radiance field based
novel view synthesis method for human performance. Given monocular
self-rotating videos of human performers, SelfNeRF can train from scratch and
achieve high-fidelity results in about twenty minutes. Some recent works have
utilized the neural radiance field for dynamic human reconstruction. However,
most of these methods need multi-view inputs and require hours of training,
making them difficult to use in practice. To address this challenging
problem, we introduce a surface-relative representation based on
multi-resolution hash encoding that can greatly improve the training speed and
aggregate inter-frame information. Extensive experimental results on several
different datasets demonstrate the effectiveness and efficiency of SelfNeRF on
challenging monocular videos.
Comment: Project page: https://ustc3dv.github.io/SelfNeR
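Multi-resolution hash encoding, the ingredient SelfNeRF credits for its training speed, can be sketched in a few lines. The constants, table size, and nearest-corner lookup below are simplifications for illustration (real implementations, in the spirit of Instant-NGP-style grids, trilinearly interpolate the eight surrounding corners and learn the table entries).

```python
# Minimal sketch of multi-resolution hash encoding (illustrative
# constants and sizes; real grids interpolate 8 corners per level).

PRIMES = (1, 2654435761, 805459861)   # per-axis spatial-hash primes
TABLE_SIZE = 2 ** 14                  # feature entries per level

def hash_corner(ix, iy, iz):
    # Hash an integer grid corner into the per-level feature table
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % TABLE_SIZE

def encode(p, feats, levels=(4, 8, 16)):
    """Concatenate, per resolution level, the feature of the grid cell
    containing point p (nearest-corner lookup for brevity)."""
    out = []
    for lvl, res in enumerate(levels):
        ix, iy, iz = (int(c * res) for c in p)
        out.append(feats[lvl][hash_corner(ix, iy, iz)])
    return out
```

Coarse levels give nearby points shared features (fast, smooth training signal) while fine levels separate them, which is why a small MLP on top of this encoding converges in minutes rather than hours.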
FroDO: From Detections to 3D Objects
Object-oriented maps are important for scene understanding since they jointly
capture geometry and semantics, allow individual instantiation and meaningful
reasoning about objects. We introduce FroDO, a method for accurate 3D
reconstruction of object instances from RGB video that infers object location,
pose and shape in a coarse-to-fine manner. Key to FroDO is to embed object
shapes in a novel learnt space that allows seamless switching between sparse
point cloud and dense DeepSDF decoding. Given an input sequence of localized
RGB frames, FroDO first aggregates 2D detections to instantiate a
category-aware 3D bounding box per object. A shape code is regressed using an
encoder network before optimizing shape and pose further under the learnt shape
priors using sparse and dense shape representations. The optimization uses
multi-view geometric, photometric and silhouette losses. We evaluate on
real-world datasets, including Pix3D, Redwood-OS, and ScanNet, for single-view,
multi-view, and multi-object reconstruction.
Comment: To be published in CVPR 2020. The first two authors contributed
equally.
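The refinement step above, minimizing a weighted sum of geometric, photometric, and silhouette losses, can be sketched with toy residuals. The quadratic loss terms, weights, and the single scalar "pose parameter" below are made up for illustration; FroDO optimizes full shape codes and 6-DoF poses with rendered residuals.

```python
# Hedged sketch (toy residuals, not FroDO's losses): refine one pose
# parameter by minimizing a weighted multi-term objective.

def geometric_loss(t):    # toy stand-in for sparse keypoint residual
    return (t - 1.0) ** 2

def photometric_loss(t):  # toy stand-in for dense intensity residual
    return (t - 1.2) ** 2

def silhouette_loss(t):   # toy stand-in for mask-overlap residual
    return (t - 0.8) ** 2

def total_loss(t, w_geo=1.0, w_photo=0.5, w_sil=0.5):
    return (w_geo * geometric_loss(t)
            + w_photo * photometric_loss(t)
            + w_sil * silhouette_loss(t))

# Gradient descent with a central finite-difference gradient
t, eps, lr = 0.0, 1e-6, 0.1
for _ in range(200):
    g = (total_loss(t + eps) - total_loss(t - eps)) / (2 * eps)
    t -= lr * g
```

The optimum of the weighted sum balances the three cues, which is the point of combining geometric, photometric, and silhouette evidence rather than trusting any one alone.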
4D Human Body Capture from Egocentric Video via 3D Scene Grounding
We introduce a novel task of reconstructing a time series of second-person 3D
human body meshes from monocular egocentric videos. The unique viewpoint and
rapid embodied camera motion of egocentric videos raise additional technical
barriers for human body capture. To address those challenges, we propose a
simple yet effective optimization-based approach that leverages 2D observations
of the entire video sequence and human-scene interaction constraint to estimate
second-person human poses, shapes, and global motion that are grounded on the
3D environment captured from the egocentric view. We conduct detailed ablation
studies to validate our design choice. Moreover, we compare our method with the
previous state-of-the-art method on human motion capture from monocular video,
and show that our method estimates more accurate human-body poses and shapes
under the challenging egocentric setting. In addition, we demonstrate that our
approach produces more realistic human-scene interactions.
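A human-scene interaction constraint of the kind described above can be sketched as a penalty term. The function name, joint layout, and weights below are hypothetical illustrations, not the paper's formulation: body joints are forbidden from sinking below the reconstructed ground, and designated contact joints are pulled toward it.

```python
# Illustrative only (hypothetical formulation): penalize body joints
# penetrating the ground plane and reward near-contact for designated
# contact joints (e.g. feet), given each joint's height above ground.

def interaction_penalty(joints_z, contact_ids, ground_z=0.0,
                        w_pen=10.0, w_contact=1.0):
    # Penetration: only heights below the ground plane are penalized
    pen = sum(max(ground_z - z, 0.0) ** 2 for z in joints_z)
    # Contact: chosen joints should sit on the ground plane
    contact = sum((joints_z[i] - ground_z) ** 2 for i in contact_ids)
    return w_pen * pen + w_contact * contact

# Foot joints (ids 0 and 1) should touch the ground; others stay above
joints_z = [-0.05, 0.02, 0.9, 1.5]   # one foot penetrates by 5 cm
cost = interaction_penalty(joints_z, contact_ids=(0, 1))
```

Added to the 2D reprojection objective, such a term grounds the estimated body in the 3D scene, which is what keeps the recovered motion physically plausible.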