NeTO: Neural Reconstruction of Transparent Objects with Self-Occlusion Aware Refraction-Tracing
We present a novel method, called NeTO, for capturing the 3D geometry of solid
transparent objects from 2D images via volume rendering. Reconstructing
transparent objects is a very challenging task that is ill-suited for
general-purpose reconstruction techniques due to specular light transport
phenomena. Although existing refraction-tracing methods designed specifically
for this task achieve impressive results, they still suffer from unstable
optimization and loss of fine details, since the explicit surface
representation they adopt is difficult to optimize and the self-occlusion
problem is ignored in refraction tracing. In this paper, we propose to
leverage an implicit Signed Distance Function (SDF) as the surface
representation and to optimize the SDF field via volume rendering with
self-occlusion aware refractive ray tracing. The implicit representation
enables our method to produce high-quality reconstructions even from a
limited set of images, and the self-occlusion aware strategy makes it
possible to accurately reconstruct self-occluded regions.
Experiments show that our method achieves faithful reconstruction results and
outperforms prior works by a large margin. Visit our project page at
\url{https://www.xxlong.site/NeTO/}
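To make the tracing primitive concrete, here is a minimal sketch, assuming a
pointwise-queryable SDF (in the paper this would be the learned network, with
normals derived from its gradient): sphere tracing finds the surface hit, and
Snell's law bends the ray at the interface.

```python
import numpy as np

def sphere_trace(sdf, origin, direction, max_steps=128, eps=1e-4):
    """March along the ray, stepping by the SDF value, until we land
    on the surface (|sdf| below eps) or give up."""
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * direction
        d = sdf(p)
        if abs(d) < eps:
            return p  # surface point where refraction occurs
        t += d
    return None  # ray missed the object

def refract(incident, normal, eta):
    """Snell's law for a unit incident direction and unit normal;
    eta = n1 / n2 is the ratio of refractive indices."""
    cos_i = -np.dot(normal, incident)
    sin2_t = eta ** 2 * (1.0 - cos_i ** 2)
    if sin2_t > 1.0:
        return None  # total internal reflection
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * incident + (eta * cos_i - cos_t) * normal
```

The self-occlusion aware part of the method would additionally check, before
using a refracted ray for supervision, that the traced path is not blocked by
another part of the surface.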
Perceptually Uniform Construction of Illustrative Textures
Illustrative textures, such as stippling or hatching, were predominantly used
as an alternative to conventional Phong rendering. Recently, the potential of
encoding information on surfaces or maps using different densities has also
been recognized. This has the significant advantage that additional color can
be used as another visual channel and the illustrative textures can then be
overlaid. Effectively, it is thus possible to display multiple pieces of
information, such as two different scalar fields on a surface, simultaneously.
In previous work, these textures were generated manually and the choice of
density was not empirically grounded. Here, we first want to determine and understand the
perceptual space of illustrative textures. We chose a succession of simplices
with increasing dimensions as primitives for our textures: dots, lines, and
triangles. Thus, we explore the texture types of stippling, hatching, and
triangles. We create a range of textures by sampling the density space
uniformly. Then, we conduct three perceptual studies in which the participants
performed pairwise comparisons for each texture type. We use multidimensional
scaling (MDS) to analyze the perceptual spaces per category. The perception of
stippling and triangles seems relatively similar. Both are adequately described
by a 1D manifold in 2D space. The perceptual space of hatching consists of two
main clusters: crosshatched textures and textures with only one hatching
direction. However, the perception of hatching textures with only one hatching
direction is similar to the perception of stippling and triangles. Based on our
findings, we construct perceptually uniform illustrative textures. Afterwards,
we provide concrete application examples for the constructed textures.
Comment: 11 pages, 15 figures, to be published in IEEE Transactions on Visualization and Computer Graphics
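As a concrete illustration of the analysis step, the following sketch embeds
textures in 2D with non-metric MDS from a precomputed dissimilarity matrix;
the random matrix here is stand-in data, whereas in the studies it would be
aggregated from the participants' pairwise comparisons.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n_textures = 10

# Stand-in dissimilarity matrix: entry (i, j) is the perceived
# difference between texture i and texture j.
d = rng.random((n_textures, n_textures))
dissim = (d + d.T) / 2.0        # symmetrize
np.fill_diagonal(dissim, 0.0)   # a texture is identical to itself

# Non-metric MDS suits ordinal judgments such as pairwise comparisons:
# it places each texture in the plane so that distances preserve the
# rank order of the dissimilarities.
mds = MDS(n_components=2, dissimilarity="precomputed",
          metric=False, random_state=0)
coords = mds.fit_transform(dissim)
print(coords.shape)  # (10, 2): one embedding point per texture
```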
SmartMocap: Joint Estimation of Human and Camera Motion using Uncalibrated RGB Cameras
Markerless human motion capture (mocap) from multiple RGB cameras is a widely
studied problem. Existing methods either need calibrated cameras or calibrate
them relative to a static camera, which acts as the reference frame for the
mocap system. The calibration step has to be done a priori for every capture
session, which is a tedious process, and re-calibration is required whenever
cameras are intentionally or accidentally moved. In this paper, we propose a
mocap method which uses multiple static and moving extrinsically uncalibrated
RGB cameras. The key components of our method are as follows. First, since the
cameras and the subject can move freely, we select the ground plane as a common
reference to represent both the body and the camera motions, unlike existing
methods, which represent bodies in camera coordinates. Second, we learn a
probability distribution of short human motion sequences (1 second) relative
to the ground plane and leverage it to disambiguate between the camera and
human motion. Third, we use this distribution as a motion prior in a novel
multi-stage optimization approach to fit the SMPL human body model and the
camera poses to the human body keypoints on the images. Finally, we show that
our method can work on a variety of datasets ranging from aerial cameras to
smartphones. It also gives more accurate results than the state of the art on
the task of monocular human mocap with a static camera. Our code is available
for research purposes at
https://github.com/robot-perception-group/SmartMocap
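A minimal sketch of the prior-regularized fitting idea follows; it is not the
paper's actual multi-stage optimizer, and `smpl_joints`, `project`, and
`motion_prior_nll` are stand-ins for the SMPL model, the camera projection,
and the learned motion prior.

```python
import torch

def fit(keypoints_2d, smpl_joints, project, motion_prior_nll,
        body_params, cam_params, steps=500, w_prior=0.1):
    """Jointly optimize body and camera parameters (tensors created
    with requires_grad=True) against 2D keypoint evidence, with the
    motion prior disambiguating camera motion from body motion."""
    opt = torch.optim.Adam([body_params, cam_params], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        joints_3d = smpl_joints(body_params)      # (T, J, 3), ground-plane frame
        reproj = project(joints_3d, cam_params)   # (T, J, 2) per camera
        data_term = ((reproj - keypoints_2d) ** 2).mean()
        prior_term = motion_prior_nll(body_params)  # penalize implausible motion
        loss = data_term + w_prior * prior_term
        loss.backward()
        opt.step()
    return body_params, cam_params
```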
3D GANs and Latent Space: A comprehensive survey
Generative Adversarial Networks (GANs) have emerged as a significant approach
to generative modeling, mapping lower-dimensional random noise to
higher-dimensional spaces. These networks have been used to generate
high-resolution images and 3D objects. The efficient modeling of 3D objects and
human faces is crucial in the development process of 3D graphical environments
such as games or simulations. 3D GANs are a new type of generative model used
for 3D reconstruction, point cloud reconstruction, and 3D semantic scene
completion. The choice of noise distribution is critical, as it defines the
latent space. Understanding a GAN's latent space is essential for
fine-tuning the generated samples, as demonstrated by the morphing of
semantically meaningful parts of images. In this work, we explore the latent
space and 3D GANs, examine several GAN variants and training methods to gain
insights into improving 3D GAN training, and suggest potential future
directions for further research.
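As a small illustration of the latent-space exploration mentioned above, one
common operation is interpolating between two latent codes and decoding each
intermediate code with a pretrained generator `G` (a placeholder here);
spherical interpolation is often preferred over linear for Gaussian latents.

```python
import torch

def slerp(z0, z1, t):
    """Spherical interpolation between two distinct latent vectors."""
    omega = torch.acos(torch.clamp(
        torch.dot(z0 / z0.norm(), z1 / z1.norm()), -1.0, 1.0))
    return (torch.sin((1 - t) * omega) * z0 +
            torch.sin(t * omega) * z1) / torch.sin(omega)

# z0, z1 = torch.randn(512), torch.randn(512)
# morph = [G(slerp(z0, z1, t).unsqueeze(0)) for t in torch.linspace(0, 1, 8)]
```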
Inverse Global Illumination using a Neural Radiometric Prior
Inverse rendering methods that account for global illumination are becoming
more popular, but current methods require evaluating and automatically
differentiating millions of path integrals by tracing multiple light bounces,
which remains expensive and prone to noise. Instead, this paper proposes a
radiometric prior as a simple alternative to building complete path integrals
in a traditional differentiable path tracer, while still correctly accounting
for global illumination. Inspired by the Neural Radiosity technique, we use a
neural network as a radiance function, and we introduce a prior consisting of
the norm of the residual of the rendering equation in the inverse rendering
loss. We train our radiance network and optimize scene parameters
simultaneously using a loss consisting of both a photometric term between
renderings and the multi-view input images, and our radiometric prior (the
residual term). This residual term enforces a physical constraint on the
optimization that ensures that the radiance field accounts for global
illumination. We compare our method to a vanilla differentiable path tracer,
and more advanced techniques such as Path Replay Backpropagation. Despite the
simplicity of our approach, we can recover scene parameters with comparable,
and in some cases better, quality at considerably lower computation times.
Comment: Homepage: https://inverse-neural-radiosity.github.io
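A schematic of the combined objective might look as follows; the function
names are ours, with `render`, `residual`, and the view structure standing in
for the differentiable renderer, the Monte Carlo estimate of the
rendering-equation residual, and the multi-view inputs.

```python
import torch

def inverse_rendering_loss(render, radiance_net, residual, views,
                           scene_params, samples, w_prior=1.0):
    # Photometric term: renderings vs. the multi-view input images.
    photometric = sum(
        ((render(v, radiance_net, scene_params) - v.image) ** 2).mean()
        for v in views)
    # Radiometric prior: squared norm of the rendering-equation
    # residual r = L - E - (scattered radiance), evaluated at sampled
    # surface points and directions; driving it toward zero enforces
    # globally consistent illumination.
    prior = (residual(radiance_net, scene_params, samples) ** 2).mean()
    return photometric + w_prior * prior
```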
3DGen: Triplane Latent Diffusion for Textured Mesh Generation
Latent diffusion models for image generation have crossed a quality threshold
that has enabled them to achieve mass adoption. Recently, a series of works have
made advancements towards replicating this success in the 3D domain,
introducing techniques such as point cloud VAE, triplane representation, neural
implicit surfaces, and differentiable-rendering-based training. We take
another step in this direction, combining these developments into a two-step pipeline
consisting of 1) a triplane VAE which can learn latent representations of
textured meshes and 2) a conditional diffusion model which generates the
triplane features. For the first time, this architecture allows conditional and
unconditional generation of high quality textured or untextured 3D meshes
across multiple diverse categories in a few seconds on a single GPU. It
substantially outperforms previous work on image-conditioned and unconditional
generation, in terms of both mesh quality and texture generation. Furthermore, we
demonstrate the scalability of our model to large datasets for increased
quality and diversity. We will release our code and trained models.
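At generation time, the two-step design might be used roughly as in the sketch
below; the module names and the triplane latent shape are placeholders, not
the released API.

```python
import torch

@torch.no_grad()
def generate(diffusion, vae_decoder, condition=None, steps=50):
    # Start from Gaussian noise in the triplane latent space
    # (three feature planes; the shape here is illustrative).
    z = torch.randn(1, 3, 32, 64, 64)
    for t in reversed(range(steps)):
        z = diffusion.denoise_step(z, t, condition)  # one reverse step
    triplanes = vae_decoder(z)  # decode latents to full-resolution triplanes
    return triplanes            # queried by the implicit surface/texture heads
```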
SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction
Recent hand-object interaction datasets show limited real object variability
and rely on fitting the MANO parametric model to obtain groundtruth hand
shapes. To go beyond these limitations and spur further research, we introduce
the SHOWMe dataset which consists of 96 videos, annotated with real and
detailed hand-object 3D textured meshes. Following recent work, we consider a
rigid hand-object scenario, in which the pose of the hand with respect to the
object remains constant during the whole video sequence. This assumption allows
us to register sub-millimetre-precise groundtruth 3D scans to the image
sequences in SHOWMe. Although simplifying, this hypothesis makes sense for
applications where the required accuracy and level of detail are important,
e.g., object hand-over in human-robot collaboration, object scanning, or
manipulation and contact-point analysis. Importantly, the rigidity of the
hand-object system allows us to tackle video-based 3D reconstruction of
unknown hand-held objects using a two-stage pipeline consisting of a rigid registration step
followed by a multi-view reconstruction (MVR) part. We carefully evaluate a set
of non-trivial baselines for these two stages and show that it is possible to
achieve promising object-agnostic 3D hand-object reconstructions employing an
SfM toolbox or a hand pose estimator to recover the rigid transforms, together
with off-the-shelf MVR algorithms. However, these methods remain sensitive to the
initial camera pose estimates which might be imprecise due to lack of textures
on the objects or heavy occlusions of the hands, leaving room for improvements
in the reconstruction. Code and dataset are available at
https://europe.naverlabs.com/research/showme
Comment: Paper and Appendix, accepted at the ACVR workshop at the ICCV conference
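For the first stage, a standard building block is the least-squares rigid
transform between corresponding 3D points (e.g., hand joints from a pose
estimator across frames); below is a minimal Kabsch-style implementation,
illustrative rather than the benchmark's exact code.

```python
import numpy as np

def rigid_transform(P, Q):
    """Return (R, t) minimizing ||R @ P_i + t - Q_i||^2 over all
    correspondences, for P, Q of shape (N, 3)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```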
Reality3DSketch: Rapid 3D Modeling of Objects from Single Freehand Sketches
The emerging trend of AR/VR places great demands on 3D content. However, most
existing software requires expertise and is difficult for novice users to use.
In this paper, we aim to create sketch-based modeling tools for user-friendly
3D modeling. We introduce Reality3DSketch, a novel application offering an
immersive 3D modeling experience, in which a user captures the surrounding
scene using a monocular RGB camera and draws a single sketch of an object in
the real-time reconstructed 3D scene. A 3D object is generated and placed in
the desired location, enabled by our novel neural network with the input of a
single sketch. Our neural network can predict the pose of a drawing and can
turn a single sketch into a 3D model with view and structural awareness, which
addresses the challenges of sparse sketch input and view ambiguity. We
conducted extensive experiments on synthetic and real-world datasets and achieved
state-of-the-art (SOTA) results in both sketch view estimation and 3D modeling
performance. According to our user study, our method of performing 3D modeling
in a scene is 5x faster than conventional methods. Users are also more
satisfied with the generated 3D models than with the results of existing methods.
Comment: IEEE Transactions on Multimedia
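The described inference flow could be summarized as in the sketch below; every
name here is hypothetical, since the paper's actual interfaces are not given
in the abstract.

```python
import torch

@torch.no_grad()
def sketch_to_scene(sketch, view_net, gen_net, scene, location):
    view = view_net(sketch)        # predict the viewpoint of the drawing
    mesh = gen_net(sketch, view)   # view- and structure-aware 3D generation
    scene.place(mesh, location)    # insert into the reconstructed 3D scene
    return mesh
```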
Semantic Validation in Structure from Motion
The Structure from Motion (SfM) challenge in computer vision is the process
of recovering the 3D structure of a scene from a series of projective
measurements that are calculated from a collection of 2D images, taken from
different perspectives. SfM consists of three main steps: feature detection
and matching, camera motion estimation, and recovery of 3D structure from the
estimated intrinsic and extrinsic parameters and features.
A problem encountered in SfM is that scenes lacking texture or with
repetitive features can cause erroneous feature matching between frames.
Semantic segmentation offers a route to validate and correct SfM models by
labelling pixels in the input images with the use of a deep convolutional
neural network. The semantic and geometric properties associated with the
classes in the scene can be exploited to apply prior constraints to each class
of object. The SfM pipeline COLMAP and the semantic segmentation pipeline
DeepLab were used. These, together with a planar reconstruction of the dense
model, serve to identify erroneous points that should be occluded from the
calculated camera position, given the semantic label, and thus the prior
constraint, of the reconstructed plane. Herein, semantic segmentation is integrated into SfM to
apply priors on the 3D point cloud, given the object detection in the 2D input
images. Additionally, the semantic labels of matched keypoints are compared,
and points with inconsistent labels are discarded. Furthermore, the semantic
labels on the input images are used to remove objects associated with motion
from the output SfM models. The proposed approach is evaluated on a
dataset of 1102 images of a repetitive architectural scene. This project
offers a novel method for improved validation of 3D SfM models.
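The keypoint-level check described above is straightforward to express; a
minimal sketch, assuming integer label maps from the segmentation network and
pixel-coordinate keypoints:

```python
import numpy as np

def filter_matches(matches, kps1, kps2, seg1, seg2):
    """matches: (N, 2) index pairs into kps1/kps2; kps: (M, 2) pixel
    coords as (x, y); seg: (H, W) integer class-label maps."""
    kept = []
    for i, j in matches:
        x1, y1 = np.round(kps1[i]).astype(int)
        x2, y2 = np.round(kps2[j]).astype(int)
        # Keep the match only if both keypoints carry the same class.
        if seg1[y1, x1] == seg2[y2, x2]:
            kept.append((i, j))
    return np.asarray(kept)
```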