SurfNet: Generating 3D shape surfaces using deep residual networks
3D shape models are naturally parameterized using vertices and faces, i.e.,
composed of polygons forming a surface. However, current 3D learning paradigms
for predictive and generative tasks using convolutional neural networks focus
on a voxelized representation of the object. Lifting convolution operators from
the traditional 2D to 3D results in high computational overhead with little
additional benefit as most of the geometry information is contained on the
surface boundary. Here we study the problem of directly generating the 3D shape
surface of rigid and non-rigid shapes using deep convolutional neural networks.
We develop a procedure to create consistent 'geometry images' representing the
shape surface of a category of 3D objects. We then use this consistent
representation for category-specific shape surface generation from a parametric
representation or an image by developing novel extensions of deep residual
networks for the task of geometry image generation. Our experiments indicate
that our network learns a meaningful representation of shape surfaces allowing
it to interpolate between shape orientations and poses, invent new shape
surfaces, and reconstruct 3D shape surfaces from previously unseen images. Comment: CVPR 2017 paper.
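To make the geometry-image idea concrete: a geometry image stores the surface as a regular N x N grid whose three channels hold (x, y, z) coordinates, so ordinary 2D residual convolutions can generate it. Below is a minimal PyTorch sketch of such a residual generator; all layer sizes and names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Standard residual block over 2D feature maps."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class GeometryImageGenerator(nn.Module):
    """Maps a parametric code to an N x N x 3 geometry image whose
    channels are (x, y, z) surface coordinates. Sizes are illustrative;
    the actual SurfNet architecture differs."""
    def __init__(self, code_dim=128, size=64, ch=64, n_blocks=4):
        super().__init__()
        self.size, self.ch = size, ch
        self.fc = nn.Linear(code_dim, ch * (size // 4) ** 2)
        self.net = nn.Sequential(
            *[ResBlock(ch) for _ in range(n_blocks)],
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
            nn.Conv2d(ch, 3, 3, padding=1),  # 3 output channels: x, y, z
        )

    def forward(self, code):
        h = self.fc(code).view(-1, self.ch, self.size // 4, self.size // 4)
        return self.net(h)  # (B, 3, size, size) geometry image

gen = GeometryImageGenerator()
geo_img = gen(torch.randn(2, 128))
print(geo_img.shape)  # torch.Size([2, 3, 64, 64])
```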
D-NeRF: Neural Radiance Fields for Dynamic Scenes
Work presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), held virtually from Nashville, TN (United States), June 20-25, 2021. Neural rendering techniques combining machine learning with geometric reasoning have arisen as one of the most promising approaches for synthesizing novel views of a scene from a sparse set of images. Among these, Neural Radiance Fields (NeRF) stands out: it trains a deep network to map 5D input coordinates (representing spatial location and viewing direction) into a volume density and view-dependent emitted radiance. However, despite achieving an unprecedented level of photorealism in the generated images, NeRF is only applicable to static scenes, where the same spatial location can be queried from different images. In this paper we introduce D-NeRF, a method that extends neural radiance fields to a dynamic domain, making it possible to reconstruct and render novel images of objects under rigid and non-rigid motions. For this purpose we consider time as an additional input to the system, and split the learning process into two main stages: one that encodes the scene into a canonical space and another that maps this canonical representation into the deformed scene at a particular time. Both mappings are learned using fully-connected networks. Once the networks are trained, D-NeRF can render novel images, controlling both the camera view and the time variable, and thus the object's movement. We demonstrate the effectiveness of our approach on scenes with objects under rigid, articulated and non-rigid motions. Peer reviewed.
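The two-stage mapping described above can be sketched as a deformation network that displaces a point queried at time t into a shared canonical frame, followed by a canonical radiance network evaluated at the warped point. The minimal PyTorch sketch below omits positional encoding and volume rendering; the MLP widths and interfaces are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def mlp(dims):
    """Simple fully-connected stack with ReLU between hidden layers."""
    layers = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

class DeformationNet(nn.Module):
    """Psi(x, t) -> delta_x: displacement from time t into the canonical frame."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = mlp([3 + 1, hidden, hidden, 3])

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

class CanonicalNeRF(nn.Module):
    """Maps a canonical 3D point and view direction to (density, rgb)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = mlp([3 + 3, hidden, hidden, 4])

    def forward(self, x_canonical, view_dir):
        out = self.net(torch.cat([x_canonical, view_dir], dim=-1))
        sigma = torch.relu(out[..., :1])    # non-negative volume density
        rgb = torch.sigmoid(out[..., 1:])   # view-dependent color in [0, 1]
        return sigma, rgb

deform, nerf = DeformationNet(), CanonicalNeRF()
x = torch.randn(1024, 3)                # points sampled along camera rays
t = torch.full((1024, 1), 0.5)          # query time, normalized to [0, 1]
d = torch.randn(1024, 3)                # viewing directions
sigma, rgb = nerf(x + deform(x, t), d)  # warp to canonical space, then query
```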
3D-PhysNet: Learning the Intuitive Physics of Non-Rigid Object Deformations
The ability to interact and understand the environment is a fundamental
prerequisite for a wide range of applications from robotics to augmented
reality. In particular, predicting how deformable objects will react to applied
forces in real time is a significant challenge. This is further compounded by
the fact that shape information about encountered objects in the real world is
often impaired by occlusions, noise and missing regions, e.g., a robot
manipulating an object will only be able to observe a partial view of the
entire solid. In this work we present a framework, 3D-PhysNet, which is able to
predict how a three-dimensional solid will deform under an applied force using
intuitive physics modelling. In particular, we propose a new method to encode
the physical properties of the material and the applied force, enabling
generalisation over materials. The key is to combine deep variational
autoencoders with adversarial training, conditioned on the applied force and
the material properties. We further propose a cascaded architecture that takes
a single 2.5D depth view of the object and predicts its deformation. Training
data is provided by a physics simulator. The network is fast enough to be used
in real-time applications from partial views. Experimental results show the
viability and the generalisation properties of the proposed architecture. Comment: in IJCAI 2018.
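The conditioning idea can be sketched as a conditional VAE whose encoder and decoder both receive a vector packing the applied force and material parameters. The sketch below omits the paper's adversarial training branch, and the voxel resolution, condition layout and layer sizes are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

class ConditionalDeformationVAE(nn.Module):
    """Conditional VAE sketch for deformation prediction: a voxelized
    partial view is encoded together with a condition vector (applied
    force + material parameters) and decoded into the deformed solid."""
    def __init__(self, res=32, cond_dim=4, z_dim=64):
        super().__init__()
        n = res ** 3
        self.enc = nn.Sequential(nn.Linear(n + cond_dim, 512), nn.ReLU())
        self.mu = nn.Linear(512, z_dim)
        self.logvar = nn.Linear(512, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim + cond_dim, 512), nn.ReLU(),
            nn.Linear(512, n), nn.Sigmoid(),  # per-voxel occupancy probability
        )

    def forward(self, vox, cond):
        h = self.enc(torch.cat([vox.flatten(1), cond], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(torch.cat([z, cond], dim=1)), mu, logvar

model = ConditionalDeformationVAE()
vox = torch.rand(2, 32, 32, 32)                   # voxelized partial 2.5D view
cond = torch.tensor([[0.0, 0.0, -5.0, 0.3]] * 2)  # force vector + stiffness (assumed layout)
deformed, mu, logvar = model(vox, cond)
print(deformed.shape)  # torch.Size([2, 32768])
```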
Context-aware Human Motion Prediction
The problem of predicting human motion given a sequence of past observations
is at the core of many applications in robotics and computer vision. Current
state-of-the-art approaches formulate this problem as a sequence-to-sequence
task, in which a history of 3D skeletons feeds a Recurrent Neural Network (RNN) that
predicts future movements, typically in the order of 1 to 2 seconds. However,
one aspect that has been overlooked so far is the fact that human motion is
inherently driven by interactions with objects and/or other humans in the
environment. In this paper, we explore this scenario using a novel
context-aware motion prediction architecture. We use a semantic-graph model
where the nodes parameterize the human and objects in the scene and the edges
their mutual interactions. These interactions are iteratively learned through a
graph attention layer, fed with the past observations, which now include both
object and human body motions. Once this semantic graph is learned, we inject
it into a standard RNN to predict future movements of the human(s) and object(s).
We consider two variants of our architecture, either freezing the contextual
interactions in the future or updating them. A thorough evaluation on the
"Whole-Body Human Motion Database" shows that in both cases, our context-aware
networks clearly outperform baselines in which the context information is not
considered. Comment: Accepted at CVPR 2020.
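The pipeline can be sketched as a graph attention layer that mixes per-entity features across the scene graph, followed by a shared RNN over time and a decoding head for future motion. Feature sizes, single-head attention and the one-shot decoding head below are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class GraphAttentionLayer(nn.Module):
    """Single-head attention over a fully connected scene graph:
    every node (human or object) attends to every other node."""
    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, nodes):  # nodes: (B, N, dim)
        q, k, v = self.q(nodes), self.k(nodes), self.v(nodes)
        att = torch.softmax(q @ k.transpose(1, 2) / nodes.shape[-1] ** 0.5, dim=-1)
        return att @ v         # context-mixed node features

class ContextAwarePredictor(nn.Module):
    """Sketch: past motion of each scene entity is embedded per frame,
    mixed across entities by graph attention, summarized over time by a
    GRU, and decoded into future motion for every entity."""
    def __init__(self, in_dim=63, hidden=128, horizon=25):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden)
        self.gat = GraphAttentionLayer(hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, in_dim * horizon)
        self.horizon, self.in_dim = horizon, in_dim

    def forward(self, past):   # past: (B, T, N, in_dim)
        B, T, N, _ = past.shape
        h = self.embed(past)
        h = self.gat(h.view(B * T, N, -1)).view(B, T, N, -1)
        h = h.permute(0, 2, 1, 3).reshape(B * N, T, -1)  # one sequence per entity
        _, last = self.gru(h)
        return self.head(last[-1]).view(B, N, self.horizon, self.in_dim)

model = ContextAwarePredictor()
past = torch.randn(2, 50, 3, 63)  # 50 past frames, 3 entities (1 human, 2 objects)
future = model(past)
print(future.shape)  # torch.Size([2, 3, 25, 63])
```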
Function-Theoretic Explanation and the Search for Neural Mechanisms
A common kind of explanation in cognitive neuroscience might be called function-theoretic:
with some target cognitive capacity in view, the theorist hypothesizes that
the system computes a well-defined function (in the mathematical sense) and explains
how computing this function constitutes (in the system’s normal environment) the
exercise of the cognitive capacity. Recently, proponents of the so-called ‘new mechanist’
approach in philosophy of science have argued that a model of a cognitive capacity is
explanatory only to the extent that it reveals the causal structure of the mechanism
underlying the capacity. If they are right, then a cognitive model that resists a transparent
mapping to known neural mechanisms fails to be explanatory. I argue that a
function-theoretic characterization of a cognitive capacity can be genuinely
explanatory even absent an account of how the capacity is realized in neural hardware.
Deformable Shape Completion with Graph Convolutional Autoencoders
The availability of affordable and portable depth sensors has made scanning
objects and people simpler than ever. However, dealing with occlusions and
missing parts is still a significant challenge. The problem of reconstructing a
(possibly non-rigidly moving) 3D object from a single or multiple partial scans
has received increasing attention in recent years. In this work, we propose a
novel learning-based method for the completion of partial shapes. Unlike the
majority of existing approaches, our method focuses on objects that can undergo
non-rigid deformations. The core of our method is a variational autoencoder
with graph convolutional operations that learns a latent space for complete
realistic shapes. At inference, we optimize to find the representation in this
latent space that best fits the generated shape to the known partial input. The
completed shape exhibits a realistic appearance on the unknown part. We show
promising results towards the completion of synthetic and real scans of human
body and face meshes exhibiting different styles of articulation and
partiality. Comment: CVPR 2018.
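The inference step can be sketched as latent-space optimization against a frozen decoder: the latent code is fit so the decoded mesh matches the scan on observed vertices, while the learned shape prior fills in the rest. In the sketch below a random MLP stands in for the trained graph-convolutional decoder, and scan points are assumed to be in vertex correspondence with the decoder output; both are simplifying assumptions rather than the paper's method.

```python
import torch

# A trained mesh decoder would map a latent code to V vertex positions;
# here a frozen, randomly initialized MLP stands in for it (an assumption --
# the paper uses a graph-convolutional variational autoencoder).
V, Z = 1000, 64
decoder = torch.nn.Sequential(
    torch.nn.Linear(Z, 256), torch.nn.ReLU(), torch.nn.Linear(256, V * 3)
)
for p in decoder.parameters():
    p.requires_grad_(False)

partial_scan = torch.randn(V, 3)  # placeholder for a registered partial scan
known = torch.rand(V) < 0.6       # mask: vertices visible in the scan

# Inference-time completion: optimize the latent code so the decoded shape
# matches the scan on the known vertices; the decoder's prior over realistic
# shapes determines the unknown region.
z = torch.zeros(Z, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.01)
for step in range(500):
    opt.zero_grad()
    verts = decoder(z).view(V, 3)
    loss = ((verts[known] - partial_scan[known]) ** 2).mean()
    loss.backward()
    opt.step()

completed = decoder(z).view(V, 3)  # full mesh, missing part filled in by the prior
```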