Extracting curve-skeletons from digital shapes using occluding contours
Curve-skeletons are compact and semantically relevant shape descriptors, able to summarize both the topology and the pose of a wide range of digital objects. Most state-of-the-art algorithms for their computation depend on the type of geometric primitives used and on the sampling frequency. In this paper we introduce a formally sound and intuitive definition of curve-skeleton, then propose a novel method for skeleton extraction that relies on the visual appearance of the shapes. To achieve this result we inspect the properties of occluding contours, showing how information about the symmetry axes of a 3D shape can be inferred from a small set of its planar projections. The proposed method is fast, insensitive to noise and resolution, capable of working with different shape representations, and easy to implement.
Recurrent Attention Models for Depth-Based Person Identification
We present an attention-based model that reasons on human body shape and
motion dynamics to identify individuals in the absence of RGB information,
hence in the dark. Our approach leverages unique 4D spatio-temporal signatures
to address the identification problem across days. Formulated as a
reinforcement learning task, our model is based on a combination of
convolutional and recurrent neural networks with the goal of identifying small,
discriminative regions indicative of human identity. We demonstrate that our
model produces state-of-the-art results on several published datasets given
only depth images. We further study the robustness of our model towards
viewpoint, appearance, and volumetric changes. Finally, we share insights
gleaned from interpretable 2D, 3D, and 4D visualizations of our model's
spatio-temporal attention.
Comment: Computer Vision and Pattern Recognition (CVPR) 201
Self-Supervised Gait Encoding with Locality-Aware Attention for Person Re-Identification
Gait-based person re-identification (Re-ID) is valuable for safety-critical
applications, and using only 3D skeleton data to extract discriminative gait
features for person Re-ID is an emerging open topic. Existing methods either
adopt hand-crafted features or learn gait features by traditional supervised
learning paradigms. Unlike previous methods, we propose, for the first time, a
generic gait encoding approach that can utilize unlabeled skeleton data to
learn gait representations in a self-supervised manner. Specifically, we first
propose to introduce self-supervision by learning to reconstruct input skeleton
sequences in reverse order, which facilitates learning richer high-level
semantics and better gait representations. Second, inspired by the fact that
motion's continuity endows temporally adjacent skeletons with higher
correlations ("locality"), we propose a locality-aware attention mechanism that
encourages learning larger attention weights for temporally adjacent skeletons
when reconstructing the current skeleton, so as to learn locality when encoding
gait. Finally, we propose Attention-based Gait Encodings (AGEs), which are
built using context vectors learned by locality-aware attention, as final gait
representations. AGEs are directly utilized to realize effective person Re-ID.
Our approach typically improves on existing skeleton-based methods by 10-20% in
Rank-1 accuracy, and it achieves comparable or even superior performance to
multi-modal methods with extra RGB or depth information. Our code is
available at https://github.com/Kali-Hac/SGE-LA.
Comment: Accepted at IJCAI 2020 Main Track. Sole copyright holder is IJCAI.
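The two ideas in this abstract, reverse-order reconstruction and attention that favours temporally adjacent skeletons, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the Gaussian distance penalty with width `sigma` is an assumption chosen only to show how a locality bias shifts attention weights toward nearby frames.

```python
import math


def reverse_targets(sequence):
    """Reconstruction targets for the self-supervised task:
    the input skeleton frames in reverse temporal order."""
    return list(reversed(sequence))


def locality_biased_attention(scores, current_step, sigma=1.0):
    """Softmax over attention scores after subtracting a penalty that
    grows with temporal distance from current_step (assumed Gaussian form)."""
    biased = [
        score - ((step - current_step) ** 2) / (2.0 * sigma ** 2)
        for step, score in enumerate(scores)
    ]
    peak = max(biased)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in biased]
    total = sum(exps)
    return [value / total for value in exps]


# Usage: with equal raw scores, the weights peak at the current frame
# and decay symmetrically for frames further away in time.
frames = ["frame0", "frame1", "frame2"]
targets = reverse_targets(frames)
weights = locality_biased_attention([0.0] * 5, current_step=2)
```

In a trained model the raw scores would come from the encoder and decoder states, and the locality preference would typically be learned through an objective rather than hard-coded as it is here.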
LEARNING TO RIG CHARACTERS
With the emergence of 3D virtual worlds, 3D social media, and massive online games, the need for diverse, high-quality, animation-ready characters and avatars is greater than ever. To animate characters, artists hand-craft articulation structures, such as animation skeletons and part deformers, which require a significant amount of laborious manual interaction with 2D/3D modeling interfaces. This thesis presents deep learning methods that are able to significantly automate the process of character rigging.
First, the thesis introduces RigNet, a method capable of predicting an animation skeleton for an input static 3D shape in the form of a polygon mesh. The predicted skeletons match animator expectations in joint placement and topology. RigNet also estimates surface skin weights, which determine how the mesh is animated given the different skeletal poses. In contrast to prior work that fits pre-defined skeletal templates with hand-tuned objectives, RigNet is able to automatically rig diverse characters, such as humanoids, quadrupeds, toys, and birds, with varying articulation structure and geometry. RigNet is based on a deep neural architecture that operates directly on the mesh representation. The architecture is trained on a diverse dataset of rigged models that we mined online and curated. The dataset includes 2.7K polygon meshes, along with their associated skeletons and corresponding skin weights.
Second, the thesis introduces MoRig, a method that automatically rigs character meshes driven by single-view point cloud streams capturing the motion of performing characters. Compared to RigNet, MoRig's rigging is motion-aware: its neural network encodes motion cues from the point clouds into compact feature representations that are informative about the articulated parts of the performing character. These motion-aware features guide the inference of an appropriate skeletal rig for the input mesh. Furthermore, MoRig is able to animate the rig according to the captured point cloud motion. MoRig can handle diverse characters with different morphologies (e.g., humanoids, quadrupeds, toy characters). It also accounts for occluded regions in the point clouds and mismatches in the part proportions between the input mesh and the captured character.
Third, the thesis introduces APES, a method that takes as input 2D raster images depicting a small set of poses of a character shown in a sprite sheet, and identifies articulated parts useful for rigging the character. APES uses a combination of neural network inference and integer linear programming to identify a compact set of articulated body parts, e.g. head, torso and limbs, that best reconstruct the input poses. Compared to MoRig and RigNet, which require a large collection of training models with associated skeletons and skinning weights, APES's neural architecture relies on less effortful supervision from (i) pixel correspondences readily available in existing large cartoon image datasets (e.g., Creative Flow) and (ii) a relatively small dataset of 57 cartoon characters segmented into moving parts.
Finally, the thesis discusses future research directions related to combining neural rigging with 3D and 4D reconstruction of characters from point cloud data and 2D video, as well as automating the process of motion synthesis for 3D characters.
PHRIT: Parametric Hand Representation with Implicit Template
We propose PHRIT, a novel approach for parametric hand mesh modeling with an
implicit template that combines the advantages of both parametric meshes and
implicit representations. Our method represents deformable hand shapes using
signed distance fields (SDFs) with part-based shape priors, utilizing a
deformation field to execute the deformation. The model offers efficient
high-fidelity hand reconstruction by deforming the canonical template at
infinite resolution. Additionally, it is fully differentiable and can be easily
used in hand modeling since it can be driven by the skeleton and shape latent
codes. We evaluate PHRIT on multiple downstream tasks, including
skeleton-driven hand reconstruction, shapes from point clouds, and single-view
3D reconstruction, demonstrating that our approach achieves realistic and
immersive hand modeling with state-of-the-art performance.
Comment: Accepted by ICCV202
Template-free Articulated Neural Point Clouds for Reposable View Synthesis
Dynamic Neural Radiance Fields (NeRFs) achieve remarkable visual quality when
synthesizing novel views of time-evolving 3D scenes. However, the common
reliance on backward deformation fields makes reanimation of the captured
object poses challenging. Moreover, state-of-the-art dynamic models are
often limited by low visual fidelity, long reconstruction time or specificity
to narrow application domains. In this paper, we present a novel method
utilizing a point-based representation and Linear Blend Skinning (LBS) to
jointly learn a Dynamic NeRF and an associated skeletal model from even sparse
multi-view video. Our forward-warping approach achieves state-of-the-art visual
fidelity when synthesizing novel views and poses while significantly reducing
the necessary learning time when compared to existing work. We demonstrate the
versatility of our representation on a variety of articulated objects from
common datasets and obtain reposable 3D reconstructions without the need for
object-specific skeletal templates. Code will be made available at
https://github.com/lukasuz/Articulated-Point-NeRF
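Linear Blend Skinning, which this abstract names as the forward-warping mechanism, moves each point by a weighted average of its positions under every bone's rigid transform. The sketch below shows the standard formula in 2D using only the Python standard library. It is not taken from the linked repository, and the helper names and the two-bone setup are illustrative assumptions.

```python
import math


def rotation_about(pivot, angle):
    """2x3 affine matrix for a 2D rotation by `angle` radians about `pivot`."""
    c, s = math.cos(angle), math.sin(angle)
    px, py = pivot
    return [
        [c, -s, px - c * px + s * py],
        [s, c, py - s * px - c * py],
    ]


def apply_transform(transform, point):
    """Apply a 2x3 affine matrix to a 2D point."""
    x, y = point
    return (
        transform[0][0] * x + transform[0][1] * y + transform[0][2],
        transform[1][0] * x + transform[1][1] * y + transform[1][2],
    )


def linear_blend_skinning(points, weights, bone_transforms):
    """Pose each point as the weighted sum of its images under each bone
    transform. weights[i][j] is the influence of bone j on point i, and
    each row of weights is expected to sum to 1."""
    posed = []
    for point, point_weights in zip(points, weights):
        x = y = 0.0
        for weight, transform in zip(point_weights, bone_transforms):
            tx, ty = apply_transform(transform, point)
            x += weight * tx
            y += weight * ty
        posed.append((x, y))
    return posed


# Usage: bone 0 stays fixed, bone 1 rotates 90 degrees about the joint at (1, 0).
bones = [rotation_about((0.0, 0.0), 0.0), rotation_about((1.0, 0.0), math.pi / 2)]
rest_points = [(0.5, 0.0), (2.0, 0.0), (1.0, 0.0)]
skin_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
posed_points = linear_blend_skinning(rest_points, skin_weights, bones)
```

Because the warp maps rest-pose points forward into the posed space, changing the bone transforms is enough to repose the same point set, which is the property the abstract contrasts with backward deformation fields.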