Understanding Everyday Hands in Action from RGB-D Images
We analyze functional manipulations of handheld objects, formalizing the problem as one of fine-grained grasp classification. To do so, we make use of a recently developed fine-grained taxonomy of human-object grasps. We introduce a large dataset of 12,000 RGB-D images covering 71 everyday grasps in natural interactions. Our dataset differs from past work (typically addressed from a robotics perspective) in its scale, diversity, and combination of RGB and depth data. From a computer-vision perspective, our dataset allows for exploration of contact and force prediction (crucial concepts in functional grasp analysis) from perceptual cues. We present extensive experimental results with state-of-the-art baselines, illustrating the role of segmentation, object context, and 3D understanding in functional grasp analysis. We demonstrate a near 2X improvement over prior work and a naive deep baseline, while pointing out important directions for improvement.
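To make the task setup concrete, here is a minimal sketch of a two-stream RGB-D classifier over the 71-class grasp taxonomy. The architecture (ResNet-18 backbones with late fusion) is an illustrative assumption, not the paper's actual baseline.

```python
# Minimal sketch of a two-stream RGB-D grasp classifier over a 71-way
# fine-grained taxonomy. Backbone choice and fusion are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class RGBDGraspClassifier(nn.Module):
    def __init__(self, num_grasps: int = 71):
        super().__init__()
        self.rgb_stream = resnet18(num_classes=256)
        self.depth_stream = resnet18(num_classes=256)
        # Depth is a single channel; adapt the first convolution.
        self.depth_stream.conv1 = nn.Conv2d(1, 64, kernel_size=7,
                                            stride=2, padding=3, bias=False)
        self.head = nn.Linear(512, num_grasps)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Late fusion: concatenate per-stream features, then classify.
        feats = torch.cat([self.rgb_stream(rgb), self.depth_stream(depth)], dim=1)
        return self.head(feats)  # logits over the grasp taxonomy

model = RGBDGraspClassifier()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 224, 224))
```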
Progressive Skeletonization: Trimming more fat from a network at initialization
Recent studies have shown that skeletonization (pruning parameters) of networks at initialization provides all the practical benefits of sparsity at both inference and training time, while only marginally degrading their performance. However, we observe that beyond a certain level of sparsity, these approaches fail to preserve the network performance, and to our surprise, in many cases perform even worse than trivial random pruning. To this end, we propose an objective to find a skeletonized network with maximum foresight connection sensitivity (FORCE), whereby the trainability, in terms of connection sensitivity, of a pruned network is taken into consideration. We then propose two approximate procedures to maximize our objective: (1) Iterative SNIP, which allows parameters that were unimportant at earlier stages of skeletonization to become important at later stages; and (2) FORCE, an iterative process that enables exploration by allowing already pruned parameters to resurrect at later stages of skeletonization. Empirical analyses on a large suite of experiments show that our approach, while providing at least as good a performance as other recent approaches at moderate pruning levels, provides remarkably improved performance at higher pruning levels, removing a much larger fraction of parameters while keeping the networks trainable. Code can be found at https://github.com/naver/force.
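To make the procedure concrete, below is a minimal sketch of iterative pruning at initialization driven by connection sensitivity, in the spirit of Iterative SNIP and FORCE. The saliency definition, the exponential schedule, and all helper names are simplified assumptions; the repository above holds the authors' actual implementation.

```python
# Sketch of iterative pruning at initialization via connection
# sensitivity |w * dL/dw|, recomputed over several rounds so that
# pruned weights can "resurrect" (FORCE-style). Assumes every
# parameter participates in the loss.
import torch

def iterative_prune_at_init(model, loss_fn, batch, sparsity=0.95, steps=10):
    params = [p for p in model.parameters() if p.requires_grad]
    init = [p.detach().clone() for p in params]   # keep unpruned values
    masks = [torch.ones_like(p) for p in params]
    inputs, targets = batch
    for t in range(1, steps + 1):
        # Exponential schedule: shrink the kept fraction toward the target.
        keep = (1.0 - sparsity) ** (t / steps)
        # Evaluate gradients on the currently pruned network.
        with torch.no_grad():
            for p, w0, m in zip(params, init, masks):
                p.copy_(w0 * m)
        loss = loss_fn(model(inputs), targets)
        grads = torch.autograd.grad(loss, params)
        # Score every weight, pruned or not, so pruned ones can resurrect.
        scores = torch.cat([(w0 * g).abs().flatten()
                            for w0, g in zip(init, grads)])
        k = max(1, int(keep * scores.numel()))
        threshold = torch.topk(scores, k).values.min()
        offset = 0
        for i, p in enumerate(params):
            n = p.numel()
            masks[i] = (scores[offset:offset + n] >= threshold
                        ).reshape(p.shape).float()
            offset += n
    with torch.no_grad():
        for p, w0, m in zip(params, init, masks):
            p.copy_(w0 * m)   # leave the model in its final pruned state
    return masks
```

One-shot SNIP corresponds to steps=1; spreading the pruning over many rounds is what lets weights that looked unimportant early become important after others are removed.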
Depth-based hand pose estimation: data, methods, and challenges
Hand pose estimation has matured rapidly in recent years. The introduction of commodity depth sensors and a multitude of practical applications have spurred new advances. We provide an extensive analysis of the state of the art, focusing on hand pose estimation from a single depth frame. To do so, we have implemented a considerable number of systems, and will release all software and evaluation code. We summarize important conclusions here: (1) Pose estimation appears roughly solved for scenes with isolated hands. However, methods still struggle to analyze cluttered scenes where hands may be interacting with nearby objects and surfaces. To spur further progress we introduce a challenging new dataset with diverse, cluttered scenes. (2) Many methods evaluate themselves with disparate criteria, making comparisons difficult. We define a consistent evaluation criterion, rigorously motivated by human experiments. (3) We introduce a simple nearest-neighbor baseline that outperforms most existing systems. This implies that most systems do not generalize beyond their training sets. This also reinforces the under-appreciated point that training data is as important as the model itself. We conclude with directions for future progress.
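The nearest-neighbor baseline is simple enough to sketch. The descriptors and function names below are illustrative assumptions, not the released evaluation code.

```python
# Sketch of a nearest-neighbor pose baseline: retrieve the training
# frame whose depth descriptor is closest to the query, and reuse its
# joint annotations. Descriptors here are just flattened, normalized
# depth crops; any embedding could be substituted.
import numpy as np

def predict_pose(query, train_depth, train_poses):
    """query: (D,) descriptor; train_depth: (N, D); train_poses: (N, J, 3)."""
    # Brute-force L2 nearest neighbor over the training descriptors.
    dists = np.linalg.norm(train_depth - query[None, :], axis=1)
    return train_poses[np.argmin(dists)]  # copy the neighbor's pose
```

That such retrieval can outperform trained systems suggests those systems are effectively memorizing their training distributions rather than generalizing.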
SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction
Recent hand-object interaction datasets show limited real object variability
and rely on fitting the MANO parametric model to obtain groundtruth hand
shapes. To go beyond these limitations and spur further research, we introduce
the SHOWMe dataset which consists of 96 videos, annotated with real and
detailed hand-object 3D textured meshes. Following recent work, we consider a
rigid hand-object scenario, in which the pose of the hand with respect to the
object remains constant during the whole video sequence. This assumption allows
us to register sub-millimetre-precise groundtruth 3D scans to the image
sequences in SHOWMe. Although simpler, this hypothesis makes sense for
applications where high accuracy and a fine level of detail are required, e.g.,
object hand-over in human-robot collaboration, object scanning, or manipulation
and contact-point analysis. Importantly, the rigidity of the hand-object
system allows us to tackle video-based 3D reconstruction of unknown hand-held
objects using a 2-stage pipeline consisting of a rigid registration step
followed by a multi-view reconstruction (MVR) part. We carefully evaluate a set
of non-trivial baselines for these two stages and show that it is possible to
achieve promising object-agnostic 3D hand-object reconstructions employing an
SfM toolbox or a hand pose estimator to recover the rigid transforms and
off-the-shelf MVR algorithms. However, these methods remain sensitive to the
initial camera pose estimates, which can be imprecise due to a lack of texture
on the objects or heavy occlusion of the hands, leaving room for improvement
in the reconstruction. Code and dataset are available at
https://europe.naverlabs.com/research/showme (paper and appendix accepted at the ACVR workshop at ICCV).
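The property the two-stage pipeline exploits can be sketched briefly: under the rigid hand-object assumption, a per-frame hand pose estimate doubles as a camera pose in a shared object-centric frame, which any MVR method can then consume. The helper `estimate_hand_pose` below is a hypothetical placeholder for an off-the-shelf estimator.

```python
# Sketch of the rigid-registration stage: since the hand grasps the
# object rigidly, inverting each frame's hand-to-camera transform gives
# the camera pose in a common hand/object frame.
import numpy as np

def invert_se3(T):
    """Invert a 4x4 rigid transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def camera_poses_from_hand(frames, estimate_hand_pose):
    """Return per-frame camera poses in the (fixed) hand-object frame."""
    poses = []
    for frame in frames:
        T_hand_to_cam = estimate_hand_pose(frame)  # 4x4 rigid transform
        poses.append(invert_se3(T_hand_to_cam))    # camera in hand frame
    return poses
```

An SfM toolbox can play the same role as the hand pose estimator here; either way, the recovered rigid transforms feed directly into the multi-view reconstruction stage.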
Near Earth Asteroid Scout - Mission Update
After its deployment from NASA's Space Launch System (SLS), the Near-Earth Asteroid (NEA) Scout mission will travel to and image an asteroid during a close flyby, using an 86 m² solar sail as its primary propulsion. Solar sails are large, mirror-like structures made of a lightweight material that reflects sunlight to propel the spacecraft. The continuous solar photon pressure provides thrust with no need for the heavy, expendable propellants used by conventional chemical and electric propulsion systems. Developed by NASA's Marshall Space Flight Center (MSFC) and the Jet Propulsion Laboratory (JPL), NEA Scout is based on the industry-standard CubeSat form factor. The spacecraft measures 11 cm × 24 cm × 36 cm and weighs less than 14 kilograms. Following deployment from the SLS, the solar sail will deploy and the spacecraft will begin its 2.0–2.5-year journey. About one month before the asteroid flyby, NEA Scout will search for the target, start its approach phase using a combination of radio tracking and optical navigation, and perform a relatively slow flyby (10–20 m/s) of the target. A summary of the mission, sailcraft, mission design, and the first several months of deep-space operations is presented.
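For scale, a back-of-envelope check of the sail figures quoted above, assuming an ideal, perfectly reflecting 86 m² sail facing the Sun at 1 AU; real sails are less efficient, so these numbers are upper bounds.

```python
# Idealized solar-sail thrust at 1 AU: radiation pressure S/c on an
# absorber, doubled for perfect reflection at normal incidence.
SOLAR_CONSTANT = 1361.0      # W/m^2 at 1 AU
C = 2.998e8                  # speed of light, m/s
SAIL_AREA = 86.0             # m^2, from the mission description
MASS = 14.0                  # kg, upper bound from the mission description

pressure = SOLAR_CONSTANT / C          # ~4.5e-6 N/m^2 on an absorber
thrust = 2 * pressure * SAIL_AREA      # factor 2 for perfect reflection
accel = thrust / MASS

print(f"thrust ~ {thrust * 1e6:.0f} uN, accel ~ {accel:.2e} m/s^2")
# thrust ~ 781 uN, accel ~ 5.6e-05 m/s^2: tiny, but continuous,
# which is why photon pressure can replace expendable propellant.
```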
4DHumanOutfit: a multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements
This work presents 4DHumanOutfit, a new dataset of densely sampled
spatio-temporal 4D human motion data of different actors, outfits and motions.
The dataset is designed to contain different actors wearing different outfits
while performing different motions in each outfit. In this way, the dataset can
be seen as a cube of data containing 4D motion sequences along three axes:
identity, outfit, and motion. This rich dataset has numerous potential
applications for the processing and creation of digital humans, e.g. augmented
reality, avatar creation, and virtual try-on. 4DHumanOutfit is released for
research purposes at https://kinovis.inria.fr/4dhumanoutfit/. In addition to
image data and 4D reconstructions, the dataset includes reference solutions for
each axis. We present independent baselines along each axis that demonstrate
the value of these reference solutions for evaluation tasks.
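To illustrate the data-cube structure, here is a minimal sketch of indexing sequences by (identity, outfit, motion); all names and paths are hypothetical, not the dataset's actual layout.

```python
# Sketch of the dataset's cube structure: one 4D sequence per cell of
# the (identity, outfit, motion) grid. Fixing one axis yields the kind
# of slice the per-axis baselines evaluate against.
from itertools import product

actors = ["actor01", "actor02"]
outfits = ["casual", "sport"]
motions = ["walk", "dance"]

cube = {(a, o, m): f"{a}/{o}/{m}.sequence"   # hypothetical file layout
        for a, o, m in product(actors, outfits, motions)}

# e.g. all outfits for one actor performing one motion, as an
# outfit-transfer evaluation slice.
slice_ = {o: cube[("actor01", o, "walk")] for o in outfits}
```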