29,433 research outputs found
BodyNet: Volumetric Inference of 3D Human Body Shapes
Human shape estimation is an important task for video editing, animation and
fashion industry. Predicting 3D human body shape from natural images, however,
is highly challenging due to factors such as variation in human bodies,
clothing and viewpoint. Prior methods addressing this problem typically attempt
to fit parametric body models with certain priors on pose and shape. In this
work we argue for an alternative representation and propose BodyNet, a neural
network for direct inference of volumetric body shape from a single image.
BodyNet is an end-to-end trainable network that benefits from (i) a volumetric
3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate
supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them
results in performance improvement as demonstrated by our experiments. To
evaluate the method, we fit the SMPL model to our network output and show
state-of-the-art results on the SURREAL and Unite the People datasets,
outperforming recent approaches. Besides achieving state-of-the-art
performance, our method also enables volumetric body-part segmentation.Comment: Appears in: European Conference on Computer Vision 2018 (ECCV 2018).
27 page
Ball-Scale Based Hierarchical Multi-Object Recognition in 3D Medical Images
This paper investigates, using prior shape models and the concept of ball
scale (b-scale), ways of automatically recognizing objects in 3D images without
performing elaborate searches or optimization. That is, the goal is to place
the model in a single shot close to the right pose (position, orientation, and
scale) in a given image so that the model boundaries fall in the close vicinity
of object boundaries in the image. This is achieved via the following set of
key ideas: (a) A semi-automatic way of constructing a multi-object shape model
assembly. (b) A novel strategy of encoding, via b-scale, the pose relationship
between objects in the training images and their intensity patterns captured in
b-scale images. (c) A hierarchical mechanism of positioning the model, in a
one-shot way, in a given image from a knowledge of the learnt pose relationship
and the b-scale image of the given image to be segmented. The evaluation
results on a set of 20 routine clinical abdominal female and male CT data sets
indicate the following: (1) Incorporating a large number of objects improves
the recognition accuracy dramatically. (2) The recognition algorithm can be
thought as a hierarchical framework such that quick replacement of the model
assembly is defined as coarse recognition and delineation itself is known as
finest recognition. (3) Scale yields useful information about the relationship
between the model assembly and any given image such that the recognition
results in a placement of the model close to the actual pose without doing any
elaborate searches or optimization. (4) Effective object recognition can make
delineation most accurate.Comment: This paper was published and presented in SPIE Medical Imaging 201
Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding
Gait recognition and understanding systems have shown a wide-ranging application prospect. However, their use of unstructured data from image and video has affected their performance, e.g., they are easily influenced by multi-views, occlusion, clothes, and object carrying conditions. This paper addresses these problems using a realistic 3-dimensional (3D) human structural data and sequential pattern learning framework with top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate 2-dimensional (2D) to 3D human body pose and shape semantic parameters estimation method is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by using gait semantic folding, the estimated body parameters are encoded using a sparse 2D matrix to construct the structural gait semantic image. In order to achieve time-based gait recognition, an HTM Network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions including multi-views by refining the SL-GSDRs, according to prior knowledge. The proposed gait learning model not only aids gait recognition tasks to overcome the difficulties in real application scenarios but also provides the structured gait semantic images for visual cognition. Experimental analyses on CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness
- …