Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding
Gait recognition and understanding systems have a wide range of prospective applications. However, their reliance on unstructured image and video data limits their performance: they are easily affected by multiple views, occlusion, clothing, and object-carrying conditions. This paper addresses these problems using realistic 3-dimensional (3D) human structural data and a sequential pattern learning framework with a top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate method is proposed for estimating 3D human body pose and shape semantic parameters from 2-dimensional (2D) input, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, using gait semantic folding, the estimated body parameters are encoded as a sparse 2D matrix to construct the structural gait semantic image. To achieve time-based gait recognition, an HTM network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to handle various conditions, including multiple views, by refining the SL-GSDRs according to prior knowledge. The proposed gait learning model not only helps gait recognition tasks overcome the difficulties of real application scenarios but also provides structured gait semantic images for visual cognition. Experimental analyses on the CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness.
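The abstract does not spell out how gait semantic folding encodes the body parameters, so the following is only a minimal sketch of the general idea of turning scalar parameters into a sparse 2D binary matrix. It assumes a simple contiguous-run scalar encoding of the kind commonly used with HTM; the function names (`encode_scalar`, `encode_parameters`, `overlap`), the value range, and the bit widths are all hypothetical and are not taken from the paper.

```python
import numpy as np


def encode_scalar(value, lo, hi, n_bits=64, n_active=8):
    """Encode one scalar as a binary row with a contiguous run of active bits.

    Assumption: nearby values should share active bits, so the run's start
    position is a linear function of the (clipped) value.
    """
    value = min(max(value, lo), hi)
    n_buckets = n_bits - n_active + 1
    start = int(round((value - lo) / (hi - lo) * (n_buckets - 1)))
    row = np.zeros(n_bits, dtype=np.uint8)
    row[start:start + n_active] = 1
    return row


def encode_parameters(params, lo, hi, n_bits=64, n_active=8):
    """Stack one encoded row per parameter into a sparse 2D binary matrix."""
    return np.stack(
        [encode_scalar(p, lo, hi, n_bits, n_active) for p in params]
    )


def overlap(a, b):
    """Count active bits shared by two binary encodings."""
    return int(np.sum(a & b))


if __name__ == "__main__":
    # Three hypothetical body parameters, normalized to [0, 1].
    image = encode_parameters([0.0, 0.5, 1.0], lo=0.0, hi=1.0)
    print(image.shape, image.sum(axis=1))
```

In this sketch, similar parameter values produce rows that share most of their active bits, which is the property a sequence learner would rely on when comparing frames; the paper's actual encoding may differ.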
Visual Object Networks: Image Generation with Disentangled 3D Representation
Recent progress in deep generative models has led to tremendous breakthroughs
in image generation. However, while existing models can synthesize
photorealistic images, they lack an understanding of our underlying 3D world.
We present a new generative model, Visual Object Networks (VON), synthesizing
natural images of objects with a disentangled 3D representation. Inspired by
classic graphics rendering pipelines, we unravel our image formation process
into three conditionally independent factors---shape, viewpoint, and
texture---and present an end-to-end adversarial learning framework that jointly
models 3D shapes and 2D images. Our model first learns to synthesize 3D shapes
that are indistinguishable from real shapes. It then renders the object's 2.5D
sketches (i.e., silhouette and depth map) from its shape under a sampled
viewpoint. Finally, it learns to add realistic texture to these 2.5D sketches
to generate natural images. The VON not only generates images that are more
realistic than state-of-the-art 2D image synthesis methods, but also enables
many 3D operations such as changing the viewpoint of a generated image, editing
of shape and texture, linear interpolation in texture and shape space, and
transferring appearance across different objects and viewpoints.
Comment: NeurIPS 2018. Code: https://github.com/junyanz/VON Website: http://von.csail.mit.edu
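The factorization into shape, viewpoint, and texture, and the linear interpolation in texture space mentioned above, can be illustrated with a small sketch. It does not use the VON code or its networks: the latent dimensions, the viewpoint ranges, and the names `sample_factors`, `lerp`, and `interpolate_texture` are hypothetical, and the generator that would turn each set of factors into an image is omitted.

```python
import numpy as np


def sample_factors(rng, shape_dim=200, texture_dim=8):
    """Sample the three factors independently of one another.

    Assumption: shape and texture are Gaussian latent codes, and the
    viewpoint is an (azimuth, elevation) pair in degrees. The sizes and
    ranges here are placeholders, not values from the paper.
    """
    return {
        "shape": rng.standard_normal(shape_dim),
        "viewpoint": (rng.uniform(0.0, 360.0), rng.uniform(-30.0, 30.0)),
        "texture": rng.standard_normal(texture_dim),
    }


def lerp(a, b, t):
    """Linear interpolation between two latent codes for t in [0, 1]."""
    return (1.0 - t) * a + t * b


def interpolate_texture(src, dst, steps):
    """Keep src's shape and viewpoint fixed while blending texture toward dst."""
    frames = []
    for t in np.linspace(0.0, 1.0, steps):
        frames.append({
            "shape": src["shape"],
            "viewpoint": src["viewpoint"],
            "texture": lerp(src["texture"], dst["texture"], t),
        })
    return frames


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = sample_factors(rng), sample_factors(rng)
    frames = interpolate_texture(a, b, steps=5)
    print(len(frames), frames[0]["texture"].shape)
```

Because the factors are held in separate entries, editing one of them (here, texture) leaves the others untouched, which is the kind of operation a disentangled representation is meant to allow.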