65,821 research outputs found
Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding
Gait recognition and understanding systems have shown a wide-ranging application prospect. However, their use of unstructured data from image and video has affected their performance, e.g., they are easily influenced by multi-views, occlusion, clothes, and object carrying conditions. This paper addresses these problems using a realistic 3-dimensional (3D) human structural data and sequential pattern learning framework with top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate 2-dimensional (2D) to 3D human body pose and shape semantic parameters estimation method is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by using gait semantic folding, the estimated body parameters are encoded using a sparse 2D matrix to construct the structural gait semantic image. In order to achieve time-based gait recognition, an HTM Network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions including multi-views by refining the SL-GSDRs, according to prior knowledge. The proposed gait learning model not only aids gait recognition tasks to overcome the difficulties in real application scenarios but also provides the structured gait semantic images for visual cognition. Experimental analyses on CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness
Deep Autoencoder for Combined Human Pose Estimation and body Model Upscaling
We present a method for simultaneously estimating 3D human pose and body
shape from a sparse set of wide-baseline camera views. We train a symmetric
convolutional autoencoder with a dual loss that enforces learning of a latent
representation that encodes skeletal joint positions, and at the same time
learns a deep representation of volumetric body shape. We harness the latter to
up-scale input volumetric data by a factor of , whilst recovering a
3D estimate of joint positions with equal or greater accuracy than the state of
the art. Inference runs in real-time (25 fps) and has the potential for passive
human behaviour monitoring where there is a requirement for high fidelity
estimation of human body shape and pose
Knowledge-based vision and simple visual machines
The vast majority of work in machine vision emphasizes the representation of perceived objects and events: it is these internal representations that incorporate the 'knowledge' in knowledge-based vision or form the 'models' in model-based vision. In this paper, we discuss simple machine vision systems developed by artificial evolution rather than traditional engineering design techniques, and note that the task of identifying internal representations within such systems is made difficult by the lack of an operational definition of representation at the causal mechanistic level. Consequently, we question the nature and indeed the existence of representations posited to be used within natural vision systems (i.e. animals). We conclude that representations argued for on a priori grounds by external observers of a particular vision system may well be illusory, and are at best place-holders for yet-to-be-identified causal mechanistic interactions. That is, applying the knowledge-based vision approach in the understanding of evolved systems (machines or animals) may well lead to theories and models that are internally consistent, computationally plausible, and entirely wrong
DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image
3D reconstruction from a single image is a key problem in multiple
applications ranging from robotic manipulation to augmented reality. Prior
methods have tackled this problem through generative models which predict 3D
reconstructions as voxels or point clouds. However, these methods can be
computationally expensive and miss fine details. We introduce a new
differentiable layer for 3D data deformation and use it in DeformNet to learn a
model for 3D reconstruction-through-deformation. DeformNet takes an image
input, searches the nearest shape template from a database, and deforms the
template to match the query image. We evaluate our approach on the ShapeNet
dataset and show that - (a) the Free-Form Deformation layer is a powerful new
building block for Deep Learning models that manipulate 3D data (b) DeformNet
uses this FFD layer combined with shape retrieval for smooth and
detail-preserving 3D reconstruction of qualitatively plausible point clouds
with respect to a single query image (c) compared to other state-of-the-art 3D
reconstruction methods, DeformNet quantitatively matches or outperforms their
benchmarks by significant margins. For more information, visit:
https://deformnet-site.github.io/DeformNet-website/ .Comment: 11 pages, 9 figures, NIP
Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models
We present a new deep learning architecture (called Kd-network) that is
designed for 3D model recognition tasks and works with unstructured point
clouds. The new architecture performs multiplicative transformations and share
parameters of these transformations according to the subdivisions of the point
clouds imposed onto them by Kd-trees. Unlike the currently dominant
convolutional architectures that usually require rasterization on uniform
two-dimensional or three-dimensional grids, Kd-networks do not rely on such
grids in any way and therefore avoid poor scaling behaviour. In a series of
experiments with popular shape recognition benchmarks, Kd-networks demonstrate
competitive performance in a number of shape recognition tasks such as shape
classification, shape retrieval and shape part segmentation.Comment: Spotlight at ICCV'1
Recommended from our members
Deep Neural Networks for 3D Processing and High-Dimensional Filtering
Deep neural networks (DNN) have seen tremendous success in the past few years, advancing state of the art in many AI areas by significant margins. Part of the success can be attributed to the wide adoption of convolutional filters. These filters can effectively capture the invariance in data, leading to faster training and more compact representations, and at the same can leverage efficient parallel implementations on modern hardware. Since convolution operates on regularly structured grids, it is a particularly good fit for texts and images where there are inherent rigid 1D or 2D structures. However, extending DNNs to 3D or higher-dimensional spaces is non-trivial, because data in such spaces often lack regular structure, and the curse of dimensionality can also adversely impact performance in multiple ways.
In this dissertation, we present several new types of neural network operations and architectures for data in 3D and higher-dimensional spaces and demonstrate how we can mitigate these issues while retaining the favorable properties of 2D convolutions. First, we investigate view-based representations for 3D shape recognition. We show that a collection of 2D views can be highly informative, and we can adapt standard 2D DNNs with a simple pooling strategy to recognize objects based on their appearances from multiple viewing angles with unprecedented accuracies. Our next study makes a connection between 3D point cloud processing and sparse high-dimensional filtering. The resulting representation is highly efficient and flexible, and enables native 3D operations as well as joint 2D-3D reasoning. Finally, we show that high-dimensional filtering is also a powerful tool for content-adaptive image filtering. We demonstrate its utility in computer vision applications where preserving sharp details in output is critical, including joint upsampling and semantic segmentation
- …