1,393 research outputs found
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
We present the first real-time method to capture the full global 3D skeletal
pose of a human in a stable, temporally consistent manner using a single RGB
camera. Our method combines a new convolutional neural network (CNN) based pose
regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
formulation regresses 2D and 3D joint positions jointly in real time and does
not require tightly cropped input frames. A real-time kinematic skeleton
fitting method uses the CNN output to yield temporally stable 3D global pose
reconstructions on the basis of a coherent kinematic skeleton. This makes our
approach the first monocular RGB method usable in real-time applications such
as 3D character control---thus far, the only monocular methods for such
applications employed specialized RGB-D cameras. Our method's accuracy is
quantitatively on par with the best offline 3D monocular RGB pose estimation
methods. Our results are qualitatively comparable to, and sometimes better
than, results from monocular RGB-D approaches, such as the Kinect. However, we
show that our approach is more broadly applicable than RGB-D solutions, i.e. it
works for outdoor scenes, community videos, and low quality commodity RGB
cameras.Comment: Accepted to SIGGRAPH 201
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
The implicit objective of the biennial "international - Traveling Workshop on
Interactions between Sparse models and Technology" (iTWIST) is to foster
collaboration between international scientific teams by disseminating ideas
through both specific oral/poster presentations and free discussions. For its
second edition, the iTWIST workshop took place in the medieval and picturesque
town of Namur in Belgium, from Wednesday August 27th till Friday August 29th,
2014. The workshop was conveniently located in "The Arsenal" building within
walking distance of both hotels and town center. iTWIST'14 has gathered about
70 international participants and has featured 9 invited talks, 10 oral
presentations, and 14 posters on the following themes, all related to the
theory, application and generalization of the "sparsity paradigm":
Sparsity-driven data sensing and processing; Union of low dimensional
subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph
sensing/processing; Blind inverse problems and dictionary learning; Sparsity
and computational neuroscience; Information theory, geometry and randomness;
Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?;
Sparse machine learning and inference.Comment: 69 pages, 24 extended abstracts, iTWIST'14 website:
http://sites.google.com/site/itwist1
The Application of Preconditioned Alternating Direction Method of Multipliers in Depth from Focal Stack
Post capture refocusing effect in smartphone cameras is achievable by using
focal stacks. However, the accuracy of this effect is totally dependent on the
combination of the depth layers in the stack. The accuracy of the extended
depth of field effect in this application can be improved significantly by
computing an accurate depth map which has been an open issue for decades. To
tackle this issue, in this paper, a framework is proposed based on
Preconditioned Alternating Direction Method of Multipliers (PADMM) for depth
from the focal stack and synthetic defocus application. In addition to its
ability to provide high structural accuracy and occlusion handling, the
optimization function of the proposed method can, in fact, converge faster and
better than state of the art methods. The evaluation has been done on 21 sets
of focal stacks and the optimization function has been compared against 5 other
methods. Preliminary results indicate that the proposed method has a better
performance in terms of structural accuracy and optimization in comparison to
the current state of the art methods.Comment: 15 pages, 8 figure
Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction
Current bundle adjustment solvers such as the Levenberg-Marquardt (LM)
algorithm are limited by the bottleneck in solving the Reduced Camera System
(RCS) whose dimension is proportional to the camera number. When the problem is
scaled up, this step is neither efficient in computation nor manageable for a
single compute node. In this work, we propose a stochastic bundle adjustment
algorithm which seeks to decompose the RCS approximately inside the LM
iterations to improve the efficiency and scalability. It first reformulates the
quadratic programming problem of an LM iteration based on the clustering of the
visibility graph by introducing the equality constraints across clusters. Then,
we propose to relax it into a chance constrained problem and solve it through
sampled convex program. The relaxation is intended to eliminate the
interdependence between clusters embodied by the constraints, so that a large
RCS can be decomposed into independent linear sub-problems. Numerical
experiments on unordered Internet image sets and sequential SLAM image sets, as
well as distributed experiments on large-scale datasets, have demonstrated the
high efficiency and scalability of the proposed approach. Codes are released at
https://github.com/zlthinker/STBA.Comment: Accepted by ECCV 202
Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding
Gait recognition and understanding systems have shown a wide-ranging application prospect. However, their use of unstructured data from image and video has affected their performance, e.g., they are easily influenced by multi-views, occlusion, clothes, and object carrying conditions. This paper addresses these problems using a realistic 3-dimensional (3D) human structural data and sequential pattern learning framework with top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate 2-dimensional (2D) to 3D human body pose and shape semantic parameters estimation method is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by using gait semantic folding, the estimated body parameters are encoded using a sparse 2D matrix to construct the structural gait semantic image. In order to achieve time-based gait recognition, an HTM Network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions including multi-views by refining the SL-GSDRs, according to prior knowledge. The proposed gait learning model not only aids gait recognition tasks to overcome the difficulties in real application scenarios but also provides the structured gait semantic images for visual cognition. Experimental analyses on CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness
- …