
    Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding

    Gait recognition and understanding systems have shown a wide-ranging application prospect. However, their reliance on unstructured data from images and video limits their performance: they are easily affected by multiple views, occlusion, clothing, and object-carrying conditions. This paper addresses these problems using realistic 3-dimensional (3D) human structural data and a sequential pattern learning framework with a top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate 2-dimensional (2D) to 3D human body pose and shape semantic parameter estimation method is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by gait semantic folding, the estimated body parameters are encoded into a sparse 2D matrix to construct the structural gait semantic image. To achieve time-based gait recognition, an HTM network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with varied conditions, including multiple views, by refining the SL-GSDRs according to prior knowledge. The proposed gait learning model not only helps gait recognition tasks overcome the difficulties of real application scenarios but also provides structured gait semantic images for visual cognition. Experimental analyses on the CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness.
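    The folding step above, encoding estimated body parameters into a sparse 2D matrix, can be illustrated with a small sketch. The code below is only a hedged approximation of the idea: the grid size, the per-parameter quantization, and the function name gait_semantic_fold are illustrative assumptions, not the paper's actual encoding scheme.

    ```python
    import numpy as np

    def gait_semantic_fold(params, grid=(32, 32), bits_per_param=4):
        """Encode normalized body semantic parameters (values in [0, 1]) into a
        sparse 2D binary matrix, in the spirit of semantic folding: each
        parameter lights up a short run of bits whose row depends on the
        parameter index and whose column depends on its quantized value."""
        rows, cols = grid
        sdr = np.zeros(grid, dtype=np.uint8)
        band = max(1, rows // len(params))          # row band reserved per parameter
        for i, p in enumerate(params):
            p = float(np.clip(p, 0.0, 1.0))
            col = int(p * (cols - bits_per_param))  # column offset from quantized value
            sdr[(i * band) % rows, col:col + bits_per_param] = 1
        return sdr

    # Example: 16 pose/shape parameters for one frame -> one sparse "gait semantic image"
    frame_params = np.random.rand(16)
    print(gait_semantic_fold(frame_params).sum(), "active bits out of", 32 * 32)
    ```

    A sequence of such sparse frames would then feed the HTM network, which learns their temporal structure and yields the sequence-level representations (SL-GSDRs) described above.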

    Bethe Ansatz Equations for General Orbifolds of N=4 SYM

    We consider the Bethe Ansatz Equations for orbifolds of N = 4 SYM with respect to an arbitrary discrete group. Techniques used for the Abelian orbifolds can be extended to the generic non-Abelian case with minor modifications. We show how to make the transition between the different notations in the quiver gauge theory. Comment: LaTeX, 66 pages, 9 eps figures, minor corrections, references added
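    For orientation, the untwisted one-loop Bethe Ansatz Equations in the SU(2) sector of N = 4 SYM take the standard form below; this is background material, not the paper's result. The orbifold construction modifies these equations (in the Abelian case, schematically, by twist phases), and the abstract's point is that the same machinery extends to a generic non-Abelian discrete group.

    ```latex
    % Standard untwisted one-loop SU(2)-sector Bethe equations of N = 4 SYM,
    % for M magnons on a length-L spin chain (background only).
    \left( \frac{u_k + \tfrac{i}{2}}{u_k - \tfrac{i}{2}} \right)^{L}
      = \prod_{\substack{j = 1 \\ j \neq k}}^{M}
        \frac{u_k - u_j + i}{u_k - u_j - i},
      \qquad k = 1, \dots, M .
    % An Abelian orbifold typically dresses the left-hand side with a twist
    % phase fixed by the twist sector; the non-Abelian generalization is
    % what the paper works out.
    ```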

    Recursive Motion Estimation on the Essential Manifold

    Visual motion estimation can be regarded as estimation of the state of a system of difference equations with unknown inputs defined on a manifold. Such a system happens to be "linear", but it is defined on a space (the so-called "Essential manifold") which is not a linear (vector) space. In this paper we introduce a novel perspective on the motion estimation problem which results in three original schemes for solving it. The first consists in "flattening the space" and solving a nonlinear estimation problem on the flat (Euclidean) space. The second consists in viewing the system as embedded in a larger Euclidean space (the smallest of the embedding spaces) and solving at each step a linear estimation problem on a linear space, followed by a "projection" onto the manifold (see fig. 5). A third, "algebraic" formulation of motion estimation is inspired by the structure of the problem in local coordinates (the flattened space), and consists in a double iteration for solving an "adaptive fixed-point" problem (see fig. 6). Each of the three schemes outputs motion estimates together with the joint second-order statistics of the estimation error, which can be used by any structure-from-motion module that incorporates motion error [20, 23] in order to estimate 3D scene structure. The original contribution of this paper involves both the problem formulation, which gives new insight into the differential-geometric structure of visual motion estimation, and the ideas generating the three schemes, which are viewed within a unified framework. All the schemes have a strong theoretical motivation and exhibit accuracy, speed of convergence, real-time operation, and flexibility superior to other existing schemes [1, 20, 23]. Simulations are presented for real and synthetic image sequences to compare the three schemes against each other and highlight the peculiarities of each one.
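    The "projection on the manifold" used by the second (embedding) scheme has a standard closed form: the closest essential matrix to a given 3x3 matrix is obtained by forcing its singular values to the pattern (s, s, 0). The sketch below shows only that projection step, not the paper's recursive filter; the function name and the choice s = (s1 + s2)/2 follow the usual convention and are stated here as assumptions.

    ```python
    import numpy as np

    def project_to_essential(E):
        """Project an arbitrary 3x3 matrix onto the essential manifold by
        replacing its singular values with (s, s, 0), where s is the mean of
        the two largest singular values."""
        U, svals, Vt = np.linalg.svd(E)
        s = (svals[0] + svals[1]) / 2.0
        return U @ np.diag([s, s, 0.0]) @ Vt

    # Example: a noisy estimate living in the ambient 3x3 (embedding) space
    E_noisy = np.random.randn(3, 3)
    E_proj = project_to_essential(E_noisy)
    print(np.linalg.svd(E_proj, compute_uv=False))  # approximately (s, s, 0)
    ```

    In the embedding scheme, a linear update in the ambient space would be followed by such a projection at every step, so the state estimate always remains a valid essential matrix.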

    Attend and Interact: Higher-Order Object Interactions for Video Understanding

    Human actions often involve complex interactions across several inter-related objects in the scene. However, existing approaches to fine-grained video understanding or visual relationship detection often rely on single-object representations or pairwise object relationships. Furthermore, learning interactions across multiple objects over hundreds of video frames is computationally infeasible, and performance may suffer since a large combinatorial space has to be modeled. In this paper, we propose to efficiently learn higher-order interactions between arbitrary subgroups of objects for fine-grained video understanding. We demonstrate that modeling object interactions significantly improves accuracy for both action recognition and video captioning, while saving more than three times the computation over traditional pairwise relationships. The proposed method is validated on two large-scale datasets: Kinetics and ActivityNet Captions. Our SINet and SINet-Caption achieve state-of-the-art performance on both datasets even though the videos are sampled at a maximum of 1 FPS. To the best of our knowledge, this is the first work modeling object interactions on open-domain large-scale video datasets, and we additionally model higher-order object interactions, which improves performance at low computational cost. Comment: CVPR 201
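    The core idea, attending over arbitrary subgroups of objects instead of enumerating all pairs, can be sketched in a few lines. The code below is a hedged toy illustration, not the SINet architecture: the number of groups, the weight shapes, and the pooling are invented for the example.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def higher_order_interaction(obj_feats, n_groups=3, seed=0):
        """Attentively pool K per-object features into a few soft 'subgroups'
        and mix them with a tiny MLP, rather than modeling all O(K^2) pairs.
        obj_feats: (K, D) array of object features for one frame."""
        rng = np.random.default_rng(seed)
        K, D = obj_feats.shape
        W_att = 0.1 * rng.standard_normal((D, n_groups))      # attention projections
        W_mix = 0.1 * rng.standard_normal((n_groups * D, D))  # interaction weights
        att = softmax(obj_feats @ W_att, axis=0)               # (K, n_groups), each group sums to 1 over objects
        groups = att.T @ obj_feats                             # (n_groups, D) soft subgroup features
        return np.tanh(groups.reshape(-1) @ W_mix)             # (D,) higher-order interaction descriptor

    # Example: 12 detected objects with 64-d features in one frame
    frame_objects = np.random.randn(12, 64)
    print(higher_order_interaction(frame_objects).shape)  # (64,)
    ```

    The cost scales with the number of groups rather than with the number of object pairs, which is the intuition behind the reported savings over pairwise modeling.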