32,722 research outputs found

    Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding

    Get PDF
    Gait recognition and understanding systems have shown a wide-ranging application prospect. However, their use of unstructured data from image and video has affected their performance, e.g., they are easily influenced by multi-views, occlusion, clothes, and object carrying conditions. This paper addresses these problems using a realistic 3-dimensional (3D) human structural data and sequential pattern learning framework with top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate 2-dimensional (2D) to 3D human body pose and shape semantic parameters estimation method is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by using gait semantic folding, the estimated body parameters are encoded using a sparse 2D matrix to construct the structural gait semantic image. In order to achieve time-based gait recognition, an HTM Network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions including multi-views by refining the SL-GSDRs, according to prior knowledge. The proposed gait learning model not only aids gait recognition tasks to overcome the difficulties in real application scenarios but also provides the structured gait semantic images for visual cognition. Experimental analyses on CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness

    Slow and steady feature analysis: higher order temporal coherence in video

    Full text link
    How can unlabeled video augment visual learning? Existing methods perform "slow" feature analysis, encouraging the representations of temporally close frames to exhibit only small differences. While this standard approach captures the fact that high-level visual signals change slowly over time, it fails to capture *how* the visual content changes. We propose to generalize slow feature analysis to "steady" feature analysis. The key idea is to impose a prior that higher order derivatives in the learned feature space must be small. To this end, we train a convolutional neural network with a regularizer on tuples of sequential frames from unlabeled video. It encourages feature changes over time to be smooth, i.e., similar to the most recent changes. Using five diverse datasets, including unlabeled YouTube and KITTI videos, we demonstrate our method's impact on object, scene, and action recognition tasks. We further show that our features learned from unlabeled video can even surpass a standard heavily supervised pretraining approach.Comment: in Computer Vision and Pattern Recognition (CVPR) 2016, Las Vegas, NV, June 201

    How Does Our Visual System Achieve Shift and Size Invariance?

    Get PDF
    The question of shift and size invariance in the primate visual system is discussed. After a short review of the relevant neurobiology and psychophysics, a more detailed analysis of computational models is given. The two main types of networks considered are the dynamic routing circuit model and invariant feature networks, such as the neocognitron. Some specific open questions in context of these models are raised and possible solutions discussed

    Understanding Slow Feature Analysis: A Mathematical Framework

    Get PDF
    Slow feature analysis is an algorithm for unsupervised learning of invariant representations from data with temporal correlations. Here, we present a mathematical analysis of slow feature analysis for the case where the input-output functions are not restricted in complexity. We show that the optimal functions obey a partial differential eigenvalue problem of a type that is common in theoretical physics. This analogy allows the transfer of mathematical techniques and intuitions from physics to concrete applications of slow feature analysis, thereby providing the means for analytical predictions and a better understanding of simulation results. We put particular emphasis on the situation where the input data are generated from a set of statistically independent sources.\ud The dependence of the optimal functions on the sources is calculated analytically for the cases where the sources have Gaussian or uniform distribution

    Zernike velocity moments for sequence-based description of moving features

    No full text
    The increasing interest in processing sequences of images motivates development of techniques for sequence-based object analysis and description. Accordingly, new velocity moments have been developed to allow a statistical description of both shape and associated motion through an image sequence. Through a generic framework motion information is determined using the established centralised moments, enabling statistical moments to be applied to motion based time series analysis. The translation invariant Cartesian velocity moments suffer from highly correlated descriptions due to their non-orthogonality. The new Zernike velocity moments overcome this by using orthogonal spatial descriptions through the proven orthogonal Zernike basis. Further, they are translation and scale invariant. To illustrate their benefits and application the Zernike velocity moments have been applied to gait recognition—an emergent biometric. Good recognition results have been achieved on multiple datasets using relatively few spatial and/or motion features and basic feature selection and classification techniques. The prime aim of this new technique is to allow the generation of statistical features which encode shape and motion information, with generic application capability. Applied performance analyses illustrate the properties of the Zernike velocity moments which exploit temporal correlation to improve a shape's description. It is demonstrated how the temporal correlation improves the performance of the descriptor under more generalised application scenarios, including reduced resolution imagery and occlusion

    Unsupervised learning of clutter-resistant visual representations from natural videos

    Get PDF
    Populations of neurons in inferotemporal cortex (IT) maintain an explicit code for object identity that also tolerates transformations of object appearance e.g., position, scale, viewing angle [1, 2, 3]. Though the learning rules are not known, recent results [4, 5, 6] suggest the operation of an unsupervised temporal-association-based method e.g., Foldiak's trace rule [7]. Such methods exploit the temporal continuity of the visual world by assuming that visual experience over short timescales will tend to have invariant identity content. Thus, by associating representations of frames from nearby times, a representation that tolerates whatever transformations occurred in the video may be achieved. Many previous studies verified that such rules can work in simple situations without background clutter, but the presence of visual clutter has remained problematic for this approach. Here we show that temporal association based on large class-specific filters (templates) avoids the problem of clutter. Our system learns in an unsupervised way from natural videos gathered from the internet, and is able to perform a difficult unconstrained face recognition task on natural images: Labeled Faces in the Wild [8]
    • …
    corecore