
    Fast, invariant representation for human action in the visual system

    Humans can effortlessly recognize others' actions in the presence of complex transformations, such as changes in viewpoint. Several studies have located the regions in the brain involved in invariant action recognition; however, the underlying neural computations remain poorly understood. We use magnetoencephalography (MEG) decoding and a dataset of well-controlled, naturalistic videos of five actions (run, walk, jump, eat, drink) performed by different actors at different viewpoints to study the computational steps used to recognize actions across complex transformations. In particular, we ask when the brain discounts changes in 3D viewpoint relative to when it initially discriminates between actions. We measure the latency difference between invariant and non-invariant action decoding when subjects view full videos as well as form-depleted and motion-depleted stimuli. Our results show no difference in decoding latency or temporal profile between invariant and non-invariant action recognition in full videos. However, when either form or motion information is removed from the stimulus set, we observe a decrease and delay in invariant action decoding. Our results suggest that the brain recognizes actions and builds invariance to complex transformations at the same time, and that both form and motion information are crucial for fast, invariant action recognition.

    Person Re-identification by Local Maximal Occurrence Representation and Metric Learning

    Person re-identification is an important technique for automatically searching for a person's presence in surveillance video. Two fundamental problems are critical for person re-identification: feature representation and metric learning. An effective feature representation should be robust to illumination and viewpoint changes, and a discriminant metric should be learned to match various person images. In this paper, we propose an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA). The LOMO feature analyzes the horizontal occurrence of local features and maximizes the occurrence to obtain a representation that is stable against viewpoint changes. In addition, to handle illumination variations, we apply the Retinex transform and a scale-invariant texture operator. To learn a discriminant metric, we propose to learn a discriminant low-dimensional subspace by cross-view quadratic discriminant analysis, and simultaneously, a QDA metric is learned on the derived subspace. We also present a practical computation method for XQDA, as well as its regularization. Experiments on four challenging person re-identification databases, VIPeR, QMUL GRID, CUHK Campus, and CUHK03, show that the proposed method improves the state-of-the-art rank-1 identification rates by 2.2%, 4.88%, 28.91%, and 31.55% on the four databases, respectively. Comment: This paper has been accepted by CVPR 2015. For source code and extracted features please visit http://www.cbsr.ia.ac.cn/users/scliao/projects/lomo_xqda
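The "maximize the horizontal occurrence" idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes each image has already been split into horizontal strips of patches, with one local-feature histogram per patch, and the function names and toy values are hypothetical. Taking the per-bin maximum across every patch in a strip discards horizontal position, so the pooled descriptor does not change when content moves sideways within the strip.

```python
from typing import List

Histogram = List[int]


def horizontal_max_pool(strip: List[Histogram]) -> Histogram:
    """Per-bin maximum over all patch histograms in one horizontal strip."""
    return [max(bin_values) for bin_values in zip(*strip)]


def lomo_like_descriptor(grid: List[List[Histogram]]) -> List[int]:
    """Concatenate the max-pooled histogram of every horizontal strip."""
    descriptor: List[int] = []
    for strip in grid:
        descriptor.extend(horizontal_max_pool(strip))
    return descriptor


# Toy image: 2 strips, 3 patches per strip, 2-bin histograms (made-up values).
grid = [
    [[4, 0], [1, 3], [2, 2]],
    [[0, 5], [3, 1], [1, 1]],
]

# Simulate a sideways shift by rotating the patches within each strip.
shifted = [strip[1:] + strip[:1] for strip in grid]

descriptor = lomo_like_descriptor(grid)
shifted_descriptor = lomo_like_descriptor(shifted)
print(descriptor)
print(shifted_descriptor == descriptor)
```

The sketch covers only the pooling step; the Retinex transform, the texture operator, and XQDA mentioned in the abstract are outside its scope.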

    Learning optimised representations for view-invariant gait recognition

    Gait recognition can be performed without subject cooperation under harsh conditions, making it an important tool in forensic gait analysis, security control, and other commercial applications. One critical issue that prevents gait recognition systems from being widely accepted is the performance drop when the camera viewpoint varies between the registered templates and the query data. In this paper, we explore the potential of combining feature optimisers and representations learned by convolutional neural networks (CNNs) to achieve efficient view-invariant gait recognition. The experimental results indicate that CNNs learn highly discriminative representations across moderate view variations, and that these representations can be further improved using view-invariant feature selectors, achieving high matching accuracy across views.

    Automatic vehicle tracking and recognition from aerial image sequences

    This paper addresses the problem of automated vehicle tracking and recognition from aerial image sequences. Motivated by their success in the existing literature, we focus on the use of linear appearance subspaces to describe multi-view object appearance and highlight the challenges involved in their application as part of a practical system. A working solution which includes steps for data extraction and normalization is described. In experiments on real-world data, the proposed methodology achieved promising results, with a high correct recognition rate and few, meaningful errors (type II errors whereby genuinely similar targets are sometimes confused with one another). Directions for future research and possible improvements of the proposed method are discussed.

    How can cells in the anterior medial face patch be viewpoint invariant?

    In a recent paper, Freiwald and Tsao (2010) found evidence that the responses of cells in the macaque anterior medial (AM) face patch are invariant to significant changes in viewpoint. The monkey subjects had no prior experience with the individuals depicted in the stimuli and were never given an opportunity to view the same individual from different viewpoints sequentially. These results cannot be explained by a mechanism based on temporal association of experienced views. Employing a biologically plausible model of object recognition (software available at cbcl.mit.edu), we show two mechanisms which could account for these results. First, we show that hair style and skin color provide sufficient information to enable viewpoint recognition without resorting to any mechanism that associates images across views. It is likely that a large part of the effect described in patch AM is attributable to these cues. Separately, we show that it is possible to further improve view-invariance using class-specific features (see Vetter 1997). Faces, as a class, transform under 3D rotation in similar enough ways that it is possible to use previously viewed example faces to learn a general model of how all faces rotate. Novel faces can be encoded relative to these previously encountered “template” faces and thus recognized with some degree of invariance to 3D rotation. Since each object class transforms differently under 3D rotation, it follows that invariant recognition from a single view requires a recognition architecture with a detection step that determines the class of an object (e.g., face or non-face) prior to a subsequent identification stage utilizing the appropriate class-specific features.