18 research outputs found

    Dorsal stream : from algorithm to neuroscience

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 173-195). The dorsal stream in the primate visual cortex is involved in the perception of motion and the recognition of actions. The two topics, motion processing in the brain and action recognition in videos, have been developed independently in the fields of neuroscience and computer vision. We present a dorsal stream model that can be used for the recognition of actions as well as for explaining neurophysiology in the dorsal stream. The model consists of spatio-temporal feature detectors of increasing complexity: an input image sequence is first analyzed by an array of motion-sensitive units which, through a hierarchy of processing stages, lead to a position- and scale-invariant representation of motion in a video sequence. The model outperforms or is on par with state-of-the-art computer vision algorithms on a range of human action datasets. We then describe the extension of the model into a high-throughput system for the recognition of mouse behaviors in their home cage. We provide software and a very large manually annotated video database used for training and testing the system. Our system outperforms a commercial software package and performs on par with human scoring, as measured from the ground-truth manual annotations of more than 10 hours of videos of freely behaving mice. We complete the neurobiological side of the model by showing that it could explain motion processing as well as action selectivity in the dorsal stream, based on comparisons between model outputs and neuronal responses in the dorsal stream. Specifically, the model could explain pattern and component sensitivity and distribution [161], local motion integration [97], and speed tuning [144] of MT cells. The model, when combined with the ventral stream model [173], could also explain the action and actor selectivity in the STP area. There exist only a few models of motion processing in the dorsal stream, and these models have not been applied to real-world computer vision tasks. Our model is one that agrees with (or processes) data at different levels: from computer vision algorithms and practical software to neuroscience. by Hueihan Jhuang. Ph.D.

    A biologically inspired system for action recognition

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (p. 51-58). We present a biologically motivated system for the recognition of actions from video sequences. The approach builds on recent work on object recognition based on hierarchical feedforward architectures and extends a neurobiological model of motion processing in the visual cortex. The system consists of a hierarchy of spatio-temporal feature detectors of increasing complexity: an input sequence is first analyzed by an array of motion-direction-sensitive units which, through a hierarchy of processing stages, lead to position-invariant spatio-temporal feature detectors. We experiment with different types of motion-direction-sensitive units as well as different system architectures. We also find that sparse features in intermediate stages outperform dense ones and that a simple feature selection approach leads to an efficient system that performs better with far fewer features. We test the approach on different publicly available action datasets, in all cases achieving the best results reported to date. by Hueihan Jhuang. S.M.

    Automated home-cage behavioral phenotyping of mice

    We describe a trainable computer vision system enabling the automated analysis of complex mouse behaviors. We provide software and a very large manually annotated video database used for training and testing the system. Our system outperforms leading commercial software and performs on par with human scoring, as measured from the ground-truth manual annotations of thousands of clips of freely behaving animals. We show that the home-cage behavior profiles provided by the system are sufficient to accurately predict the strain identity of individual animals in the case of two standard inbred and two non-standard mouse strains. Our software should complement existing sensor-based automated approaches and help develop an adaptable, comprehensive, high-throughput, fine-grained, automated analysis of rodent behavior.

    Trainable, vision-based automated home cage behavioral phenotyping

    We describe a fully trainable computer vision system enabling the automated analysis of complex mouse behaviors. Our system computes a sequence of feature descriptors for each video sequence, and a classifier is used to learn a mapping from these features to behaviors of interest. We collected a very large manually annotated video database of mouse behaviors for training and testing the system. Our system performs on par with human scoring, as measured from the ground-truth manual annotations of thousands of clips of freely behaving mice. As a validation of the system, we characterized the home-cage behaviors of two standard inbred and two nonstandard mouse strains. From these data, we were able to predict the strain identity of individual mice with high accuracy. Sponsors: California Institute of Technology, Broad Fellows Program in Brain Circuitry; National Science Council of Taiwan (TMS-094-1-A032

    Towards understanding action recognition

    International audience. Although action recognition in videos is widely studied, current methods often fail on real-world datasets. Many recent approaches improve accuracy and robustness to cope with challenging video sequences, but it is often unclear what affects the results most. This paper attempts to provide insights based on a systematic performance evaluation using thoroughly annotated data of human actions. We annotate human joints for the HMDB dataset (J-HMDB). This annotation can be used to derive ground-truth optical flow and segmentation. We evaluate current methods using this dataset and systematically replace the output of various algorithms with ground truth. This enables us to discover what is important: for example, should we work on improving flow algorithms, estimating human bounding boxes, or enabling pose estimation? In summary, we find that high-level pose features greatly outperform low/mid-level features; in particular, pose over time is critical, but current pose estimation algorithms are not yet reliable enough to provide this information. We also find that the accuracy of a top-performing action recognition framework can be greatly increased by refining the underlying low/mid-level features; this suggests it is important to improve optical flow and human detection algorithms. Our analysis and the J-HMDB dataset should facilitate a deeper understanding of action recognition algorithms.

    Motivation: A Biologically Inspired System for Action Recognition

    The hierarchical object recognition model [1] works well on several benchmark datasets. It models the ventral pathway of the cortex, using Gabor filters to simulate the orientation selectivity of V1 simple cells and a max operation to capture the nonlinearity of V1 complex cells. Parallel to this ventral pathway, a dorsal pathway, which mainly includes V1 and MT, has been suggested to be a motion pathway. Besides the orientation selectivity described above, V1 cells are also selective to the direction of motion when presented with a moving stimulus. Area MT is known to have a majority of cells tuned to speed. Having collected evidence of how motion information is processed in the cortex, we would like to know how the idea behind the object recognition model can be extended to action recognition, which has one more dimension, time, to consider. In this work, we try to develop such an action recognition system, one that models the dorsal pathway and parallels the object recognition model in that it is also biologically plausible and hierarchical.
    The Problem: Given an input video of a subject performing an action (for example, a person walking, running, or waving a hand, or a monitored mouse eating or exploring), we try to build a system that serves as a classifier and assigns the video to the correct category. In the training stage, the system should model the orientation and direction tuning of V1 cells and the speed tuning of MT cells. Note that MT cells receive a large number of projections from V1 cells and have receptive fields 4 to 10 times the size of those of V1 cells, so the three kinds of tunin
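    The Gabor-then-max idea mentioned in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (gabor_kernel, simple_cells, complex_cells) and all parameter values (kernel size, wavelength, sigma, pooling width) are hypothetical choices made only for illustration, and the sketch covers static orientation selectivity only, not direction or speed tuning.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(size=11, wavelength=5.0, theta=0.0, sigma=3.0, gamma=0.5):
    """Even-phase Gabor filter; all parameter values are illustrative assumptions."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier varies along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    kernel -= kernel.mean()                  # zero mean: no response to flat input
    return kernel / np.linalg.norm(kernel)   # unit norm


def simple_cells(image, kernel):
    """Simple-cell-like stage: absolute filter response at every valid position."""
    windows = sliding_window_view(image, kernel.shape)
    return np.abs(np.einsum("ijkl,kl->ij", windows, kernel))


def complex_cells(s1, pool=4):
    """Complex-cell-like stage: max over local pool x pool neighbourhoods."""
    windows = sliding_window_view(s1, (pool, pool))[::pool, ::pool]
    return windows.max(axis=(2, 3))


# Toy input: vertical stripes with a period of 5 pixels.
stripes = np.tile(np.cos(2.0 * np.pi * np.arange(40) / 5.0), (40, 1))
s1_vertical = simple_cells(stripes, gabor_kernel(theta=0.0))
s1_horizontal = simple_cells(stripes, gabor_kernel(theta=np.pi / 2))
c1_vertical = complex_cells(s1_vertical)
print(s1_vertical.mean() > s1_horizontal.mean())
```

    The filter whose carrier matches the stripe orientation responds more strongly on average, and the max stage keeps the strongest response within each neighbourhood, which is what gives tolerance to small shifts in position.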

    Low Rank SVM

    Introduction: Many research fields begin by processing data that have hundreds or thousands of variables, such as images, speech, text, and gene data. The curse of dimensionality arises because hyper-volume grows exponentially with dimensionality, and this makes unsupervised learning from data difficult. There are several ways to deal with the curse of dimensionality. One is to incorporate domain knowledge to build a representation of informative features. The second is to use a variable selection algorithm t

    Modeling Appearances with Low-Rank SVM

    Several authors have noticed that the common representation of images as vectors is sub-optimal. The process of vectorization eliminates spatial relations between some of the nearby image measurements and produces a vector whose dimension is the product of the measurements' dimensions. It seems that images may be better represented when their structure as a 2D (or multi-D) array is taken into account. Our work bears similarities to recent work such as 2DPCA or Coupled Subspace Analysis in that we treat images as 2D arrays. The main difference, however, is that unlike previous work, which separated representation from the discriminative learning stage, we achieve both with the same method. Our framework, "Low-Rank separators", studies the use of separating hyperplanes that are constrained to have the structure of low-rank matrices. We first prove that the low-rank constraint provides preferable generalization properties. We then define two "Low-rank SVM problems" and propose algorithms to solve them. Finally, we provide supporting experimental evidence for the framework.
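    To make the notion of a separating hyperplane with low-rank matrix structure concrete, here is a minimal NumPy sketch of the decision function only. The abstract does not give the formulation, so the factorization W = U V^T, the function names, and the toy sizes below are assumptions for illustration; the sketch does not implement either of the two Low-rank SVM problems or the training algorithms.

```python
import numpy as np


def full_decision(X, W, b=0.0):
    """Linear decision value <W, X> + b for an image X kept as a 2D array."""
    return float(np.sum(W * X) + b)


def low_rank_decision(X, U, V, b=0.0):
    """Same decision value with W = U @ V.T, i.e. sum_k u_k^T X v_k + b."""
    return float(np.einsum("ik,ij,jk->", U, X, V) + b)


# Hypothetical toy sizes: 8 x 6 "images" and a rank-2 separator.
rng = np.random.default_rng(0)
rows, cols, rank = 8, 6, 2
U = rng.standard_normal((rows, rank))
V = rng.standard_normal((cols, rank))
W = U @ V.T
X = rng.standard_normal((rows, cols))

same_value = np.isclose(full_decision(X, W, 0.5), low_rank_decision(X, U, V, 0.5))
n_full_params = rows * cols               # unconstrained hyperplane
n_low_rank_params = rank * (rows + cols)  # factored hyperplane
print(same_value, n_full_params, n_low_rank_params)
```

    The factored form has rank * (rows + cols) free parameters instead of rows * cols, which is one intuitive reading of why a low-rank constraint might help generalization; the abstract's actual proof is not reproduced here.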