1,230 research outputs found

    Analyzing Structured Scenarios by Tracking People and Their Limbs

    Get PDF
    The analysis of human activities is a fundamental problem in computer vision. Though complex, interactions between people and their environment often exhibit a spatio-temporal structure that can be exploited during analysis. This structure can be leveraged to mitigate the effects of missing or noisy visual observations caused, for example, by sensor noise, inaccurate models, or occlusion. Trajectories of people and their hands and feet, often sufficient for recognition of human activities, lead to a natural qualitative spatio-temporal description of these interactions. This work introduces the following contributions to the task of human activity understanding: 1) a framework that efficiently detects and tracks multiple interacting people and their limbs, 2) an event recognition approach that integrates both logical and probabilistic reasoning in analyzing the spatio-temporal structure of multi-agent scenarios, and 3) an effective computational model of the visibility constraints imposed on humans as they navigate through their environment. The tracking framework mixes probabilistic models with deterministic constraints and uses AND/OR search and lazy evaluation to efficiently obtain the globally optimal solution in each frame. Our high-level reasoning framework efficiently and robustly interprets noisy visual observations to deduce the events comprising structured scenarios. This is accomplished by combining First-Order Logic, Allen's Interval Logic, and Markov Logic Networks with an event hypothesis generation process that reduces the size of the ground Markov network. When applied to outdoor one-on-one basketball videos, our framework tracks the players and, guided by the game rules, analyzes their interactions with each other and the ball, annotating the videos with the relevant basketball events that occurred. Finally, motivated by studies of spatial behavior, we use a set of features from visibility analysis to represent spatial context in the interpretation of human spatial activities. We demonstrate the effectiveness of our representation on trajectories generated by humans in a virtual environment

    Recognizing Objects And Reasoning About Their Interactions

    Get PDF
    The task of scene understanding involves recognizing the different objects present in the scene, segmenting the scene into meaningful regions, as well as obtaining a holistic understanding of the activities taking place in the scene. Each of these problems has received considerable interest within the computer vision community. We present contributions to two aspects of visual scene understanding. First we explore multiple methods of feature selection for the problem of object detection. We demonstrate the use of Principal Component Analysis to detect avifauna in field observation videos. We improve on existing approaches by making robust decisions based on regional features and by a feature selection strategy that chooses different features in different parts of the image. We then demonstrate the use of Partial Least Squares to detect vehicles in aerial and satellite imagery. We propose two new feature sets; Color Probability Maps are used to capture the color statistics of vehicles and their surroundings, and Pairs of Pixels are used to capture captures the structural characteristics of objects. A powerful feature selection analysis based on Partial Least Squares is employed to deal with the resulting high dimensional feature space (almost 70,000 dimensions). We also propose an Incremental Multiple Kernel Learning (IMKL) scheme to detect vehicles in a traffic surveillance scenario. Obtaining task and scene specific datasets of visual categories is far more tedious than obtaining a generic dataset of the same classes. Our IMKL approach initializes on a generic training database and then tunes itself to the classification task at hand. Second, we develop a video understanding system for scene elements, such as bus stops, crosswalks, and intersections, that are characterized more by qualitative activities and geometry than by intrinsic appearance. The domain models for scene elements are not learned from a corpus of video, but instead, naturally elicited by humans, and represented as probabilistic logic rules within a Markov Logic Network framework. Human elicited models, however, represent object interactions as they occur in the 3D world rather than describing their appearance projection in some specific 2D image plane. We bridge this gap by recovering qualitative scene geometry to analyze object interactions in the 3D world and then reasoning about scene geometry, occlusions and common sense domain knowledge using a set of meta-rules

    Hand tracking and bimanual movement understanding

    Get PDF
    Bimanual movements are a subset ot human movements in which the two hands move together in order to do a task or imply a meaning A bimanual movement appearing in a sequence of images must be understood in order to enable computers to interact with humans in a natural way This problem includes two main phases, hand tracking and movement recognition. We approach the problem of hand tracking from a neuroscience point ot view First the hands are extracted and labelled by colour detection and blob analysis algorithms In the presence of the two hands one hand may occlude the other occasionally Therefore, hand occlusions must be detected in an image sequence A dynamic model is proposed to model the movement of each hand separately Using this model in a Kalman filtering proccss the exact starting and end points of hand occlusions are detected We exploit neuroscience phenomena to understand the beha\ tour of the hands during occlusion periods Based on this, we propose a general hand tracking algorithm to track and reacquire the hands over a movement including hand occlusion The advantages of the algorithm and its generality are demonstrated in the experiments. In order to recognise the movements first we recognise the movement of a hand Using statistical pattern recognition methods (such as Principal Component Analysis and Nearest Neighbour) the static shape of each hand appearing in an image is recognised A Graph- Matching algorithm and Discrete Midden Markov Models (DHMM) as two spatio-temporal pattern recognition techniques are imestigated tor recognising a dynamic hand gesture For recognising bimanual movements we consider two general forms ot these movements, single and concatenated periodic We introduce three Bayesian networks for recognising die movements The networks are designed to recognise and combinc the gestures of the hands in order to understand the whole movement Experiments on different types ot movement demonstrate the advantages and disadvantages of each network

    Recognizing complex faces and gaits via novel probabilistic models

    Get PDF
    In the field of computer vision, developing automated systems to recognize people under unconstrained scenarios is a partially solved problem. In unconstrained sce- narios a number of common variations and complexities such as occlusion, illumi- nation, cluttered background and so on impose vast uncertainty to the recognition process. Among the various biometrics that have been emerging recently, this dissertation focus on two of them namely face and gait recognition. Firstly we address the problem of recognizing faces with major occlusions amidst other variations such as pose, scale, expression and illumination using a novel PRObabilistic Component based Interpretation Model (PROCIM) inspired by key psychophysical principles that are closely related to reasoning under uncertainty. The model basically employs Bayesian Networks to establish, learn, interpret and exploit intrinsic similarity mappings from the face domain. Then, by incorporating e cient inference strategies, robust decisions are made for successfully recognizing faces under uncertainty. PROCIM reports improved recognition rates over recent approaches. Secondly we address the newly upcoming gait recognition problem and show that PROCIM can be easily adapted to the gait domain as well. We scienti cally de ne and formulate sub-gaits and propose a novel modular training scheme to e ciently learn subtle sub-gait characteristics from the gait domain. Our results show that the proposed model is robust to several uncertainties and yields sig- ni cant recognition performance. Apart from PROCIM, nally we show how a simple component based gait reasoning can be coherently modeled using the re- cently prominent Markov Logic Networks (MLNs) by intuitively fusing imaging, logic and graphs. We have discovered that face and gait domains exhibit interesting similarity map- pings between object entities and their components. We have proposed intuitive probabilistic methods to model these mappings to perform recognition under vari- ous uncertainty elements. Extensive experimental validations justi es the robust- ness of the proposed methods over the state-of-the-art techniques.

    Intelligent evacuation management systems: A review

    Get PDF
    Crowd and evacuation management have been active areas of research and study in the recent past. Various developments continue to take place in the process of efficient evacuation of crowds in mass gatherings. This article is intended to provide a review of intelligent evacuation management systems covering the aspects of crowd monitoring, crowd disaster prediction, evacuation modelling, and evacuation path guidelines. Soft computing approaches play a vital role in the design and deployment of intelligent evacuation applications pertaining to crowd control management. While the review deals with video and nonvideo based aspects of crowd monitoring and crowd disaster prediction, evacuation techniques are reviewed via the theme of soft computing, along with a brief review on the evacuation navigation path. We believe that this review will assist researchers in developing reliable automated evacuation systems that will help in ensuring the safety of the evacuees especially during emergency evacuation scenarios

    Enhanced tracking and recognition of moving objects by reasoning about spatio-temporal continuity.

    Get PDF
    A framework for the logical and statistical analysis and annotation of dynamic scenes containing occlusion and other uncertainties is presented. This framework consists of three elements; an object tracker module, an object recognition/classification module and a logical consistency, ambiguity and error reasoning engine. The principle behind the object tracker and object recognition modules is to reduce error by increasing ambiguity (by merging objects in close proximity and presenting multiple hypotheses). The reasoning engine deals with error, ambiguity and occlusion in a unified framework to produce a hypothesis that satisfies fundamental constraints on the spatio-temporal continuity of objects. Our algorithm finds a globally consistent model of an extended video sequence that is maximally supported by a voting function based on the output of a statistical classifier. The system results in an annotation that is significantly more accurate than what would be obtained by frame-by-frame evaluation of the classifier output. The framework has been implemented and applied successfully to the analysis of team sports with a single camera. Key words: Visua
    corecore