
    Efficient and effective human action recognition in video through motion boundary description with a compact set of trajectories

    Human action recognition (HAR) is at the core of human-computer interaction and video scene understanding. However, achieving effective HAR in an unconstrained environment remains a challenging task. To that end, trajectory-based video representations are currently widely used. Despite the promising effectiveness of these approaches, problems regarding computational complexity and the presence of redundant trajectories still need to be addressed satisfactorily. In this paper, we propose a method for trajectory rejection that reduces the number of redundant trajectories without degrading the effectiveness of HAR. Furthermore, to realize efficient optical flow estimation prior to trajectory extraction, we integrate a method for dynamic frame skipping. Experiments with four publicly available human action datasets show that the proposed approach outperforms state-of-the-art HAR approaches in terms of effectiveness while simultaneously mitigating the computational complexity.
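The dynamic frame-skipping idea described above can be sketched as follows. This is a minimal illustration, not the paper's method: the mean-absolute-difference motion proxy and the `motion_thresh` value are assumptions standing in for whatever criterion the authors actually use to decide when optical flow estimation can be skipped.

```python
import numpy as np

def should_skip(prev_frame, frame, motion_thresh=4.0):
    """Decide whether optical-flow estimation can be skipped for `frame`.

    Uses mean absolute pixel difference as a cheap motion proxy;
    `motion_thresh` is a hypothetical tuning parameter.
    """
    diff = np.mean(np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32)))
    return diff < motion_thresh

def select_frames(frames, motion_thresh=4.0):
    """Return indices of frames kept for trajectory extraction.

    Always keeps the first frame, then keeps a frame only when it
    differs noticeably from the last kept frame.
    """
    kept = [0]
    for i in range(1, len(frames)):
        if not should_skip(frames[kept[-1]], frames[i], motion_thresh):
            kept.append(i)
    return kept

# Toy example: a static 4x4 scene with one sudden change at frame 3.
frames = [np.zeros((4, 4), dtype=np.uint8) for _ in range(6)]
frames[3] = np.full((4, 4), 255, dtype=np.uint8)
print(select_frames(frames))  # [0, 3, 4]
```

In a real pipeline the kept frames would feed the optical-flow estimator, so the savings scale with how static the scene is between trajectory updates.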

    Behavior Monitoring Using Visual Data and Immersive Environments

    University of Minnesota Ph.D. dissertation. August 2017. Major: Computer Science. Advisor: Nikolaos Papanikolopoulos. 1 computer file (PDF); viii, 99 pages. Mental health disorders are the leading cause of disability in the United States and Canada, accounting for 25 percent of all years of life lost to disability and premature mortality (Disability-Adjusted Life Years, or DALYs). Furthermore, in the United States alone, spending on mental-disorder-related care amounted to approximately $201 billion in 2013. Given these costs, significant effort has been spent on researching ways to mitigate the detrimental effects of mental illness. Commonly, observational studies are employed in research on mental disorders. However, observers must watch activities, either live or recorded, and then code the behavior. This process is often long and requires significant effort. Automating these labor-intensive processes can allow such studies to be performed more effectively. This thesis presents efforts to use computer vision and modern interactive technologies to aid in the study of mental disorders. Motor stereotypies are a class of behavior known to co-occur in some patients diagnosed with autism spectrum disorders. Results are presented for activity classification of these behaviors. Behaviors in the context of environment, setup, and task were also explored in relation to obsessive-compulsive disorder (OCD). Cleaning compulsions are a known symptom of some persons with OCD. Techniques were created to automate the coding of handwashing behavior as part of an OCD study to understand the differences between subjects with different diagnoses. Instrumenting the experiment and coding the videos was a limiting factor in this study. Varied and repeatable environments can be enabled through the use of virtual reality. An end-to-end platform was created to investigate this approach. This system allows the creation of immersive environments that are capable of eliciting symptoms. By controlling the stimulus presented and observing the reaction in a simulated system, new ways of assessment are developed. Evaluation was performed to measure the ability to monitor subject behavior, and a protocol was established for the system's future use.

    Interest Detection in Image, Video and Multiple Videos: Model and Applications

    Interest detection is the detection of an object, event, or process that draws attention. In this dissertation, we focus on interest detection in images, single videos, and multiple videos. Interest detection in an image or a video is closely related to visual attention; however, interest detection in multiple videos needs to consider all the videos as a whole rather than considering the attention in each video independently. Visual attention is an important mechanism of human vision. The computational modeling of visual attention has recently attracted a lot of interest in the computer vision community, mainly because it helps find the objects or regions that efficiently represent a scene and thus aids in solving complex vision problems such as scene understanding. In this dissertation, we first introduce a new computational visual-attention model for detecting regions of interest in static images and/or videos. This model constructs a saliency map for each image and takes the region with the highest saliency value as the region of interest. Specifically, we use the Earth Mover's Distance (EMD) to measure the center-surround difference in the receptive field. Furthermore, we propose two steps of biologically inspired nonlinear operations for combining different features: combining subsets of basic features into a set of super features using the Lm-norm, and then combining the super features using the Winner-Take-All mechanism. Then, we extend the proposed model to construct dynamic saliency maps from videos by computing the center-surround difference in the spatio-temporal receptive field. Motivated by the natural relation between visual saliency and objects/regions of interest, we then propose an algorithm to isolate infrequently moving foreground from background with frequent local motions, in which the saliency detection technique is used to identify the foreground (object/region of interest) and background.
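The two-step feature combination mentioned above can be sketched in a few lines. This is an illustrative reading of the abstract only: the choice m = 2, the toy feature maps, and the per-pixel max as the Winner-Take-All rule are assumptions, since the dissertation's exact parameters and potentials are not given here.

```python
import numpy as np

def lm_norm(features, m=2.0):
    """Combine a subset of basic feature maps into one super feature
    via the Lm-norm; m is a hypothetical choice, not a value from
    the dissertation."""
    stacked = np.stack(features)  # shape: (k, H, W)
    return np.sum(np.abs(stacked) ** m, axis=0) ** (1.0 / m)

def winner_take_all(super_features):
    """Combine super features by keeping, per pixel, only the
    strongest response (a simple winner-take-all rule)."""
    return np.max(np.stack(super_features), axis=0)

# Toy 2x2 feature maps for two feature subsets (e.g. color and motion).
color = [np.array([[1.0, 0.0], [0.0, 0.0]]),
         np.array([[0.0, 1.0], [0.0, 0.0]])]
motion = [np.array([[0.0, 0.0], [3.0, 0.0]])]

saliency = winner_take_all([lm_norm(color), lm_norm(motion)])
print(saliency)  # [[1. 1.] [3. 0.]]
```

The region with the highest value in `saliency` would then be taken as the region of interest, as described in the abstract.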
Traditional motion detection usually assumes that the background is static while the foreground objects are moving most of the time. However, in practice, especially in surveillance, the foreground objects may show infrequent motion. For example, a person may stand in the same place for most of the time. Meanwhile, the background may contain frequent local motions, such as trees and/or grass waving in the breeze. Such complexities may prevent existing background subtraction algorithms from correctly identifying the foreground objects. In this dissertation, we propose a background subtraction approach that can detect foreground objects with frequent and/or infrequent motions. Finally, we focus on the task of locating the co-interest person from multiple temporally synchronized videos taken by multiple wearable cameras. More specifically, we propose a co-interest detection algorithm that can find the person who draws attention from most camera wearers, even if multiple similar-appearance persons are present in the videos. Our basic idea is to exploit the motion pattern, location, and size of the persons detected in different synchronized videos and use them to correlate the detected persons across videos: one person in a video may be the same person in another video at the same time. We utilize a Conditional Random Field (CRF) to achieve this goal, taking each frame as a node and the detected persons as the states at each node. We collected three sets of wearable-camera videos for testing the proposed algorithm, where each set consists of six temporally synchronized videos.
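The chain CRF described above (frames as nodes, detected persons as states) is typically decoded with dynamic programming. The sketch below shows Viterbi decoding on such a chain; the unary scores, the pairwise smoothing term, and all numeric values are hypothetical stand-ins for the dissertation's learned potentials built from motion pattern, location, and size.

```python
def viterbi(unary, pairwise):
    """Decode the most likely person label per frame on a chain CRF.

    unary[t][s] scores candidate person s at frame t from cross-video
    cues; pairwise(p, s) scores temporal consistency between the labels
    of consecutive frames. Returns one state index per frame.
    """
    T, S = len(unary), len(unary[0])
    score = [unary[0][:]]  # best score of each state at the first frame
    back = []              # backpointers for path reconstruction
    for t in range(1, T):
        row, ptr = [], []
        for s in range(S):
            best_prev = max(range(S),
                            key=lambda p: score[-1][p] + pairwise(p, s))
            row.append(score[-1][best_prev] + pairwise(best_prev, s)
                       + unary[t][s])
            ptr.append(best_prev)
        score.append(row)
        back.append(ptr)
    # Trace the best final state back to the first frame.
    path = [max(range(S), key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Two candidate persons over three frames; person 1 scores higher overall,
# and the pairwise bonus discourages switching identities between frames.
unary = [[0.2, 0.8], [0.6, 0.5], [0.1, 0.9]]
smooth = lambda a, b: 0.3 if a == b else 0.0
print(viterbi(unary, smooth))  # [1, 1, 1]
```

Note how the pairwise term keeps the label on person 1 at the middle frame even though person 0 has the higher unary score there; this is the kind of temporal consistency the CRF formulation buys over per-frame detection.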