
    Automatic Action Annotation in Weakly Labeled Videos

    Manual spatio-temporal annotation of human action in videos is laborious, requires several annotators, and contains human biases. In this paper, we present a weakly supervised approach to automatically obtain spatio-temporal annotations of an actor in action videos. We first obtain a large number of action proposals in each video. To capture the few most representative action proposals in each video and avoid processing thousands of them, we rank them using optical flow and saliency in a 3D-MRF-based framework and select a few proposals using a MAP-based proposal subset selection method. We demonstrate that this ranking preserves the high-quality action proposals. Several such proposals are generated for each video of the same action. Our next challenge is to iteratively select one proposal from each video so that all proposals are globally consistent. We formulate this as a Generalized Maximum Clique Graph problem using shape, global, and fine-grained similarity of proposals across the videos. The output of our method is the most action-representative proposal from each video. Our method can also annotate multiple instances of the same action in a video. We have validated our approach on three challenging action datasets: UCF Sports, sub-JHMDB, and THUMOS'13, and have obtained promising results compared to several baseline methods. Moreover, on UCF Sports, we demonstrate that action classifiers trained on these automatically obtained spatio-temporal annotations have comparable performance to classifiers trained on ground-truth annotations.
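
    The abstract's pipeline of scoring proposals by motion and saliency, then keeping a small subset, can be sketched in a few lines. The following is a simplified illustration, not the paper's 3D-MRF/MAP formulation: score_proposal, the IoU-based duplicate suppression, and all thresholds are assumptions made for this example.

```python
# Illustrative sketch: rank action proposals by mean optical-flow
# magnitude plus mean saliency, then greedily keep a small,
# non-redundant subset (a crude stand-in for MAP subset selection).
import numpy as np

def score_proposal(box, flow_mag, saliency):
    """Mean flow magnitude plus mean saliency inside a (x0,y0,x1,y1) box."""
    x0, y0, x1, y1 = box
    return flow_mag[y0:y1, x0:x1].mean() + saliency[y0:y1, x0:x1].mean()

def iou(a, b):
    """Intersection-over-union of two boxes, used to suppress duplicates."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter + 1e-9)

def select_subset(boxes, flow_mag, saliency, k=5, max_overlap=0.5):
    """Keep the k top-scoring proposals that do not overlap too much."""
    ranked = sorted(boxes, key=lambda b: score_proposal(b, flow_mag, saliency),
                    reverse=True)
    kept = []
    for b in ranked:
        if all(iou(b, s) < max_overlap for s in kept):
            kept.append(b)
        if len(kept) == k:
            break
    return kept

# Toy inputs: random per-pixel flow-magnitude and saliency maps.
rng = np.random.default_rng(0)
flow_mag = rng.random((240, 320))
saliency = rng.random((240, 320))
boxes = [(10, 10, 100, 120), (15, 12, 105, 125), (200, 50, 300, 200)]
print(select_subset(boxes, flow_mag, saliency, k=2))
```

    The greedy overlap test stands in for the paper's global consistency step; the actual method solves this jointly across videos as a Generalized Maximum Clique Graph problem.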

    A graphical model based solution to the facial feature point tracking problem

    In this paper, a facial feature point tracker motivated by applications such as human-computer interfaces and facial expression analysis systems is proposed. The proposed tracker is based on a graphical model framework. The facial features are tracked through video streams by incorporating statistical relations in time as well as spatial relations between feature points. By exploiting the spatial relationships between feature points, the proposed method provides robustness in real-world conditions such as arbitrary head movements and occlusions. A Gabor feature-based occlusion detector is developed and used to handle occlusions. The performance of the proposed tracker has been evaluated on real video data under various conditions, including occluded facial gestures and head movements. It is also compared to two popular methods, one based on Kalman filtering exploiting temporal relations, and the other based on active appearance models (AAM). Improvements provided by the proposed approach are demonstrated through both visual displays and quantitative analysis.
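
    A Gabor feature-based occlusion check of the kind the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's trained detector: the patch size, filter parameters, cosine-similarity test, and threshold are all assumptions.

```python
# Hedged sketch: compare the Gabor responses of the current patch around
# a tracked feature point to a stored template; a large similarity drop
# is flagged as a likely occlusion.
import cv2
import numpy as np

def gabor_features(patch, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Stack mean Gabor filter responses at several orientations."""
    feats = []
    for theta in thetas:
        kernel = cv2.getGaborKernel((9, 9), sigma=2.0, theta=theta,
                                    lambd=4.0, gamma=0.5, psi=0,
                                    ktype=cv2.CV_32F)
        feats.append(cv2.filter2D(patch, cv2.CV_32F, kernel).mean())
    return np.array(feats)

def is_occluded(template_feats, patch, threshold=0.8):
    """Flag occlusion when cosine similarity to the template drops."""
    f = gabor_features(patch)
    sim = f @ template_feats / (np.linalg.norm(f) *
                                np.linalg.norm(template_feats) + 1e-9)
    return sim < threshold

# Toy usage with a random grayscale reference patch.
rng = np.random.default_rng(1)
ref = rng.random((21, 21)).astype(np.float32)
template = gabor_features(ref)
print(is_occluded(template, ref))                             # False
print(is_occluded(template, np.zeros((21, 21), np.float32)))  # likely True
```

    In the tracker itself, this binary flag would feed back into the graphical model so that occluded points are predicted from their spatial neighbors rather than from unreliable image evidence.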

    Motion segmentation using an occlusion detector

    We present a novel method for the detection of motion boundaries in a video sequence based on differential properties of the spatio-temporal domain. Regarding the video sequence as a 3D spatio-temporal function, we consider the second moment matrix of its gradients (averaged over a local window), and show that the eigenvalues of this matrix can be used to detect occlusions and motion discontinuities. Since these cannot always be determined locally (due to false corners and the aperture problem), a scale-space approach is used for extracting the location of motion boundaries. A closed contour is then constructed from the most salient boundary fragments, to provide the final segmentation. The method is shown to give good results on pairs of real images taken in general motion. We use synthetic data to show its robustness to high levels of noise and illumination changes; we also include cases where no intensity edge exists at the location of the motion boundary or where no parametric motion model can describe the data.
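
    The core quantity here, the locally averaged second moment matrix of spatio-temporal gradients, is straightforward to compute. The sketch below follows the abstract's description directly; the window size, the choice of a uniform averaging filter, and the use of the smallest eigenvalue as the boundary score are simplifying assumptions (the paper additionally uses a scale-space analysis and contour construction not shown here).

```python
# Sketch: treat the clip as a 3D function, average the outer product of
# its gradients over a local window, and use the smallest eigenvalue of
# the resulting 3x3 tensor as an occlusion / motion-discontinuity score.
# Coherent single motions keep the tensor rank-deficient, so the
# smallest eigenvalue stays near zero away from motion boundaries.
import numpy as np
from scipy.ndimage import uniform_filter

def occlusion_score(volume, window=5):
    """volume: (T, H, W) grayscale clip -> per-pixel boundary score."""
    gt, gy, gx = np.gradient(volume.astype(np.float64))
    grads = [gx, gy, gt]
    # Second moment (structure) tensor, smoothed over a local 3D window.
    J = np.empty(volume.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            J[..., i, j] = uniform_filter(grads[i] * grads[j], size=window)
    eigvals = np.linalg.eigvalsh(J)   # ascending eigenvalues per pixel
    return eigvals[..., 0]            # smallest eigenvalue as the score

# Toy clip: a bright square translating over a static background.
T, H, W = 9, 64, 64
clip = np.zeros((T, H, W))
for t in range(T):
    clip[t, 20:40, 10 + 3 * t:30 + 3 * t] = 1.0
score = occlusion_score(clip)
print(score.shape, float(score.max()))
```

    In this toy example the score peaks along the leading and trailing edges of the moving square, where background pixels are occluded and disoccluded, which is exactly the behavior the eigenvalue analysis predicts.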

    Contactless polygraph

    The constant transformation of our surroundings often goes unnoticed, but it occurs nonetheless. Imperceptibly, our skin undergoes slight color changes as our hearts beat, and our heads subtly move with each breath. While these alterations may escape our eyes, they are captured by cameras. We can magnify the variation in those small signals to extract important information such as heart rate, which in turn can be used in many real-life applications such as polygraph tests. The polygraph, or lie detector test, is a tool that measures physiological changes in an individual to detect deception. This thesis aimed to explore the feasibility of using video information to extract heart rates, replacing the traditional sensors that are commonly used to obtain this information. Through a series of experiments involving videos of people standing still, and a video of a real-life suspect who was lying, it was consistently possible to enhance the color changes that occur in a person's skin when their heart beats. Though it was not possible to develop a fully functioning lie detector test, this thesis shows that the motion and color magnification technique is reliable even when applied to videos of individuals engaged in speech and movement. It also showed that this is a promising approach with the potential to revolutionize how interviews and interrogations are conducted.
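
    The heart-rate extraction step underlying this kind of work can be sketched compactly: average the green channel over a skin region per frame, band-pass the resulting trace to plausible pulse frequencies, and read off the dominant spectral peak. This is a minimal illustration, not the thesis's magnification pipeline; the frame rate, band limits, assumption of an already-cropped skin region, and the synthetic input are all made up for this example.

```python
# Minimal sketch: estimate heart rate (beats per minute) from a per-frame
# mean green-channel trace of a skin region, via an FFT peak restricted
# to a plausible pulse band.
import numpy as np

def estimate_bpm(green_means, fps, low_hz=0.7, high_hz=4.0):
    """Dominant frequency of the band-limited signal, in beats/minute."""
    signal = green_means - green_means.mean()
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    spectrum = np.abs(np.fft.rfft(signal))
    band = (freqs >= low_hz) & (freqs <= high_hz)   # ~42-240 bpm
    peak = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak

# Synthetic 10 s trace at 30 fps: a faint 1.2 Hz (72 bpm) pulse + noise.
fps, seconds = 30, 10
t = np.arange(fps * seconds) / fps
trace = 0.02 * np.sin(2 * np.pi * 1.2 * t) + 0.005 * np.random.randn(t.size)
print(round(estimate_bpm(trace, fps)))  # ~72
```

    Magnification techniques amplify exactly this kind of narrow-band temporal variation before rendering it back into the video, which is what makes the color changes visible to a human observer.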