9 research outputs found

    Unsupervised Human Action Detection by Action Matching

    We propose a new task of unsupervised action detection by action matching. Given two long videos, the objective is to temporally detect all pairs of matching video segments. A pair of video segments is matched if the segments share the same human action. The task is category independent (it does not matter what action is being performed), and no supervision is used to discover such video segments. Unsupervised action detection by action matching allows us to align videos in a meaningful manner. As such, it can be used to discover new action categories or as an action proposal technique within, say, an action detection pipeline. Moreover, it is a useful pre-processing step for generating video highlights, e.g., from sports videos. We present an effective and efficient method for unsupervised action detection. We use an unsupervised temporal encoding method and exploit the temporal consistency in human actions to obtain candidate action segments. We evaluate our method on this challenging task using three activity recognition benchmarks, namely, the MPII Cooking activities dataset, the THUMOS15 action detection benchmark and a new dataset called the IKEA dataset. On the MPII Cooking dataset we detect action segments with a precision of 21.6% and recall of 11.7% over 946 long video pairs and over 5000 ground truth action segments. Similarly, on the THUMOS dataset we obtain 18.4% precision and 25.1% recall over 5094 ground truth action segment pairs.
    Comment: IEEE International Conference on Computer Vision and Pattern Recognition CVPR 2017 Workshop

    Motion denoising with application to time-lapse photography

    Motions can occur over both short and long time scales. We introduce motion denoising, which treats short-term changes as noise, long-term changes as signal, and re-renders a video to reveal the underlying long-term events. We demonstrate motion denoising for time-lapse videos. One of the characteristics of traditional time-lapse imagery is stylized jerkiness, where short-term changes in the scene appear as small and annoying jitters in the video, often obfuscating the underlying temporal events of interest. We apply motion denoising for resynthesizing time-lapse videos showing the long-term evolution of a scene with jerky short-term changes removed. We show that existing filtering approaches are often incapable of achieving this task, and present a novel computational approach to denoise motion without explicit motion analysis. We demonstrate promising experimental results on a set of challenging time-lapse sequences.
    United States. National Geospatial-Intelligence Agency (NEGI-1582-04-0004); Shell Research; United States. Office of Naval Research. Multidisciplinary University Research Initiative (Grant N00014-06-1-0734); National Science Foundation (U.S.) (0964004

    Slice Matching for Accurate Spatio-Temporal Alignment

    Video synchronization and alignment is a rather recent topic in computer vision. It usually deals with the problem of aligning sequences recorded simultaneously by static, jointly- or independently-moving cameras. In this paper, we investigate the more difficult problem of matching videos captured at different times from independently-moving cameras whose trajectories are approximately coincident or parallel. To this end, we propose a novel method that aligns videos pixel-wise and thus allows their differences to be highlighted automatically. The method primarily targets visual surveillance, but it can be adopted as is by other related video applications, such as object transfer (augmented reality) or high dynamic range video. We build upon a slice matching scheme to first synchronize the sequences, and we develop a spatio-temporal alignment scheme to spatially register corresponding frames and refine the temporal mapping. We investigate the performance of the proposed method on videos recorded from vehicles driven along different types of roads and compare it with related previous work.

    Synchronizing Video Cameras with Non-overlapping Fields of View


    Aligning sequences and actions by maximizing space-time correlations

    We introduce an algorithm for sequence alignment, based on maximizing local space-time correlations. Our algorithm aligns sequences of the same action performed at different times and places by different people, possibly at different speeds, and wearing different clothes. Moreover, the algorithm offers a unified approach to the problem of sequence alignment for a wide range of scenarios (e.g., sequence pairs taken with stationary or jointly moving cameras, with the same or different photometric properties, with or without moving objects). Our algorithm is applied directly to the dense space-time intensity information of the two sequences (or to filtered versions of them). This is done without prior segmentation of foreground moving objects, and without prior detection of corresponding features across the sequences. Examples of challenging sequences with complex actions are shown, including ballet dancing, actions in the presence of other complex scene dynamics (clutter), as well as multi-sensor sequence pairs.

    A Non-Intrusive Multi-Sensor RGB-D System for Preschool Classroom Behavior Analysis

    University of Minnesota Ph.D. dissertation. May 2017. Major: Computer Science. Advisor: Nikolaos Papanikolopoulos. 1 computer file (PDF); vii, 121 pages + 2 mp4 video files
    Mental health disorders are a leading cause of disability in North America and can represent a significant source of financial burden. Early intervention is a key aspect in treating mental disorders as it can dramatically increase the probability of a positive outcome. One key factor in early intervention is the knowledge of risk-markers (genetic, neural, behavioral and/or social deviations) that indicate the development of a particular mental disorder. Once these risk-markers are known, it is important to have tools for identifying them reliably. For visually observable risk-markers, discovery and screening ideally should occur in a natural environment. However, this often incurs a high cost. Current advances in technology allow for the development of assistive systems that could aid in the detection and screening of visually observable risk-markers in everyday environments, like a preschool classroom. This dissertation covers the development of such a system. The system consists of a series of networked sensors that are able to collect data from a wide baseline. These sensors generate color images and depth maps that can be used to create a 3D point cloud reconstruction of the classroom. The wide-baseline nature of the setup helps to minimize the effects of occlusion, since data is captured from multiple distinct perspectives. These point clouds are used to detect occupants in the room and track them throughout their activities. This tracking information is then used to analyze classroom and individual behaviors, enabling screening for specific risk-markers and the creation of a corpus of data that could be used to discover new risk-markers. This system has been installed at the Shirley G. Moore Lab school, a research preschool classroom in the Institute of Child Development at the University of Minnesota. Recordings have been taken and analyzed from actual classes. No instruction or pre-conditioning was given to the instructors or the children in these classes. Portions of this data have also been manually annotated to create ground-truth data that was used to validate the efficacy of the proposed system.