
    Search Tracker: Human-derived object tracking in-the-wild through large-scale search and retrieval

    Humans use context and scene knowledge to easily localize moving objects under complex illumination changes, scene clutter and occlusions. In this paper, we present a method that leverages human knowledge, in the form of annotated video libraries, in a novel search-and-retrieval setting to track objects in unseen video sequences. For every video sequence, a document that represents motion information is generated. Documents of the unseen video are queried against the library at multiple scales to find videos with similar motion characteristics. This provides a coarse localization of objects in the unseen video. We further adapt these retrieved object locations to the new video using an efficient warping scheme. The proposed method is validated on in-the-wild video surveillance datasets, where we outperform state-of-the-art appearance-based trackers. We also introduce a new challenging dataset with complex object appearance changes. Comment: Under review with the IEEE Transactions on Circuits and Systems for Video Technology

    Automatic Action Annotation in Weakly Labeled Videos

    Manual spatio-temporal annotation of human action in videos is laborious, requires several annotators and contains human biases. In this paper, we present a weakly supervised approach to automatically obtain spatio-temporal annotations of an actor in action videos. We first obtain a large number of action proposals in each video. To capture the few most representative action proposals in each video and avoid processing thousands of them, we rank them using optical flow and saliency in a 3D-MRF based framework and select a few proposals using a MAP-based proposal subset selection method. We demonstrate that this ranking preserves the high-quality action proposals. Several such proposals are generated for each video of the same action. Our next challenge is to iteratively select one proposal from each video so that all proposals are globally consistent. We formulate this as a Generalized Maximum Clique Graph problem using shape, global and fine-grained similarity of proposals across the videos. The output of our method is the most action-representative proposal from each video. Our method can also annotate multiple instances of the same action in a video. We have validated our approach on three challenging action datasets, UCF Sports, sub-JHMDB and THUMOS'13, and have obtained promising results compared to several baseline methods. Moreover, on UCF Sports, we demonstrate that action classifiers trained on these automatically obtained spatio-temporal annotations have comparable performance to classifiers trained on ground truth annotations.

    STV-based Video Feature Processing for Action Recognition

    Compared with still-image-based processing, video features can provide rich and intuitive information about dynamic events that occur over a period of time, such as human actions, crowd behaviours, and other subject pattern changes. Although substantial progress has been made in image processing over the last decade, with successful applications in face matching and object recognition, video-based event detection remains one of the most difficult challenges in computer vision research due to its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and often ambiguous analytical methods. In this paper, a Spatio-Temporal Volume (STV) and region intersection (RI) based 3D shape-matching method is proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and the performance gain of the devised approach stem from a coefficient factor-boosted 3D region intersection and matching mechanism developed in this research. This paper also reports an investigation into techniques for efficient STV data filtering to reduce the number of voxels (volumetric pixels) that need to be processed in each operational cycle of the implemented system. The encouraging features and the improvements in operational performance registered in the experiments are discussed at the end.

    Interactive Perception for Cluttered Environments

    Robotics research tends to focus upon either non-contact sensing or machine manipulation, but not both. This paper explores the benefits of combining the two by addressing the problem of extracting and classifying unknown objects within a cluttered environment, such as those found in recycling and service robot applications. In the proposed approach, a pile of objects lies on a flat background, and the goal of the robot is to sift through the pile and classify each object so that it can be studied further. One object should be removed at a time with minimal disturbance to the other objects. We propose an algorithm, based upon graph-based segmentation and stereo matching, that automatically computes a desired grasp point that enables the objects to be removed one at a time. The algorithm then isolates each object to be classified by color, shape and flexibility. Experiments on a number of different objects demonstrate the ability to classify each item through interaction and to label it for further use and study.

    Mathematics and Morphogenesis of the City: A Geometrical Approach

    Cities are living organisms. They are out-of-equilibrium, open systems that never stop developing and sometimes die. The local geography can be compared to a shell constraining a city's development. In brief, a city's current layout is a step in a running morphogenesis process. Thus cities display a huge diversity of shapes, and none of the traditional models from random graphs, complex networks theory or stochastic geometry takes into account the geometrical, functional and dynamical aspects of a city in the same framework. We present here a global mathematical model dedicated to cities that makes it possible to describe, manipulate and explain cities' overall shape and the layout of their street systems. This street-based framework reconciles the topological and geometrical sides of the problem. From the static analysis of several French towns (topology of first and second order, anisotropy, street scaling) we hypothesize that the development of a city follows a logic of division / extension of space. We propose a dynamical model that mimics this logic and that, from simple general rules and a few parameters, succeeds in generating a large diversity of cities and in reproducing the general features that the static analysis has pointed out. Comment: 13 pages, 13 figures