
    Research on Object Tracking Technology for Orderless and Blurred Movement under Complex Scenes

    University of Technology Sydney, Faculty of Engineering and Information Technology.
    Visual tracking is widely used in anomaly behaviour detection, self-driving, and virtual reality. Recent research has reported that classic methods, including Tracking-Learning-Detection, the particle filter, and mean shift, have been surpassed by deep learning in accuracy and by correlation filtering in speed. However, correlation filtering suffers from boundary effects. Conventional correlation filtering fixes the size of its detection window; when the window captures only part of the target because of large and sudden scale variations, the filter fails to locate the tracked target. When the target shakes violently, motion blur and orderless movement appear along with it. Conventional correlation filtering locks onto the previous position of the target, so the target moves out of the filter's sight and the tracker drifts or fails. This thesis therefore addresses single-object tracking under complex scenes with motion blur, orderless motion, and scale variation. The main research innovations are as follows. (1) An approach for searching orderless movements is designed within a generative-discriminative tracking model. To handle uncertain orderless movements, a coarse-to-fine tracking framework is adopted, and a spatio-temporal correlation is learned for detection in subsequent frames. Experiments on public databases with orderless-motion attributes validate the robustness of the proposed approach. (2) A template matching method is proposed for tracking objects with motion blur. An effective target motion model supplies supplementary appearance features, and a robust similarity measure addresses the outliers caused by motion blur. Our approach outperforms other approaches on a public benchmark database with motion blur. (3) An ensemble framework is designed to tackle scale variations. The scale of the target is estimated with Gaussian particle filtering, and a high-confidence strategy validates the reliability of tracking results. Our approach, with hand-crafted or CNN features, outperforms methods based on correlation filtering and deep learning on databases with scale variations. In sum, this thesis addresses boundary effects, model drift, fixed search windows, and easily interfered hand-crafted features, proposing different trackers for single objects with orderless movements, motion blur, and scale variations. As future work, our methods can be extended with neural networks to further improve single-object tracking models.
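    The boundary-effect failure mode the thesis targets is easy to see in a minimal correlation-filter localization step. The sketch below (plain NumPy; the window size, function names, and confidence score are our own illustration, not the thesis's method) correlates a learned template against a fixed-size window centred on the previous position, which is exactly what loses a target that moves or blurs beyond that window.

```python
import numpy as np

def locate(frame, prev_pos, filt_fft, win=64):
    """One correlation-filter search step with a fixed detection window.

    `filt_fft` is the precomputed 2-D FFT of a (win, win) learned filter.
    If the target jumps further than win // 2 pixels (orderless motion,
    heavy blur), the peak below is meaningless -- the drift case above.
    """
    y, x = prev_pos
    patch = frame[y - win // 2:y + win // 2, x - win // 2:x + win // 2]
    # Circular cross-correlation in the frequency domain (MOSSE-style).
    resp = np.real(np.fft.ifft2(np.fft.fft2(patch) * np.conj(filt_fft)))
    dy, dx = np.unravel_index(np.argmax(resp), resp.shape)
    # Wrap circular offsets into signed displacements.
    dy = dy - win if dy > win // 2 else dy
    dx = dx - win if dx > win // 2 else dx
    peak = resp.max() / (resp.mean() + 1e-8)  # crude confidence score
    return (y + dy, x + dx), peak
```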

    Unsupervised Object Discovery and Tracking in Video Collections

    This paper addresses the problem of automatically localizing dominant objects as spatio-temporal tubes in a noisy collection of videos with minimal or even no supervision. We formulate the problem as a combination of two complementary processes: discovery and tracking. The first establishes correspondences between prominent regions across videos, and the second associates successive similar object regions within the same video. Interestingly, our algorithm also discovers the implicit topology of frames associated with instances of the same object class across different videos, a role normally left to supervisory information in the form of class labels in conventional image and video understanding methods. Indeed, as demonstrated by our experiments, our method can handle video collections featuring multiple object classes, and substantially outperforms the state of the art in colocalization, even though it tackles a broader problem with much less supervision.
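    The tracking half of this pipeline, associating similar object regions across successive frames, can be pictured as greedy matching on appearance similarity and spatial overlap. A minimal NumPy sketch, not the authors' actual formulation; the IoU helper, the feature representation, and the 0.5/0.5 weighting are assumptions:

```python
import numpy as np

def iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def link(prev_regions, cur_regions):
    """Greedily match each previous-frame region to a current-frame one.

    Each region is (box, feat) with `feat` an L2-normalised appearance
    vector; the score mixes appearance and overlap (weights are ours).
    """
    links = []
    for i, (pb, pf) in enumerate(prev_regions):
        scores = [0.5 * float(pf @ cf) + 0.5 * iou(pb, cb)
                  for cb, cf in cur_regions]
        links.append((i, int(np.argmax(scores))))
    return links
```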

    Learning to track for spatio-temporal action localization

    We propose an effective approach for spatio-temporal action localization in realistic videos. The approach first detects proposals at the frame level and scores them with a combination of static and motion CNN features. It then tracks high-scoring proposals throughout the video using a tracking-by-detection approach. Our tracker relies simultaneously on instance-level and class-level detectors. The tracks are scored using a spatio-temporal motion histogram, a descriptor at the track level, in combination with the CNN features. Finally, we perform temporal localization of the action using a sliding-window approach at the track level. We present experimental results for spatio-temporal localization on the UCF-Sports, J-HMDB, and UCF-101 action localization datasets, where our approach outperforms the state of the art by margins of 15%, 7%, and 12% in mAP, respectively.
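    The final step, temporal localization with a sliding window over a track, reduces to scanning windows of several lengths over the per-frame track scores and keeping the best-scoring one. A minimal sketch; the window lengths and the mean-score criterion are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def temporal_localize(track_scores, lengths=(10, 20, 40)):
    """Slide windows over per-frame scores; return the best (start, end).

    `track_scores` is a 1-D array of per-frame action scores for one track.
    """
    best, best_win = -np.inf, (0, len(track_scores))
    for L in lengths:
        for s in range(0, len(track_scores) - L + 1):
            m = track_scores[s:s + L].mean()
            if m > best:
                best, best_win = m, (s, s + L)
    return best_win, best
```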

    Click Carving: Segmenting Objects in Video with Point Clicks

    We present a novel form of interactive video object segmentation in which a few clicks by the user help the system produce a full spatio-temporal segmentation of the object of interest. Whereas conventional interactive pipelines take the user's initialization as a starting point, we show the value of the system taking the lead even in initialization. In particular, for a given video frame, the system precomputes a ranked list of thousands of possible segmentation hypotheses (also referred to as object region proposals) using image and motion cues. The user then looks at the top-ranked proposals and clicks on the object boundary to carve away erroneous ones. This process iterates (typically 2-3 times), with the system revising the top-ranked proposal set each time, until the user is satisfied with the resulting segmentation mask. Finally, the mask is propagated across the video to produce a spatio-temporal object tube. On three challenging datasets, we provide extensive comparisons with both existing work and simpler alternative methods. In all, the proposed Click Carving approach strikes an excellent balance of accuracy and human effort: it outperforms all similarly fast methods, and is competitive with or better than those requiring 2 to 12 times the effort.
    Comment: A preliminary version of the material in this document was filed as University of Texas technical report no. UT AI16-0
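    The carving step can be pictured as re-scoring the proposal list by how close each proposal's boundary comes to the user's boundary clicks. The SciPy-based sketch below is our own illustration; the distance-transform trick and the additive penalty are assumptions, not the paper's exact ranking function:

```python
import numpy as np
from scipy import ndimage

def rerank(proposal_masks, clicks, base_scores):
    """Re-rank binary proposal masks by boundary distance to user clicks.

    `clicks` is a list of (row, col) points the user placed on the true
    object boundary; proposals whose boundaries pass near them move up.
    """
    new_scores = []
    for mask, s in zip(proposal_masks, base_scores):
        boundary = mask ^ ndimage.binary_erosion(mask)
        # Distance from every pixel to the nearest boundary pixel.
        dist = ndimage.distance_transform_edt(~boundary)
        penalty = np.mean([dist[r, c] for r, c in clicks])
        new_scores.append(s - penalty)  # nearer boundary, higher score
    return np.argsort(new_scores)[::-1]
```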

    DART: Distribution Aware Retinal Transform for Event-based Cameras

    We introduce a generic visual descriptor, termed the distribution aware retinal transform (DART), that encodes structural context using log-polar grids for event cameras. The DART descriptor is applied to four different problems, namely object classification, tracking, detection, and feature matching: (1) The DART features are directly employed as local descriptors in a bag-of-features classification framework, and testing is carried out on four standard event-based object datasets (N-MNIST, MNIST-DVS, CIFAR10-DVS, NCaltech-101). (2) Extending the classification system, tracking is demonstrated using two key novelties: (i) to overcome the low-sample problem in the one-shot learning of a binary classifier, statistical bootstrapping is leveraged with online learning; (ii) to achieve tracker robustness, the scale and rotation equivariance of the DART descriptors is exploited for the one-shot learning. (3) To solve the long-term object tracking problem, an object detector is designed using the principle of cluster majority voting. The detection scheme is then combined with the tracker, resulting in a high intersection-over-union score with augmented ground-truth annotations on the publicly available event camera dataset. (4) Finally, the event context encoded by DART greatly simplifies the feature correspondence problem, especially for spatio-temporal slices far apart in time, which has not been explicitly tackled in the event-based vision domain.
    Comment: 12 pages, revision submitted to TPAMI in Nov 201
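    The core idea, histogramming event locations on a log-polar grid around an interest point, can be sketched in a few lines of NumPy. The bin counts, radius cap, and normalisation below are assumptions for illustration; the actual DART construction is more involved than this:

```python
import numpy as np

def log_polar_descriptor(events, center, n_r=8, n_theta=12, r_max=31.0):
    """Histogram event offsets on a log-polar grid around `center`.

    `events` is an (N, 2) array of (x, y) event coordinates; the result
    is a flattened, normalised n_r * n_theta histogram.
    """
    dx = events[:, 0] - center[0]
    dy = events[:, 1] - center[1]
    r = np.hypot(dx, dy)
    keep = (r > 0) & (r <= r_max)
    # Logarithmic radial bins, uniform angular bins.
    r_bin = (np.log1p(r[keep]) / np.log1p(r_max) * n_r).astype(int)
    r_bin = r_bin.clip(max=n_r - 1)
    t_bin = ((np.arctan2(dy[keep], dx[keep]) + np.pi)
             / (2 * np.pi) * n_theta).astype(int).clip(max=n_theta - 1)
    hist, _ = np.histogram(r_bin * n_theta + t_bin,
                           bins=n_r * n_theta, range=(0, n_r * n_theta))
    return hist / max(hist.sum(), 1)  # normalised descriptor
```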

    Efficient and effective human action recognition in video through motion boundary description with a compact set of trajectories

    Human action recognition (HAR) is at the core of human-computer interaction and video scene understanding. However, achieving effective HAR in an unconstrained environment remains a challenging task. To that end, trajectory-based video representations are currently widely used. Despite the promising effectiveness of these approaches, problems regarding computational complexity and the presence of redundant trajectories still need to be addressed satisfactorily. In this paper, we propose a method for trajectory rejection, reducing the number of redundant trajectories without degrading the effectiveness of HAR. Furthermore, to realize efficient optical flow estimation prior to trajectory extraction, we integrate a method for dynamic frame skipping. Experiments on four publicly available human action datasets show that the proposed approach outperforms state-of-the-art HAR approaches in terms of effectiveness, while simultaneously mitigating the computational complexity.
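    Both contributions admit compact sketches: rejecting trajectories whose accumulated motion is too small to be informative, and skipping optical-flow computation when a frame has barely changed. The thresholds and function names below are illustrative assumptions, not the paper's tuned criteria:

```python
import numpy as np

def reject_redundant(trajectories, min_disp=2.0):
    """Keep trajectories whose total displacement exceeds a threshold.

    Each trajectory is an (L, 2) array of (x, y) points; near-static
    ones carry little motion information and are dropped.
    """
    return [t for t in trajectories
            if np.linalg.norm(np.diff(t, axis=0), axis=1).sum() >= min_disp]

def should_skip(prev_gray, cur_gray, thresh=4.0):
    # Skip costly optical-flow estimation when the mean absolute frame
    # difference is small (the dynamic frame-skipping idea).
    diff = cur_gray.astype(np.float32) - prev_gray.astype(np.float32)
    return np.abs(diff).mean() < thresh
```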