A Visual Tracking Study and A Proposal of Modifications
On-line visual tracking of a specified moving target throughout the frames of a video clip faces the challenge of robustly identifying the target in the current frame based on past frames. Three approaches for tracking the target image patch are described and compared. These approaches use particle filtering and principal component analysis (PCA) to identify the most likely location of the target in the current frame, together with a low-dimensional subspace representation of the image patches kept as templates in a dictionary for identification. By combining these methods and comparing their results, a new model is proposed. The goal is to achieve more robust and accurate tracking of a target throughout the video while continually updating the identification templates to adapt to changes in the target, such as variations in lighting, angle, and scale, and occlusions. The challenges in tracking are to introduce the "right" templates into the identification dictionary and to identify the most accurate particle image patch while tracking the target at the right patch scale. The first approach considered, and the one on which the structure of the visual tracker is based, is "Incremental Learning for Robust Visual Tracking" by D. Ross et al., a computationally fast tracker that uses a low-dimensional subspace for the identification template dictionary and incremental PCA for tracking. The tracker applies a simple rule for accepting image patches into the identification template dictionary: after a singular value decomposition (SVD), singular values that are small relative to the sum of squared singular values are eliminated, along with their corresponding basis vectors.
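The subspace-truncation rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the `energy_frac` threshold is an assumed parameter standing in for the paper's cutoff on small singular values.

```python
import numpy as np

def truncate_svd_basis(patches, energy_frac=0.95):
    """Keep only the leading SVD basis vectors of a stack of vectorized
    image patches (one patch per column). Bases are kept until the
    cumulative squared singular values reach `energy_frac` of the total;
    the remaining small singular values and their bases are dropped."""
    U, s, _ = np.linalg.svd(patches, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(energy, energy_frac)) + 1
    return U[:, :k], s[:k]  # low-dimensional template subspace

# Usage: 100 vectorized 32x32 patches (random stand-ins for real patches)
patches = np.random.rand(32 * 32, 100)
U, s = truncate_svd_basis(patches, energy_frac=0.95)
```

The returned orthonormal columns of `U` play the role of the identification template dictionary's low-dimensional basis.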
This elimination scheme provides only limited robustness in tracking; therefore, more selective processes for accepting identification templates into the dictionary are explored and introduced on top of the existing method, both for comparison and to address the challenges of on-line video tracking. The second approach, "Least Soft-threshold Squares Tracking" proposed by D. Wang et al., solves the least soft-threshold squares distance problem to compute the distances of the particles to the templates in the dictionary, which greatly improves tracking accuracy. This method is computationally cheap compared to the first approach and also more accurate, but it sometimes fails to track in some applications. Finally, the third approach reviewed, "Robust Visual Tracking and Vehicle Classification via Sparse Representation" by X. Mei et al., weights each particle when selecting the most likely target patch so that the best patch has the highest weighted probability, ensuring that it is selected and introduced into the template dictionary. This approach outperforms the first and second approaches in tracking accuracy and robustness, but it is extremely computationally expensive. Three new components are proposed to mitigate some of the limitations the three approaches exhibit. One component simply rejects image patches that differ too greatly from the current template dictionary, which improves tracking robustness; this method is computationally cheap and easy to implement. Another component is a second dictionary composed of admitted image patches, used for tracking when an image patch appears too dissimilar to the dictionary with the low-dimensional representation. With better-defined and stronger features, this is expected to force the tracker to identify the target.
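The patch-rejection component described above can be sketched as a reconstruction-error test against the template subspace. This is an assumed formulation: the `max_error` threshold and the relative-error criterion are illustrative choices, not values from the study.

```python
import numpy as np

def accept_patch(patch, U, max_error=0.35):
    """Reject candidate patches too dissimilar to the current template
    dictionary. `U` holds orthonormal subspace basis vectors (columns);
    a patch is accepted only if its relative reconstruction error under
    that subspace stays below the (assumed) `max_error` threshold."""
    proj = U @ (U.T @ patch)  # projection onto the template subspace
    rel_err = np.linalg.norm(patch - proj) / np.linalg.norm(patch)
    return rel_err <= max_error

# Toy usage: a 2-D subspace of a 4-D patch space
U = np.eye(4)[:, :2]
in_span = np.array([1.0, 2.0, 0.0, 0.0])    # fully explained by U
off_span = np.array([0.0, 0.0, 1.0, 1.0])   # orthogonal to U
```

A patch lying in the subspace reconstructs exactly and is admitted; one orthogonal to it reconstructs to zero and is rejected.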
Finally, the third component prevents shrinkage of the target bounding box by weighting the drawn particles by the ratio of area change, so that more weight is placed on particles with less area change. This increases the likelihood of recovering the target if tracking loses it: instead of shrinking the bounding box, the tracker is biased toward image patches of the same size. The resulting performance of the proposed tracking scheme is not noticeably improved, in part because the metrics available for separating a noisy image patch from good image patches are not always indicative of the noisy-good divide.
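The area-change weighting just described can be sketched as below. The symmetric log-ratio penalty and the `strength` parameter are assumptions for illustration; the abstract only specifies that particles with less area change receive more weight.

```python
import numpy as np

def area_change_weights(particle_areas, ref_area, strength=2.0):
    """Down-weight particles whose bounding-box area differs from the
    reference area, biasing the tracker toward same-size patches.
    `strength` is an assumed shaping parameter, not from the study."""
    ratio = np.asarray(particle_areas) / ref_area
    # Penalty grows with |log ratio|, so halving and doubling the area
    # are penalized equally; ratio == 1 (no area change) gets weight 1.
    w = np.exp(-strength * np.abs(np.log(ratio)))
    return w / w.sum()  # normalized particle weights

areas = np.array([900.0, 1000.0, 1100.0, 400.0])
w = area_change_weights(areas, ref_area=1000.0)
# The particle whose area matches the reference gets the largest weight.
```

In a particle filter, these weights would multiply the appearance-likelihood weights before resampling.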
Understanding and Diagnosing Visual Tracking Systems
Several benchmark datasets for visual tracking research have been proposed in
recent years. Despite their usefulness, whether they are sufficient for
understanding and diagnosing the strengths and weaknesses of different trackers
remains questionable. To address this issue, we propose a framework by breaking
a tracker down into five constituent parts, namely, motion model, feature
extractor, observation model, model updater, and ensemble post-processor. We
then conduct ablative experiments on each component to study how it affects the
overall result. Surprisingly, our findings are discrepant with some common
beliefs in the visual tracking research community. We find that the feature
extractor plays the most important role in a tracker. On the other hand,
although the observation model is the focus of many studies, we find that it
often brings no significant improvement. Moreover, the motion model and model
updater contain many details that could affect the result. Also, the ensemble
post-processor can improve the result substantially when the constituent
trackers have high diversity. Based on our findings, we put together some very
elementary building blocks to give a basic tracker which is competitive in
performance to the state-of-the-art trackers. We believe our framework can
provide a solid baseline when conducting controlled experiments for visual
tracking research.
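The five-part decomposition above lends itself to a plug-and-play skeleton. This is a hypothetical sketch of the framework's structure, not the authors' code; the ensemble post-processor is omitted, since it would combine several such trackers.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tracker:
    """Modular single-object tracker: each component is swappable,
    so ablative experiments can vary one part at a time."""
    motion_model: Callable       # previous box -> candidate boxes
    feature_extractor: Callable  # (frame, box) -> feature
    observation_model: Callable  # feature -> score
    model_updater: Callable      # best feature -> None (updates the model)

    def step(self, frame, prev_box):
        candidates = self.motion_model(prev_box)
        feats = [self.feature_extractor(frame, b) for b in candidates]
        scores = [self.observation_model(f) for f in feats]
        best = max(range(len(candidates)), key=lambda i: scores[i])
        self.model_updater(feats[best])
        return candidates[best]
```

Keeping the interfaces this narrow is what makes the paper's component-wise ablations possible: any one stage can be replaced without touching the other four.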
Discriminative Scale Space Tracking
Accurate scale estimation of a target is a challenging research problem in
visual object tracking. Most state-of-the-art methods employ an exhaustive
scale search to estimate the target size. The exhaustive search strategy is
computationally expensive and struggles when encountered with large scale
variations. This paper investigates the problem of accurate and robust scale
estimation in a tracking-by-detection framework. We propose a novel scale
adaptive tracking approach by learning separate discriminative correlation
filters for translation and scale estimation. The explicit scale filter is
learned online using the target appearance sampled at a set of different
scales. Contrary to standard approaches, our method directly learns the
appearance change induced by variations in the target scale. Additionally, we
investigate strategies to reduce the computational cost of our approach.
Extensive experiments are performed on the OTB and the VOT2014 datasets.
Compared to the standard exhaustive scale search, our approach achieves a gain
of 2.5% in average overlap precision on the OTB dataset. Additionally, our
method is computationally efficient, operating at a 50% higher frame rate
compared to the exhaustive scale search. Our method obtains the top rank in
performance by outperforming 19 state-of-the-art trackers on OTB and 37
state-of-the-art trackers on VOT2014.
Comment: To appear in TPAMI. This is the journal extension of the VOT2014-winning DSST tracking method.
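The scale search that the explicit scale filter replaces can be sketched as evaluating a geometric set of candidate scale factors. This is a toy illustration: in DSST the per-scale scores come from a discriminative correlation filter trained on scaled target samples, for which the `score_fn` callable stands in here, and the scale count and step are assumed values.

```python
import numpy as np

def scale_factors(num_scales=33, step=1.02):
    """Geometric candidate scale factors centered at 1.0 (no change)."""
    exps = np.arange(num_scales) - (num_scales - 1) / 2
    return step ** exps

def estimate_scale(score_fn, num_scales=33, step=1.02):
    """Score each candidate scale and return the best factor.
    `score_fn` stands in for the learned scale filter's response."""
    factors = scale_factors(num_scales, step)
    scores = np.array([score_fn(f) for f in factors])
    return factors[int(np.argmax(scores))]

# Toy usage: a response peaked near a 4% enlargement of the target
best = estimate_scale(lambda f: -abs(f - 1.04))
```

Learning a dedicated 1-D filter over these samples, rather than re-running the full detector at every scale, is what gives DSST its speed advantage over exhaustive scale search.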
Realtime State Estimation with Tactile and Visual sensing. Application to Planar Manipulation
Accurate and robust object state estimation enables successful object
manipulation. Visual sensing is widely used to estimate object poses. However,
in a cluttered scene or in a tight workspace, the robot's end-effector often
occludes the object from the visual sensor. The robot then loses visual
feedback and must fall back on open-loop execution.
In this paper, we integrate both tactile and visual input using a framework
for solving the SLAM problem, incremental smoothing and mapping (iSAM), to
provide a fast and flexible solution. Visual sensing provides global pose
information but is noisy in general, whereas contact sensing is local, but its
measurements are more accurate relative to the end-effector. By combining them,
we aim to exploit their advantages and overcome their limitations. We explore
the technique in the context of a pusher-slider system. We adapt iSAM's
measurement cost and motion cost to the pushing scenario, and use an
instrumented setup to evaluate the estimation quality with different object
shapes, on different surface materials, and under different contact modes.
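The complementary roles of the two sensors can be illustrated with the simplest possible fusion: inverse-variance weighting of a noisy global (visual) measurement and an accurate local (tactile) measurement of the same scalar state. This toy example is only the single-variable building block of the least-squares smoothing that iSAM performs over whole trajectories, not the paper's system; all numbers are made up.

```python
def fuse_measurements(z_visual, var_visual, z_tactile, var_tactile):
    """Fuse two measurements of the same scalar state by
    inverse-variance weighting; the more certain sensor dominates."""
    w_v, w_t = 1.0 / var_visual, 1.0 / var_tactile
    est = (w_v * z_visual + w_t * z_tactile) / (w_v + w_t)
    var = 1.0 / (w_v + w_t)  # fused variance is below either input's
    return est, var

# Toy usage: vision says 0.10 m with high variance, contact sensing
# says 0.02 m with low variance (illustrative numbers only).
est, var = fuse_measurements(0.10, 0.04, 0.02, 0.0025)
# The fused estimate lies close to the more certain tactile reading.
```

When vision drops out entirely, as under end-effector occlusion, the same machinery simply proceeds with the tactile factors alone.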