Siamese Instance Search for Tracking
In this paper we present a tracker, which is radically different from
state-of-the-art trackers: we apply no model updating, no occlusion detection,
no combination of trackers, no geometric matching, and still deliver
state-of-the-art tracking performance, as demonstrated on the popular online
tracking benchmark (OTB) and six very challenging YouTube videos. The presented
tracker simply matches the initial patch of the target in the first frame with
candidates in a new frame and returns the most similar patch by a learned
matching function. The strength of the matching function comes from being
extensively trained generically, i.e., without any data of the target, using a
Siamese deep neural network, which we design for tracking. Once learned, the
matching function is used as is, without any adapting, to track previously
unseen targets. It turns out that the learned matching function is so powerful
that a simple tracker built upon it, coined Siamese INstance search Tracker,
SINT, which only uses the original observation of the target from the first
frame, suffices to reach state-of-the-art performance. Further, we show the
proposed tracker even allows for target re-identification after the target was
absent for a complete video shot.
Comment: This paper is accepted to the IEEE Conference on Computer Vision and Pattern Recognition, 201
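The matching step described in this abstract (compare the first-frame target patch against candidate patches and return the most similar one) can be sketched as follows. This is only an illustration under stated assumptions: the feature vectors are taken as given, and cosine similarity stands in for the learned Siamese matching function, which the abstract does not specify. The names `cosine_similarity` and `best_candidate` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed stand-in
    for the learned matching function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_candidate(target_feature, candidate_features, similarity=cosine_similarity):
    """Return (index, score) of the candidate most similar to the target.

    The target feature comes from the first frame only and is never updated,
    mirroring the "no model updating" setting described in the abstract.
    """
    if not candidate_features:
        raise ValueError("at least one candidate is required")
    scores = [similarity(target_feature, c) for c in candidate_features]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]
```

In a tracker of this kind, `best_candidate` would be called once per frame with features of the boxes sampled around the previous location; how those features are computed is outside this sketch.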
An Improved CAMSHIFT Tracking Algorithm Applying on Surveillance Videos
In this paper, we present an improved version of the CAMSHIFT algorithm applied to surveillance videos. A 2D histogram over hue and brightness is used to describe the color feature of the target. In this way, videos of poor quality or with achromatic points can be characterized better. A flooding process and a contribution evaluation are executed to obtain a precise target histogram that reflects the true color information and enhances discrimination ability. The proposed method is compared with existing methods and shows steady and satisfactory results.
Sponsorship: Information Engineering Research Institute
Conference date: 2013-03-03 to 2013-03-04
Call for papers: Y
Conference location: Phuket, Thailan
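A 2D hue and brightness histogram of the kind mentioned in this abstract might be built roughly as below. This is a sketch, not the authors' method: it assumes brightness means the V channel of HSV, the bin counts are arbitrary, and the flooding process and contribution evaluation are not reproduced. The function name `hue_brightness_histogram` is hypothetical.

```python
import colorsys


def hue_brightness_histogram(pixels, hue_bins=16, brightness_bins=8):
    """Normalized 2D histogram over (hue, brightness) for a list of RGB pixels.

    pixels: list of (r, g, b) tuples with components in 0..255.
    Returns a hue_bins x brightness_bins nested list that sums to 1
    (or all zeros if no pixels are given).
    """
    hist = [[0.0] * brightness_bins for _ in range(hue_bins)]
    for r, g, b in pixels:
        # Assumption: brightness is taken as V from the HSV color model.
        h, _s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hue_index = min(int(h * hue_bins), hue_bins - 1)
        brightness_index = min(int(v * brightness_bins), brightness_bins - 1)
        hist[hue_index][brightness_index] += 1.0
    total = len(pixels)
    if total:
        hist = [[count / total for count in row] for row in hist]
    return hist
```

Achromatic pixels (black, gray, white) all receive hue 0 from `colorsys.rgb_to_hsv`, so a hue-only histogram would merge them into one bin; the brightness axis keeps them apart, which is the motivation the abstract gives for the second dimension.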
Deep Siamese Networks toward Robust Visual Tracking
Recently, Siamese neural networks have been widely used in visual object tracking to leverage the template matching mechanism. A Siamese network architecture contains two parallel streams that estimate the similarity between two inputs, and it can learn their discriminative features. Various deep Siamese-based tracking frameworks have been proposed to estimate the similarity between the target and the search region. In this chapter, we group deep Siamese networks into three categories by the position of the merging layers: late merge, intermediate merge, and early merge architectures. In the late merge architecture, inputs are processed as two separate streams and merged at the end of the network. In the intermediate merge architecture, inputs are initially processed separately and merged at an intermediate stage, well before the final layer. In the early merge architecture, inputs are combined at the start of the network and a unified data stream is processed by a single convolutional neural network. We evaluate the performance of deep Siamese trackers based on their merge architectures and their outputs, such as similarity score, response map, and bounding box, in various tracking challenges. This chapter gives an overview of recent developments in deep Siamese trackers and provides insights for new developments in the tracking field.
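The three merge positions described in this abstract differ only in where the two streams are joined, which a toy sketch can make concrete. Everything here is an assumption for illustration: layers are plain Python functions on lists, both streams share the same layers, merging is list concatenation, and the name `siamese_forward` is hypothetical.

```python
def siamese_forward(target, search, layers, merge_at, merge=lambda a, b: a + b):
    """Run two inputs through `layers`, joining the streams at position `merge_at`.

    merge_at == 0            -> early merge (inputs combined before any layer)
    0 < merge_at < len(layers) -> intermediate merge
    merge_at == len(layers)  -> late merge (streams joined after the last layer)
    """
    if not 0 <= merge_at <= len(layers):
        raise ValueError("merge_at must be between 0 and len(layers)")
    # Two parallel streams; the same layer functions are applied to both inputs.
    for layer in layers[:merge_at]:
        target, search = layer(target), layer(search)
    merged = merge(target, search)
    # Single unified stream after the merge point.
    for layer in layers[merge_at:]:
        merged = layer(merged)
    return merged


# Toy layers: an elementwise scaling and a pooling step.
double = lambda v: [2 * t for t in v]
total = lambda v: [sum(v)]

early = siamese_forward([1], [2], [double, total], merge_at=0)
late = siamese_forward([1], [2], [double, total], merge_at=2)
```

With these toy layers, the early-merge call pools over both inputs together, while the late-merge call keeps one pooled value per stream; real trackers would use convolutional layers and a learned or correlation-based merge instead.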
Towards an automated photogrammetry-based approach for monitoring and controlling construction site activities
© 2018 Elsevier B.V. The construction industry has a poor productivity record, which has been predominantly ascribed to inadequate monitoring of how a project is progressing at any given time. Most available approaches do not offer key stakeholders a shared understanding of project performance in real time and, as a result, fail to identify slippage from the original schedule. This paper reports on the development of a novel automatic system for monitoring, updating, and controlling construction site activities in real time. The proposed system seeks to harness advances in close-range photogrammetry to deliver an original approach capable of continuously monitoring construction activities, with progress status determined at any given time throughout the construction lifecycle. The proposed approach has the potential to identify any deviation from as-planned construction schedules, so that prompt action can be taken through an automatic notification system that informs decision-makers via email and SMS. The system was rigorously tested in a real-life case study of an in-progress construction site. The findings revealed that the proposed system achieved a significantly high level of accuracy and automation, and was relatively cheap and easier to operate.
2D recurrent neural networks for robust visual tracking of non-rigid bodies
© Springer International Publishing Switzerland 2016. The efficient tracking of articulated bodies over time is an essential element of pattern recognition and dynamic scene analysis. This paper proposes a novel method for robust visual tracking, based on the combination of image-based prediction and weighted correlation. Starting from an initial guess, neural computation is applied to predict the position of the target in each video frame. Normalized cross-correlation is then applied to refine the predicted target position. Image-based prediction relies on a novel architecture, derived from Elman's recurrent neural networks, that adopts nearest-neighborhood connections between the input and context layers in order to store the temporal information content of the video. The proposed architecture, named 2D Recurrent Neural Network, ensures both limited complexity and a very fast learning stage. At the same time, it guarantees fast execution times and excellent accuracy for the considered tracking task. The effectiveness of the proposed approach is demonstrated on a very challenging set of dynamic image sequences, extracted from the triple jump final at the London 2012 Summer Olympics. The system shows remarkable performance in all considered cases, which are characterized by changing backgrounds and a large variety of articulated motions.
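The refinement step in this abstract (normalized cross-correlation applied around a predicted position) could look roughly like the sketch below. It assumes plain, unweighted zero-mean NCC over a small square search window; the weighting scheme and the recurrent predictor from the paper are not reproduced, and the names `normalized_cross_correlation` and `refine_position` as well as the `radius` parameter are hypothetical.

```python
import math


def normalized_cross_correlation(patch, template):
    """Zero-mean NCC of two equal-length flat lists; result lies in [-1, 1]."""
    if not patch or len(patch) != len(template):
        raise ValueError("patch and template must be non-empty and equal in length")
    mean_p = sum(patch) / len(patch)
    mean_t = sum(template) / len(template)
    dev_p = [p - mean_p for p in patch]
    dev_t = [t - mean_t for t in template]
    denom = math.sqrt(sum(x * x for x in dev_p) * sum(y * y for y in dev_t))
    if denom == 0.0:
        return 0.0  # a constant patch carries no correlation information
    return sum(x * y for x, y in zip(dev_p, dev_t)) / denom


def refine_position(image, template, predicted, radius=2):
    """Search a (2*radius+1)^2 window around `predicted` (row, col of the
    top-left corner) and return the position with the highest NCC score."""
    th, tw = len(template), len(template[0])
    flat_template = [v for row in template for v in row]
    best_pos, best_score = predicted, -2.0
    pr, pc = predicted
    for r in range(pr - radius, pr + radius + 1):
        for c in range(pc - radius, pc + radius + 1):
            # Skip windows that fall outside the image.
            if r < 0 or c < 0 or r + th > len(image) or c + tw > len(image[0]):
                continue
            flat_patch = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            score = normalized_cross_correlation(flat_patch, flat_template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score
```

Here `predicted` plays the role of the network's position estimate, and the correlation search only corrects it locally, which keeps the cost bounded by the window size.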