Exploring Methods to Enhance Appearance-Based Video Object Tracking using Dynamics Theory
The task of Video Object Tracking has long received attention within the field of Computer Vision, and many different approaches have tried to tackle its challenges, with those based on appearance and on motion among the most popular. The main focus of this thesis is to fuse both strategies in order to exploit their strengths and overcome each other's flaws. To achieve this goal, we propose a unified framework that combines, in an online manner, an off-the-shelf single-object siamese tracker, which is modified to perform multi-object tracking and to provide more than one detection candidate, with a novel motion module. This module detects when the proposed target position is not dynamically consistent and, if that is the case, predicts an alternative position, which is used to choose the best among the remaining candidates. Our approach is evaluated on the challenging Similar Multi-Object Tracking (SMOT) dataset and achieves a precision improvement of 5% with respect to the baseline. We also present an extension to the SMOT dataset, eSMOT, which includes more sequences with complex dynamic scenarios; because our model performs very well on these sequences, we use its predictions to label the ground truth. Although there is still room for improvement, mainly regarding the efficiency of the approach, this work has served as a proof of concept for the intuitions behind it, and research in this direction is expected to continue.
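The candidate-selection idea in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the dynamics model, so the sketch assumes a simple constant-velocity prediction in its place; the function names, the `max_dev` threshold, and the point-based representation of candidates are all hypothetical and are not taken from the thesis.

```python
import math

def predict_constant_velocity(history):
    """Extrapolate the next (x, y) position from the last two positions.

    Assumption: constant velocity stands in for the thesis's motion module,
    whose actual dynamics model is not described in the abstract.
    """
    (x1, y1), (x2, y2) = history[-2], history[-1]
    return (2 * x2 - x1, 2 * y2 - y1)

def select_candidate(history, candidates, max_dev):
    """Pick a target position from ranked tracker candidates.

    candidates[0] is the tracker's top proposal. If it lies within max_dev
    (a hypothetical threshold) of the predicted position it is kept;
    otherwise the remaining candidate closest to the prediction is chosen.
    """
    pred = predict_constant_velocity(history)

    def dist(p):
        return math.hypot(p[0] - pred[0], p[1] - pred[1])

    top = candidates[0]
    if dist(top) <= max_dev:
        return top
    # Fall back to the top proposal when there are no other candidates.
    rest = candidates[1:] or candidates
    return min(rest, key=dist)
```

For example, with a history of `[(0, 0), (1, 0)]` the predicted position is `(2, 0)`, so a top proposal at `(10, 10)` would be rejected in favour of a lower-ranked candidate near `(2, 0)`.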
Survey on Vision-based Path Prediction
Path prediction is a fundamental task for estimating how pedestrians or
vehicles are going to move in a scene. Because path prediction as a task of
computer vision uses video as input, various information used for prediction,
such as the environment surrounding the target and the internal state of the
target, need to be estimated from the video in addition to predicting paths.
Many prediction approaches that include understanding the environment and the
internal state have been proposed. In this survey, we systematically summarize
methods of path prediction that take video as input and extract features
from the video. Moreover, we introduce datasets used to evaluate path
prediction methods quantitatively. Comment: DAPI 201
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis has evolved from earlier schemes that were often limited to controlled
environments to today's advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved ever more rapidly, so that methods
once considered good quickly become obsolete. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable fallbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
An analysis of training methodologies for deep visual trackers
This thesis considers the problem of training convolutional neural networks for online visual tracking. A major challenge for single-object visual tracking is that most training sets with frame-level track annotations are quite small, due to the prohibitive cost of manual annotation. Current training approaches either supplement the annotations with other data sources (e.g., object-detection training data) or generate noisy variants of the track annotations. In either case, the data generation and training methods have ignored the fact that tracking involves sequences of decisions (one per frame) that are dependent on one another. Thus, the objectives optimized by these learning algorithms are not directly tied to the end goal of tracking performance. To further study this issue, we consider the state-of-the-art imitation learning algorithm, DAGGER, for training an online tracker. We observe that DAGGER faces difficulty when applied to tracking, because online trackers typically experience unrecoverable failures, especially early in training. To rectify this issue we introduce, analyze, and evaluate a variation of DAGGER, called DAGGER with Resets, a novel imitation learning framework which maintains the theoretical properties of DAGGER and is more appropriate for training deep trackers. Our main contribution is to compare different training methods, including DAGGER and DAGGER with Resets, across a variety of datasets and multiple trackers. Our experimental results show that this principled training approach and methodical random augmentation are able to outperform existing training approaches across multiple visual tracking datasets. Keywords: computer science, machine learning, visual tracking, deep learning, imitation learning, computer vision
A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision
Deep learning has the potential to revolutionize sports performance, with
applications ranging from perception and comprehension to decision. This paper
presents a comprehensive survey of deep learning in sports performance,
focusing on three main aspects: algorithms, datasets and virtual environments,
and challenges. Firstly, we discuss the hierarchical structure of deep learning
algorithms in sports performance which includes perception, comprehension and
decision while comparing their strengths and weaknesses. Secondly, we list
widely used existing datasets in sports and highlight their characteristics and
limitations. Finally, we summarize current challenges and point out future
trends of deep learning in sports. Our survey provides valuable reference
material for researchers interested in deep learning in sports applications.
Homography Estimation in Complex Topological Scenes
Surveillance videos and images are used for a broad set of applications,
ranging from traffic analysis to crime detection. Extrinsic camera calibration
data is important for most analysis applications. However, security cameras are
susceptible to environmental conditions and small camera movements, resulting
in a need for an automated re-calibration method that can account for these
varying conditions. In this paper, we present an automated camera-calibration
process leveraging a dictionary-based approach that does not require prior
knowledge on any camera settings. The method consists of a custom
implementation of a Spatial Transformer Network (STN) and a novel topological
loss function. Experiments reveal that the proposed method improves the IoU
metric by up to 12% w.r.t. a state-of-the-art model across five synthetic
datasets and the World Cup 2014 dataset. Comment: Will be published in Intelligent Vehicle Symposium 202
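For readers unfamiliar with the IoU (intersection over union) metric cited in the abstract above, the following minimal sketch shows how it is computed for two axis-aligned rectangles. This is an illustrative assumption only: the paper presumably evaluates IoU on regions projected through the estimated homography, and the function name and box format here are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max). The box format is an
    assumption for illustration, not the representation used in the paper.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give an IoU of 1, disjoint boxes give 0, and partial overlaps fall in between.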
A Robust Structured Tracker Using Local Deep Features
Deep features extracted from convolutional neural networks have recently been utilized in visual tracking to obtain a generic and semantic representation of target candidates. In this paper, we propose a robust structured tracker using local deep features (STLDF). This tracker exploits the deep features of local patches inside target candidates and sparsely represents them by a set of templates in the particle filter framework. The proposed STLDF utilizes a new optimization model, which employs a group-sparsity regularization term to adopt local and spatial information of the target candidates and attain the spatial layout structure among them. To solve the optimization model, we propose an efficient and fast numerical algorithm that consists of two subproblems with closed-form solutions. Different evaluations in terms of success and precision on benchmarks of challenging image sequences (e.g., OTB50 and OTB100) demonstrate the superior performance of the STLDF against several state-of-the-art trackers.
Deep learning techniques for visual object tracking
Visual object tracking plays a crucial role in various vision systems, including biometric analysis, medical imaging, smart traffic systems, and video surveillance. Despite notable advancements in visual object tracking over the past few decades, many tracking algorithms still face challenges due to factors like illumination changes, deformation, and scale variations.
This thesis is divided into three parts. The first part introduces the visual object tracking problem and discusses the traditional approaches that have been used to study it. We then propose a novel method called Tracking by Iterative Multi-Refinements, which addresses the issue of locating the target by redefining the search for the ideal bounding box. This method utilizes an iterative process to forecast a sequence of bounding box adjustments, enabling the tracking algorithm to handle multiple non-conflicting transformations simultaneously. As a result, it achieves faster tracking and can handle a higher number of composite transformations.
In the second part of this thesis we explore the application of reinforcement learning (RL) to visual tracking. We present a general RL framework applicable to problems that require a sequence of decisions, and we discuss various families of popular RL approaches, including value-based methods, policy gradient approaches, and actor-critic methods. Furthermore, we delve into the application of RL to visual tracking, where an RL agent predicts the target's location or selects hyperparameters, correlation filters, or target appearance. A comprehensive comparison of these approaches is provided, along with a taxonomy of state-of-the-art methods.
The third part presents a novel method that addresses the need for online tuning of offline-trained tracking models. Typically, offline-trained models, whether through supervised learning or reinforcement learning, require additional tuning during online tracking to achieve optimal performance. The duration of this tuning process depends on the number of layers that need training for the new target. However, our thesis proposes a pioneering approach that expedites the training of convolutional neural networks (CNNs) while preserving their high performance levels.
In summary, this thesis extensively explores the area of visual object tracking and its related domains, covering traditional approaches, novel methodologies like Tracking by Iterative Multi-Refinements, the application of reinforcement learning, and a pioneering method for accelerating CNN training. By addressing the challenges faced by existing tracking algorithms, this research aims to advance the field of visual object tracking and contribute to the development of more robust and efficient tracking systems.