Search CORE

69 research outputs found

Exploring Methods to Enhance Appearance-Based Video Object Tracking using Dynamics Theory

Author: Alonso Poal Marina
Publication venue: Universitat Politècnica de Catalunya
Publication date: 02/06/2020
Field of study

To be defined upon arrival.The task of Video Object Tracking has for a long time received attention within the field of Computer Vision, and many different approaches have tried to tackle its challenges, being the ones based on appearance and motion some of the most popular ones. The main focus of this thesis is to fuse both strategies in order to exploit their strengths and overcome each other's flaws. To achieve this goal, we propose a unified framework that combines, in an online manner, an off-the-shelf single-object siamese tracker, which is modified to perform multi-object tracking and to provide more than one detection candidate, with a novel motion module. This module detects when the proposed target position is not dynamically consistent and, if that is the case, predicts an alternative which is used to choose the best among the rest of candidates. Our approach is evaluated on the challenging Similar Multi-Object Tracking (SMOT) dataset and achieves a relevant precision improvement of the 5% with respect to the baseline. We present an extension to the SMOT dataset, the eSMOT, including more sequences with complex dynamic scenarios, where the performance of our model is excellent, therefore we use its predictions to label the Ground Truth. Although there is still room for enhancement mainly regarding the efficiency of the approach, this work has served as a relevant proof of concept for the intuitions behind it and consequently, research in this direction will surely continue

UPCommons. Portal del coneixement obert de la UPC

Survey on Vision-based Path Prediction

Author: A Lerner
A Robicquet
CG Keller
D Helbing
D Munoz
D Weinland
E Shelhamer
H Zhu
JANE BROMLEY
JFP Kooij
KM Kitani
L Ballan
Nicolas Schneider
R Benenson
S Huang
S Singh
S Yi
SZ Bokhari
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2018
Field of study

Path prediction is a fundamental task for estimating how pedestrians or vehicles are going to move in a scene. Because path prediction as a task of computer vision uses video as input, various information used for prediction, such as the environment surrounding the target and the internal state of the target, need to be estimated from the video in addition to predicting paths. Many prediction approaches that include understanding the environment and the internal state have been proposed. In this survey, we systematically summarize methods of path prediction that take video as input and and extract features from the video. Moreover, we introduce datasets used to evaluate path prediction methods quantitatively.Comment: DAPI 201

arXiv.org e-Print Archive

Crossref

Going Deeper into Action Recognition: A Survey

Author: Harandi Mehrtash
Herath Samitha
Porikli Fatih
Publication venue
Publication date: 01/01/2017
Field of study

Understanding human actions in visual data is tied to advances in complementary research areas including object recognition, human dynamics, domain adaptation and semantic segmentation. Over the last decade, human action analysis evolved from earlier schemes that are often limited to controlled environments to nowadays advanced solutions that can learn from millions of videos and apply to almost all daily activities. Given the broad range of applications from video surveillance to human-computer interaction, scientific milestones in action recognition are achieved more rapidly, eventually leading to the demise of what used to be good in a short time. This motivated us to provide a comprehensive review of the notable steps taken towards recognizing human actions. To this end, we start our discussion with the pioneering methods that use handcrafted representations, and then, navigate into the realm of deep learning based approaches. We aim to remain objective throughout this survey, touching upon encouraging improvements as well as inevitable fallbacks, in the hope of raising fresh questions and motivating new research directions for the reader

arXiv.org e-Print Archive

The Australian National University

Recommended from our members

An analysis of training methodologies for deep visual trackers

Author: Fiez Trevor
Publication venue: 'Oregon State University'
Publication date
Field of study

This thesis considers the problem of training convolutional neural networks for online visual tracking. A major challenge for single object visual tracking is that most training sets with frame-level track annotations are quite small, due to the prohibitive cost of manual annotation. Current training approaches either supplement the annotations with other data sources (e.g., object-detection training data) or generate noisy variants of the track annotations. In either case, the data generation and training methods have ignored the fact that tracking involves sequences of decisions (one per frame) that are dependent on one another. Thus, the objectives optimized by these learning algorithms are not directly tied to the end goal of tracking performance. To further study this issue, we consider the state-of-the-art imitation learning algorithm, DAGGER, for training an online tracker. We observe that the DAGGER faces difficulty when applied to tracking, because online trackers typically experience unrecoverable failures, especially early in training. To rectify this issue we introduce, analyze, and evaluate a variation of DAGGER, called DAGGER with Resets (DAGGER), a novel imitation learning framework which maintains the theoretical properties of DAGGER and is more appropriate for training deep trackers. Our main contribution is to compare different training methods, including DAGGER and DAGGER, across a variety of datasets and multiple trackers. Our experimental results show this principled training approach and methodical random augmentation is able to outperform existing training approaches across multiple visual tracking datasets.Keywords: computer science, machine learning, visual tracking, deep learning, imitation learning, computer visio

ScholarsArchive@OSU

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

Author: Cao Shidong
Chai Wenhao
Hao Shengyu
Hu Wenhao
Hwang Jenq-Neng
Song Mingli
Wang Gaoang
Wang Guanhong
Zhao Zhonghan
Publication venue
Publication date: 06/07/2023
Field of study

Deep learning has the potential to revolutionize sports performance, with applications ranging from perception and comprehension to decision. This paper presents a comprehensive survey of deep learning in sports performance, focusing on three main aspects: algorithms, datasets and virtual environments, and challenges. Firstly, we discuss the hierarchical structure of deep learning algorithms in sports performance which includes perception, comprehension and decision while comparing their strengths and weaknesses. Secondly, we list widely used existing datasets in sports and highlight their characteristics and limitations. Finally, we summarize current challenges and point out future trends of deep learning in sports. Our survey provides valuable reference material for researchers interested in deep learning in sports applications

arXiv.org e-Print Archive

Homography Estimation in Complex Topological Scenes

Author: Bondarau Egor
D'Amicantonio Giacomo
De With Peter H. N.
Publication venue
Publication date: 02/08/2023
Field of study

Surveillance videos and images are used for a broad set of applications, ranging from traffic analysis to crime detection. Extrinsic camera calibration data is important for most analysis applications. However, security cameras are susceptible to environmental conditions and small camera movements, resulting in a need for an automated re-calibration method that can account for these varying conditions. In this paper, we present an automated camera-calibration process leveraging a dictionary-based approach that does not require prior knowledge on any camera settings. The method consists of a custom implementation of a Spatial Transformer Network (STN) and a novel topological loss function. Experiments reveal that the proposed method improves the IoU metric by up to 12% w.r.t. a state-of-the-art model across five synthetic datasets and the World Cup 2014 dataset.Comment: Will be published in Intelligent Vehicle Symposium 202

arXiv.org e-Print Archive

A Robust Structured Tracker Using Local Deep Features

Author: Farzaneh Amir Hossein
Javanmardi Mohammadreza
Qi Xiaojun
Publication venue: Hosted by Utah State University Libraries
Publication date: 20/05/2020
Field of study

Deep features extracted from convolutional neural networks have been recently utilized in visual tracking to obtain a generic and semantic representation of target candidates. In this paper, we propose a robust structured tracker using local deep features (STLDF). This tracker exploits the deep features of local patches inside target candidates and sparsely represents them by a set of templates in the particle filter framework. The proposed STLDF utilizes a new optimization model, which employs a group-sparsity regularization term to adopt local and spatial information of the target candidates and attain the spatial layout structure among them. To solve the optimization model, we propose an efficient and fast numerical algorithm that consists of two subproblems with the close-form solutions. Different evaluations in terms of success and precision on the benchmarks of challenging image sequences (e.g., OTB50 and OTB100) demonstrate the superior performance of the STLDF against several state-of-the-art trackers

DigitalCommons@USU

Deep learning techniques for visual object tracking

Author: CRUCIATA Giorgio
Publication venue: place:Palermo
Publication date: 26/06/2023
Field of study

Visual object tracking plays a crucial role in various vision systems, including biometric analysis, medical imaging, smart traffic systems, and video surveillance. Despite notable advancements in visual object tracking over the past few decades, many tracking algorithms still face challenges due to factors like illumination changes, deformation, and scale variations. This thesis is divided into three parts. The first part introduces the visual object tracking problem and discusses the traditional approaches that have been used to study it. We then propose a novel method called Tracking by Iterative Multi-Refinements, which addresses the issue of locating the target by redefining the search for the ideal bounding box. This method utilizes an iterative process to forecast a sequence of bounding box adjustments, enabling the tracking algorithm to handle multiple non-conflicting transformations simultaneously. As a result, it achieves faster tracking and can handle a higher number of composite transformations. In the second part of this thesis we explore the application of reinforcement learning (RL) to visual tracking. Presenting a general RL framework applicable to problems that require a sequence of decisions. We discuss various families of popular RL approaches, including value-based methods, policy gradient approaches, and Actor-Critic Methods. Furthermore, we delve into the application of RL to visual tracking, where an RL agent predicts the target's location, selects hyperparameters, correlation filters, or target appearance. A comprehensive comparison of these approaches is provided, along with a taxonomy of state-of-the-art methods. The third part presents a novel method that addresses the need for online tuning of offline-trained tracking models. Typically, offline-trained models, whether through supervised learning or reinforcement learning, require additional tuning during online tracking to achieve optimal performance. The duration of this tuning process depends on the number of layers that need training for the new target. However, our thesis proposes a pioneering approach that expedites the training of convolutional neural networks (CNNs) while preserving their high performance levels. In summary, this thesis extensively explores the area of visual object tracking and its related domains, covering traditional approaches, novel methodologies like Tracking by Iterative Multi-Refinements, the application of reinforcement learning, and a pioneering method for accelerating CNN training. By addressing the challenges faced by existing tracking algorithms, this research aims to advance the field of visual object tracking and contributes to the development of more robust and efficient tracking systems

Archivio istituzionale della ricerca - Università di Palermo