99 research outputs found

    Novel Aggregated Solutions for Robust Visual Tracking in Traffic Scenarios

    This work proposes novel approaches to object tracking in challenging scenarios such as severe occlusion, deteriorated vision, and long-range multi-object re-identification. All of the proposed solutions rely solely on image sequences captured by a monocular camera and require no additional sensors. Experiments on standard benchmarks demonstrate that these approaches improve on the state of the art. Because all the presented approaches are efficiently designed, they run at real-time speed.

    Utilization and experimental evaluation of occlusion aware kernel correlation filter tracker using RGB-D

    Unlike deep learning, which requires large training datasets, correlation filter-based trackers such as the Kernelized Correlation Filter (KCF) use implicit properties of tracked images (circulant matrices) to train in real time. Despite their practical application in tracking, the fundamentals of KCF still need to be better understood theoretically, mathematically, and experimentally. This thesis first details a working prototype of the tracker and investigates its effectiveness in real-time applications, with supporting visualizations. We further address some of the tracker's drawbacks in cases of occlusion, scale changes, object rotation, out-of-view targets, and model drift with our novel RGB-D Kernel Correlation tracker. We also study the use of a particle filter to improve the tracker's accuracy. Our results are evaluated experimentally a) on a standard dataset and b) in real time using a Microsoft Kinect V2 sensor. We believe this work will set the basis for better understanding the effectiveness of kernel-based correlation filter trackers and for further defining some of their possible advantages in tracking.
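The circulant-matrix property this abstract alludes to is what lets KCF-style trackers train in real time: ridge regression over all cyclic shifts of a patch diagonalises in the Fourier domain, so one FFT replaces inverting an n x n matrix. The thesis's RGB-D tracker is not reproduced here; the following is only a minimal 1-D sketch of the linear-kernel case, with all names and parameters illustrative.

```python
import numpy as np

np.random.seed(0)

def train_linear_cf(x, y, lam=1e-4):
    """Ridge regression over all cyclic shifts of x.

    The shifted samples form a circulant matrix, so the solution is
    computed per frequency in the Fourier domain in O(n log n).
    """
    X = np.fft.fft(x)
    Y = np.fft.fft(y)
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)

def detect(W, z):
    """Correlate the learned filter with a new patch z; the response
    peak locates the target's translation."""
    return np.real(np.fft.ifft(W * np.fft.fft(z)))

# Toy 1-D example: desired response is a wrapped Gaussian peaked at 0
n = 64
x = np.random.randn(n)
d = np.minimum(np.arange(n), n - np.arange(n))
y = np.exp(-0.5 * (d / 2.0) ** 2)

W = train_linear_cf(x, y)
resp = detect(W, np.roll(x, 5))   # present the target shifted by 5
print(np.argmax(resp))            # response peak recovers the shift: 5
```

Because training and detection are both a handful of FFTs, the per-frame cost stays low enough for real-time use, which is the property the thesis builds on.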

    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not been thoroughly evaluated yet is deformable face tracking "in-the-wild". Until now, performance has mainly been assessed qualitatively by visually inspecting the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures, focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model-free tracking plus generic facial landmark localisation, as well as (c) hybrid approaches using state-of-the-art face detection, model-free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic. Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.
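The three strategies compared above differ only in how each frame's face box is obtained before landmark localisation. As a rough illustration of the hybrid strategy (c), here is a toy control-flow sketch: every function is a hypothetical stand-in (a real pipeline would plug in an actual detector, model-free tracker, and landmark localiser), and frames are plain dicts so the logic can run.

```python
# Toy stand-ins: "track_conf" is supplied directly in each frame so the
# fallback logic is observable without real vision components.

def detect_face(frame):
    return frame.get("detection")          # bounding box or None

def track(prev_box, frame):
    return prev_box, frame["track_conf"]   # (box, tracking confidence)

def localise_landmarks(frame, box):
    return {"box": box, "landmarks": "..."}

def hybrid_pipeline(frames, conf_threshold=0.5):
    """Strategy (c): model-free tracking between detections, falling
    back to the face detector whenever tracking confidence drops."""
    results, box = [], None
    for frame in frames:
        if box is not None:
            box, conf = track(box, frame)
            if conf < conf_threshold:
                box = None                 # tracker drifted: re-detect
        if box is None:
            box = detect_face(frame)
        results.append(localise_landmarks(frame, box) if box else None)
    return results

frames = [
    {"detection": (10, 10, 50, 50), "track_conf": 0.9},
    {"detection": None, "track_conf": 0.8},              # tracking carries over
    {"detection": (12, 11, 50, 50), "track_conf": 0.2},  # drift: re-detect
]
out = hybrid_pipeline(frames)
print([r["box"] for r in out])
```

The interesting design question the paper evaluates is exactly where to set that fallback boundary between the detector and the model-free tracker.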

    Learning Discriminative Features and Structured Models for Segmentation in Microscopy and Natural Images

    Segmenting images is a significant challenge that has drawn a lot of attention from different fields of artificial intelligence and has many practical applications. One such challenge addressed in this thesis is the segmentation of electron microscope (EM) imaging of neural tissue. EM microscopy is one of the key tools used to analyze neural tissue and understand the brain, but the huge amounts of data it produces make automated analysis necessary. In addition to the challenges specific to EM data, the common problems encountered in image segmentation must also be addressed. These problems include extracting discriminative features from the data and constructing a statistical model using ground-truth data. Although complex models appear to be more attractive because they allow for more expressiveness, they also lead to a higher computational complexity. On the other hand, simple models come with a lower complexity but less faithfully express the real world. Therefore, one of the most challenging tasks in image segmentation is in constructing models that are expressive enough while remaining tractable. In this work, we propose several automated graph partitioning approaches that address these issues. These methods reduce the computational complexity by operating on supervoxels instead of voxels, incorporating features capable of describing the 3D shape of the target objects and using structured models to account for correlation in output variables. One of the non-trivial issues with such models is that their parameters must be carefully chosen for optimal performance. A popular approach to learning model parameters is a maximum-margin approach called Structured SVM (SSVM) that provides optimality guarantees but also suffers from two main drawbacks. First, SSVM-based approaches are usually limited to linear kernels, since more powerful nonlinear kernels cause the learning to become prohibitively expensive. 
In this thesis, we introduce an approach to “kernelize” the features so that a linear SSVM framework can leverage the power of nonlinear kernels without incurring their high computational cost. Second, the optimality guarantees are violated for complex models with strong inter-relations between the output variables. We propose a new subgradient-based method that is more robust, leads to improved convergence properties, and increases reliability. The different approaches presented in this thesis are applicable to both natural and medical images. They segment mitochondria at a performance level close to that of a human annotator and outperform state-of-the-art segmentation techniques while retaining a low learning time.
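The thesis does not spell out its kernelization scheme in this abstract, but one standard way to let a linear model stand in for a nonlinear kernel machine is an explicit randomized feature map such as random Fourier features (Rahimi and Recht). The sketch below is purely illustrative, not the thesis's implementation; the dimensions and bandwidth are arbitrary.

```python
import numpy as np

def random_fourier_features(X, D=500, gamma=1.0, seed=0):
    """Map inputs so that dot products in the new space approximate an
    RBF kernel exp(-gamma * ||x - y||^2). A linear (structured) SVM
    trained on these features then behaves like a nonlinear kernel
    machine at linear-model cost."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density N(0, 2*gamma*I)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Sanity check: feature dot products track the exact RBF kernel
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, D=2000, gamma=0.5)
approx = Z @ Z.T
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact = np.exp(-0.5 * sq)
print(np.abs(approx - exact).max())  # small approximation error
```

The appeal in an SSVM setting is that training cost stays linear in the number of samples while the feature map supplies the nonlinearity.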

    Hashing for Multimedia Similarity Modeling and Large-Scale Retrieval

    In recent years, the amount of multimedia data such as images, texts, and videos has been growing rapidly on the Internet. Motivated by this trend, this thesis is dedicated to exploiting hashing-based solutions to reveal multimedia data correlations and to support intra-media and inter-media similarity search over huge volumes of multimedia data. We start by investigating a hashing-based solution for audio-visual similarity modeling and apply it to the audio-visual sound source localization problem. We show that synchronized signals in the audio and visual modalities exhibit similar temporal patterns of change in certain feature spaces. We propose a permutation-based random hashing technique that captures the temporal order dynamics of audio and visual features by hashing them along the temporal axis into a common Hamming space. In this way, the audio-visual correlation problem is transformed into a similarity search problem in the Hamming space. Our hashing-based audio-visual similarity modeling has shown superior performance in the localization and segmentation of sounding objects in videos. The success of the permutation-based hashing method motivates us to generalize and formally define the supervised ranking-based hashing problem and to study its application to large-scale image retrieval. Specifically, we propose an effective supervised learning procedure that learns optimized ranking-based hash functions for large-scale similarity search. Compared with the randomized version, the optimized ranking-based hash codes are much more compact and discriminative. Moreover, the method can easily be extended to kernel space to discover more complex ranking structures that cannot be revealed in linear subspaces. Experiments on large image datasets demonstrate the effectiveness of the proposed method for image retrieval. We further study the ranking-based hashing method for the cross-media similarity search problem.
Specifically, we propose two optimization methods to jointly learn two groups of linear subspaces, one for each media type, so that the features' ranking orders in different linear subspaces maximally preserve the cross-media similarities. Additionally, we develop this ranking-based hashing method in the cross-media context into a flexible hashing framework with a more general solution. Extensive experiments on several real-world datasets demonstrate that the proposed cross-media hashing method achieves superior cross-media retrieval performance against several state-of-the-art algorithms. Lastly, to make better use of the supervisory label information, and to further improve the efficiency and accuracy of supervised hashing, we propose a novel multimedia discrete hashing framework that optimizes an instance-wise loss objective, as opposed to pairwise losses, using an efficient discrete optimization method. In addition, the proposed method decouples binary code learning and hash function learning into two separate stages, making it equally applicable to both single-media and cross-media search. Extensive experiments on both single-media and cross-media retrieval tasks demonstrate the effectiveness of the proposed method.
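The permutation-based ranking hashes this abstract builds on are not specified in detail here; a well-known instance of the idea is the winner-take-all hash, which encodes only the rank order of features, so vectors with similar orderings collide on many codes in Hamming space. A small illustrative sketch (the dimensions and window size are arbitrary, and this is not the thesis's optimized variant):

```python
import numpy as np

rng = np.random.default_rng(0)

def wta_hash(x, windows):
    """Winner-take-all ranking hash: each code records which of k
    randomly chosen coordinates of x is largest, so the code depends
    only on the rank order of features, not their magnitudes."""
    return np.array([int(np.argmax(x[w])) for w in windows])

d, num_codes, k = 32, 64, 4
# Each "window" is a random choice of k out of d coordinates
windows = np.array([rng.permutation(d)[:k] for _ in range(num_codes)])

a = rng.normal(size=d)
b = a + 0.1 * rng.normal(size=d)   # small perturbation: ranks mostly kept
c = rng.normal(size=d)             # unrelated vector

# Fraction of agreeing codes acts as the Hamming-space similarity
sim_ab = np.mean(wta_hash(a, windows) == wta_hash(b, windows))
sim_ac = np.mean(wta_hash(a, windows) == wta_hash(c, windows))
print(sim_ab, sim_ac)  # agreement is much higher for the similar pair
```

Because comparison reduces to counting matching integer codes, nearest-neighbour search over millions of items becomes cheap, which is the point of moving similarity search into Hamming space.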

    Motion-Augmented Inference and Joint Kernels in Structured Learning for Object Tracking and Integration with Object Segmentation

    Video object tracking is the fundamental task of continuously following an object of interest through a video sequence. It has attracted considerable attention in both academia and industry due to its diverse applications, such as automated video surveillance, augmented and virtual reality, medical imaging, automated vehicle navigation and tracking, and smart devices. Challenges in video object tracking arise from occlusion, deformation, background clutter, illumination variation, fast object motion, scale variation, low resolution, rotation, out-of-view targets, and motion blur. Object tracking therefore remains an active research field. This thesis explores improving object tracking by employing 1) advanced techniques in machine learning theory to account for intrinsic changes in object appearance under those challenging conditions, and 2) object segmentation. More specifically, we propose a fast and competitive method for object tracking that models target dynamics as a stochastic process and uses structured support vector machines. First, we predict target dynamics with harmonic means and a particle filter, in which we exploit kernel machines to derive a new entropy-based observation likelihood distribution. Second, we employ online structured support vector machines to model object appearance, where we analyze the responses of several kernel functions for various feature descriptors and study how such kernels can be optimally combined into a single joint kernel function. During learning, we develop a probability formulation to determine model updates and use a sequential minimal optimization step to solve the structured optimization problem. We gain efficiency in the proposed object tracker by 1) exploiting a particle filter to sample the search space instead of the commonly adopted dense sampling strategies, and 2) introducing a motion-augmented regularization term during inference to constrain the output search space.
We then extend our baseline tracker to detect tracking failures or inaccuracies and to reinitialize itself when needed. To that end, we integrate object segmentation into tracking. First, we use binary support vector machines to develop a technique that detects tracking failures (or inaccuracies) by monitoring internal variables of our baseline tracker, training these binary support vector machines on examples learned by the baseline tracker. Second, we propose an automated method to reinitialize the tracker and recover from tracking failures by integrating an active-contour-based object segmentation and using a particle filter to sample bounding boxes for segmentation. Through extensive experiments on standard video datasets, we subjectively and objectively demonstrate that both our baseline and extended methods compete strongly against state-of-the-art object tracking methods under challenging video conditions.
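The abstract leans on a particle filter both for motion prediction and for sampling candidate bounding boxes instead of dense search. As a rough illustration of that predict-update-resample cycle, here is a generic 1-D bootstrap particle filter; it is not the thesis's tracker, and the random-walk motion model and Gaussian observation model are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, observation,
                         motion_std=1.0, obs_std=2.0):
    """One predict-update-resample cycle of a bootstrap particle filter.
    Only as many states are evaluated as there are particles, which is
    far cheaper than densely scanning every candidate location."""
    # Predict: propagate each particle through the motion model
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Update: reweight by the observation likelihood (Gaussian here)
    weights = np.exp(-0.5 * ((particles - observation) / obs_std) ** 2)
    weights /= weights.sum()
    # Resample: draw particles in proportion to their weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Track a target drifting at +0.5 per frame from noisy observations
n = 500
particles = rng.normal(0.0, 5.0, size=n)
weights = np.full(n, 1.0 / n)
true_pos = 0.0
for t in range(30):
    true_pos += 0.5
    obs = true_pos + rng.normal(0.0, 2.0)
    particles, weights = particle_filter_step(particles, weights, obs)
estimate = particles.mean()
print(estimate, true_pos)  # estimate stays close to the true position
```

In a tracker, each particle would carry a full bounding-box state and the likelihood would come from the appearance model rather than a fixed Gaussian.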

    Automated detection and tracking of slalom paddlers from broadcast image sequences using cascade classifiers and discriminative correlation filters

    This paper addresses the problem of automatic detection and tracking of slalom paddlers through long sequences of sports broadcast images with persistent view changes. In this context, the task of visual object tracking is particularly challenging due to frequent shot transitions (i.e. camera switches), which violate the fundamental spatial continuity assumption made by most state-of-the-art object tracking algorithms. The problem is further compounded by significant variations in object location, shape, and appearance in typical sports scenarios where the athletes often move rapidly. To overcome these challenges, we propose a Periodically Prior Regularised Discriminative Correlation Filters (PPRDCF) framework, which combines recent successful Discriminative Correlation Filters (DCF) with periodic regularisation by a prior in the form of a rich discriminative cascade classifier. The PPRDCF framework reduces the corruption of positive samples by negative training samples during online learning of the correlation filters. Our framework detects rapid shot transitions to reinitialise the tracker, and it successfully recovers the tracker when the location, view, or scale of the object changes or when the tracker drifts from the object. The PPRDCF also provides race context by detecting the ordered course obstacles and their spatial relations to the paddler. Our framework robustly outputs the evidence base required to derive race kinematics for performance analysis. Experiments performed on a task-specific dataset containing Canoe/Kayak Slalom race image sequences yield successful results.

    Multiple Pedestrians and Vehicles Tracking in Aerial Imagery Using a Convolutional Neural Network

    In this paper, we address various challenges in multi-pedestrian and vehicle tracking in high-resolution aerial imagery through an intensive evaluation of a number of traditional and Deep Learning based Single- and Multi-Object Tracking methods. We also describe our proposed Deep Learning based Multi-Object Tracking method, AerialMPTNet, which fuses appearance, temporal, and graphical information using a Siamese Neural Network, a Long Short-Term Memory, and a Graph Convolutional Neural Network module for more accurate and stable tracking. Moreover, we investigate the influence of Squeeze-and-Excitation layers and Online Hard Example Mining on the performance of AerialMPTNet; to the best of our knowledge, we are the first to use these two techniques for regression-based Multi-Object Tracking. Additionally, we study and compare the L1 and Huber loss functions. In our experiments, we extensively evaluate AerialMPTNet on three aerial Multi-Object Tracking datasets, namely AerialMPT and the KIT AIS pedestrian and vehicle datasets. Qualitative and quantitative results show that AerialMPTNet outperforms all previous methods on the pedestrian datasets and achieves competitive results on the vehicle dataset. The Long Short-Term Memory and Graph Convolutional Neural Network modules enhance tracking performance, while Squeeze-and-Excitation and Online Hard Example Mining help significantly in some cases but degrade the results in others. According to the results, L1 yields better results than the Huber loss in most scenarios. The presented results provide deep insight into the challenges and opportunities of the aerial Multi-Object Tracking domain, paving the way for future research.
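The L1-versus-Huber comparison mentioned above contrasts two standard regression losses: L1 penalises every residual linearly, while the Huber loss is quadratic near zero and linear beyond a threshold delta, so it is smooth for small errors yet robust to outliers. A minimal sketch of the two (the value of delta and the sample residuals are arbitrary):

```python
import numpy as np

def l1_loss(r):
    """Mean absolute error: constant gradient magnitude, robust to
    outliers but non-smooth at zero."""
    return np.mean(np.abs(r))

def huber_loss(r, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond: behaves like L2 on
    small residuals and like L1 on large ones."""
    small = np.abs(r) <= delta
    return np.mean(np.where(small,
                            0.5 * r ** 2,
                            delta * (np.abs(r) - 0.5 * delta)))

r = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])  # example regression residuals
print(l1_loss(r))     # 1.4
print(huber_loss(r))  # 1.05
```

For bounding-box regression, the choice decides how strongly large localisation errors dominate training, which is why the paper benchmarks both.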