
    Visual Object Tracking in First Person Vision

    The understanding of human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms that follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. In recent years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects and scenarios. Despite a few previous attempts to exploit trackers in the FPV domain, a methodical analysis of the performance of state-of-the-art trackers is still missing. This research gap raises the question of whether current solutions can be used “off-the-shelf” or whether more domain-specific investigations should be carried out. This paper aims to answer such questions. We present the first systematic investigation of single object tracking in FPV. Our study extensively analyses the performance of 42 algorithms, including generic object trackers and baseline FPV-specific trackers. The analysis focuses on different aspects of the FPV setting, introduces new performance measures, and considers FPV-specific tasks. The study is made possible by TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV poses new challenges to current visual trackers. We highlight the factors causing such behavior and point out possible research directions. Despite these difficulties, we show that trackers bring benefits to FPV downstream tasks requiring short-term object tracking. We expect that generic object tracking will gain popularity in FPV as new and FPV-specific methodologies are investigated.
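Benchmarks such as the one described above typically score a tracker by comparing its predicted boxes with ground-truth annotations frame by frame. As a minimal sketch (standard overlap-based evaluation, not the paper's specific new measures), the following computes intersection-over-union and the fraction of frames above an overlap threshold:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames where the predicted box overlaps the ground truth
    by at least `threshold`."""
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    return sum(o >= threshold for o in overlaps) / len(overlaps)

# Toy 3-frame sequence: the prediction drifts away from the ground truth.
gt = [(0, 0, 10, 10), (2, 2, 10, 10), (4, 4, 10, 10)]
pred = [(0, 0, 10, 10), (3, 3, 10, 10), (20, 20, 10, 10)]
score = success_rate(pred, gt)  # 2 of 3 frames exceed the 0.5 threshold
```

Sweeping the threshold from 0 to 1 and averaging the resulting success rates yields the area-under-curve summary commonly reported by tracking benchmarks.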

    Young drivers’ pedestrian anti-collision braking operation data modelling for ADAS development

    Smart cities and smart mobility come from intelligent systems designed by humans. Artificial Intelligence (AI) is contributing significantly to the development of these systems, and the automotive industry is the most prominent example of "smart" technology entering the market: there are Advanced Driver Assistance Systems (ADAS), Radar/LIDAR detection units and camera-based Computer Vision systems that can assess driving conditions. Today, these technologies have become consumer goods and services in mass-produced vehicles, providing human drivers with tools for more comfortable and safer driving. Nevertheless, they need to be further improved to progress in the transition to fully automated driving or simply to increase vehicle automation levels. To this end, it becomes imperative to accurately predict the driver's decisions, model human driving behaviours, and introduce more accurate risk assessment metrics. This paper presents a system that can learn to predict the future braking behaviour of a driver in a typical urban vehicle-pedestrian conflict, i.e., when a pedestrian enters a zebra crossing from the curb and a vehicle is approaching. The algorithm proposes a sequential prediction of relevant operational indicators that continuously describe the encounter process. A car driving simulator was used to collect reliable data on the braking behaviours of a cohort of 68 licensed university students, who all faced the same urban scenario. The vehicle speed, steering wheel angle, and pedal activity were recorded as the participants approached the crosswalk, along with the azimuth angle of the pedestrian and the relative longitudinal distance between the vehicle and the pedestrian: the proposed system employs the vehicle information as human driving decisions and the pedestrian information as explanatory variables of the environmental state.
In fact, the pedestrian’s polar coordinates are usually calculated by an on-board millimeter-wave radar, which is typically used to perceive the environment around a vehicle. All the mentioned information is represented as time series data and is used to train a recurrent neural network in a supervised machine learning process. The main purpose of this research is to define a system of behavioural profiles in non-collision conditions that could be used to enhance existing intelligent driving systems, e.g., to reduce the number of warnings when the driver is not on a collision course with a pedestrian. Preliminary experiments reveal the feasibility of the proposed system.
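Training a recurrent network on such recordings requires converting the synchronized time series into supervised (input window, next action) pairs. A minimal sketch of that preprocessing step, with hypothetical feature values (the exact windowing and variables used by the authors are not specified here):

```python
def make_windows(features, targets, window=3):
    """Turn synchronized time series into (input-window, next-target) pairs.

    features: per-timestep feature tuples (e.g. pedestrian azimuth, distance)
    targets:  per-timestep driver actions (e.g. brake pedal position)
    Each sample pairs `window` consecutive feature steps with the action at
    the step immediately after the window, matching a sequential-prediction
    training setup for a recurrent model.
    """
    samples = []
    for t in range(len(features) - window):
        samples.append((features[t:t + window], targets[t + window]))
    return samples

# Hypothetical recording: (azimuth_deg, distance_m) per timestep and a
# normalized brake-pedal position in [0, 1].
feats = [(12.0, 30.0), (11.5, 27.0), (11.0, 24.0), (10.4, 21.0), (9.9, 18.0)]
brake = [0.0, 0.0, 0.1, 0.3, 0.6]
pairs = make_windows(feats, brake, window=3)  # 2 training samples
```

Each resulting pair can then be fed to an RNN that reads the three feature steps and regresses the braking action that follows.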

    CoCoLoT: Combining Complementary Trackers in Long-Term Visual Tracking

    How to combine the complementary capabilities of an ensemble of different algorithms has been of central interest in visual object tracking. Significant progress has been made on this problem, but mostly in short-term tracking scenarios, while long-term tracking settings have been largely ignored. In this paper, we explicitly consider long-term tracking scenarios and provide a framework, named CoCoLoT, that combines the characteristics of complementary visual trackers to achieve enhanced long-term tracking performance. CoCoLoT perceives whether the trackers are following the target object through an online learned deep verification model, and accordingly activates a decision policy that selects the best performing tracker and corrects the behaviour of the failing one. The proposed methodology is evaluated extensively, and the comparison with several other solutions reveals that it competes favourably with the state-of-the-art on the most popular long-term visual tracking benchmarks.
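The decision policy described above can be pictured as a small controller that consumes per-tracker verification scores each frame. A minimal sketch, assuming hypothetical tracker names and a simple score-threshold rule (the actual CoCoLoT policy is learned and more involved):

```python
def select_and_correct(scores, boxes, fail_threshold=0.5):
    """Pick the tracker with the highest verification score and flag the
    others for correction (e.g. re-initialization on the winner's box)
    when their score falls below a failure threshold.

    scores: dict tracker_name -> confidence of still being on the target
    boxes:  dict tracker_name -> current (x, y, w, h) estimate
    Returns the selected tracker, its box, and the trackers to correct.
    """
    best = max(scores, key=scores.get)
    to_correct = [name for name, s in scores.items()
                  if name != best and s < fail_threshold]
    return best, boxes[best], to_correct

# Hypothetical frame: one tracker is confident, the other has drifted.
best, box, corrected = select_and_correct(
    {"tracker_a": 0.91, "tracker_b": 0.22},
    {"tracker_a": (40, 40, 64, 64), "tracker_b": (300, 10, 64, 64)},
)
```

Re-initializing the drifting tracker on the winner's box lets the ensemble recover from individual failures, which matters most in long sequences where the target disappears and reappears.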

    Weakly-Supervised Domain Adaptation of Deep Regression Trackers via Reinforced Knowledge Distillation

    Deep regression trackers are among the fastest tracking algorithms available, and therefore suitable for real-time robotic applications. However, their accuracy is inadequate in many domains due to distribution shift and overfitting. In this paper we overcome such limitations by presenting the first methodology for domain adaptation of this class of trackers. To reduce the labeling effort we propose a weakly-supervised adaptation strategy, in which reinforcement learning is used to express weak supervision as a scalar, application-dependent, and temporally-delayed feedback. At the same time, knowledge distillation is employed to guarantee learning stability and to compress and transfer knowledge from more powerful but slower trackers. Extensive experiments on five different domains demonstrate the relevance of our methodology. Real-time speed is achieved on embedded devices and on machines without GPUs, while accuracy remains competitive.
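The combination of a distillation term and a delayed scalar reward can be sketched as a single training objective. The weighting, the L1 imitation distance, and the way the reward enters the loss below are illustrative assumptions, not the paper's exact formulation:

```python
def distillation_reward_loss(student_boxes, teacher_boxes, episode_reward,
                             alpha=0.5):
    """Combine (i) an imitation term pulling the fast student tracker toward
    a slower teacher tracker's outputs with (ii) a temporally-delayed scalar
    reward that arrives only at the end of an episode, as in a
    reinforcement-learning setup."""
    # Mean L1 distance between student and teacher box coordinates per frame.
    imitation = sum(
        sum(abs(s - t) for s, t in zip(sb, tb)) / 4.0
        for sb, tb in zip(student_boxes, teacher_boxes)
    ) / len(student_boxes)
    # A higher episode reward should lower the loss, hence the negative sign.
    return alpha * imitation - (1 - alpha) * episode_reward

# Hypothetical 2-frame episode with a reward of 1.0 for a successful track.
loss = distillation_reward_loss(
    [(0, 0, 10, 10), (1, 1, 10, 10)],   # student's per-frame boxes
    [(0, 0, 10, 10), (2, 2, 10, 10)],   # teacher's per-frame boxes
    episode_reward=1.0,
)
```

The appeal of this shape of objective is that the dense imitation term stabilizes learning while the sparse reward encodes the application-dependent notion of success without per-frame labels.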

    Bituminous mixtures experimental data modeling using a hyperparameters‐optimized machine learning approach

    This study introduces a machine learning approach based on Artificial Neural Networks (ANNs) for the prediction of Marshall test results, stiffness modulus and air voids data of different bituminous mixtures for road pavements. A novel approach for an objective and semi‐automatic identification of the optimal ANN structure, defined by the so‐called hyperparameters, has been introduced and discussed. Mechanical and volumetric data were obtained by conducting laboratory tests on 320 Marshall specimens, and the results were used to train the neural network. The k‐fold Cross Validation method has been used for partitioning the available data set, to obtain an unbiased evaluation of the model's predictive error. The ANN hyperparameters have been optimized using Bayesian optimization, which efficiently replaces the costlier trial‐and‐error procedure and automates hyperparameter tuning. The proposed ANN model is characterized by a Pearson coefficient value of 0.868.
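The k-fold partitioning used for the error estimate can be sketched without any ML library: each fold serves once as the validation set while the remaining folds form the training set. A minimal index-level version (the fold-assignment scheme here is illustrative; libraries such as scikit-learn provide equivalent utilities):

```python
def k_fold_indices(n_samples, k=5):
    """Partition sample indices into k disjoint folds. Each fold is used
    once for validation while the rest are used for training, giving a
    less biased estimate of the model's predictive error than a single
    train/validation split."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(val)))
    return splits

# 10 specimens, 5 folds: every index appears in exactly one validation set.
splits = k_fold_indices(10, k=5)
```

Averaging the validation error across the k splits yields the cross-validated estimate; hyperparameter search (Bayesian or otherwise) then compares candidate configurations on that estimate rather than on a single held-out set.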

    Deep convolutional feature details for better knee disorder diagnoses in magnetic resonance images

    Convolutional neural networks (CNNs) applied to magnetic resonance imaging (MRI) have demonstrated their ability in the automatic diagnosis of knee injuries. Despite the promising results, the currently available solutions do not take into account the particular anatomy of knee disorders. Existing works have shown that injuries are localized in small-sized knee regions near the center of MRI scans. Based on such insights, we propose MRPyrNet, a CNN architecture capable of extracting more relevant features from these regions. Our solution is composed of a Feature Pyramid Network with Pyramidal Detail Pooling, and can be plugged into any existing CNN-based diagnostic pipeline. The first module enhances the CNN's intermediate features to better detect the small-sized appearance of disorders, while the second captures such evidence by maintaining its detailed information. An extensive evaluation campaign is conducted to understand in depth the potential of the proposed solution. The experimental results demonstrate that applying MRPyrNet to baseline methodologies improves their diagnostic capability, especially for anterior cruciate ligament tears and meniscal tears, thanks to MRPyrNet's ability to exploit the relevant appearance features of such disorders. Code is available at https://github.com/matteo-dunnhofer/MRPyrNet
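The idea of pooling progressively tighter central regions, where the abstract says disorders tend to appear, can be illustrated on a plain 2D feature map. This is a simplified stand-in for a pyramid-style pooling module, not the actual MRPyrNet layer:

```python
def pyramidal_center_pooling(feature_map, levels=3):
    """Max-pool a stack of progressively tighter center crops of a 2D
    feature map, emphasizing the small central region. Each level trims
    one border row/column from the previous crop."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for level in range(levels):
        top, left = level, level
        bottom, right = h - level, w - level
        crop = [row[left:right] for row in feature_map[top:bottom]]
        pooled.append(max(max(row) for row in crop))
    return pooled

# Toy 5x5 map: a strong response at a border corner and one near the center.
fmap = [[1] * 5 for _ in range(5)]
fmap[0][0] = 9   # border activation, visible only at the coarsest level
fmap[2][2] = 5   # central activation, visible at every level
pooled = pyramidal_center_pooling(fmap, levels=3)  # [9, 5, 5]
```

The deeper pyramid levels ignore the border response and keep reporting the central one, which mirrors the intuition of concentrating model capacity on the center of the scan.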

    Is First Person Vision Challenging for Object Tracking?

    Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms that follow the objects manipulated by the camera wearer can provide useful cues to effectively model such interactions. Visual tracking solutions available in the computer vision literature have significantly improved their performance in recent years for a large variety of target objects and tracking scenarios. However, despite a few previous attempts to exploit trackers in FPV applications, a methodical analysis of the performance of state-of-the-art trackers in this domain is still missing. In this paper, we fill the gap by presenting the first systematic study of object tracking in FPV. Our study extensively analyses the performance of recent visual trackers and baseline FPV trackers with respect to different aspects and considering a new performance measure. This is achieved through TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV is challenging, which suggests that more research efforts should be devoted to this problem so that tracking could benefit FPV tasks.

    Visual tracking by means of deep reinforcement learning and an expert demonstrator

    In the last decade, many different algorithms have been proposed to track a generic object in videos. Their execution on recent large-scale video datasets can produce a great amount of varied tracking behaviours. Recent work in Reinforcement Learning has shown that demonstrations of an expert agent can be efficiently used to speed up the process of policy learning. Taking inspiration from such works and from the recent applications of Reinforcement Learning to visual tracking, we propose two novel trackers: A3CT, which exploits demonstrations of a state-of-the-art tracker to learn an effective tracking policy, and A3CTD, which takes advantage of the same expert tracker to correct its behaviour during tracking. Through an extensive experimental validation on the GOT-10k, OTB-100, LaSOT, UAV123 and VOT benchmarks, we show that the proposed trackers achieve state-of-the-art performance while running in real-time.
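Using an expert tracker as a runtime corrector, as A3CTD does according to the abstract, can be pictured as a per-frame fallback rule. The confidence source and threshold below are illustrative assumptions, not the paper's mechanism:

```python
def track_with_demonstrator(policy_actions, policy_confidences, expert_actions,
                            confidence_threshold=0.5):
    """At every frame, follow the learned policy's predicted box shift unless
    its confidence drops below a threshold, in which case fall back to the
    expert tracker's output -- a simplified sketch of using an expert
    demonstrator to correct a learned policy at test time."""
    chosen = []
    for action, conf, expert in zip(policy_actions, policy_confidences,
                                    expert_actions):
        chosen.append(action if conf >= confidence_threshold else expert)
    return chosen

# Hypothetical 3-frame run: the policy is unsure on the second frame.
moves = track_with_demonstrator(
    [(1, 0), (5, 5), (0, 1)],      # policy's per-frame (dx, dy) shifts
    [0.9, 0.2, 0.8],               # policy confidence per frame
    [(1, 0), (2, 2), (0, 1)],      # expert tracker's shifts
)
```

During training, the same expert outputs can instead serve as demonstrations that bootstrap the policy, which is the role the abstract assigns to A3CT.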

    Lord of the Rings: Hanoi Pooling and Self-Knowledge Distillation for Fast and Accurate Vehicle Re-Identification

    Vehicle re-identification has seen increasing interest thanks to its fundamental impact on intelligent surveillance systems and smart transportation. The visual data acquired from monitoring camera networks comes with severe challenges, including occlusions, color and illumination changes, as well as orientation issues (a vehicle can be seen from the side/front/rear due to different camera viewpoints). To deal with such challenges, the community has spent much effort in learning robust feature representations that hinge on additional visual attributes and part-driven methods, but at the cost of extensive human annotation labor and increased computational complexity. We propose an approach that learns a feature representation robust to vehicle orientation issues without the need for extra labeled data, while adding negligible computational overhead. The former objective is achieved through the introduction of a Hanoi pooling layer exploiting ring regions and the image pyramid approach, yielding a multi-scale representation of vehicle appearance. The latter is tackled by transferring the accuracy of a deep network to its first layers, thus reducing the inference effort through the early stopping of a test example. This is obtained by means of a self-knowledge distillation framework encouraging multi-exit network decisions to agree with each other. Results demonstrate that the proposed approach significantly improves the accuracy of early (i.e., very fast) exits while maintaining the accuracy of a deep (slow) baseline. Moreover, our solution obtains the best existing performance on three benchmark datasets.
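The ring regions mentioned above have a useful property for orientation robustness: rotating a square feature map does not change which ring a cell belongs to. A minimal sketch of pooling concentric rings of a 2D map (a simplified illustration of the intuition, not the actual Hanoi pooling layer):

```python
def ring_pooling(feature_map):
    """Average-pool each concentric ring of a square 2D feature map,
    producing one value per ring. A cell's ring index is its Chebyshev
    distance from the nearest border, so ring membership is invariant to
    90-degree rotations and flips of the map."""
    n = len(feature_map)
    rings = [[] for _ in range((n + 1) // 2)]
    for i in range(n):
        for j in range(n):
            ring = min(i, j, n - 1 - i, n - 1 - j)
            rings[ring].append(feature_map[i][j])
    return [sum(r) / len(r) for r in rings]

# Toy 4x4 map: outer ring of weak responses, inner ring of strong ones.
fmap = [[1, 1, 1, 1],
        [1, 3, 3, 1],
        [1, 3, 3, 1],
        [1, 1, 1, 1]]
descriptor = ring_pooling(fmap)  # one pooled value per ring: [1.0, 3.0]
```

Concatenating such ring descriptors across pyramid scales yields a compact, multi-scale representation that changes little when the camera viewpoint rotates the vehicle's appearance.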