
    Joint Detection and Tracking in Videos with Identification Features

    Recent works have shown that, for video data, combining object detection and tracking improves performance on both tasks, but these methods impose a high frame rate as a strict requirement. This assumption is often violated in real-world applications, where models run on embedded devices at only a few frames per second. Videos at low frame rates suffer from large object displacements. Re-identification features can help match such largely displaced detections, but current joint detection and re-identification formulations degrade detector performance, as the two tasks conflict. In real-world applications, keeping separate detector and re-id models is often not feasible, as both memory and runtime effectively double. Towards robust long-term tracking applicable to devices with reduced computational power, we propose the first joint optimization of detection, tracking and re-identification features for videos. Notably, our joint optimization maintains detector performance, a typical multi-task challenge. At inference time, we leverage detections for tracking (tracking-by-detection) when objects are visible, detectable and slowly moving in the image. We instead leverage re-identification features to match objects that disappeared for several frames (e.g. due to occlusion) or were not tracked due to fast motion (or low-frame-rate videos). Our proposed method reaches the state of the art on MOT; it ranks 1st among online trackers in the UA-DETRAC'18 tracking challenge and 3rd overall.
    Comment: Accepted at the Image and Vision Computing Journal
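    As a rough illustration of the inference-time logic described in this abstract (a sketch, not the authors' implementation), the following Python snippet associates existing tracks to new detections by spatial overlap first, then falls back to re-identification embeddings for tracks the overlap stage misses; the track/detection dictionary layout and both thresholds are illustrative assumptions.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_thr=0.5, sim_thr=0.6):
    """Two-stage association: tracking-by-detection via spatial overlap for
    visible, slowly moving objects; cosine similarity of re-id features for
    tracks the overlap stage missed (occlusion, fast motion, low frame rate)."""
    matches, unmatched = [], list(range(len(detections)))
    for t in tracks:
        best, best_score = None, iou_thr
        for d in unmatched:
            overlap = iou(t["box"], detections[d]["box"])
            if overlap > best_score:
                best, best_score = d, overlap
        if best is None:
            # Fall back to appearance matching with re-identification features.
            for d in unmatched:
                f1, f2 = t["feat"], detections[d]["feat"]
                sim = float(np.dot(f1, f2) /
                            (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-9))
                if sim > sim_thr:
                    best = d
                    break
        if best is not None:
            matches.append((t["id"], best))
            unmatched.remove(best)
    return matches, unmatched
```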

    Improved detection of small objects in road network sequences using CNN and super resolution

    The detection of small objects is one of the open problems in deep learning, caused by the scene context and the low number of pixels of the objects to be detected. Because of these problems, current pre-trained models based on convolutional neural networks usually yield poor average precision; for example, CenterNet HourGlass104 reaches a mean average precision of 25.6%, and SSD-512 only 9%. This work focuses on the detection of small objects. In particular, our proposal targets vehicle detection in images captured by video surveillance cameras, using pre-trained models without modifying their structures, so it does not require retraining the network to improve the detection rate. For better performance, we developed a technique which, starting from certain initial regions, detects a higher number of objects and improves their class inference without modifying or retraining the network. The neural network is combined with processes that increase the resolution of the images to improve object detection performance. This solution has been tested on a set of traffic images containing elements of different scales, to check its efficiency depending on the detections obtained by the model. Our proposal achieves good results in a wide range of situations, obtaining, for example, an average score of 45.1% with the EfficientDet-D4 model for the first video sequence, compared to the 24.3% accuracy initially provided by the pre-trained model.
    This work is partially supported by the Ministry of Science, Innovation and Universities of Spain [grant number RTI2018-094645-B-I00], project name Automated detection with low-cost hardware of unusual activities in video sequences. It is also partially supported by the Autonomous Government of Andalusia (Spain) under project UMA18-FEDERJA-084, project name Detection of anomalous behaviour agents by deep learning in low-cost video surveillance intelligent systems. Both include funds from the European Regional Development Fund (ERDF). It is also partially supported by the University of Málaga (Spain) under grants B1-2019_01, project name Anomaly detection on roads by moving cameras, and B1-2019_02, project name Self-Organizing Neural Systems for Non-Stationary Environments. The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the SCBI (Supercomputing and Bioinformatics) center of the University of Málaga. The authors acknowledge the funding from the Universidad de Málaga. I.G.-A. is funded by a scholarship from the Autonomous Government of Andalusia (Spain) under the Young Employment operative program [grant number SNGJ5Y6-15]. They also gratefully acknowledge the support of NVIDIA Corporation with the donation of two Titan X GPUs. Funding for open access charge: Universidad de Málaga / CBUA.
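    A minimal Python sketch of the general resolution-increasing idea in this abstract: run an unmodified pretrained detector on upscaled crops and map the boxes back to full-image coordinates. Bicubic interpolation stands in for the super-resolution step, and the `detector` callable and region format are hypothetical placeholders.

```python
import cv2

def detect_with_upscaling(image, detector, regions, scale=2, score_thr=0.5):
    """Detect small objects by upscaling candidate regions before detection.
    `detector` is any callable returning [(x1, y1, x2, y2, score, label), ...]
    in the coordinates of the image it receives."""
    results = []
    for (rx1, ry1, rx2, ry2) in regions:
        crop = image[ry1:ry2, rx1:rx2]
        # Stand-in for a super-resolution model: bicubic upscaling.
        up = cv2.resize(crop, None, fx=scale, fy=scale,
                        interpolation=cv2.INTER_CUBIC)
        for (x1, y1, x2, y2, score, label) in detector(up):
            if score < score_thr:
                continue
            # Undo the scaling and region offset to get full-image coordinates.
            results.append((rx1 + x1 / scale, ry1 + y1 / scale,
                            rx1 + x2 / scale, ry1 + y2 / scale, score, label))
    return results
```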

    DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection

    Full text link
    Although the YOLOv2 approach is extremely fast at object detection, its backbone network has limited feature-extraction ability and fails to make full use of multi-scale local region features, which restricts further improvement of detection accuracy. This paper therefore proposes DC-SPP-YOLO (Dense Connection and Spatial Pyramid Pooling based YOLO) to improve the object detection accuracy of YOLOv2. Specifically, dense connections between convolution layers are employed in the backbone network of YOLOv2 to strengthen feature extraction and alleviate the vanishing-gradient problem. Moreover, an improved spatial pyramid pooling is introduced to pool and concatenate multi-scale local region features, so that the network can learn object features more comprehensively. The DC-SPP-YOLO model is built and trained with a new loss function composed of mean squared error and cross-entropy terms. Experiments demonstrate that the mAP (mean Average Precision) of the proposed DC-SPP-YOLO is higher than that of YOLOv2 on the PASCAL VOC and UA-DETRAC datasets; by strengthening feature extraction and exploiting multi-scale local region features, DC-SPP-YOLO achieves better object detection accuracy than YOLOv2.
    Comment: 23 pages, 9 figures, 9 tables
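    The spatial pyramid pooling component can be illustrated with a short PyTorch sketch: parallel stride-1 max-pooling branches whose outputs are concatenated with the input along the channel axis. The kernel sizes (5, 9, 13) follow the SPP configuration common in YOLO variants and are an assumption, not necessarily the paper's exact settings.

```python
import torch
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
    """Pool the same feature map at several kernel sizes (stride 1, 'same'
    padding) and concatenate the results with the input, so downstream
    layers see multi-scale local region features at every position."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x):
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

# A 512-channel map becomes (1 + 3) * 512 = 2048 channels; spatial size is kept.
feats = torch.randn(1, 512, 13, 13)
print(SpatialPyramidPooling()(feats).shape)  # torch.Size([1, 2048, 13, 13])
```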