7 research outputs found

    Real-time siamese multiple object tracker with enhanced proposals

    Maintaining the identity of multiple objects in real-time video is a challenging task, as it is not always feasible to run a detector on every frame. Thus, motion estimation systems are often employed, which either do not scale well with the number of targets or produce features with limited semantic information. To solve the aforementioned problems and allow the tracking of dozens of arbitrary objects in real time, we propose SiamMOTION. SiamMOTION includes a novel proposal engine that produces quality features through an attention mechanism and a region-of-interest extractor fed by an inertia module and powered by a feature pyramid network. Finally, the extracted tensors enter a comparison head that efficiently matches pairs of exemplars and search areas, generating quality predictions via a pairwise depthwise region proposal network and a multi-object penalization module. SiamMOTION has been validated on five public benchmarks, achieving leading performance against current state-of-the-art trackers. Code available at: https://www.github.com/lorenzovaquero/SiamMOTION. This research was partially funded by the Spanish Ministerio de Ciencia e Innovación [grant numbers PID2020-112623GB-I00, RTI2018-097088-B-C32], and the Galician Consellería de Cultura, Educación e Universidade [grant numbers ED431C 2018/29, ED431C 2021/048, ED431G 2019/04]. These grants are co-funded by the European Regional Development Fund (ERDF). Lorenzo Vaquero is supported by the Spanish Ministerio de Universidades under the FPU national plan (FPU18/03174). We also gratefully acknowledge the support of NVIDIA Corporation for hardware donations used for this research.
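    The pairwise depthwise comparison at the heart of Siamese trackers of this kind can be pictured with a minimal NumPy sketch (the function name and tensor shapes are our assumptions, not the paper's): each channel of the exemplar features is slid over the matching channel of the search-area features, and the peak of the response map marks the most likely target location.

    ```python
    import numpy as np

    def depthwise_xcorr(search, exemplar):
        """Depthwise cross-correlation: channel c of the exemplar is
        correlated only with channel c of the search area ('valid' mode).
        search: (C, Hs, Ws), exemplar: (C, He, We) with He <= Hs, We <= Ws.
        Returns a (C, Hs - He + 1, Ws - We + 1) response map."""
        C, Hs, Ws = search.shape
        _, He, We = exemplar.shape
        out = np.zeros((C, Hs - He + 1, Ws - We + 1))
        for c in range(C):
            for i in range(Hs - He + 1):
                for j in range(Ws - We + 1):
                    out[c, i, j] = np.sum(search[c, i:i + He, j:j + We] * exemplar[c])
        return out
    ```

    The triple loop is for readability only; in practice this is a grouped convolution on the GPU.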

    Tracking more than 100 arbitrary objects at 25 FPS through deep learning

    Most video analytics applications rely on object detectors to localize objects in frames. However, when real-time operation is a requirement, running the detector on every frame is usually not possible. This is somewhat circumvented by instantiating visual object trackers between detector calls, but this does not scale with the number of objects. To tackle this problem, we present SiamMT, a new deep learning multiple visual object tracking solution that applies single-object tracking principles to multiple arbitrary objects in real time. To achieve this, SiamMT reuses feature computations, implements a novel crop-and-resize operator, and defines a new and efficient pairwise similarity operator. SiamMT naturally scales up to several dozens of targets, reaching 25 fps with 122 simultaneous objects for VGA videos, or up to 100 simultaneous objects in HD720 video. SiamMT has been validated on five large real-time benchmarks, achieving leading performance against current state-of-the-art trackers. This research was partially funded by the Spanish Ministerio de Ciencia e Innovación [grant numbers PID2020-112623GB-I00, RTI2018-097088-B-C32], and the Galician Consellería de Cultura, Educación e Universidade [grant numbers ED431C 2018/29, ED431C 2017/69, accreditation 2016–2019, ED431G/08]. These grants are co-funded by the European Regional Development Fund (ERDF). Lorenzo Vaquero is supported by the Spanish Ministerio de Universidades under the FPU national plan (FPU18/03174).
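    The efficiency idea behind reusing feature computations — running the backbone once per frame and extracting a fixed-size crop per target from the shared feature map — can be sketched as follows (nearest-neighbour sampling for brevity; the operator name, shapes, and sampling scheme are illustrative assumptions, not the published implementation):

    ```python
    import numpy as np

    def crop_and_resize(fmap, boxes, out_size=8):
        """Extract one fixed-size crop per target from a single shared
        feature map, so frame features are computed only once per frame.
        fmap: (C, H, W); boxes: list of (y0, x0, y1, x1) in feature
        coordinates. Nearest-neighbour sampling keeps the sketch short;
        a real operator would interpolate bilinearly."""
        C, H, W = fmap.shape
        crops = np.zeros((len(boxes), C, out_size, out_size))
        for n, (y0, x0, y1, x1) in enumerate(boxes):
            ys = np.clip(np.round(np.linspace(y0, y1, out_size)).astype(int), 0, H - 1)
            xs = np.clip(np.round(np.linspace(x0, x1, out_size)).astype(int), 0, W - 1)
            crops[n] = fmap[:, ys][:, :, xs]
        return crops
    ```

    Because the cost of the backbone is paid once, adding a target only adds one cheap crop, which is what lets the tracker scale to dozens of simultaneous objects.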

    Short-term anchor linking and long-term self-guided attention for video object detection

    We present a new network architecture able to take advantage of spatio-temporal information available in videos to boost object detection precision. First, box features are associated and aggregated by linking proposals that come from the same anchor box in the nearby frames. Then, we design a new attention module that aggregates short-term enhanced box features to exploit long-term spatio-temporal information. This module takes advantage of geometrical features in the long-term for the first time in the video object detection domain. Finally, a spatio-temporal double head is fed with both spatial information from the reference frame and the aggregated information that takes into account the short- and long-term temporal context. We have tested our proposal in five video object detection datasets with very different characteristics, in order to prove its robustness in a wide number of scenarios. Non-parametric statistical tests show that our approach outperforms the state-of-the-art. Our code is available at https://github.com/daniel-cores/SLTnet. This research was partially funded by the Spanish Ministry of Science, Innovation and Universities under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32, and the Galician Ministry of Education, Culture and Universities under grants ED431C 2018/29, ED431C 2017/69 and accreditation 2016-2019, ED431G/08. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program).
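    One way to picture the short-term step — linking proposals that share an anchor across nearby frames and aggregating their box features — is a similarity-weighted average; the weighting scheme below is an illustrative assumption of ours, not the paper's exact module:

    ```python
    import numpy as np

    def aggregate_linked_features(ref, nearby):
        """ref: (A, D) box features of the reference frame, where row a
        belongs to the proposal from anchor a; nearby: list of (A, D)
        features from nearby frames. Proposals sharing an anchor index
        are linked, and each anchor's features are averaged with weights
        given by cosine similarity to the reference (softmax over frames)."""
        frames = np.stack([ref] + list(nearby))           # (T, A, D)
        unit = frames / np.linalg.norm(frames, axis=2, keepdims=True)
        sims = np.einsum('tad,ad->ta', unit, unit[0])     # cosine per frame/anchor
        w = np.exp(sims) / np.exp(sims).sum(axis=0)       # softmax over T
        return np.einsum('ta,tad->ad', w, frames)         # (A, D)
    ```

    Frames that agree with the reference contribute more, so occluded or blurred views are down-weighted automatically.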

    Efficient edge filtering of directly-follows graphs for process mining

    Automated process discovery is a process mining operation that takes as input an event log of a business process and generates a diagrammatic representation of the process. In this setting, a common diagrammatic representation generated by commercial tools is the directly-follows graph (DFG). In some real-life scenarios, the DFG of an event log contains hundreds of edges, hindering its understandability. To overcome this shortcoming, process mining tools generally offer the possibility of filtering the edges in the DFG. We study the problem of efficiently filtering the DFG extracted from an event log while retaining the most frequent relations. We formalize this problem as an optimization problem, specifically, the problem of finding a sound spanning subgraph of a DFG with a minimal number of edges and a maximal sum of edge frequencies. We show that this problem is an instance of an NP-hard problem and outline several polynomial-time heuristics to compute approximate solutions. Finally, we report on an evaluation of the efficiency and optimality of the proposed heuristics using 13 real-life event logs. We thank Luciano García-Bañuelos for proposing the idea of combining the results of Chu-Liu-Edmonds’ algorithm to filter a DFG. We also thank Adriano Augusto for providing us with the implementation of the Split Miner filtering technique. This research was funded by the Spanish Ministry of Economy and Competitiveness (TIN2017-84796-C2-1-R) and the Galician Ministry of Education, Culture and Universities (ED431G/08). These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program). D. Chapela-Campa is supported by the Spanish Ministry of Education, under the FPU national plan (FPU16/04428 and EST19/00135). This research is also funded by the Estonian Research Council (grant PRG1226).
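    The flavour of such polynomial-time heuristics can be conveyed with a simple greedy baseline (our own illustrative sketch; the heuristics evaluated in the paper, e.g. those built on Chu-Liu-Edmonds' algorithm, are more elaborate and additionally guarantee soundness): keep, for every activity, its most frequent incoming and most frequent outgoing directly-follows edge, so every node stays connected while low-frequency edges are dropped.

    ```python
    def filter_dfg(edges):
        """Greedy DFG edge filter. edges: dict {(src, dst): frequency}.
        For each node, retain its highest-frequency incoming edge and its
        highest-frequency outgoing edge; all other edges are discarded.
        Runs in polynomial time but does not guarantee soundness."""
        nodes = {n for edge in edges for n in edge}
        keep = set()
        for n in nodes:
            outgoing = [e for e in edges if e[0] == n]
            incoming = [e for e in edges if e[1] == n]
            if outgoing:
                keep.add(max(outgoing, key=edges.get))
            if incoming:
                keep.add(max(incoming, key=edges.get))
        return {e: edges[e] for e in keep}
    ```

    On a DFG with hundreds of edges this retains at most two edges per activity, which is typically a drastic reduction while preserving the dominant behaviour.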

    A full data augmentation pipeline for small object detection based on generative adversarial networks

    Object detection accuracy on small objects, i.e., objects under 32 × 32 pixels, lags behind that of large ones. To address this issue, innovative architectures have been designed and new datasets have been released. Still, the number of small objects in many datasets does not suffice for training. The advent of generative adversarial networks (GANs) opens up a new data augmentation possibility for training architectures without the costly task of annotating huge datasets for small objects. In this paper, we propose a full pipeline for data augmentation for small object detection which combines a GAN-based object generator with techniques of object segmentation, image inpainting, and image blending to achieve high-quality synthetic data. The main component of our pipeline is DS-GAN, a novel GAN-based architecture that generates realistic small objects from larger ones. Experimental results show that our overall data augmentation method improves the performance of state-of-the-art models up to 11.9% AP on UAVDT and by 4.7% AP on iSAID, both for the small objects subset and for a scenario where the number of training instances is limited. This research was partially funded by the Spanish Ministerio de Ciencia e Innovación [grant numbers PID2020-112623GB-I00, RTI2018-097088-B-C32], and the Galician Consellería de Cultura, Educación e Universidade [grant numbers ED431C 2018/29, ED431C 2021/048, ED431G 2019/04]. These grants are co-funded by the European Regional Development Fund (ERDF). This paper was supported by European Union’s Horizon 2020 research and innovation programme under grant number.

    Spatiotemporal tubelet feature aggregation and object linking for small object detection in videos

    This paper addresses the problem of exploiting spatiotemporal information to improve small object detection precision in video. We propose a two-stage object detector called FANet based on short-term spatiotemporal feature aggregation and long-term object linking to refine object detections. First, we generate a set of short tubelet proposals. Then, we aggregate RoI pooled deep features throughout the tubelet using a new temporal pooling operator that summarizes the information with a fixed output size independent of the tubelet length. In addition, we define a double head implementation that we feed with spatiotemporal information for spatiotemporal classification and with spatial information for object localization and spatial classification. Finally, a long-term linking method builds long tubes with the previously calculated short tubelets to overcome detection errors. The association strategy addresses the generally low overlap between instances of small objects in consecutive frames by reducing the influence of the overlap in the final linking score. We evaluated our model on three different datasets with small objects, outperforming previous state-of-the-art spatiotemporal object detectors and our spatial baseline. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
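    Two of the ingredients above can be sketched compactly (the function names and the β weight are our assumptions, not the paper's): a temporal pooling operator whose output size does not depend on tubelet length, and a linking score that down-weights the overlap term so small objects with little frame-to-frame IoU can still be linked.

    ```python
    import numpy as np

    def temporal_max_pool(tubelet_feats):
        """Summarize per-frame RoI features (T, C, H, W) into one
        (C, H, W) map via element-wise max over time: the output size
        is fixed, independent of the tubelet length T."""
        return tubelet_feats.max(axis=0)

    def iou(a, b):
        """Intersection over union of two (x0, y0, x1, y1) boxes."""
        x0, y0 = max(a[0], b[0]), max(a[1], b[1])
        x1, y1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)

    def link_score(score_a, score_b, box_a, box_b, beta=0.3):
        """Score for linking two detections in consecutive frames;
        beta < 1 reduces the influence of the overlap term (the value
        0.3 is a guess for illustration)."""
        return score_a + score_b + beta * iou(box_a, box_b)
    ```

    With β well below 1, two confident detections of a fast-moving small object can still be linked even when their boxes barely overlap.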

    Automatic linguistic reporting of customer activity patterns in open malls

    In this work, we present a complete system to produce automatic linguistic reports about the customer activity patterns inside open malls, a mixed distribution of classical malls combined with the shops on the street. These reports can assist in designing marketing campaigns by identifying the best places to catch the attention of customers. Activity patterns are estimated with process mining techniques, with localization as the key source of information. Localization is obtained with a parallelized WiFi fingerprinting system that speeds up the computation. In agreement with the best practices for human evaluation of natural language generation systems, the linguistic quality of the generated report was evaluated by 41 experts who filled in an online questionnaire. Results are encouraging, since the average global score of the linguistic quality dimension is 6.17 (standard deviation of 0.76) on a 7-point Likert scale. This expresses a high degree of satisfaction with the generated reports and validates the adequacy of automatic natural language textual reports as a complementary tool to process model visualization. This work has been partially supported by the Spanish Ministry of Science, Innovation and Universities and the European Regional Development Fund (ERDF/FEDER) Grants RTI2018-099646-BI00, TIN2017-84796-C2-1-R, TIN2017-90773-REDT, RED2018-102641-T and RYC-2016-19802 (Ramón y Cajal program, José M. Alonso). Also by the Galician Ministry of Education, University and Professional Training and the ERDF/FEDER program (ED431F2018/02, ED431C2018/29, ED431G2019/04 grants).