5 research outputs found

    Exploiting semantic segmentation to boost reinforcement learning in video game environments

    In this work we explore enhancing the performance of reinforcement learning algorithms in video game environments by feeding them better, more relevant data. For this purpose, we use semantic segmentation to transform the images used as input for the reinforcement learning algorithm from their original domain into a simplified semantic domain, with silhouettes and class labels instead of textures and colors, and then train the reinforcement learning algorithm with these simplified images. We have conducted different experiments to study multiple aspects: the feasibility of our proposal, and the potential benefits for model generalization and transfer learning. Experiments have been performed with the Super Mario Bros video game as the testing environment. Our results show multiple advantages for this method. First, using semantic segmentation reaches higher performance than the baseline reinforcement learning algorithm, without modifying the actual algorithm and in fewer episodes; second, it shows noticeable performance improvements when training on multiple levels at the same time; and finally, it allows applying transfer learning to models trained on visually different environments. We conclude that using semantic segmentation can certainly help reinforcement learning algorithms that work with visual data by refining it. Our results also suggest that other computer vision techniques may be beneficial for data preprocessing. Models and code will be available on GitHub upon acceptance.

    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work is part of the preliminary tasks related to the Harvesting Visual Data (HVD) project (PID2021-125051OB-I00) funded by the Ministerio de Ciencia e Innovación of the Spanish Government.
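
    A minimal sketch of the preprocessing idea described above: an observation wrapper that replaces each RGB frame with a per-pixel class-label map before the agent sees it. The `segment_fn` callable is a placeholder for whichever segmentation network is used; the abstract does not specify an API, so names and shapes here are assumptions.

        import gymnasium as gym
        import numpy as np

        class SemanticObservationWrapper(gym.ObservationWrapper):
            """Feed the agent silhouette/label maps instead of textured RGB frames."""

            def __init__(self, env, segment_fn, num_classes):
                super().__init__(env)
                self.segment_fn = segment_fn  # hypothetical: RGB (H, W, 3) -> labels (H, W)
                h, w, _ = env.observation_space.shape
                self.observation_space = gym.spaces.Box(
                    low=0, high=num_classes - 1, shape=(h, w), dtype=np.uint8
                )

            def observation(self, obs):
                # The RL algorithm itself is untouched; only its input domain changes.
                return self.segment_fn(obs)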

    Semantic-aware scene recognition

    Scene recognition is currently one of the most challenging research fields in computer vision. This may be due to the ambiguity between classes: images of several scene classes may share similar objects, which causes confusion among them. The problem is aggravated when images of a particular scene class are notably different from each other. Convolutional Neural Networks (CNNs) have significantly boosted performance in scene recognition, although it still lags far behind other recognition tasks (e.g., object or image recognition). In this paper, we describe a novel approach to scene recognition based on an end-to-end multi-modal CNN that combines image and context information by means of an attention module. Context information, in the shape of a semantic segmentation, is used to gate features extracted from the RGB image by leveraging the information encoded in the semantic representation: the set of scene objects and stuff, and their relative locations. This gating process reinforces the learning of indicative scene content and enhances scene disambiguation by refocusing the receptive fields of the CNN towards that content. Experimental results on three publicly available datasets show that the proposed approach outperforms every other state-of-the-art method while significantly reducing the number of network parameters. All the code and data used in this paper are available at: https://github.com/vpulab/Semantic-Aware-Scene-Recognition

    This study has been partially supported by the Spanish Government through its TEC2017-88169-R MobiNetVideo project.
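
    One way to read the gating mechanism sketched above is as per-pixel, per-channel attention computed from the semantic branch and applied multiplicatively to the RGB branch. The sketch below is illustrative only, assuming both branches produce feature maps of the same spatial size; layer sizes are not taken from the paper.

        import torch
        import torch.nn as nn

        class SemanticGate(nn.Module):
            """Gate RGB features with an attention map derived from semantic features."""

            def __init__(self, rgb_channels, sem_channels):
                super().__init__()
                self.attn = nn.Sequential(
                    nn.Conv2d(sem_channels, rgb_channels, kernel_size=1),
                    nn.Sigmoid(),  # gate values in [0, 1]
                )

            def forward(self, rgb_feat, sem_feat):
                # Emphasize RGB responses where the segmentation marks indicative
                # scene content; attenuate the rest.
                return rgb_feat * self.attn(sem_feat)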

    Enhancing vehicle re-identification via synthetic training datasets and re-ranking based on video-clips information

    Vehicle re-identification (ReID) aims to find a specific vehicle identity across multiple non-overlapping cameras. The main challenge of this task is the large intra-class and small inter-class variability of vehicle appearance, sometimes related to large viewpoint variations, illumination changes or different camera resolutions. To tackle these problems, we previously proposed a vehicle ReID system based on ensembling deep learning features and adding different post-processing techniques. In this paper, we improve that proposal by: incorporating large-scale synthetic datasets in the training step; performing an exhaustive ablation study that shows and analyzes the influence of synthetic content in ReID datasets, in particular CityFlow-ReID and VeRi-776; and extending the post-processing to include different approaches to the use of gallery video-clips of the target vehicles in the re-ranking step. Additionally, we present an evaluation framework for CityFlow-ReID: as this dataset has no public ground-truth annotations, the AI City Challenge provided an online evaluation service which is no longer available; our evaluation framework allows researchers to keep evaluating the performance of their systems on the CityFlow-ReID dataset.

    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
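
    As an illustration of exploiting gallery video-clips during re-ranking, one simple strategy (the paper compares several) is to collapse per-frame gallery features into a single descriptor per clip and rank clips by cosine distance to the query. Variable names below are assumptions, not the paper's notation.

        import numpy as np

        def rank_gallery_clips(query_feat, gallery_feats, clip_ids):
            """Return (clip_id, distance) pairs sorted by cosine distance to the query."""
            clip_ids = np.asarray(clip_ids)
            q = query_feat / np.linalg.norm(query_feat)
            ranking = []
            for cid in np.unique(clip_ids):
                # Average the frame features belonging to the same video clip
                clip_feat = gallery_feats[clip_ids == cid].mean(axis=0)
                clip_feat /= np.linalg.norm(clip_feat)
                ranking.append((cid, 1.0 - float(q @ clip_feat)))
            return sorted(ranking, key=lambda kv: kv[1])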

    Towards automatic waste containers management in cities via computer vision: containers localization and geo-positioning in city maps

    This paper describes the scientific achievements of a collaboration between a research group and the waste management division of a company. While these results might be the basis for several practical or commercial developments, here we focus on a novel scientific contribution: a methodology to automatically generate geo-located waste container maps. It is based on the use of computer vision algorithms to detect waste containers and identify their geographic location and dimensions. The algorithms analyze a video sequence and automatically discriminate between images with and without containers. More precisely, two state-of-the-art object detectors based on deep learning techniques have been selected for testing, according to their performance and their adaptability to an on-board real-time environment: EfficientDet and YOLOv5. Experimental results indicate that the proposed visual model for waste container detection operates effectively and with consistent performance regardless of the container type (organic waste, plastic, glass and paper recycling, …) and the city layout, as assessed by evaluating it on eleven different Spanish cities that vary in terms of size, climate, urban layout and containers' appearance.

    This work has been supported by URBASER S.A. and the Universidad Autónoma de Madrid under project REVGA of the "Segunda Edición del Programa de Fomento de la Transferencia del Conocimiento" call.
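
    The abstract does not detail how a detection becomes a map coordinate; one common minimal approach, sketched below under the assumption that per-frame GPS and camera-heading metadata are available, is to offset the vehicle's GPS fix by the estimated camera-to-container distance along the camera heading.

        import math

        def geolocate_detection(lat, lon, heading_deg, distance_m):
            """Project a detected container onto the map from the vehicle's GPS fix.

            Equirectangular approximation; adequate at street scale. Illustrative only.
            """
            R = 6371000.0  # mean Earth radius in metres
            brg = math.radians(heading_deg)
            dlat = distance_m * math.cos(brg) / R
            dlon = distance_m * math.sin(brg) / (R * math.cos(math.radians(lat)))
            return lat + math.degrees(dlat), lon + math.degrees(dlon)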

    Semantic-driven multi-camera pedestrian detection

    In the current worldwide situation, pedestrian detection has reemerged as a pivotal tool for intelligent video-based systems aiming to solve tasks such as pedestrian tracking, social distancing monitoring or pedestrian mass counting. Pedestrian detection methods, even the top-performing ones, are highly sensitive to occlusions among pedestrians, which dramatically degrades their performance in crowded scenarios. The generalization of multi-camera setups makes it possible to better confront occlusions by combining information from different viewpoints. In this paper, we present a multi-camera approach that globally combines pedestrian detections by leveraging automatically extracted scene context. Unlike the majority of state-of-the-art methods, the proposed approach is scene-agnostic, requiring no tailored adaptation to the target scenario (e.g., via fine-tuning). Since no ad hoc training with labeled data is needed, deployment of the proposed method in real-world situations is expedited. Context information, obtained via semantic segmentation, is used (1) to automatically generate a common area of interest for the scene and all the cameras, avoiding the usual need to define it manually, and (2) to obtain detections for each camera by solving a global optimization problem that maximizes the coherence of detections both in each 2D image and in the 3D scene. This process yields tightly fitted bounding boxes that mitigate occlusions and missed detections. Experimental results on five publicly available datasets show that the proposed approach outperforms state-of-the-art multi-camera pedestrian detectors, even some specifically trained on the target scenario, demonstrating the versatility and robustness of the proposed method without requiring ad hoc annotations or human-guided configuration.

    This study has been partially supported by the Spanish Government through its TEC2017-88169-R MobiNetVideo project.
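
    A sketch of how the automatically generated common area of interest could be computed: warp each camera's "walkable ground" mask (taken from its semantic segmentation) onto a shared ground-plane grid and intersect them. The homographies and grid layout are assumptions; the paper's exact formulation may differ.

        import cv2
        import numpy as np

        def common_area_of_interest(ground_masks, homographies, plane_shape):
            """Intersect per-camera ground masks after warping them to one ground-plane grid."""
            aoi = np.ones(plane_shape, dtype=np.uint8)
            for mask, H in zip(ground_masks, homographies):
                # H maps this camera's image pixels to ground-plane cells
                warped = cv2.warpPerspective(
                    mask, H, (plane_shape[1], plane_shape[0]), flags=cv2.INTER_NEAREST
                )
                aoi &= (warped > 0).astype(np.uint8)
            return aoi  # 1 where every camera sees walkable ground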