19 research outputs found

    Supervised learning and inference of semantic information from road scene images

    Get PDF
    Premio Extraordinario de Doctorado de la UAH in the academic year 2013-2014.
    Nowadays, vision sensors are employed in the automotive industry to integrate advanced functionalities that assist humans while driving. However, autonomous vehicles are a hot field of research in both academic and industrial sectors and entail a step beyond ADAS. In particular, several challenges arise from autonomous navigation in urban scenarios due to their naturalistic complexity in terms of structure and dynamic participants (e.g. pedestrians, vehicles, vegetation, etc.). Hence, providing image understanding capabilities to autonomous robotic platforms is an essential target, because cameras can capture the 3D scene as perceived by a human. In fact, given this need for 3D scene understanding, there is an increasing interest in joint object and scene labeling, in the form of geometry and semantic inference of the relevant entities contained in urban environments. In this regard, this Thesis tackles two challenges: 1) the prediction of road intersection geometry and 2) the detection and orientation estimation of cars, pedestrians and cyclists. Different features extracted from stereo images of the KITTI public urban dataset are employed. This Thesis proposes supervised learning of discriminative models that rely on strong machine learning techniques for mining visual features. For the first task, we use 2D occupancy grid maps that are built from the stereo sequences captured by a moving vehicle in a mid-sized city. Based on these bird's eye view images, we propose a smart parameterization of the layout of straight roads and 4 intersecting roads. The dependencies between the proposed discrete random variables that define the layouts are represented with Probabilistic Graphical Models.
    Then, the problem is formulated as structured prediction, in which we employ Conditional Random Fields (CRF) for learning and distributed convex Belief Propagation (dcBP) and Branch and Bound (BB) for inference. For validation of the proposed methodology, a set of tests is carried out, based on real images and on synthetic images with varying levels of random noise. In relation to the object detection and orientation estimation challenge in road scenes, the goal of this Thesis is to compete in the international KITTI evaluation benchmark, which encourages researchers to push forward the state of the art in visual recognition methods, particularized for 3D urban scene understanding. This Thesis proposes to modify the successful part-based object detector known as DPM in order to learn richer models from 2.5D data (color and disparity). Therefore, we revisit the DPM framework, which is based on HOG features and mixture models trained with a latent SVM formulation. Next, this Thesis performs a set of modifications on top of DPM: I) an extension to the DPM training pipeline that accounts for 3D-aware features; II) a detailed analysis of the supervised parameter learning; III) two additional approaches: "feature whitening" and "stereo consistency check". Additionally, a) we analyze the KITTI dataset and several subtleties regarding the evaluation protocol; b) a large set of cross-validated experiments shows the performance of our contributions; and c) our best performing approach is publicly ranked on the KITTI website, being the first to report results with stereo data, yielding an increased object detection precision (3%-6%) for the class 'car' and ranking first for the class 'cyclist'.
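The 2D occupancy grids described above can be sketched as a simple accumulation: 3D points recovered from stereo disparity are projected onto the ground plane and binned into a bird's eye view grid. This is a minimal illustration, not the thesis' actual pipeline; the cell size and ranges below are assumed values.

```python
import numpy as np

def occupancy_grid(points_xyz, cell_size=0.5, x_range=(-20.0, 20.0), z_range=(0.0, 40.0)):
    """Accumulate 3D points (x lateral, y height, z forward, in metres)
    into a 2D bird's eye view occupancy grid; 1 marks an occupied cell.
    Ranges and resolution are illustrative assumptions."""
    nx = int((x_range[1] - x_range[0]) / cell_size)
    nz = int((z_range[1] - z_range[0]) / cell_size)
    grid = np.zeros((nz, nx), dtype=np.uint8)
    ix = ((points_xyz[:, 0] - x_range[0]) / cell_size).astype(int)
    iz = ((points_xyz[:, 2] - z_range[0]) / cell_size).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iz >= 0) & (iz < nz)
    grid[iz[valid], ix[valid]] = 1
    return grid

# Two points in range, one far outside the grid.
pts = np.array([[0.0, 0.0, 10.0], [5.2, 0.0, 10.1], [100.0, 0.0, 5.0]])
g = occupancy_grid(pts)
```

In a real system the grid would typically store hit counts or log-odds rather than a binary flag, so that spurious stereo matches can be filtered out.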

    Key Information Extraction in Purchase Documents using Deep Learning and Rule-based Corrections

    Full text link
    Deep Learning (DL) has recently come to dominate the fields of Natural Language Processing (NLP) and Computer Vision (CV). However, DL commonly relies on the availability of large annotated datasets, so alternative or complementary pattern-based techniques can help to improve results. In this paper, we build upon Key Information Extraction (KIE) in purchase documents using both DL and rule-based corrections. Our system initially relies on Optical Character Recognition (OCR) and text understanding based on entity tagging to identify purchase facts of interest (e.g., product codes, descriptions, quantities, or prices). These facts are then linked to the same product group, which is recognized by means of line detection and some grouping heuristics. Once these DL approaches have run, we contribute several mechanisms consisting of rule-based corrections that improve the baseline DL predictions. The presented experiments, on purchase documents from public and NielsenIQ datasets, demonstrate the enhancements these rule-based corrections provide over the baseline DL results.
    Comment: Conference on Computational Linguistics (COLING 2022), PAN-DL Workshop.
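The flavour of rule-based corrections layered on top of DL predictions can be sketched with an arithmetic consistency rule on a purchase line: if quantity times unit price disagrees with the OCR'd line total, the total is likely a misread digit and can be repaired. The function and rule below are hypothetical illustrations, not the paper's actual rules.

```python
def correct_line_total(quantity, unit_price, line_total, tol=0.01):
    """Hypothetical rule-based correction: trust quantity and unit price
    when their product disagrees with the OCR'd line total (e.g. a digit
    misread by OCR). Returns (value, was_corrected)."""
    expected = round(quantity * unit_price, 2)
    if abs(expected - line_total) > tol:
        return expected, True   # override the inconsistent OCR value
    return line_total, False    # consistent: keep the OCR value

# OCR misread "5.97" as "5.37"; the rule restores consistency.
total, fixed = correct_line_total(3, 1.99, 5.37)
```

Rules like this are cheap to evaluate and complement the tagger exactly where annotated training data is scarce.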

    3D perception and analysis of road scenes for autonomous vehicles

    Get PDF
    This work addresses two difficulties present in current Simultaneous Localization And Mapping (SLAM) systems for autonomous driving: the 3D reconstruction of the environment traversed by the vehicle (mapping) and the estimation of the vehicle's motion and position (odometry). This Master's thesis proposes the use of LiDAR technology for 3D perception of the vehicle's surroundings and the subsequent extraction of semantic information of interest. The data obtained from the LiDAR are formatted as sparse 3D point clouds, and a variant of the Iterative Closest Point (ICP) algorithm is proposed to align those clouds. First, a market survey of current LiDAR devices applied to driver assistance systems and autonomous vehicles is presented. Data from several Velodyne LiDAR sensors have been used in this work: VLP-16, HDL-32E and HDL-64E. Second, different variants of the ICP algorithm have been studied and evaluated with the goal of performing 3D reconstruction and odometry. A series of methods for optimizing the algorithm are also proposed, improving accuracy and reducing computation time. Data from two sensors have been used to evaluate the proposed method and algorithms: a Velodyne HDL-32E LiDAR installed on a car of Fundación Vicomtech, and a Velodyne HDL-64E LiDAR installed on a car of the Karlsruhe Institute of Technology. This work achieves an accurate estimation of the vehicle's odometry, reconstructs the vehicle's route in 3D and creates panoramic images, using only data from LiDAR sensors. The work has been developed using the OpenCV and PCL (Point Cloud Library) libraries.
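The core of the alignment step is the classic point-to-point ICP loop: alternate nearest-neighbour association with a closed-form rigid alignment (Kabsch/SVD). The sketch below uses brute-force neighbour search on a tiny synthetic cloud; the thesis' optimized variants are not reproduced here.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch/SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=20):
    """Basic point-to-point ICP: alternate nearest-neighbour matching
    (brute force here; real systems use k-d trees) and rigid alignment."""
    cur = src.copy()
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
    return cur

# Toy check: a slightly rotated/translated copy realigns onto the original cloud.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(20, 3))
a = np.radians(3.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
aligned = icp(cloud @ Rz.T + np.array([0.03, -0.02, 0.01]), cloud)
```

The point-to-plane variant and the downsampling/outlier-rejection tricks that make ICP practical on Velodyne data are refinements on top of this same loop.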

    Reducing the environmental impact of surgery on a global scale: systematic review and co-prioritization with healthcare workers in 132 countries

    Get PDF
    Background: Healthcare cannot achieve net-zero carbon without addressing operating theatres. The aim of this study was to prioritize feasible interventions to reduce the environmental impact of operating theatres.
    Methods: This study adopted a four-phase Delphi consensus co-prioritization methodology. In phase 1, a systematic review of published interventions and a global consultation of perioperative healthcare professionals were used to longlist interventions. In phase 2, iterative thematic analysis consolidated comparable interventions into a shortlist. In phase 3, the shortlist was co-prioritized based on patient and clinician views on acceptability, feasibility, and safety. In phase 4, ranked lists of interventions were presented by their relevance to high-income countries and low–middle-income countries.
    Results: In phase 1, 43 interventions were identified, which had low uptake in practice according to 3042 professionals globally. In phase 2, a shortlist of 15 intervention domains was generated. In phase 3, interventions were deemed acceptable for more than 90 per cent of patients except for reducing general anaesthesia (84 per cent) and re-sterilization of ‘single-use’ consumables (86 per cent). In phase 4, the top three shortlisted interventions for high-income countries were: introducing recycling; reducing use of anaesthetic gases; and appropriate clinical waste processing. The top three shortlisted interventions for low–middle-income countries were: introducing reusable surgical devices; reducing use of consumables; and reducing the use of general anaesthesia.
    Conclusion: This is a step toward environmentally sustainable operating environments, with actionable interventions applicable to both high- and low–middle-income countries.

    Text detection and recognition on traffic panels from street-level imagery using visual appearance

    Get PDF
    Traffic sign detection and recognition has been thoroughly studied for a long time. However, traffic panel detection and recognition still remains a challenge in computer vision, due to the different panel types and the huge variability of the information depicted on them. This paper presents a method to detect traffic panels in street-level images and to recognize the information contained on them, as an application for intelligent transportation systems (ITS). The main purpose is to make an automatic inventory of the traffic panels located along a road, to support road maintenance and to assist drivers. Our proposal extracts local descriptors at some interest keypoints after applying blue and white color segmentation. Then, images are represented as a “bag of visual words” and classified using Naïve Bayes or support vector machines. This visual appearance categorization method is a new approach for traffic panel detection in the state of the art. Finally, our own text detection and recognition method is applied to those images where a traffic panel has been detected, in order to automatically read and save the information depicted in the panels. We propose a language model partly based on a dynamic dictionary for a limited geographical area, using a reverse geocoding service. Experimental results on real images from Google Street View prove the efficiency of the proposed method and open the way to using street-level images for different ITS applications.
    Funding: Ministerio de Economía y Competitividad; Comunidad de Madrid.
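The “bag of visual words” representation mentioned above reduces to quantizing each local descriptor against a codebook of visual words and histogramming the assignments. The sketch below uses tiny 2-D toy descriptors; in the paper the descriptors would be real local features and the codebook would be learned (e.g. by clustering).

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Quantize each local descriptor to its nearest codebook word
    (Euclidean) and return the normalized bag-of-visual-words histogram,
    which is what the Naïve Bayes or SVM classifier consumes."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])   # 3 toy "visual words"
desc = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.0, 0.8]])
h = bovw_histogram(desc, codebook)
```

The histogram discards descriptor positions entirely, which is what makes the representation robust to the layout variability of traffic panels.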

    Expert video-surveillance system for real-time detection of suspicious behaviors in shopping malls

    Get PDF
    Highlights: tracking-by-detection based on segmentation, Kalman predictions and LSAP association; occlusion management with an SVM kernel metric for GCH+LBP+HOG image features; overall performance near 85% while tracking under occlusions on the CAVIAR dataset; human behavior analysis (exits, loitering, etc.) in naturalistic shop scenes; real-time multi-camera performance with a processing capacity near 50 fps/camera.
    Expert video-surveillance systems are a powerful tool applied in varied scenarios with the aim of automating the detection of different risk situations and helping human security officers take appropriate decisions in order to enhance the protection of assets. In this paper, we propose a complete expert system focused on the real-time detection of potentially suspicious behaviors in shopping malls. Our video-surveillance methodology contributes several innovative proposals that compose a robust application, able to efficiently track the trajectories of people and to discover questionable actions in a shop context. As a first step, our system applies image segmentation to locate the foreground objects in the scene. The most effective background subtraction algorithms in the state of the art are compared to find the most suitable one for our expert video-surveillance application. After the segmentation stage, the detected blobs may represent full or partial human bodies, so we have implemented a novel blob fusion technique to group the partial blobs into the final human targets. Then, we contribute an innovative tracking algorithm that is based not only on people's trajectories, as most state-of-the-art methods are, but also on people's appearance in occlusion situations.
    This tracking is carried out with a new two-step method: (1) the detections-to-tracks association is solved using Kalman filtering combined with an own-designed cost optimization for the Linear Sum Assignment Problem (LSAP); and (2) the occlusion management is based on SVM kernels to compute distances between appearance features such as GCH, LBP and HOG. Applying these three features for recognizing human appearance performs well compared to other description techniques, because color, texture and gradient information are effectively combined into a robust visual description of people. Finally, the trajectories obtained in the tracking stage are processed by our expert video-surveillance system to analyze human behaviors and identify potential shopping mall alarm situations, such as shop entry or exit of people, suspicious behaviors such as loitering, and unattended cash desk situations. To evaluate the performance of some of the main contributions of our proposal, we use the publicly available CAVIAR dataset for testing the proposed tracking method, with a success rate near 85% in occlusion situations. According to this performance, the presented results corroborate that the precision and efficiency of our tracking method is comparable and slightly superior to the most recent state-of-the-art works. Furthermore, the alarms raised by our application are evaluated on a naturalistic private dataset, where it is evidenced that our expert video-surveillance system can effectively detect suspicious behaviors with a low computational cost in a shopping mall context.
    Funding: Ministerio de Economía y Competitividad; Comunidad de Madrid.
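The LSAP at the heart of the detections-to-tracks association can be illustrated on a toy cost matrix. The exhaustive search below is only for clarity; practical trackers (including, presumably, the paper's own-designed optimization) use polynomial-time solvers such as the Hungarian algorithm.

```python
import itertools
import numpy as np

def lsap_bruteforce(cost):
    """Solve the Linear Sum Assignment Problem for a small square cost
    matrix by exhaustive search over permutations. Illustrative only:
    O(n!) instead of the Hungarian algorithm's O(n^3)."""
    n = cost.shape[0]
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(n)):
        c = sum(cost[i, perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost

# Toy distances between Kalman-predicted tracks (rows) and detections (cols).
cost = np.array([[1.0, 4.0, 5.0],
                 [2.0, 0.5, 6.0],
                 [5.0, 3.0, 0.2]])
assignment, total = lsap_bruteforce(cost)
```

In a tracker, each cost entry would mix a motion term (distance to the Kalman prediction) with an appearance term (the SVM-kernel distance over GCH/LBP/HOG features) so that occluded targets are still matched correctly.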

    Bidirectional loop closure detection on panoramas for visual navigation

    No full text
    Visual loop closure detection plays a key role in navigation systems for intelligent vehicles. Nowadays, state-of-the-art algorithms focus on unidirectional loop closures, but there are situations where these are not sufficient for identifying previously visited places. Detecting bidirectional loop closures, when a place is revisited in a different direction, therefore provides more robust visual navigation. We propose a novel approach for identifying bidirectional loop closures on panoramic image sequences. Our proposal combines global binary descriptors and a matching strategy based on cross-correlation of sub-panoramas, which are defined as the different parts of a panorama. A set of experiments considering several binary descriptors (ORB, BRISK, FREAK, LDB) is provided, where LDB excels as the most suitable. The proposed matching offers reliable bidirectional loop closure detection, which is not efficiently solved by any previous research. Our method is successfully validated and compared.
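The sub-panorama matching idea can be sketched as follows: represent each panorama as a ring of per-sub-panorama binary descriptors and compare two rings under every circular shift, both in forward order and in reversed order, since a place revisited in the opposite direction appears as a reversed ring. This is an assumed simplification with Hamming distances standing in for the paper's cross-correlation; the ring size and descriptor length are arbitrary.

```python
import numpy as np

def hamming(a, b):
    """Bitwise Hamming distance between two uint8 binary descriptors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def match_panoramas(ring_a, ring_b):
    """Compare two rings of per-sub-panorama binary descriptors under all
    circular shifts, forward and reversed. Returns the minimal mean
    Hamming distance and whether the best match was direction-reversed."""
    n = len(ring_a)
    best, reversed_best = np.inf, False
    for order, rev in ((ring_b, False), (ring_b[::-1], True)):
        for s in range(n):
            d = np.mean([hamming(ring_a[i], order[(i + s) % n]) for i in range(n)])
            if d < best:
                best, reversed_best = d, rev
    return best, reversed_best

# Toy rings: 4 sub-panoramas, each an 8-byte binary descriptor.
rng = np.random.default_rng(1)
ring = [rng.integers(0, 256, 8, dtype=np.uint8) for _ in range(4)]
same_dir = [ring[(i + 1) % 4] for i in range(4)]   # same place, rotated start
opposite = ring[::-1]                              # same place, opposite direction
d1, rev1 = match_panoramas(ring, same_dir)
d2, rev2 = match_panoramas(ring, opposite)
```

A unidirectional matcher would only explore the forward branch and therefore miss the second case entirely, which is exactly the gap the bidirectional approach closes.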

    Fusion of optimized indicators from Advanced Driver Assistance Systems (ADAS) for driver drowsiness detection

    Get PDF
    This paper presents a non-intrusive approach for monitoring driver drowsiness using the fusion of several optimized indicators based on driver physical and driving performance measures, obtained from Advanced Driver Assistance Systems (ADAS) in simulated conditions. The paper focuses on real-time drowsiness detection technology rather than on long-term sleep/wake regulation prediction technology. We have developed our own vision system in order to obtain robust and optimized driver indicators usable in simulators and in future real environments. These indicators are principally based on driver physical and driving performance skills. The fusion of several indicators proposed in the literature is evaluated using a neural network and a stochastic optimization method to obtain the best combination. We propose a new method for ground-truth generation based on a supervised Karolinska Sleepiness Scale (KSS). An extensive evaluation of the indicators, derived from trials on a third-generation simulator with several test subjects during different driving sessions, was performed. The main conclusions about the performance of single indicators and of their best combinations are included, as well as the future work derived from this study.
    Funding: Ministerio de Economía y Competitividad; Ministerio de Ciencia e Innovación.
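The fusion step can be pictured as a weighted combination of normalized indicators squashed into a single drowsiness score. In the paper the weighting is learned (a neural network tuned by stochastic optimization); the indicator names and fixed weights below are purely hypothetical stand-ins.

```python
import math

def fuse_indicators(indicators, weights, bias=-2.0):
    """Logistic fusion of drowsiness indicators normalized to [0, 1].
    Weights and bias are illustrative assumptions, not learned values;
    the paper learns the combination with a neural network."""
    z = bias + sum(w * x for w, x in zip(weights, indicators))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical indicators: eyelid-closure ratio, blink duration, lane deviation.
weights = [2.5, 1.5, 1.0]
alert = fuse_indicators([0.1, 0.2, 0.1], weights)
drowsy = fuse_indicators([0.8, 0.9, 0.7], weights)
```

The appeal of fusing several indicators is visible even in this toy: no single input is decisive, but their weighted combination separates the alert and drowsy cases cleanly.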