16 research outputs found

    Towards Synthetic Dataset Generation for Semantic Segmentation Networks

    Recent work in semantic segmentation research for autonomous vehicles has shifted towards multimodal techniques. The driving factor behind this is a lack of reliable and ample ground-truth annotation data for real-world adverse weather and lighting conditions. Human labeling of such adverse conditions is often erroneous and very expensive. However, it is a worthwhile endeavour to identify ways to make unimodal semantic segmentation networks more robust: doing so reduces cost by lessening reliance on sensor fusion, and a more robust unimodal network can in turn be incorporated into multimodal techniques for increased overall system performance. The objective of this thesis is to converge upon a synthetic dataset generation method and testing framework that are conducive to rapid validation of unimodal semantic segmentation network architectures. We explore multiple avenues of synthetic dataset generation, and the insights gained through these explorations guide the design of the ProcSy method. ProcSy consists of a procedurally created virtual replica of a real-world operational design domain around the city of Waterloo, Ontario. Ground-truth annotations, depth, and occlusion data can be produced in real time. The ProcSy method generates repeatable scenes with quantifiable variations of adverse weather and lighting conditions. We demonstrate experiments using the ProcSy method on DeepLab v3+, a state-of-the-art network for unimodal semantic segmentation tasks, and gain insights into the behaviour of DeepLab on unseen adverse weather conditions. Based on empirical testing, we identify optimization techniques for data collection towards robustly training the network
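
    To make the experimental setup concrete, the snippet below is a minimal sketch, not the thesis code, of how a segmentation model could be swept over synthetic frames rendered at several rain intensities and scored by mean IoU against the generated ground truth. The directory layout (data/rain_<pct>/{rgb,labels}), the Cityscapes-style 19-class label space, the checkpoint path, and the use of torchvision's DeepLabV3-ResNet101 as a stand-in for DeepLab v3+ are all assumptions.

```python
# Sketch: evaluate a segmentation model across quantified rain intensities.
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet101

NUM_CLASSES = 19  # Cityscapes-style label space (assumption)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = deeplabv3_resnet101(num_classes=NUM_CLASSES).to(device).eval()
# A trained checkpoint would be loaded here (hypothetical path), e.g.:
# model.load_state_dict(torch.load("deeplab_procsy.pt", map_location=device))

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def miou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean IoU over the classes present in the ground-truth frame."""
    ious = []
    for c in range(NUM_CLASSES):
        gt_c, pred_c = gt == c, pred == c
        if gt_c.sum() == 0:
            continue  # class absent from this frame
        union = np.logical_or(gt_c, pred_c).sum()
        ious.append(np.logical_and(gt_c, pred_c).sum() / max(union, 1))
    return float(np.mean(ious)) if ious else 0.0

@torch.no_grad()
def evaluate(split_dir: Path) -> float:
    scores = []
    for img_path in sorted((split_dir / "rgb").glob("*.png")):
        gt = np.array(Image.open(split_dir / "labels" / img_path.name))
        x = preprocess(Image.open(img_path).convert("RGB")).unsqueeze(0).to(device)
        pred = model(x)["out"].argmax(1).squeeze(0).cpu().numpy()
        scores.append(miou(pred, gt))
    return float(np.mean(scores))

for intensity in [0, 25, 50, 75, 100]:          # rain intensity in percent
    split = Path(f"data/rain_{intensity}")
    print(f"rain {intensity:3d}%: mIoU = {evaluate(split):.3f}")
```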

    Local and Cooperative Autonomous Vehicle Perception from Synthetic Datasets

    The purpose of this work is to increase the performance of autonomous vehicle 3D object detection using synthetic data. This work introduces the Precise Synthetic Image and LiDAR (PreSIL) dataset for autonomous vehicle perception. Grand Theft Auto V (GTA V), a commercial video game, has a large, detailed world with realistic graphics, which provides a diverse data collection environment. Existing works creating synthetic Light Detection and Ranging (LiDAR) data for autonomous driving with GTA V have not released their datasets, rely on an in-game raycasting function that represents people as cylinders, and can fail to capture vehicles beyond 30 metres. This work describes a novel LiDAR simulator within GTA V that performs ray collisions against detailed models of all entities, regardless of their type or position. The PreSIL dataset consists of over 50,000 frames and includes high-definition images with full-resolution depth information, semantic segmentation (images), point-wise segmentation (point clouds), and detailed annotations for all vehicles and people. Collecting additional data with the PreSIL framework is entirely automatic and requires no human intervention of any kind. The effectiveness of the PreSIL dataset is demonstrated by showing an improvement of up to 5% average precision on the KITTI 3D Object Detection benchmark challenge when state-of-the-art 3D object detection networks are pre-trained with the PreSIL dataset. The PreSIL dataset and generation code are available at https://tinyurl.com/y3tb9sxy. Synthetic data also enables the generation of scenarios that are genuinely hard to create in the real world. In the next major chapter of this thesis, a new synthetic dataset, the TruPercept dataset, is created with perceptual information from multiple viewpoints. A novel system is proposed for cooperative perception, that is, perception incorporating information from multiple viewpoints. The TruPercept model is presented: it integrates trust modelling for vehicular ad hoc networks (VANETs) with perceptual information, with a focus on 3D object detection. A discussion is presented on how this might create a safer driving experience for fully autonomous vehicles. The TruPercept dataset is used to experimentally evaluate the TruPercept model against traditional local-perception (single-viewpoint) models. The TruPercept model is also contrasted with existing methods for trust modelling used in ad hoc network environments. This thesis also offers insights into how V2V communication for perception can be managed through trust modelling, aiming to improve object detection accuracy across contexts with varying ease of observability. The TruPercept model and data are available at https://tinyurl.com/y2nwy52
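
    As an illustration of the cooperative-perception idea, the sketch below shows one simple way detections reported by several vehicles could be fused while weighting each report by the sender's trust score. It is not the TruPercept implementation; the bird's-eye-view box format, thresholds, and helper names are illustrative assumptions.

```python
# Sketch: trust-weighted fusion of 3D detections from multiple viewpoints.
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # axis-aligned BEV box (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def fuse(reports: Dict[str, List[Tuple[Box, float]]],
         trust: Dict[str, float],
         iou_thr: float = 0.5,
         keep_thr: float = 0.3) -> List[Tuple[Box, float]]:
    """Cluster overlapping reports and score clusters by trust-weighted confidence."""
    clusters: List[List[Tuple[Box, float, float]]] = []
    for vid, dets in reports.items():
        w = trust.get(vid, 0.5)                      # unknown sender: neutral trust
        for box, conf in dets:
            for cluster in clusters:
                if iou(box, cluster[0][0]) >= iou_thr:
                    cluster.append((box, conf, w))
                    break
            else:
                clusters.append([(box, conf, w)])
    n_vehicles = max(len(reports), 1)
    fused = []
    for cluster in clusters:
        # Average trust-weighted confidence over all reporting vehicles, so a box
        # seen only by a low-trust sender scores low, corroborated boxes score high.
        score = sum(conf * w for _, conf, w in cluster) / n_vehicles
        if score >= keep_thr:
            fused.append((cluster[0][0], score))     # first box as representative
    return fused

# Example: the ego vehicle and trusted "v2" agree; low-trust "v3" is filtered out.
reports = {
    "ego": [((10.0, 10.0, 14.0, 18.0), 0.90)],
    "v2":  [((10.2, 10.1, 14.1, 18.2), 0.80)],
    "v3":  [((40.0, 40.0, 44.0, 48.0), 0.95)],       # possibly spoofed detection
}
print(fuse(reports, trust={"ego": 1.0, "v2": 0.9, "v3": 0.1}))
```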

    Switching GAN-based Image Filters to Improve Perception for Autonomous Driving

    Autonomous driving holds the potential to increase human productivity, reduce accidents caused by human error, allow better utilization of roads, reduce traffic congestion, free up parking space, and provide many other advantages. Perception for Autonomous Vehicles (AV) refers to the use of sensors to perceive the world, e.g. using cameras to detect and classify objects. Traffic scene understanding is a key research problem in perception for autonomous driving, and semantic segmentation is a useful method to address this problem. Adverse weather conditions are a reality that AVs must contend with: conditions like rain, snow, and haze can drastically reduce visibility and thus degrade computer vision models. Models for AV perception are currently designed for, and tested on, predominantly ideal weather conditions under good illumination. The most complete solution may be to train the segmentation networks on all possible adverse conditions; a dataset for training a segmentation network to be robust to rain would therefore need adequate data covering these conditions well. Moreover, labeling is an expensive task. It is particularly expensive for semantic segmentation, as each object in a scene needs to be identified and each pixel annotated with the correct class. Adverse weather is thus a challenging problem for perception models in AVs. This thesis explores the use of Generative Adversarial Networks (GAN) to improve semantic segmentation. We design a framework and a methodology to evaluate the proposed approach. The framework consists of an Adversity Detector and a series of denoising filters. The Adversity Detector is an image classifier that takes as input clear-weather or adverse-weather scenes and attempts to predict whether the given image contains rain, puddles, or other conditions that can adversely affect semantic segmentation. The filters are denoising generative adversarial networks trained to remove the adverse conditions from images in order to translate the image to a domain the segmentation network has been trained on, i.e. clear-weather images. We use the prediction from the Adversity Detector to choose which GAN filter to use. The methodology we devise for evaluating our approach uses the trained filters to output sets of images on which we can then run segmentation tasks. This, we argue, is a better metric for evaluating the GANs than similarity measures such as SSIM. We also use synthetic data so we can perform a systematic evaluation of our technique. We train two kinds of GANs: one that uses paired data (Pix2Pix) and one that does not (CycleGAN). We conclude that GAN architectures that use unpaired data are not sufficiently good models for denoising. We train the denoising filters using the paired architecture, find them easy to train, and observe good results. While these filters do not outperform a segmentation network trained directly on adverse-weather data, we refer back to the point that training the segmentation network requires labelled data, which is expensive to collect and annotate, particularly for adverse weather and lighting conditions. We implement our proposed framework and report a 17% increase in segmentation performance over the baseline results obtained when we do not use our framework
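
    The sketch below illustrates the switching idea with assumed interfaces: a classifier predicts the adverse condition, the matching Pix2Pix-style filter translates the frame towards the clear-weather domain, and the segmentation network runs on the result. The module names, label set, and single-image batching are assumptions, not the thesis implementation.

```python
# Sketch: adversity-detector-driven switching between GAN denoising filters.
from typing import Dict

import torch
import torch.nn as nn

CONDITIONS = ["clear", "rain", "puddles"]           # illustrative label set

class SwitchingFilterPipeline(nn.Module):
    def __init__(self, adversity_net: nn.Module,
                 filters: Dict[str, nn.Module],
                 segmenter: nn.Module):
        super().__init__()
        self.adversity_net = adversity_net          # image classifier over CONDITIONS
        self.filters = nn.ModuleDict(filters)       # condition -> Pix2Pix-style generator
        self.segmenter = segmenter                  # trained on clear-weather images

    @torch.no_grad()
    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # 1. Detect the adverse condition in the input frame (batch of one).
        cond_idx = self.adversity_net(image).argmax(dim=1).item()
        condition = CONDITIONS[cond_idx]
        # 2. Route through the matching denoising GAN; clear frames pass through.
        if condition != "clear":
            image = self.filters[condition](image)
        # 3. Segment the (translated) clear-domain image.
        return self.segmenter(image).argmax(dim=1)
```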

    Leveraging Metadata for Computer Vision on Unmanned Aerial Vehicles

    The integration of computer vision technology into Unmanned Aerial Vehicles (UAVs) has become increasingly crucial in various aerial vision-based applications. Despite the significant success of generic computer vision methods, a considerable performance drop is observed when they are applied to the UAV domain. This is due to large variations in imaging conditions, such as varying altitudes, dynamically changing viewing angles, and varying capture times that result in vast changes in lighting conditions. Furthermore, the need for real-time algorithms and the hardware constraints pose specific problems that require special attention in the development of computer vision algorithms for UAVs. In this dissertation, we demonstrate that domain knowledge in the form of metadata is a valuable source of information, and we therefore propose domain-aware computer vision methods that use freely accessible sensor data. The pipeline for computer vision systems on UAVs is discussed, from data mission planning, data acquisition, labeling, and curation, to the construction of publicly available benchmarks and leaderboards and the establishment of a wide range of baseline algorithms. Throughout, the focus is on a holistic view of the problems and opportunities in UAV-based computer vision, and the aim is to bridge the gap between purely software-based computer vision algorithms and environmentally aware robotic platforms. The results demonstrate that incorporating metadata obtained from onboard sensors, such as GPS, barometers, and inertial measurement units, can significantly improve the robustness and interpretability of computer vision models in the UAV domain. This leads to more trustworthy models that can overcome challenges such as domain bias, altitude variance, and synthetic-data inefficiency, and that can enhance perception through environmental awareness in temporal scenarios such as video object detection, tracking, and video anomaly detection. The proposed methods and benchmarks provide a foundation for future research in this area, and the results suggest promising directions for developing environmentally aware robotic platforms. Overall, this work highlights the potential of combining computer vision and robotics to tackle real-world challenges and opens up new avenues for interdisciplinary research
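
    One common way to exploit such metadata, shown as an illustrative sketch below rather than the dissertation's actual architecture, is to embed the raw sensor readings with a small MLP and concatenate them with global image features before the prediction head; the backbone choice, feature sizes, and the four example metadata fields are assumptions.

```python
# Sketch: metadata-aware classification by fusing sensor readings with CNN features.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MetadataAwareClassifier(nn.Module):
    def __init__(self, num_meta: int = 4, num_classes: int = 10):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep all layers up to (and including) global average pooling -> 512-d features.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.meta_mlp = nn.Sequential(               # embed raw sensor values
            nn.Linear(num_meta, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.head = nn.Linear(512 + 64, num_classes)

    def forward(self, image: torch.Tensor, meta: torch.Tensor) -> torch.Tensor:
        f_img = self.features(image).flatten(1)      # (B, 512)
        f_meta = self.meta_mlp(meta)                 # (B, 64)
        return self.head(torch.cat([f_img, f_meta], dim=1))

# `meta` could hold e.g. [altitude_m, pitch_deg, yaw_deg, local_hour] per frame.
model = MetadataAwareClassifier()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 4))
```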

    Photorealistic Omnidirectional Image Simulator for Computer Vision

    The motivation for this project is the need for databases of omnidirectional and panoramic images for computer vision. Their wide field of view makes it possible to obtain a large amount of information about the environment from a single image. However, the distortion inherent to these images requires the development of specific algorithms for their processing and interpretation. In addition, a large number of images is essential for properly training computer vision algorithms based on deep learning. Acquiring, labeling, and preparing these images manually with real systems demands an amount of time and effort that in practice limits the size of such databases. This work proposes the implementation of a tool for generating photorealistic synthetic omnidirectional images that automates generation and labeling as a strategy for increasing the size of these databases. The work builds on the virtual environments that can be created with the Unreal Engine 4 game engine, used together with one of its plugins, UnrealCV. From these virtual environments, images from a variety of omnidirectional and 360° cameras are constructed with photorealistic quality. The characteristics of the environment also make it possible to generate depth and semantic images. Because everything is done virtually, the acquisition parameters of the camera and the characteristics of the environment can be controlled, allowing a database to be built with automatic, unsupervised labeling. Given the calibration parameters, the position and orientation of the camera, and the layout of the environment and its objects, ground truth can be obtained for various vision algorithms. With the available images and information, it is possible to evaluate algorithms for line extraction in dioptric and catadioptric images, layout recovery in panoramas, and 3D reconstruction methods such as simultaneous localization and mapping (SLAM)
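
    As a sketch of how such captures could be scripted, the snippet below uses the UnrealCV Python client to render the six cube faces around a viewpoint together with object masks and depth, from which omnidirectional projections can later be composed. It assumes a running UE4 instance with the UnrealCV plugin; the command strings follow the UnrealCV documentation but should be checked against the installed plugin version, and the file names and viewpoint are illustrative.

```python
# Sketch: capture cube faces, object masks, and depth with the UnrealCV client.
from unrealcv import client

# (pitch, yaw, roll) for the six cube-face directions
FACES = {
    "front": (0, 0, 0),   "right": (0, 90, 0),  "back": (0, 180, 0),
    "left":  (0, 270, 0), "up":    (90, 0, 0),  "down": (-90, 0, 0),
}

def capture_viewpoint(x: float, y: float, z: float, prefix: str) -> None:
    client.request(f"vset /camera/0/location {x} {y} {z}")
    for name, (pitch, yaw, roll) in FACES.items():
        client.request(f"vset /camera/0/rotation {pitch} {yaw} {roll}")
        client.request(f"vget /camera/0/lit {prefix}_{name}_rgb.png")          # photorealistic RGB
        client.request(f"vget /camera/0/object_mask {prefix}_{name}_sem.png")  # semantic labels
        client.request(f"vget /camera/0/depth {prefix}_{name}_depth.npy")      # per-pixel depth

client.connect()
assert client.isconnected(), "UnrealCV server not reachable"
capture_viewpoint(0, 0, 150, "shot000")   # units are UE4 centimetres
client.disconnect()
```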

    Attic Inscriptions Online (AIO)


    Information technologies for epigraphy and cultural heritage. Proceedings of the first EAGLE international conference

    This peer-reviewed volume contains selected papers from the First EAGLE International Conference on Information Technologies for Epigraphy and Cultural Heritage, held in Paris between September 29 and October 1, 2014. It assembles, for the first time in a single volume, contributions regarding all aspects of digital epigraphy: models, vocabularies, translations, user engagement, image analysis, 3D methodologies, and ongoing projects at the cutting edge of the digital humanities. The scope of this book is not limited to Greek and Latin epigraphy; it provides an overview of projects related to all areas of epigraphic inquiry and their related communities. This approach is intended to furnish the reader with the broadest possible perspective of the discipline, while at the same time giving due attention to the specifics of unique issues