10 research outputs found

    Automatic Labelling of Point Clouds Using Image Semantic Segmentation

    Autonomous driving is often seen as the next big breakthrough in artificial intelligence. Autonomous vehicles use a variety of sensors to obtain knowledge about the world, for example cameras and LiDARs. LiDAR provides 3D data about the surrounding world in the form of a point cloud. New deep learning models have emerged that allow learning directly on point clouds, but obtaining labelled data for training these models is difficult and expensive. We propose to use semantically segmented camera images to project labels from 2D to 3D, thereby enabling the use of cheaper ground-truth data to train the aforementioned models. Furthermore, we evaluate the use of mature 2D semantic segmentation models to automatically label vast amounts of point cloud data. This approach is tested on the KITTI dataset, as it provides corresponding camera and LiDAR data for each scene. The DeepLabv3+ semantic segmentation model is used to label the camera images with pixel-level labels, which are then projected onto the 3D point cloud, and finally a PointNet++ model is trained to perform segmentation from point clouds only. Experiments show that projected 2D labels can be learned reasonably well by PointNet++. Evaluating the results against the 3D ground truth provided with the KITTI dataset produced promising results, with high accuracy for detecting pedestrians but mediocre accuracy for cars.
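
    The core of the method described above is the 2D-to-3D label transfer: LiDAR points are projected into the camera image using the sensor calibration, and each point inherits the class of the pixel it lands on. The sketch below illustrates this step under the assumption of KITTI-style calibration matrices (a 3x4 projection matrix P2 and 4x4 homogeneous R0_rect and Tr_velo_to_cam) and a dense label map produced by a 2D segmentation model such as DeepLabv3+; the function name and array layout are illustrative, not taken from the thesis.

```python
import numpy as np

def project_labels_to_cloud(points, label_map, P2, R0_rect, Tr_velo_to_cam):
    """Assign per-pixel semantic labels to LiDAR points (KITTI-style calibration).

    points          : (N, 3) LiDAR points in the velodyne frame
    label_map       : (H, W) integer class map from a 2D segmentation model
    P2              : (3, 4) camera projection matrix
    R0_rect, Tr_velo_to_cam : (4, 4) homogeneous rectification / extrinsic matrices
    Returns an (N,) array of labels, with -1 for points outside the camera view.
    """
    H, W = label_map.shape
    n = points.shape[0]
    hom = np.hstack([points, np.ones((n, 1))])      # (N, 4) homogeneous points
    cam = R0_rect @ Tr_velo_to_cam @ hom.T          # (4, N) rectified camera frame
    img = P2 @ cam                                  # (3, N) image-plane coordinates

    z = img[2]
    in_front = z > 1e-6                             # keep only points in front of the camera
    u = np.zeros(n, dtype=int)
    v = np.zeros(n, dtype=int)
    u[in_front] = np.round(img[0, in_front] / z[in_front]).astype(int)
    v[in_front] = np.round(img[1, in_front] / z[in_front]).astype(int)

    labels = np.full(n, -1, dtype=int)
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    labels[valid] = label_map[v[valid], u[valid]]   # point inherits the pixel's class
    return labels
```

    Points that fall outside the camera frustum receive the label -1 and would simply be excluded when training a point cloud model such as PointNet++ on the projected labels.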

    Video surveillance using deep transfer learning and deep domain adaptation: Towards better generalization

    Recently, developing automated video surveillance systems (VSSs) has become crucial to ensure the security and safety of the population, especially during events involving large crowds, such as sporting events. While artificial intelligence (AI) smooths the path for computers to think like humans, machine learning (ML) and deep learning (DL) pave the way further by adding training and learning components. DL algorithms require labeled data and high-performance computers to effectively analyze and understand surveillance data recorded from fixed or mobile cameras installed in indoor or outdoor environments. However, they might not perform as expected, may take a long time to train, or may not have enough input data to generalize well. To that end, deep transfer learning (DTL) and deep domain adaptation (DDA) have recently been proposed as promising solutions to alleviate these issues. Typically, they can (i) ease the training process, (ii) improve the generalizability of ML and DL models, and (iii) overcome data scarcity problems by transferring knowledge from one domain to another or from one task to another. Despite the increasing number of articles proposing DTL- and DDA-based VSSs, a thorough review that summarizes and critiques the state of the art is still missing. To that end, this paper introduces, to the best of the authors' knowledge, the first overview of existing DTL- and DDA-based video surveillance systems to (i) shed light on their benefits, (ii) discuss their challenges, and (iii) highlight their future perspectives. This research work was made possible by research grant support (QUEX-CENG-SCDL-19/20-1) from the Supreme Committee for Delivery and Legacy (SC) in Qatar. The statements made herein are solely the responsibility of the authors. Open Access funding provided by the Qatar National Library.
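
    The survey itself is narrative, but the core DTL idea it reviews, reusing weights learned on a large source domain and fine-tuning only a small part of the network on the target surveillance task, can be sketched as follows. The backbone choice (a torchvision ResNet-18 pretrained on ImageNet) and the number of target classes are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from weights pretrained on a large source domain (ImageNet).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the transferred feature extractor so only the new layers are trained.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the classifier head for the target surveillance task
# (the number of classes is a placeholder, e.g. a few crowd-behaviour categories).
num_target_classes = 4
backbone.fc = nn.Linear(backbone.fc.in_features, num_target_classes)

# Only the new head's parameters are optimized on the (smaller) target dataset.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```

    Freezing the transferred layers is the simplest DTL setting; unfreezing some of them, or adding a domain-adversarial objective, moves this sketch toward the DDA methods the survey also covers.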

    Event-Based Algorithms For Geometric Computer Vision

    Event cameras are novel bio-inspired sensors which mimic the function of the human retina. Rather than directly capturing intensities to form synchronous images as in traditional cameras, event cameras asynchronously detect changes in log image intensity. When such a change is detected at a given pixel, the change is immediately sent to the host computer, where each event consists of the x, y pixel position of the change, a timestamp accurate to tens of microseconds, and a polarity indicating whether the pixel got brighter or darker. These cameras provide a number of useful benefits over traditional cameras, including the ability to track extremely fast motions, high dynamic range, and low power consumption. However, with a new sensing modality comes the need to develop novel algorithms. As these cameras do not capture photometric intensities, novel loss functions must be developed to replace the photoconsistency assumption which serves as the backbone of many classical computer vision algorithms. In addition, the relative novelty of these sensors means that there does not exist the wealth of data available for traditional images with which we can train learning-based methods such as deep neural networks. In this work, we address both of these issues with two foundational principles. First, we show that the motion blur induced when the events are projected into the 2D image plane can be used as a suitable substitute for the classical photometric loss function. Second, we develop self-supervised learning methods which allow us to train convolutional neural networks to estimate motion without any labeled training data. We apply these principles to solve classical perception problems such as feature tracking, visual inertial odometry, optical flow and stereo depth estimation, as well as recognition tasks such as object detection and human pose estimation. We show that these solutions are able to utilize the benefits of event cameras, allowing us to operate in fast-moving scenes with challenging lighting which would be incredibly difficult for traditional cameras.
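
    A minimal illustration of the first principle, that the blur of accumulated events can replace a photometric loss, is sketched below: events (x, y, t, polarity) are warped by a candidate motion and binned into an image whose variance measures sharpness. The constant-flow warp and the variance objective are deliberate simplifications in the spirit of contrast-maximisation methods, not the exact formulation used in this work.

```python
import numpy as np

def event_image_variance(events, flow, height, width, t_ref=0.0):
    """Warp events by a constant optical flow and score the accumulated image.

    events : (N, 4) array of (x, y, t, polarity)
    flow   : (fx, fy) candidate flow in pixels per second
    Returns the variance of the event-count image; a correct motion estimate
    removes the blur, concentrating events and increasing the variance.
    """
    x, y, t, _ = events.T
    # Warp each event back to the reference time along the candidate flow.
    xw = np.round(x - flow[0] * (t - t_ref)).astype(int)
    yw = np.round(y - flow[1] * (t - t_ref)).astype(int)

    img = np.zeros((height, width))
    valid = (xw >= 0) & (xw < width) & (yw >= 0) & (yw < height)
    np.add.at(img, (yw[valid], xw[valid]), 1.0)   # accumulate event counts
    return img.var()
```

    A motion estimate can then be obtained by maximising this score over candidate flows, for example with a grid search or a gradient-based optimiser.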

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher-dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal model is needed in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at a fixed time to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a supernova.
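
    One way to read the first-order Markovian propagation is that the probabilistic model fitted to the particles at time t initialises the fit at time t+1, so temporal coherence comes from the warm start rather than from a joint fit over all snapshots. The sketch below uses a Gaussian mixture as an illustrative stand-in for the spatial model; the model class, component count, and function names are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def track_manifold(snapshots, n_components=8, seed=0):
    """Fit a mixture model to each particle snapshot, warm-starting from the previous fit.

    snapshots : list of (N_t, 3) arrays of particle positions, one per time step
    Returns the list of fitted models; consecutive fits tend to keep component
    identities because each fit starts from the previous means and weights.
    """
    models, prev = [], None
    for pts in snapshots:
        init = {}
        if prev is not None:
            # First-order (Markovian) coupling: initialise from the previous stage.
            init = {"means_init": prev.means_, "weights_init": prev.weights_}
        gmm = GaussianMixture(n_components=n_components, random_state=seed, **init)
        gmm.fit(pts)
        models.append(gmm)
        prev = gmm
    return models
```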

    Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

    Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data. This paper investigates the scenario where monolingual data is limited as well, finding that current unsupervised methods suffer in performance under this stricter setting. We find that the performance loss originates from the poor quality of the pretrained monolingual embeddings, and we propose using linguistic information in the embedding training scheme. To support this, we look at two linguistic features that may help improve alignment quality: dependency information and sub-word information. Using dependency-based embeddings results in a complementary word representation which offers a boost in performance of around 1.5 BLEU points compared to standard WORD2VEC when monolingual data is limited to 1 million sentences per language. We also find that the inclusion of sub-word information is crucial to improving the quality of the embeddings.
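
    As a rough illustration of the embedding-side intervention, the sketch below trains a standard WORD2VEC baseline and a sub-word-aware fastText model on the same toy monolingual corpus using gensim; the corpus and hyperparameters are placeholders for the limited-data setting. Dependency-based embeddings additionally require a dependency-parsed corpus and a tool that accepts arbitrary (word, context) pairs, which is not shown here.

```python
from gensim.models import Word2Vec, FastText

# Toy monolingual corpus standing in for the limited (1M-sentence) setting.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["a", "dog", "slept", "on", "the", "rug"],
]

# Baseline: standard word2vec skip-gram embeddings.
w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

# Sub-word-aware embeddings: fastText builds word vectors from character
# n-grams, which helps when each word form is seen only a few times.
ft = FastText(sentences, vector_size=100, window=5, min_count=1, sg=1,
              min_n=3, max_n=6)

print(w2v.wv.most_similar("cat", topn=2))
print(ft.wv.most_similar("cat", topn=2))
```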

    Participative Urban Health and Healthy Aging in the Age of AI

    This open access book constitutes the refereed proceedings of the 18th International Conference on Smart Homes and Health Telematics, ICOST 2022, held in Paris, France, in June 2022. The 15 full papers and 10 short papers presented in this volume were carefully reviewed and selected from 33 submissions. They cover topics such as the design, development, deployment, and evaluation of AI for health, smart urban environments, assistive technologies, chronic disease management, and coaching and health telematics systems.

    XXV Congreso Argentino de Ciencias de la Computación - CACIC 2019: libro de actas

    Papers presented at the XXV Argentine Congress of Computer Science (CACIC 2019), held in the city of Río Cuarto from 14 to 18 October 2019, organized by the Red de Universidades con Carreras en Informática (RedUNCI) and the Facultad de Ciencias Exactas, Físico-Químicas y Naturales of the Universidad Nacional de Río Cuarto.