124 research outputs found

    Smart environment monitoring through micro unmanned aerial vehicles

    In recent years, improvements to small-scale Unmanned Aerial Vehicles (UAVs) in terms of flight time, automatic control, and remote transmission have been promoting the development of a wide range of practical applications. In aerial video surveillance, monitoring broad areas still poses many challenges due to the need to accomplish several tasks in real time, including mosaicking, change detection, and object detection. In this thesis work, a small-scale UAV-based vision system to maintain regular surveillance over target areas is proposed. The system works in two modes. The first mode monitors an area of interest over several flights. During the first flight, it creates an incremental geo-referenced mosaic of the area of interest and classifies all known elements (e.g., persons) found on the ground using a previously trained, improved Faster R-CNN architecture. In subsequent reconnaissance flights, the system searches for any changes (e.g., the disappearance of persons) that may have occurred in the mosaic using an algorithm based on histogram equalization and RGB Local Binary Patterns (RGB-LBP); if changes are found, the mosaic is updated. The second mode performs real-time classification with the same improved Faster R-CNN model, which is useful for time-critical operations. Thanks to several design features, the system works in real time and performs the mosaicking and change detection tasks at low altitude, thus allowing even small objects to be classified. The proposed system was tested on the full set of challenging video sequences in the UAV Mosaicking and Change Detection (UMCD) dataset and on other public datasets. Evaluation with well-known performance metrics has shown remarkable results in mosaic creation and updating, as well as in change detection and object detection.
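    The following is a minimal sketch of the kind of change-detection step described above, combining per-channel histogram equalization with RGB-LBP block comparison; the LBP parameters, the chi-square comparison, and the threshold are illustrative assumptions, not the thesis's actual values.

```python
# Minimal sketch: per-channel histogram equalization + RGB-LBP block comparison.
# P, R, N_BINS, and THRESH are assumed values for illustration only.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

P, R = 8, 1          # LBP neighbours / radius (assumed)
N_BINS = P + 2       # number of 'uniform' LBP codes for P neighbours
THRESH = 0.25        # chi-square distance threshold (assumed)

def rgb_lbp_hist(patch):
    """Concatenate per-channel uniform-LBP histograms of an RGB uint8 patch."""
    hists = []
    for c in range(3):
        chan = cv2.equalizeHist(np.ascontiguousarray(patch[:, :, c]))
        codes = local_binary_pattern(chan, P, R, method="uniform")
        h, _ = np.histogram(codes, bins=N_BINS, range=(0, N_BINS), density=True)
        hists.append(h)
    return np.concatenate(hists)

def block_changed(block_ref, block_new):
    """Flag a mosaic block as changed when its LBP histograms diverge."""
    h1, h2 = rgb_lbp_hist(block_ref), rgb_lbp_hist(block_new)
    chi2 = 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-10))  # chi-square distance
    return chi2 > THRESH
```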

    Advances and Applications of Computer Vision Techniques in Vehicle Trajectory Generation and Surrogate Traffic Safety Indicators

    The application of Computer Vision (CV) techniques has massively stimulated microscopic traffic safety analysis from the perspective of traffic conflicts and near misses, which are usually measured using Surrogate Safety Measures (SSM). However, since video processing and traffic safety modeling are two separate research domains and little research has focused on systematically bridging the gap between them, it is necessary to provide transportation researchers and practitioners with corresponding guidance. With this aim in mind, this paper reviews the applications of CV techniques in traffic safety modeling using SSM and suggests the best way forward. The CV algorithms used for vehicle detection and tracking, from early approaches to state-of-the-art models, are summarized at a high level. Then, the video pre-processing and post-processing techniques for vehicle trajectory extraction are introduced. A detailed review of SSMs for vehicle trajectory data, along with their applications in traffic safety analysis, is presented. Finally, practical issues in traffic video processing and SSM-based safety analysis are discussed, and available or potential solutions are provided. This review is expected to assist transportation researchers and engineers with the selection of suitable CV techniques for video processing and the use of SSMs for various traffic safety research objectives.
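    As a concrete illustration of an SSM of the kind reviewed here, the sketch below computes Time-to-Collision (TTC) for a car-following pair derived from extracted trajectories; the 1.5 s conflict threshold is a commonly cited convention used here as an assumption, not a value from the paper.

```python
# Illustrative Time-to-Collision (TTC) computation for a car-following pair.
import math

def ttc(gap_m, v_follower, v_leader):
    """TTC in seconds; infinite when the follower is not closing the gap."""
    closing = v_follower - v_leader      # relative (closing) speed in m/s
    return gap_m / closing if closing > 0 else math.inf

# A conflict is often flagged when TTC drops below a threshold such as 1.5 s:
print(ttc(gap_m=12.0, v_follower=15.0, v_leader=10.0))  # 2.4 s -> no conflict
```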

    Object Detection in Omnidirectional Images

    Nowadays, computer vision (CV) is widely used to solve real-world problems that pose increasingly high challenges. In this context, the use of omnidirectional video in a growing number of applications, along with the fast development of Deep Learning (DL) algorithms for object detection, drives the need for further research to improve existing methods originally developed for conventional 2D planar images. However, the geometric distortion that common sphere-to-plane projections produce, mostly visible in objects near the poles, in addition to the lack of open-source labeled omnidirectional image datasets, has made an accurate spherical-image-based object detection algorithm a hard goal to achieve. This work contributes to the development of datasets and machine learning models particularly suited to omnidirectional images represented in planar format through the well-known Equirectangular Projection (ERP). To this aim, DL methods are explored to improve the detection of visual objects in omnidirectional images by considering the inherent distortions of ERP. An experimental study was first carried out to find out whether the error rate and the type of detection errors were related to the characteristics of ERP images. The study revealed that the error rate of object detection with existing DL models on ERP images actually depends on the spherical location of the object in the image. Then, based on these findings, a new object detection framework is proposed to obtain a uniform error rate across all spherical image regions. The results show that the pre- and post-processing stages of the implemented framework effectively reduce the dependency of performance on the image region, as evaluated by the error rate metric mentioned above.
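    To make the ERP distortion concrete, the following sketch maps ERP pixel coordinates to spherical coordinates and shows how the horizontal over-sampling grows toward the poles; the image resolution is an assumed example, not taken from the work.

```python
# ERP geometry sketch: pixels map linearly to (longitude, latitude), and the
# horizontal stretch of a row grows as 1/cos(latitude) toward the poles.
import numpy as np

W, H = 3840, 1920  # assumed ERP resolution (2:1 aspect ratio)

def pixel_to_sphere(u, v):
    """Map ERP pixel (u, v) to (longitude, latitude) in radians."""
    lon = (u / W - 0.5) * 2.0 * np.pi   # [-pi, pi]
    lat = (0.5 - v / H) * np.pi         # [-pi/2, pi/2]
    return lon, lat

def horizontal_stretch(v):
    """Horizontal over-sampling of row v relative to the equator."""
    _, lat = pixel_to_sphere(0, v)
    return 1.0 / max(np.cos(lat), 1e-6)

print(horizontal_stretch(H // 2))  # ~1.0 at the equator
print(horizontal_stretch(H // 8))  # ~2.6 near the pole: objects appear stretched
```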

    Dataset of Panoramic Images for People Tracking in Service Robotics

    In this thesis, we provide a framework for constructing a guide robot for use in hospitals. The omnidirectional camera on the robot allows it to recognize and track the person who is following it. Furthermore, when directing an individual to their desired location in the hospital, the robot must be aware of its surroundings and avoid collisions with other people or objects. To train and evaluate the robot's performance, we developed an auto-labeling framework for creating a dataset of panoramic videos captured by the robot's omnidirectional camera. We labeled each person in the video together with their real position in the robot's frame, enabling us to evaluate the accuracy of our tracking system and to guide the development of the robot's navigation algorithms. Our research expands on earlier work that established a framework for tracking individuals using omnidirectional cameras. By developing a benchmark dataset, we aim to contribute to ongoing efforts to enhance the precision and dependability of these tracking systems, which is essential for the creation of effective guide robots in healthcare facilities. Our research has the potential to improve the patient experience and to increase the efficiency of healthcare institutions by reducing the staff time spent guiding patients through the facility.
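    As a hedged illustration of the kind of ground truth such a dataset pairs with each detection, the sketch below converts a person's column position in a 360-degree panorama plus a measured range into a position in the robot frame; the panorama width and the forward-at-center convention are assumptions, not details of the thesis framework.

```python
# Sketch: panorama column + range -> robot-frame (x, y). Conventions assumed.
import math

PANO_WIDTH = 2048  # assumed panorama width in pixels

def person_position(u_px, range_m):
    """Convert a panorama column and a measured range to robot-frame (x, y)."""
    bearing = (u_px / PANO_WIDTH - 0.5) * 2.0 * math.pi  # 0 rad = robot forward
    x = range_m * math.cos(bearing)   # forward axis
    y = range_m * math.sin(bearing)   # lateral axis
    return x, y

print(person_position(u_px=1024, range_m=2.0))  # directly ahead: (2.0, 0.0)
```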

    A Case Study on the Advantages of 3D Walkthroughs over Photo Stitching Techniques

    Virtual tours and interactive walkthroughs enable a more in-depth platform for communicating information. Many current techniques employ Photo Stitching to accomplish this. However, over the last decade, advancements in computing power and the accessibility of game engines have made developing rich 3D content for virtual tours more feasible than ever before. As such, the purpose of this paper is to present a study of the advantages of developing an interactive 3D virtual tour of student facilities for educational establishments, using the Unreal Engine 4 game engine. The project demonstrates a comparison between Photo Stitching and a 3D-modelled interactive walkthrough for developing rich visual environments. The research reveals that the approach presented in this paper can improve the prominence of educational facilities within universities and offers many advantages over Photo Stitching techniques.

    Deep neural network for city mapping using Google Street View data

    With the advancement of computational power and large datasets, massive improvements in deep neural networks have led to many widespread applications. One application of deep neural networks is solving computer vision problems such as classification and segmentation. Competitions like the ImageNet Large Scale Visual Recognition Challenge have taken this capability to the next level; in some cases, classification is better than human. This thesis is an example of an application that utilizes the ability of neural networks. The document describes the implementation, methodology, and experiments carried out to develop software solutions using deep neural networks on image resources from Google Street View. The user provides a geojson file consisting of an area of interest in the form of a square or polygon as the input. The Google StreetView API downloads the available images. The images are first processed with a state-of-the-art CNN (Mask R-CNN) to detect objects, classify them with confidence scores, and generate a bounding box and a pixel-wise mask around each detected object. A text file stores information such as the coordinates of the bounding box, the name of the class, and the mask values. An ordinary RGB (panoramic) image from GSV does not contain any depth data, so the images are processed with another state-of-the-art CNN (monodepth2) to estimate the pixel-wise depth of the objects in the images. The average depth value within the mask is used as the distance of the object, and the coordinates of the bounding box are used for positioning the object along the other axes. The resulting outputs are: markers of the detected objects overlaid on a map; a bar graph visualizing the number of detections per class; a text file containing the number of detections per class; and the output of each processing step above (detections, depth images, mask values) for comparison and evaluation.
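    The distance step described above reduces to averaging the monodepth2 depth map inside each Mask R-CNN instance mask; the sketch below illustrates this with synthetic arrays, whose shapes and values are assumptions for illustration.

```python
# Sketch: mean depth inside an instance mask as the object's distance estimate.
import numpy as np

def object_distance(depth_map, mask):
    """Mean depth over the pixels belonging to one detected instance.

    depth_map: (H, W) float array of per-pixel depth estimates.
    mask:      (H, W) boolean array from the instance segmentation.
    """
    values = depth_map[mask]
    return float(values.mean()) if values.size else float("nan")

# Example with synthetic data:
depth = np.full((480, 640), 10.0)            # hypothetical depth map
inst_mask = np.zeros((480, 640), dtype=bool)
inst_mask[200:280, 300:360] = True           # hypothetical detected object
print(object_distance(depth, inst_mask))     # 10.0
```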

    Action recognition from RGB-D data

    In recent years, action recognition based on RGB-D data has attracted increasing attention. Unlike traditional 2D action recognition, RGB-D data contains extra depth and skeleton modalities, and each modality has its own characteristics. This thesis presents seven novel methods that take advantage of the three modalities for action recognition. First, effective handcrafted features are designed, and a frequent pattern mining method is employed to mine the most discriminative, representative, and non-redundant features for skeleton-based action recognition. Second, to take advantage of powerful Convolutional Neural Networks (ConvNets), it is proposed to represent the spatio-temporal information carried in 3D skeleton sequences as three 2D images by encoding the joint trajectories and their dynamics into color distributions in the images, and ConvNets are adopted to learn discriminative features for human action recognition. Third, for depth-based action recognition, three data augmentation strategies are proposed to apply ConvNets to small training datasets. Fourth, to take full advantage of the 3D structural information offered by the depth modality and its insensitivity to illumination variations, three simple, compact, yet effective image-based representations are proposed, and ConvNets are adopted for feature extraction and classification. However, both of the previous two methods are sensitive to noise and cannot differentiate fine-grained actions well. Fifth, to address this issue, it is proposed to represent a depth map sequence as three pairs of structured dynamic images at the body, part, and joint levels, respectively, through bidirectional rank pooling. The structured dynamic images preserve the spatio-temporal information, enhance the structure information across body parts/joints and different temporal scales, and take advantage of ConvNets for action recognition. Sixth, it is proposed to extract and use scene flow for action recognition from RGB and depth data. Last, to exploit the joint information in multi-modal features arising from heterogeneous sources (RGB, depth), it is proposed to cooperatively train a single ConvNet (referred to as c-ConvNet) on both RGB features and depth features, and to deeply aggregate the two modalities to achieve robust action recognition.
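    As a simplified stand-in for the skeleton-to-image encoding of the second method, the sketch below maps a 3D skeleton sequence to a color image in which rows index joints, columns index frames, and normalized coordinates fill the RGB channels; the normalization and layout are illustrative assumptions, not the encoding defined in the thesis.

```python
# Sketch: encode a (T, J, 3) skeleton sequence as a (J, T, 3) color image
# so that a ConvNet can consume the spatio-temporal structure.
import numpy as np

def skeleton_to_image(seq):
    """seq: (T, J, 3) joint coordinates -> (J, T, 3) uint8 image."""
    lo, hi = seq.min(axis=(0, 1)), seq.max(axis=(0, 1))
    norm = (seq - lo) / (hi - lo + 1e-8)   # per-axis min-max scaling
    img = (norm * 255).astype(np.uint8)    # coordinates become color values
    return img.transpose(1, 0, 2)          # joints as rows, frames as columns

# 60 frames of a 25-joint (Kinect-style) skeleton:
seq = np.random.rand(60, 25, 3)
print(skeleton_to_image(seq).shape)  # (25, 60, 3)
```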

    Object detection, distributed cloud computing and parallelization techniques for autonomous driving systems.

    Autonomous vehicles are increasingly seen as a necessary step towards building the smart cities of the future. Numerous proposals have been presented in recent years to tackle particular aspects of the working pipeline towards creating a functional end-to-end system, such as object detection, tracking, path planning, and sentiment or intent detection, amongst others. Nevertheless, few efforts have been made to systematically compile all of these systems into a single proposal that also considers the real challenges these systems will face on the road, such as real-time computation and hardware capabilities. This paper reviews the latest techniques towards creating our own end-to-end autonomous vehicle system, considering the state-of-the-art methods in object detection and the possible incorporation of distributed systems and parallelization to deploy these methods. Our findings show that while techniques such as convolutional neural networks, recurrent neural networks, and long short-term memory networks can effectively handle the initial detection and path planning tasks, more effort is required to implement cloud computing to reduce the computational time that these methods demand. Additionally, we have mapped different strategies to handle the parallelization task, both within and between the networks.
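    As a generic illustration of the frame-level parallelization strategies discussed, the sketch below fans camera frames out to a worker pool so that detection work overlaps; run_detector is a hypothetical placeholder for any of the surveyed detectors, possibly offloaded to a remote cloud worker rather than run locally.

```python
# Sketch: frame-level parallelism with a worker pool. run_detector is a
# hypothetical stand-in for a CNN detector; in a distributed deployment it
# would dispatch the frame to a cloud worker instead of computing locally.
from concurrent.futures import ThreadPoolExecutor

def run_detector(frame):
    # Placeholder detection call returning an empty result structure.
    return {"frame_id": frame, "boxes": []}

frames = range(8)  # stand-in for a camera stream
with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(run_detector, frames):
        print(result["frame_id"], len(result["boxes"]))
```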

    Deep Learning for Image Analysis in Satellite and Traffic Applications

    The abstract is in the attachment.