    Cyclist detection in LIDAR scans using faster R-CNN and synthetic depth images

    Deep Learning Based Methods for Outdoor Robot Localization and Navigation

    The number of elderly people is increasing around the globe. To support a growing ageing society, mobile robots are a viable choice for assisting the elderly in their daily activities. These activities take place anywhere, indoors or outdoors. Although outdoor activities benefit the elderly in many ways, outdoor environments pose difficulties because of their unpredictable nature. Mobile robots that support humans outdoors must traverse these difficulties automatically using suitable navigation systems. A navigation system is a core component of any mobile robot: it guides the robot to its destination, where it can perform its designated tasks. Various tools can be chosen for navigation systems. Outdoor environments are mostly open to conventional navigation tools such as Global Positioning System (GPS) devices. In this thesis, three systems for localization and navigation of mobile robots based on visual data and deep learning algorithms are proposed. The first is a localization system based on landmark detection. The Faster Region-based Convolutional Neural Network (Faster R-CNN) detects landmarks and signs in the captured image. A Feed-Forward Neural Network (FFNN) is trained to determine the robot's location coordinates and compass orientation from the detected landmarks. The dataset consists of images, geolocation data and labeled bounding boxes used to train and test two proposed localization methods. Results are reported as absolute errors between the localization results and the reference geolocation data in the dataset. The second system is a navigation system based on visual data and a deep reinforcement learning algorithm called Deep Q Network (DQN). The employed DQN automatically guides the mobile robot using visual data in the form of images received from a single Universal Serial Bus (USB) camera attached to the robot.
DQN combines a deep neural network, here a convolutional neural network (CNN), with a reinforcement learning algorithm named Q-Learning. It makes decisions from visual input, using experience gained from the consequences of trial-and-error attempts. Our DQN agents are trained in simulation environments provided by ViZDoom, a platform based on a First-Person Shooter (FPS) game. Training is carried out in simulation to avoid possible damage to the real robot during the trial-and-error process. The perspective in the simulation is the same as that of a camera attached to the front of the mobile robot. There are many differences between the simulation and the real world. We applied a marker-based Augmented Reality (AR) algorithm to reduce these differences by altering the visual data from the camera with resources from the simulation. The second system is assigned a simple navigation task in which the starting location is fixed but the goal location is random within a designated zone. The robot must detect and track the goal object using a USB camera as its only sensor. Once started, the robot must move from its starting location to the designated goal object. Our DQN navigation method is tested in the simulation and on the real robot. Its performance is measured quantitatively via average total scores and the number of successful navigation attempts. The results show that our DQN can effectively guide a mobile robot to the goal object of the simple navigation task, in both the simulation and the real world. The third system employs a Transfer Learning (TL) strategy to reduce the training time and resources required to train DQN agents on newly added tasks. The new task is to reach the goal while also avoiding obstacles. Additionally, the starting and goal locations are both random within the specified areas.
The employed transfer learning strategy uses the whole network of the DQN agent trained for the first, simple navigation task as the base for training the DQN agent for the second task. Training under our TL strategy decreases the exploration factor, which causes the agent to rely on existing knowledge from the base network more than on randomly selected actions. This shortens training, because optimal solutions are found faster than when training from scratch. We evaluate our TL strategy by comparing DQN agents trained with TL at different exploration factor values against a DQN agent trained from scratch. Additionally, the TL agents are trained with a reduced number of episodes to show their performance more thoroughly. All DQN agents for the second navigation task are tested in simulation to avoid possible uncontrollable damage from the obstacles. Performance is measured through successful attempts and average total scores, as in the first navigation task. Results show that DQN agents trained via the TL strategy can greatly outperform the agent trained from scratch, despite the lower number of training episodes. Doctor of Engineering, Hosei University
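The role of the exploration factor described above can be illustrated with a minimal sketch of epsilon-greedy action selection and the one-step Q-learning target. This is not the thesis's implementation: the function names, the list-based Q-values and the `greedy_fraction` helper are hypothetical and only show why a lower exploration factor makes an agent lean on the values it already holds.

```python
import random


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_learning_target(reward, next_q_values, gamma, done):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def greedy_fraction(q_values, epsilon, trials=10000, seed=0):
    """Hypothetical helper: share of selections that match the greedy action."""
    rng = random.Random(seed)
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    hits = sum(select_action(q_values, epsilon, rng) == best for _ in range(trials))
    return hits / trials
```

With Q-values inherited from a base network, calling `greedy_fraction` with a small `epsilon` returns a value close to 1, while `epsilon = 1.0` reduces selection to uniform random choice.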

    Arquitectura de detección de actividades criminales basada en análisis de vídeo en tiempo real (Architecture for detecting criminal activities based on real-time video analysis)

    This doctoral thesis proposes an architecture for a system that detects criminal activities in video, applied to command and control systems for citizen security. The system is based on the Deep Learning technique Faster R-CNN and takes the novel approach of treating criminal actions such as street thefts as objects that can be identified in a video scene. The thesis presents the development of this application, which proves effective, and identifies a way to reduce the computational cost of frame-by-frame video analysis, achieving performance consistent with the frame rates produced by citizen video surveillance cameras. A possible implementation in the citizen security system of the National Police of Colombia is also studied.
Suárez Páez, JE. (2020). Arquitectura de detección de actividades criminales basada en análisis de vídeo en tiempo real [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/153162

    Exploring the challenges and opportunities of image processing and sensor fusion in autonomous vehicles: A comprehensive review

    Autonomous vehicles are at the forefront of future transportation solutions, but their success hinges on reliable perception. This review paper surveys image processing and sensor fusion techniques vital for ensuring vehicle safety and efficiency. The paper focuses on object detection, recognition, tracking, and scene comprehension via computer vision and machine learning methodologies. In addition, the paper explores challenges within the field, such as robustness in adverse weather conditions, the demand for real-time processing, and the integration of complex sensor data. Furthermore, we examine localization techniques specific to autonomous vehicles. The results show that while substantial progress has been made in each subfield, there are persistent limitations. These include a shortage of comprehensive large-scale testing, the absence of diverse and robust datasets, and occasional inaccuracies in certain studies. These issues impede the seamless deployment of this technology in real-world scenarios. This comprehensive literature review contributes to a deeper understanding of the current state and future directions of image processing and sensor fusion in autonomous vehicles, aiding researchers and practitioners in advancing the development of reliable autonomous driving systems.

    A Robust Object Detection System for Driverless Vehicles through Sensor Fusion and Artificial Intelligence Techniques

    Since the early 1990s, various research domains have been concerned with the concept of autonomous driving, leading to the widespread implementation of numerous advanced driver assistance features. However, fully automated vehicles have not yet been introduced to the market. The process of autonomous driving can be outlined through the following stages: environment perception, ego-vehicle localization, trajectory estimation, path planning, and vehicle control. Environment perception is partially based on computer vision algorithms that can detect and track surrounding objects. Object detection by autonomous vehicles is considered challenging for several reasons, such as the presence of multiple dynamic objects in the same scene, interaction between objects, real-time speed requirements, and diverse weather conditions (e.g., rain, snow, fog). Although many studies have addressed object detection for autonomous vehicles, it remains a challenging task, and improving detection performance in diverse driving scenes is an ongoing field of research. This thesis aims to develop novel methods for the detection and 3D localization of surrounding dynamic objects in driving scenes under different rainy weather conditions. Firstly, owing to the frequent occurrence of rain and its negative effect on object detection performance, a real-time lightweight deraining network is proposed; it processes each image individually. Rain streaks and the accumulation of rain streaks introduce distinct visual degradation effects to captured images. The proposed deraining network effectively removes both rain streaks and accumulated rain streaks from images. It operates progressively in two main stages: rain streak removal and rain streak accumulation removal.
The rain streak removal stage is based on a Residual Network (ResNet) to maintain real-time performance and avoid adding computational complexity. Furthermore, it applies recursive computations that share network parameters. Meanwhile, distant rain streaks accumulate and induce a distortion similar to fog, so this distortion can be mitigated in a way similar to defogging. The accumulation removal stage relies on a transmission-guided lightweight network (TGL-Net). The proposed deraining network was evaluated on five datasets with synthetic rain of different properties and two other datasets with real rainy scenes. Secondly, a novel sensory system is proposed that achieves real-time detection of multiple dynamic objects in driving scenes. The proposed sensory system uses a monocular camera and a 2D Light Detection and Ranging (LiDAR) sensor in a complementary fusion approach. YOLOv3, a baseline real-time object detection algorithm, is used to detect and classify objects in images captured by the camera; detected objects are enclosed in bounding boxes that localize them within the frames. Since objects in a driving scene are dynamic and often occlude each other, an algorithm has been developed to differentiate objects whose bounding boxes overlap. Moreover, the locations of bounding boxes within frames (in pixels) are converted into real-world angular coordinates. A 2D LiDAR was used to obtain depth measurements while keeping computational requirements low, saving resources for other autonomous driving operations. A novel technique has been developed and tested for processing 2D LiDAR measurements and mapping them to the corresponding bounding boxes. The detection accuracy of the proposed system was manually evaluated in different real-time scenarios.
Finally, the effectiveness of the proposed deraining network was validated in terms of its impact on object detection in de-rained images. The proposed deraining network was compared with existing baseline deraining networks: its running time is 2.23× faster than the average running time of the baselines, while achieving a 1.2× improvement when tested on different synthetic datasets. Moreover, tests on the LiDAR measurements showed an average error of ±0.04 m in real driving scenes. Deraining and object detection were also tested jointly, and performing deraining ahead of object detection yielded a 1.45× improvement in object detection precision.
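The conversion of bounding-box pixel locations into angular coordinates, and their pairing with 2D LiDAR ranges, can be sketched as follows. The abstract does not give the thesis's actual technique, so this sketch rests on stated assumptions: a pinhole camera with principal point `cx` and focal length `fx` (both in pixels), a LiDAR that is co-located and aligned with the camera, and a median over the beams inside the box. All function and parameter names are hypothetical.

```python
import math
from statistics import median


def pixel_to_bearing(u, cx, fx):
    """Horizontal angle (radians) of pixel column u under an assumed pinhole model."""
    return math.atan2(u - cx, fx)


def box_depth(box, scan, cx, fx):
    """Median LiDAR range among beams that fall inside the box's angular span.

    box  : (u_left, u_right) pixel columns of the bounding box
    scan : iterable of (bearing_rad, range_m) pairs, assumed to be in the camera frame
    Returns None when no valid beam hits the box.
    """
    lo = pixel_to_bearing(box[0], cx, fx)
    hi = pixel_to_bearing(box[1], cx, fx)
    hits = [r for a, r in scan if lo <= a <= hi and math.isfinite(r) and r > 0]
    return median(hits) if hits else None
```

Taking the median rather than the mean is a design choice of this sketch: it limits the influence of beams that pass beside the object and hit the background.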