
    Monocular 3D Object Detection via Ego View-to-Bird’s Eye View Translation

    The advanced development of autonomous agents such as self-driving cars can be attributed to computer vision, a branch of artificial intelligence that enables software to understand the content of images and videos. These autonomous agents require a three-dimensional model of their surroundings in order to operate reliably in the real world. Despite the significant progress of 2D object detectors, they have a critical limitation in location-sensitive applications because they do not provide accurate physical information about objects in 3D space. 3D object detection is a promising topic that can provide relevant solutions and improve existing 2D-based applications. Owing to advances in deep learning methods and relevant datasets, the task of 3D scene understanding has evolved greatly in the past few years. 3D object detection and localization are crucial in autonomous driving tasks such as obstacle avoidance, path planning and motion control. Traditionally, there have been successful methods for 3D object detection, but they rely on highly expensive 3D LiDAR sensors for accurate depth information. On the other hand, 3D object detection from single monocular images is inexpensive but less accurate. The primary reason for this disparity in performance is that monocular image-based methods attempt to infer 3D information from 2D images. In this work, we try to bridge the performance gap observed with single-image input by introducing different mapping strategies between the 2D image data and its corresponding 3D representation, and use them to perform object detection in 3D. The performance of the proposed method is evaluated on the popular KITTI 3D object detection benchmark dataset.
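    The abstract does not spell out the mapping it uses; as a rough illustration of what an ego-view-to-bird's-eye-view translation involves, the sketch below back-projects a single pixel with an assumed depth estimate into camera coordinates and bins its ground-plane position into a BEV grid. The intrinsics, grid resolution and example values are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical camera intrinsics (fx, fy, cx, cy); KITTI-like values are assumed.
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])

def pixel_to_bev(u, v, depth, bev_res=0.1, bev_size=(500, 500)):
    """Back-project an image pixel with an estimated depth into camera space
    and discretize its (x, z) ground-plane position into a BEV grid cell."""
    xyz = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))  # camera coordinates
    x, z = xyz[0], xyz[2]                                     # lateral, forward
    col = int(x / bev_res + bev_size[1] / 2)                  # centre the ego vehicle laterally
    row = int(bev_size[0] - z / bev_res)                      # forward axis points up in the grid
    return row, col

# Example: a pixel at (650, 200) with an estimated depth of 12.5 m
print(pixel_to_bev(650, 200, 12.5))
```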

    An Approach of Features Extraction and Heatmaps Generation Based upon CNNs and 3D Object Models

    The rapid advancement of artificial intelligence has enabled recent progress in self-driving vehicles. However, the dependence on 3D object models and annotations collected and owned by individual companies has become a major problem for the development of new algorithms. This thesis proposes an approach of directly using graphics models created from open-source datasets as the virtual representation of real-world objects. The approach uses machine learning techniques to extract 3D feature points and to create annotations from graphics models for the recognition of dynamic objects, such as cars, and for the verification of stationary and variable objects, such as buildings and trees. Moreover, it generates heat maps for the elimination of stationary/variable objects from real-time images before working on the recognition of dynamic objects. The proposed approach helps to bridge the gap between the virtual and physical worlds and to facilitate the development of new algorithms for self-driving vehicles.
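    The thesis does not give implementation details for its heat maps; a minimal sketch of one common way to render a heat map from 2D keypoints (for example, feature points projected from a graphics model), using Gaussian kernels, is shown below. The keypoint coordinates and image size are illustrative assumptions.

```python
import numpy as np

def keypoint_heatmap(keypoints, height, width, sigma=4.0):
    """Render a heat map with a Gaussian peak at each 2D keypoint location."""
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for (x, y) in keypoints:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)   # keep the strongest response per pixel
    return heatmap

# Illustrative keypoints projected from a graphics model onto a 120x160 image
hm = keypoint_heatmap([(40, 30), (100, 80)], height=120, width=160)
print(hm.shape, hm.max())
```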

    Software Porting of a 3D Reconstruction Algorithm to Razorcam Embedded System on Chip

    A method is presented to calculate depth information for a UAV navigation system from keypoints in two consecutive image frames, using a monocular camera sensor as input and the OpenCV library. The method was first implemented in software and run on a general-purpose Intel CPU, then ported to the RazorCam Embedded Smart-Camera System and run on an ARM CPU onboard the Xilinx Zynq-7000. The results of performance and accuracy testing of the software implementation are then shown and analyzed, demonstrating a successful port of the software to the RazorCam embedded system on chip that could potentially be used onboard a UAV with tight constraints on size, weight, and power. The potential impact will be seen through the continuation of this research in the Smart ES lab at the University of Arkansas.
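    The abstract names OpenCV but not the specific calls, so the following is only a plausible sketch of the kind of two-frame pipeline described: match keypoints between consecutive frames, recover the relative pose from the essential matrix, and triangulate the matches. It assumes a calibrated monocular camera (intrinsics K); monocular depth is only recovered up to an unknown scale.

```python
import cv2
import numpy as np

def depth_from_two_frames(img1, img2, K):
    """Estimate relative camera motion and sparse depths from two consecutive
    grayscale frames of a calibrated monocular camera (scale is ambiguous)."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    # Triangulate matches in the first camera's frame; depths are the Z coordinates.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    depths = pts4d[2] / pts4d[3]
    return R, t, depths
```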

    Single and multiple stereo view navigation for planetary rovers

    © Cranfield University. This thesis deals with the challenge of autonomous navigation for the ExoMars rover. The absence of global positioning systems (GPS) in space, added to the limitations of wheel odometry, makes autonomous navigation based on these two techniques alone, as done in the literature, an unviable solution and necessitates other approaches. That, among other reasons, motivates this work to rely solely on visual data to solve the robot's egomotion problem. The homogeneity of Mars' terrain makes the robustness of the low-level image processing techniques a critical requirement. In the first part of the thesis, novel solutions are presented to tackle this specific problem. Detecting features that are robust to illumination changes and matching and associating them uniquely are sought-after capabilities. A solution for robustness of features against illumination variation is proposed, combining Harris corner detection with a moment image representation: the former provides efficient feature detection, while the moment images add the necessary brightness invariance. Moreover, a bucketing strategy is used to guarantee that features are homogeneously distributed within the images. Then, the addition of local feature descriptors guarantees the unique identification of image cues. In the second part, reliable and precise motion estimation for the Mars rover is studied. A number of successful approaches are thoroughly analysed. Visual Simultaneous Localisation And Mapping (VSLAM) is investigated, proposing enhancements and integrating it with the robust feature methodology. Then, linear and nonlinear optimisation techniques are explored, and alternative photogrammetry reprojection concepts are tested. Lastly, data fusion techniques are proposed to deal with the integration of multiple stereo view data. The proposed robust visual scheme allows good feature repeatability; because of this, dimensionality reduction of the feature data can be applied without compromising the overall performance of the proposed motion estimation solutions. The developed egomotion techniques have been extensively validated using both simulated and real data collected at ESA-ESTEC facilities. Multiple stereo view solutions for robot motion estimation are also introduced, presenting interesting benefits. The obtained results prove the innovative methods presented here to be accurate and reliable approaches capable of solving the egomotion problem in a Mars environment.
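    The moment image representation is the thesis's own contribution and is not reproduced here; the sketch below only illustrates the plain Harris detection and bucketing steps mentioned in the abstract, keeping the strongest corners per grid cell so that features are spread evenly over the image. The grid size, Harris parameters and per-cell budget are illustrative assumptions.

```python
import cv2
import numpy as np

def bucketed_harris(gray, grid=(4, 4), per_cell=25):
    """Detect Harris corners and keep only the strongest responses per grid cell,
    so that features are distributed homogeneously over the image."""
    response = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    h, w = gray.shape
    keypoints = []
    for gy in range(grid[0]):
        for gx in range(grid[1]):
            y0, y1 = gy * h // grid[0], (gy + 1) * h // grid[0]
            x0, x1 = gx * w // grid[1], (gx + 1) * w // grid[1]
            cell = response[y0:y1, x0:x1]
            # indices of the strongest responses inside this cell
            idx = np.argsort(cell, axis=None)[-per_cell:]
            ys, xs = np.unravel_index(idx, cell.shape)
            keypoints.extend(zip(xs + x0, ys + y0))
    return keypoints
```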

    Contributions to camera pose estimation in industrial augmented reality applications

    Augmented Reality (AR) aims to complement the visual perception of the user's environment by superimposing virtual elements. The main challenge of this technology is to combine the virtual and real worlds in a precise and natural way. To achieve this goal, estimating the user's position and orientation in both worlds at all times is a crucial task. Currently, there are numerous techniques and algorithms for camera pose estimation; however, the use of synthetic square markers has become the fastest, most robust and simplest solution in these cases. In this scope, a large number of marker detection systems have been developed. Nevertheless, most of them present some limitations: (1) their unattractive and non-customizable visual appearance prevents their use in industrial products, and (2) the detection rate is drastically reduced in the presence of noise, blurring and occlusions. This doctoral dissertation addresses the above-mentioned limitations. First, a comparison is made between the different marker detection systems currently available in the literature, emphasizing the limitations of each. Second, a novel approach is developed to design, detect and track customized markers capable of easily adapting to the visual limitations of commercial products. Third, a method that combines the detection of black-and-white square markers with keypoints and contours is implemented to estimate the camera pose in AR applications. The main motivation of this work is to offer a versatile alternative (based on contours and keypoints) in cases where, due to noise, blurring or occlusions, it is not possible to identify markers in the images. Finally, a method for the reconstruction and semantic segmentation of 3D objects using square markers in photogrammetry processes is presented.
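    The customized marker design is the dissertation's contribution and is not reproduced here; as a baseline for the kind of pipeline the abstract describes, the sketch below detects standard black-and-white square (ArUco) markers with OpenCV's ArucoDetector API (OpenCV 4.7+) and recovers the camera pose from one marker's corners via solvePnP. The intrinsics, distortion coefficients, dictionary and marker size are illustrative assumptions.

```python
import cv2
import numpy as np

# Hypothetical calibration and marker size; real values come from camera calibration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)
marker_len = 0.05  # marker side length in metres

# 3D corners of a square marker in its own coordinate frame (TL, TR, BR, BL)
obj_pts = np.array([[-marker_len / 2,  marker_len / 2, 0],
                    [ marker_len / 2,  marker_len / 2, 0],
                    [ marker_len / 2, -marker_len / 2, 0],
                    [-marker_len / 2, -marker_len / 2, 0]], dtype=np.float32)

def pose_from_marker(gray):
    """Detect ArUco markers and recover the camera pose from the first one found."""
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())
    corners, ids, _ = detector.detectMarkers(gray)
    if ids is None:
        return None
    img_pts = corners[0].reshape(4, 2).astype(np.float32)
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist)
    return (rvec, tvec) if ok else None
```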

    3D Human Body Pose-Based Activity Recognition for Driver Monitoring Systems


    Design and Evaluation of Data Dissemination Algorithms to Improve Object Detection in Autonomous Driving Networks

    In the last few years, the amount of information produced by an autonomous vehicle has been increasing in proportion to the number and resolution of the sensors that cars are equipped with. Cars can be provided with cameras and Light Detection and Ranging (LiDAR) sensors, needed to obtain two-dimensional (2D) and three-dimensional (3D) representations of the environment, respectively. Because of the huge amount of data that multiple self-driving vehicles can push over a communication network, how these data are selected, stored, and sent is crucial. Various techniques have been developed to manage vehicular data; for example, compression can be used to alleviate the burden of data transmission over bandwidth-constrained channels and facilitate real-time communications. However, aggressive levels of compression may corrupt automotive data and prevent proper detection of critical road objects in the scene. Along these lines, in this thesis we studied the trade-off between compression efficiency and detection accuracy. To do so, we considered synthetic automotive data generated from the SELMA dataset and compared the performance of several state-of-the-art machine learning algorithms for compressing and detecting objects in LiDAR point clouds. We were able to reduce the point cloud by tens to hundreds of times without any significant loss in the final detection accuracy. In a second phase, we focused on optimizing the number and type of sensors that are most meaningful to object detection. Notably, we tested our dataset on a sensor fusion algorithm that can combine both 2D and 3D data to obtain a better understanding of the environment. The results show that, although sensor fusion always achieves more accurate detections, using 3D inputs alone can obtain similar results for large objects while mitigating the burden on the channel.
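    The thesis compares state-of-the-art machine-learning compressors that are not reproduced here; as a much simpler illustration of the compression/fidelity trade-off it studies, the sketch below applies voxel-grid downsampling to a point cloud, where larger voxels give higher compression at the cost of geometric detail. The voxel size and the random test cloud are illustrative assumptions.

```python
import numpy as np

def voxel_downsample(points, voxel_size=0.2):
    """Reduce a point cloud by keeping one representative point (the mean)
    per occupied voxel; larger voxels give higher compression, lower fidelity."""
    voxel_idx = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    # group points by voxel and average them
    _, inverse, counts = np.unique(voxel_idx, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points[:, :3])
    return sums / counts[:, None]

# Illustrative cloud: 100k random points in a 100 m x 100 m x 4 m volume
cloud = np.random.rand(100_000, 3) * [100, 100, 4]
compressed = voxel_downsample(cloud, voxel_size=0.5)
print(len(cloud), "->", len(compressed), "points")
```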