
    Fusion of 3D LIDAR and Camera Data for Object Detection in Autonomous Vehicle Applications

    It is critical for an autonomous vehicle to acquire accurate, real-time information about the objects in its vicinity, which is essential to guarantee the safety of the passengers and the vehicle in various environments. 3D LIDAR can directly obtain the position and geometrical structure of objects within its detection range, while a vision camera is well suited to object recognition. Accordingly, this paper presents a novel object detection and identification method that fuses the complementary information of the two kinds of sensors. We first utilize the 3D LIDAR data to generate accurate object-region proposals efficiently. These candidates are then mapped into the image space, where the regions of interest (ROI) of the proposals are selected and fed to a convolutional neural network (CNN) for further object recognition. In order to identify objects of all sizes precisely, we combine the features of the last three layers of the CNN to extract multi-scale features of the ROIs. The evaluation results on the KITTI dataset demonstrate that: (1) unlike sliding windows, which produce thousands of candidate object-region proposals, 3D LIDAR provides an average of 86 real candidates per frame and the minimum recall rate is higher than 95%, which greatly lowers the proposal extraction time; (2) the average processing time per frame of the proposed method is only 66.79 ms, which meets the real-time demand of autonomous vehicles; (3) the average identification accuracies of our method for cars and pedestrians on the moderate difficulty level are 89.04% and 78.18% respectively, outperforming most previous methods.
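
    A minimal sketch of the proposal-mapping step described above: a 3D LIDAR object cluster is projected into the image and its bounding box becomes the ROI passed to the CNN. The 3x4 projection matrix P (LIDAR frame to pixel coordinates) and all names are illustrative assumptions, not the paper's implementation.

    # Sketch: project one LIDAR object proposal into the image and derive an ROI.
    import numpy as np

    def cluster_to_roi(points_xyz, P, img_w, img_h):
        """points_xyz: (N, 3) LIDAR points of one object proposal; P: (3, 4)."""
        pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # homogeneous coords
        uvw = pts_h @ P.T                                               # project to image plane
        uv = uvw[:, :2] / uvw[:, 2:3]                                   # perspective divide
        u_min, v_min = np.clip(uv.min(axis=0), 0, [img_w - 1, img_h - 1])
        u_max, v_max = np.clip(uv.max(axis=0), 0, [img_w - 1, img_h - 1])
        return int(u_min), int(v_min), int(u_max), int(v_max)           # ROI bounding box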

    A Voting Algorithm for Dynamic Object Identification and Pose Estimation

    While object identification enables autonomous vehicles to detect and recognize objects from real-time images, pose estimation further enhances their capability to navigate in a dynamically changing environment. This thesis proposes an approach which makes use of keypoint features from 3D object models for recognition and pose estimation of dynamic objects in the context of self-driving vehicles. A voting technique is developed to select, from the repository of 3D models, the model that best matches the dynamic objects in the input image. The matching is done based on the identified keypoints on the image and the keypoints corresponding to each template model stored in the repository. A confidence score is then assigned to measure the confidence with which the system can confirm the presence of the matched object in the input image. Because pedestrians are dynamic objects with complex structure, human models from the COCO-DensePose dataset, along with the DensePose deep-learning model developed by the Facebook research team, have been adopted and integrated into the system for 3D pose estimation of pedestrians on the road. Additionally, object tracking is performed to find the speed and location details for each of the recognized dynamic objects from consecutive image frames of the input video. This research demonstrates with experimental results that the use of 3D object models enhances the confidence of recognition and pose estimation of dynamic objects in the real-time input image. The 3D pose information of the recognized dynamic objects, along with their corresponding speed and location information, would help the autonomous navigation system of self-driving cars to take appropriate navigation decisions, thus ensuring smooth and safe driving.
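
    A minimal sketch of the voting idea: each template model receives a vote for every image keypoint whose descriptor matches it under a ratio test, the model with the most votes is selected, and the vote share serves as a confidence score. The matching details are illustrative assumptions, not the thesis implementation.

    # Sketch: vote across the 3D-model repository using keypoint descriptor matches.
    import numpy as np

    def vote_for_model(image_desc, model_descs, ratio=0.75):
        """image_desc: (N, D) descriptors; model_descs: {name: (M, D) array, M >= 2}."""
        votes = {name: 0 for name in model_descs}
        for d in image_desc:
            for name, descs in model_descs.items():
                dist = np.linalg.norm(descs - d, axis=1)
                best, second = np.partition(dist, 1)[:2]
                if best < ratio * second:          # Lowe-style ratio test
                    votes[name] += 1
        best_model = max(votes, key=votes.get)
        confidence = votes[best_model] / max(len(image_desc), 1)
        return best_model, confidence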

    Scene categorization with multiscale category-specific visual words

    We propose a novel scene categorization method based on multiscale category-specific visual words. The novelty of the proposed method lies in two aspects: (1) visual words are quantized in a multiscale manner that combines the global-feature-based and local-feature-based scene categorization approaches into a uniform framework; (2) unlike traditional visual word creation methods, which quantize visual words from the entire set of training images, we form visual words from the training images grouped in different categories and then collate visual words from different categories to form the final codebook. This generation strategy is capable of enhancing the discriminative ability of the visual words, which is useful for achieving better classification performance. The proposed method is evaluated over two scene classification data sets with 8 and 13 scene categories, respectively. The experimental results show that the classification performance is significantly improved by using the multiscale category-specific visual words over that achieved by using the traditional visual words. Moreover, the proposed method is comparable with the best methods reported in previous literature in terms of classification accuracy rate (88.81% and 85.05% accuracy rates for data sets 1 and 2, respectively) and has the advantage of simplicity. © 2009 Society of Photo Optical Instrumentation Engineers.
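
    A minimal sketch of the category-specific codebook construction: local descriptors are clustered separately per scene category, and the per-category visual words are collated into the final codebook. Feature extraction, cluster counts, and names are placeholder assumptions.

    # Sketch: build a collated codebook from per-category visual words.
    import numpy as np
    from sklearn.cluster import KMeans

    def build_codebook(features_by_category, words_per_category=50):
        """features_by_category: {category: (N_i, D) array of local descriptors}."""
        words = []
        for category, feats in features_by_category.items():
            km = KMeans(n_clusters=words_per_category, n_init=10).fit(feats)
            words.append(km.cluster_centers_)      # visual words for this category
        return np.vstack(words)                    # final collated codebook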

    Lidar-based Obstacle Detection and Recognition for Autonomous Agricultural Vehicles

    Today, agricultural vehicles are available that can drive autonomously and follow exact route plans more precisely than human operators. Combined with advancements in precision agriculture, autonomous agricultural robots can reduce manual labor, improve workflow, and optimize yield. However, as of today, human operators are still required for monitoring the environment and acting upon potential obstacles in front of the vehicle. To eliminate this need, safety must be ensured by accurate and reliable obstacle detection and avoidance systems. In this thesis, lidar-based obstacle detection and recognition in agricultural environments has been investigated. A rotating multi-beam lidar generating 3D point clouds was used for point-wise classification of agricultural scenes, while multi-modal fusion with cameras and radar was used to increase performance and robustness. Two research perception platforms were presented and used for data acquisition. The proposed methods were all evaluated on recorded datasets that represented a wide range of realistic agricultural environments and included both static and dynamic obstacles. For 3D point cloud classification, two methods were proposed for handling density variations during feature extraction. One method outperformed a frequently used generic 3D feature descriptor, whereas the other method showed promising preliminary results using deep learning on 2D range images. For multi-modal fusion, four methods were proposed for combining lidar with color camera, thermal camera, and radar. Gradual improvements in classification accuracy were seen as spatial, temporal, and multi-modal relationships were introduced in the models. Finally, occupancy grid mapping was used to fuse and map detections globally, and runtime obstacle detection was applied on mapped detections along the vehicle path, thus simulating an actual traversal. The proposed methods serve as a first step towards full autonomy for agricultural vehicles. The study has thus shown that recent advancements in autonomous driving can be transferred to the agricultural domain, when accurate distinctions are made between obstacles and processable vegetation. Future research in the domain has further been facilitated with the release of the multi-modal obstacle dataset, FieldSAFE.
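
    A minimal sketch of the final fusion step mentioned above: obstacle detections from successive frames are accumulated into a global 2D occupancy grid via log-odds updates. Grid size, resolution, and update weights are illustrative assumptions, not values from the thesis.

    # Sketch: log-odds occupancy grid for fusing mapped obstacle detections.
    import numpy as np

    class OccupancyGrid:
        def __init__(self, size_m=100.0, cell_m=0.25, l_occ=0.85, l_free=-0.4):
            n = int(size_m / cell_m)
            self.log_odds = np.zeros((n, n))
            self.cell_m, self.origin = cell_m, size_m / 2.0
            self.l_occ, self.l_free = l_occ, l_free

        def update(self, obstacle_xy, free_xy):
            """obstacle_xy / free_xy: (N, 2) arrays of world coordinates in metres."""
            for pts, delta in ((obstacle_xy, self.l_occ), (free_xy, self.l_free)):
                idx = ((pts + self.origin) / self.cell_m).astype(int)
                self.log_odds[idx[:, 1], idx[:, 0]] += delta    # log-odds accumulation

        def occupied(self, threshold=0.5):
            return 1.0 / (1.0 + np.exp(-self.log_odds)) > threshold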

    Organising and structuring a visual diary using visual interest point detectors

    As wearable cameras become more popular, researchers are increasingly focusing on novel applications to manage the large volume of data these devices produce. One such application is the construction of a Visual Diary from an individual’s photographs. Microsoft’s SenseCam, a device designed to passively record a Visual Diary and cover a typical day of the user wearing the camera, is an example of one such device. The vast quantity of images generated by these devices means that the management and organisation of these collections is not a trivial matter. We believe wearable cameras, such as SenseCam, will become more popular in the future and the management of the volume of data generated by these devices is a key issue. Although there is a significant volume of work in the literature in the object detection and recognition and scene classification fields, there is little work in the area of setting detection. Furthermore, few authors have examined the issues involved in analysing extremely large image collections (like a Visual Diary) gathered over a long period of time. An algorithm developed for setting detection should be capable of clustering images captured at the same real world locations (e.g. in the dining room at home, in front of the computer in the office, in the park, etc.). This requires the selection and implementation of suitable methods to identify visually similar backgrounds in images using their visual features. We present a number of approaches to setting detection based on the extraction of visual interest point detectors from the images. We also analyse the performance of two of the most popular descriptors - Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF). We present an implementation of a Visual Diary application and evaluate its performance via a series of user experiments. Finally, we also outline some techniques to allow the Visual Diary to automatically detect new settings, to scale as the image collection continues to grow substantially over time, and to allow the user to generate a personalised summary of their data.
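
    A minimal sketch of the similarity measure that setting detection can be built on: the number of good SIFT matches between two images as a proxy for "captured in the same real-world setting". The ratio threshold and any clustering layered on top are illustrative assumptions, not the thesis implementation.

    # Sketch: pairwise setting similarity from SIFT descriptor matches (OpenCV).
    import cv2

    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher(cv2.NORM_L2)

    def setting_similarity(img_a, img_b, ratio=0.7):
        _, desc_a = sift.detectAndCompute(img_a, None)
        _, desc_b = sift.detectAndCompute(img_b, None)
        if desc_a is None or desc_b is None:
            return 0
        matches = matcher.knnMatch(desc_a, desc_b, k=2)
        good = [m for m, n in matches if m.distance < ratio * n.distance]
        return len(good)     # higher count -> more likely the same setting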

    Deep learning-based vessel detection from very high and medium resolution optical satellite images as component of maritime surveillance systems

    This thesis presents an end-to-end multiclass vessel detection method from optical satellite images. The proposed workflow covers the complete processing chain and involves rapid image enhancement techniques, the fusion with automatic identification system (AIS) data, and the detection algorithm based on convolutional neural networks (CNN). The algorithms presented are implemented in the form of independent software processors and integrated in an automated processing chain as part of the Earth Observation Maritime Surveillance System (EO-MARISS). [Translated from the German abstract:] This work presents a method for detecting vessels of different classes in optical satellite images. It is structured into three consecutive functions: i) image processing to improve image properties, ii) data fusion with Automatic Identification System (AIS) data, and iii) a detection algorithm based on convolutional neural networks (CNN). The presented algorithms were implemented as independent software processors and integrated as part of the maritime Earth observation system.
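
    A minimal sketch of one way the AIS fusion step can work: each CNN vessel detection, given as geographic coordinates, is associated with the nearest AIS report if one lies within a distance gate; unmatched detections are flagged as non-reporting targets. Field names and the 500 m gate are assumptions, not details from the thesis.

    # Sketch: associate vessel detections with AIS reports by geographic distance.
    import math

    def haversine_m(lat1, lon1, lat2, lon2):
        r = 6371000.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def match_ais(detections, ais_reports, gate_m=500.0):
        """detections: [(lat, lon)]; ais_reports: [(mmsi, lat, lon)]."""
        matched = []
        for lat, lon in detections:
            best = min(ais_reports, key=lambda r: haversine_m(lat, lon, r[1], r[2]),
                       default=None)
            if best and haversine_m(lat, lon, best[1], best[2]) <= gate_m:
                matched.append((lat, lon, best[0]))    # cooperative (reporting) vessel
            else:
                matched.append((lat, lon, None))       # non-reporting target
        return matched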

    Object detection, recognition and classification using computer vision and artificial intelligence approaches

    Object detection and recognition have been used extensively in recent years to solve numerous challenges in different fields. Due to the vital roles they play, object detection and recognition have enabled quantum leaps in many industry fields by helping to overcome serious challenges and obstacles. For example, worldwide security concerns have drawn attention to, and stimulated the use of, highly intelligent computer vision technology to provide security in different environments and in diverse terrains. In addition, some wildlife is at present exposed to danger and extinction worldwide. Therefore, early detection and recognition of potential threats to wildlife have become essential and timely. The use of computer vision and artificial intelligence to make a seemingly insecure world more secure has been widely accepted. Such technologies are used in monitoring, tracking, organising, and analysing objects in a scene, and for countless other purposes. [Continues.]

    Multi-Modal Learning For Adaptive Scene Understanding

    Modern robotics systems typically possess sensors of different modalities. Segmenting scenes observed by the robot into a discrete set of classes is a central requirement for autonomy. Equally, when a robot navigates through an unknown environment, it is often necessary to adjust the parameters of the scene segmentation model to maintain the same level of accuracy in changing situations. This thesis explores efficient means of adaptive semantic scene segmentation in an online setting with the use of multiple sensor modalities. First, we devise a novel conditional random field (CRF) inference method for scene segmentation that incorporates global constraints, enforcing particular sets of nodes to be assigned the same class label. To do this efficiently, the CRF is formulated as a relaxed quadratic program whose maximum a posteriori (MAP) solution is found using a gradient-based optimization approach. These global constraints are useful, since they can encode "a priori" information about the final labeling. This new formulation also reduces the dimensionality of the original image-labeling problem. The proposed model is employed in an urban street scene understanding task. Camera data is used for the CRF-based semantic segmentation, while global constraints are derived from 3D laser point clouds. Second, an approach to learn CRF parameters without the need for manually labeled training data is proposed. The model parameters are estimated by optimizing a novel loss function using self-supervised reference labels, obtained based on the information from camera and laser with a minimum amount of human supervision. Third, an approach is proposed that can conduct the parameter optimization while increasing the model robustness to non-stationary data distributions over long trajectories. We adopt stochastic gradient descent to achieve this goal, using a learning rate that can appropriately grow or diminish to gain adaptability to changes in the data distribution.
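
    A minimal sketch of MAP inference via a relaxed quadratic program: the discrete labelling is relaxed to a row-stochastic matrix Q (one probability row per node) and optimised by projected gradient descent on a unary-plus-smoothness energy. The potentials, the Potts-style pairwise term, and the crude simplex projection are illustrative assumptions, not the thesis formulation (which also handles global same-label constraints).

    # Sketch: relaxed QP MAP inference by projected gradient descent.
    import numpy as np

    def relaxed_map(unary, adjacency, lam=1.0, steps=200, lr=0.05):
        """unary: (N, K) label costs; adjacency: (N, N) edge weights (0 = no edge)."""
        n, k = unary.shape
        q = np.full((n, k), 1.0 / k)                          # uniform initialisation
        deg = adjacency.sum(axis=1, keepdims=True)
        for _ in range(steps):
            smooth_grad = 2.0 * lam * (deg * q - adjacency @ q)   # smoothness gradient
            q -= lr * (unary + smooth_grad)                   # gradient step on the energy
            q = np.clip(q, 1e-9, None)
            q /= q.sum(axis=1, keepdims=True)                 # project back onto the simplex
        return q.argmax(axis=1)                               # discretise the relaxed MAP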

    Machine vision for UAS ground operations: using semantic segmentation with a Bayesian network classifier

    This paper discusses the machine vision element of a system designed to allow an Unmanned Aerial System (UAS) to perform automated taxiing around civil aerodromes using only a monocular camera. The purpose of the computer vision system is to provide direct sensor data which can be used to validate vehicle position, in addition to detecting potential collision risks. In practice, untrained clustering is used to segment the visual feed before descriptors of each cluster (primarily colour and texture) are used to estimate the class. As the competency of each individual estimate can vary depending on multiple factors (number of pixels, lighting conditions, and even surface type), a Bayesian network is used to perform probabilistic data fusion in order to improve the classification results. The resulting system is shown to perform accurate image segmentation in real-world conditions, providing information viable for localisation and obstacle detection.
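
    A minimal sketch of the per-cluster fusion idea: each segment gets independent class likelihoods from its colour and texture descriptors, and a naive-Bayes-style combination (a simple stand-in for the paper's Bayesian network) weighs them into a single posterior. The likelihood models, priors, and weights are assumptions for illustration only.

    # Sketch: fuse colour and texture evidence into a per-cluster class posterior.
    import numpy as np

    def fuse_cluster_class(colour_lik, texture_lik, prior,
                           colour_weight=1.0, texture_weight=1.0):
        """All inputs are length-K vectors over the class set (e.g. grass, tarmac, ...)."""
        log_post = (np.log(prior)
                    + colour_weight * np.log(colour_lik)
                    + texture_weight * np.log(texture_lik))
        post = np.exp(log_post - log_post.max())      # stabilise before normalising
        return post / post.sum()                      # posterior over classes for this cluster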