    Semantic 3D Occupancy Mapping through Efficient High Order CRFs

    Semantic 3D mapping can be used for many applications, such as robot navigation and virtual interaction. In recent years there has been great progress in semantic segmentation and geometric 3D mapping; however, it is still challenging to combine the two for accurate, large-scale semantic mapping from images. In this paper, we propose an incremental and (near) real-time semantic mapping system. A 3D scrolling occupancy grid map is built to represent the world; it is memory- and computationally efficient and remains bounded for large-scale environments. We use the CNN segmentation as a prior prediction and further optimize the 3D grid labels through a novel CRF model. Superpixels are used to enforce smoothness and to form robust P^N high-order potentials. An efficient mean-field inference is developed for the graph optimization. We evaluate our system on the KITTI dataset and improve segmentation accuracy by 10% over existing systems. Comment: IROS 2017.
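
    For intuition, the label optimization can be pictured as mean-field updates on a per-cell label distribution initialized from the CNN unaries. The sketch below implements only a plain pairwise Potts mean-field step, not the paper's superpixel P^N high-order potentials; all names are illustrative assumptions, not the authors' code.

        import numpy as np

        def mean_field_potts(unary, neighbors, w_pair=1.0, n_iters=5):
            """Mean-field inference for a pairwise Potts CRF over grid cells.

            unary:     (N, L) negative log label probabilities from the CNN.
            neighbors: neighbors[i] is an index array of cells adjacent to i.
            Returns the (N, L) approximate marginals Q after n_iters sweeps.
            """
            Q = np.exp(-unary)
            Q /= Q.sum(axis=1, keepdims=True)      # initialize from the unaries
            for _ in range(n_iters):
                for i in range(len(Q)):
                    # Potts term: cost grows with the expected number of
                    # neighbors that disagree on each label
                    msg = w_pair * (len(neighbors[i]) - Q[neighbors[i]].sum(axis=0))
                    q = np.exp(-unary[i] - msg)
                    Q[i] = q / q.sum()             # renormalize the marginal
            return Q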

    Network Uncertainty Informed Semantic Feature Selection for Visual SLAM

    In order to facilitate long-term localization using a visual simultaneous localization and mapping (SLAM) algorithm, careful feature selection can help ensure that reference points persist over long durations and that the runtime and storage complexity of the algorithm remain consistent. We present SIVO (Semantically Informed Visual Odometry and Mapping), a novel information-theoretic feature selection method for visual SLAM which incorporates semantic segmentation and neural network uncertainty into the feature selection pipeline. Our algorithm selects the points which provide the highest reduction in Shannon entropy between the entropy of the current state and the joint entropy of the state given the addition of the new feature, combined with the classification entropy of the feature from a Bayesian neural network. Each selected feature significantly reduces the uncertainty of the vehicle state and has repeatedly been detected, with high confidence, to lie on a static object (building, traffic sign, etc.). This selection strategy generates a sparse map which can facilitate long-term localization. The KITTI odometry dataset is used to evaluate our method, and we also compare our results against ORB_SLAM2. Overall, SIVO performs comparably to the baseline method while reducing the map size by almost 70%. Comment: Published in 2019 16th Conference on Computer and Robot Vision (CRV).
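
    As a rough illustration of this criterion: for a Gaussian state estimate the Shannon entropy has a closed form in the covariance, so a candidate feature can be scored by the state entropy it removes, net of its semantic uncertainty. The sketch below is one minimal reading of the abstract; the function names, the pre/post-update covariance pair, and the threshold are assumptions rather than SIVO's actual implementation.

        import numpy as np

        def gaussian_entropy(cov):
            """Shannon entropy (nats) of a Gaussian with covariance cov."""
            n = cov.shape[0]
            return 0.5 * (n * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

        def feature_score(state_cov, updated_cov, class_probs):
            """Entropy reduction from adding one feature, discounted by the
            classification entropy of the Bayesian network's prediction."""
            h_class = -np.sum(class_probs * np.log(class_probs + 1e-12))
            return gaussian_entropy(state_cov) - gaussian_entropy(updated_cov) - h_class

        def select_features(state_cov, candidates, threshold=0.0):
            """candidates: list of (updated_cov, class_probs) per feature."""
            return [i for i, (uc, cp) in enumerate(candidates)
                    if feature_score(state_cov, uc, cp) > threshold]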

    Visual Semantic SLAM with Landmarks for Large-Scale Outdoor Environment

    Semantic SLAM is an important field in autonomous driving and intelligent agents; it can enable robots to achieve high-level navigation tasks, obtain simple cognition or reasoning ability, and support language-based human-robot interaction. In this paper, we built a system that creates a semantic 3D map for large-scale environments by combining the 3D point cloud from ORB SLAM with semantic segmentation information from the convolutional neural network model PSPNet-101. In addition, a new dataset for the KITTI sequences has been built, containing GPS information and labels of landmarks from Google Maps for the streets covered by the sequences. Moreover, we present a way to associate real-world landmarks with the point cloud map and build a topological map on top of the semantic map. Comment: Accepted by the 2019 China Symposium on Cognitive Computing and Hybrid Intelligence (CCHI'19).
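
    One straightforward way to fuse per-frame segmentation into an ORB SLAM point cloud, in the spirit of the abstract, is to project each map point into segmented keyframes and accumulate label votes. The sketch below is an assumed association step, not the authors' pipeline; all names are illustrative.

        import numpy as np

        def vote_point_labels(points_w, T_cw, K, seg_labels, votes):
            """Accumulate semantic label votes for map points from one keyframe.

            points_w:   (N, 3) map points in world coordinates.
            T_cw:       (4, 4) world-to-camera transform of the keyframe.
            K:          (3, 3) camera intrinsics.
            seg_labels: (H, W) per-pixel class ids (e.g. from PSPNet-101).
            votes:      (N, n_classes) running vote counts, updated in place.
            """
            H, W = seg_labels.shape
            pts_c = (T_cw[:3, :3] @ points_w.T + T_cw[:3, 3:4]).T  # camera frame
            uvw = (K @ pts_c.T).T
            u = (uvw[:, 0] / uvw[:, 2]).astype(int)          # pinhole projection
            v = (uvw[:, 1] / uvw[:, 2]).astype(int)
            ok = (pts_c[:, 2] > 0.1) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
            for i in np.where(ok)[0]:
                votes[i, seg_labels[v[i], u[i]]] += 1        # majority voting
            return votes.argmax(axis=1)                      # current best labels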

    Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving

    We propose a stereo vision-based approach for tracking camera ego-motion and 3D semantic objects in dynamic autonomous driving scenarios. Instead of directly regressing the 3D bounding box with end-to-end approaches, we propose to use easy-to-label 2D detections and discrete viewpoint classification together with a lightweight semantic inference method to obtain rough 3D object measurements. Based on object-aware camera pose tracking, which is robust in dynamic environments, combined with our novel dynamic object bundle adjustment (BA) approach that fuses temporal sparse feature correspondences with the semantic 3D measurement model, we obtain 3D object pose, velocity, and anchored dynamic point cloud estimates with instance accuracy and temporal consistency. The performance of our proposed method is demonstrated in diverse scenarios. Both the ego-motion estimation and the object localization are compared with state-of-the-art solutions. Comment: 14 pages, 9 figures, ECCV 2018.
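
    As one illustration of how a 2D detection can already yield a rough 3D measurement: under a pinhole model, object depth follows from the detection's pixel height and a class-level size prior. The sketch below shows only this simplest ingredient; the paper's viewpoint-classification-based inference is considerably richer, and the prior value here is an assumption.

        import numpy as np

        def rough_depth(f_pixels, box_height_px, prior_height_m=1.5):
            """Depth from box height: Z ~ f * H_prior / h_box.
            prior_height_m is an assumed class-level prior (e.g. a car)."""
            return f_pixels * prior_height_m / box_height_px

        def rough_center(K, box_center_uv, depth):
            """Back-project the 2D box center at the estimated depth to a
            rough 3D object position in the camera frame."""
            uv1 = np.array([box_center_uv[0], box_center_uv[1], 1.0])
            return depth * (np.linalg.inv(K) @ uv1)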

    SIVO: Semantically Informed Visual Odometry and Mapping

    Accurate localization is a requirement for any autonomous mobile robot. In recent years, cameras have proven to be a reliable, cheap, and effective sensor for achieving this goal. Visual simultaneous localization and mapping (SLAM) algorithms determine camera motion by tracking the motion of reference points in the scene. These references must be static, as well as viewpoint, scale, and rotation invariant, to ensure accurate localization. This is especially important for long-term robot operation, where references must remain stable over long durations and points must be selected carefully to keep the runtime and storage complexity of the algorithm bounded as the robot navigates through its environment. In this thesis, we present SIVO (Semantically Informed Visual Odometry and Mapping), a novel feature selection method for visual SLAM which incorporates machine learning and neural network uncertainty into an information-theoretic approach to feature selection. The emergence of deep learning techniques has led to remarkable advances in scene understanding, and our method supplements traditional visual SLAM with this contextual knowledge. Our algorithm selects points which provide significant information to reduce the uncertainty of the state estimate, while ensuring that the feature has repeatedly been detected, with high confidence, to be a static object. This is done by evaluating the reduction in Shannon entropy between the current state entropy and the joint entropy of the state given the addition of the new feature, combined with the classification entropy of the feature from a Bayesian neural network. Our method is evaluated against ORB SLAM2 and the ground truth of the KITTI odometry dataset. Overall, SIVO performs comparably to ORB SLAM2 (an average difference of 0.17% in translation error and 6.2 × 10⁻⁵ deg/m in rotation error) while removing 69% of the map points on average. As the selected reference points lie on static objects (buildings, traffic signs, etc.), the map generated by our algorithm is suitable for long-term localization.
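
    The thesis attributes the classification entropy to a Bayesian neural network. A common way such networks expose that uncertainty is Monte Carlo dropout, i.e. averaging several stochastic forward passes; the sketch below illustrates the entropy computation under that assumption and is not code from the thesis.

        import numpy as np

        def mc_dropout_entropy(prob_samples):
            """Entropy of the mean predictive distribution over T dropout passes.

            prob_samples: (T, L) softmax outputs for one feature's pixel.
            """
            p_mean = prob_samples.mean(axis=0)
            return -np.sum(p_mean * np.log(p_mean + 1e-12))

        # A confidently classified pixel vs. an ambiguous one (toy numbers)
        confident = np.tile([0.96, 0.02, 0.02], (10, 1))
        ambiguous = np.tile([0.40, 0.35, 0.25], (10, 1))
        print(mc_dropout_entropy(confident))   # low entropy: keep the feature
        print(mc_dropout_entropy(ambiguous))   # high entropy: reject it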

    Neural Network based Robot 3D Mapping and Navigation using Depth Image Camera

    Robotics research has developed rapidly in the past decade. However, bringing robots into household or office environments, where they must cooperate well with humans, still requires further research. One of the main problems is robot localization and navigation: to accomplish its missions, a mobile robot needs to localize itself in the environment, find the best path, and navigate to the goal. Navigation methods can be categorized into map-based and map-less navigation. In this research we propose a neural network-based method that uses a depth image camera to solve the robot navigation problem. With a depth camera, the surrounding environment can be recognized regardless of lighting conditions, and a neural network-based approach is fast enough for real-time navigation, which is important for fully autonomous robots.

    In our method, the robot maps and annotates the surrounding environment using a feed-forward neural network and a CNN. The resulting 3D map contains not only the geometric information of the environment but also its semantic content, which robots need to accomplish their tasks. For instance, consider the task "Go to the cabinet to take a medicine": the robot needs to know the positions of the cabinet and the medicine, which a purely geometric map does not supply. The feed-forward neural network is trained to convert the depth information from depth images into 3D points in real-world coordinates, and the CNN is trained to segment the image into classes. By combining the two networks, the objects in the environment are segmented and their positions are determined.

    We implemented the proposed method on a mobile humanoid robot. The robot first moves through the environment and builds the 3D map with objects placed at their positions, then uses the resulting map for goal-directed navigation.

    The experimental results show good performance in terms of 3D map accuracy and robot navigation. Most objects in the working environments are classified by the trained CNN, and unrecognized objects are classified by the feed-forward neural network. As a result, the generated maps accurately reflect the working environments and allow robots to navigate safely within them. The 3D geometric maps can be generated regardless of lighting conditions, and the proposed localization method is robust even in texture-less environments, which are the hardest for vision-based localization. Doctor of Engineering, Hosei University.
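
    The depth-to-3D conversion that the thesis delegates to a feed-forward network has a closed pinhole form when the camera intrinsics are known; the sketch below shows that target mapping for reference, with illustrative parameter names.

        import numpy as np

        def depth_to_points(depth_m, fx, fy, cx, cy):
            """Back-project a depth image into camera-frame 3D points.

            depth_m: (H, W) depth in metres. Returns (H, W, 3) XYZ.
            """
            H, W = depth_m.shape
            u, v = np.meshgrid(np.arange(W), np.arange(H))
            x = (u - cx) * depth_m / fx        # pinhole inverse projection
            y = (v - cy) * depth_m / fy
            return np.stack([x, y, depth_m], axis=-1)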