
    3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection

    Cameras are a crucial exteroceptive sensor for self-driving cars: they are low-cost and small, provide appearance information about the environment, and work in various weather conditions. They can be used for multiple purposes such as visual navigation and obstacle detection. We can use a surround multi-camera system to cover the full 360-degree field of view around the car. In this way, we avoid blind spots, which can otherwise lead to accidents. To minimize the number of cameras needed for surround perception, we utilize fisheye cameras. Consequently, standard vision pipelines for 3D mapping, visual localization, obstacle detection, etc. need to be adapted to take full advantage of the availability of multiple cameras rather than treat each camera individually. In addition, processing of fisheye images has to be supported. In this paper, we describe the camera calibration and subsequent processing pipeline for multi-fisheye-camera systems developed as part of the V-Charge project. This project seeks to enable automated valet parking for self-driving cars. Our pipeline is able to precisely calibrate multi-camera systems, build sparse 3D maps for visual navigation, visually localize the car with respect to these maps, generate accurate dense maps, and detect obstacles based on real-time depth map extraction.

    Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

    Recently, Convolutional Neural Networks (ConvNets) have shown promising performance in many computer vision tasks, especially image-based recognition. How to effectively use ConvNets for video-based recognition is still an open problem. In this paper, we propose a compact, effective, yet simple method to encode the spatio-temporal information carried in 3D skeleton sequences into multiple 2D images, referred to as Joint Trajectory Maps (JTM), and ConvNets are adopted to exploit the discriminative features for real-time human action recognition. The proposed method has been evaluated on three public benchmarks, i.e., the MSRC-12 Kinect gesture dataset (MSRC-12), the G3D dataset, and the UTD multimodal human action dataset (UTD-MHAD), and achieved state-of-the-art results.
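
The idea of turning a 3D skeleton sequence into several 2D images can be illustrated with a minimal sketch. The abstract does not specify the projection planes or the encoding of time, so everything below is an assumption made for illustration: the function name `joint_trajectory_maps` is hypothetical, the three axis-aligned planes are one plausible choice of "multiple 2D images", and a single normalized time stamp per pixel stands in for whatever colour encoding the paper actually uses.

```python
def joint_trajectory_maps(sequence, size=32):
    """Rasterize a skeleton sequence onto three axis-aligned planes.

    sequence: non-empty list of frames; each frame is a list of (x, y, z)
    joint positions. Returns a dict mapping a plane name to a size x size
    grid of floats, where 0.0 means "never visited" and larger values mean
    "visited later in the sequence" (illustrative time encoding).
    """
    points = [p for frame in sequence for p in frame]
    lo = [min(p[k] for p in points) for k in range(3)]
    hi = [max(p[k] for p in points) for k in range(3)]

    def to_pixel(value, axis):
        # Map a coordinate to a pixel index in [0, size - 1].
        span = hi[axis] - lo[axis]
        if span == 0:
            return 0
        return min(size - 1, int((value - lo[axis]) / span * (size - 1) + 0.5))

    # Each plane keeps two of the three coordinates (assumed projection).
    planes = {"xy": (0, 1), "yz": (1, 2), "xz": (0, 2)}
    maps = {name: [[0.0] * size for _ in range(size)] for name in planes}

    n_frames = len(sequence)
    for t, frame in enumerate(sequence):
        stamp = (t + 1) / n_frames  # later frames get larger values
        for joint in frame:
            for name, (a, b) in planes.items():
                col = to_pixel(joint[a], a)
                row = to_pixel(joint[b], b)
                maps[name][row][col] = stamp
    return maps


# Usage: one joint moving diagonally in the x-y plane over two frames.
demo = joint_trajectory_maps([[(0, 0, 0)], [(1, 1, 0)]], size=4)
```

Each resulting grid could then be fed to an image classifier, which is the role the abstract assigns to the ConvNets; the classifier itself is outside the scope of this sketch.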

    Guided Filtering based Pyramidal Stereo Matching for Unrectified Images

    Stereo matching deals with recovering quantitative depth information from a set of input images, based on the visual disparity between corresponding points. Most algorithms assume that the input images are rectified. As robotics becomes popular, conducting stereo matching in the context of cloth manipulation, such as obtaining the disparity map of garments from the two cameras of a cloth-folding robot, is useful and challenging. The challenge arises because the application (e.g. cloth folding) demands high efficiency, high accuracy, and low memory use while processing high-resolution images that capture fine details (e.g. cloth wrinkles). Meanwhile, the images can be unrectified. Therefore, we propose to adapt the guided filtering algorithm into a pyramidal stereo matching framework that works directly on unrectified images. To evaluate the accuracy of the proposed unrectified stereo matching, we present three datasets suited to the characteristics of cloth manipulation tasks. By comparing the proposed algorithm with two baseline algorithms on those three datasets, we demonstrate that our proposed approach is accurate, efficient, and requires little memory. This also shows that, rather than relying on image rectification, applying stereo matching directly to the unrectified images can be both effective and efficient.
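
For readers unfamiliar with guided filtering, the following is a minimal pure-Python sketch of the standard guided filter on small grayscale images stored as lists of lists. It is not the authors' implementation: the abstract does not say how the filter is integrated into the pyramid or applied to unrectified pairs, and the helper names `box_mean` and `guided_filter`, the radius `r`, and the regularizer `eps` are assumptions chosen for illustration. In stereo matching, a filter of this kind is commonly used to smooth matching costs with the reference image as the guide, so that smoothing stops at image edges.

```python
def box_mean(img, r):
    """Mean of each pixel's (2r+1) x (2r+1) window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = sum(img[j][i] for j in range(y0, y1) for i in range(x0, x1))
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


def guided_filter(guide, src, r=1, eps=1e-3):
    """Filter `src` so that its edges follow those of `guide`.

    Fits a local linear model q = a * guide + b in every window, then
    averages the coefficients of all windows covering each pixel.
    """
    h, w = len(guide), len(guide[0])

    def combine(f, *imgs):
        # Apply f element-wise across images of identical size.
        return [[f(*(im[y][x] for im in imgs)) for x in range(w)] for y in range(h)]

    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    corr_ip = box_mean(combine(lambda i, p: i * p, guide, src), r)
    corr_ii = box_mean(combine(lambda i: i * i, guide), r)

    # a = cov(guide, src) / (var(guide) + eps); b = mean_p - a * mean_i
    a = combine(lambda cip, cii, mi, mp: (cip - mi * mp) / (cii - mi * mi + eps),
                corr_ip, corr_ii, mean_i, mean_p)
    b = combine(lambda av, mi, mp: mp - av * mi, a, mean_i, mean_p)

    mean_a = box_mean(a, r)
    mean_b = box_mean(b, r)
    return combine(lambda ma, mb, i: ma * i + mb, mean_a, mean_b, guide)


# Usage: a constant input stays constant regardless of the guide.
guide = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
flat = [[5.0] * 3 for _ in range(3)]
smoothed = guided_filter(guide, flat, r=1, eps=1e-3)
```

The window means are computed naively here for readability; a practical implementation would use integral images or a library box filter so that the cost per pixel does not depend on `r`.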