639 research outputs found

    DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors

    Camera-based 3D object detectors are attractive because cameras are more widely deployed and cheaper than LiDAR sensors. We revisit the prior stereo model DSGN and its construction of stereo volumes for representing both 3D geometry and semantics. We refine the stereo modeling and propose DSGN++, which aims to improve information flow throughout the 2D-to-3D pipeline in three main aspects. First, to effectively lift the 2D information to the stereo volume, we propose depth-wise plane sweeping (DPS), which allows denser connections and extracts depth-guided features. Second, to better capture differently spaced features, we present a novel stereo volume -- Dual-view Stereo Volume (DSV) -- that integrates front-view and top-view features and reconstructs sub-voxel depth in the camera frustum. Third, as the foreground region becomes less dominant in 3D space, we propose, for the first time, a multi-modal data editing strategy -- Stereo-LiDAR Copy-Paste -- which ensures cross-modal alignment and improves data efficiency. Without bells and whistles, extensive experiments in various modality setups on the popular KITTI benchmark show that our method consistently outperforms other camera-based 3D detectors for all categories. Code will be released at https://github.com/chenyilun95/DSGN2

    RGB-D-based Stair Detection using Deep Learning for Autonomous Stair Climbing

    Stairs are common building structures in urban environments, and stair detection is an important part of environment perception for autonomous mobile robots. Most existing algorithms have difficulty combining the visual information from binocular sensors effectively and ensuring reliable detection at night or when visual cues are extremely blurred. To solve these problems, we propose a neural network architecture with RGB and depth map inputs. Specifically, we design a selective module that lets the network learn the complementary relationship between the RGB map and the depth map and combine their information effectively in different scenes. In addition, we design a line clustering algorithm for postprocessing the detection results, which makes full use of those results to obtain the geometric stair parameters. Experiments on our dataset show that our method improves accuracy and recall over existing state-of-the-art deep learning methods by 5.64% and 7.97%, respectively, while also detecting extremely fast. A lightweight version achieves more than 300 frames per second at the same resolution, which meets the needs of most real-time detection scenes.

    Fusion-layer-based machine vision for intelligent transportation systems

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 307-317). Environment understanding technology is vital for intelligent vehicles, which are expected to respond automatically to fast-changing environments and dangerous situations. To obtain perception abilities, we should automatically detect static and dynamic obstacles and obtain their related information, such as locations, speed, collision/occlusion possibility, and other dynamic current/historic information. Conventional methods detect each piece of information independently, and the results are normally noisy and not very reliable. Instead, we propose a fusion-based and layer-based information-retrieval methodology to systematically detect obstacles and obtain their location/timing information for visible and infrared sequences. The proposed obstacle detection methodologies take advantage of the connections between different kinds of information and increase the accuracy of obstacle information estimation, thus improving environment understanding abilities and driving safety. By Yajun Fang. Ph.D.

    Effects of Ground Manifold Modeling on the Accuracy of Stixel Calculations

    This paper highlights the role of ground manifold modeling for stixel calculations; stixels are medium-level data representations used for the development of computer vision modules for self-driving cars. When single-disparity maps and simplified ground manifold models are used, calculated stixels may suffer from noise, inconsistency, and high false-detection rates for obstacles, especially in challenging datasets. Stixel calculations can be improved with respect to accuracy and robustness by using more adaptive ground manifold approximations. A comparative study of stixel results, obtained for different ground-manifold models (e.g., plane-fitting, line-fitting in v-disparities or polynomial approximation, and graph cut), forms the main part of this paper. This paper also considers the use of trinocular stereo vision and shows that this provides options to enhance stixel results, compared with binocular recording. Comprehensive experiments are performed on two publicly available challenging datasets. We also use a novel way of comparing calculated stixels with ground truth: we compare depth information, as given by extracted stixels, with ground-truth depth provided by a highly accurate LiDAR range sensor (as available in one of the public datasets). We evaluate the accuracy of four different ground-manifold methods. The experimental results also include quantitative evaluations of the tradeoff between accuracy and run time. As a result, the proposed trinocular recording together with graph-cut estimation of ground manifolds appears to be the recommended approach, also considering challenging weather and lighting conditions.
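
    The line-fitting ground model named in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `v_disparity` and `fit_ground_line`, the integer-disparity binning, and the per-row peak selection are all assumptions made for illustration. The idea is to histogram disparities per image row (the v-disparity map) and fit a straight line through the dominant disparity of each row.

```python
import numpy as np

def v_disparity(disparity, max_disp):
    """Row-wise histogram of disparities, shape (rows, max_disp).

    Assumption: disparities are binned by truncation to integers, and
    values outside [0, max_disp) are treated as invalid and ignored.
    """
    rows = disparity.shape[0]
    hist = np.zeros((rows, max_disp), dtype=np.int64)
    for v in range(rows):
        d = disparity[v]
        valid = d[(d >= 0) & (d < max_disp)].astype(np.int64)
        hist[v] = np.bincount(valid, minlength=max_disp)
    return hist

def fit_ground_line(hist, min_votes=1):
    """Least-squares line d = slope * v + intercept through per-row peaks.

    Assumption: the ground is the most frequent disparity in each row.
    """
    v = np.arange(hist.shape[0])
    peaks = hist.argmax(axis=1)
    keep = hist.max(axis=1) >= min_votes  # skip rows with too few votes
    slope, intercept = np.polyfit(v[keep], peaks[keep], deg=1)
    return slope, intercept

# Synthetic scene: planar ground with d = 2 * v + 1, plus an obstacle
# occupying 10 of 30 columns at a constant disparity of 50.
rows, cols = 20, 30
disp = np.tile((2 * np.arange(rows) + 1)[:, None], (1, cols)).astype(float)
disp[:, :10] = 50.0
slope, intercept = fit_ground_line(v_disparity(disp, max_disp=64))
```

    A single straight line only models a planar road, which is the kind of simplification the abstract contrasts with more adaptive ground manifold approximations.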

    M3DSSD: Monocular 3D Single Stage Object Detector

    In this paper, we propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention. Current anchor-based monocular 3D object detection methods suffer from feature mismatching. To overcome this, we propose a two-step feature alignment approach. In the first step, shape alignment enables the receptive field of the feature map to focus on the pre-defined anchors with high confidence scores. In the second step, center alignment aligns the features at the 2D/3D centers. Further, it is often difficult to learn global information and capture long-range relationships, which are important for predicting the depth of objects. Therefore, we propose a novel asymmetric non-local attention block with multi-scale sampling to extract depth-wise features. The proposed M3DSSD achieves significantly better performance than other monocular 3D object detection methods on the KITTI dataset, in both 3D object detection and bird's eye view tasks. Comment: Accepted to CVPR 202

    StairNetV3: Depth-aware Stair Modeling using Deep Learning

    Vision-based stair perception can help autonomous mobile robots deal with the challenge of climbing stairs, especially in unfamiliar environments. Current monocular vision methods struggle to model stairs accurately without depth information; to address this, this paper proposes a depth-aware stair modeling method for monocular vision. Specifically, we treat the extraction of stair geometric features and the prediction of depth images as joint tasks in a convolutional neural network (CNN). With the designed information propagation architecture, depth information effectively supervises the learning of stair geometric features. In addition, to complete the stair modeling, we take the convex lines, concave lines, tread surfaces and riser surfaces as stair geometric features and apply Gaussian kernels to enable the network to predict contextual information within the stair lines. Combined with the depth information obtained by depth sensors, we propose a stair point cloud reconstruction method that can quickly extract the point clouds belonging to the stair step surfaces. Experiments on our dataset show that our method significantly improves on the previous best monocular vision method, with an intersection over union (IOU) increase of 3.4%, and the lightweight version has a fast detection speed and can meet the requirements of most real-time applications. Our dataset is available at https://data.mendeley.com/datasets/6kffmjt7g2/1
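
    For readers unfamiliar with the metric, intersection over union can be sketched as follows for two binary masks. This is a generic illustration rather than the paper's evaluation code: the function name `mask_iou` and the convention that two empty masks score 1.0 are assumptions, and the abstract does not state how masks are derived from the predicted stair features.

```python
import numpy as np

def mask_iou(pred, target):
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)

# One overlapping pixel out of three covered pixels gives IoU = 1/3.
example_iou = mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```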

    MS-DETR: Multispectral Pedestrian Detection Transformer with Loosely Coupled Fusion and Modality-Balanced Optimization

    Multispectral pedestrian detection is an important task for many around-the-clock applications, since the visible and thermal modalities can provide complementary information, especially under low-light conditions. Most available multispectral pedestrian detectors are built on non-end-to-end detectors. In this paper, we propose the MultiSpectral pedestrian DEtection TRansformer (MS-DETR), an end-to-end multispectral pedestrian detector that extends DETR to multi-modal detection. MS-DETR consists of two modality-specific backbones and Transformer encoders, followed by a multi-modal Transformer decoder in which the visible and thermal features are fused. To resist misalignment between multi-modal images, we design a loosely coupled fusion strategy that sparsely samples keypoints from the multi-modal features independently and fuses them with adaptively learned attention weights. Moreover, based on the insight that not only different modalities but also different pedestrian instances tend to have different confidence scores for the final detection, we further propose an instance-aware modality-balanced optimization strategy, which preserves visible and thermal decoder branches and aligns their predicted slots through an instance-wise dynamic loss. Our end-to-end MS-DETR shows superior performance on the challenging KAIST, CVC-14 and LLVIP benchmark datasets. The source code is available at https://github.com/YinghuiXing/MS-DETR

    Augmented reality meeting table: a novel multi-user interface for architectural design

    Immersive virtual environments have received widespread attention as possible replacements for the media and systems that designers traditionally use and, more generally, as support for collaborative work. Relatively little attention has been given to date, however, to the problem of how to merge immersive virtual environments into real-world work settings, and so to add to the media at the disposal of the designer and the design team rather than to replace them. In this paper we report on a research project in which optical see-through augmented reality displays have been developed together with prototype decision support software for architectural and urban design. We suggest that a critical characteristic of multi-user augmented reality is its ability to generate visualisations from a first-person perspective in which the scale of rendition of the design model follows many of the conventions that designers are used to. Different scales of model appear to allow designers to focus on different aspects of the design under consideration. Augmenting the scene with simulations of pedestrian movement appears to assist both in scale recognition and in moving from a first-person to a third-person understanding of the design. This research project is funded by the European Commission IST program (IST-2000-28559).

    Identifying and Tracking Pedestrians Based on Sensor Fusion and Motion Stability Predictions

    The lack of trustworthy sensors makes the development of Advanced Driver Assistance System (ADAS) applications a tough task. It is necessary to develop intelligent systems by combining reliable sensors and real-time algorithms to send proper, accurate messages to drivers. In this article, an application that detects and predicts the movement of pedestrians in order to prevent an imminent collision has been developed and tested under real conditions. The proposed application, first, accurately measures the position of obstacles using a two-sensor hybrid fusion approach: a stereo camera vision system and a laser scanner. Second, it correctly identifies pedestrians using intelligent algorithms based on polylines and pattern recognition related to leg positions (laser subsystem) and dense disparity maps and u-v disparity (vision subsystem). Third, it uses statistical validation gates and confidence regions to track the pedestrian within the detection zones of the sensors and predict their position in the upcoming frames. The intelligent sensor application has been experimentally tested with success while tracking pedestrians that cross and move in a zigzag fashion in front of a vehicle.
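
    A statistical validation gate of the kind mentioned in this abstract is commonly implemented as a chi-square test on the squared Mahalanobis distance between a measurement and the tracker's prediction. The sketch below shows that common form only; the article's actual gate, state vector, and thresholds are not given here, so the function name `in_validation_gate`, the 2D position measurement, and the 95% gate are assumptions.

```python
import numpy as np

# 95% quantile of the chi-square distribution with 2 degrees of freedom,
# i.e. -2 * ln(0.05). Assumption: 2D position measurements and a 95% gate.
CHI2_95_2DOF = 5.991

def in_validation_gate(measurement, predicted, innovation_cov, gate=CHI2_95_2DOF):
    """Return (accepted, d2), where d2 is the squared Mahalanobis distance
    between the measurement and the predicted measurement."""
    nu = np.asarray(measurement, dtype=float) - np.asarray(predicted, dtype=float)
    d2 = float(nu @ np.linalg.solve(np.asarray(innovation_cov, dtype=float), nu))
    return d2 <= gate, d2

# Hypothetical numbers: predicted pedestrian position (metres) with a
# 0.5 m standard deviation on each axis.
predicted = [10.0, 2.0]
cov = np.diag([0.25, 0.25])
near_ok, near_d2 = in_validation_gate([10.5, 2.0], predicted, cov)
far_ok, far_d2 = in_validation_gate([12.0, 2.0], predicted, cov)
```

    Measurements that fall outside the gate are not associated with the track, which keeps stray laser or stereo detections from pulling the predicted pedestrian position off course.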