DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors
Camera-based 3D object detectors are attractive because they are cheaper and easier to deploy than LiDAR sensors. We revisit the stereo volume construction used by the earlier stereo model DSGN to represent both 3D geometry and semantics. We refine this stereo modeling and propose our approach, DSGN++, which aims to improve information flow throughout the 2D-to-3D pipeline in three main aspects. First, to effectively lift 2D information into the stereo volume, we propose depth-wise plane sweeping (DPS), which allows denser connections and extracts depth-guided features. Second, to better capture features at different spatial spacings, we present a novel stereo volume, the Dual-view Stereo Volume (DSV), which integrates front-view and top-view features and reconstructs sub-voxel depth in the camera frustum. Third, since the foreground region becomes less dominant in 3D space, we propose the first multi-modal data-editing strategy, Stereo-LiDAR Copy-Paste, which ensures cross-modal alignment and improves data efficiency. Without bells and whistles, extensive experiments in various modality setups on the popular KITTI benchmark show that our method consistently outperforms other camera-based 3D detectors across all categories. Code will be released at https://github.com/chenyilun95/DSGN2.
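As a rough illustration of the plane-sweeping idea behind DPS, the sketch below warps right-image features onto a set of hypothesized depth planes and stacks them with left-image features to form a stereo volume. This is a generic plane-sweep construction under assumed rectified stereo inputs, not the authors' DPS implementation; all names (plane_sweep_volume, feat_l, feat_r) are illustrative.

```python
# Minimal plane-sweep stereo volume sketch (assumed rectified features).
import torch
import torch.nn.functional as F

def plane_sweep_volume(feat_l, feat_r, depths, focal, baseline):
    """feat_l, feat_r: (B, C, H, W). Returns a volume of shape (B, 2C, D, H, W)."""
    B, C, H, W = feat_l.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    xs = xs.float().to(feat_l.device)
    ys = ys.float().to(feat_l.device)
    slices = []
    for depth in depths:
        disp = focal * baseline / depth      # disparity of this depth plane
        x_src = xs - disp                    # corresponding right-image column
        # Normalize coordinates to [-1, 1] for grid_sample.
        gx = 2.0 * x_src / (W - 1) - 1.0
        gy = 2.0 * ys / (H - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
        warped = F.grid_sample(feat_r, grid, align_corners=True)
        slices.append(torch.cat((feat_l, warped), dim=1))
    return torch.stack(slices, dim=2)        # (B, 2C, D, H, W)
```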
RGB-D-based Stair Detection using Deep Learning for Autonomous Stair Climbing
Stairs are common building structures in urban environments, and stair
detection is an important part of environment perception for autonomous mobile
robots. Most existing algorithms have difficulty combining the visual information from binocular sensors effectively and ensuring reliable detection at night or when visual cues are extremely fuzzy. To solve these problems, we propose a neural network architecture with RGB and depth map inputs. Specifically, we design a selective module that enables the network to learn the complementary relationship between the RGB and depth maps and to combine their information effectively across different scenes. In addition, we design a line clustering algorithm for postprocessing the detection results, which makes full use of them to obtain the geometric stair parameters. Experiments on our dataset show that our method achieves better accuracy and recall than existing state-of-the-art deep learning methods, improving them by 5.64% and 7.97%, respectively, and it also detects extremely fast: a lightweight version achieves over 300 frames per second at the same resolution, meeting the needs of most real-time detection scenarios.
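The abstract does not specify the selective module's internals; one plausible minimal reading is SE-style channel gating that weighs the two modalities per channel, sketched below. The class and argument names are assumptions, not the paper's code.

```python
# Hypothetical selective RGB/depth fusion via channel gating.
import torch
import torch.nn as nn

class SelectiveFusion(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        # Per-channel weights decide how much of each modality to keep.
        w = self.gate(torch.cat((rgb_feat, depth_feat), dim=1))
        return w * rgb_feat + (1.0 - w) * depth_feat
```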
Fusion-layer-based machine vision for intelligent transportation systems
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 307-317). Environment understanding technology is vital for intelligent vehicles, which are expected to respond automatically to fast-changing environments and dangerous situations. To obtain perception abilities, we should automatically detect static and dynamic obstacles and obtain their related information, such as location, speed, collision/occlusion possibility, and other current and historic dynamic information. Conventional methods detect individual pieces of information independently, which is normally noisy and not very reliable. Instead, we propose a fusion-based, layered information-retrieval methodology to systematically detect obstacles and obtain their location and timing information from visible and infrared sequences. The proposed obstacle detection methodologies take advantage of the connections between different kinds of information and increase the accuracy of obstacle information estimation, thus improving environment understanding abilities and driving safety. By Yajun Fang. Ph.D.
Effects of Ground Manifold Modeling on the Accuracy of Stixel Calculations
This paper highlights the role of ground manifold modeling for stixel calculations; stixels are medium-level data representations used in the development of computer vision modules for self-driving cars. When single-disparity maps and simplified ground manifold models are used, calculated stixels may suffer from noise, inconsistency, and false detections of obstacles, especially on challenging datasets. Stixel calculations can be improved in accuracy and robustness by using more adaptive ground manifold approximations. A comparative study of stixel results obtained for different ground-manifold models (e.g., plane fitting, line fitting in v-disparity space, polynomial approximation, and graph cut) forms the main part of this paper. The paper also considers the use of trinocular stereo vision and shows that it provides options to enhance stixel results compared with binocular recording. Comprehensive experiments are performed on two publicly available challenging datasets. We also use a novel way of comparing calculated stixels with ground truth: we compare the depth information given by extracted stixels with ground-truth depth provided by a highly accurate LiDAR range sensor (available in one of the public datasets). We evaluate the accuracy of four different ground-manifold methods, and the experimental results include quantitative evaluations of the tradeoff between accuracy and run time. Overall, the proposed trinocular recording together with graph-cut estimation of ground manifolds appears to be the recommended configuration, also under challenging weather and lighting conditions.
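One of the compared ground-manifold models, line fitting in v-disparity space, admits a compact sketch: accumulate a per-row histogram of disparities and fit a line to the dominant bins, since the ground traces a slanted line in v-disparity. The function below is a generic least-squares variant with illustrative names, not the paper's implementation.

```python
# V-disparity ground line fitting sketch (assumes a dense disparity map
# `disp` with invalid pixels encoded as values <= 0).
import numpy as np

def fit_ground_v_disparity(disp, max_disp=128, min_votes=20):
    H, _ = disp.shape
    vdisp = np.zeros((H, max_disp), dtype=np.int32)
    d = np.clip(disp.astype(np.int32), 0, max_disp - 1)
    for v in range(H):
        valid = d[v][disp[v] > 0]
        np.add.at(vdisp[v], valid, 1)       # vote per (row, disparity) cell
    best = vdisp.argmax(axis=1)             # dominant disparity per row
    votes = vdisp.max(axis=1)
    rows = np.nonzero(votes >= min_votes)[0]  # skip weakly supported rows
    # Ground pixels trace a slanted line; fit d = a*v + b by least squares.
    a, b = np.polyfit(rows, best[rows], deg=1)
    return a, b   # expected ground disparity at image row v is a*v + b
```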
M3DSSD: Monocular 3D Single Stage Object Detector
In this paper, we propose a Monocular 3D Single Stage object Detector
(M3DSSD) with feature alignment and asymmetric non-local attention. Current
anchor-based monocular 3D object detection methods suffer from feature
mismatching. To overcome this, we propose a two-step feature alignment
approach. In the first step, the shape alignment is performed to enable the
receptive field of the feature map to focus on the pre-defined anchors with
high confidence scores. In the second step, the center alignment is used to
align the features at 2D/3D centers. Further, it is often difficult to learn
global information and capture long-range relationships, which are important
for the depth prediction of objects. Therefore, we propose a novel asymmetric
non-local attention block with multi-scale sampling to extract depth-wise
features. The proposed M3DSSD achieves significantly better performance than existing monocular 3D object detection methods on the KITTI dataset, in both the 3D object detection and bird's eye view tasks. Comment: Accepted to CVPR 2021.
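The center alignment step, resampling the feature map at each predicted object center so per-object features come from a consistent location, can be sketched generically as below. Tensor names and shapes are assumptions for illustration, not the authors' code.

```python
# Sketch of center feature alignment via bilinear resampling.
import torch
import torch.nn.functional as F

def align_at_centers(feat, centers):
    """feat: (B, C, H, W); centers: (B, N, 2) in pixel coords (x, y).

    Returns per-object features of shape (B, C, N)."""
    B, C, H, W = feat.shape
    gx = 2.0 * centers[..., 0] / (W - 1) - 1.0   # normalize x to [-1, 1]
    gy = 2.0 * centers[..., 1] / (H - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1).unsqueeze(2)         # (B, N, 1, 2)
    sampled = F.grid_sample(feat, grid, align_corners=True)   # (B, C, N, 1)
    return sampled.squeeze(-1)
```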
StairNetV3: Depth-aware Stair Modeling using Deep Learning
Vision-based stair perception can help autonomous mobile robots deal with the
challenge of climbing stairs, especially in unfamiliar environments. To address the problem that current monocular vision methods struggle to model stairs accurately without depth information, this paper proposes a depth-aware stair modeling method for monocular vision. Specifically, we treat the extraction of stair geometric features and the prediction of depth images as joint tasks in a convolutional neural network (CNN); with the designed information propagation architecture, depth information can effectively supervise the learning of stair geometric features. In addition, to complete the stair modeling, we take the convex lines, concave lines, tread surfaces, and riser surfaces as stair geometric features and apply Gaussian kernels to enable the network to predict contextual information within the stair lines. Combined with the depth information obtained from depth sensors, we propose a stair point cloud reconstruction method that can quickly obtain the point clouds belonging to the stair step surfaces. Experiments on our dataset show that our method improves significantly on the previous best monocular vision method, with an intersection-over-union (IoU) increase of 3.4%, and the lightweight version has a fast detection speed that meets the requirements of most real-time applications. Our dataset is available at https://data.mendeley.com/datasets/6kffmjt7g2/1.
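The generic step underlying any such point cloud reconstruction is back-projecting a depth image through the pinhole model; the paper's method would additionally restrict this to pixels on the predicted stair surfaces. The sketch below shows only the standard back-projection, with fx, fy, cx, cy as assumed camera intrinsics.

```python
# Back-project a depth image to a 3D point cloud (pinhole camera model).
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    H, W = depth.shape
    us, vs = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    z = depth
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    pts = np.stack((x, y, z), axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]   # drop invalid (zero-depth) pixels
```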
MS-DETR: Multispectral Pedestrian Detection Transformer with Loosely Coupled Fusion and Modality-Balanced Optimization
Multispectral pedestrian detection is an important task for many
around-the-clock applications, since the visible and thermal modalities can
provide complementary information, especially under low-light conditions. Most available multispectral pedestrian detectors are based on non-end-to-end detectors. In this paper, we propose the MultiSpectral pedestrian DEtection TRansformer (MS-DETR), an end-to-end multispectral pedestrian detector that extends DETR to multi-modal detection. MS-DETR consists of two modality-specific backbones and Transformer encoders, followed by a multi-modal Transformer decoder in which the visible and thermal features are fused. To resist misalignment between multi-modal images, we design a loosely coupled fusion strategy that sparsely samples keypoints from the multi-modal features independently and fuses them with adaptively learned attention weights. Moreover, based on the insight that not only different modalities but also different pedestrian instances contribute different levels of confidence to the final detection, we further propose an instance-aware modality-balanced optimization strategy, which preserves the visible and thermal decoder branches and aligns their predicted slots through an instance-wise dynamic loss. Our end-to-end MS-DETR shows superior performance
on the challenging KAIST, CVC-14 and LLVIP benchmark datasets. The source code
is available at https://github.com/YinghuiXing/MS-DETR
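A deformable-attention-style reading of the loosely coupled fusion is sketched below: each query predicts a few sampling offsets per modality, gathers keypoint features from the visible and thermal maps independently, and mixes them with learned softmax weights. Shapes and module names are illustrative assumptions, not the released code.

```python
# Sketch of loosely coupled multispectral fusion (deformable-attention style).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LooselyCoupledFusion(nn.Module):
    def __init__(self, dim, num_points=4):
        super().__init__()
        self.num_points = num_points
        # Per modality: sampling offsets and attention weights from the query.
        self.offsets = nn.Linear(dim, 2 * 2 * num_points)  # 2 modalities, 2D offsets
        self.weights = nn.Linear(dim, 2 * num_points)

    def forward(self, query, ref_points, feat_vis, feat_thm):
        """query: (B, N, C); ref_points: (B, N, 2) in [-1, 1];
        feat_vis / feat_thm: (B, C, H, W). Returns fused features (B, N, C)."""
        B, N, C = query.shape
        P = self.num_points
        off = self.offsets(query).view(B, N, 2, P, 2)
        attn = self.weights(query).view(B, N, 2 * P).softmax(dim=-1)
        out = query.new_zeros(B, N, C)
        for m, feat in enumerate((feat_vis, feat_thm)):
            # Keypoints are sampled from each modality independently.
            grid = (ref_points.unsqueeze(2) + off[:, :, m]).clamp(-1, 1)
            samp = F.grid_sample(feat, grid, align_corners=True)  # (B, C, N, P)
            w = attn[:, :, m * P:(m + 1) * P]                     # (B, N, P)
            out = out + torch.einsum("bcnp,bnp->bnc", samp, w)
        return out
```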
Augmented reality meeting table: a novel multi-user interface for architectural design
Immersive virtual environments have received widespread attention as possible replacements for the media and systems that designers traditionally use and, more generally, as support for collaborative work. Relatively little attention has been given to date, however, to the problem of how to merge immersive virtual environments into real-world work settings, and so to add to the media at the disposal of the designer and the design team rather than to replace them. In this paper we report on a research project in which optical see-through augmented reality displays have been developed together with prototype decision support software for architectural and urban design. We suggest that a critical characteristic of multi-user augmented reality is its ability to generate visualisations from a first-person perspective in which the scale of rendition of the design model follows many of the conventions that designers are used to. Different scales of model appear to allow designers to focus on different aspects of the design under consideration. Augmenting the scene with simulations of pedestrian movement appears to assist both in scale recognition and in moving from a first-person to a third-person understanding of the design. This research project is funded by the European Commission IST programme (IST-2000-28559).
Identifying and Tracking Pedestrians Based on Sensor Fusion and Motion Stability Predictions
The lack of trustworthy sensors makes the development of Advanced Driver Assistance System (ADAS) applications a tough task. It is necessary to develop intelligent systems that combine reliable sensors with real-time algorithms to send proper, accurate messages to drivers. In this article, an application to detect and predict the movement of pedestrians, in order to prevent an imminent collision, has been developed and tested under real conditions. The proposed application first measures the position of obstacles accurately using a two-sensor hybrid fusion approach: a stereo camera vision system and a laser scanner. Second, it identifies pedestrians using intelligent algorithms based on polylines and pattern recognition related to leg positions (laser subsystem) and on dense disparity maps and u-v disparity (vision subsystem). Third, it uses statistical validation gates and confidence regions to track pedestrians within the detection zones of the sensors and to predict their positions in the upcoming frames. The intelligent sensor application has been tested experimentally with success while tracking pedestrians that cross and move in a zigzag fashion in front of a vehicle.
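A statistical validation gate of the kind described is typically a chi-square test on the squared Mahalanobis distance between a measurement and a track's predicted measurement. A minimal sketch follows, assuming a Kalman-style tracker that supplies the predicted measurement z_pred and innovation covariance S; the names are illustrative.

```python
# Chi-square validation gate for track-measurement association.
import numpy as np

CHI2_GATE_2D = 9.21  # 99% gate for a 2-D measurement (chi-square, 2 dof)

def in_validation_gate(z, z_pred, S):
    """z, z_pred: (2,) measurement and prediction; S: (2, 2) innovation cov."""
    innov = z - z_pred
    d2 = innov @ np.linalg.solve(S, innov)   # squared Mahalanobis distance
    return d2 <= CHI2_GATE_2D
```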