DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors
Camera-based 3D object detectors are attractive because they are cheaper and easier to deploy than LiDAR sensors. We revisit the stereo volume construction used by the earlier stereo model DSGN to represent both 3D geometry and semantics. We refine this stereo modeling and propose our approach, DSGN++, which aims to improve information flow throughout the 2D-to-3D pipeline in three main aspects. First, to effectively lift 2D information into the stereo volume, we propose depth-wise plane sweeping (DPS), which allows denser connections and extracts depth-guided features. Second, to better capture features at different spatial spacings, we present a novel stereo volume, the Dual-view Stereo Volume (DSV), which integrates front-view and top-view features and reconstructs sub-voxel depth in the camera frustum. Third, since the foreground region becomes less dominant in 3D space, we propose the first multi-modal data-editing strategy, Stereo-LiDAR Copy-Paste, which ensures cross-modal alignment and improves data efficiency. Without bells and whistles, extensive experiments in various modality setups on the popular KITTI benchmark show that our method consistently outperforms other camera-based 3D detectors across all categories. Code will be released at https://github.com/chenyilun95/DSGN2.
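As a rough illustration of the plane-sweeping idea behind DPS, the sketch below warps right-image features onto a set of hypothesized depth planes and stacks them with left-image features to form a stereo volume. This is a generic plane-sweep construction under assumed rectified stereo inputs, not the authors' DPS implementation; all names (plane_sweep_volume, feat_l, feat_r) are illustrative.

```python
# Minimal plane-sweep stereo volume sketch (assumed rectified features).
import torch
import torch.nn.functional as F

def plane_sweep_volume(feat_l, feat_r, depths, focal, baseline):
    """feat_l, feat_r: (B, C, H, W). Returns a volume of shape (B, 2C, D, H, W)."""
    B, C, H, W = feat_l.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    xs = xs.float().to(feat_l.device)
    ys = ys.float().to(feat_l.device)
    slices = []
    for depth in depths:
        disp = focal * baseline / depth      # disparity of this depth plane
        x_src = xs - disp                    # corresponding right-image column
        # Normalize coordinates to [-1, 1] for grid_sample.
        gx = 2.0 * x_src / (W - 1) - 1.0
        gy = 2.0 * ys / (H - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
        warped = F.grid_sample(feat_r, grid, align_corners=True)
        slices.append(torch.cat((feat_l, warped), dim=1))
    return torch.stack(slices, dim=2)        # (B, 2C, D, H, W)
```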
RGB-D-based Stair Detection using Deep Learning for Autonomous Stair Climbing
Stairs are common building structures in urban environments, and stair
detection is an important part of environment perception for autonomous mobile
robots. Most existing algorithms have difficulty combining the visual information from binocular sensors effectively and ensuring reliable detection at night or when visual cues are extremely fuzzy. To solve these problems, we propose a neural network architecture with RGB and depth map inputs. Specifically, we design a selective module that enables the network to learn the complementary relationship between the RGB and depth maps and to combine their information effectively across different scenes. In addition, we design a line clustering algorithm for postprocessing the detection results, which makes full use of them to obtain the geometric stair parameters. Experiments on our dataset show that our method achieves better accuracy and recall than existing state-of-the-art deep learning methods, improving them by 5.64% and 7.97%, respectively, and it also detects extremely fast: a lightweight version achieves over 300 frames per second at the same resolution, meeting the needs of most real-time detection scenarios.
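The abstract does not specify the selective module's internals; one plausible minimal reading is SE-style channel gating that weighs the two modalities per channel, sketched below. The class and argument names are assumptions, not the paper's code.

```python
# Hypothetical selective RGB/depth fusion via channel gating.
import torch
import torch.nn as nn

class SelectiveFusion(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        # Per-channel weights decide how much of each modality to keep.
        w = self.gate(torch.cat((rgb_feat, depth_feat), dim=1))
        return w * rgb_feat + (1.0 - w) * depth_feat
```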
Fusion-layer-based machine vision for intelligent transportation systems
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 307-317). Environment understanding technology is vital for intelligent vehicles, which are expected to respond automatically to fast-changing environments and dangerous situations. To obtain perception abilities, we should automatically detect static and dynamic obstacles and obtain their related information, such as location, speed, collision/occlusion possibility, and other current and historic dynamic information. Conventional methods detect individual pieces of information independently, which is normally noisy and not very reliable. Instead, we propose a fusion-based, layered information-retrieval methodology to systematically detect obstacles and obtain their location and timing information from visible and infrared sequences. The proposed obstacle detection methodologies take advantage of the connections between different kinds of information and increase the accuracy of obstacle information estimation, thus improving environment understanding abilities and driving safety. By Yajun Fang. Ph.D.
Effects of Ground Manifold Modeling on the Accuracy of Stixel Calculations
This paper highlights the role of ground manifold modeling for stixel calculations; stixels are medium-level data representations used in the development of computer vision modules for self-driving cars. When single-disparity maps and simplified ground manifold models are used, calculated stixels may suffer from noise, inconsistency, and false detections of obstacles, especially on challenging datasets. Stixel calculations can be improved in accuracy and robustness by using more adaptive ground manifold approximations. A comparative study of stixel results obtained for different ground-manifold models (e.g., plane fitting, line fitting in v-disparity space, polynomial approximation, and graph cut) forms the main part of this paper. The paper also considers the use of trinocular stereo vision and shows that it provides options to enhance stixel results compared with binocular recording. Comprehensive experiments are performed on two publicly available challenging datasets. We also use a novel way of comparing calculated stixels with ground truth: we compare the depth information given by extracted stixels with ground-truth depth provided by a highly accurate LiDAR range sensor (available in one of the public datasets). We evaluate the accuracy of four different ground-manifold methods, and the experimental results include quantitative evaluations of the tradeoff between accuracy and run time. Overall, the proposed trinocular recording together with graph-cut estimation of ground manifolds appears to be the recommended configuration, also under challenging weather and lighting conditions.
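One of the compared ground-manifold models, line fitting in v-disparity space, admits a compact sketch: accumulate a per-row histogram of disparities and fit a line to the dominant bins, since the ground traces a slanted line in v-disparity. The function below is a generic least-squares variant with illustrative names, not the paper's implementation.

```python
# V-disparity ground line fitting sketch (assumes a dense disparity map
# `disp` with invalid pixels encoded as values <= 0).
import numpy as np

def fit_ground_v_disparity(disp, max_disp=128, min_votes=20):
    H, _ = disp.shape
    vdisp = np.zeros((H, max_disp), dtype=np.int32)
    d = np.clip(disp.astype(np.int32), 0, max_disp - 1)
    for v in range(H):
        valid = d[v][disp[v] > 0]
        np.add.at(vdisp[v], valid, 1)       # vote per (row, disparity) cell
    best = vdisp.argmax(axis=1)             # dominant disparity per row
    votes = vdisp.max(axis=1)
    rows = np.nonzero(votes >= min_votes)[0]  # skip weakly supported rows
    # Ground pixels trace a slanted line; fit d = a*v + b by least squares.
    a, b = np.polyfit(rows, best[rows], deg=1)
    return a, b   # expected ground disparity at image row v is a*v + b
```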
M3DSSD: Monocular 3D Single Stage Object Detector
In this paper, we propose a Monocular 3D Single Stage object Detector
(M3DSSD) with feature alignment and asymmetric non-local attention. Current
anchor-based monocular 3D object detection methods suffer from feature
mismatching. To overcome this, we propose a two-step feature alignment
approach. In the first step, the shape alignment is performed to enable the
receptive field of the feature map to focus on the pre-defined anchors with
high confidence scores. In the second step, the center alignment is used to
align the features at 2D/3D centers. Further, it is often difficult to learn
global information and capture long-range relationships, which are important
for the depth prediction of objects. Therefore, we propose a novel asymmetric
non-local attention block with multi-scale sampling to extract depth-wise
features. The proposed M3DSSD achieves significantly better performance than existing monocular 3D object detection methods on the KITTI dataset, in both the 3D object detection and bird's eye view tasks. Comment: Accepted to CVPR 2021.
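The center alignment step, resampling the feature map at each predicted object center so per-object features come from a consistent location, can be sketched generically as below. Tensor names and shapes are assumptions for illustration, not the authors' code.

```python
# Sketch of center feature alignment via bilinear resampling.
import torch
import torch.nn.functional as F

def align_at_centers(feat, centers):
    """feat: (B, C, H, W); centers: (B, N, 2) in pixel coords (x, y).

    Returns per-object features of shape (B, C, N)."""
    B, C, H, W = feat.shape
    gx = 2.0 * centers[..., 0] / (W - 1) - 1.0   # normalize x to [-1, 1]
    gy = 2.0 * centers[..., 1] / (H - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1).unsqueeze(2)         # (B, N, 1, 2)
    sampled = F.grid_sample(feat, grid, align_corners=True)   # (B, C, N, 1)
    return sampled.squeeze(-1)
```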
StairNetV3: Depth-aware Stair Modeling using Deep Learning
Vision-based stair perception can help autonomous mobile robots deal with the
challenge of climbing stairs, especially in unfamiliar environments. To address the problem that current monocular vision methods struggle to model stairs accurately without depth information, this paper proposes a depth-aware stair modeling method for monocular vision. Specifically, we treat the extraction of stair geometric features and the prediction of depth images as joint tasks in a convolutional neural network (CNN); with the designed information propagation architecture, depth information can effectively supervise the learning of stair geometric features. In addition, to complete the stair modeling, we take the convex lines, concave lines, tread surfaces, and riser surfaces as stair geometric features and apply Gaussian kernels to enable the network to predict contextual information within the stair lines. Combined with the depth information obtained from depth sensors, we propose a stair point cloud reconstruction method that can quickly obtain the point clouds belonging to the stair step surfaces. Experiments on our dataset show that our method improves significantly on the previous best monocular vision method, with an intersection-over-union (IoU) increase of 3.4%, and the lightweight version has a fast detection speed that meets the requirements of most real-time applications. Our dataset is available at https://data.mendeley.com/datasets/6kffmjt7g2/1.
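The generic step underlying any such point cloud reconstruction is back-projecting a depth image through the pinhole model; the paper's method would additionally restrict this to pixels on the predicted stair surfaces. The sketch below shows only the standard back-projection, with fx, fy, cx, cy as assumed camera intrinsics.

```python
# Back-project a depth image to a 3D point cloud (pinhole camera model).
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    H, W = depth.shape
    us, vs = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    z = depth
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    pts = np.stack((x, y, z), axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]   # drop invalid (zero-depth) pixels
```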
MS-DETR: Multispectral Pedestrian Detection Transformer with Loosely Coupled Fusion and Modality-Balanced Optimization
Multispectral pedestrian detection is an important task for many
around-the-clock applications, since the visible and thermal modalities can
provide complementary information, especially under low-light conditions. Most available multispectral pedestrian detectors are based on non-end-to-end detectors. In this paper, we propose the MultiSpectral pedestrian DEtection TRansformer (MS-DETR), an end-to-end multispectral pedestrian detector that extends DETR to multi-modal detection. MS-DETR consists of two modality-specific backbones and Transformer encoders, followed by a multi-modal Transformer decoder in which the visible and thermal features are fused. To resist misalignment between multi-modal images, we design a loosely coupled fusion strategy that sparsely samples keypoints from the multi-modal features independently and fuses them with adaptively learned attention weights. Moreover, based on the insight that not only different modalities but also different pedestrian instances contribute different levels of confidence to the final detection, we further propose an instance-aware modality-balanced optimization strategy, which preserves the visible and thermal decoder branches and aligns their predicted slots through an instance-wise dynamic loss. Our end-to-end MS-DETR shows superior performance
on the challenging KAIST, CVC-14 and LLVIP benchmark datasets. The source code
is available at https://github.com/YinghuiXing/MS-DETR
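A deformable-attention-style reading of the loosely coupled fusion is sketched below: each query predicts a few sampling offsets per modality, gathers keypoint features from the visible and thermal maps independently, and mixes them with learned softmax weights. Shapes and module names are illustrative assumptions, not the released code.

```python
# Sketch of loosely coupled multispectral fusion (deformable-attention style).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LooselyCoupledFusion(nn.Module):
    def __init__(self, dim, num_points=4):
        super().__init__()
        self.num_points = num_points
        # Per modality: sampling offsets and attention weights from the query.
        self.offsets = nn.Linear(dim, 2 * 2 * num_points)  # 2 modalities, 2D offsets
        self.weights = nn.Linear(dim, 2 * num_points)

    def forward(self, query, ref_points, feat_vis, feat_thm):
        """query: (B, N, C); ref_points: (B, N, 2) in [-1, 1];
        feat_vis / feat_thm: (B, C, H, W). Returns fused features (B, N, C)."""
        B, N, C = query.shape
        P = self.num_points
        off = self.offsets(query).view(B, N, 2, P, 2)
        attn = self.weights(query).view(B, N, 2 * P).softmax(dim=-1)
        out = query.new_zeros(B, N, C)
        for m, feat in enumerate((feat_vis, feat_thm)):
            # Keypoints are sampled from each modality independently.
            grid = (ref_points.unsqueeze(2) + off[:, :, m]).clamp(-1, 1)
            samp = F.grid_sample(feat, grid, align_corners=True)  # (B, C, N, P)
            w = attn[:, :, m * P:(m + 1) * P]                     # (B, N, P)
            out = out + torch.einsum("bcnp,bnp->bnc", samp, w)
        return out
```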
Augmented reality meeting table: a novel multi-user interface for architectural design
Immersive virtual environments have received widespread attention as possible replacements for the media and systems that designers traditionally use and, more generally, as support for collaborative work. Relatively little attention has been given to date, however, to the problem of how to merge immersive virtual environments into real-world work settings, and so to add to the media at the disposal of the designer and the design team rather than to replace them. In this paper we report on a research project in which optical see-through augmented reality displays have been developed together with prototype decision support software for architectural and urban design. We suggest that a critical characteristic of multi-user augmented reality is its ability to generate visualisations from a first-person perspective in which the scale of rendition of the design model follows many of the conventions that designers are used to. Different scales of model appear to allow designers to focus on different aspects of the design under consideration. Augmenting the scene with simulations of pedestrian movement appears to assist both in scale recognition and in moving from a first-person to a third-person understanding of the design. This research project is funded by the European Commission IST programme (IST-2000-28559).
Identifying and Tracking Pedestrians Based on Sensor Fusion and Motion Stability Predictions
The lack of trustworthy sensors makes the development of Advanced Driver Assistance System (ADAS) applications a tough task. It is necessary to develop intelligent systems that combine reliable sensors with real-time algorithms to send proper, accurate messages to drivers. In this article, an application to detect and predict the movement of pedestrians, in order to prevent an imminent collision, has been developed and tested under real conditions. The proposed application first measures the position of obstacles accurately using a two-sensor hybrid fusion approach: a stereo camera vision system and a laser scanner. Second, it identifies pedestrians using intelligent algorithms based on polylines and pattern recognition related to leg positions (laser subsystem) and on dense disparity maps and u-v disparity (vision subsystem). Third, it uses statistical validation gates and confidence regions to track pedestrians within the detection zones of the sensors and to predict their positions in the upcoming frames. The intelligent sensor application has been tested experimentally with success while tracking pedestrians that cross and move in a zigzag fashion in front of a vehicle.
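A statistical validation gate of the kind described is typically a chi-square test on the squared Mahalanobis distance between a measurement and a track's predicted measurement. A minimal sketch follows, assuming a Kalman-style tracker that supplies the predicted measurement z_pred and innovation covariance S; the names are illustrative.

```python
# Chi-square validation gate for track-measurement association.
import numpy as np

CHI2_GATE_2D = 9.21  # 99% gate for a 2-D measurement (chi-square, 2 dof)

def in_validation_gate(z, z_pred, S):
    """z, z_pred: (2,) measurement and prediction; S: (2, 2) innovation cov."""
    innov = z - z_pred
    d2 = innov @ np.linalg.solve(S, innov)   # squared Mahalanobis distance
    return d2 <= CHI2_GATE_2D
```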