Online Map Vectorization for Autonomous Driving: A Rasterization Perspective
A vectorized high-definition (HD) map is essential for autonomous driving,
providing detailed and precise environmental information for advanced
perception and planning. However, current map vectorization methods often
exhibit deviations, and the existing evaluation metric for map vectorization
lacks sufficient sensitivity to detect these deviations. To address these
limitations, we propose integrating the philosophy of rasterization into map
vectorization. Specifically, we introduce a new rasterization-based evaluation
metric, which has superior sensitivity and is better suited to real-world
autonomous driving scenarios. Furthermore, we propose MapVR (Map Vectorization
via Rasterization), a novel framework that applies differentiable rasterization
to vectorized outputs and then performs precise and geometry-aware supervision
on rasterized HD maps. Notably, MapVR designs tailored rasterization strategies
for various geometric shapes, enabling effective adaptation to a wide range of
map elements. Experiments show that incorporating rasterization into map
vectorization greatly enhances performance with no extra computational cost
during inference, leading to more accurate map perception and ultimately
promoting safer autonomous driving.
Comment: NeurIPS 2023
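To make the rasterization philosophy concrete, here is a minimal, hedged sketch of differentiable polyline rasterization with raster-space supervision. It is written in the spirit of MapVR rather than as the authors' implementation: the Gaussian soft mask, the grid size, and the dice loss are all assumptions.

```python
# Minimal sketch: differentiable rasterization of a polyline plus a
# raster-space loss. Inspired by the MapVR idea above; NOT the authors'
# code. Soft-mask form, sigma, grid size, and dice loss are assumptions.
import torch

def soft_rasterize(polyline, size=64, sigma=1.5):
    """Render a polyline (N, 2), in pixel coordinates, to a soft (size, size)
    mask via a Gaussian of the distance to the nearest segment."""
    ys, xs = torch.meshgrid(
        torch.arange(size, dtype=torch.float32),
        torch.arange(size, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs, ys], dim=-1).reshape(-1, 1, 2)       # (P, 1, 2)
    a, b = polyline[:-1], polyline[1:]                          # segment endpoints
    ab = b - a                                                  # (S, 2)
    t = ((pix - a) * ab).sum(-1) / (ab * ab).sum(-1).clamp(min=1e-6)
    closest = a + t.clamp(0.0, 1.0).unsqueeze(-1) * ab          # (P, S, 2)
    d2 = ((pix - closest) ** 2).sum(-1).min(dim=1).values       # squared distance
    return torch.exp(-d2 / (2 * sigma ** 2)).reshape(size, size)

def dice_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

# Gradients flow from the raster-space loss back to the polyline vertices.
pred_pts = (torch.rand(8, 2) * 63).requires_grad_()
gt_mask = soft_rasterize(torch.tensor([[5.0, 5.0], [60.0, 58.0]]))
dice_loss(soft_rasterize(pred_pts), gt_mask).backward()
```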
VectorMapNet: End-to-end Vectorized HD Map Learning
Autonomous driving systems require a good understanding of surrounding
environments, including moving obstacles and static High-Definition (HD)
semantic map elements. Existing methods approach the semantic map problem by
offline manual annotation, which suffers from serious scalability issues.
Recent learning-based methods produce dense rasterized segmentation predictions
to construct maps. However, these predictions do not include instance
information of individual map elements and require heuristic post-processing to
obtain vectorized maps. To tackle these challenges, we introduce an end-to-end
vectorized HD map learning pipeline, termed VectorMapNet. VectorMapNet takes
onboard sensor observations and predicts a sparse set of polylines in the
bird's-eye view. This pipeline can explicitly model the spatial relation
between map elements and generate vectorized maps that are friendly to
downstream autonomous driving tasks. Extensive experiments show that
VectorMapNet achieves strong map learning performance on both the nuScenes and
Argoverse2 datasets, surpassing previous state-of-the-art methods by 14.2 mAP
and 14.6 mAP, respectively. Qualitatively, we also show that VectorMapNet is capable of
generating comprehensive maps and capturing more fine-grained details of road
geometry. To the best of our knowledge, VectorMapNet is the first work designed
towards end-to-end vectorized map learning from onboard observations. Our
project website is available at
https://tsinghua-mars-lab.github.io/vectormapnet/
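As background for how sparse polyline predictions are typically matched and scored against ground truth in this line of work, here is a hedged NumPy sketch of the symmetric Chamfer distance; whether VectorMapNet uses exactly this cost is not claimed here.

```python
# Hedged sketch: symmetric Chamfer distance between two polylines sampled
# as point sets, a common matching/evaluation cost for vectorized maps.
import numpy as np

def chamfer_distance(p, q):
    """p: (N, 2) and q: (M, 2) sampled polyline points in BEV coordinates."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

pred = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0]])
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
print(chamfer_distance(pred, gt))  # small value for a near-perfect lane
```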
LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving
A map, as crucial information for downstream applications of an autonomous
driving system, is usually represented in lanelines or centerlines. However,
existing literature on map learning primarily focuses on either detecting
geometry-based lanelines or perceiving topology relationships of centerlines.
Both of these methods ignore the intrinsic relationship between lanelines and
centerlines, namely that lanelines bind centerlines. Since simply predicting both
types of lane in one model leads to conflicting learning objectives, we instead
advocate the lane segment as a new representation that seamlessly incorporates both
geometry and topology information. Thus, we introduce LaneSegNet, the first
end-to-end mapping network generating lane segments to obtain a complete
representation of the road structure. Our algorithm features two key
modifications. One is a lane attention module to capture pivotal region details
within the long-range feature space. Another is an identical initialization
strategy for reference points, which enhances the learning of positional priors
for lane attention. On the OpenLane-V2 dataset, LaneSegNet outperforms previous
counterparts by a substantial gain across three tasks, i.e., map
element detection (+4.8 mAP), centerline perception (+6.9 DET), and the
newly defined one, lane segment perception (+5.6 mAP). Furthermore, it obtains
a real-time inference speed of 14.7 FPS. Code is accessible at
https://github.com/OpenDriveLab/LaneSegNet.
Comment: Accepted at ICLR 2024
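For illustration, a lane segment as described above can be thought of as geometry (a centerline bound by its left and right lanelines) plus topology (links to successor segments). The field names below are assumptions, not the OpenLane-V2 or LaneSegNet schema.

```python
# Illustrative container for a "lane segment": geometry plus topology.
# Field names are hypothetical, not the OpenLane-V2 / LaneSegNet schema.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class LaneSegment:
    centerline: np.ndarray                 # (N, 3) ordered points
    left_laneline: np.ndarray              # (N, 3) binds the centerline
    right_laneline: np.ndarray             # (N, 3)
    successors: list[int] = field(default_factory=list)  # topology edges
```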
Recognizing human activity using RGBD data
Traditional computer vision algorithms try to understand the world using visible-light cameras. However, this type of data source has inherent limitations. First, visible-light images are sensitive to illumination changes and background clutter. Second, the 3D structural information of the scene is lost when the 3D world is projected onto 2D images, and recovering it is a challenging problem. Range sensors, which capture the 3D characteristics of a scene, have existed for over thirty years; however, earlier range sensors were either too expensive, difficult to use in human environments, slow at acquiring data, or provided poor distance estimates. Recently, easy access to RGBD data at real-time frame rates has led to a revolution in perception and inspired much new research. I propose algorithms to detect persons and understand their activities using RGBD data, and I demonstrate that solutions to many computer vision problems may be improved with the added depth channel. The 3D structural information may give rise to algorithms with real-time and view-invariant properties in a faster and easier fashion. When both data sources are available, features extracted from the depth channel may be combined with traditional features computed from the RGB channels to build more robust systems with enhanced recognition abilities that can deal with more challenging scenarios.

As a starting point, the first problem is to find persons of various poses in the scene, whether moving or static. Localizing humans from RGB images is limited by lighting conditions and background clutter; depth images offer alternative ways to find humans in the scene. In the past, detection of humans from range data was usually achieved by tracking, which does not work for indoor person detection. In this thesis, I propose a model-based approach that detects persons using the structural information embedded in the depth image: a 2D head contour model and a 3D head surface model are used to look for the head-shoulder part of the person. A segmentation scheme is then proposed to segment the full human body from the background and extract its contour. I also give a tracking algorithm based on the detection result.

I further investigate recognizing human actions and activities, and propose two features for this task. The first is drawn from the skeletal joint locations estimated from a depth image: a compact representation of the human posture called histograms of 3D joint locations (HOJ3D). This representation is view-invariant, and the whole algorithm runs in real time, so it may benefit applications that need a fast estimate of the posture and action of the human subject. The second is a spatio-temporal feature for depth video called the Depth Cuboid Similarity Feature (DCSF). Interest points are extracted using an algorithm that effectively suppresses noise and finds salient human motions; a DCSF is extracted centered on each interest point, forming the description of the video contents. This descriptor can recognize activities with no dependence on skeleton information or pre-processing steps such as motion segmentation, tracking, or even image de-noising or hole-filling, making it more flexible and widely applicable.
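A simplified sketch of the HOJ3D idea: express the skeletal joints in spherical coordinates about a reference joint and histogram them. The bin counts below are plausible defaults, and the original's soft Gaussian voting and alignment of the azimuth axis with the hip orientation (which provides the view invariance) are omitted for brevity.

```python
# Simplified HOJ3D-style posture descriptor: spherical histogram of joints
# about the hip center. The original adds Gaussian vote weighting and aligns
# azimuth with the hip orientation for view invariance; omitted here.
import numpy as np

def hoj3d(joints, hip, n_azimuth=12, n_inclination=7):
    """joints: (J, 3) 3D joint positions; hip: (3,) reference joint."""
    v = joints - hip
    r = np.linalg.norm(v, axis=1) + 1e-9
    azimuth = np.arctan2(v[:, 1], v[:, 0])                  # [-pi, pi]
    inclination = np.arccos(np.clip(v[:, 2] / r, -1, 1))    # [0, pi]
    hist, _, _ = np.histogram2d(
        azimuth, inclination,
        bins=[n_azimuth, n_inclination],
        range=[[-np.pi, np.pi], [0.0, np.pi]],
    )
    return (hist / hist.sum()).ravel()  # compact posture code
```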
Finally, all the features developed herein are combined to solve a novel problem: first-person human activity recognition using RGBD data. Traditional activity recognition algorithms focus on recognizing activities from a third-person perspective; I propose to recognize activities from a first-person perspective with RGBD data. This task is novel and extremely challenging due to the large amount of camera motion caused either by self-exploration or by the response to interaction. I extract 3D optical flow features as motion descriptors, 3D skeletal joint features as posture descriptors, and spatio-temporal features as local appearance descriptors to describe the first-person videos. To address the ego-motion of the camera, I propose an attention mask that guides the recognition procedure and separates features in the ego-motion region from those in the independent-motion region. The 3D features are very useful for summarizing the discerning information of the activities. In addition, combining the 3D features with existing 2D features yields more robust recognition results and makes the algorithm capable of dealing with more challenging cases.
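A loose sketch of the attention-mask idea: estimate the global (ego) motion from optical flow and separate features according to whether their local motion deviates from it. The median-flow heuristic and threshold are assumptions, not the thesis' method.

```python
# Loose sketch: split feature locations into ego-motion vs. independent-
# motion regions. Median flow as the ego-motion proxy is an assumption.
import numpy as np

def split_by_motion(flow, points, margin=1.5):
    """flow: (H, W, 2) optical flow; points: (K, 2) integer (x, y) locations.
    Returns True where local motion deviates from the global estimate,
    i.e. likely independent motion (the interaction partner)."""
    ego = np.median(flow.reshape(-1, 2), axis=0)     # global motion proxy
    local = flow[points[:, 1], points[:, 0]]         # (K, 2) per-point flow
    return np.linalg.norm(local - ego, axis=1) > margin
```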
Hand tracking using a quadric surface model and Bayesian filtering
In this paper, a technique for model-based 3D hand tracking is presented. A hand model is built from a set of truncated quadrics, approximating the anatomy of a real hand with few parameters. Given that the projection of a quadric onto the image plane is a conic, the model contours can be generated efficiently and used as shape templates to evaluate possible matches in the current frame. The evaluation is done within a hierarchical Bayesian filtering framework, where the posterior distribution is computed efficiently using a tree of templates. We demonstrate the effectiveness of the technique by using it to track 3D articulated and non-rigid hand motion in monocular video sequences against a cluttered background.
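The efficiency claim rests on a standard projective-geometry fact (see, e.g., Hartley and Zisserman): a dual quadric Q* projects to the dual conic C* = P Q* P^T. Below is a minimal sketch of that result, not necessarily the authors' exact formulation.

```python
# Standard result: the image outline of a quadric is a conic, computable
# from the dual quadric as C* = P Q* P^T. Assumes C* is non-degenerate.
import numpy as np

def outline_conic(P, Q_dual):
    """P: (3, 4) camera projection matrix; Q_dual: (4, 4) dual quadric.
    Returns the 3x3 point conic C with x^T C x = 0 on the outline."""
    C_dual = P @ Q_dual @ P.T
    # The point conic is the adjugate of the dual conic (inverse up to scale).
    return np.linalg.inv(C_dual) * np.linalg.det(C_dual)
```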
Estimating 3D hand pose using hierarchical multi-label classification
This paper presents an analysis of the design of classifiers for use in a hierarchical object recognition approach, in which a cascade of classifiers is arranged in a tree in order to recognize multiple object classes. We are interested in the problem of recognizing multiple patterns, as it is closely related to the problem of locating an articulated object: each pattern class corresponds to the hand in a different pose, or set of poses. For this problem, obtaining labelled training data of the hand in a given pose can be problematic. Given a parametric 3D model, generating training data in the form of example images is cheap, and we demonstrate that it can be used to design classifiers almost as good as those trained on non-synthetic data. We compare a variety of template-based classifiers and discuss their merits.
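To illustrate the cascade structure, here is a hedged sketch of a tree of classifiers where internal nodes prune candidates and leaves correspond to pose classes; the node classifiers are placeholders, not the paper's templates.

```python
# Hedged sketch of a tree-structured cascade: each node either rejects a
# sample or passes it to its children; leaves carry pose-class labels.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Node:
    accept: Callable[[object], bool]          # placeholder node classifier
    label: Optional[str] = None               # set at leaves (a pose class)
    children: list["Node"] = field(default_factory=list)

def classify(node, x, matches):
    """Depth-first descent; collects every leaf label whose path accepts x."""
    if not node.accept(x):
        return
    if node.label is not None:
        matches.append(node.label)
    for child in node.children:
        classify(child, x, matches)
```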
People detection and tracking using a network of low-cost depth cameras
Automatic people detection is a widely adopted technology with applications in retail, crowd management, and surveillance. The goal of this work is to create a general-purpose framework for detecting people indoors. First, studies on people detection, tracking, and re-identification are reviewed, with an emphasis on people detection from depth images. Then, a network of smart depth cameras developed for this work is presented. Detection performance is evaluated on four image sequences totalling over 20,000 depth images. The results are promising and show that simple, computationally lightweight algorithms are well suited to practical applications.
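As a flavor of the "simple and lightweight" algorithms the work finds effective, here is a hedged sketch of depth-based foreground extraction with connected components; the thresholds are illustrative, not the thesis' values.

```python
# Hedged sketch: lightweight person detection in a depth image by
# background subtraction and connected components. Thresholds illustrative.
import numpy as np
from scipy import ndimage

def detect_people(depth, background, min_diff_m=0.3, min_area_px=800):
    """depth, background: (H, W) depth maps in meters (static background)."""
    foreground = (background - depth) > min_diff_m   # closer than background
    labels, n = ndimage.label(foreground)            # connected components
    boxes = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if ys.size >= min_area_px:                   # drop small blobs/noise
            boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return boxes
```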
Comparison of fusion methods for thermo-visual surveillance tracking
In this paper, we evaluate the appearance-tracking performance of multiple fusion schemes that combine information from standard CCTV and thermal-infrared-spectrum video for tracking surveillance objects such as people, faces, bicycles, and vehicles. We show results on numerous real-world multimodal surveillance sequences, tracking challenging objects whose appearance changes rapidly. Based on these results, we determine the most promising fusion scheme.
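Two representative score-level fusion schemes of the kind such comparisons cover are sketched below; the paper's actual schemes and weights are not reproduced here.

```python
# Hedged sketch of two generic score-level fusion schemes for visible and
# thermal match likelihoods in [0, 1]; the weighting is illustrative.
def fuse(l_visible, l_thermal, scheme="weighted", alpha=0.5):
    if scheme == "weighted":   # convex combination of modality scores
        return alpha * l_visible + (1 - alpha) * l_thermal
    if scheme == "product":    # assumes modalities are independent
        return l_visible * l_thermal
    raise ValueError(f"unknown scheme: {scheme}")
```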
Body Parts Features Based Pedestrian Detection for Active Pedestrian Protection System
A novel vision-based pedestrian detection system for urban traffic situations is presented to help the driver perceive pedestrians ahead of the vehicle. To enhance accuracy and decrease the time consumption of pedestrian detection in such complicated situations, the pedestrian is detected by dividing the body into several parts according to their corresponding features in the image. Candidate pedestrian legs are segmented with the Gentle AdaBoost algorithm by training on optimized histogram-of-gradient features. The candidate pedestrian head is located by matching a head-and-shoulder model above the region of the candidate legs. The candidate legs, head, and shoulders are then combined through part constraints and threshold adjustment to verify the presence of a pedestrian. Finally, experiments were conducted in real urban traffic conditions. Results show that the proposed method achieves a pedestrian detection rate of 92.1% with low time consumption.
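A hedged sketch of the leg-classification step: histogram-of-gradient features fed to a boosted classifier. scikit-learn's AdaBoostClassifier stands in for the paper's Gentle AdaBoost, and the feature parameters are assumptions.

```python
# Hedged sketch of the leg classifier: HOG features + boosting.
# AdaBoostClassifier stands in for the paper's Gentle AdaBoost.
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import AdaBoostClassifier

def hog_features(patches):
    """patches: iterable of equally sized grayscale windows (H, W)."""
    return np.array([
        hog(p, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        for p in patches
    ])

# Hypothetical usage: train on labelled leg / non-leg windows, then score
# candidate windows from the image (variable names are placeholders).
# clf = AdaBoostClassifier(n_estimators=200).fit(hog_features(train_patches), labels)
# is_leg = clf.predict(hog_features(candidate_patches))
```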