7,650 research outputs found

    ๊ตฌ์กฐ ๊ฐ์‘ํ˜• ๋ฐ์ดํ„ฐ ์ฆ๊ฐ• ๊ธฐ๋ฒ•๊ณผ ํ˜ผํ•ฉ ๋ฐ€๋„ ์‹ ๊ฒฝ๋ง์„ ์ด์šฉํ•œ ๋ผ์ด๋‹ค ๊ธฐ๋ฐ˜ 3์ฐจ์› ๊ฐ์ฒด ๊ฒ€์ถœ ๊ฐœ์„ 

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ์œตํ•ฉ๊ณผํ•™๊ธฐ์ˆ ๋Œ€ํ•™์› ์œตํ•ฉ๊ณผํ•™๋ถ€, 2023. 2. ๊ณฝ๋…ธ์ค€.์ž์œจ์ฃผํ–‰์ž๋™์ฐจ, ๋กœ๋ด‡์˜ ์ธ์‹ ์žฅ๋น„๋กœ ๋งŽ์ด ํ™œ์šฉ๋˜๊ณ ์žˆ๋Š” ๋ผ์ด๋‹ค (LiDAR) ๋Š” ๋ ˆ์ด์ € ํŽ„์Šค๋ฅผ ๋ฐฉ์ถœํ•˜์—ฌ ๋˜๋Œ์•„์˜ค๋Š” ์‹œ๊ฐ„์„ ๊ณ„์‚ฐํ•˜์—ฌ ํฌ์ธํŠธ ํด๋ผ์šฐ๋“œ (point cloud) ํ˜•ํƒœ๋กœ ์ฃผ๋ณ€ ํ™˜๊ฒฝ์„ ๊ฐ์ง€ํ•œ๋‹ค. ์ฃผ๋ณ€ ํ™˜๊ฒฝ์„ ๊ฐ์ง€ํ• ๋•Œ ๊ฐ€์žฅ ์ค‘ ์š”ํ•œ ๋ถ€๋ถ„์€ ๊ทผ์ฒ˜์— ์–ด๋–ค ๊ฐ์ฒด๊ฐ€ ์žˆ๋Š”์ง€, ์–ด๋””์— ์œ„์น˜ํ•ด ์žˆ๋Š”์ง€๋ฅผ ์ธ์‹ํ•˜๋Š” ๊ฒƒ์ด๊ณ  ์ด๋Ÿฌํ•œ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด ํฌ์ธํŠธ ํด๋ผ์šฐ๋“œ๋ฅผ ํ™œ์šฉํ•˜๋Š” 3์ฐจ์› ๊ฐ ์ฒด ๊ฒ€์ถœ ๊ธฐ์ˆ ๋“ค์ด ๋งŽ์ด ์—ฐ๊ตฌ๋˜๊ณ  ์žˆ๋‹ค. ํฌ์ธํŠธ ํด๋ผ์šฐ๋“œ ๋ฐ์ดํ„ฐ์˜ ์ „์ฒ˜๋ฆฌ ๋ฐฉ๋ฒ•์— ๋”ฐ๋ผ ๋งค์šฐ ๋‹ค์–‘ํ•œ ๊ตฌ์กฐ์˜ ๋ฐฑ๋ณธ ๋„คํŠธ์›Œํฌ (backbone network) ๊ฐ€ ์—ฐ๊ตฌ๋˜๊ณ  ์žˆ๋‹ค. ๊ณ ๋„ํ™”๋œ ๋ฐฑ๋ณธ ๋„คํŠธ์›Œํฌ๋“ค๋กœ ์ธํ•ด ์ธ์‹ ์„ฑ๋Šฅ์— ํฐ ๋ฐœ์ „์„ ์ด๋ฃจ์—ˆ์ง€๋งŒ, ์ด๋“ค์˜ ํ˜•ํƒœ๊ฐ€ ํฌ๊ฒŒ ๋‹ค๋ฅด๊ธฐ ๋•Œ๋ฌธ์— ์„œ๋กœ ํ˜ธํ™˜์„ฑ์ด ๋ถ€์กฑํ•˜์—ฌ ์—ฐ๊ตฌ๋“ค์˜ ๊ฐˆ๋ž˜๊ฐ€ ๋งŽ์ด ๋‚˜๋ˆ„์–ด์ง€๊ณ  ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์— ์„œ ํ’€๊ณ ์žํ•˜๋Š” ๋ฌธ์ œ๋Š” ํŒŒํŽธํ™”๋œ ๋ฐฑ๋ณธ ๋„คํŠธ์›Œํฌ์˜ ๊ตฌ์กฐ๋“ค์— ๊ตฌ์• ๋ฐ›์ง€ ์•Š๊ณ  3์ฐจ์› ๊ฐ์ฒด ๊ฒ€์ถœ๊ธฐ์˜ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ฌ ๋ฐฉ๋ฒ•์ด ์žˆ๋Š”๊ฐ€ ์ด๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ ์—์„œ๋Š” ํฌ์ธํŠธ ํด๋ผ์šฐ๋“œ ๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜์˜ 3์ฐจ์› ๊ฐ์ฒด ๊ฒ€์ถœ ๊ธฐ์ˆ ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๋‘ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ ๋ฒˆ์งธ๋Š” 3์ฐจ์› ๊ฒฝ๊ณ„ ์ƒ์ž (3D bounding box) ์˜ ๊ตฌ์กฐ์ ์ธ ์ •๋ณด์˜ ํ™œ์šฉ์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ๊ตฌ์กฐ ๊ฐ์‘ํ˜• ๋ฐ์ดํ„ฐ ์ฆ๊ฐ• (PA-AUG) ๊ธฐ๋ฒ•์ด๋‹ค. 3์ฐจ์› ๊ฒฝ๊ณ„ ์ƒ์ž ๋ผ๋ฒจ์€ ๊ฐ์ฒด์— ๋”ฑ ๋งž๊ฒŒ ์ƒ์„ฑ๋˜๊ณ  ๋ฐฉํ–ฅ๊ฐ’์„ ํฌํ•จํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ƒ์ž ๋‚ด์— ๊ฐ์ฒด์˜ ๊ตฌ์กฐ ์ •๋ณด๋ฅผ ํฌํ•จํ•˜๊ณ  ์žˆ๋‹ค. ์ด๋ฅผ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•ด ์šฐ๋ฆฌ๋Š” 3์ฐจ์› ๊ฒฝ๊ณ„ ์ƒ์ž๋ฅผ ๊ตฌ์กฐ ๊ฐ์‘ํ˜• ํŒŒํ‹ฐ์…˜์œผ๋กœ ๊ตฌ๋ถ„ํ•˜๋Š” ๋ฐฉ์‹์„ ์ œ์•ˆํ•˜๊ณ , ํŒŒํ‹ฐ์…˜ ์ˆ˜์ค€์—์„œ ์ˆ˜ํ–‰๋˜๋Š” ์ƒˆ๋กœ์šด ๋ฐฉ์‹์˜ ๋ฐ์ดํ„ฐ ์ฆ๊ฐ• ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. PA-AUG๋Š” ๋‹ค์–‘ํ•œ ํ˜•ํƒœ์˜ 3์ฐจ์› ๊ฐ์ฒด ๊ฒ€์ถœ๊ธฐ๋“ค์˜ ์„ฑ๋Šฅ์„ ๊ฐ•์ธํ•˜๊ฒŒ ๋งŒ๋“ค์–ด์ฃผ๊ณ , ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ 2.5๋ฐฐ ์ฆ ๊ฐ•์‹œํ‚ค๋Š” ๋งŒํผ์˜ ์ธ์‹ ์„ฑ๋Šฅ ํ–ฅ์ƒ ํšจ๊ณผ๋ฅผ ๋ณด์—ฌ์ค€๋‹ค. ๋‘ ๋ฒˆ์งธ๋Š” ํ˜ผํ•ฉ ๋ฐ€๋„ ์‹ ๊ฒฝ๋ง ๊ธฐ๋ฐ˜ 3์ฐจ์› ๊ฐ์ฒด ๊ฒ€์ถœ (MD3D) ๊ธฐ๋ฒ•์ด๋‹ค. MD3D๋Š” ๊ฐ€์šฐ์‹œ๊ฐ„ ํ˜ผํ•ฉ ๋ชจ๋ธ (Gaussian Mixture Model) ์„ ์ด์šฉํ•ด 3์ฐจ์› ๊ฒฝ ๊ณ„ ์ƒ์ž ํšŒ๊ท€ ๋ฌธ์ œ๋ฅผ ๋ฐ€๋„ ์˜ˆ์ธก ๋ฐฉ์‹์œผ๋กœ ์žฌ์ •์˜ํ•œ ๊ธฐ๋ฒ•์ด๋‹ค. ์ด๋Ÿฌํ•œ ๋ฐฉ์‹์€ ๊ธฐ์กด์˜ ๋ผ๋ฒจ ํ• ๋‹น์‹์˜ ํ•™์Šต ๋ฐฉ๋ฒ•๋“ค๊ณผ ๋‹ฌ๋ฆฌ ํฌ์ธํŠธ ํด๋ผ์šฐ๋“œ ์ „์ฒ˜๋ฆฌ ํ˜•ํƒœ์— ๊ตฌ์• ๋ฐ›์ง€ ์•Š๊ณ  ๋™์ผํ•œ ํ•™์Šต ๋ฐฉ์‹์„ ์ ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. ๋˜ํ•œ ๊ธฐ์กด ๋ฐฉ์‹ ๋Œ€๋น„ ํ•™์Šต ์— ํ•„์š”ํ•œ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ํ˜„์ €ํžˆ ์ ์–ด์„œ ์ตœ์ ํ™”๊ฐ€ ์šฉ์ดํ•˜์—ฌ ์ธ์‹ ์„ฑ๋Šฅ์„ ํฌ๊ฒŒ ๋†’์ผ ์ˆ˜ ์žˆ์„ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๊ฐ„๋‹จํ•œ ๊ตฌ์กฐ๋กœ ์ธํ•ด ์ธ์‹ ์†๋„๋„ ๋นจ๋ผ์ง€๊ฒŒ ๋œ๋‹ค. PA-AUG์™€ MD3D๋Š” ๋ชจ๋‘ ๋ฐฑ๋ณธ ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ์— ์ƒ๊ด€์—†์ด ๋‹ค์–‘ํ•œ 3์ฐจ์› ๊ฐ์ฒด ๊ฒ€์ถœ๊ธฐ์— ๊ณตํ†ต์ ์œผ๋กœ ์‚ฌ์šฉ๋  ์ˆ˜ ์žˆ์œผ๋ฉฐ ๋†’์€ ์ธ์‹ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋ณด์—ฌ์ค€๋‹ค. ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋‘ ๊ธฐ๋ฒ•์€ ๊ฒ€์ถœ๊ธฐ์˜ ์„œ๋กœ ๋‹ค๋ฅธ ์˜์—ญ์— ์ ์šฉ๋˜๋Š” ๊ธฐ๋ฒ•์ด๋ฏ€๋กœ ํ•จ๊ป˜ ๋™์‹œ์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๊ณ , ํ•จ๊ป˜ ์‚ฌ์šฉํ–ˆ์„๋•Œ ์ธ์‹ ์„ฑ๋Šฅ์ด ๋”์šฑ ํฌ๊ฒŒ ํ–ฅ์ƒ๋œ๋‹ค.LiDAR (Light Detection And Ranging), which is widely used as a sensing device for autonomous vehicles and robots, emits laser pulses and calculates the return time to sense the surrounding environment in the form of a point cloud. 
When recognizing the surrounding environment, the most important task is identifying which objects are nearby and where they are located, and 3D object detection methods using point clouds have been actively studied to perform this task. Various backbone networks for point-cloud-based 3D object detection have been proposed, differing according to how the point cloud data are preprocessed. Although advanced backbone networks have made great strides in detection performance, their structures differ widely, so they are largely incompatible with one another. The problem addressed in this dissertation is: how can the performance of 3D object detectors be improved regardless of their diverse backbone network structures? This dissertation proposes two general methods to improve point-cloud-based 3D object detectors. First, we propose a part-aware data augmentation (PA-AUG) method which maximizes the utilization of the structural information of 3D bounding boxes. Since 3D bounding box labels fit the objects' boundaries and include an orientation value, they contain the structural information of the object inside the box. To fully utilize this intra-object structural information, we propose a novel part-aware partitioning method which divides 3D bounding boxes into characteristic sub-parts. PA-AUG applies newly proposed data augmentation methods at the partition level. It makes various types of 3D object detectors more robust and brings a gain equivalent to increasing the training data by about 2.5×. Second, we propose mixture-density-based 3D object detection (MD3D). MD3D predicts the distribution of 3D bounding boxes using a Gaussian mixture model (GMM), reformulating conventional box regression as a density estimation problem. Thus, unlike conventional target-assignment methods, it can be applied to any 3D object detector regardless of the point cloud preprocessing method. In addition, because it requires significantly fewer hyper-parameters than existing methods, the detection performance is easier to optimize. MD3D also increases detection speed due to its simple structure. Both PA-AUG and MD3D can be applied to any 3D object detector regardless of backbone structure and show a substantial increase in detection performance. The two proposed methods cover different stages of the object detection pipeline.
Thus, they can be used simultaneously, and the experimental results show they have a synergy effect when applied together.

Contents:
1 Introduction 1
    1.1 Problem Definition 3
    1.2 Challenges 6
    1.3 Contributions 8
        1.3.1 Part-Aware Data Augmentation (PA-AUG) 8
        1.3.2 Mixture-Density-based 3D Object Detection (MD3D) 9
        1.3.3 Combination of PA-AUG and MD3D 10
    1.4 Outline 10
2 Related Works 11
    2.1 Data augmentation for Object Detection 11
        2.1.1 2D Data augmentation 11
        2.1.2 3D Data augmentation 12
    2.2 LiDAR-based 3D Object Detection 13
    2.3 Mixture Density Networks in Computer Vision 15
    2.4 Datasets 16
        2.4.1 KITTI Dataset 16
        2.4.2 Waymo Open Dataset 18
    2.5 Evaluation metric 19
        2.5.1 Average Precision (AP) 19
        2.5.2 Average Orientation Similarity (AOS) 22
        2.5.3 Average Precision weighted by Heading (APH) 22
3 Part-Aware Data Augmentation (PA-AUG) 24
    3.1 Introduction 24
    3.2 Methods 27
        3.2.1 Part-Aware Partitioning 27
        3.2.2 Part-Aware Data Augmentation 28
    3.3 Experiments 33
        3.3.1 Results on the KITTI Dataset 33
        3.3.2 Robustness Test 36
        3.3.3 Data Efficiency Test 38
        3.3.4 Ablation Study 40
    3.4 Discussion 41
    3.5 Conclusion 42
4 Mixture-Density-based 3D Object Detection (MD3D) 43
    4.1 Introduction 43
    4.2 Methods 47
        4.2.1 Modeling Point-cloud-based 3D Object Detection with Mixture Density Network 47
        4.2.2 Network Architecture 49
        4.2.3 Loss function 52
    4.3 Experiments 53
        4.3.1 Datasets 53
        4.3.2 Experiment Settings 53
        4.3.3 Results on the KITTI Dataset 54
        4.3.4 Latency of Each Module 56
        4.3.5 Results on the Waymo Open Dataset 58
        4.3.6 Analyzing Recall by object size 59
        4.3.7 Ablation Study 60
        4.3.8 Discussion 65
    4.4 Conclusion 66
5 Combination of PA-AUG and MD3D 71
    5.1 Methods 71
    5.2 Experiments 72
        5.2.1 Settings 72
        5.2.2 Results on the KITTI Dataset 73
    5.3 Discussion 76
6 Conclusion 77
    6.1 Summary 77
    6.2 Limitations and Future works 78
        6.2.1 Hyper-parameter-free PA-AUG 78
        6.2.2 Redefinition of Part-aware Partitioning 79
        6.2.3 Application to other tasks 79
Abstract (In Korean) 94
Acknowledgments 96
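
The abstract above describes MD3D's mixture-density formulation only in words. As a rough, self-contained sketch of the underlying idea, the following Python snippet computes the negative log-likelihood of ground-truth 3D boxes under a predicted Gaussian mixture over 7 box parameters (x, y, z, w, l, h, yaw). The component count, the diagonal-covariance assumption, and all names here are illustrative assumptions, not the dissertation's actual network or loss code.

import numpy as np

def mixture_nll(pi, mu, sigma, gt_boxes):
    """Negative log-likelihood of ground-truth boxes under a Gaussian mixture.

    pi:       (K,)   mixture weights, summing to 1
    mu:       (K, 7) predicted component means (x, y, z, w, l, h, yaw)
    sigma:    (K, 7) per-dimension standard deviations (diagonal covariance)
    gt_boxes: (N, 7) ground-truth boxes
    Returns the mean NLL over the N ground-truth boxes.
    """
    gt = gt_boxes[:, None, :]                                    # (N, 1, 7)
    # log N(gt | mu_k, diag(sigma_k^2)) for every box/component pair
    log_norm = -0.5 * np.log(2 * np.pi) - np.log(sigma)          # (K, 7)
    log_prob = log_norm - 0.5 * ((gt - mu) / sigma) ** 2         # (N, K, 7)
    log_prob = log_prob.sum(axis=-1) + np.log(pi)                # (N, K)
    # log-sum-exp over components, then average over boxes
    m = log_prob.max(axis=1, keepdims=True)
    log_mix = m.squeeze(1) + np.log(np.exp(log_prob - m).sum(axis=1))
    return -log_mix.mean()

# Toy usage: 3 mixture components, 2 ground-truth boxes
rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])
mu = rng.normal(size=(3, 7))
sigma = np.full((3, 7), 0.5)
gt = rng.normal(size=(2, 7))
print(mixture_nll(pi, mu, sigma, gt))

Minimizing a likelihood of this kind replaces per-anchor target assignment: each ground-truth box only needs to be well explained by some mixture component, which is why the formulation can stay agnostic to how the backbone preprocesses the point cloud.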

    Multi-View 3D Object Detection Network for Autonomous Driving

    Full text link
    This paper aims at high-accuracy 3D object detection in autonomous driving scenarios. We propose Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point clouds and RGB images as input and predicts oriented 3D bounding boxes. We encode the sparse 3D point cloud with a compact multi-view representation. The network is composed of two subnetworks: one for 3D object proposal generation and another for multi-view feature fusion. The proposal network generates 3D candidate boxes efficiently from the bird's-eye-view representation of the 3D point cloud. We design a deep fusion scheme to combine region-wise features from multiple views and enable interactions between intermediate layers of different paths. Experiments on the challenging KITTI benchmark show that our approach outperforms the state of the art by around 25% and 30% AP on the tasks of 3D localization and 3D detection. In addition, for 2D detection, our approach obtains 10.3% higher AP than the state of the art on the hard data among the LIDAR-based methods. Comment: To appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017.
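
    MV3D's bird's-eye-view input is described above only in words. As a minimal sketch of the general idea (the paper's actual encoding uses several height slices plus intensity and density channels), the snippet below rasterizes a LiDAR point cloud into a single top-down height map; the ranges, resolution, and names are illustrative assumptions.

import numpy as np

def bev_height_map(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0),
                   resolution=0.1, z_floor=-3.0):
    """Rasterize a LiDAR point cloud (N, 3) into a bird's-eye-view height map.

    Each cell keeps the maximum z of the points falling into it; empty cells
    stay at z_floor. Ranges and resolution are illustrative defaults.
    """
    nx = int(round((x_range[1] - x_range[0]) / resolution))
    ny = int(round((y_range[1] - y_range[0]) / resolution))
    bev = np.full((nx, ny), z_floor, dtype=np.float32)

    # Keep only points inside the cropped region
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[keep]

    # Convert metric coordinates to integer cell indices
    ix = ((pts[:, 0] - x_range[0]) / resolution).astype(int)
    iy = ((pts[:, 1] - y_range[0]) / resolution).astype(int)

    # Max-height per cell; np.maximum.at handles repeated indices correctly
    np.maximum.at(bev, (ix, iy), pts[:, 2].astype(np.float32))
    return bev

# Toy usage with random points
pts = np.random.uniform([0, -40, -2], [70, 40, 1], size=(1000, 3))
print(bev_height_map(pts).shape)  # (704, 800)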

    DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation

    Full text link
    In real-world crowd counting applications, crowd densities vary greatly in the spatial and temporal domains. A detection-based counting method estimates crowds accurately in low-density scenes, but its reliability degrades in congested areas. A regression-based approach, on the other hand, captures the general density information in crowded regions; without knowing the location of each person, it tends to overestimate the count in low-density areas. Thus, exclusively using either one of them is not sufficient to handle all kinds of scenes with varying densities. To address this issue, a novel end-to-end crowd counting framework, named DecideNet (DEteCtIon and Density Estimation Network), is proposed. It can adaptively decide the appropriate counting mode for different locations in the image based on the real density conditions. DecideNet starts by estimating the crowd density with detection-based and regression-based density maps generated separately. To capture the inevitable variation in densities, it incorporates an attention module meant to adaptively assess the reliability of the two types of estimations. The final crowd counts are obtained under the guidance of the attention module, adopting suitable estimations from the two kinds of density maps. Experimental results show that our method achieves state-of-the-art performance on three challenging crowd counting datasets. Comment: CVPR 2018.
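
    The core mechanism of DecideNet, blending a detection-based and a regression-based density map under the guidance of an attention map, can be written compactly. The sketch below assumes the attention map already holds per-pixel weights in [0, 1]; it illustrates the weighting scheme only and is not the paper's network code.

import numpy as np

def fuse_density_maps(det_map, reg_map, attention):
    """Blend detection- and regression-based crowd density maps per pixel.

    det_map, reg_map: (H, W) density maps from the two branches
    attention:        (H, W) values in [0, 1]; 1 means "trust detection here"
    Returns the fused density map and the resulting crowd count.
    """
    fused = attention * det_map + (1.0 - attention) * reg_map
    return fused, float(fused.sum())  # count = integral of the density map

# Toy usage: trust detection in the sparse left half, regression on the right
h, w = 4, 6
det = np.random.rand(h, w) * 0.1
reg = np.random.rand(h, w) * 0.3
att = np.concatenate([np.ones((h, w // 2)), np.zeros((h, w // 2))], axis=1)
fused, count = fuse_density_maps(det, reg, att)
print(fused.shape, round(count, 2))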

    Event-Based Motion Segmentation by Motion Compensation

    Full text link
    In contrast to traditional cameras, whose pixels have a common exposure time, event-based cameras are novel bio-inspired sensors whose pixels work independently and asynchronously output intensity changes (called "events"), with microsecond resolution. Since events are caused by the apparent motion of objects, event-based cameras sample visual information based on the scene dynamics and are, therefore, a more natural fit than traditional cameras to acquire motion, especially at high speeds, where traditional cameras suffer from motion blur. However, distinguishing between events caused by different moving objects and by the camera's ego-motion is a challenging task. We present the first per-event segmentation method for splitting a scene into independently moving objects. Our method jointly estimates the event-object associations (i.e., segmentation) and the motion parameters of the objects (or the background) by maximization of an objective function, which builds upon recent results on event-based motion-compensation. We provide a thorough evaluation of our method on a public dataset, outperforming the state-of-the-art by as much as 10%. We also show the first quantitative evaluation of a segmentation algorithm for event cameras, yielding around 90% accuracy at 4 pixels relative displacement. Comment: When viewed in Acrobat Reader, several of the figures animate. Video: https://youtu.be/0q6ap_OSBA
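
    As a rough illustration of the motion-compensation principle the abstract builds on, the sketch below warps events with a candidate 2D image velocity, accumulates them into an image of event counts, and scores the candidate by the image's variance: the better the motion fit, the sharper (higher-variance) the warped image. The paper's actual objective and its per-event association model are more elaborate; everything here is an illustrative assumption.

import numpy as np

def contrast(events, velocity, height, width):
    """Score a candidate 2D velocity by the variance of the warped event image.

    events:   (N, 3) array of (x, y, t) with pixel coordinates and timestamps
    velocity: (vx, vy) candidate optic flow in pixels per second
    """
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    t0 = t.min()
    # Warp each event back to the reference time t0 along the candidate flow
    xw = np.round(x - velocity[0] * (t - t0)).astype(int)
    yw = np.round(y - velocity[1] * (t - t0)).astype(int)
    inside = (xw >= 0) & (xw < width) & (yw >= 0) & (yw < height)

    # Accumulate the warped events into an image of event counts
    img = np.zeros((height, width))
    np.add.at(img, (yw[inside], xw[inside]), 1.0)
    return img.var()  # higher variance = sharper image = better motion fit

# Toy usage: events generated by a point moving at 50 px/s along x
t = np.linspace(0, 0.1, 200)
events = np.stack([10 + 50 * t, np.full_like(t, 5), t], axis=1)
for v in [(0.0, 0.0), (50.0, 0.0)]:
    print(v, round(contrast(events, v, height=16, width=32), 4))

In this toy example the true velocity (50, 0) px/s yields a clearly higher contrast score than the zero-motion hypothesis, which is the signal the motion-compensation objective exploits.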

    Road pollution estimation using static cameras and neural networks

    Get PDF
    This article presents a methodology for estimating road pollution by analyzing traffic video sequences. The aim is to take advantage of the large network of IP cameras already present in the road system of any state or country to estimate the pollution in each area. The proposal uses deep-learning neural networks for object detection, together with a pollution estimation model based on the frequency of vehicles and their speed. The experiments show promising results suggesting that the system can be used on its own or combined with existing systems to measure road pollution. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
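
    The article describes its pollution model only as a function of vehicle frequency and speed. The sketch below is a hypothetical stand-in for such a model; the emission factors, the congestion rule, and every name in it are invented for illustration and are not taken from the paper.

# Hypothetical per-vehicle emission factors in g/km; values are illustrative only
EMISSION_G_PER_KM = {"car": 120.0, "truck": 800.0, "bus": 650.0}

def estimated_emissions(counts_per_hour, mean_speed_kmh, segment_km=1.0):
    """Rough hourly emissions (grams) for one camera-monitored road segment.

    counts_per_hour: dict mapping vehicle class -> vehicles observed per hour
    mean_speed_kmh:  average speed measured from the camera feed
    Congestion rule (made up for illustration): per-km emissions rise as the
    mean speed drops below a free-flow reference of 90 km/h.
    """
    congestion_factor = max(1.0, 90.0 / max(mean_speed_kmh, 5.0))
    total = 0.0
    for vehicle_class, count in counts_per_hour.items():
        factor = EMISSION_G_PER_KM[vehicle_class]
        total += count * factor * segment_km * congestion_factor
    return total

# Toy usage: a congested hour vs. a free-flowing hour on the same segment
print(estimated_emissions({"car": 1200, "truck": 80}, mean_speed_kmh=30))
print(estimated_emissions({"car": 1200, "truck": 80}, mean_speed_kmh=90))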