
    Unsupervised brain anomaly detection in MR images

    Brain disorders are characterized by morphological deformations in the shape and size of (sub)cortical structures in one or both hemispheres. These deformations cause deviations from the normal pattern of brain asymmetries, resulting in asymmetric lesions that directly affect the patient's condition. Unsupervised methods aim to learn a model from unlabeled healthy images, so that an unseen image that violates the model's priors, i.e., an outlier, is considered an anomaly. Consequently, they are generic in detecting any lesions, e.g., coming from multiple diseases, as long as these differ notably from the healthy training images. This thesis addresses the development of solutions that leverage unsupervised machine learning for the detection and analysis of abnormal brain asymmetries related to anomalies in magnetic resonance (MR) images. First, we propose an automatic probabilistic-atlas-based approach for anomalous brain image segmentation. Second, we explore an automatic method for the detection of abnormal hippocampi from abnormal asymmetries based on deep generative networks and a one-class classifier. Third, we present a more generic framework to detect abnormal asymmetries in the entire brain hemispheres. Our approach extracts pairs of symmetric regions, called supervoxels, in both hemispheres of a test image under study. One-class classifiers then analyze the asymmetries present in each pair. Experimental results on 3D MR-T1 images from healthy subjects and patients with a variety of lesions show the effectiveness and robustness of the proposed unsupervised approaches for brain anomaly detection.
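
    The one-class step can be pictured with a short sketch. This is a minimal, hypothetical illustration, not the thesis code: the asymmetry features below are random placeholders standing in for descriptors computed from each pair of symmetric regions, and the classifier is scikit-learn's OneClassSVM fit on healthy data only.

        import numpy as np
        from sklearn.svm import OneClassSVM

        rng = np.random.default_rng(0)

        # Placeholder asymmetry features (e.g., intensity and shape differences
        # between a region and its mirrored counterpart); real descriptors would
        # come from the MR images.
        healthy = rng.normal(0.0, 1.0, size=(200, 16))            # healthy training pairs
        test = np.vstack([rng.normal(0.0, 1.0, size=(5, 16)),     # healthy-like pairs
                          rng.normal(4.0, 1.0, size=(5, 16))])    # strongly asymmetric pairs

        clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(healthy)
        print(clf.predict(test))  # +1 = consistent with healthy asymmetry, -1 = anomaly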

    Cross-source Point Cloud Registration: Challenges, Progress and Prospects

    The emerging topic of cross-source point cloud (CSPC) registration has attracted increasing attention with the rapid development of 3D sensor technologies. Unlike conventional same-source point clouds, which come from the same kind of 3D sensor (e.g., Kinect), CSPCs come from different kinds of 3D sensors (e.g., Kinect and LiDAR). CSPC registration generalizes the requirement of data acquisition from same-source to different sources, which leads to generalized applications and combines the advantages of multiple sensors. In this paper, we provide a systematic review of CSPC registration. We first present the characteristics of CSPCs, then summarize the key challenges in this research area, followed by the corresponding research progress, consisting of the most recent and representative developments on this topic. Finally, we discuss important research directions in this vibrant area and explain their role in several application fields.
    Comment: Accepted by Neurocomputing 202
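
    For orientation, the conventional same-source baseline that CSPC methods build on is rigid registration of the ICP family, sketched below with NumPy. This is a generic illustration under simplifying assumptions (full overlap, similar density, no scale difference) — exactly the assumptions that cross-source data tend to break, which is what motivates the specialized methods the survey reviews.

        import numpy as np

        def kabsch(src, dst):
            # Least-squares rigid transform (R, t) mapping src onto dst (both N x 3).
            cs, cd = src.mean(0), dst.mean(0)
            H = (src - cs).T @ (dst - cd)
            U, _, Vt = np.linalg.svd(H)
            d = np.sign(np.linalg.det(Vt.T @ U.T))
            R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
            return R, cd - R @ cs

        def icp(src, dst, iters=30):
            # Basic ICP: alternate nearest-neighbor matching and rigid alignment.
            cur = src.copy()
            for _ in range(iters):
                nn = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1).argmin(1)
                R, t = kabsch(cur, dst[nn])
                cur = cur @ R.T + t
            return cur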

    ์ฃผํ–‰๊ณ„ ๋ฐ ์ง€๋„ ์ž‘์„ฑ์„ ์œ„ํ•œ 3์ฐจ์› ํ™•๋ฅ ์  ์ •๊ทœ๋ถ„ํฌ๋ณ€ํ™˜์˜ ์ •ํ•ฉ ๋ฐฉ๋ฒ•

    Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2019. Advisor: ์ด๋ฒ”ํฌ.
    The robot is a self-operating device using its intelligence, and autonomous navigation is a critical form of intelligence for a robot. This dissertation focuses on localization and mapping using a 3D range sensor for autonomous navigation. The robot can collect spatial information from the environment using a range sensor, and this information can be used to reconstruct the environment. Additionally, the robot can estimate pose variations by registering the source point set with the model. As the point sets collected by range sensors have expanded from two to three dimensions and become dense, registration using the normal distributions transform (NDT) has emerged as an alternative to the most commonly used iterative closest point (ICP) method. NDT is a compact representation that describes a space using a set of Gaussian components (GCs) converted from a point set. Because the number of GCs is much smaller than the number of points, NDT outperforms ICP in computation time. However, NDT has issues to be resolved, such as the discretization of the point set and the objective function. This dissertation is divided into two parts: representation and registration. For the representation part, first we present the probabilistic NDT (PNDT) to deal with the destruction and degeneration problems caused by small cell sizes and sparse point sets. PNDT assigns an uncertainty to each point sample, so that even a point set with fewer than four points can be converted into a distribution.
As a result, PNDT allows more precise registration with small cells. Second, we present lattice adjustment and cell insertion methods that overlap cells to overcome the discreteness problem of the NDT. In the lattice adjustment method, a lattice is expressed by the distance between cells and the side length of each cell. In the cell insertion method, simple, face-centered-cubic, and body-centered-cubic lattices are compared. Third, we present a means of regenerating the NDT for a target lattice. A single robot updates its poses using simultaneous localization and mapping (SLAM) and fuses the NDT at each pose to update its NDT map. Moreover, multiple robots share NDT maps built with inconsistent lattices and fuse the maps. Because the simple fusion of NDT maps can change the centers, shapes, and normal vectors of GCs, the regeneration method subdivides the NDT into truncated GCs using the target lattice and regenerates the NDT. For the registration part, first we present a hue-assisted NDT registration for the case where the robot acquires color information corresponding to each point sample from a vision sensor. Each GC of the NDT carries a distribution of the hue and uses the similarity of the hue distributions as a weight in the objective function. Second, we present a key-layered NDT registration (KL-NDT) method. The multi-layered NDT registration (ML-NDT) registers points to the NDT at multiple lattice resolutions, but the initial cell size and the number of layers are difficult to determine. KL-NDT determines the key layers in which registration is performed based on the change in the number of activated points, and registers at each key layer until the pose estimate converges, which provides a better initial value for the next key layer. Third, we present a method involving dynamic scaling factors of the covariances. This method initially scales the source NDT at zero to avoid a negative correlation between the likelihood and rotational alignment, and scales the target NDT from the maximum to the minimum scale. Finally, we present a method for incremental registration of PNDTs that outperforms the state-of-the-art lidar odometry and mapping (LOAM) method in both accuracy and processing speed.
    Contents: 1 Introduction; 2 Preliminaries; 3 Probabilistic NDT Representation; 4 Interpolation for NDT Using Overlapped Regular Cells; 5 Regeneration of Normal Distributions Transform; 6 Hue-Assisted Registration; 7 Key-Layered NDT Registration; 8 Scaled NDT and the Multi-scale Registration; 9 Scan-to-map Registration; 10 Conclusions; Bibliography; Abstract (Korean); Acknowledgments.
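
    The core representation is easy to sketch. The following is a minimal, illustrative NDT builder, not the dissertation's implementation: it voxelizes a point set and fits one Gaussian component per cell, and the prior_var term mimics the PNDT idea of attaching an uncertainty to every point so that sparsely populated cells still yield usable distributions.

        import numpy as np
        from collections import defaultdict

        def build_ndt(points, cell_size, prior_var=0.0):
            # Voxelize the point set and fit one Gaussian component per cell.
            cells = defaultdict(list)
            for p in points:
                cells[tuple(np.floor(p / cell_size).astype(int))].append(p)
            ndt = {}
            for idx, pts in cells.items():
                pts = np.asarray(pts)
                mean = pts.mean(0)
                cov = np.cov(pts.T) if len(pts) > 1 else np.zeros((3, 3))
                # prior_var > 0 plays the role of a per-point sensor uncertainty,
                # keeping cells with fewer than four points non-degenerate (PNDT idea).
                ndt[idx] = (mean, cov + prior_var * np.eye(3))
            return ndt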

    Perception of Unstructured Environments for Autonomous Off-Road Vehicles

    Autonomous vehicles require perception as a necessary prerequisite for controllable and safe interaction, in order to sense and understand their environment. Perception for structured indoor and outdoor environments covers economically lucrative areas such as autonomous passenger transport and industrial robotics, while the perception of unstructured environments is strongly underrepresented in environment-perception research. The unstructured environments analyzed here pose a particular challenge, since the natural, grown geometries mostly lack homogeneous structure and are dominated by similar textures and objects that are hard to separate. This complicates both the capture and the interpretation of these environments, so perception methods must be designed and optimized specifically for this application domain. This dissertation proposes novel and optimized perception methods for unstructured environments and combines them in a holistic, three-stage pipeline for autonomous off-road vehicles: low-level, mid-level, and high-level perception. The proposed classical and machine learning (ML) perception methods complement one another. Moreover, the combination of perception and validation methods at each level enables reliable perception of the potentially unknown environment, combining loosely and tightly coupled validation methods to guarantee a sufficient yet flexible assessment of the proposed perception methods. All methods were developed as individual modules within the perception and validation pipeline proposed in this work, and their flexible combination enables different pipeline designs for a wide range of off-road vehicles and use cases as required. Low-level perception provides a tightly coupled confidence assessment for raw 2D and 3D sensor data in order to detect sensor failures and guarantee sufficient sensor-data accuracy. In addition, novel calibration and registration approaches for multi-sensor systems in perception are presented, which use only the structure of the environment to register the captured sensor data: a semi-automatic registration approach for registering multiple 3D Light Detection and Ranging (LiDAR) sensors, and a confidence-based framework that combines different registration methods and enables the registration of sensors with different measurement principles. Here, the combination of several registration methods validates the registration results in a tightly coupled manner. Mid-level perception enables the 3D reconstruction of unstructured environments with two methods for estimating disparity from stereo images: a classical, correlation-based method for hyperspectral images that requires only a limited amount of test and validation data, and a second method that estimates disparity from grayscale images with convolutional neural networks (CNNs). Novel disparity error metrics and an evaluation toolbox for 3D reconstruction from stereo images complement the proposed disparity estimation methods and enable their loosely coupled validation.
High-level perception focuses on the interpretation of individual 3D point clouds for traversability analysis, object detection, and obstacle avoidance. A domain-transfer analysis for state-of-the-art 3D semantic segmentation methods yields recommendations for segmenting new target domains as accurately as possible without generating new training data. The presented training approach for 3D segmentation methods with CNNs can further reduce the amount of training data required. Explainable-AI methods applied before and after modeling enable a loosely coupled validation of the proposed high-level methods, with dataset assessment and model-agnostic explanations of CNN predictions. Remediation of contaminated sites and military logistics are the two main use cases in unstructured environments addressed in this work. These application scenarios also show how to close the gap between developing individual methods and integrating them into the processing chain of autonomous off-road vehicles with localization, mapping, planning, and control. In summary, the proposed pipeline offers flexible perception solutions for autonomous off-road vehicles, and the accompanying validation guarantees accurate and trustworthy perception of unstructured environments.
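
    The classical, correlation-based side of the mid-level stage can be illustrated with a toy block matcher. This is a generic sketch on grayscale NumPy arrays under simple assumptions (rectified, equally sized images with row-aligned epipolar lines), not the hyperspectral method proposed in the dissertation.

        import numpy as np

        def block_match_disparity(left, right, max_disp=32, half=3):
            # For each pixel in the left image, slide a (2*half+1)^2 window along
            # the same row of the right image and keep the disparity with the
            # smallest sum of squared differences.
            h, w = left.shape
            disp = np.zeros((h, w), dtype=np.float32)
            for y in range(half, h - half):
                for x in range(half + max_disp, w - half):
                    ref = left[y - half:y + half + 1, x - half:x + half + 1]
                    costs = [((ref - right[y - half:y + half + 1,
                                           x - d - half:x - d + half + 1]) ** 2).sum()
                             for d in range(max_disp)]
                    disp[y, x] = np.argmin(costs)
            return disp  # depth follows as Z = f * B / d for focal length f, baseline B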

    Lidar-based Obstacle Detection and Recognition for Autonomous Agricultural Vehicles

    Today, agricultural vehicles are available that can drive autonomously and follow exact route plans more precisely than human operators. Combined with advancements in precision agriculture, autonomous agricultural robots can reduce manual labor, improve workflow, and optimize yield. However, as of today, human operators are still required for monitoring the environment and acting upon potential obstacles in front of the vehicle. To eliminate this need, safety must be ensured by accurate and reliable obstacle detection and avoidance systems. In this thesis, lidar-based obstacle detection and recognition in agricultural environments has been investigated. A rotating multi-beam lidar generating 3D point clouds was used for point-wise classification of agricultural scenes, while multi-modal fusion with cameras and radar was used to increase performance and robustness. Two research perception platforms were presented and used for data acquisition. The proposed methods were all evaluated on recorded datasets that represented a wide range of realistic agricultural environments and included both static and dynamic obstacles. For 3D point cloud classification, two methods were proposed for handling density variations during feature extraction. One method outperformed a frequently used generic 3D feature descriptor, whereas the other method showed promising preliminary results using deep learning on 2D range images. For multi-modal fusion, four methods were proposed for combining lidar with color camera, thermal camera, and radar. Gradual improvements in classification accuracy were seen as spatial, temporal, and multi-modal relationships were introduced in the models. Finally, occupancy grid mapping was used to fuse and map detections globally, and runtime obstacle detection was applied on mapped detections along the vehicle path, thus simulating an actual traversal. The proposed methods serve as a first step towards full autonomy for agricultural vehicles. The study has thus shown that recent advancements in autonomous driving can be transferred to the agricultural domain, when accurate distinctions are made between obstacles and processable vegetation. Future research in the domain has further been facilitated with the release of the multi-modal obstacle dataset, FieldSAFE.
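
    As a picture of the final fusion step, here is a minimal log-odds occupancy grid in the spirit of the mapping stage described above. It is an illustrative sketch, not the thesis implementation: classified detections push cells toward occupied or free, and repeated observations accumulate evidence over time.

        import numpy as np

        class OccupancyGrid:
            def __init__(self, size=200, resolution=0.1, l_occ=0.85, l_free=-0.4):
                self.log_odds = np.zeros((size, size))
                self.res = resolution
                self.l_occ, self.l_free = l_occ, l_free

            def update(self, xy_points, occupied):
                # Obstacle detections raise a cell's log-odds; ground or processable
                # vegetation lowers it. Points are assumed pre-transformed into the
                # global frame and to fall inside the grid.
                idx = (np.asarray(xy_points) / self.res).astype(int)
                idx = np.clip(idx, 0, self.log_odds.shape[0] - 1)
                delta = self.l_occ if occupied else self.l_free
                for i, j in idx:
                    self.log_odds[i, j] += delta

            def probability(self):
                # Convert accumulated log-odds back to occupancy probabilities.
                return 1.0 - 1.0 / (1.0 + np.exp(self.log_odds))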

    Visual Perception For Robotic Spatial Understanding

    Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. In contrast, we must devise algorithmic methods of taking raw sensor data and converting it to something useful very quickly. Vision is such a necessary part of building a robot or any intelligent system that is meant to interact with the world that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels in a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect pose for very specific objects, we don't yet have a mechanism that detects pose that generalizes well over categories or that can describe new objects efficiently. We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to 3 different modalities. Second, we present our approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3-D over-segmentation technique that utilizes the models and ego-motion output in the previous step to generate temporally consistent segmentations with camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D based object pose recognition using a novel network architecture we call PartNet.
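
    As a small concrete piece of the RGB-D pipeline sketched above, the following shows the standard pinhole back-projection that turns a depth image into a 3D point map in the camera frame; the intrinsics in the usage line are illustrative placeholders, not values from the thesis.

        import numpy as np

        def backproject(depth, fx, fy, cx, cy):
            # Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
            v, u = np.indices(depth.shape)
            x = (u - cx) * depth / fx
            y = (v - cy) * depth / fy
            return np.dstack([x, y, depth])  # H x W x 3 point map

        # Illustrative intrinsics for a 640 x 480 RGB-D sensor (depth in meters).
        cloud = backproject(np.ones((480, 640)), fx=525.0, fy=525.0, cx=319.5, cy=239.5)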

    Evolution of A Common Vector Space Approach to Multi-Modal Problems

    A set of methods to address computer vision problems has been developed. Video understanding has been an active area of research in recent years. If one can accurately identify salient objects in a video sequence, these components can be used in information retrieval and scene analysis. This research started with the development of a coarse-to-fine framework to extract salient objects in video sequences. Previous work on image and video frame background modeling involved methods that ranged from simple and efficient to accurate but computationally complex. It is shown in this research that the novel approach to object extraction is efficient and effective, outperforming existing state-of-the-art methods. However, the drawback of this method is its inability to deal with non-rigid motion. With the rapid development of artificial neural networks, deep learning approaches are explored as a solution to computer vision problems in general. Focusing on image and text, image (or video frame) understanding can be achieved using a common vector space (CVS). With this concept, modality generation and other relevant applications, such as automatic image description and text paraphrasing, can be explored. Specifically, video sequences can be modeled by recurrent neural networks (RNNs); greater depth of the RNN leads to smaller error, but makes the gradient in the network unstable during training. To overcome this problem, a Batch-Normalized Recurrent Highway Network (BNRHN) was developed and tested on the image captioning (image-to-text) task. In BNRHN, the highway layers incorporate batch normalization, which diminishes the vanishing and exploding gradient problems. In addition, a sentence-to-vector encoding framework suitable for advanced natural language processing is developed. This semantic text embedding makes use of an encoder-decoder model trained on sentence paraphrase pairs (text-to-text). With this scheme, the latent representation of the text is shown to encode sentences with common semantic information with similar vector representations. In addition to image-to-text and text-to-text, an image generation model is developed to generate an image from text (text-to-image) or from another image (image-to-image) based on the semantics of the content. The developed model, referred to as the Multi-Modal Vector Representation (MMVR), builds and encodes different modalities into a common vector space, achieving the goal of preserving semantics and making conversion between text and image bidirectional. The concept of the CVS is introduced in this research to deal with multi-modal conversion problems. In theory, this method works not only on text and image, but can also be generalized to other modalities, such as video and audio. The characteristics and performance are supported by both theoretical analysis and experimental results. Interestingly, the MMVR model is one of many possible ways to build a CVS. In the final stages of this research, a simple and straightforward framework to build a CVS, considered as an alternative to the MMVR model, is presented.
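
    The retrieval behavior a common vector space is meant to produce can be shown in a few lines. This toy sketch uses random placeholder embeddings in place of trained modality-specific encoders; in a real CVS such as MMVR, the encoders are trained so that matching image-text pairs land close together in the shared space.

        import numpy as np

        def normalize(v):
            return v / np.linalg.norm(v, axis=-1, keepdims=True)

        rng = np.random.default_rng(1)
        image_vecs = normalize(rng.normal(size=(4, 128)))  # placeholder image embeddings
        # Paired captions: near their image in the shared space, up to some noise.
        text_vecs = normalize(image_vecs + 0.1 * rng.normal(size=(4, 128)))

        similarity = text_vecs @ image_vecs.T  # cosine similarities (unit vectors)
        print(similarity.argmax(axis=1))       # each caption should retrieve its image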