
    Unsupervised brain anomaly detection in MR images

    Brain disorders are characterized by morphological deformations in the shape and size of (sub)cortical structures in one or both hemispheres. These deformations cause deviations from the normal pattern of brain asymmetries, resulting in asymmetric lesions that directly affect the patient's condition. Unsupervised methods aim to learn a model from unlabeled healthy images, so that an unseen image that violates the model's priors, i.e., an outlier, is considered an anomaly. Consequently, they are generic in detecting any lesions, e.g., coming from multiple diseases, as long as these differ notably from the healthy training images. This thesis addresses the development of solutions that leverage unsupervised machine learning for the detection and analysis of abnormal brain asymmetries related to anomalies in magnetic resonance (MR) images. First, we propose an automatic probabilistic-atlas-based approach for anomalous brain image segmentation. Second, we explore an automatic method for the detection of abnormal hippocampi from abnormal asymmetries based on deep generative networks and a one-class classifier. Third, we present a more generic framework to detect abnormal asymmetries in the entire brain hemispheres. Our approach extracts pairs of symmetric regions, called supervoxels, in both hemispheres of a test image under study. One-class classifiers then analyze the asymmetries present in each pair. Experimental results on 3D MR-T1 images from healthy subjects and patients with a variety of lesions show the effectiveness and robustness of the proposed unsupervised approaches for brain anomaly detection.
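
    The one-class step can be pictured with a short sketch. This is a minimal, hypothetical illustration, not the thesis code: the asymmetry features below are random placeholders standing in for descriptors computed from each pair of symmetric regions, and the classifier is scikit-learn's OneClassSVM fit on healthy data only.

        import numpy as np
        from sklearn.svm import OneClassSVM

        rng = np.random.default_rng(0)

        # Placeholder asymmetry features (e.g., intensity and shape differences
        # between a region and its mirrored counterpart); real descriptors would
        # come from the MR images.
        healthy = rng.normal(0.0, 1.0, size=(200, 16))            # healthy training pairs
        test = np.vstack([rng.normal(0.0, 1.0, size=(5, 16)),     # healthy-like pairs
                          rng.normal(4.0, 1.0, size=(5, 16))])    # strongly asymmetric pairs

        clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(healthy)
        print(clf.predict(test))  # +1 = consistent with healthy asymmetry, -1 = anomaly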

    Cross-source Point Cloud Registration: Challenges, Progress and Prospects

    The emerging topic of cross-source point cloud (CSPC) registration has attracted increasing attention with the rapid development of 3D sensor technologies. Unlike conventional same-source point clouds, which come from the same kind of 3D sensor (e.g., Kinect), CSPCs come from different kinds of 3D sensors (e.g., Kinect and LiDAR). CSPC registration generalizes the requirement of data acquisition from same-source to different sources, which leads to generalized applications and combines the advantages of multiple sensors. In this paper, we provide a systematic review of CSPC registration. We first present the characteristics of CSPCs, then summarize the key challenges in this research area, followed by the corresponding research progress, consisting of the most recent and representative developments on this topic. Finally, we discuss important research directions in this vibrant area and explain their role in several application fields.
    Comment: Accepted by Neurocomputing 202
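
    For orientation, the conventional same-source baseline that CSPC methods build on is rigid registration of the ICP family, sketched below with NumPy. This is a generic illustration under simplifying assumptions (full overlap, similar density, no scale difference) — exactly the assumptions that cross-source data tend to break, which is what motivates the specialized methods the survey reviews.

        import numpy as np

        def kabsch(src, dst):
            # Least-squares rigid transform (R, t) mapping src onto dst (both N x 3).
            cs, cd = src.mean(0), dst.mean(0)
            H = (src - cs).T @ (dst - cd)
            U, _, Vt = np.linalg.svd(H)
            d = np.sign(np.linalg.det(Vt.T @ U.T))
            R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
            return R, cd - R @ cs

        def icp(src, dst, iters=30):
            # Basic ICP: alternate nearest-neighbor matching and rigid alignment.
            cur = src.copy()
            for _ in range(iters):
                nn = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1).argmin(1)
                R, t = kabsch(cur, dst[nn])
                cur = cur @ R.T + t
            return cur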

    ์ฃผํ–‰๊ณ„ ๋ฐ ์ง€๋„ ์ž‘์„ฑ์„ ์œ„ํ•œ 3์ฐจ์› ํ™•๋ฅ ์  ์ •๊ทœ๋ถ„ํฌ๋ณ€ํ™˜์˜ ์ •ํ•ฉ ๋ฐฉ๋ฒ•

    Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2019. Advisor: ์ด๋ฒ”ํฌ.
    The robot is a self-operating device using its intelligence, and autonomous navigation is a critical form of intelligence for a robot. This dissertation focuses on localization and mapping using a 3D range sensor for autonomous navigation. The robot can collect spatial information from the environment using a range sensor, and this information can be used to reconstruct the environment. Additionally, the robot can estimate pose variations by registering the source point set with the model. As the point sets collected by range sensors have expanded from two to three dimensions and become dense, registration using the normal distributions transform (NDT) has emerged as an alternative to the most commonly used iterative closest point (ICP) method. NDT is a compact representation that describes a space using a set of Gaussian components (GCs) converted from a point set. Because the number of GCs is much smaller than the number of points, NDT outperforms ICP in computation time. However, NDT has issues to be resolved, such as the discretization of the point set and the objective function. This dissertation is divided into two parts: representation and registration. For the representation part, first we present the probabilistic NDT (PNDT) to deal with the destruction and degeneration problems caused by small cell sizes and sparse point sets. PNDT assigns an uncertainty to each point sample, so that even a point set with fewer than four points can be converted into a distribution.
As a result, PNDT allows more precise registration with small cells. Second, we present lattice adjustment and cell insertion methods that overlap cells to overcome the discreteness problem of the NDT. In the lattice adjustment method, a lattice is expressed by the distance between cells and the side length of each cell. In the cell insertion method, simple, face-centered-cubic, and body-centered-cubic lattices are compared. Third, we present a means of regenerating the NDT for a target lattice. A single robot updates its poses using simultaneous localization and mapping (SLAM) and fuses the NDT at each pose to update its NDT map. Moreover, multiple robots share NDT maps built with inconsistent lattices and fuse the maps. Because the simple fusion of NDT maps can change the centers, shapes, and normal vectors of GCs, the regeneration method subdivides the NDT into truncated GCs using the target lattice and regenerates the NDT. For the registration part, first we present a hue-assisted NDT registration for the case where the robot acquires color information corresponding to each point sample from a vision sensor. Each GC of the NDT carries a distribution of the hue and uses the similarity of the hue distributions as a weight in the objective function. Second, we present a key-layered NDT registration (KL-NDT) method. The multi-layered NDT registration (ML-NDT) registers points to the NDT at multiple lattice resolutions, but the initial cell size and the number of layers are difficult to determine. KL-NDT determines the key layers in which registration is performed based on the change in the number of activated points, and registers at each key layer until the pose estimate converges, which provides a better initial value for the next key layer. Third, we present a method involving dynamic scaling factors of the covariances. This method initially scales the source NDT at zero to avoid a negative correlation between the likelihood and rotational alignment, and scales the target NDT from the maximum to the minimum scale. Finally, we present a method for incremental registration of PNDTs that outperforms the state-of-the-art lidar odometry and mapping (LOAM) method in both accuracy and processing speed.
    Contents: 1 Introduction; 2 Preliminaries; 3 Probabilistic NDT Representation; 4 Interpolation for NDT Using Overlapped Regular Cells; 5 Regeneration of Normal Distributions Transform; 6 Hue-Assisted Registration; 7 Key-Layered NDT Registration; 8 Scaled NDT and the Multi-scale Registration; 9 Scan-to-map Registration; 10 Conclusions; Bibliography; Abstract (Korean); Acknowledgments.
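
    The core representation is easy to sketch. The following is a minimal, illustrative NDT builder, not the dissertation's implementation: it voxelizes a point set and fits one Gaussian component per cell, and the prior_var term mimics the PNDT idea of attaching an uncertainty to every point so that sparsely populated cells still yield usable distributions.

        import numpy as np
        from collections import defaultdict

        def build_ndt(points, cell_size, prior_var=0.0):
            # Voxelize the point set and fit one Gaussian component per cell.
            cells = defaultdict(list)
            for p in points:
                cells[tuple(np.floor(p / cell_size).astype(int))].append(p)
            ndt = {}
            for idx, pts in cells.items():
                pts = np.asarray(pts)
                mean = pts.mean(0)
                cov = np.cov(pts.T) if len(pts) > 1 else np.zeros((3, 3))
                # prior_var > 0 plays the role of a per-point sensor uncertainty,
                # keeping cells with fewer than four points non-degenerate (PNDT idea).
                ndt[idx] = (mean, cov + prior_var * np.eye(3))
            return ndt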

    Perception of Unstructured Environments for Autonomous Off-Road Vehicles

    Autonomous vehicles require perception as a necessary prerequisite for controllable and safe interaction, in order to sense and understand their environment. Perception for structured indoor and outdoor environments covers economically lucrative areas such as autonomous passenger transport and industrial robotics, while the perception of unstructured environments is strongly underrepresented in environment-perception research. The unstructured environments analyzed here pose a particular challenge, since the natural, grown geometries mostly lack homogeneous structure and are dominated by similar textures and objects that are hard to separate. This complicates both the capture and the interpretation of these environments, so perception methods must be designed and optimized specifically for this application domain. This dissertation proposes novel and optimized perception methods for unstructured environments and combines them in a holistic, three-stage pipeline for autonomous off-road vehicles: low-level, mid-level, and high-level perception. The proposed classical and machine learning (ML) perception methods complement one another. Moreover, the combination of perception and validation methods at each level enables reliable perception of the potentially unknown environment, combining loosely and tightly coupled validation methods to guarantee a sufficient yet flexible assessment of the proposed perception methods. All methods were developed as individual modules within the perception and validation pipeline proposed in this work, and their flexible combination enables different pipeline designs for a wide range of off-road vehicles and use cases as required. Low-level perception provides a tightly coupled confidence assessment for raw 2D and 3D sensor data in order to detect sensor failures and guarantee sufficient sensor-data accuracy. In addition, novel calibration and registration approaches for multi-sensor systems in perception are presented, which use only the structure of the environment to register the captured sensor data: a semi-automatic registration approach for registering multiple 3D Light Detection and Ranging (LiDAR) sensors, and a confidence-based framework that combines different registration methods and enables the registration of sensors with different measurement principles. Here, the combination of several registration methods validates the registration results in a tightly coupled manner. Mid-level perception enables the 3D reconstruction of unstructured environments with two methods for estimating disparity from stereo images: a classical, correlation-based method for hyperspectral images that requires only a limited amount of test and validation data, and a second method that estimates disparity from grayscale images with convolutional neural networks (CNNs). Novel disparity error metrics and an evaluation toolbox for 3D reconstruction from stereo images complement the proposed disparity estimation methods and enable their loosely coupled validation.
High-level perception focuses on the interpretation of individual 3D point clouds for traversability analysis, object detection, and obstacle avoidance. A domain-transfer analysis for state-of-the-art 3D semantic segmentation methods yields recommendations for segmenting new target domains as accurately as possible without generating new training data. The presented training approach for 3D segmentation methods with CNNs can further reduce the amount of training data required. Explainable-AI methods applied before and after modeling enable a loosely coupled validation of the proposed high-level methods, with dataset assessment and model-agnostic explanations of CNN predictions. Remediation of contaminated sites and military logistics are the two main use cases in unstructured environments addressed in this work. These application scenarios also show how to close the gap between developing individual methods and integrating them into the processing chain of autonomous off-road vehicles with localization, mapping, planning, and control. In summary, the proposed pipeline offers flexible perception solutions for autonomous off-road vehicles, and the accompanying validation guarantees accurate and trustworthy perception of unstructured environments.
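
    The classical, correlation-based side of the mid-level stage can be illustrated with a toy block matcher. This is a generic sketch on grayscale NumPy arrays under simple assumptions (rectified, equally sized images with row-aligned epipolar lines), not the hyperspectral method proposed in the dissertation.

        import numpy as np

        def block_match_disparity(left, right, max_disp=32, half=3):
            # For each pixel in the left image, slide a (2*half+1)^2 window along
            # the same row of the right image and keep the disparity with the
            # smallest sum of squared differences.
            h, w = left.shape
            disp = np.zeros((h, w), dtype=np.float32)
            for y in range(half, h - half):
                for x in range(half + max_disp, w - half):
                    ref = left[y - half:y + half + 1, x - half:x + half + 1]
                    costs = [((ref - right[y - half:y + half + 1,
                                           x - d - half:x - d + half + 1]) ** 2).sum()
                             for d in range(max_disp)]
                    disp[y, x] = np.argmin(costs)
            return disp  # depth follows as Z = f * B / d for focal length f, baseline B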

    Lidar-based Obstacle Detection and Recognition for Autonomous Agricultural Vehicles

    Today, agricultural vehicles are available that can drive autonomously and follow exact route plans more precisely than human operators. Combined with advancements in precision agriculture, autonomous agricultural robots can reduce manual labor, improve workflow, and optimize yield. However, as of today, human operators are still required for monitoring the environment and acting upon potential obstacles in front of the vehicle. To eliminate this need, safety must be ensured by accurate and reliable obstacle detection and avoidance systems. In this thesis, lidar-based obstacle detection and recognition in agricultural environments has been investigated. A rotating multi-beam lidar generating 3D point clouds was used for point-wise classification of agricultural scenes, while multi-modal fusion with cameras and radar was used to increase performance and robustness. Two research perception platforms were presented and used for data acquisition. The proposed methods were all evaluated on recorded datasets that represented a wide range of realistic agricultural environments and included both static and dynamic obstacles. For 3D point cloud classification, two methods were proposed for handling density variations during feature extraction. One method outperformed a frequently used generic 3D feature descriptor, whereas the other method showed promising preliminary results using deep learning on 2D range images. For multi-modal fusion, four methods were proposed for combining lidar with color camera, thermal camera, and radar. Gradual improvements in classification accuracy were seen as spatial, temporal, and multi-modal relationships were introduced in the models. Finally, occupancy grid mapping was used to fuse and map detections globally, and runtime obstacle detection was applied on mapped detections along the vehicle path, thus simulating an actual traversal. The proposed methods serve as a first step towards full autonomy for agricultural vehicles. The study has thus shown that recent advancements in autonomous driving can be transferred to the agricultural domain, when accurate distinctions are made between obstacles and processable vegetation. Future research in the domain has further been facilitated with the release of the multi-modal obstacle dataset, FieldSAFE.
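
    As a picture of the final fusion step, here is a minimal log-odds occupancy grid in the spirit of the mapping stage described above. It is an illustrative sketch, not the thesis implementation: classified detections push cells toward occupied or free, and repeated observations accumulate evidence over time.

        import numpy as np

        class OccupancyGrid:
            def __init__(self, size=200, resolution=0.1, l_occ=0.85, l_free=-0.4):
                self.log_odds = np.zeros((size, size))
                self.res = resolution
                self.l_occ, self.l_free = l_occ, l_free

            def update(self, xy_points, occupied):
                # Obstacle detections raise a cell's log-odds; ground or processable
                # vegetation lowers it. Points are assumed pre-transformed into the
                # global frame and to fall inside the grid.
                idx = (np.asarray(xy_points) / self.res).astype(int)
                idx = np.clip(idx, 0, self.log_odds.shape[0] - 1)
                delta = self.l_occ if occupied else self.l_free
                for i, j in idx:
                    self.log_odds[i, j] += delta

            def probability(self):
                # Convert accumulated log-odds back to occupancy probabilities.
                return 1.0 - 1.0 / (1.0 + np.exp(self.log_odds))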

    Visual Perception For Robotic Spatial Understanding

    Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. In contrast, we must devise algorithmic methods of taking raw sensor data and converting it to something useful very quickly. Vision is such a necessary part of building a robot or any intelligent system that is meant to interact with the world that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels in a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect pose for very specific objects, we don't yet have a mechanism that detects pose that generalizes well over categories or that can describe new objects efficiently. We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to 3 different modalities. Second, we present our approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3-D over-segmentation technique that utilizes the models and ego-motion output in the previous step to generate temporally consistent segmentations with camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D based object pose recognition using a novel network architecture we call PartNet.
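
    As a small concrete piece of the RGB-D pipeline sketched above, the following shows the standard pinhole back-projection that turns a depth image into a 3D point map in the camera frame; the intrinsics in the usage line are illustrative placeholders, not values from the thesis.

        import numpy as np

        def backproject(depth, fx, fy, cx, cy):
            # Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
            v, u = np.indices(depth.shape)
            x = (u - cx) * depth / fx
            y = (v - cy) * depth / fy
            return np.dstack([x, y, depth])  # H x W x 3 point map

        # Illustrative intrinsics for a 640 x 480 RGB-D sensor (depth in meters).
        cloud = backproject(np.ones((480, 640)), fx=525.0, fy=525.0, cx=319.5, cy=239.5)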

    Evolution of A Common Vector Space Approach to Multi-Modal Problems

    A set of methods to address computer vision problems has been developed. Video understanding has been an active area of research in recent years. If one can accurately identify salient objects in a video sequence, these components can be used in information retrieval and scene analysis. This research started with the development of a coarse-to-fine framework to extract salient objects in video sequences. Previous work on image and video frame background modeling involved methods that ranged from simple and efficient to accurate but computationally complex. It is shown in this research that the novel approach to object extraction is efficient and effective, outperforming existing state-of-the-art methods. However, the drawback of this method is its inability to deal with non-rigid motion. With the rapid development of artificial neural networks, deep learning approaches are explored as a solution to computer vision problems in general. Focusing on image and text, image (or video frame) understanding can be achieved using a common vector space (CVS). With this concept, modality generation and other relevant applications, such as automatic image description and text paraphrasing, can be explored. Specifically, video sequences can be modeled by recurrent neural networks (RNNs); greater depth of the RNN leads to smaller error, but makes the gradient in the network unstable during training. To overcome this problem, a Batch-Normalized Recurrent Highway Network (BNRHN) was developed and tested on the image captioning (image-to-text) task. In BNRHN, the highway layers incorporate batch normalization, which diminishes the vanishing and exploding gradient problems. In addition, a sentence-to-vector encoding framework suitable for advanced natural language processing is developed. This semantic text embedding makes use of an encoder-decoder model trained on sentence paraphrase pairs (text-to-text). With this scheme, the latent representation of the text is shown to encode sentences with common semantic information with similar vector representations. In addition to image-to-text and text-to-text, an image generation model is developed to generate an image from text (text-to-image) or from another image (image-to-image) based on the semantics of the content. The developed model, referred to as the Multi-Modal Vector Representation (MMVR), builds and encodes different modalities into a common vector space, achieving the goal of preserving semantics and making conversion between text and image bidirectional. The concept of the CVS is introduced in this research to deal with multi-modal conversion problems. In theory, this method works not only on text and image, but can also be generalized to other modalities, such as video and audio. The characteristics and performance are supported by both theoretical analysis and experimental results. Interestingly, the MMVR model is one of many possible ways to build a CVS. In the final stages of this research, a simple and straightforward framework to build a CVS, considered as an alternative to the MMVR model, is presented.
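
    The retrieval behavior a common vector space is meant to produce can be shown in a few lines. This toy sketch uses random placeholder embeddings in place of trained modality-specific encoders; in a real CVS such as MMVR, the encoders are trained so that matching image-text pairs land close together in the shared space.

        import numpy as np

        def normalize(v):
            return v / np.linalg.norm(v, axis=-1, keepdims=True)

        rng = np.random.default_rng(1)
        image_vecs = normalize(rng.normal(size=(4, 128)))  # placeholder image embeddings
        # Paired captions: near their image in the shared space, up to some noise.
        text_vecs = normalize(image_vecs + 0.1 * rng.normal(size=(4, 128)))

        similarity = text_vecs @ image_vecs.T  # cosine similarities (unit vectors)
        print(similarity.argmax(axis=1))       # each caption should retrieve its image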