27 research outputs found

    Efficient and accurate stereo matching for cloth manipulation

    Owing to recent developments in robotics, research into robots that can assist with everyday household tasks, and robotic cloth manipulation in particular, has become popular. Stereo matching forms a crucial part of robotic vision and aims to derive depth information from image pairs captured by stereo cameras. Although stereo robotic vision is widely adopted for cloth manipulation robots in the research community, it remains a challenging research task: robotic vision requires very accurate depth output within a relatively short timespan in order to perform cloth manipulation successfully in real time. In this thesis, we aim to develop a stereo-matching-based robotic vision system that is both efficient and effective for robotic cloth manipulation. Effectiveness refers to the accuracy of the depth map generated by the stereo matching algorithms, from which the robot grasps the details required to achieve a given task on cloth materials, while efficiency emphasizes the time the stereo matching needs to process the images. With respect to efficiency, we first explore a variety of hardware architectures, such as multi-core CPUs and graphics processors (GPUs), to accelerate stereo matching, and demonstrate that the parallelised stereo matching algorithm can be significantly accelerated, achieving 12X and 176X speed-ups on multi-core CPU and GPU respectively, compared with single-threaded SISD (Single Instruction, Single Data) CPU execution. In terms of effectiveness, because no cloth-based testbeds with depth-map ground truths existed for evaluating stereo matching accuracy in this context, we created five different testbeds to facilitate the evaluation of stereo matching for cloth manipulation.
In addition, we adapted a guided filtering algorithm into a pyramidal stereo matching framework that works directly on unrectified images, and evaluated its accuracy using the created cloth testbeds. We demonstrate that our proposed approach is not only efficient but also accurate, and is well suited to the characteristics of cloth manipulation tasks. This also shows that, rather than relying on image rectification, applying stereo matching directly to unrectified images is both effective and efficient. Finally, we explore whether we can further improve efficiency while maintaining reasonable accuracy for robotic cloth manipulation (i.e., trading off accuracy for efficiency). We use a foveated matching algorithm, inspired by biological vision systems, and find that it trades accuracy for efficiency effectively, achieving almost the same level of accuracy on both cloth grasping and flattening tasks with a two- to three-fold acceleration. We also demonstrate that, with the robot, machine learning techniques can predict the optimal foveation level so that robotic cloth manipulation tasks are accomplished successfully and much more efficiently. To summarize, this thesis is an extensive study of stereo matching, contributing to the long-term goal of developing efficient yet accurate robotic stereo matching for cloth manipulation.
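The per-pixel search at the heart of block-based stereo matching can be sketched as a winner-take-all minimisation over candidate disparities. The toy Python example below works on a single 1-D scanline with a sum-of-absolute-differences (SAD) cost; it is an illustration of the general technique, not the thesis's own algorithm or its GPU implementation.

```python
def sad_disparity(left, right, window=1, max_disp=4):
    """Toy winner-take-all stereo matching on one scanline.

    For each pixel of the left scanline, every disparity d is tried and
    the one with the lowest sum of absolute differences (SAD) over a
    small window against the right scanline is kept.
    """
    n = len(left)
    disparities = []
    for x in range(n):
        best_cost, best_d = float("inf"), 0
        for d in range(min(max_disp, x) + 1):
            cost = 0
            for w in range(-window, window + 1):
                xl, xr = x + w, x - d + w
                if 0 <= xl < n and 0 <= xr < n:
                    cost += abs(left[xl] - right[xr])
            if cost < best_cost:
                best_cost, best_d = cost, d
        disparities.append(best_d)
    return disparities
```

Because each pixel's search is independent of every other pixel's, the outer loop parallelises naturally, which is what makes multi-core and GPU speed-ups of the kind reported above attainable.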

    Deep Vision for Prosthetic Grasp

    Ph.D. Thesis. The loss of a hand can limit an individual's natural ability to grasp and manipulate objects and affect their quality of life. Prosthetic hands can aid users in overcoming these limitations and regaining that ability. Despite considerable technical advances, the control of commercial hand prostheses is still limited to a few degrees of freedom. Furthermore, switching a prosthetic hand into a desired grip mode can be tiring. The performance of hand prostheses therefore needs to improve greatly. The main aim of this thesis is to improve the functionality, performance and flexibility of current hand prostheses by augmenting commercial prosthetic hands with a vision module. By giving the prosthesis the capacity to see objects, appropriate grip modes can be determined autonomously and quickly. Several deep learning-based approaches were designed in this thesis to realise such a vision-reinforced prosthetic system. Importantly, the user, interacting with this learning structure, may act as a supervisor to accept or correct the suggested grasp. Amputee participants evaluated the designed system and provided feedback. The following objectives for prosthetic hands were met: 1. Chapter 3: design, implementation and real-time testing of a semi-autonomous vision-reinforced prosthetic control structure, empowered by a baseline convolutional neural network. 2. Chapter 4: development of an advanced deep learning structure to simultaneously detect and estimate grasp maps for unknown objects in the presence of ambiguity. 3. Chapter 5: design and development of several deep learning set-ups for concurrent depth and grasp map as well as human grasp type prediction. Publicly available datasets of common graspable objects, namely the Amsterdam Library of Object Images (ALOI) and the Cornell grasp library, were used within this thesis.
Moreover, to have access to real data, a small dataset of household objects, the Newcastle Grasp Library, was gathered for the experiments. Funding: EPSRC; School of Engineering, Newcastle University.
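The supervisory accept-or-correct control flow described above, where the vision module suggests a grip mode and the user may override it, can be sketched in a few lines. The grasp names and scores below are hypothetical stand-ins for a network's class outputs, not the thesis's actual grip taxonomy.

```python
# Hypothetical grip-mode labels; real prostheses define their own set.
GRASPS = ("power", "pinch", "tripod", "lateral")

def choose_grasp(scores, user_override=None):
    """Pick the grip mode with the highest network score, unless the
    user, acting as a supervisor, overrides the suggestion."""
    if user_override is not None:
        return user_override
    # max over (score, name) pairs compares scores first
    return max(zip(scores, GRASPS))[1]
```

The design point is that the network only *suggests*: the human stays in the loop and the override path costs nothing when the suggestion is accepted.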

    On Motion Analysis in Computer Vision with Deep Learning: Selected Case Studies

    Motion analysis is one of the essential enabling technologies in computer vision. Despite recent significant advances, image-based motion analysis remains a very challenging problem. The challenge arises because motion features are extracted directly from a sequence of images without any other metadata; extracting motion information is inherently more difficult than in other computer vision disciplines. In the traditional approach, motion analysis is formulated as an optimisation problem, with the motion model hand-crafted to reflect our understanding of the problem domain. The critical element of these traditional methods is a prior assumption about the model of motion believed to represent a specific problem. A recent trend in data analytics is to replace hand-crafted prior assumptions with a model learned directly from observational data, with no, or very limited, prior assumptions about that model. Although known for a long time, these machine learning-based approaches have been shown to be competitive only very recently, owing to advances in so-called deep learning methodologies. This work's key aim has been to investigate novel approaches, utilising deep learning methodologies, for motion analysis where the motion model is learned directly from observed data. These new approaches have focused on investigating deep network architectures suitable for the effective extraction of spatiotemporal information. Due to the volume and structure of the estimated motion parameters, it is frequently difficult or even impossible to obtain relevant ground-truth data. Missing ground truth forces the choice of unsupervised learning methodologies, which are themselves challenging to apply to the already challenging high-dimensional motion representation of an image sequence.
The main challenge with unsupervised learning is to evaluate whether the algorithm can learn the data model from the data alone, without any prior knowledge presented to the deep learning model during training. In this project, an emphasis has been put on unsupervised learning approaches. Owing to the broad spectrum of computer vision problems and applications related to motion analysis, the research reported in the thesis has focused on three specific motion analysis challenges and corresponding practical case studies: motion detection and recognition, as well as 2D and 3D motion field estimation. Eyeblink quantification has been used as a case study for the motion detection and recognition problem. The approach proposed for this problem consists of a novel network architecture processing weakly corresponding images in an action completion regime, with learned spatiotemporal image features fused using cascaded recurrent networks. The stereo-vision disparity estimation task has been selected as a case study for the 2D motion field estimation problem. The proposed method directly estimates occlusion maps using a novel convolutional neural network architecture that is trained with a custom-designed loss function in an unsupervised manner. The volumetric data registration task has been chosen as a case study for the 3D motion field estimation problem. The proposed solution is based on a 3D CNN, with a novel architecture featuring a generative adversarial network used during training to improve network performance on unseen data. All the proposed networks demonstrated state-of-the-art performance compared with corresponding methods reported in the literature, on a number of assessment metrics. In particular, the proposed architecture for 3D motion field estimation has been shown to outperform the previously reported manual expert-guided registration methodology.
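An unsupervised training signal for disparity estimation typically comes from view reconstruction: a disparity map is scored by how well it warps one view onto the other, so no ground-truth disparities are needed. The 1-D nearest-neighbour sketch below illustrates that principle only; the thesis's actual loss is custom-designed and considerably richer.

```python
def photometric_loss(left, right, disp):
    """Average absolute error between the left scanline and the right
    scanline warped by the estimated per-pixel disparity.

    A loss of zero means the disparity perfectly explains the left view
    in terms of the right one -- the self-supervision signal.
    """
    err, count = 0.0, 0
    for x, d in enumerate(disp):
        xr = x - d  # nearest-neighbour warp of the right view
        if 0 <= xr < len(right):
            err += abs(left[x] - right[xr])
            count += 1
    return err / max(count, 1)
```

In a real network this quantity is made differentiable (e.g. via bilinear sampling) so that the disparity estimator can be trained by gradient descent on images alone.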

    Using Synthetic Data for 3D Hand Pose Recognition

    Ph.D. thesis -- Seoul National University, Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Intelligent Convergence Systems major), August 2021. Yang Han-yeol. 3D hand pose estimation (HPE) based on RGB images has been studied for a long time. Relevant methods have focused mainly on optimising neural frameworks for the graphically connected finger joints. RGB-based HPE models have not been easy to train because of the scarcity of RGB hand pose datasets: unlike human body pose datasets, the finger joints that span hand postures are structured delicately and exquisitely, which makes accurately annotating each joint with unique 3D world coordinates difficult. This is why many conventional methods rely on synthetic data samples to cover large variations of hand postures. A synthetic dataset offers very precise ground-truth annotations and further allows control over the variety of data samples, letting a learning model be trained over a large pose space. Most studies, however, have performed frame-by-frame estimation based on independent static images. Synthetic visual data can provide practically infinite diversity and rich labels while avoiding ethical issues with privacy and bias. However, for many tasks, current models trained on synthetic data generalize poorly to real data. The task of 3D hand pose estimation is a particularly interesting example of this synthetic-to-real problem, because learning-based approaches perform reasonably well given real training data, yet labeled 3D poses are extremely difficult to obtain in the wild, limiting scalability. In this dissertation, we attempt not only to consider the appearance of a hand but also to incorporate the temporal movement information of a hand in motion into the learning framework for better 3D hand pose estimation performance, which leads to the necessity of a large-scale dataset with sequential RGB hand images.
We propose a novel method that generates a synthetic dataset mimicking natural human hand movements by re-engineering the annotations of an extant static hand pose dataset into pose-flows. With the generated dataset, we train a newly proposed recurrent framework, exploiting visuo-temporal features from sequential images of synthetic hands in motion and emphasizing temporal smoothness of estimations with a temporal consistency constraint. Our novel training strategy of detaching the recurrent layer of the framework while finetuning from the synthetic to the real domain preserves the visuo-temporal features learned from sequential synthetic hand images. Sequentially estimated hand poses consequently produce natural and smooth hand movements, which leads to more robust estimations. We show that utilizing temporal information for 3D hand pose estimation significantly enhances general pose estimation, outperforming state-of-the-art methods in experiments on hand pose estimation benchmarks. Since a fixed dataset provides only a finite distribution of data samples, the generalization of a learned pose estimation network is limited in terms of the pose, RGB and viewpoint spaces. We further propose to augment the data automatically, such that augmented pose sampling is performed in favor of the pose estimator's generalization performance. This auto-augmentation of poses is performed within a learned feature space in order to avoid the computational burden of generating synthetic samples at every update iteration. The proposed effort can be considered as generating and utilizing synthetic samples for network training in the feature space.
This improves training efficiency by requiring fewer real data samples, enhances generalization power across multiple dataset domains, and improves estimation performance through efficient augmentation. Research on recognizing the shape and pose of a human hand in 2D images aims to detect the 3D position of each finger joint. A hand pose is composed of the finger joints, from the wrist joint through the MCP, PIP and DIP joints, the anatomical elements that make up the human hand. Hand pose information can be exploited in many fields, and is an excellent input feature in hand gesture recognition research. Applying hand pose estimation in real systems demands high accuracy, real-time performance and models light enough to run on diverse devices, and training the required neural network models takes a great deal of data. However, devices that measure hand poses are fairly unstable, and images of hands wearing such devices differ greatly from real hand skin, making them unsuitable for training. This dissertation therefore re-engineers and augments synthetically generated data for training in order to achieve better learning results. Synthetically generated hand images, even when similar to real skin colour, differ greatly in detailed texture, so models trained only on synthetic data perform markedly worse on real hand data.
To reduce the domain gap between these two kinds of data, we first had the network learn the structure of the human hand: hand motions were re-engineered so the network could learn their movement structure, and everything except the learned temporal part was then finetuned on real hand image data, which proved highly effective; in doing so, we presented a methodology for mimicking real human hand motion. Second, we aligned data from the two different domains in the network's feature space. Furthermore, instead of augmenting synthetic poses towards particular data, we modeled pose generation as a probability distribution and proposed a structure that samples from it so that poses rarely seen by the network are produced. This dissertation thus proposes methods that use synthetic data more effectively, without the labour of collecting hard-to-annotate real data, and that exploit local and temporal features to improve pose estimation more robustly. We also propose an automatic data augmentation methodology with which the network finds and learns the data it needs by itself. Combining the proposed methods further improves hand pose estimation performance. Contents: 1. Introduction; 2. Related Works; 3. Preliminaries: 3D Hand Mesh Model; 4. SeqHAND: RGB-sequence-based 3D Hand Pose and Shape Estimation; 5. Hand Pose Auto-Augment; 6. Conclusion; Abstract (Korean); Acknowledgements.
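A temporal consistency constraint of the kind used above can be sketched as a penalty on frame-to-frame jitter of the estimated joint coordinates. The minimal version below, a mean squared difference between consecutive pose vectors, is an illustration of the idea, not the dissertation's exact formulation.

```python
def temporal_consistency(poses):
    """Penalise frame-to-frame jitter in a pose sequence.

    Each pose is a flat list of joint coordinates; the penalty is the
    mean squared difference between consecutive pose vectors, so a
    perfectly smooth (constant) sequence scores zero.
    """
    total, count = 0.0, 0
    for prev, cur in zip(poses, poses[1:]):
        total += sum((a - b) ** 2 for a, b in zip(prev, cur))
        count += 1
    return total / max(count, 1)
```

Added to the main pose loss with a small weight, such a term encourages the recurrent estimator to produce the natural, smooth hand movements described above.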

    Multi-Object Detection, Pose Estimation and Tracking in Panoramic Monocular Imagery for Autonomous Vehicle Perception

    While active sensors such as radar, laser-based ranging (LiDAR) and ultrasonic sensors are nearly ubiquitous in modern autonomous vehicle prototypes, cameras are more versatile and remain essential for tasks such as road marking detection and road sign reading. Active sensing technologies are widely used because active sensors are, by nature, usually more reliable than cameras at detecting objects; however, they offer lower resolution and break down in challenging environmental conditions such as rain and heavy reflections, as well as on materials such as black paint. In this work, we therefore focus primarily on passive sensing. More specifically, we look at monocular imagery and the extent to which it can replace more complex sensing systems such as stereo, multi-view cameras and LiDAR. The main strength of LiDAR is its ability to measure distances and naturally enable 3D reasoning; in contrast, camera-based object detection is typically restricted to the 2D image space. We propose a convolutional neural network that extends object detection to estimate the 3D pose and velocity of objects from a single monocular camera. Our approach is based on a Siamese neural network able to process pairs of video frames to integrate temporal information. While prior work has focused almost exclusively on forward-facing rectified rectilinear vehicle-mounted cameras, there have been no studies of panoramic imagery in the context of autonomous driving. We introduce an approach that adapts existing convolutional neural networks to unseen 360° panoramic imagery using domain adaptation via style transfer. We also introduce a new synthetic evaluation dataset and benchmark for 3D object detection and depth estimation in automotive panoramic imagery. Multi-object tracking-by-detection is often split into two parts: a detector and a tracker.
In contrast, we investigate the use of end-to-end recurrent convolutional networks that process automotive video sequences to jointly detect and track objects through time. We present a multitask neural network able to track the 3D pose of objects online in panoramic video sequences. Our work highlights that monocular imagery, in conjunction with the proposed algorithmic approaches, can offer an effective replacement for more expensive active sensors to estimate depth and to estimate and track the 3D pose of objects surrounding the ego-vehicle, demonstrating that autonomous driving could be achieved using a limited number of cameras, or even a single 360° panoramic camera, akin to human driver perception.
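A Siamese network is fed two frames at once precisely so that it can reason about displacement over time. The arithmetic underlying the velocity estimate is the finite difference sketched below; this hand-written stand-in is purely illustrative of what the network learns end to end, and the function name is hypothetical.

```python
def velocity_from_poses(p0, p1, dt):
    """Finite-difference velocity of a tracked object between two frames.

    p0 and p1 are the object's 3D positions at two timestamps separated
    by dt seconds; the result is the per-axis velocity in units/second.
    """
    return tuple((b - a) / dt for a, b in zip(p0, p1))
```

Feeding temporal pairs rather than single frames is what lets a monocular system recover motion quantities that a single image cannot provide.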

    Vertical Optimizations of Convolutional Neural Networks for Embedded Systems

    The abstract is in the attachment.

    Intelligent Sensors for Human Motion Analysis

    The book "Intelligent Sensors for Human Motion Analysis" contains 17 articles published in the Special Issue of the Sensors journal. These articles deal with many aspects of human movement analysis. New techniques and methods for pose estimation, gait recognition, and fall detection are proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems.

    Multi-task near-field perception for autonomous driving using surround-view fisheye cameras

    The formation of eyes led to the big bang of evolution: the dynamics changed from a primitive organism waiting for food to come into contact, to food being sought out via visual sensors. The human eye is one of the most sophisticated products of evolution, but it still has defects. Over millions of years, humans have evolved a biological perception algorithm capable of driving cars, operating machinery, piloting aircraft and navigating ships. Automating these capabilities for computers is critical for various applications, including self-driving cars, augmented reality and architectural surveying. Near-field visual perception in the context of self-driving cars covers the environment in a range of 0-10 meters with 360° coverage around the vehicle, and is a critical decision-making component in the development of safer automated driving. Recent advances in computer vision and deep learning, in conjunction with high-quality sensors such as cameras and LiDARs, have fueled mature visual perception solutions; until now, however, far-field perception has been the primary focus. Another significant issue is the limited processing power available for developing real-time applications. Because of this bottleneck, there is frequently a trade-off between performance and run-time efficiency. We concentrate on the following issues in order to address them: 1) developing near-field perception algorithms with high performance and low computational complexity for various geometric and semantic visual perception tasks using convolutional neural networks; 2) using multi-task learning to overcome computational bottlenecks by sharing initial convolutional layers between tasks, and developing optimization strategies that balance the tasks.
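The computational saving from sharing initial layers across tasks comes from running the expensive backbone once per frame and branching only at the lightweight task heads. The sketch below uses trivial stand-in functions (all names hypothetical, no real network involved) purely to show that structure.

```python
def shared_backbone(x):
    """Stand-in for the shared initial convolutional layers: the one
    expensive computation, performed once per input."""
    return [v * 0.5 for v in x]

def depth_head(feat):
    """Cheap task-specific head: a scalar 'depth' summary."""
    return sum(feat) / len(feat)

def segmentation_head(feat):
    """Cheap task-specific head: a binary 'segmentation' per element."""
    return [1 if v > 0 else 0 for v in feat]

def multi_task_forward(x):
    feat = shared_backbone(x)  # computed once, reused by both heads
    return depth_head(feat), segmentation_head(feat)
```

With N tasks, the backbone cost is paid once instead of N times; the optimization strategies mentioned above then balance how strongly each head's loss pulls on the shared parameters.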

    Localization, Mapping and SLAM in Marine and Underwater Environments

    The use of robots in marine and underwater applications is growing rapidly. These applications share the common requirements of modeling the environment and estimating the robots' pose. Although there are several mapping, SLAM, target detection and localization methods, marine and underwater environments have challenging characteristics, such as poor visibility, water currents, communication issues, sonar inaccuracies and unstructured environments, that have to be considered. The purpose of this Special Issue is to present current research trends in underwater localization, mapping, SLAM, and target detection and localization. To this end, we have collected seven articles from leading researchers in the field, presenting the different approaches and methods currently being investigated to improve the performance of underwater robots.