2,060 research outputs found

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    Full text link
    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, such as computer vision (CV), speech recognition, and natural language processing. While remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws on many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be applied to DL for RS. Namely, we focus on theories, tools, and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL systems. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.

    Local Motion Planner for Autonomous Navigation in Vineyards with a RGB-D Camera-Based Algorithm and Deep Learning Synergy

    Get PDF
    With the advent of agriculture 3.0 and 4.0, researchers are increasingly focusing on the development of innovative smart farming and precision agriculture technologies by introducing automation and robotics into agricultural processes. Autonomous agricultural field machines have been gaining significant attention from farmers and industries as a way to reduce costs, human workload, and required resources. Nevertheless, achieving sufficient autonomous navigation capabilities requires the simultaneous cooperation of different processes: localization, mapping, and path planning are just some of the steps that aim to provide the machine with the right set of skills to operate in semi-structured and unstructured environments. In this context, this study presents a low-cost local motion planner for autonomous navigation in vineyards based only on an RGB-D camera, low-end hardware, and a dual-layer control algorithm. The first algorithm exploits the disparity map and its depth representation to generate a proportional control for the robotic platform. Concurrently, a second back-up algorithm, based on representation learning and resilient to illumination variations, can take control of the machine in case of a momentary failure of the first block. Moreover, owing to the dual nature of the system, after initial training of the deep learning model on an initial dataset, the strict synergy between the two algorithms opens the possibility of exploiting new automatically labeled data, coming from the field, to extend the existing model knowledge. The machine learning algorithm has been trained and tested, using transfer learning, with images acquired during different field surveys in northern Italy, and then optimized for on-device inference with model pruning and quantization. Finally, the overall system has been validated with a customized robot platform in the relevant environment.
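    The abstract does not give the paper's exact control law; the sketch below is only a minimal illustration of such a disparity-driven proportional layer, comparing mean disparity in the left and right halves of the frame to steer away from the closer vine row. The function name, gain, and frame size are illustrative assumptions.

        import numpy as np

        def proportional_steering(disparity, gain=0.8):
            """Toy proportional controller: steer away from the closer side.

            Larger disparity means a nearer obstacle (e.g. a vine row), so the
            difference between the right and left half-frames gives a signed
            error that a proportional gain maps to a steering command.
            """
            _, w = disparity.shape
            left = np.nanmean(disparity[:, : w // 2])
            right = np.nanmean(disparity[:, w // 2 :])
            error = right - left                 # > 0: obstacle closer on the right
            return float(np.clip(-gain * error, -1.0, 1.0))

        # Example: a frame whose right half is closer makes the robot steer left.
        d = np.ones((480, 640)); d[:, 320:] = 2.0
        print(proportional_steering(d))          # -0.8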

    Island Loss for Learning Discriminative Features in Facial Expression Recognition

    Full text link
    Over the past few years, Convolutional Neural Networks (CNNs) have shown promise on facial expression recognition. However, the performance degrades dramatically under real-world settings due to variations introduced by subtle facial appearance changes, head pose variations, illumination changes, and occlusions. In this paper, a novel island loss is proposed to enhance the discriminative power of the deeply learned features. Specifically, the island loss (IL) is designed to reduce the intra-class variations while enlarging the inter-class differences simultaneously. Experimental results on four benchmark expression databases have demonstrated that the CNN with the proposed island loss (IL-CNN) outperforms the baseline CNN models with either the traditional softmax loss or the center loss, and achieves comparable or better performance compared with the state-of-the-art methods for facial expression recognition. Comment: 8 pages, 3 figures.
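    To make the two objectives concrete, here is a minimal PyTorch-style sketch of an island-loss-shaped term: a center-loss component that pulls features toward their class centers, plus a pairwise cosine penalty that pushes the centers apart. The weighting, initialisation, and normalisation details below are assumptions, not the paper's reference implementation.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class IslandStyleLoss(nn.Module):
            def __init__(self, num_classes, feat_dim, lam1=0.5):
                super().__init__()
                # Learnable class centers (one "island" per expression class).
                self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
                self.lam1 = lam1

            def forward(self, feats, labels):
                # Center term: squared distance of each feature to its class center.
                center_term = ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()
                # Island term: penalise cosine similarity between distinct centers,
                # shifted by +1 so the penalty is non-negative.
                c = F.normalize(self.centers, dim=1)
                cos = c @ c.t()
                off_diag = cos[~torch.eye(len(c), dtype=torch.bool)]
                island_term = (off_diag + 1.0).sum() / 2   # each pair counted once
                return center_term + self.lam1 * island_term

        loss_fn = IslandStyleLoss(num_classes=7, feat_dim=64)
        loss = loss_fn(torch.randn(32, 64), torch.randint(0, 7, (32,)))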

    Pedestrian Attribute Recognition: A Survey

    Full text link
    Recognizing pedestrian attributes is an important task in the computer vision community because it plays a significant role in video surveillance. Many algorithms have been proposed to handle this task. The goal of this paper is to review existing works, whether based on traditional methods or on deep learning networks. Firstly, we introduce the background of pedestrian attribute recognition (PAR, for short), including the fundamental concepts of pedestrian attributes and the corresponding challenges. Secondly, we introduce existing benchmarks, including popular datasets and evaluation criteria. Thirdly, we analyse the concepts of multi-task learning and multi-label learning, and explain the relations between these two learning paradigms and pedestrian attribute recognition; we also review some popular network architectures that have been widely applied in the deep learning community. Fourthly, we analyse popular solutions for this task, such as attribute grouping, part-based methods, \emph{etc}. Fifthly, we show some applications that take pedestrian attributes into consideration and achieve better performance. Finally, we summarize the paper and give several possible research directions for pedestrian attribute recognition. The project page of this paper can be found at the following website: \url{https://sites.google.com/view/ahu-pedestrianattributes/}. Comment: Check our project page for a high-resolution version of this survey: https://sites.google.com/view/ahu-pedestrianattributes
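    As a concrete anchor for the multi-label formulation the survey discusses, here is a minimal sketch of the standard setup: one sigmoid output per attribute over a shared backbone feature, trained with binary cross-entropy. The feature dimension and attribute count are illustrative, not tied to any particular PAR benchmark.

        import torch
        import torch.nn as nn

        class AttributeHead(nn.Module):
            """One linear layer mapping backbone features to per-attribute logits."""
            def __init__(self, feat_dim=512, num_attributes=26):
                super().__init__()
                self.fc = nn.Linear(feat_dim, num_attributes)

            def forward(self, feats):
                return self.fc(feats)             # raw logits, one per attribute

        head = AttributeHead()
        feats = torch.randn(8, 512)                        # batch of backbone features
        targets = torch.randint(0, 2, (8, 26)).float()     # 0/1 attribute labels
        loss = nn.BCEWithLogitsLoss()(head(feats), targets)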

    Convolutional Neural Networks - Generalizability and Interpretations

    Get PDF

    Real-time classification of vehicle types within infra-red imagery.

    Get PDF
    Real-time classification of vehicles into sub-category types poses a significant challenge within infra-red imagery due to the high levels of intra-class variation in thermal vehicle signatures caused by aspects of design, current operating duration, and ambient thermal conditions. Despite these challenges, infra-red sensing offers significant generalized target object detection advantages in terms of all-weather operation and invariance to visual camouflage techniques. This work investigates the accuracy of a number of real-time object classification approaches for this task within the wider context of an existing initial object detection and tracking framework. Specifically, we evaluate the use of traditional feature-driven bag-of-visual-words and histogram-of-oriented-gradients classification approaches against modern convolutional neural network architectures. Furthermore, we use classical photogrammetry, within the context of current target detection and classification techniques, as a means of approximating the 3D target position within the scene based on this vehicle type classification. Based on the photogrammetric estimation of target position, we then illustrate the use of regular Kalman filter-based tracking operating on actual 3D vehicle trajectories. Results are presented using a conventional thermal-band infra-red (IR) sensor arrangement where targets are tracked over a range of evaluation scenarios.
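    The photogrammetric range step referred to above can be illustrated with a plain pinhole model: given an assumed real-world height prior for the classified vehicle type, the target range follows from similar triangles. The focal length and per-class height priors below are illustrative assumptions, not the paper's calibration.

        # Assumed real-world height priors per classified vehicle type (metres).
        CLASS_HEIGHT_M = {"car": 1.5, "van": 2.2, "truck": 3.2}

        def estimate_range(bbox_height_px, vehicle_class, focal_px=1200.0):
            """Pinhole similar triangles: h_px = f * H / Z, so Z = f * H / h_px."""
            H = CLASS_HEIGHT_M[vehicle_class]
            return focal_px * H / bbox_height_px

        # A 60-pixel-tall detection classified as "car" sits roughly 30 m away.
        print(estimate_range(60, "car"))   # 30.0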

    ๋”ฅ๋Ÿฌ๋‹์— ๊ธฐ์ดˆํ•œ ํšจ๊ณผ์ ์ธ Visual Odometry ๊ฐœ์„  ๋ฐฉ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2020. 8. ์ด๋ฒ”ํฌ.Understanding the three-dimensional environment is one of the most important issues in robotics and computer vision. For this purpose, sensors such as a lidar, a ultrasound, infrared devices, an inertial measurement unit (IMU) and cameras are used, individually or simultaneously, through sensor fusion. Among these sensors, in recent years, researches for use of visual sensors, which can obtain a lot of information at a low price, have been actively underway. Understanding of the 3D environment using cameras includes depth restoration, optical/scene flow estimation, and visual odometry (VO). Among them, VO estimates location of a camera and maps the surrounding environment, while a camera-equipped robot or person travels. This technology must be preceded by other tasks such as path planning and collision avoidance. Also, it can be applied to practical applications such as autonomous driving, augmented reality (AR), unmanned aerial vehicle (UAV) control, and 3D modeling. So far, researches on various VO algorithms have been proposed. Initial VO researches were conducted by filtering poses of robot and map features. Because of the disadvantage of the amount of computation being too large and errors are accumulated, a method using a keyframe was studied. Traditional VO can be divided into a feature-based method and a direct method. Methods using features obtain pose transformation between two images through feature extraction and matching. Direct methods directly compare the intensity of image pixels to obtain poses that minimize the sum of photometric errors. Recently, due to the development of deep learning skills, many studies have been conducted to apply deep learning to VO. Deep learning-based VO, like other fields using deep learning with images, first extracts convolutional neural network (CNN) features and calculates pose transformation between images. Deep learning-based VO can be divided into supervised learning-based and unsupervised learning-based. For VO, using supervised learning, a neural network is trained using ground truth poses, and the unsupervised learning-based method learns poses using only image sequences without given ground truth values. While existing research papers show decent performance, the image datasets used in these studies are all composed of high quality and clear images obtained using expensive cameras. There are also algorithms that can be operated only if non-image information such as exposure time, nonlinear response functions, and camera parameters is provided. In order for VO to be more widely applied to real-world application problems, odometry estimation should be performed even if the datasets are incomplete. Therefore, in this dissertation, two methods are proposed to improve VO performance using deep learning. First, I adopt a super-resolution (SR) technique to improve the performance of VO using images with low-resolution and noises. The existing SR techniques have mainly focused on increasing image resolution rather than execution time. However, a real-time property is very important for VO. Therefore, the SR network should be designed considering the execution time, resolution increment, and noise reduction in this case. Conducting a VO after passing through this SR network, a higher performance VO can be carried out, than using original images. 
Experimental results using the TUM dataset show that the proposed method outperforms the conventional VO and other SR methods. Second, I propose a fully unsupervised learning-based VO that performs odometry estimation, single-view depth estimation, and camera intrinsic parameter estimation simultaneously using a dataset consisting only of image sequences. In the existing unsupervised learning-based VO, algorithms were performed using the images and intrinsic parameters of the camera. Based on existing the technique, I propose a method for additionally estimating camera parameters from the deep intrinsic network. Intrinsic parameters are estimated by two assumptions using the properties of camera parameters in an intrinsic network. Experiments using the KITTI dataset show that the results are comparable to those of the conventional method.3์ฐจ์› ํ™˜๊ฒฝ์— ๋Œ€ํ•œ ์ดํ•ด๋Š” ๋กœ๋ณดํ‹ฑ์Šค์™€ ์ปดํ“จํ„ฐ ๋น„์ „ ๋ถ„์•ผ์—์„œ ๊ต‰์žฅํžˆ ์ค‘์š”ํ•œ ๋ฌธ์ œ ์ค‘ ํ•˜๋‚˜์ด๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ๋ผ์ด๋‹ค, ์ดˆ์ŒํŒŒ, ์ ์™ธ์„ , inertial measurement unit (IMU), ์นด๋ฉ”๋ผ ๋“ฑ์˜ ์„ผ์„œ๊ฐ€ ๊ฐœ๋ณ„์ ์œผ๋กœ ๋˜๋Š” ์„ผ์„œ ์œตํ•ฉ์„ ํ†ตํ•ด ์—ฌ๋Ÿฌ ์„ผ์„œ๊ฐ€ ๋™์‹œ์— ์‚ฌ์šฉ๋˜๊ธฐ๋„ ํ•œ๋‹ค. ์ด ์ค‘์—์„œ๋„ ์ตœ๊ทผ์—๋Š” ์ƒ๋Œ€์ ์œผ๋กœ ์ €๋ ดํ•œ ๊ฐ€๊ฒฉ์— ๋งŽ์€ ์ •๋ณด๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋Š” ์นด๋ฉ”๋ผ๋ฅผ ์ด์šฉํ•œ ์—ฐ๊ตฌ๊ฐ€ ํ™œ๋ฐœํžˆ ์ง„ํ–‰๋˜๊ณ  ์žˆ๋‹ค. ์นด๋ฉ”๋ผ๋ฅผ ์ด์šฉํ•œ 3์ฐจ์› ํ™˜๊ฒฝ ์ธ์ง€๋Š” ๊นŠ์ด ๋ณต์›, optical/scene flow ์ถ”์ •, visual odometry (VO) ๋“ฑ์ด ์žˆ๋‹ค. ์ด ์ค‘ VO๋Š” ์นด๋ฉ”๋ผ๋ฅผ ์žฅ์ฐฉํ•œ ๋กœ๋ด‡ ํ˜น์€ ์‚ฌ๋žŒ์ด ์ด๋™ํ•˜๋ฉฐ ์ž์‹ ์˜ ์œ„์น˜๋ฅผ ํŒŒ์•…ํ•˜๊ณ  ์ฃผ๋ณ€ ํ™˜๊ฒฝ์˜ ์ง€๋„๋ฅผ ์ž‘์„ฑํ•˜๋Š” ๊ธฐ์ˆ ์ด๋‹ค. ์ด ๊ธฐ์ˆ ์€ ๊ฒฝ๋กœ ์„ค์ •, ์ถฉ๋Œ ํšŒํ”ผ ๋“ฑ ๋‹ค๋ฅธ ์ž„๋ฌด๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์ „์— ํ•„์ˆ˜์ ์œผ๋กœ ์„ ํ–‰๋˜์–ด์•ผ ํ•˜๋ฉฐ ์ž์œจ ์ฃผํ–‰, AR, UAV contron, 3D modelling ๋“ฑ ์‹ค์ œ ์‘์šฉ ๋ฌธ์ œ์— ์ ์šฉ๋  ์ˆ˜ ์žˆ๋‹ค. ํ˜„์žฌ ๋‹ค์–‘ํ•œ VO ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๋Œ€ํ•œ ๋…ผ๋ฌธ์ด ์ œ์•ˆ๋˜์—ˆ๋‹ค. ์ดˆ๊ธฐ VO ์—ฐ๊ตฌ๋Š” feature๋ฅผ ์ด์šฉํ•˜์—ฌ feature์™€ ๋กœ๋ด‡์˜ pose๋ฅผ ํ•„ํ„ฐ๋ง ํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ์ง„ํ–‰๋˜์—ˆ๋‹ค. ํ•„ํ„ฐ๋ฅผ ์ด์šฉํ•œ ๋ฐฉ๋ฒ•์€ ๊ณ„์‚ฐ๋Ÿ‰์ด ๋„ˆ๋ฌด ๋งŽ๊ณ  ์˜ค์ฐจ๊ฐ€ ๋ˆ„์ ๋œ๋‹ค๋Š” ๋‹จ์  ๋•Œ๋ฌธ์— keyframe์„ ์ด์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์—ฐ๊ตฌ๋˜์—ˆ๋‹ค. ์ด ๋ฐฉ์‹์œผ๋กœ feature๋ฅผ ์ด์šฉํ•˜๋Š” ๋ฐฉ์‹๊ณผ ํ”ฝ์…€์˜ intensity๋ฅผ ์ง์ ‘ ์‚ฌ์šฉํ•˜๋Š” direct ๋ฐฉ์‹์ด ์—ฐ๊ตฌ๋˜์—ˆ๋‹ค. feature๋ฅผ ์ด์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•๋“ค์€ feature์˜ ์ถ”์ถœ๊ณผ ๋งค์นญ์„ ์ด์šฉํ•˜์—ฌ ๋‘ ์ด๋ฏธ์ง€ ์‚ฌ์ด์˜ pose ๋ณ€ํ™”๋ฅผ ๊ตฌํ•˜๋ฉฐ direct ๋ฐฉ๋ฒ•๋“ค์€ ์ด๋ฏธ์ง€ ํ”ฝ์…€์˜ intensity๋ฅผ ์ง์ ‘ ๋น„๊ตํ•˜์—ฌ photometric error๋ฅผ ์ตœ์†Œํ™” ์‹œํ‚ค๋Š” pose๋ฅผ ๊ตฌํ•˜๋Š” ๋ฐฉ์‹์ด๋‹ค. ์ตœ๊ทผ์—๋Š” deep learning ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๋ฐœ๋‹ฌ๋กœ ์ธํ•ด VO์—๋„ deep learning์„ ์ ์šฉ์‹œํ‚ค๋Š” ์—ฐ๊ตฌ๊ฐ€ ๋งŽ์ด ์ง„ํ–‰๋˜๊ณ  ์žˆ๋‹ค. Deep learning-based VO๋Š” ์ด๋ฏธ์ง€๋ฅผ ์ด์šฉํ•œ ๋‹ค๋ฅธ ๋ถ„์•ผ์™€ ๊ฐ™์ด ๊ธฐ๋ณธ์ ์œผ๋กœ CNN์„ ์ด์šฉํ•˜์—ฌ feature๋ฅผ ์ถ”์ถœํ•œ ๋’ค ์ด๋ฏธ์ง€ ์‚ฌ์ด์˜ pose ๋ณ€ํ™”๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค. ์ด๋Š” ๋‹ค์‹œ supervised learning์„ ์ด์šฉํ•œ ๋ฐฉ์‹๊ณผ unsupervised learning์„ ์ด์šฉํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ๋‹ค. supervised learning์„ ์ด์šฉํ•œ VO๋Š” pose์˜ ์ฐธ๊ฐ’์„ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šต์„ ์‹œํ‚ค๋ฉฐ, unsupervised learning์„ ์ด์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์ฃผ์–ด์ง€๋Š” ์ฐธ๊ฐ’ ์—†์ด ์ด๋ฏธ์ง€์˜ ์ •๋ณด๋งŒ์„ ์ด์šฉํ•˜์—ฌ pose๋ฅผ ํ•™์Šต์‹œํ‚ค๋Š” ๋ฐฉ์‹์ด๋‹ค. ๊ธฐ์กด VO ๋…ผ๋ฌธ๋“ค์€ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์˜€์ง€๋งŒ ์—ฐ๊ตฌ์— ์‚ฌ์šฉ๋œ ์ด๋ฏธ์ง€ dataset๋“ค์€ ๋ชจ๋‘ ๊ณ ๊ฐ€์˜ ์นด๋ฉ”๋ผ๋ฅผ ์ด์šฉํ•˜์—ฌ ์–ป์–ด์ง„ ๊ณ ํ™”์งˆ์˜ ์„ ๋ช…ํ•œ ์ด๋ฏธ์ง€๋“ค๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. 
๋˜ํ•œ ๋…ธ์ถœ ์‹œ๊ฐ„, ๋น„์„ ํ˜• ๋ฐ˜์‘ ํ•จ์ˆ˜, ์นด๋ฉ”๋ผ ํŒŒ๋ผ๋ฏธํ„ฐ ๋“ฑ์˜ ์ด๋ฏธ์ง€ ์™ธ์ ์ธ ์ •๋ณด๋ฅผ ์ด์šฉํ•ด์•ผ๋งŒ ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๋™์ž‘์ด ๊ฐ€๋Šฅํ•˜๋‹ค. VO๊ฐ€ ์‹ค์ œ ์‘์šฉ ๋ฌธ์ œ์— ๋” ๋„๋ฆฌ ์ ์šฉ๋˜๊ธฐ ์œ„ํ•ด์„œ๋Š” dataset์ด ๋ถˆ์™„์ „ํ•  ๊ฒฝ์šฐ์—๋„ odometry ์ถ”์ •์ด ์ž˜ ์ด๋ฃจ์–ด์ ธ์•ผ ํ•œ๋‹ค. ์ด์— ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” deep learning์„ ์ด์šฉํ•˜์—ฌ VO์˜ ์„ฑ๋Šฅ์„ ๋†’์ด๋Š” ๋‘ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ฒซ ๋ฒˆ์งธ๋กœ๋Š” super-resolution (SR) ๊ธฐ๋ฒ•์œผ๋กœ ์ €ํ•ด์ƒ๋„, ๋…ธ์ด์ฆˆ๊ฐ€ ํฌํ•จ๋œ ์ด๋ฏธ์ง€๋ฅผ ์ด์šฉํ•œ VO์˜ ์„ฑ๋Šฅ์„ ๋†’์ด๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ๊ธฐ์กด์˜ SR ๊ธฐ๋ฒ•์€ ์ˆ˜ํ–‰ ์‹œ๊ฐ„๋ณด๋‹ค๋Š” ์ด๋ฏธ์ง€์˜ ํ•ด์ƒ๋„๋ฅผ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•์— ์ฃผ๋กœ ์ง‘์ค‘ํ•˜์˜€๋‹ค. ํ•˜์ง€๋งŒ VO ์ˆ˜ํ–‰์— ์žˆ์–ด์„œ๋Š” ์‹ค์‹œ๊ฐ„์„ฑ์ด ๊ต‰์žฅํžˆ ์ค‘์š”ํ•˜๋‹ค. ๋”ฐ๋ผ์„œ ์ˆ˜ํ–‰ ์‹œ๊ฐ„์„ ๊ณ ๋ คํ•œ SR ๋„คํŠธ์›Œํฌ์˜ ์„ค๊ณ„ํ•˜์—ฌ ์ด๋ฏธ์ง€์˜ ํ•ด์ƒ๋„๋ฅผ ๋†’์ด๊ณ  ๋…ธ์ด์ฆˆ๋ฅผ ์ค„์˜€๋‹ค. ์ด SR ๋„คํŠธ์›Œํฌ๋ฅผ ํ†ต๊ณผ์‹œํ‚จ ๋’ค VO๋ฅผ ์ˆ˜ํ–‰ํ•˜๋ฉด ๊ธฐ์กด์˜ ์ด๋ฏธ์ง€๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ๋ณด๋‹ค ๋†’์€ ์„ฑ๋Šฅ์˜ VO๋ฅผ ์‹ค์‹œ๊ฐ„์œผ๋กœ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋‹ค. TUM dataset์„ ์ด์šฉํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ ๊ธฐ์กด์˜ VO ๊ธฐ๋ฒ•๊ณผ ๋‹ค๋ฅธ SR ๊ธฐ๋ฒ•์„ ์ ์šฉํ•˜์˜€์„ ๋•Œ ๋ณด๋‹ค ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์˜ ์„ฑ๋Šฅ์ด ๋” ๋†’์€ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋‘ ๋ฒˆ์งธ๋กœ๋Š” ์—ฐ์†๋œ ์ด๋ฏธ์ง€๋งŒ์œผ๋กœ ๊ตฌ์„ฑ๋œ dataset์„ ์ด์šฉํ•˜์—ฌ VO, ๋‹จ์ผ ์ด๋ฏธ์ง€ ๊นŠ์ด ์ถ”์ •, ์นด๋ฉ”๋ผ ๋‚ด๋ถ€ ํŒŒ๋ผ๋ฏธํ„ฐ ์ถ”์ •์„ ์ˆ˜ํ–‰ํ•˜๋Š” fully unsupervised learning-based VO๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ๊ธฐ์กด unsupervised learning์„ ์ด์šฉํ•œ VO์—์„œ๋Š” ์ด๋ฏธ์ง€๋“ค๊ณผ ์ด๋ฏธ์ง€๋ฅผ ์ดฌ์˜ํ•œ ์นด๋ฉ”๋ผ์˜ ๋‚ด๋ถ€ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ด์šฉํ•˜์—ฌ VO๋ฅผ ์ˆ˜ํ–‰ํ•˜์˜€๋‹ค. ์ด ๊ธฐ์ˆ ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” deep intrinsic ๋„คํŠธ์›Œํฌ๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ์นด๋ฉ”๋ผ ํŒŒ๋ผ๋ฏธํ„ฐ๊นŒ์ง€ ๋„คํŠธ์›Œํฌ์—์„œ ์ถ”์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. 0์œผ๋กœ ์ˆ˜๋ ดํ•˜๊ฑฐ๋‚˜ ์‰ฝ๊ฒŒ ๋ฐœ์‚ฐํ•˜๋Š” intrinsic ๋„คํŠธ์›Œํฌ์— ์นด๋ฉ”๋ผ ํŒŒ๋ผ๋ฏธํ„ฐ์˜ ์„ฑ์งˆ์„ ์ด์šฉํ•œ ๋‘ ๊ฐ€์ง€ ๊ฐ€์ •์„ ํ†ตํ•ด ๋‚ด๋ถ€ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ถ”์ •ํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. KITTI dataset์„ ์ด์šฉํ•œ ์‹คํ—˜์„ ํ†ตํ•ด intrinsic parameter ์ •๋ณด๋ฅผ ์ œ๊ณต๋ฐ›์•„ ์ง„ํ–‰๋œ ๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•๊ณผ ์œ ์‚ฌํ•œ ์„ฑ๋Šฅ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค.1 INTRODUCTION 1 1.1 Background and Motivation 1 1.2 Literature Review 3 1.3 Contributions 10 1.4 Thesis Structure 11 2 Mathematical Preliminaries of Visual Odometry 13 2.1 Feature-based VO 13 2.2 Direct VO 17 2.3 Learning-based VO 21 2.3.1 Supervised learning-based VO 22 2.3.2 Unsupervised learning-based VO 25 3 Error Improvement in Visual Odometry Using Super-resolution 29 3.1 Introduction 29 3.2 Related Work 31 3.2.1 Visual Odometry 31 3.2.2 Super-resolution 33 3.3 SR-VO 34 3.3.1 VO performance analysis according to changing resolution 34 3.3.2 Super-Resolution Network 37 3.4 Experiments 40 3.4.1 Super-Resolution Procedure 40 3.4.2 VO with SR images 42 3.5 Summary 54 4 Visual Odometry Enhancement Method Using Fully Unsupervised Learning 55 4.1 Introduction 55 4.2 Related Work 57 4.2.1 Traditional Visual Odometry 57 4.2.2 Single-view Depth Recovery 58 4.2.3 Supervised Learning-based Visual Odometry 59 4.2.4 Unsupervised Learning-based Visual Odometry 60 4.2.5 Architecture Overview 62 4.3 Methods 62 4.3.1 Predicting the Target Image using Source Images 62 4.3.2 Intrinsic Parameters Regressor 63 4.4 Experiments 66 4.4.1 Monocular Depth Estimation 66 4.4.2 Visual Odometry 67 4.4.3 Intrinsic Parameters Estimation 77 5 Conclusion and Future Work 82 5.1 Conclusion 82 5.2 Future Work 85 Bibliography 86 Abstract (In Korean) 101Docto
    • โ€ฆ
    corecore