7,292 research outputs found

Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

Author: Cadena Cesar
Carlone Luca
Carrillo Henry
Latif Yasir
Leonard John J.
Neira Jose
Reid Ian
Scaramuzza Davide
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Simultaneous Localization and Mapping (SLAM) consists in the concurrent construction of a model of the environment (the map) and the estimation of the state of the robot moving within it. The SLAM community has made astonishing progress over the last 30 years, enabling large-scale real-world applications and witnessing a steady transition of this technology to industry. We survey the current state of SLAM. We start by presenting what is now the de-facto standard formulation for SLAM. We then review related work, covering a broad set of topics including robustness and scalability in long-term mapping, metric and semantic representations for mapping, theoretical performance guarantees, active SLAM and exploration, and other new frontiers. This paper simultaneously serves as a position paper and tutorial to those who are users of SLAM. By looking at the published research with a critical eye, we delineate open challenges and new research issues that still deserve careful scientific investigation. The paper also contains the authors' take on two questions that often animate discussions during robotics conferences: Do robots need SLAM? and Is SLAM solved?

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

DSpace@MIT

Adelaide Research & Scholarship

ZORA

ADFactory: An Effective Framework for Generalizing Optical Flow with Nerf

Author: Ling Han
Publication date: 14/11/2023

A significant challenge facing current optical flow methods is the difficulty of generalizing them to the real world. This is mainly due to the high cost of hand-crafted datasets; existing self-supervised methods are limited by indirect loss and occlusions, resulting in fuzzy outcomes. To address this challenge, we introduce a novel optical flow training framework: automatic data factory (ADF). ADF requires only RGB images as input to effectively train the optical flow network on the target data domain. Specifically, we use advanced NeRF technology to reconstruct scenes from photo groups collected by a monocular camera, and then calculate optical flow labels between camera pose pairs based on the rendering results. To eliminate erroneous labels caused by defects in the scene reconstructed by NeRF, we screen the generated labels from multiple aspects, such as optical flow matching accuracy, radiance field confidence, and depth consistency. The filtered labels can be directly used for network supervision. Experimentally, the generalization ability of ADF on KITTI surpasses existing self-supervised optical flow and monocular scene flow algorithms. In addition, ADF achieves impressive results in real-world zero-point generalization evaluations and surpasses most supervised methods. Comment: 8 pages
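
As a rough illustration of how a flow label can be computed between a pair of camera poses once a depth value has been rendered for a pixel, the sketch below back-projects the pixel, moves it into the second camera frame, and re-projects it. This is not the ADF pipeline itself; it assumes a pinhole camera, a static scene, and the convention X2 = R * X1 + t. The function name `rigid_flow` and the intrinsics tuple `K = (fx, fy, cx, cy)` are hypothetical.

```python
def rigid_flow(u, v, depth, K, R, t):
    """Flow at pixel (u, v) induced by camera motion X2 = R * X1 + t.

    Assumptions (not from the abstract): pinhole intrinsics
    K = (fx, fy, cx, cy), a static scene, and a rendered depth for the pixel.
    """
    fx, fy, cx, cy = K
    # Back-project the pixel into 3D in the first camera frame.
    p1 = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Express the 3D point in the second camera frame.
    p2 = [sum(R[i][j] * p1[j] for j in range(3)) + t[i] for i in range(3)]
    if p2[2] <= 0:
        return None  # point falls behind the second camera: no valid label
    # Re-project into the second image and take the pixel displacement.
    u2 = fx * p2[0] / p2[2] + cx
    v2 = fy * p2[1] / p2[2] + cy
    return (u2 - u, v2 - v)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
flow = rigid_flow(64.0, 48.0, 2.0, (100.0, 100.0, 64.0, 48.0),
                  identity, (0.25, 0.0, 0.0))
```

Labels produced this way are only as good as the rendered depth, which is why the abstract describes a separate screening step for matching accuracy, confidence, and depth consistency.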

arXiv.org e-Print Archive

Depth Completion with Multiple Balanced Bases and Confidence for Dense Monocular SLAM

Author: Bao Hujun
Chen Danpeng
Chu Guanyi
Li Hai
Qian Quanhao
Wang Nan
Xie Weijian
Yu Yihao
Zhai Shangjin
Zhang Guofeng
Publication date: 20/09/2023

Dense SLAM based on monocular cameras has immense application value in AR/VR, especially when it is performed on a mobile device. In this paper, we propose a novel method that integrates a light-weight depth completion network into a sparse SLAM system using a multi-basis depth representation, so that dense mapping can be performed online even on a mobile phone. Specifically, we present an optimized multi-basis depth completion network, called BBC-Net, tailored to the characteristics of traditional sparse SLAM systems. BBC-Net can predict multiple balanced bases and a confidence map from a monocular image with sparse points generated by off-the-shelf keypoint-based SLAM systems. The final depth is a linear combination of predicted depth bases that can be optimized by tuning the corresponding weights. To seamlessly incorporate the weights into traditional SLAM optimization and ensure efficiency and robustness, we design a set of depth weight factors, which makes our network a versatile plug-in module, facilitating easy integration into various existing sparse SLAM systems and significantly enhancing global depth consistency through bundle adjustment. To verify the portability of our method, we integrate BBC-Net into two representative SLAM systems. The experimental results on various datasets show that the proposed method achieves better performance in monocular dense mapping than the state-of-the-art methods. We provide an online demo running on a mobile phone, which verifies the efficiency and mapping quality of the proposed method in real-world scenarios.
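
The idea of a depth map expressed as a linear combination of bases can be sketched in a few lines. The example below is an assumption-laden illustration and not BBC-Net or its depth weight factors: the helper names `combine_bases` and `sparse_residual` are hypothetical, images are flattened to plain lists, and using the confidence map as a per-pixel weight on the squared error is a guess at one plausible use.

```python
def combine_bases(bases, weights):
    """Per-pixel depth as a weighted sum of depth bases (flat pixel lists)."""
    n = len(bases[0])
    return [sum(w * b[i] for w, b in zip(weights, bases)) for i in range(n)]


def sparse_residual(bases, weights, sparse, confidence):
    """Confidence-weighted squared error at pixels with a sparse SLAM depth.

    `sparse` maps pixel index -> depth from the sparse SLAM system
    (hypothetical data layout chosen for this sketch).
    """
    depth = combine_bases(bases, weights)
    return sum(confidence[i] * (depth[i] - z) ** 2 for i, z in sparse.items())


bases = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]   # two toy bases, three pixels
sparse = {0: 2.0, 2: 4.0}                    # sparse depths at pixels 0 and 2
confidence = [1.0, 1.0, 0.5]

before = sparse_residual(bases, [1.0, 0.0], sparse, confidence)
after = sparse_residual(bases, [1.0, 2.0], sparse, confidence)
```

Because only the handful of weights change while the bases stay fixed, an optimizer can adjust the whole dense depth map at a cost comparable to adding a few variables to the sparse problem.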

arXiv.org e-Print Archive

Automatic Scaffolding Productivity Measurement through Deep Learning

Author: Ying Wenzheng
Publication venue: Curtin University
Publication date: 01/01/2021

This study developed a method to automatically measure scaffolding productivity by extracting and analysing semantic information from onsite vision data.

espace@Curtin

Survey on Controlable Image Synthesis with Deep Learning

Author: Li Jiao
Yang Lu
Zhang Shixiong
Publication date: 18/07/2023

Image synthesis has attracted growing research interest in the academic and industry communities. Deep learning technologies, especially generative models, have greatly inspired controllable image synthesis approaches and applications, which aim to generate particular visual content with latent prompts. To further investigate the low-level controllable image synthesis problem, which is crucial for fine image rendering and editing tasks, we present a survey of recent works on 3D controllable image synthesis using deep learning. We first introduce the datasets and evaluation indicators for 3D controllable image synthesis. Then, we review the state-of-the-art research on geometrically controllable image synthesis in two aspects: 1) viewpoint/pose-controllable image synthesis; 2) structure/shape-controllable image synthesis. Furthermore, photometrically controllable image synthesis approaches are also reviewed for 3D re-lighting research. While the emphasis is on 3D controllable image synthesis algorithms, the related applications, products and resources are also briefly summarized for practitioners. Comment: 19 pages, 17 figures

arXiv.org e-Print Archive

On Body Mass Index Analysis from Human Visual Appearance

Author: Jiang Min
Publication venue: The Research Repository @ WVU
Publication date: 01/01/2020

In the past few decades, overweight and obesity have spread widely like an epidemic. Generally, a person is classified as overweight by body mass index (BMI). In addition to being a body fat measurement, BMI is also a risk factor for many diseases, such as cardiovascular diseases, cancers, and diabetes. Therefore, BMI is important for personal health monitoring and medical research. Currently, BMI is measured in person with special devices. There is an urgent demand for convenient preventive tools. This work investigates the feasibility of analyzing BMI from human visual appearance, including 2-dimensional (2D) and 3-dimensional (3D) body and face data. Motivated by health science studies which have shown that anthropometric measures, such as waist-hip ratio and waist circumference, are indicators of obesity, we analyze body weight from frontal view human body images. A framework is developed for body weight analysis from body images, along with computation methods for five anthropometric features for body weight characterization. Then, we study BMI estimation from 3D data by measuring the correlation between the estimated body volume and BMI, and develop an efficient BMI computation method which consists of body weight and height estimation from normally dressed people in 3D space. We also intensively study BMI estimation from frontal view face images via two key aspects: facial representation extraction and BMI estimator learning. First, we investigate the visual BMI estimation problem in terms of the characteristics and performance of different facial representation extraction methods through three designed experiments. Then we study visual BMI estimation from facial images with a two-stage learning framework. BMI-related facial features are learned in the first stage. To address the ambiguity of BMI labels, a label distribution based BMI estimator is proposed for the second stage. The experimental results show that this framework improves the performance step by step. Finally, to address the challenges caused by BMI data and labels, we integrate feature learning and estimator learning in one convolutional neural network (CNN). A label assignment matching scheme is proposed which successfully achieves an improvement in BMI estimation from face images.
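
For reference, BMI is conventionally defined as body weight in kilograms divided by the square of height in metres, so estimates of weight and height are enough to compute it. The sketch below shows that arithmetic together with the standard WHO adult cut-offs (18.5, 25, 30). The function names are hypothetical, and nothing here reflects the thesis's own estimation models.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def who_category(value):
    """Map a BMI value to the standard WHO adult category."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


estimate = bmi(70.0, 1.75)          # e.g. estimated weight and height
label = who_category(estimate)
```

Since height enters squared, a small error in estimated height shifts the result more than the same relative error in weight.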

The Research Repository @ WVU (West Virginia University)

Wide baseline pose estimation from video with a density-based uncertainty model

Author: Aldea Emanuel
Le Hégarat-Mascle Sylvie
Pellicanò Nicola
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/06/2019

Robust wide baseline pose estimation is an essential step in the deployment of smart camera networks. In this work, we highlight some current limitations of conventional strategies for relative pose estimation in difficult urban scenes. Then, we propose a solution that relies on an adaptive search for corresponding interest points in synchronized video streams, which allows us to converge robustly toward a high-quality solution. The core idea of our algorithm is to build across the image space a nonstationary mapping of the local pose estimation uncertainty, based on the spatial distribution of interest points. Subsequently, the mapping guides the selection of new observations from the video stream in order to prioritize the coverage of areas of high uncertainty. With an additional step in the initial stage, the proposed algorithm may also be used for refining an existing pose estimation based on the video data; this mode allows for performing a data-driven self-calibration task for stereo rigs for which accuracy is critical, such as onboard medical or vehicular systems. We validate our method on three different datasets which cover typical scenarios in pose estimation. The results show a fast and robust convergence of the solution, with a significant improvement, compared to single image-based alternatives, of the RMSE of ground-truth matches and of the maximum absolute error.

Hal-Diderot