39,408 research outputs found
HDRFusion:HDR SLAM using a low-cost auto-exposure RGB-D sensor
We describe a new method for comparing frame appearance in a frame-to-model
3-D mapping and tracking system using an low dynamic range (LDR) RGB-D camera
which is robust to brightness changes caused by auto exposure. It is based on a
normalised radiance measure which is invariant to exposure changes and not only
robustifies the tracking under changing lighting conditions, but also enables
the following exposure compensation perform accurately to allow online building
of high dynamic range (HDR) maps. The latter facilitates the frame-to-model
tracking to minimise drift as well as better capturing light variation within
the scene. Results from experiments with synthetic and real data demonstrate
that the method provides both improved tracking and maps with far greater
dynamic range of luminosity.Comment: 14 page
Semantic MapNet: Building Allocentric SemanticMaps and Representations from Egocentric Views
We study the task of semantic mapping - specifically, an embodied agent (a
robot or an egocentric AI assistant) is given a tour of a new environment and
asked to build an allocentric top-down semantic map ("what is where?") from
egocentric observations of an RGB-D camera with known pose (via localization
sensors). Towards this goal, we present SemanticMapNet (SMNet), which consists
of: (1) an Egocentric Visual Encoder that encodes each egocentric RGB-D frame,
(2) a Feature Projector that projects egocentric features to appropriate
locations on a floor-plan, (3) a Spatial Memory Tensor of size floor-plan
length x width x feature-dims that learns to accumulate projected egocentric
features, and (4) a Map Decoder that uses the memory tensor to produce semantic
top-down maps. SMNet combines the strengths of (known) projective camera
geometry and neural representation learning. On the task of semantic mapping in
the Matterport3D dataset, SMNet significantly outperforms competitive baselines
by 4.01-16.81% (absolute) on mean-IoU and 3.81-19.69% (absolute) on Boundary-F1
metrics. Moreover, we show how to use the neural episodic memories and
spatio-semantic allocentric representations build by SMNet for subsequent tasks
in the same space - navigating to objects seen during the tour("Find chair") or
answering questions about the space ("How many chairs did you see in the
house?")
HDRFusion:HDR SLAM using a low-cost auto-exposure RGB-D sensor
We describe a new method for comparing frame appearance in a frame-to-model
3-D mapping and tracking system using an low dynamic range (LDR) RGB-D camera
which is robust to brightness changes caused by auto exposure. It is based on a
normalised radiance measure which is invariant to exposure changes and not only
robustifies the tracking under changing lighting conditions, but also enables
the following exposure compensation perform accurately to allow online building
of high dynamic range (HDR) maps. The latter facilitates the frame-to-model
tracking to minimise drift as well as better capturing light variation within
the scene. Results from experiments with synthetic and real data demonstrate
that the method provides both improved tracking and maps with far greater
dynamic range of luminosity.Comment: 14 page
3D measurements in conventional X-ray imaging with RGB-D sensors
[EN] A method for deriving 3D internal information in conventional X-ray settings is presented. It is based on the combination of a pair of radiographs from a patient and it avoids the use of X-ray-opaque fiducials and external reference structures. To achieve this goal, we augment an ordinary X-ray device with a consumer RGB-D camera. The patient' s rotation around the craniocaudal axis is tracked relative to this camera thanks to the depth information provided and the application of a modern surface-mapping algorithm. The measured spatial information is then translated to the reference frame of the X-ray imaging system. By using the intrinsic parameters of the diagnostic equipment, epipolar geometry, and X-ray images of the patient at different angles, 3D internal positions can be obtained. Both the RGB-D and Xray instruments are first geometrically calibrated to find their joint spatial transformation. The proposed method is applied to three rotating phantoms. The first two consist of an anthropomorphic head and a torso, which are filled with spherical lead bearings at precise locations. The third one is made of simple foam and has metal needles of several known lengths embedded in it. The results show that it is possible to resolve anatomical positions and lengths with a millimetric level of precision. With the proposed approach, internal 3D reconstructed coordinates and distances can be provided to the physician. It also contributes to reducing the invasiveness of ordinary X-ray environments and can replace other types of clinical explorations that are mainly aimed at measuring or geometrically relating elements that are present inside the patient's body.(C) 2017 IPEM. Published by Elsevier Ltd. All rights reserved.The authors would like to thank the Radiation Oncology Department of the Physics Section at La Fe Hospital for the anthropomorphic phantom used in this work and Jose Manuel Monserrate (Instituto de Física Corpuscular) for his contribution in the development of the calibration frame shown in Fig. 3. This research has the support of Information Storage S.L., University of Valencia (grant CPI-15-170), CSD-2007-00042 Con solider Ingenio CPAN (grant CPAN-13TR01), IFIC (Severo Ochoa Centre of Excellence SEV20140398) as well as the support of the Spanish Ministry of Industry, Energy, and Tourism (grant TSI1001012013019).Albiol Colomer, F.; Corbi, A.; Albiol Colomer, A. (2017). 3D measurements in conventional X-ray imaging with RGB-D sensors. Medical Engineering & Physics. 42:73-79. https://doi.org/10.1016/j.medengphy.2017.01.024S73794
Creating Simplified 3D Models with High Quality Textures
This paper presents an extension to the KinectFusion algorithm which allows
creating simplified 3D models with high quality RGB textures. This is achieved
through (i) creating model textures using images from an HD RGB camera that is
calibrated with Kinect depth camera, (ii) using a modified scheme to update
model textures in an asymmetrical colour volume that contains a higher number
of voxels than that of the geometry volume, (iii) simplifying dense polygon
mesh model using quadric-based mesh decimation algorithm, and (iv) creating and
mapping 2D textures to every polygon in the output 3D model. The proposed
method is implemented in real-time by means of GPU parallel processing.
Visualization via ray casting of both geometry and colour volumes provides
users with a real-time feedback of the currently scanned 3D model. Experimental
results show that the proposed method is capable of keeping the model texture
quality even for a heavily decimated model and that, when reconstructing small
objects, photorealistic RGB textures can still be reconstructed.Comment: 2015 International Conference on Digital Image Computing: Techniques
and Applications (DICTA), Page 1 -
Visual Perception For Robotic Spatial Understanding
Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. In contrast, we must devise algorithmic methods of taking raw sensor data and converting it to something useful very quickly. Vision is such a necessary part of building a robot or any intelligent system that is meant to interact with the world that it is somewhat surprising we don\u27t have off-the-shelf libraries for this capability.
Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels in a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect pose for very specific objects, we don\u27t yet have a mechanism that detects pose that generalizes well over categories or that can describe new objects efficiently.
We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to 3 different modalities. Second, we present our approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3-D over-segmentation technique that utilizes the models and ego-motion output in the previous step to generate temporally consistent segmentations with camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D based object pose recognition using a novel network architecture we call PartNet
Non-iterative RGB-D-inertial Odometry
This paper presents a non-iterative solution to RGB-D-inertial odometry
system. Traditional odometry methods resort to iterative algorithms which are
usually computationally expensive or require well-designed initialization. To
overcome this problem, this paper proposes to combine a non-iterative front-end
(odometry) with an iterative back-end (loop closure) for the RGB-D-inertial
SLAM system. The main contribution lies in the novel non-iterative front-end,
which leverages on inertial fusion and kernel cross-correlators (KCC) to match
point clouds in frequency domain. Dominated by the fast Fourier transform
(FFT), our method is only of complexity , where is
the number of points. Map fusion is conducted by element-wise operations, so
that both time and space complexity are further reduced. Extensive experiments
show that, due to the lightweight of the proposed front-end, the framework is
able to run at a much faster speed yet still with comparable accuracy with the
state-of-the-arts
RGBDTAM: A Cost-Effective and Accurate RGB-D Tracking and Mapping System
Simultaneous Localization and Mapping using RGB-D cameras has been a fertile
research topic in the latest decade, due to the suitability of such sensors for
indoor robotics. In this paper we propose a direct RGB-D SLAM algorithm with
state-of-the-art accuracy and robustness at a los cost. Our experiments in the
RGB-D TUM dataset [34] effectively show a better accuracy and robustness in CPU
real time than direct RGB-D SLAM systems that make use of the GPU. The key
ingredients of our approach are mainly two. Firstly, the combination of a
semi-dense photometric and dense geometric error for the pose tracking (see
Figure 1), which we demonstrate to be the most accurate alternative. And
secondly, a model of the multi-view constraints and their errors in the mapping
and tracking threads, which adds extra information over other approaches. We
release the open-source implementation of our approach 1 . The reader is
referred to a video with our results 2 for a more illustrative visualization of
its performance
- …