42 research outputs found
Dense Point-Cloud Representation of a Scene using Monocular Vision
We present a three-dimensional (3-D) reconstruction system designed to support various autonomous navigation applications. The system presented focuses on the 3-D reconstruction of a scene using only a single moving camera. Utilizing video frames captured at different points in time allows us to determine the depths of a scene. In this way, the system can be used to construct a point-cloud model of its unknown surroundings.
We present the step-by-step methodology and analysis used in developing the 3-D reconstruction technique.
We present a reconstruction framework that generates a primitive point cloud, which is computed based on feature matching and depth triangulation analysis. To populate the reconstruction, we utilized optical flow features to create an extremely dense representation model. With the third algorithmic modification, we introduce the addition of the preprocessing step of nonlinear single-image super resolution. With this addition, the depth accuracy of the point cloud, which relies on precise disparity measurement, has significantly increased.
Our final contribution is an additional postprocessing step designed to filter noise points and mismatched features unveiling the complete dense point-cloud representation (DPR) technique. We measure the success of DPR by evaluating the visual appeal, density, accuracy, and computational expense and compare with two state-of-the-art techniques
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM)consists in the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de-facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and tutorial to those who are users of SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues, that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved
Soft, Round, High Resolution Tactile Fingertip Sensors for Dexterous Robotic Manipulation
High resolution tactile sensors are often bulky and have shape profiles that
make them awkward for use in manipulation. This becomes important when using
such sensors as fingertips for dexterous multi-fingered hands, where boxy or
planar fingertips limit the available set of smooth manipulation strategies.
High resolution optical based sensors such as GelSight have until now been
constrained to relatively flat geometries due to constraints on illumination
geometry.Here, we show how to construct a rounded fingertip that utilizes a
form of light piping for directional illumination. Our sensors can replace the
standard rounded fingertips of the Allegro hand.They can capture high
resolution maps of the contact surfaces,and can be used to support various
dexterous manipulation tasks
Precise Depth Image Based Real-Time 3D Difference Detection
3D difference detection is the task to verify whether the 3D geometry of a real object exactly corresponds to a 3D model of this object. This thesis introduces real-time 3D difference detection with a hand-held depth camera. In contrast to previous works, with the proposed approach, geometric differences can be detected in real time and from arbitrary viewpoints. Therefore, the scan position of the 3D difference detection be changed on the fly, during the 3D scan. Thus, the user can move the scan position closer to the object to inspect details or to bypass occlusions.
The main research questions addressed by this thesis are:
Q1: How can 3D differences be detected in real time and from arbitrary viewpoints using a single depth camera?
Q2: Extending the first question, how can 3D differences be detected with a high precision?
Q3: Which accuracy can be achieved with concrete setups of the proposed concept for real time, depth image based 3D difference detection?
This thesis answers Q1 by introducing a real-time approach for depth image based 3D difference detection. The real-time difference detection is based on an algorithm which maps the 3D measurements of a depth camera onto an arbitrary 3D model in real time by fusing computer vision (depth imaging and pose estimation) with a computer graphics based analysis-by-synthesis approach.
Then, this thesis answers Q2 by providing solutions for enhancing the 3D difference detection accuracy, both by precise pose estimation and by reducing depth measurement noise. A precise variant of the 3D difference detection concept is proposed, which combines two main aspects. First, the precision of the depth camera’s pose estimation is improved by coupling the depth camera with a very precise coordinate measuring machine. Second, measurement noise of the captured depth images is reduced and missing depth information is filled in by extending the 3D difference detection with 3D reconstruction.
The accuracy of the proposed 3D difference detection is quantified by a quantitative evaluation. This provides an anwer to Q3. The accuracy is evaluated both for the basic setup and for the variants that focus on a high precision. The quantitative evaluation using real-world data covers both the accuracy which can be achieved with a time-of-flight camera (SwissRanger 4000) and with a structured light depth camera (Kinect). With the basic setup and the structured light depth camera, differences of 8 to 24 millimeters can be detected from one meter measurement distance. With the enhancements proposed for precise 3D difference detection, differences of 4 to 12 millimeters can be detected from one meter measurement distance using the same depth camera.
By solving the challenges described by the three research question, this thesis provides a solution for precise real-time 3D difference detection based on depth images. With the approach proposed in this thesis, dense 3D differences can be detected in real time and from arbitrary viewpoints using a single depth camera. Furthermore, by coupling the depth camera with a coordinate measuring machine and by integrating 3D reconstruction in the 3D difference detection, 3D differences can be detected in real time and with a high precision
CADSim: Robust and Scalable in-the-wild 3D Reconstruction for Controllable Sensor Simulation
Realistic simulation is key to enabling safe and scalable development of %
self-driving vehicles. A core component is simulating the sensors so that the
entire autonomy system can be tested in simulation. Sensor simulation involves
modeling traffic participants, such as vehicles, with high quality appearance
and articulated geometry, and rendering them in real time. The self-driving
industry has typically employed artists to build these assets. However, this is
expensive, slow, and may not reflect reality. Instead, reconstructing assets
automatically from sensor data collected in the wild would provide a better
path to generating a diverse and large set with good real-world coverage.
Nevertheless, current reconstruction approaches struggle on in-the-wild sensor
data, due to its sparsity and noise. To tackle these issues, we present CADSim,
which combines part-aware object-class priors via a small set of CAD models
with differentiable rendering to automatically reconstruct vehicle geometry,
including articulated wheels, with high-quality appearance. Our experiments
show our method recovers more accurate shapes from sparse data compared to
existing approaches. Importantly, it also trains and renders efficiently. We
demonstrate our reconstructed vehicles in several applications, including
accurate testing of autonomy perception systems.Comment: CoRL 2022. Project page: https://waabi.ai/cadsim
Augmentation of Visual Odometry using Radar
As UAVs become viable for more applications, pose estimation continues to be critical. All UAVs need to know where they are at all times, in order to avoid disaster. However, in the event that UAVs are deployed in an area with poor visual conditions, such as in many disaster scenarios, many localization algorithms have difficulties working.
This thesis presents VIL-DSO, a visual odometry method as a pose estimation solution, combining several different algorithms in order to improve pose estimation and provide metric scale. This thesis also presents a method for automatically determining an accurate physical transform between radar and camera data, and in doing so, allow for the projection of radar information into the image plane. Finally, this thesis presents EVIL-DSO, a method for localization that fuses visual-inertial odometry with radar information. The proposed EVIL-DSO algorithm uses radar information projected into the image plane in order to create a depth map for odometry to directly observe depth of features, which can then be used as part of the odometry algorithm to remove the need to perform costly depth estimations.
Trajectory analysis of the proposed algorithm on outdoor data, compared to differential GPS data, shows that the proposed algorithm is more accurate in terms of root-mean-square error, as well as having a lower percentage of scale error. Runtime analysis shows that the proposed algorithm updates more frequently than other, similar, algorithms
Neural radiance fields in the industrial and robotics domain: applications, research opportunities and use cases
The proliferation of technologies, such as extended reality (XR), has
increased the demand for high-quality three-dimensional (3D) graphical
representations. Industrial 3D applications encompass computer-aided design
(CAD), finite element analysis (FEA), scanning, and robotics. However, current
methods employed for industrial 3D representations suffer from high
implementation costs and reliance on manual human input for accurate 3D
modeling. To address these challenges, neural radiance fields (NeRFs) have
emerged as a promising approach for learning 3D scene representations based on
provided training 2D images. Despite a growing interest in NeRFs, their
potential applications in various industrial subdomains are still unexplored.
In this paper, we deliver a comprehensive examination of NeRF industrial
applications while also providing direction for future research endeavors. We
also present a series of proof-of-concept experiments that demonstrate the
potential of NeRFs in the industrial domain. These experiments include
NeRF-based video compression techniques and using NeRFs for 3D motion
estimation in the context of collision avoidance. In the video compression
experiment, our results show compression savings up to 48\% and 74\% for
resolutions of 1920x1080 and 300x168, respectively. The motion estimation
experiment used a 3D animation of a robotic arm to train Dynamic-NeRF (D-NeRF)
and achieved an average peak signal-to-noise ratio (PSNR) of disparity map with
the value of 23 dB and an structural similarity index measure (SSIM) 0.97
From Capture to Display: A Survey on Volumetric Video
Volumetric video, which offers immersive viewing experiences, is gaining
increasing prominence. With its six degrees of freedom, it provides viewers
with greater immersion and interactivity compared to traditional videos.
Despite their potential, volumetric video services poses significant
challenges. This survey conducts a comprehensive review of the existing
literature on volumetric video. We firstly provide a general framework of
volumetric video services, followed by a discussion on prerequisites for
volumetric video, encompassing representations, open datasets, and quality
assessment metrics. Then we delve into the current methodologies for each stage
of the volumetric video service pipeline, detailing capturing, compression,
transmission, rendering, and display techniques. Lastly, we explore various
applications enabled by this pioneering technology and we present an array of
research challenges and opportunities in the domain of volumetric video
services. This survey aspires to provide a holistic understanding of this
burgeoning field and shed light on potential future research trajectories,
aiming to bring the vision of volumetric video to fruition.Comment: Submitte
Plenoptic Signal Processing for Robust Vision in Field Robotics
This thesis proposes the use of plenoptic cameras for improving the robustness and simplicity of machine vision in field robotics applications. Dust, rain, fog, snow, murky water and insufficient light can cause even the most sophisticated vision systems to fail. Plenoptic cameras offer an appealing alternative to conventional imagery by gathering significantly more light over a wider depth of field, and capturing a rich 4D light field structure that encodes textural and geometric information. The key contributions of this work lie in exploring the properties of plenoptic signals and developing algorithms for exploiting them. It lays the groundwork for the deployment of plenoptic cameras in field robotics by establishing a decoding, calibration and rectification scheme appropriate to compact, lenslet-based devices. Next, the frequency-domain shape of plenoptic signals is elaborated and exploited by constructing a filter which focuses over a wide depth of field rather than at a single depth. This filter is shown to reject noise, improving contrast in low light and through attenuating media, while mitigating occluders such as snow, rain and underwater particulate matter. Next, a closed-form generalization of optical flow is presented which directly estimates camera motion from first-order derivatives. An elegant adaptation of this "plenoptic flow" to lenslet-based imagery is demonstrated, as well as a simple, additive method for rendering novel views. Finally, the isolation of dynamic elements from a static background is considered, a task complicated by the non-uniform apparent motion caused by a mobile camera. Two elegant closed-form solutions are presented dealing with monocular time-series and light field image pairs. This work emphasizes non-iterative, noise-tolerant, closed-form, linear methods with predictable and constant runtimes, making them suitable for real-time embedded implementation in field robotics applications