Unsupervised Odometry and Depth Learning for Endoscopic Capsule Robots
In the last decade, many medical companies and research groups have tried to
convert passive capsule endoscopes as an emerging and minimally invasive
diagnostic technology into actively steerable endoscopic capsule robots which
will provide more intuitive disease detection, targeted drug delivery and
biopsy-like operations in the gastrointestinal (GI) tract. In this study, we
introduce a fully unsupervised, real-time odometry and depth learner for
monocular endoscopic capsule robots. We establish the supervision by warping
view sequences and using the re-projection error as the loss function, which
we adopt in the multi-view pose estimation and single-view depth estimation
networks. Detailed quantitative and qualitative analyses of the proposed
framework, performed on non-rigidly deformable ex-vivo porcine stomach
datasets, demonstrate the effectiveness of the method in terms of motion
estimation and depth recovery.
Comment: submitted to IROS 201
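The warping-based supervision described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: grayscale images stored as nested lists, a pinhole camera given as (fx, fy, cx, cy), the convention p_source = R * p_target + t, and a plain mean absolute photometric error. In the paper, depth and pose come from networks; here they are passed in directly.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with known depth to a 3D point in the target camera frame."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)

def transform(p, R, t):
    """Apply the assumed rigid motion p_source = R * p_target + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def project(p, fx, fy, cx, cy):
    """Project a 3D point onto the image plane of a pinhole camera."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)

def bilinear(img, x, y):
    """Bilinearly sample img at (x, y); return None outside the image."""
    h, w = len(img), len(img[0])
    if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
        return None
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom

def reprojection_loss(target, source, depth, K, R, t):
    """Mean absolute difference between the target view and the warped source view."""
    fx, fy, cx, cy = K
    total, count = 0.0, 0
    for v in range(len(target)):
        for u in range(len(target[0])):
            p = transform(backproject(u, v, depth[v][u], fx, fy, cx, cy), R, t)
            if p[2] <= 0:          # point behind the source camera
                continue
            x, y = project(p, fx, fy, cx, cy)
            sample = bilinear(source, x, y)
            if sample is None:     # warped pixel falls outside the source image
                continue
            total += abs(target[v][u] - sample)
            count += 1
    return total / count if count else 0.0

# Toy data (hypothetical): a horizontal intensity ramp seen from two positions.
source = [[float(u) for u in range(4)] for _ in range(4)]
target = [[float(u + 1) for u in range(4)] for _ in range(4)]
depth = [[1.0] * 4 for _ in range(4)]
K = (1.0, 1.0, 1.5, 1.5)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

loss_true_pose = reprojection_loss(target, source, depth, K, identity, (1.0, 0.0, 0.0))
loss_wrong_pose = reprojection_loss(target, source, depth, K, identity, (0.0, 0.0, 0.0))
print(loss_true_pose, loss_wrong_pose)
```

In this toy setup the loss is smaller for the pose that actually relates the two views, which is the signal an unsupervised learner of this kind would minimize with respect to its depth and pose estimates.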
Micro Fourier Transform Profilometry (FTP): 3D shape measurement at 10,000 frames per second
Recent advances in imaging sensors and digital light projection technology
have facilitated rapid progress in 3D optical sensing, enabling 3D surfaces
of complex-shaped objects to be captured with improved resolution and accuracy.
However, due to the large number of projection patterns required for phase
recovery and disambiguation, the maximum frame rates of current 3D shape
measurement techniques are still limited to the range of hundreds of frames per
second (fps). Here, we demonstrate a new 3D dynamic imaging technique, Micro
Fourier Transform Profilometry (FTP), which can capture 3D surfaces of
transient events at up to 10,000 fps based on our newly developed high-speed
fringe projection system. Compared with existing techniques, FTP has the
prominent advantage of recovering an accurate, unambiguous, and dense 3D point
cloud with only two projected patterns. Furthermore, the phase information is
encoded within a single high-frequency fringe image, thereby allowing
motion-artifact-free reconstruction of transient events with temporal
resolution of 50 microseconds. To show FTP's broad utility, we use it to
reconstruct 3D videos of 4 transient scenes: vibrating cantilevers, rotating
fan blades, a bullet fired from a toy gun, and a balloon's explosion triggered
by a flying dart, which were previously difficult or even impossible to capture
with conventional approaches.
Comment: This manuscript was originally submitted on 30th January 1
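The statement that phase is encoded within a single high-frequency fringe image can be illustrated with classical Fourier transform profilometry on one image row. This is a minimal sketch of the textbook single-image technique, not the authors' two-pattern pipeline: the carrier frequency, band-pass half-width, and synthetic fringe signal are all assumptions chosen for illustration, and phase unwrapping and phase-to-height conversion are left out.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(s / n if inverse else s)
    return out

def ftp_wrapped_phase(row, carrier, half_width):
    """Recover the wrapped phase of one fringe row.

    row        : intensity samples a + b*cos(2*pi*carrier*m/n + phi[m])
    carrier    : fringe frequency in cycles per row (assumed known)
    half_width : half-width of the band-pass window around the carrier
    """
    n = len(row)
    spectrum = dft(row)
    # Keep only the positive-frequency lobe around the carrier.
    filtered = [c if abs(k - carrier) <= half_width else 0j
                for k, c in enumerate(spectrum)]
    analytic = dft(filtered, inverse=True)
    # Remove the carrier; what remains is the phase modulation.
    return [cmath.phase(a * cmath.exp(-2j * math.pi * carrier * m / n))
            for m, a in enumerate(analytic)]

# Synthetic fringe row (hypothetical values): 8 fringes across 64 pixels,
# modulated by a smooth phase that stands in for surface height.
n, carrier = 64, 8
true_phase = [0.5 * math.sin(2 * math.pi * m / n) for m in range(n)]
row = [128 + 100 * math.cos(2 * math.pi * carrier * m / n + true_phase[m])
       for m in range(n)]

recovered = ftp_wrapped_phase(row, carrier, half_width=4)
max_error = max(abs(r - t) for r, t in zip(recovered, true_phase))
print(max_error)
```

The band-pass window is the main design choice: it must be wide enough to contain the phase modulation yet narrow enough to exclude the zero-frequency background, which is one reason a high carrier frequency is helpful.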
Cavlectometry: Towards Holistic Reconstruction of Large Mirror Objects
We introduce a method based on the deflectometry principle for the
reconstruction of specular objects exhibiting significant size and geometric
complexity. A key feature of our approach is the deployment of an Automatic
Virtual Environment (CAVE) as pattern generator. To unfold the full power of
this extraordinary experimental setup, an optical encoding scheme is developed
which accounts for the distinctive topology of the CAVE. Furthermore, we devise
an algorithm for detecting the object of interest in raw deflectometric images.
The segmented foreground is used for single-view reconstruction, and the
background for estimating the camera pose, which is necessary for calibrating
the sensor system.
Experiments suggest a significant gain of coverage in single measurements
compared to previous methods. To facilitate research on specular surface
reconstruction, we will make our data set publicly available.
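The deflectometry principle mentioned in this abstract rests on the law of reflection: at a mirror-like surface point, the normal bisects the direction toward the camera and the direction toward the pattern point seen in the reflection. The sketch below only illustrates that geometric relation. It assumes the 3D position of the surface point is already known, which a real deflectometric system has to resolve separately, and the function names and coordinates are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def surface_normal(point, camera, pattern_point):
    """Unit normal at a specular surface point, from the law of reflection.

    point         : assumed-known 3D position on the mirror surface
    camera        : camera center
    pattern_point : point on the pattern generator observed via the reflection

    The degenerate case where camera and pattern point lie in exactly
    opposite directions from the surface point is not handled.
    """
    to_camera = normalize(tuple(c - p for c, p in zip(camera, point)))
    to_pattern = normalize(tuple(s - p for s, p in zip(pattern_point, point)))
    return normalize(tuple(a + b for a, b in zip(to_camera, to_pattern)))

# Toy geometry: a horizontal mirror at the origin, with the camera and the
# observed pattern point placed symmetrically above it.
normal = surface_normal((0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
print(normal)
```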
Vision technology/algorithms for space robotics applications
Automation and robotics for space applications have been proposed to increase productivity, reliability, flexibility, and safety, to automate time-consuming tasks, to improve the productivity and performance of crew-accomplished tasks, and to perform tasks beyond the capability of the crew. This paper provides a review of efforts currently in progress in the area of robotic vision. Both systems and algorithms are discussed. The evolution of future vision/sensing is projected to include the fusion of multiple sensors ranging from microwave to optical, with multimode capability covering position, attitude, recognition, and motion parameters. The key features of the overall system design will be small size and weight, fast signal processing, robust algorithms, and accurate parameter determination. These aspects of vision/sensing are also discussed.
Survey on Controllable Image Synthesis with Deep Learning
Image synthesis has attracted growing research interest in the academic and
industrial communities. Deep learning technologies, especially generative
models, have greatly inspired controllable image synthesis approaches and
applications, which aim to generate particular visual content from latent
prompts. To further investigate the low-level controllable image synthesis
problem, which is crucial for fine image rendering and editing tasks, we present
a survey of some recent works on 3D controllable image synthesis using deep
learning. We first introduce the datasets and evaluation indicators for 3D
controllable image synthesis. Then, we review the state-of-the-art research for
geometrically controllable image synthesis in two aspects: 1)
Viewpoint/pose-controllable image synthesis; 2) Structure/shape-controllable
image synthesis. Furthermore, photometrically controllable image synthesis
approaches are reviewed for 3D re-lighting research. While the emphasis
is on 3D controllable image synthesis algorithms, the related applications,
products and resources are also briefly summarized for practitioners.
Comment: 19 pages, 17 figures
HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, facial expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show significant improvements in enabling much
greater flexibility in creating realistic reenacted output videos.
Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at
Siggraph'1
DoF-NeRF: Depth-of-Field Meets Neural Radiance Fields
Neural Radiance Field (NeRF) and its variants have exhibited great success in
representing 3D scenes and synthesizing photo-realistic novel views. However,
they are generally based on the pinhole camera model and assume all-in-focus
inputs. This limits their applicability as images captured from the real world
often have finite depth-of-field (DoF). To mitigate this issue, we introduce
DoF-NeRF, a novel neural rendering approach that can handle shallow DoF
inputs and simulate the DoF effect. In particular, it extends NeRF to simulate
the lens aperture following the principles of geometric optics. Such a
physical guarantee allows DoF-NeRF to operate on views with different focus
configurations. Benefiting from explicit aperture modeling, DoF-NeRF also
enables direct manipulation of the DoF effect by adjusting virtual aperture and
focus parameters. It is plug-and-play and can be inserted into NeRF-based
frameworks. Experiments on synthetic and real-world datasets show that
DoF-NeRF not only performs comparably with NeRF in the all-in-focus setting,
but can also synthesize all-in-focus novel views conditioned on shallow DoF
inputs. An interesting application of DoF-NeRF to DoF rendering is also
demonstrated. The source code will be made available at
https://github.com/zijinwuzijin/DoF-NeRF.
Comment: Accepted by ACMMM 202
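The geometric-optics reasoning behind aperture and focus control can be illustrated with the standard thin-lens circle-of-confusion formula. This is a minimal sketch of textbook optics and not the DoF-NeRF code; the function name, parameter names, and example values are assumptions. A point at the focus distance images to a sharp point, while points nearer or farther spread into a blur disc whose diameter grows with the aperture.

```python
def circle_of_confusion(aperture_diameter, focal_length, focus_distance, object_distance):
    """Blur-disc diameter on the sensor under the thin-lens model.

    All lengths share one unit (for example metres).
    c = A * |S2 - S1| / S2 * f / (S1 - f),
    where A is the aperture diameter, f the focal length,
    S1 the focus distance and S2 the object distance.
    """
    if focus_distance <= focal_length:
        raise ValueError("focus distance must exceed the focal length")
    if object_distance <= 0:
        raise ValueError("object distance must be positive")
    defocus = abs(object_distance - focus_distance) / object_distance
    magnification = focal_length / (focus_distance - focal_length)
    return aperture_diameter * defocus * magnification

# Hypothetical 50 mm lens focused at 2 m, observing a point 4 m away.
wide_open = circle_of_confusion(0.025, 0.05, 2.0, 4.0)      # f/2 aperture
stopped_down = circle_of_confusion(0.00625, 0.05, 2.0, 4.0)  # f/8 aperture
in_focus = circle_of_confusion(0.025, 0.05, 2.0, 2.0)
print(wide_open, stopped_down, in_focus)
```

Shrinking the aperture or moving the focus distance toward the object reduces the blur diameter, which is the kind of adjustment the abstract refers to when it mentions manipulating virtual aperture and focus parameters.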