Real time trinocular stereo for tele-immersion
Tele-immersion is a technology that augments your space with real-time 3D projections of remote spaces, thus allowing people from different places to interact in virtually the same environment. Tele-immersion combines 3D scene recovery from computer vision with rendering and interaction from computer graphics. We describe real-time 3D scene acquisition using a new algorithm for trinocular stereo. We extend this method in time by combining motion and stereo in order to increase speed and robustness.
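The abstract does not spell out the trinocular matching itself, but the core stereo-correspondence step it builds on can be illustrated with a minimal binocular sum-of-squared-differences (SSD) block matcher on a rectified pair. This is a generic baseline sketch, not the paper's trinocular algorithm; all names here are illustrative.

```python
import numpy as np

def ssd_disparity(left, right, max_disp=16, win=3):
    # Brute-force SSD block matching on a rectified stereo pair:
    # for each left-image patch, slide along the same scanline in the
    # right image and pick the displacement with the lowest SSD cost.
    h, w = left.shape
    pad = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(pad, h - pad):
        for x in range(pad + max_disp, w - pad):
            patch = left[y-pad:y+pad+1, x-pad:x+pad+1]
            costs = [np.sum((patch - right[y-pad:y+pad+1,
                                           x-d-pad:x-d+pad+1])**2)
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp
```

A trinocular system adds a third camera to reject matches that are inconsistent across the additional epipolar constraint, which is where the robustness gain comes from.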
Robust visual servoing in 3d reaching tasks
This paper describes a novel approach to the problem of reaching an object in space under visual guidance. The approach is characterized by great robustness to calibration errors, such that virtually no calibration is required. Servoing is based on binocular vision: a continuous measure of the end-effector motion field, derived from real-time computation of the binocular optical flow over the stereo images, is compared with the actual position of the target, and the relative error in the end-effector trajectory is continuously corrected. The paper outlines the general framework of the approach, shows how visual measures are obtained, and discusses the synthesis of the controller along with its stability analysis. Real-time experiments are presented to show the applicability of the approach in real 3-D applications.
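The continuous error-correction loop described above is, at its simplest, a proportional controller on the visually measured error. The sketch below shows only that feedback idea with hypothetical names; the paper's actual controller uses optical-flow-derived measurements and comes with a stability analysis not reproduced here.

```python
import numpy as np

def servo_step(ee_pos, target_pos, gain=0.5):
    # One proportional servoing update: move the end-effector by a
    # fraction of the currently observed error toward the target.
    # No camera calibration enters this correction, only the
    # measured relative error.
    return ee_pos + gain * (target_pos - ee_pos)

ee = np.array([0.0, 0.0, 0.0])          # initial end-effector position
target = np.array([0.3, -0.1, 0.5])     # visually tracked target
for _ in range(20):                      # closed-loop correction
    ee = servo_step(ee, target)
```

Because the error is re-measured from the images at every step, miscalibration only rescales the correction rather than biasing the final reached position.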
Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality
Real-time Stereo Matching is a cornerstone algorithm for many Extended
Reality (XR) applications, such as indoor 3D understanding, video pass-through,
and mixed-reality games. Despite significant advancements in deep stereo
methods, achieving real-time depth inference with high accuracy on a low-power
device remains a major challenge. One of the major difficulties is the lack of
high-quality indoor video stereo training datasets captured by head-mounted
VR/AR glasses. To address this issue, we introduce a novel video stereo
synthetic dataset that comprises photorealistic renderings of various indoor
scenes and realistic camera motion captured by a 6-DoF moving VR/AR
head-mounted display (HMD). This facilitates the evaluation of existing
approaches and promotes further research on indoor augmented reality scenarios.
Our newly proposed dataset enables us to develop a novel framework for
continuous video-rate stereo matching: a video-based stereo matching
approach tailored for XR applications that achieves real-time inference at
an impressive 134 fps on a standard desktop computer, or 30 fps on a
battery-powered HMD. Our key insight is that disparity and contextual
information are highly correlated and redundant between consecutive stereo
frames. By unrolling an iterative cost aggregation in time (i.e. in the
temporal dimension), we are able to distribute and reuse the aggregated
features over time. This approach leads to a substantial reduction in
computation without sacrificing accuracy. We conducted extensive evaluations
and comparisons and demonstrated that our method achieves superior performance
compared to the current state-of-the-art, making it a strong contender for
real-time stereo matching in VR/AR applications.
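The temporal-unrolling insight above can be caricatured as warm-starting an iterative refinement with the previous frame's result, so each new frame needs only a few iterations instead of a full from-scratch solve. This toy sketch (all names and the scalar update are invented for illustration) stands in for the paper's learned cost aggregation.

```python
import numpy as np

def refine(disp, target, iters, step=0.3):
    # Toy iterative aggregation: each iteration pulls the disparity
    # estimate a fixed fraction toward the matching optimum (which is
    # unknown in practice and found via cost aggregation).
    for _ in range(iters):
        disp = disp + step * (target - disp)
    return disp

# Video loop: reuse the previous frame's disparity as initialization,
# distributing the refinement work across time.
prev = np.zeros(8)                        # cold start for frame 0
for t in range(5):
    target = np.full(8, 10.0 + 0.1 * t)   # slowly changing true disparity
    prev = refine(prev, target, iters=3)  # few iterations per frame
```

Because consecutive frames are highly correlated, the warm-started estimate after a few iterations is far closer to the optimum than a cold start given the same per-frame budget.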
Multi Camera Stereo and Tracking Patient Motion for SPECT Scanning Systems
Patient motion, which causes artifacts in reconstructed images, can be a serious problem in Single Photon Emission Computed Tomography (SPECT) imaging. If patient motion can be detected and quantified, the reconstruction algorithm can compensate for the motion. A real-time multi-threaded Visual Tracking System (VTS) using optical cameras, which will be suitable for deployment in clinical trials, is under development. The VTS tracks patients using multiple video images and image processing techniques, calculating patient motion in three-dimensional space. This research aimed to develop and implement an algorithm for feature matching and stereo location computation using multiple cameras. Feature matching is done based on the epipolar geometry constraints for a pair of images and extended to the multiple-view case with an iterative algorithm. Stereo locations of the matches are then computed using the sum of squared distances from the projected 3D lines in SPECT coordinates as the error metric. This information from the VTS, when coupled with motion assessment from the emission data itself, can provide a robust compensation for patient motion as part of reconstruction.
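The stereo-location step, minimizing the sum of squared distances to the 3D lines back-projected from each camera, has a standard closed-form least-squares solution. The sketch below shows that generic ray-intersection computation under the assumption of known camera centers and ray directions; function and variable names are illustrative, not from the paper.

```python
import numpy as np

def triangulate_from_rays(centers, dirs):
    # Least-squares 3D point minimizing the sum of squared
    # perpendicular distances to a set of rays (one per camera).
    # centers: (N, 3) camera centers; dirs: (N, 3) ray directions.
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(centers, dirs):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)  # projector onto plane normal to d
        A += P                           # accumulate normal equations
        b += P @ c
    return np.linalg.solve(A, b)
```

With two or more non-parallel rays the accumulated matrix is invertible, and adding cameras simply adds terms to the same linear system, which is why the method extends naturally to the multi-camera case.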
Self-Attention Dense Depth Estimation Network for Unrectified Video Sequences
The dense depth estimation of a 3D scene has numerous applications, mainly in
robotics and surveillance. LiDAR and radar sensors are the hardware solution
for real-time depth estimation, but these sensors produce sparse depth maps and
are sometimes unreliable. In recent years, research aimed at tackling depth
estimation using a single 2D image has received a lot of attention. The deep
learning based self-supervised depth estimation methods from the rectified
stereo and monocular video frames have shown promising results. We propose a
self-attention based depth and ego-motion network for unrectified images. We
also introduce non-differentiable distortion of the camera into the training
pipeline. Our approach performs competitively when compared to other
established approaches that use rectified images for depth estimation.
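The self-attention building block the network relies on is standard scaled dot-product attention over feature tokens. This minimal sketch shows only that primitive, not the paper's full depth and ego-motion architecture; the weight matrices here are placeholders for learned parameters.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    # Scaled dot-product self-attention over a set of feature tokens:
    # each output is a weighted mix of all value vectors, with weights
    # given by softmax-normalized query/key similarity.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ v
```

The global receptive field this provides is what lets every pixel's depth estimate draw on context from the whole image, unlike a purely convolutional encoder.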