Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects
In this paper we introduce Co-Fusion, a dense SLAM system that takes a live
stream of RGB-D images as input and segments the scene into different objects
(using either motion or semantic cues) while simultaneously tracking and
reconstructing their 3D shape in real time. We use a multiple model fitting
approach where each object can move independently from the background and still
be effectively tracked and its shape fused over time using only the information
from pixels associated with that object label. Previous attempts to deal with
dynamic scenes have typically considered moving regions as outliers, and
consequently do not model their shape or track their motion over time. In
contrast, we enable the robot to maintain 3D models for each of the segmented
objects and to improve them over time through fusion. As a result, our system
can enable a robot to maintain a scene description at the object level which
has the potential to allow interactions with its working environment, even in
the case of dynamic scenes.
Comment: International Conference on Robotics and Automation (ICRA) 2017,
http://visual.cs.ucl.ac.uk/pubs/cofusion,
https://github.com/martinruenz/co-fusio
Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery
One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft tissues. This information is a prerequisite to the registration of multi-modal patient-specific data for enhancing the surgeon’s navigation capabilities by observing beyond exposed tissue surfaces and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This paper reviews the state-of-the-art methods for optical intra-operative 3D reconstruction in laparoscopic surgery and discusses the technical challenges and future perspectives towards clinical translation. With the recent paradigm shift of surgical practice towards MIS and new developments in 3D optical imaging, this is a timely discussion about technologies that could facilitate complex CAS procedures in dynamic and deformable anatomical regions.
Application of augmented reality and robotic technology in broadcasting: A survey
As an innovative technique, Augmented Reality (AR) has been gradually deployed in the broadcast, videography and cinematography industries. Virtual graphics generated by AR are dynamic and are overlaid on the surface of the environment, so that the original appearance can be greatly enhanced in comparison with traditional broadcasting. In addition, AR enables broadcasters to interact with augmented virtual 3D models on a broadcasting scene in order to enhance the performance of broadcasting. Recently, advanced robotic technologies have been deployed in camera shooting systems to create a robotic cameraman so that the performance of AR broadcasting can be further improved, which is highlighted in the paper.
3D sparse feature model using short baseline stereo and multiple view registration
This paper outlines a methodology to generate a distinctive object representation offline, using short-baseline stereo fundamentals to triangulate highly descriptive object features in multiple pairs of stereo images. A group of sparse 2.5D perspective views are built and the multiple views are then fused into a single sparse 3D model using a common 3D shape registration technique. Having prior knowledge, such as the proposed sparse feature model, is useful when detecting an object and estimating its pose for real-time systems like augmented reality
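The abstract does not give the triangulation formulas, so the following is only a minimal sketch of the textbook short-baseline case, assuming a rectified pinhole stereo pair. The function name and the parameters `focal_px`, `baseline_m`, `cx` and `cy` are hypothetical and are not taken from the paper.

```python
def triangulate_rectified(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """Triangulate one matched feature from a rectified stereo pair.

    Assumes an ideal pinhole model with identical intrinsics for both
    cameras and purely horizontal disparity (hypothetical setup).
    Returns (X, Y, Z) in metres in the left camera frame.
    """
    disparity = x_left - x_right  # pixels; positive for points in front of the rig
    if disparity <= 0:
        raise ValueError("disparity must be positive for a valid match")
    z = focal_px * baseline_m / disparity  # depth from similar triangles
    x = (x_left - cx) * z / focal_px       # back-project the pixel column
    y3d = (y - cy) * z / focal_px          # back-project the pixel row
    return (x, y3d, z)


point = triangulate_rectified(420.0, 400.0, 240.0,
                              focal_px=500.0, baseline_m=0.1,
                              cx=320.0, cy=240.0)
```

Because depth is inversely proportional to disparity, a short baseline yields small disparities, so a fixed pixel matching error translates into a larger depth error than it would with a wide baseline.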
UcoSLAM: Simultaneous Localization and Mapping by Fusion of KeyPoints and Squared Planar Markers
This paper proposes a novel approach for Simultaneous Localization and
Mapping by fusing natural and artificial landmarks. Most of the SLAM approaches
use natural landmarks (such as keypoints). However, they are unstable over
time, repetitive in many cases or insufficient for robust tracking (e.g. in
indoor buildings). On the other hand, other approaches have employed artificial
landmarks (such as squared fiducial markers) placed in the environment to help
tracking and relocalization. We propose a method that integrates both
approaches in order to achieve long-term robust tracking in many scenarios.
Our method has been compared to the state-of-the-art methods ORB-SLAM2 and
LDSO on the public datasets Kitti, Euroc-MAV, TUM and SPM, obtaining better
precision, robustness and speed. Our tests also show that the combination of
markers and keypoints achieves better accuracy than each one of them
independently.
Comment: Paper submitted to Pattern Recognitio
Cross-View Visual Geo-Localization for Outdoor Augmented Reality
Precise estimation of global orientation and location is critical to ensure a
compelling outdoor Augmented Reality (AR) experience. We address the problem of
geo-pose estimation by cross-view matching of query ground images to a
geo-referenced aerial satellite image database. Recently, neural network-based
methods have shown state-of-the-art performance in cross-view matching.
However, most of the prior works focus only on location estimation, ignoring
orientation, which cannot meet the requirements in outdoor AR applications. We
propose a new transformer neural network-based model and a modified triplet
ranking loss for joint location and orientation estimation. Experiments on
several benchmark cross-view geo-localization datasets show that our model
achieves state-of-the-art performance. Furthermore, we present an approach to
extend the single image query-based geo-localization approach by utilizing
temporal information from a navigation pipeline for robust continuous
geo-localization. Experimentation on several large-scale real-world video
sequences demonstrates that our approach enables high-precision and stable AR
insertion.
Comment: IEEE VR 202
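The abstract mentions a modified triplet ranking loss but does not define it. As a point of reference only, the sketch below shows the standard hinge-style triplet loss on embedding vectors, which is not the authors' modified version. The function name, the Euclidean distance and the `margin` value are assumptions made for illustration.

```python
import math


def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge triplet loss: max(0, margin + d(a, p) - d(a, n)).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive. Euclidean distance is an assumption.
    """
    d_pos = math.dist(anchor, positive)  # ground query vs. matching aerial tile
    d_neg = math.dist(anchor, negative)  # ground query vs. non-matching tile
    return max(0.0, margin + d_pos - d_neg)


# A violated triplet (negative closer than positive) gives a positive loss.
violated = triplet_ranking_loss((0.0, 0.0), (3.0, 4.0), (0.0, 1.0))
# A satisfied triplet contributes nothing.
satisfied = triplet_ranking_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0))
```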
Towards System Agnostic Calibration of Optical See-Through Head-Mounted Displays for Augmented Reality
This dissertation examines the developments and progress of spatial calibration procedures for Optical See-Through (OST) Head-Mounted Display (HMD) devices for visual Augmented Reality (AR) applications. Rapid developments in commercial AR systems have created an explosion of OST device options, not only for research and industrial purposes but also for the consumer market. This expansion in hardware availability is equally matched by a need for intuitive standardized calibration procedures that are not only easily completed by novice users, but which are also readily applicable across the largest range of hardware options. This demand for robust uniform calibration schemes is the driving motive behind the original contributions offered within this work. A review of prior surveys and canonical description for AR and OST display developments is provided before narrowing the contextual scope to the research questions evolving within the calibration domain. Both established and state of the art calibration techniques and their general implementations are explored, along with prior user study assessments and the prevailing evaluation metrics and practices employed within. The original contributions begin with a user study evaluation comparing and contrasting the accuracy and precision of an established manual calibration method against a state of the art semi-automatic technique. This is the first formal evaluation of any non-manual approach and provides insight into the current usability limitations of present techniques and the complexities of next generation methods yet to be solved. The second study investigates the viability of a user-centric approach to OST HMD calibration through novel adaptation of manual calibration to consumer level hardware.
Additional contributions describe the development of a complete demonstration application incorporating user-centric methods, a novel strategy for visualizing both calibration results and registration error from the user’s perspective, as well as a robust intuitive presentation style for binocular manual calibration. The final study provides further investigation into the accuracy differences observed between user-centric and environment-centric methodologies. The dissertation concludes with a summarization of the contribution outcomes and their impact on existing AR systems and research endeavors, as well as a short look ahead into future extensions and paths that continued calibration research should explore
LiveCap: Real-time Human Performance Capture from Monocular Video
We present the first real-time human performance capture approach that
reconstructs dense, space-time coherent deforming geometry of entire humans in
general everyday clothing from just a single RGB video. We propose a novel
two-stage analysis-by-synthesis optimization whose formulation and
implementation are designed for high performance. In the first stage, a skinned
template model is jointly fitted to background subtracted input video, 2D and
3D skeleton joint positions found using a deep neural network, and a set of
sparse facial landmark detections. In the second stage, dense non-rigid 3D
deformations of skin and even loose apparel are captured based on a novel
real-time capable algorithm for non-rigid tracking using dense photometric and
silhouette constraints. Our novel energy formulation leverages automatically
identified material regions on the template to model the differing non-rigid
deformation behavior of skin and apparel. The two resulting non-linear
optimization problems per-frame are solved with specially-tailored
data-parallel Gauss-Newton solvers. In order to achieve real-time performance
of over 25Hz, we design a pipelined parallel architecture using the CPU and two
commodity GPUs. Our method is the first real-time monocular approach for
full-body performance capture. Our method yields accuracy comparable to
off-line performance capture techniques, while being orders of magnitude
faster
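The abstract names Gauss-Newton solvers without giving the update rule. The toy sketch below shows the basic Gauss-Newton iteration on a one-parameter least-squares problem. It is a serial illustration only and has no relation to the authors' data-parallel GPU solvers or energy terms. The model y = exp(b x) and all names are assumptions.

```python
import math


def gauss_newton_exp_rate(xs, ys, b0=0.0, iterations=20):
    """Fit y = exp(b * x) by Gauss-Newton on the sum of squared residuals.

    Each step linearises the residuals r_i = exp(b * x_i) - y_i around the
    current b and solves the normal equation (J^T J) * delta = -J^T r,
    which is a scalar division in this one-parameter case.
    """
    b = b0
    for _ in range(iterations):
        jt_r = 0.0  # accumulates J^T r
        jt_j = 0.0  # accumulates J^T J
        for x, y in zip(xs, ys):
            pred = math.exp(b * x)
            residual = pred - y
            jac = x * pred  # d(residual)/db
            jt_r += jac * residual
            jt_j += jac * jac
        if jt_j == 0.0:
            break  # degenerate problem: no information about b
        b -= jt_r / jt_j
    return b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(0.5 * x) for x in xs]  # synthetic, noise-free data
estimate = gauss_newton_exp_rate(xs, ys)
```

In a multi-parameter setting the scalar division becomes a linear solve of the normal equations, which is the step a data-parallel solver would distribute across threads.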
RD-VIO: Robust Visual-Inertial Odometry for Mobile Augmented Reality in Dynamic Environments
It is typically challenging for visual or visual-inertial odometry systems to
handle the problems of dynamic scenes and pure rotation. In this work, we
design a novel visual-inertial odometry (VIO) system called RD-VIO to handle
both problems. Firstly, we propose an IMU-PARSAC algorithm which
can robustly detect and match keypoints in a two-stage process. In the first
stage, landmarks are matched with new keypoints using visual and IMU
measurements. We collect statistical information from the matching and then
guide the intra-keypoint matching in the second stage. Secondly, to handle the
problem of pure rotation, we detect the motion type and adapt the
deferred-triangulation technique during the data-association process. We treat
pure-rotational frames as special subframes. When solving the
visual-inertial bundle adjustment, they provide additional constraints on the
pure-rotational motion. We evaluate the proposed VIO system on public datasets.
Experiments show the proposed RD-VIO has obvious advantages over other methods
in dynamic environments