10,274 research outputs found
Learned Monocular Depth Priors in Visual-Inertial Initialization
Visual-inertial odometry (VIO) is the pose estimation backbone for most AR/VR
and autonomous robotic systems today, in both academia and industry. However,
these systems are highly sensitive to the initialization of key parameters such
as sensor biases, gravity direction, and metric scale. In practical scenarios
where high-parallax or variable acceleration assumptions are rarely met (e.g.
hovering aerial robot, smartphone AR user not gesticulating with phone),
classical visual-inertial initialization formulations often become
ill-conditioned and/or fail to meaningfully converge. In this paper we target
visual-inertial initialization specifically for these low-excitation scenarios
critical to in-the-wild usage. We propose to circumvent the limitations of
classical visual-inertial structure-from-motion (SfM) initialization by
incorporating a new learning-based measurement as a higher-level input. We
leverage learned monocular depth images (mono-depth) to constrain the relative
depth of features, and upgrade the mono-depth to metric scale by jointly
optimizing for its scale and shift. Our experiments show a significant
improvement in problem conditioning compared to a classical formulation for
visual-inertial initialization, and demonstrate significant accuracy and
robustness improvements relative to the state-of-the-art on public benchmarks,
particularly under motion-restricted scenarios. We further extend this
improvement to implementation within an existing odometry system to illustrate
the impact of our improved initialization method on resulting tracking
trajectories
Adaptive User Perspective Rendering for Handheld Augmented Reality
Handheld Augmented Reality commonly implements some variant of magic lens
rendering, which turns only a fraction of the user's real environment into AR
while the rest of the environment remains unaffected. Since handheld AR devices
are commonly equipped with video see-through capabilities, AR magic lens
applications often suffer from spatial distortions, because the AR environment
is presented from the perspective of the camera of the mobile device. Recent
approaches counteract this distortion based on estimations of the user's head
position, rendering the scene from the user's perspective. To this end,
approaches usually apply face-tracking algorithms on the front camera of the
mobile device. However, this demands high computational resources and therefore
commonly affects the performance of the application beyond the already high
computational load of AR applications. In this paper, we present a method to
reduce the computational demands for user perspective rendering by applying
lightweight optical flow tracking and an estimation of the user's motion before
head tracking is started. We demonstrate the suitability of our approach for
computationally limited mobile devices and we compare it to device perspective
rendering, to head tracked user perspective rendering, as well as to fixed
point of view user perspective rendering
Wireless Software Synchronization of Multiple Distributed Cameras
We present a method for precisely time-synchronizing the capture of image
sequences from a collection of smartphone cameras connected over WiFi. Our
method is entirely software-based, has only modest hardware requirements, and
achieves an accuracy of less than 250 microseconds on unmodified commodity
hardware. It does not use image content and synchronizes cameras prior to
capture. The algorithm operates in two stages. In the first stage, we designate
one device as the leader and synchronize each client device's clock to it by
estimating network delay. Once clocks are synchronized, the second stage
initiates continuous image streaming, estimates the relative phase of image
timestamps between each client and the leader, and shifts the streams into
alignment. We quantitatively validate our results on a multi-camera rig imaging
a high-precision LED array and qualitatively demonstrate significant
improvements to multi-view stereo depth estimation and stitching of dynamic
scenes. We release as open source 'libsoftwaresync', an Android implementation
of our system, to inspire new types of collective capture applications.Comment: Main: 9 pages, 10 figures. Supplemental: 3 pages, 5 figure
Towards a Practical Pedestrian Distraction Detection Framework using Wearables
Pedestrian safety continues to be a significant concern in urban communities
and pedestrian distraction is emerging as one of the main causes of grave and
fatal accidents involving pedestrians. The advent of sophisticated mobile and
wearable devices, equipped with high-precision on-board sensors capable of
measuring fine-grained user movements and context, provides a tremendous
opportunity for designing effective pedestrian safety systems and applications.
Accurate and efficient recognition of pedestrian distractions in real-time
given the memory, computation and communication limitations of these devices,
however, remains the key technical challenge in the design of such systems.
Earlier research efforts in pedestrian distraction detection using data
available from mobile and wearable devices have primarily focused only on
achieving high detection accuracy, resulting in designs that are either
resource intensive and unsuitable for implementation on mainstream mobile
devices, or computationally slow and not useful for real-time pedestrian safety
applications, or require specialized hardware and less likely to be adopted by
most users. In the quest for a pedestrian safety system that achieves a
favorable balance between computational efficiency, detection accuracy, and
energy consumption, this paper makes the following main contributions: (i)
design of a novel complex activity recognition framework which employs motion
data available from users' mobile and wearable devices and a lightweight
frequency matching approach to accurately and efficiently recognize complex
distraction related activities, and (ii) a comprehensive comparative evaluation
of the proposed framework with well-known complex activity recognition
techniques in the literature with the help of data collected from human subject
pedestrians and prototype implementations on commercially-available mobile and
wearable devices
- …