Wireless Software Synchronization of Multiple Distributed Cameras
We present a method for precisely time-synchronizing the capture of image
sequences from a collection of smartphone cameras connected over WiFi. Our
method is entirely software-based, has only modest hardware requirements, and
achieves an accuracy of less than 250 microseconds on unmodified commodity
hardware. It does not use image content and synchronizes cameras prior to
capture. The algorithm operates in two stages. In the first stage, we designate
one device as the leader and synchronize each client device's clock to it by
estimating network delay. Once clocks are synchronized, the second stage
initiates continuous image streaming, estimates the relative phase of image
timestamps between each client and the leader, and shifts the streams into
alignment. We quantitatively validate our results on a multi-camera rig imaging
a high-precision LED array and qualitatively demonstrate significant
improvements to multi-view stereo depth estimation and stitching of dynamic
scenes. We release as open source 'libsoftwaresync', an Android implementation
of our system, to inspire new types of collective capture applications.
Comment: Main: 9 pages, 10 figures. Supplemental: 3 pages, 5 figures.
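The first stage above resembles NTP-style offset estimation from round-trip probes. A minimal sketch (not the paper's actual filtering scheme; the probe format and min-delay selection are illustrative assumptions):

```python
def clock_offset(probes):
    """Estimate the client-to-leader clock offset from round-trip probes.

    Each probe is (t1, t2, t3, t4):
      t1 = client send, t2 = leader receive,
      t3 = leader reply, t4 = client receive (t1, t4 on the client clock).
    The probe with the smallest round-trip delay is least distorted by
    WiFi queuing, so only that probe is used.
    """
    best = min(probes, key=lambda p: (p[3] - p[0]) - (p[2] - p[1]))
    t1, t2, t3, t4 = best
    # symmetric-delay assumption: offset is the average of the two legs
    return ((t2 - t1) + (t3 - t4)) / 2.0
```

A probe taken on an asymmetric or congested round trip yields a biased offset, which is why keeping only the minimum-delay probe matters on WiFi.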
Towards High-Frequency Tracking and Fast Edge-Aware Optimization
This dissertation advances the state of the art for AR/VR tracking systems by
increasing the tracking frequency by orders of magnitude and proposes an
efficient algorithm for the problem of edge-aware optimization.
AR/VR is a natural way of interacting with computers, where the physical and
digital worlds coexist. We are on the cusp of a radical change in how humans
perform and interact with computing. Humans are sensitive to small
misalignments between the real and the virtual world, so tracking at
kilohertz frequencies becomes essential. Current vision-based systems fall
short, as their tracking frequency is implicitly limited by the frame-rate of
the camera. This thesis presents a prototype system that can track at rates
orders of magnitude higher than state-of-the-art methods using multiple commodity
cameras. The proposed system exploits characteristics of the camera
traditionally considered as flaws, namely rolling shutter and radial
distortion. The experimental evaluation shows the effectiveness of the method
for various degrees of motion.
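The rolling-shutter exploit rests on the fact that each scanline is captured at a slightly different time, so even a modest-frame-rate sensor delivers thousands of temporal samples per second. A minimal sketch of the per-row timing model (parameter names are illustrative, not from the thesis):

```python
def row_capture_time(frame_start, row, readout_time, num_rows):
    """Time at which a given scanline of a rolling-shutter frame is read out.

    frame_start:  timestamp of the first row (seconds)
    readout_time: time to read the full frame top-to-bottom (seconds)

    Each row is an independent temporal sample, which is what lifts the
    effective measurement rate far above the nominal frame rate.
    """
    return frame_start + (row / num_rows) * readout_time
```

At 60 fps with ~1000 rows per frame, this model yields on the order of 60,000 row-timestamps per second per camera, which is the raw material for kilohertz tracking.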
Furthermore, edge-aware optimization is an indispensable tool in the computer
vision arsenal for accurate filtering of depth-data and image-based rendering,
which is increasingly being used for content creation and geometry processing
for AR/VR. As applications increasingly demand higher resolution and speed,
there exists a need to develop methods that scale accordingly. This
dissertation proposes such an edge-aware optimization framework which is
efficient, accurate, and scales well algorithmically, desirable traits not
found jointly in the state of the art. The experiments
show the effectiveness of the framework in a multitude of computer vision tasks
such as computational photography and stereo.
Comment: PhD thesis.
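Edge-aware optimization of this kind typically minimizes a data term plus a smoothness term whose weights vanish across guide-image edges, so smoothing stops at depth or intensity discontinuities. A 1-D sketch using Gauss-Seidel sweeps (illustrative only; the dissertation's framework is not reproduced here):

```python
import math

def edge_aware_smooth_1d(signal, guide, lam=1.0, sigma=0.1, iters=200):
    """Minimize  sum_i (x_i - s_i)^2 + lam * sum_i w_i (x_i - x_{i+1})^2,
    where w_i = exp(-|g_{i+1} - g_i| / sigma) is tiny across guide edges,
    so smoothing stops there. Solved by simple Gauss-Seidel iterations.
    """
    n = len(signal)
    w = [math.exp(-abs(guide[i + 1] - guide[i]) / sigma) for i in range(n - 1)]
    x = list(signal)
    for _ in range(iters):
        for i in range(n):
            num, den = signal[i], 1.0
            if i > 0:                      # coupling to left neighbour
                num += lam * w[i - 1] * x[i - 1]
                den += lam * w[i - 1]
            if i < n - 1:                  # coupling to right neighbour
                num += lam * w[i] * x[i + 1]
                den += lam * w[i]
            x[i] = num / den
    return x
```

Noise within a flat region of the guide is averaged away, while a step in the guide is preserved because the corresponding smoothness weight is effectively zero.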
Neural Global Shutter: Learn to Restore Video from a Rolling Shutter Camera with Global Reset Feature
Most computer vision systems assume distortion-free images as inputs. The
widely used rolling-shutter (RS) image sensors, however, suffer from geometric
distortion when the camera or scene moves during capture. Extensive research
has been conducted on correcting RS distortions. However, most existing work
relies heavily on prior assumptions about the scene or motion.
Moreover, their motion estimation is either oversimplified or computationally
inefficient due to heavy flow warping, which limits their
applicability. In this paper, we investigate using rolling shutter with a
global reset feature (RSGR) to restore clean global shutter (GS) videos. This
feature enables us to turn the rectification problem into a deblur-like one,
getting rid of inaccurate and costly explicit motion estimation. First, we
build an optic system that captures paired RSGR/GS videos. Second, we develop a
novel algorithm incorporating spatial and temporal designs to correct the
spatial-varying RSGR distortion. Third, we demonstrate that existing
image-to-image translation algorithms can recover clean GS videos from
distorted RSGR inputs, though our algorithm achieves the best performance
thanks to its task-specific designs. Our rendered results are not only visually appealing but also
beneficial to downstream tasks. Compared to the state-of-the-art RS solution,
our RSGR solution is superior in both effectiveness and efficiency. Because
RSGR is easy to enable without changing the hardware, we believe our RSGR
solution can potentially replace the RS solution for capturing distortion-free
videos with low noise at low cost.
Comment: CVPR2022, https://github.com/lightChaserX/neural-global-shutte
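The deblur-like reformulation follows from the shutter timing: with a global reset, all rows start exposing simultaneously but are still read out sequentially, so geometric skew is traded for a row-dependent exposure profile. A hedged sketch of this timing model (parameter names are illustrative, not from the paper):

```python
def rsgr_row_exposure(row, base_exposure, readout_time, num_rows):
    """Effective exposure of a scanline under rolling shutter with global
    reset (RSGR): every row starts at t = 0 (the global reset), but row r
    is read out (row / num_rows) * readout_time later, lengthening its
    exposure. The frame is therefore free of rolling-shutter skew but
    exhibits a spatially varying blur, which turns geometric
    rectification into a deblurring problem.
    """
    return base_exposure + (row / num_rows) * readout_time
```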
Cost-effective solution to synchronised audio-visual data capture using multiple sensors
Applications such as surveillance and human behaviour analysis require high-bandwidth recording from multiple cameras, as well as from other sensors. In turn, sensor fusion has increased the required accuracy of synchronisation between sensors. Using commercial off-the-shelf components may compromise quality and accuracy due to several challenges, such as dealing with the combined data rate from multiple sensors; unknown offset and rate discrepancies between independent hardware clocks; the absence of trigger inputs or outputs in the hardware; and the different methods for time-stamping the recorded data. To achieve accurate synchronisation, we centralise the synchronisation task by recording all trigger or timestamp signals with a multi-channel audio interface. For sensors that lack an external trigger signal, we let the computer that captures the sensor data periodically generate timestamp signals from its serial port output. These signals can also be used as a common time base to synchronise multiple asynchronous audio interfaces. Furthermore, we show that a consumer PC can currently capture 8-bit video data with 1024 × 1024 spatial and 59.1 Hz temporal resolution from at least 14 cameras, together with 8 channels of 24-bit audio at 96 kHz. We thus improve the quality/cost ratio of multi-sensor data capture systems.
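The offset and rate discrepancies between independent hardware clocks can be recovered from matched timestamp observations of the same events on both clocks. A minimal least-squares sketch (assuming the timestamp pairs have already been matched; this is not the paper's specific procedure):

```python
def fit_clock_model(local_ts, remote_ts):
    """Fit  remote = rate * local + offset  by ordinary least squares.

    local_ts / remote_ts: matched timestamps of the same events on two
    independent clocks. The rate captures oscillator drift between the
    clocks; the offset captures their initial difference.
    """
    n = len(local_ts)
    mx = sum(local_ts) / n
    my = sum(remote_ts) / n
    sxx = sum((x - mx) ** 2 for x in local_ts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(local_ts, remote_ts))
    rate = sxy / sxx
    return rate, my - rate * mx
```

Once rate and offset are known, any timestamp from one clock can be mapped onto the common time base provided by the multi-channel audio recording.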