Wireless Software Synchronization of Multiple Distributed Cameras
We present a method for precisely time-synchronizing the capture of image
sequences from a collection of smartphone cameras connected over WiFi. Our
method is entirely software-based, has only modest hardware requirements, and
achieves an accuracy of less than 250 microseconds on unmodified commodity
hardware. It does not use image content and synchronizes cameras prior to
capture. The algorithm operates in two stages. In the first stage, we designate
one device as the leader and synchronize each client device's clock to it by
estimating network delay. Once clocks are synchronized, the second stage
initiates continuous image streaming, estimates the relative phase of image
timestamps between each client and the leader, and shifts the streams into
alignment. We quantitatively validate our results on a multi-camera rig imaging
a high-precision LED array and qualitatively demonstrate significant
improvements to multi-view stereo depth estimation and stitching of dynamic
scenes. We release as open source 'libsoftwaresync', an Android implementation
of our system, to inspire new types of collective capture applications.
Comment: Main: 9 pages, 10 figures. Supplemental: 3 pages, 5 figures.
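The first stage is essentially NTP-style clock-offset estimation over WiFi. As a rough illustration of that idea (a sketch, not the released libsoftwaresync code), the Python snippet below estimates the leader-client clock offset from timed two-way message exchanges and keeps the estimate from the lowest-latency exchange; the timestamps and the min-delay filtering heuristic are illustrative assumptions.

    # Minimal sketch of NTP-style offset estimation between a client and the leader.
    def estimate_offset(t0, t1, t2, t3):
        """Two-way exchange: client sends at t0, leader receives at t1 and replies
        at t2, client receives at t3 (t0/t3 on the client clock, t1/t2 on the leader clock)."""
        offset = ((t1 - t0) + (t2 - t3)) / 2.0  # leader clock minus client clock
        delay = (t3 - t0) - (t2 - t1)           # round-trip network delay
        return offset, delay

    def best_offset(exchanges):
        """Prefer the exchange with the smallest round-trip delay; its offset
        estimate is least distorted by asymmetric network latency."""
        return min((estimate_offset(*e) for e in exchanges), key=lambda od: od[1])[0]

    # Example: three simulated exchanges (seconds); the low-delay one wins.
    exchanges = [
        (0.000, 0.013, 0.0131, 0.030),
        (1.000, 1.006, 1.0061, 1.012),  # smallest round trip
        (2.000, 2.020, 2.0201, 2.045),
    ]
    print("estimated leader-client offset: %.4f s" % best_offset(exchanges))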
Aperture Supervision for Monocular Depth Estimation
We present a novel method to train machine learning algorithms to estimate
scene depths from a single image, by using the information provided by a
camera's aperture as supervision. Prior works use a depth sensor's outputs or
images of the same scene from alternate viewpoints as supervision, while our
method instead uses images from the same viewpoint taken with a varying camera
aperture. To enable learning algorithms to use aperture effects as supervision,
we introduce two differentiable aperture rendering functions that use the input
image and predicted depths to simulate the depth-of-field effects caused by
real camera apertures. We train a monocular depth estimation network end-to-end
to predict the scene depths that best explain these finite aperture images as
defocus-blurred renderings of the input all-in-focus image.
Comment: To appear at CVPR 2018 (updated to camera-ready version).
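As a rough sketch of what an aperture rendering function can look like (not the paper's exact formulation), the code below blurs an all-in-focus image with a depth-dependent circle of confusion, approximated by softly compositing a few uniformly blurred copies; the Gaussian blur model, layer count, and soft-assignment bandwidth are assumptions made only for illustration.

    # Hedged sketch of depth-dependent defocus rendering with numpy/scipy.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def render_defocus(image, depth, focus_depth, aperture, n_layers=8):
        """image: HxWx3 float, depth: HxW (larger = farther, same units as focus_depth)."""
        # Thin-lens-style circle of confusion, up to an arbitrary scale.
        coc = aperture * np.abs(1.0 / depth - 1.0 / focus_depth)
        levels = np.linspace(coc.min(), coc.max(), n_layers)
        out = np.zeros_like(image)
        weight = np.zeros(depth.shape)
        for sigma in levels:
            s = max(float(sigma), 1e-3)  # avoid a zero-width Gaussian
            blurred = gaussian_filter(image, sigma=(s, s, 0))
            w = np.exp(-((coc - sigma) ** 2) / (2 * 0.25 ** 2))  # soft layer assignment
            out += blurred * w[..., None]
            weight += w
        return out / (weight[..., None] + 1e-8)

    # Toy usage: random image over a fronto-parallel depth ramp from 1 m to 5 m.
    img = np.random.rand(64, 64, 3)
    dep = np.tile(np.linspace(1.0, 5.0, 64), (64, 1))
    shallow_dof = render_defocus(img, dep, focus_depth=2.0, aperture=2.0)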
Riesz pyramids for fast phase-based video magnification
We present a new compact image pyramid representation, the Riesz pyramid, that can be used for real-time phase-based motion magnification. Our new representation is less overcomplete than even the smallest two-orientation, octave-bandwidth complex steerable pyramid, and can be implemented using compact, efficient linear filters in the spatial domain. Motion-magnified videos produced with this new representation are of comparable quality to those produced with the complex steerable pyramid. When used for phase-based video magnification, the Riesz pyramid phase-shifts image features only along their dominant orientation, rather than along every orientation as the complex steerable pyramid does.
Funding: Quanta Computer; Shell Research; National Science Foundation (U.S.) (CGV-1111415); Microsoft Research (PhD Fellowship); Massachusetts Institute of Technology, Department of Mathematics; National Science Foundation (U.S.) Graduate Research Fellowship (Grant 1122374).
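For a sense of the machinery, here is a hedged sketch of an approximate Riesz transform built from small spatial difference filters, together with the amplitude, local phase, and dominant orientation it exposes; the exact filter taps and the bandpass pyramid construction used in the paper are simplified away here.

    # Sketch: approximate Riesz transform of one bandpassed pyramid level.
    import numpy as np
    from scipy.ndimage import convolve1d

    def riesz_phase(band):
        """band: one bandpassed (e.g. Laplacian-pyramid) level, HxW float array."""
        taps = [0.5, 0.0, -0.5]                       # illustrative derivative-like taps
        r1 = convolve1d(band, taps, axis=1)           # horizontal Riesz component
        r2 = convolve1d(band, taps, axis=0)           # vertical Riesz component
        amplitude = np.sqrt(band ** 2 + r1 ** 2 + r2 ** 2)
        phase = np.arctan2(np.sqrt(r1 ** 2 + r2 ** 2), band)  # local phase
        orientation = np.arctan2(r2, r1)                      # dominant orientation
        return amplitude, phase, orientation

Phase-based magnification then temporally band-passes and amplifies this phase along the dominant orientation only, which is the single-orientation shift described above.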
Quaternionic Representation of the Riesz Pyramid for Video Magnification
Recently, we presented a new image pyramid, called the Riesz pyramid, that uses the Riesz transform to manipulate the phase in non-oriented sub-bands of an image sequence to produce real-time motion-magnified videos. In this report we give a quaternionic formulation of the Riesz pyramid and show how several seemingly heuristic choices in how to use the Riesz transform for phase-based video magnification fall out of this formulation in a natural and principled way. We intend this report to accompany the original paper on the Riesz pyramid for video magnification.
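A loose illustration of the quaternionic view (an assumption-laden sketch, not the report's derivation): treat each Riesz triple (I, R1, R2) as the quaternion I + i*R1 + j*R2; normalizing and taking the quaternion logarithm then yields the phase multiplied by the orientation direction in one step.

    # Sketch: quaternion-log view of a Riesz triple (illustrative only).
    import numpy as np

    def quaternion_log_phase(i_band, r1, r2, eps=1e-8):
        """Return (phase*cos(orientation), phase*sin(orientation)) per pixel."""
        norm = np.sqrt(i_band ** 2 + r1 ** 2 + r2 ** 2) + eps
        scalar = i_band / norm            # scalar part of the unit quaternion
        vi, vj = r1 / norm, r2 / norm     # (i, j) vector part
        vnorm = np.sqrt(vi ** 2 + vj ** 2) + eps
        phase = np.arccos(np.clip(scalar, -1.0, 1.0))   # magnitude of the quaternion log
        return phase * vi / vnorm, phase * vj / vnorm

Filtering this pair rather than phase and orientation separately sidesteps their joint sign ambiguity, which is the kind of seemingly heuristic choice the report places on a principled footing.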
Phase-based video motion processing
We introduce a technique to manipulate small movements in videos based on an analysis of motion in complex-valued image pyramids. Phase variations of the coefficients of a complex-valued steerable pyramid over time correspond to motion, and can be temporally processed and amplified to reveal imperceptible motions, or attenuated to remove distracting changes. This processing does not involve the computation of optical flow, and in comparison to the previous Eulerian Video Magnification method it supports larger amplification factors and is significantly less sensitive to noise. These improved capabilities broaden the set of applications for motion processing in videos. We demonstrate the advantages of this approach on synthetic and natural video sequences, and explore applications in scientific analysis, visualization and video enhancement.
Funding: Shell Research; United States Defense Advanced Research Projects Agency (Soldier Centric Imaging via Computational Cameras); National Science Foundation (U.S.) (CGV-1111415); Cognex Corporation; Microsoft Research (PhD Fellowship); American Society for Engineering Education, National Defense Science and Engineering Graduate Fellowship.
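A minimal sketch of the core Eulerian step on a single complex subband (pyramid construction and collapse omitted): unwrap the temporal phase, band-pass it around the motion frequencies of interest, and re-apply it scaled by an amplification factor. The Butterworth filter, its band, and the factor alpha below are illustrative assumptions.

    # Sketch: temporal phase filtering and amplification of one subband.
    import numpy as np
    from scipy.signal import butter, filtfilt

    def magnify_subband(coeffs, fs, f_lo, f_hi, alpha):
        """coeffs: T x H x W complex array (one steerable-pyramid subband over T frames)."""
        phase = np.unwrap(np.angle(coeffs), axis=0)          # temporal phase per pixel
        b, a = butter(2, [f_lo / (fs / 2), f_hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, phase, axis=0)                 # isolate motion frequencies
        return coeffs * np.exp(1j * alpha * band)            # phase-shift to magnify motion

    # Toy usage: 90 frames of a 16x16 subband at 30 fps, magnifying 1-3 Hz motion 20x.
    coeffs = np.exp(1j * np.random.rand(90, 16, 16))
    magnified = magnify_subband(coeffs, fs=30.0, f_lo=1.0, f_hi=3.0, alpha=20.0)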
The visual microphone: Passive recovery of sound from video
When sound hits an object, it causes small vibrations of the object's surface. We show how, using only high-speed video of the object, we can extract those minute vibrations and partially recover the sound that produced them, allowing us to turn everyday objects (a glass of water, a potted plant, a box of tissues, or a bag of chips) into visual microphones. We recover sounds from high-speed footage of a variety of objects with different properties, and use both real and simulated data to examine some of the factors that affect our ability to visually recover sound. We evaluate the quality of recovered sounds using intelligibility and SNR metrics and provide input and recovered audio samples for direct comparison. We also explore how to leverage the rolling shutter in regular consumer cameras to recover audio from standard frame-rate videos, and use the spatial resolution of our method to visualize how sound-related vibrations vary over an object's surface, which we can use to recover the vibration modes of an object.
Funding: Qatar Computing Research Institute; National Science Foundation (U.S.) (CGV-1111415); National Science Foundation (U.S.) Graduate Research Fellowship (Grant 1122374); Massachusetts Institute of Technology, Department of Mathematics; Microsoft Research (PhD Fellowship).
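As a rough, assumption-heavy sketch of the recovery idea (far short of the paper's multi-scale, multi-orientation pipeline): track the amplitude-weighted phase deviation of one subband from a reference frame, collapse it to one number per frame, and high-pass the result to obtain a sound estimate.

    # Sketch: collapsing per-pixel phase variation into a 1-D sound estimate.
    import numpy as np
    from scipy.signal import butter, filtfilt

    def recover_sound(coeffs, fs, cutoff_hz=20.0):
        """coeffs: T x H x W complex subband from a high-speed video sampled at fs Hz."""
        ref = np.conj(coeffs[0])[None]                      # reference frame
        amp2 = np.abs(coeffs) ** 2                          # confidence weights
        dphase = np.angle(coeffs * ref)                     # phase deviation vs. frame 0
        signal = (amp2 * dphase).sum(axis=(1, 2)) / (amp2.sum(axis=(1, 2)) + 1e-8)
        b, a = butter(2, cutoff_hz / (fs / 2), btype="high")  # remove slow drift
        return filtfilt(b, a, signal - signal.mean())

    # Toy usage: 2000 frames at 2200 fps yields roughly a second of audio samples.
    audio = recover_sound(np.exp(1j * np.random.rand(2000, 8, 8) * 0.01), fs=2200.0)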
Learned Dual-View Reflection Removal
Traditional reflection removal algorithms either use a single image as input,
which suffers from intrinsic ambiguities, or use multiple images from a moving
camera, which is inconvenient for users. We instead propose a learning-based
dereflection algorithm that uses stereo images as input. This is an effective
trade-off between the two extremes: the parallax between two views provides
cues to remove reflections, and two views are easy to capture due to the
adoption of stereo cameras in smartphones. Our model consists of a
learning-based reflection-invariant flow model for dual-view registration, and
a learned synthesis model for combining aligned image pairs. Because no dataset
for dual-view reflection removal exists, we render a synthetic dataset of
dual views with and without reflections for use in training. Our evaluation on
an additional real-world dataset of stereo pairs shows that our algorithm
outperforms existing single-image and multi-image dereflection approaches.
Comment: http://sniklaus.com/dualre
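To make the two-stage layout concrete, here is a hedged inference sketch (not the authors' released model): a learned flow registers the second view to the first, a warp applies that flow, and a learned synthesis model fuses the aligned pair. The `flow_model` and `synthesis_model` callables are hypothetical placeholders for the learned components.

    # Sketch of the dual-view dereflection layout; learned parts are stubs.
    import numpy as np
    from scipy.ndimage import map_coordinates

    def warp_with_flow(image, flow):
        """image: HxWx3 float, flow: HxWx2 (dx, dy) pulling view-2 pixels toward view-1."""
        h, w = image.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
        coords = [ys + flow[..., 1], xs + flow[..., 0]]
        return np.stack([map_coordinates(image[..., c], coords, order=1, mode="nearest")
                         for c in range(image.shape[2])], axis=-1)

    def remove_reflection(view1, view2, flow_model, synthesis_model):
        flow = flow_model(view1, view2)                 # learned, reflection-invariant flow
        view2_aligned = warp_with_flow(view2, flow)     # register the second view
        pair = np.concatenate([view1, view2_aligned], axis=-1)
        return synthesis_model(pair)                    # learned fusion into a clean image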