NYC3DCars: A Dataset of 3D Vehicles in Geographic Context
Geometry and geography can play an important role in recognition tasks in computer vision. To aid in studying connections between geometry and recognition, we introduce NYC3DCars, a rich dataset for vehicle detection in urban scenes built from Internet photos drawn from the wild, focused on densely trafficked areas of New York City. Our dataset is augmented with detailed geometric and geographic information, including full camera poses derived from structure from motion, 3D vehicle annotations, and geographic information from open resources, including road segmentations and directions of travel. NYC3DCars can be used to study new questions about using geometric information in detection tasks, and to explore applications of Internet photos in understanding cities. To demonstrate the utility of our data, we evaluate the use of the geographic information in our dataset to enhance a parts-based detection method, and suggest other avenues for future exploration.
Omnidirectional Stereo
Omnidirectional stereo (ODS) is a type of multi-perspective projection that captures horizontal parallax tangential to a viewing circle. This data allows the creation of stereo panoramas that provide plausible stereo views in all viewing directions on the equatorial plane.
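As a rough sketch of the geometry described above, each panorama column at azimuth theta corresponds to a ray whose origin lies on the viewing circle and whose direction is tangential to that circle; the coordinate conventions, default radius, and function name below are illustrative assumptions, not part of any specific ODS implementation:

```python
import numpy as np

def ods_ray(theta, eye, r=0.032):
    """Ray for one column of an omnidirectional stereo (ODS) panorama.

    theta: panorama azimuth in radians
    eye:   +1 for the right eye, -1 for the left eye
    r:     viewing-circle radius (half the interpupillary distance), in metres
    """
    # Ray direction: horizontal unit vector at azimuth theta (y is up)
    d = np.array([np.sin(theta), 0.0, -np.cos(theta)])
    # Ray origin: the point on the viewing circle at which d is tangential,
    # offset to opposite sides of the circle for the two eyes
    o = r * eye * np.array([np.cos(theta), 0.0, np.sin(theta)])
    return o, d
```

Note that the origin and direction are perpendicular by construction, which is exactly the "parallax tangential to a viewing circle" property: the two eyes see rays tangent to opposite sides of the circle in every viewing direction.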
A Practical Stereo Depth System for Smart Glasses
We present the design of a productionized end-to-end stereo depth sensing system that performs pre-processing, online stereo rectification, and stereo depth estimation, with a fallback to monocular depth estimation when rectification is unreliable. The output of our depth sensing system is then used in a novel view generation pipeline to create 3D computational photography effects from point-of-view images captured by smart glasses. All of these steps are executed on-device within the stringent compute budget of a mobile phone, and because we expect users to have a wide range of smartphones, our design must be general and cannot depend on particular hardware or an ML accelerator such as a smartphone GPU. Although each of these steps is well studied, a description of a practical system is still lacking. In such a system, all the steps need to work in tandem with one another and fall back gracefully on failures within the system or on less-than-ideal input data. We show how we handle unforeseen changes to calibration, e.g., due to heat, robustly support depth estimation in the wild, and still abide by the memory and latency constraints required for a smooth user experience. We show that our trained models are fast, running in less than 1 s on the CPU of a six-year-old Samsung Galaxy S8 phone. Our models generalize well to unseen data and achieve good results on Middlebury and on in-the-wild images captured from the smart glasses.
Comment: Accepted at CVPR202
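The stereo-with-monocular-fallback control flow described in the abstract can be sketched as follows; the confidence measure, threshold value, type, and function names here are illustrative assumptions, not the paper's actual implementation:

```python
from dataclasses import dataclass

# Assumed threshold on rectification quality; not a value from the paper.
RECT_CONF_THRESHOLD = 0.8

@dataclass
class Rectified:
    left: object
    right: object
    confidence: float  # e.g., fraction of inlier feature matches

def stereo_depth(left, right):
    return {"source": "stereo"}      # placeholder for the stereo network

def monocular_depth(image):
    return {"source": "monocular"}   # placeholder for the mono network

def estimate_depth(left, right, rectified: Rectified):
    # Use stereo only when online rectification looks trustworthy;
    # otherwise degrade gracefully to single-image depth, e.g., when
    # heat-induced calibration drift makes rectification unreliable.
    if rectified.confidence >= RECT_CONF_THRESHOLD:
        return stereo_depth(rectified.left, rectified.right)
    return monocular_depth(left)
```

The point of the pattern is that the pipeline always produces a depth map for the view-generation stage: a degraded monocular estimate replaces, rather than aborts, the stereo path.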