NYC3DCars: A Dataset of 3D Vehicles in Geographic Context
Geometry and geography can play an important role in recognition tasks in computer vision. To aid in studying connections between geometry and recognition, we introduce NYC3DCars, a rich dataset for vehicle detection in urban scenes built from Internet photos drawn from the wild, focused on densely trafficked areas of New York City. Our dataset is augmented with detailed geometric and geographic information, including full camera poses derived from structure from motion, 3D vehicle annotations, and geographic information from open resources, including road segmentations and directions of travel. NYC3DCars can be used to study new questions about using geometric information in detection tasks, and to explore applications of Internet photos in understanding cities. To demonstrate the utility of our data, we evaluate the use of the geographic information in our dataset to enhance a parts-based detection method, and suggest other avenues for future exploration.
Omnidirectional Stereo
Omnidirectional stereo (ODS) is a type of multi-perspective projection that captures horizontal parallax tangential to a viewing circle. This data allows the creation of stereo panoramas that provide plausible stereo views in all viewing directions on the equatorial plane.
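As a rough sketch of the geometry described above, each panorama column at azimuth theta corresponds to a ray whose origin lies on the viewing circle and whose direction is tangential to that circle; the coordinate conventions, default radius, and function name below are illustrative assumptions, not part of any specific ODS implementation:

```python
import numpy as np

def ods_ray(theta, eye, r=0.032):
    """Ray for one column of an omnidirectional stereo (ODS) panorama.

    theta: panorama azimuth in radians
    eye:   +1 for the right eye, -1 for the left eye
    r:     viewing-circle radius (half the interpupillary distance), in metres
    """
    # Ray direction: horizontal unit vector at azimuth theta (y is up)
    d = np.array([np.sin(theta), 0.0, -np.cos(theta)])
    # Ray origin: the point on the viewing circle at which d is tangential,
    # offset to opposite sides of the circle for the two eyes
    o = r * eye * np.array([np.cos(theta), 0.0, np.sin(theta)])
    return o, d
```

Note that the origin and direction are perpendicular by construction, which is exactly the "parallax tangential to a viewing circle" property: the two eyes see rays tangent to opposite sides of the circle in every viewing direction.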
A Practical Stereo Depth System for Smart Glasses
We present the design of a productionized end-to-end stereo depth sensing system that performs pre-processing, online stereo rectification, and stereo depth estimation, with a fallback to monocular depth estimation when rectification is unreliable. The output of our depth sensing system is then used in a novel view generation pipeline to create 3D computational photography effects from point-of-view images captured by smart glasses. All of these steps are executed on-device within the stringent compute budget of a mobile phone, and because we expect users to have a wide range of smartphones, our design must be general and cannot depend on particular hardware or an ML accelerator such as a smartphone GPU. Although each of these steps is well studied, a description of a practical system is still lacking. In such a system, all the steps need to work in tandem with one another and fall back gracefully on failures within the system or on less-than-ideal input data. We show how we handle unforeseen changes to calibration, e.g., due to heat, robustly support depth estimation in the wild, and still abide by the memory and latency constraints required for a smooth user experience. We show that our trained models are fast, running in less than 1 s on the CPU of a six-year-old Samsung Galaxy S8 phone. Our models generalize well to unseen data and achieve good results on Middlebury and on in-the-wild images captured from the smart glasses.
Comment: Accepted at CVPR202
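The stereo-with-monocular-fallback control flow described in the abstract can be sketched as follows; the confidence measure, threshold value, type, and function names here are illustrative assumptions, not the paper's actual implementation:

```python
from dataclasses import dataclass

# Assumed threshold on rectification quality; not a value from the paper.
RECT_CONF_THRESHOLD = 0.8

@dataclass
class Rectified:
    left: object
    right: object
    confidence: float  # e.g., fraction of inlier feature matches

def stereo_depth(left, right):
    return {"source": "stereo"}      # placeholder for the stereo network

def monocular_depth(image):
    return {"source": "monocular"}   # placeholder for the mono network

def estimate_depth(left, right, rectified: Rectified):
    # Use stereo only when online rectification looks trustworthy;
    # otherwise degrade gracefully to single-image depth, e.g., when
    # heat-induced calibration drift makes rectification unreliable.
    if rectified.confidence >= RECT_CONF_THRESHOLD:
        return stereo_depth(rectified.left, rectified.right)
    return monocular_depth(left)
```

The point of the pattern is that the pipeline always produces a depth map for the view-generation stage: a degraded monocular estimate replaces, rather than aborts, the stereo path.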