23 research outputs found

    NYC3DCars: A Dataset of 3D Vehicles in Geographic Context

    Geometry and geography can play an important role in recognition tasks in computer vision. To aid in studying connections between geometry and recognition, we introduce NYC3DCars, a rich dataset for vehicle detection in urban scenes built from Internet photos drawn from the wild, focused on densely trafficked areas of New York City. Our dataset is augmented with detailed geometric and geographic information, including full camera poses derived from structure from motion, 3D vehicle annotations, and geographic information from open resources, including road segmentations and directions of travel. NYC3DCars can be used to study new questions about using geometric information in detection tasks, and to explore applications of Internet photos in understanding cities. To demonstrate the utility of our data, we evaluate the use of the geographic information in our dataset to enhance a parts-based detection method, and suggest other avenues for future exploration.
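    The kind of geographic re-scoring the abstract alludes to can be illustrated with a small sketch. Everything below is an assumption for illustration, not the paper's method: detections are taken to carry a detector confidence and a ground-plane footprint already registered to a road-segmentation mask, and the road prior is blended into the score with a simple weighted average.

    # Minimal sketch (not the authors' code): re-weighting vehicle detection
    # scores with a hypothetical road-occupancy prior, assuming detections
    # have been back-projected onto the ground plane via the camera pose.
    import numpy as np

    def geo_rescore(detections, road_mask, weight=0.5):
        """Blend detector confidence with a road-occupancy prior.

        detections : list of dicts with 'score' and a ground-plane 'footprint'
                     given as (row, col) indices into road_mask (assumed format)
        road_mask  : 2D float array in [0, 1], probability that a map cell is road
        weight     : how strongly the geographic prior influences the final score
        """
        rescored = []
        for det in detections:
            r, c = det["footprint"]
            prior = road_mask[r, c]                        # likelihood the car sits on a road
            score = (1 - weight) * det["score"] + weight * prior
            rescored.append({**det, "score": score})
        return rescored

    # Toy example: an equally confident detection off the road is down-weighted.
    mask = np.zeros((100, 100)); mask[40:60, :] = 1.0      # toy road corridor
    dets = [{"score": 0.9, "footprint": (50, 10)},         # on the road
            {"score": 0.9, "footprint": (5, 10)}]          # off the road
    print([round(d["score"], 2) for d in geo_rescore(dets, mask)])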

    Omnidirectional Stereo

    Omnidirectional stereo (ODS) is a type of multi-perspective projection that captures horizontal parallax tangential to a viewing circle. This data allows the creation of stereo panoramas that provide plausible stereo views in all viewing directions on the equatorial plane.
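    To make the ODS ray geometry concrete, here is a minimal sketch under assumed conventions (z-up coordinates, azimuth theta in the horizontal plane, a viewing circle of radius half the interpupillary distance); it illustrates the projection model in general, not any particular implementation.

    # Minimal sketch of the ODS ray model: each panorama column's ray starts
    # on the viewing circle and points tangentially to it (assumed conventions).
    import numpy as np

    def ods_ray(theta, phi, ipd=0.064, eye="right"):
        """Return (origin, direction) of the ODS ray for one panorama pixel.

        theta : azimuth of the viewing direction (radians)
        phi   : elevation above the equatorial plane (radians)
        ipd   : interpupillary distance in meters; viewing-circle radius is ipd / 2
        eye   : "left" or "right" -- the two eyes use opposite tangent offsets
        """
        sign = 1.0 if eye == "right" else -1.0
        # Ray origin sits on the viewing circle, offset sideways from the view direction.
        origin = sign * (ipd / 2) * np.array([np.sin(theta), -np.cos(theta), 0.0])
        # Ray direction: horizontal component along theta, vertical component from phi.
        direction = np.array([np.cos(phi) * np.cos(theta),
                              np.cos(phi) * np.sin(theta),
                              np.sin(phi)])
        return origin, direction

    # On the equatorial plane the ray is tangent to the circle, so origin . direction == 0.
    o, d = ods_ray(np.pi / 3, 0.0, eye="left")
    assert abs(np.dot(o, d)) < 1e-12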

    A Practical Stereo Depth System for Smart Glasses

    We present the design of a productionized end-to-end stereo depth sensing system that does pre-processing, online stereo rectification, and stereo depth estimation, with a fallback to monocular depth estimation when rectification is unreliable. The output of our depth sensing system is then used in a novel view generation pipeline to create 3D computational photography effects using point-of-view images captured by smart glasses. All these steps are executed on-device within the stringent compute budget of a mobile phone, and because we expect users to use a wide range of smartphones, our design needs to be general and cannot depend on particular hardware or ML accelerators such as a smartphone GPU. Although each of these steps is well studied, a description of a practical system is still lacking. For such a system, all the steps need to work in tandem with one another and fall back gracefully on failures within the system or less-than-ideal input data. We show how we handle unforeseen changes to calibration, e.g., due to heat, robustly support depth estimation in the wild, and still abide by the memory and latency constraints required for a smooth user experience. We show that our trained models are fast, running in less than 1 s on a six-year-old Samsung Galaxy S8 phone's CPU. Our models generalize well to unseen data and achieve good results on Middlebury and on in-the-wild images captured from the smart glasses.
    Comment: Accepted at CVPR202
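    The fallback behaviour described in the abstract can be summarised in a short sketch. The stage functions below (rectify, stereo_depth, mono_depth) are hypothetical placeholders standing in for the system's learned components; only the control flow, stereo when rectification looks reliable and monocular otherwise, reflects the description above.

    # Minimal sketch of the stereo-with-monocular-fallback control flow.
    # The placeholder stages return dummy values; a real system would run
    # learned rectification and depth networks here.
    import numpy as np

    def rectify(left, right, calibration):
        """Placeholder online rectification: returns the pair plus a quality score."""
        quality = calibration.get("confidence", 0.0)       # hypothetical confidence field
        return left, right, quality

    def stereo_depth(left, right):
        """Placeholder stereo network: a real system would predict disparity here."""
        return np.zeros(left.shape[:2])

    def mono_depth(image):
        """Placeholder monocular network used when the rectified pair is unusable."""
        return np.zeros(image.shape[:2])

    def estimate_depth(left, right, calibration, quality_threshold=0.5):
        """Stereo depth with a graceful fallback to monocular estimation."""
        rect_l, rect_r, quality = rectify(left, right, calibration)
        if quality >= quality_threshold:
            return stereo_depth(rect_l, rect_r)            # normal stereo path
        # Calibration drift (e.g. due to heat) made the pair unreliable:
        # fall back to single-image depth on the point-of-view image.
        return mono_depth(left)

    left = right = np.zeros((480, 640, 3))
    depth = estimate_depth(left, right, {"confidence": 0.2})   # takes the fallback path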