    A Practical Stereo Depth System for Smart Glasses

    We present the design of a productionized end-to-end stereo depth sensing system that performs pre-processing, online stereo rectification, and stereo depth estimation, with a fallback to monocular depth estimation when rectification is unreliable. The output of our depth sensing system is then used in a novel view generation pipeline to create 3D computational photography effects using point-of-view images captured by smart glasses. All these steps are executed on-device within the stringent compute budget of a mobile phone, and because we expect users to have a wide range of smartphones, our design needs to be general and cannot depend on particular hardware or an ML accelerator such as a smartphone GPU. Although each of these steps is well studied, a description of a practical system is still lacking. For such a system, all these steps need to work in tandem with one another and fall back gracefully on failures within the system or less-than-ideal input data. We show how we handle unforeseen changes to calibration, e.g., due to heat, robustly support depth estimation in the wild, and still abide by the memory and latency constraints required for a smooth user experience. We show that our trained models are fast, running in less than 1s on a six-year-old Samsung Galaxy S8 phone's CPU. Our models generalize well to unseen data and achieve good results on Middlebury and in-the-wild images captured from the smart glasses. Comment: Accepted at CVPR202
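
    The fallback from stereo to monocular depth described above can be illustrated with a minimal sketch. The abstract does not say how rectification reliability is measured, so the check used here is an assumption: in a correctly rectified pair, matched points lie on the same image row, so a large mean row offset suggests that rectification has failed. The function names, the thresholds, and the "stereo"/"mono" labels are all hypothetical.

```python
from statistics import mean


def vertical_misalignment(matches):
    """Mean absolute row difference over matches ((x_left, y_left), (x_right, y_right))."""
    return mean(abs(yl - yr) for (_, yl), (_, yr) in matches)


def choose_depth_mode(matches, max_row_error_px=1.0, min_matches=8):
    """Return "stereo" when rectification looks reliable, otherwise "mono".

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    # Too few matches: the rectification check itself cannot be trusted.
    if len(matches) < min_matches:
        return "mono"
    # Rectified pairs should have (near) zero vertical disparity.
    if vertical_misalignment(matches) <= max_row_error_px:
        return "stereo"
    return "mono"
```

    A caller would compute the matches on the rectified pair and then dispatch to whichever depth model the returned label names.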

    Vision-Based Three Dimensional Hand Interaction In Markerless Augmented Reality Environment

    The advent of augmented reality (AR) enables virtual objects to be superimposed on the real world and provides a new way to interact with them. An AR system requires an indicator, such as a marker, to determine how the virtual objects are aligned with the real world. The indicator must first be obtained in order to use a particular AR system, and it may be inconvenient to keep the indicator within reach at all times. The human hand, being part of the body, may be a solution to this problem. The hand is also a promising tool for interacting with virtual objects in an AR environment. This thesis presents a markerless AR system that uses an outstretched hand for registration of virtual objects in the real environment and enables users to have three-dimensional (3D) interaction with the augmented virtual objects. To employ the hand for registration and interaction in AR, the hand postures and gestures that the user performs have to be recognized.

    Three dimensional information estimation and tracking for moving objects detection using two cameras framework

    Calibration, matching and tracking are the major concerns in obtaining 3D information consisting of depth, direction and velocity. To find depth, camera parameters and matched points are the two necessary inputs. Depth, direction and matched points can be obtained accurately if the cameras are well calibrated using traditional manual calibration. However, most traditional manual calibration methods are inconvenient to use because markers, or the real size of an object in the scene, must be provided or known. Self-calibration removes this limitation of traditional calibration, but does not address depth and matched points. Other approaches attempt to match corresponding objects using 2D visual information without calibration, but they suffer from low matching accuracy under large perspective distortion. This research focuses on obtaining 3D information using a self-calibrated tracking system, in which matching and tracking are done under self-calibrated conditions. Three contributions are introduced to achieve these objectives. Firstly, orientation correction is introduced to obtain better relationship matrices for matching during tracking. Secondly, once the relationship matrices are available, a post-processing method called status-based matching is introduced to improve the object matching result. The proposed matching algorithm achieves a matching rate of almost 90%. Depth is estimated after the status-based matching. Thirdly, tracking is done based on x-y coordinates and the estimated depth under self-calibrated conditions. Results show that the proposed self-calibrated tracking system successfully differentiates the locations of objects even under occlusion in the field of view, and is able to determine the direction and velocity of multiple moving objects.
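
    To make concrete why camera parameters and matched points are both needed for depth, the sketch below uses the textbook relation Z = f·B/d for an idealized rectified, parallel two-camera setup. This is an assumption for illustration only: the abstract describes a self-calibrated system and does not state that it uses this configuration. The function names and the sample numbers are hypothetical.

```python
import math


def depth_from_disparity(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth (metres) of a matched point in a rectified, parallel stereo pair."""
    disparity = x_left_px - x_right_px  # horizontal shift between the two views
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


def velocity(p0, p1, dt_s):
    """Speed and unit direction between two 3D positions observed dt_s seconds apart."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    delta = [b - a for a, b in zip(p0, p1)]
    dist = math.sqrt(sum(c * c for c in delta))
    if dist == 0:
        return 0.0, (0.0, 0.0, 0.0)  # stationary object: no defined direction
    return dist / dt_s, tuple(c / dist for c in delta)
```

    The focal length and baseline are the camera parameters, and the pair (x_left_px, x_right_px) is one matched point; tracking the resulting 3D positions over time then yields direction and velocity.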

    A framework for evaluating stereo-based pedestrian detection techniques

    Automated pedestrian detection, counting, and tracking have received significant attention in the computer vision community of late. As such, a variety of techniques have been investigated using both traditional 2-D computer vision techniques and, more recently, 3-D stereo information. However, to date, a quantitative assessment of the performance of stereo-based pedestrian detection has been problematic, mainly due to the lack of standard stereo-based test data and an agreed methodology for carrying out the evaluation. This has forced researchers into making subjective comparisons between competing approaches. In this paper, we propose a framework for the quantitative evaluation of a short-baseline stereo-based pedestrian detection system. We provide freely available synthetic and real-world test data and recommend a set of evaluation metrics. This allows researchers to benchmark systems, not only with respect to other stereo-based approaches, but also with more traditional 2-D approaches. In order to illustrate its usefulness, we demonstrate the application of this framework to evaluate our own recently proposed technique for pedestrian detection and tracking
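
    The abstract does not name the evaluation metrics it recommends. As an illustration of what a quantitative comparison between detectors can look like, the sketch below computes precision and recall by greedily matching detected 2D bounding boxes to ground-truth boxes with an intersection-over-union (IoU) threshold. The choice of metrics, the 0.5 threshold, and the function names are assumptions, not the paper's protocol.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, min_iou=0.5):
    """Greedy one-to-one matching of detections to ground-truth boxes."""
    unmatched = list(ground_truth)
    true_positives = 0
    for det in detections:
        # Pick the still-unmatched ground-truth box that overlaps this detection most.
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= min_iou:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

    Because the inputs are plain 2D boxes, the same scoring can be applied to stereo-based and 2D-only detectors alike.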

    Single-Image Depth Prediction Makes Feature Matching Easier

    Good local features improve the robustness of many 3D re-localization and multi-view reconstruction pipelines. The problem is that viewing angle and distance severely impact the recognizability of a local feature. Attempts to improve appearance invariance, whether by choosing better local feature points or by leveraging outside information, have come with prerequisites that made some of them impractical. In this paper, we propose a surprisingly effective enhancement to local feature extraction, which improves matching. We show that CNN-based depths inferred from single RGB images are quite helpful, despite their flaws. They allow us to pre-warp images and rectify perspective distortions, significantly enhancing SIFT and BRISK features and enabling more good matches, even when cameras are looking at the same scene but in opposite directions. Comment: 14 pages, 7 figures, accepted for publication at the European conference on computer vision (ECCV) 202

    Camera Planning and Fusion in a Heterogeneous Camera Network

    Wide-area camera networks are becoming more and more common. They have a wide range of commercial and military applications, from video surveillance to smart homes and from traffic monitoring to anti-terrorism. The design of such a camera network is a challenging problem due to the complexity of the environment, self and mutual occlusion of moving objects, diverse sensor properties and a myriad of performance metrics for different applications. In this dissertation, we consider two such challenges: camera planning and camera fusion. Camera planning is to determine the optimal number and placement of cameras for a target cost function. Camera fusion describes the task of combining images collected by heterogeneous cameras in the network to extract information pertinent to a target application. I tackle the camera planning problem by developing a new unified framework based on binary integer programming (BIP) to relate the network design parameters and the performance goals of a variety of camera network tasks. Most of the BIP formulations are NP-hard problems, and various approximate algorithms have been proposed in the literature. In this dissertation, I develop a comprehensive framework for comparing the entire spectrum of approximation algorithms, from Greedy and Markov Chain Monte Carlo (MCMC) to various relaxation techniques. The key contribution is to provide not only a generic formulation of the camera planning problem but also novel approaches to adapt the formulation to powerful approximation schemes including Simulated Annealing (SA) and Semi-Definite Programming (SDP). The accuracy, efficiency and scalability of each technique are analyzed and compared in depth. Extensive experimental results are provided to illustrate the strengths and weaknesses of each method. The second problem, heterogeneous camera fusion, is very complex. Information can be fused at different levels, from pixels or voxels to semantic objects, with large variation in accuracy, communication and computation costs. My focus is on the geometric transformation of shapes between objects observed at different camera planes. This so-called geometric fusion approach usually provides the most reliable fusion, at the expense of high computation and communication costs. To tackle the complexity, a hierarchy of camera models with different levels of complexity is proposed to balance the effectiveness and efficiency of the camera network operation. Different calibration and registration methods are then proposed for each camera model. Finally, I provide two specific examples to demonstrate the effectiveness of the model: 1) a fusion system to improve the segmentation of the human body in a camera network consisting of thermal and regular visible-light cameras, and 2) a view-dependent rendering system that combines the information from depth and regular cameras to collect the scene information and generate new views in real time.
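
    As an illustration of the Greedy end of the spectrum of approximation algorithms mentioned above, the sketch below treats camera planning as a coverage problem: each candidate camera position covers a set of scene cells, and cameras are picked one at a time by largest marginal gain until a budget is reached. This coverage formulation, the budget parameter, and the function name are assumptions for illustration; the dissertation's BIP formulation and cost functions are not given in the abstract.

```python
def greedy_camera_placement(coverage, budget):
    """Pick up to `budget` cameras, each time taking the one that adds most new cells.

    coverage: dict mapping a candidate camera id to the set of cells it can see.
    Returns (chosen camera ids in pick order, set of all covered cells).
    """
    chosen, covered = [], set()
    remaining = dict(coverage)  # copy so the caller's dict is left untouched
    for _ in range(budget):
        best = max(remaining, key=lambda cam: len(remaining[cam] - covered), default=None)
        # Stop early when no candidate is left or none adds new coverage.
        if best is None or not (remaining[best] - covered):
            break
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered
```

    Greedy selection of this kind is fast but gives no guarantee of an optimal placement, which is one reason to compare it against sampling-based and relaxation-based methods.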

    Pedestrian detection and tracking using stereo vision techniques

    Automated pedestrian detection, counting and tracking have received significant attention from the computer vision community of late. Many of the person detection techniques described so far in the literature work well in controlled environments, such as laboratory settings with a small number of people. This allows various assumptions to be made that simplify this complex problem. The performance of these techniques, however, tends to deteriorate when presented with unconstrained environments where pedestrian appearances, numbers, orientations, movements, occlusions and lighting conditions violate these convenient assumptions. Recently, 3D stereo information has been proposed as a technique to overcome some of these issues and to guide pedestrian detection. This thesis presents such an approach, whereby after obtaining robust 3D information via a novel disparity estimation technique, pedestrian detection is performed via a 3D point clustering process within a region-growing framework. This clustering process avoids hard thresholds by using biometrically inspired constraints and a number of plan-view statistics. This pedestrian detection technique requires no external training and is able to robustly handle challenging real-world unconstrained environments from various camera positions and orientations. In addition, this thesis presents a continuous detect-and-track approach, with additional kinematic constraints and explicit occlusion analysis, to obtain robust temporal tracking of pedestrians. These approaches are experimentally validated using challenging datasets consisting of both synthetic data and real-world sequences gathered from a number of environments. In each case, the techniques are evaluated using both 2D and 3D ground-truth methodologies.
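
    The general idea of clustering 3D points by region growing can be sketched as follows. Note the deliberate simplification: this version grows regions with a single fixed distance threshold, which is exactly the kind of hard threshold the thesis says it avoids in favour of biometrically inspired constraints and plan-view statistics. The function name and the radius parameter are hypothetical.

```python
import math
from collections import deque


def region_grow(points, radius):
    """Group 3D points into clusters of index lists by breadth-first region growing.

    A point joins a cluster when it lies within `radius` of any point already in it.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, queue = [seed], deque([seed])
        while queue:
            i = queue.popleft()
            # Collect neighbours first so the set is not modified while iterating.
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                cluster.append(j)
                queue.append(j)
        clusters.append(sorted(cluster))
    return sorted(clusters)
```

    Each resulting cluster is a candidate object; a detector would then accept or reject clusters based on further constraints such as plausible human size.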