
    Direct interaction with large displays through monocular computer vision

    Large displays are everywhere, and have been shown to provide higher productivity gains and user satisfaction compared to traditional desktop monitors. The computer mouse remains the most common input tool for users to interact with these larger displays. Much effort has gone into making this interaction more natural and intuitive for the user. The use of computer vision for this purpose has been well researched, as it gives the user freedom and mobility and allows them to interact at a distance. Interaction that relies on monocular computer vision, however, has not been well researched, particularly when used for depth information recovery. This thesis investigates the feasibility of using monocular computer vision to allow bare-hand interaction with large display systems from a distance. By taking into account the location of the user and the interaction area available, a dynamic virtual touchscreen can be estimated between the display and the user. In the process, theories and techniques that make interaction with a computer display as easy as pointing at real-world objects are explored. Studies were conducted to investigate the way humans naturally point at objects with their hands and to examine the inadequacies of existing pointing systems. Models that underpin the pointing strategies used in many previous interactive systems were formalized. A proof-of-concept prototype was built and evaluated in several user studies. Results from this thesis suggest that it is possible to allow natural user interaction with large displays using low-cost monocular computer vision. Furthermore, the models developed and lessons learnt in this research can help designers build more accurate and natural interactive systems that make use of humans' natural pointing behaviours.
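    The thesis's estimation method is not given in the abstract; as a minimal sketch of the general idea, the snippet below intersects an eye-through-fingertip pointing ray with an assumed virtual touchscreen plane. All names, coordinates and the plane parametrisation are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

def ray_plane_intersection(eye, fingertip, plane_point, plane_normal):
    """Intersect the eye->fingertip pointing ray with a virtual plane.

    eye, fingertip : 3-D points in camera coordinates, shape (3,)
    plane_point    : any point on the virtual touchscreen plane
    plane_normal   : unit normal of that plane
    Returns the 3-D intersection point, or None if there is no hit.
    """
    direction = fingertip - eye
    denom = plane_normal.dot(direction)
    if abs(denom) < 1e-9:       # ray parallel to the plane
        return None
    t = plane_normal.dot(plane_point - eye) / denom
    if t < 0:                   # plane lies behind the user
        return None
    return eye + t * direction

# Hypothetical setup: a virtual touchscreen 1 m in front of the camera.
eye = np.array([0.0, 1.6, 2.0])
fingertip = np.array([0.1, 1.4, 1.6])
hit = ray_plane_intersection(eye, fingertip,
                             plane_point=np.array([0.0, 0.0, 1.0]),
                             plane_normal=np.array([0.0, 0.0, 1.0]))
print(hit)  # 3-D point to be mapped to display coordinates
```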

    Urban Environment Navigation with Real-Time Data Utilizing Computer Vision, Inertial, and GPS Sensors

    The purpose of this research was to obtain a navigation solution that used real data, in a degraded or denied global positioning system (GPS) environment, from low-cost commercial off-the-shelf sensors. The sensors that were integrated were a commercial inertial measurement unit (IMU), a monocular-camera computer vision algorithm, and GPS. Furthermore, the monocular-camera computer vision algorithm had to be robust enough to handle any camera orientation presented to it. This research develops a visual-odometry 2-D zero-velocity measurement that is derived from both the feature points extracted from a monocular camera and the rotation values given by an IMU. By presenting measurements as 2-D zero-velocity measurements, errors associated with scale, which is unobservable by a monocular camera, can be removed from the measurements. The 2-D zero-velocity measurements are represented as two normalized velocity vectors that are orthogonal to the vehicle's direction of travel, and are used to determine the error in the INS's measured velocity vector. This error is obtained by knowing the directions in which the vehicle is not moving, given by the 2-D zero-velocity measurements, and comparing them to the direction in which the vehicle is thought to be moving. The performance was evaluated by comparing the results obtained when different sensor pairings of a commercial IMU, GPS, and the monocular computer vision algorithm were used to obtain the vehicle's trajectory. Three separate monocular cameras, each pointing in a different direction, were tested independently. Finally, the solutions provided by the GPS were degraded (i.e., the number of satellites available from the GPS was limited) to determine the effectiveness of adding a monocular computer vision algorithm to a system operating with a degraded GPS solution.
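    The thesis's filter formulation is not reproduced in the abstract; the sketch below illustrates, under assumed conventions, how a 2-D zero-velocity residual could be formed from a camera-derived direction of travel and an INS velocity. Function names and values are hypothetical.

```python
import numpy as np

def zero_velocity_residual(travel_dir, v_ins):
    """2-D zero-velocity residual for an INS velocity estimate.

    travel_dir : unit direction of travel from visual odometry, shape (3,)
    v_ins      : velocity vector measured by the INS, shape (3,)
    Returns the components of v_ins along two unit vectors orthogonal to
    the direction of travel; both should be ~0 when the INS velocity is
    consistent with the camera-observed motion direction.
    """
    d = travel_dir / np.linalg.norm(travel_dir)
    # Seed the orthogonal basis with any vector not parallel to d.
    seed = np.array([1.0, 0.0, 0.0])
    if abs(d[0]) > 0.9:
        seed = np.array([0.0, 1.0, 0.0])
    e1 = np.cross(d, seed)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(d, e1)        # unit length by construction
    return np.array([e1.dot(v_ins), e2.dot(v_ins)])

# Hypothetical: the INS velocity drifts slightly off the observed heading.
residual = zero_velocity_residual(np.array([1.0, 0.0, 0.0]),
                                  np.array([9.8, 0.3, -0.1]))
print(residual)  # nonzero components act as the measured velocity error
```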

    Applications in Monocular Computer Vision using Geometry and Learning : Map Merging, 3D Reconstruction and Detection of Geometric Primitives

    As the dream of autonomous vehicles moving around in our world comes closer, the problem of robust localization and mapping is essential to solve. In this inherently structured and geometric problem we also want the agents to learn from experience in a data-driven fashion. How modern Neural Network models can be combined with Structure from Motion (SfM) is an interesting research question, and this thesis studies related problems in 3D reconstruction, feature detection, SfM and map merging.

    In Paper I we study how a Bayesian Neural Network (BNN) performs in Semantic Scene Completion, where the task is to predict a semantic 3D voxel grid for the field of view of a single RGBD image. We propose an extended task and evaluate the benefits of the BNN when encountering new classes at inference time. It is shown that the BNN outperforms the deterministic baseline.

    Papers II-III are about detection of points, lines and planes defining a Room Layout in an RGB image. Due to the repeated textures and homogeneous colours of indoor surfaces it is not ideal to use only point features for Structure from Motion. The idea is to complement the point features by detecting a Wireframe, a connected set of line segments, which marks the intersections of planes in the Room Layout. Paper II concerns a task for detecting a Semantic Room Wireframe and implements a Neural Network model utilizing a Graph Convolutional Network module. The experiments show that the method is more flexible than previous Room Layout Estimation methods and performs better than previous Wireframe Parsing methods. Paper III takes the task closer to Room Layout Estimation by detecting a connected set of semantic polygons in an RGB image. The end-to-end trainable model is a combination of a Wireframe Parsing model and a Heterogeneous Graph Neural Network. We show promising results by outperforming state-of-the-art models for Room Layout Estimation using synthetic Wireframe detections. However, the joint Wireframe and Polygon detector requires further research to compete with the state-of-the-art models.

    In Paper IV we propose minimal solvers for SfM with parallel cylinders. The problem may be reduced to estimating circles in 2D, and the paper contributes theory for the two-view relative motion and two-circle relative structure problems. Fast solvers are derived, and experiments show good performance both in simulation and on real data.

    Papers V-VII cover the task of map merging: given a set of individually optimized point clouds with camera poses from an SfM pipeline, how can the solutions be effectively merged without completely re-solving the Structure from Motion problem? Papers V-VI introduce an effective method for merging and show its effectiveness through experiments on real and simulated data. Paper VII considers the matching problem for point clouds and proposes minimal solvers that allow for deformation of each point cloud. Experiments show that the method robustly matches point clouds with drift in the SfM solution.
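    The merging method of Papers V-VI is not spelled out in the abstract; as a rough illustration of the core subproblem, the sketch below aligns two overlapping SfM point clouds with a closed-form similarity transform (Umeyama-style). The function name and test setup are assumptions, not the papers' actual algorithm.

```python
import numpy as np

def similarity_align(src, dst):
    """Estimate s, R, t such that dst ~ s * R @ src + t.

    src, dst : (N, 3) arrays of matched 3-D points from two SfM maps.
    Classic closed-form (Umeyama-style) alignment; a full merging
    pipeline would also fuse cameras and handle outliers.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:   # enforce a proper rotation
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (src_c ** 2).sum() * len(src)
    t = mu_d - s * R @ mu_s
    return s, R, t

# Hypothetical check: recover a known transform from noiseless matches.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
R_true = np.linalg.qr(rng.normal(size=(3, 3)))[0]
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1.0
dst = 2.0 * src @ R_true.T + np.array([1.0, -2.0, 0.5])
s, R, t = similarity_align(src, dst)   # recovers s ~ 2.0, R ~ R_true
```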

    Vehicle detection and tracking using homography-based plane rectification and particle filtering

    This paper presents a full system for vehicle detection and tracking in non-stationary settings based on computer vision. The method proposed for vehicle detection exploits the geometrical relations between the elements in the scene so that moving objects (i.e., vehicles) can be detected by analyzing motion parallax. Namely, the homography of the road plane between successive images is computed. Most remarkably, a novel probabilistic framework based on Kalman filtering is presented for reliable and accurate homography estimation. The estimated homography is used for image alignment, which in turn allows the moving vehicles in the image to be detected. Tracking of vehicles is performed with a multidimensional particle filter, which also manages the entry and exit of objects. The filter involves a mixture likelihood model that allows a better adaptation of the particles to the observed measurements. The system is specially designed for highway environments, where it has been proven to yield excellent results.
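    The paper's Kalman-filter homography estimation and particle-filter tracker go beyond a short snippet; the sketch below only illustrates the alignment step, assuming the road-plane homography is already available. The OpenCV calls are standard; the function name and threshold value are assumptions.

```python
import cv2

def motion_parallax_mask(prev_frame, curr_frame, H_road, thresh=30):
    """Flag moving objects by aligning the road plane between frames.

    H_road : 3x3 homography of the road plane from prev_frame to
             curr_frame (assumed given; the paper estimates it within
             a Kalman-filtering framework).
    Pixels on the static road plane align after warping, so large
    residuals indicate objects moving relative to that plane.
    """
    h, w = curr_frame.shape[:2]
    warped = cv2.warpPerspective(prev_frame, H_road, (w, h))
    diff = cv2.absdiff(curr_frame, warped)
    if diff.ndim == 3:              # colour input: reduce to one channel
        diff = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return mask
```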

    Image-Based Bed Material Mapping of a Large River

    The composition of the bed material plays a crucial role in the physical hydromorphological processes of fluvial systems. However, conventional bed material sampling methods provide only pointwise information, which can be inadequate when investigating large rivers with inhomogeneous bed material characteristics. In this study, novel, image-based approaches are implemented to gain areal information on the bed surface composition using two different techniques: monocular and stereo computer vision. Using underwater videos captured in shorter reaches of the Hungarian Danube River, a comparison is carried out between the bed material grain size distributions from conventional physical sampling and those reconstructed from the images. Moreover, an attempt is made to quantify bed surface roughness using the so-called Structure from Motion image analysis method. Practical aspects of the applicability of image-based bed material mapping are discussed, and future improvements towards an automatized mapping methodology are outlined.
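    As a small, hedged illustration of how image-derived grain measurements can be compared with physical samples, the snippet below computes characteristic percentiles of a count-based grain size distribution; the paper's actual processing chain is not described in the abstract.

```python
import numpy as np

def grain_size_stats(diameters_mm):
    """Characteristic sizes of a count-based grain size distribution.

    diameters_mm : per-grain diameters measured from images (e.g. the
                   intermediate b-axis), in millimetres. Note that a
                   count-based distribution differs from the
                   weight-based one obtained by sieve analysis.
    """
    d16, d50, d84 = np.percentile(diameters_mm, [16, 50, 84])
    sorting = np.sqrt(d84 / d16)    # geometric sorting coefficient
    return {"D16": d16, "D50": d50, "D84": d84, "sorting": sorting}
```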

    Sparse Encoding of Binocular Images for Depth Inference

    Sparse coding models have been widely used to decompose monocular images into linear combinations of small numbers of basis vectors drawn from an overcomplete set. However, little work has examined sparse coding in the context of stereopsis. In this paper, we demonstrate that sparse coding yields better depth inference from sparse activations than comparable feed-forward networks of the same size. This is likely due to the noise and redundancy of feed-forward activations, whereas sparse coding utilizes lateral competition to selectively encode image features within a narrow band of depths.
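    The paper's network is not specified in the abstract; as a generic sketch of sparse coding, the snippet below solves the LASSO-style encoding problem with ISTA. For stereopsis, x could be the concatenation of corresponding left and right patches over a dictionary learned on binocular pairs; all names here are assumptions.

```python
import numpy as np

def ista_sparse_code(x, D, lam=0.1, n_iter=200):
    """Sparse coding of a signal x over a dictionary D via ISTA.

    Solves min_a 0.5 * ||x - D @ a||^2 + lam * ||a||_1. The gradient
    contains a D.T @ D @ a term, i.e. the lateral competition between
    dictionary elements that the paper credits for selectivity.
    """
    L = np.linalg.norm(D, 2) ** 2   # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = a - grad / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a
```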

    Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods

    To help smart-wearable researchers choose the optimal ground-truth method for motion capture (MoCap) of all types of loose garments, we present a benchmark, DrapeMoCapBench (DMCB), specifically designed to evaluate the performance of optical marker-based and marker-less MoCap. High-cost marker-based MoCap systems are well known as the precise gold standard. However, a less well-known caveat is that they require skin-tight fitting of markers on bony areas to achieve the specified precision, which makes them questionable for loose garments. On the other hand, marker-less MoCap methods powered by computer vision models have matured over the years and cost very little, since smartphone cameras suffice. To this end, DMCB uses large real-world recorded MoCap datasets to perform parallel 3D physics simulations with a wide range of diversity: six levels of drape from skin-tight to extremely draped garments, three levels of motion, and six body-type and gender combinations. It benchmarks state-of-the-art optical marker-based and marker-less MoCap methods to identify the best-performing method in different scenarios. For casual loose garments, both marker-based and low-cost marker-less MoCap exhibit significant performance loss (>10 cm); but for everyday activities involving basic and fast motions, marker-less MoCap slightly outperforms marker-based MoCap, making it a favorable and cost-effective choice for wearable studies.
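    The benchmark's exact error metric is not named in the abstract; the >10 cm figure suggests a positional joint error, so the sketch below shows a standard mean per-joint position error (MPJPE) as an assumed stand-in, not DMCB's actual evaluation code.

```python
import numpy as np

def mpjpe_cm(pred_joints, gt_joints):
    """Mean per-joint position error in centimetres.

    pred_joints, gt_joints : (frames, joints, 3) arrays in metres,
    e.g. marker-less estimates versus the reference skeleton.
    """
    err_m = np.linalg.norm(pred_joints - gt_joints, axis=-1)
    return 100.0 * err_m.mean()
```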