437 research outputs found

    Adaptive Vision Based Scene Registration for Outdoor Augmented Reality

    Augmented Reality (AR) involves adding virtual content into real scenes. Scenes are viewed using a Head-Mounted Display or other display type. In order to place content into the user's view of a scene, the user's position and orientation relative to the scene, commonly referred to as their pose, must be determined accurately. This allows the objects to be placed in the correct positions and to remain there when the user moves or the scene changes. It is achieved by tracking the user in relation to their environment using a variety of technologies. One technology which has proven to provide accurate results is computer vision. Computer vision involves a computer analysing images and achieving an understanding of them. This may be locating objects such as faces in the images, or in the case of AR, determining the pose of the user. One of the ultimate goals of AR systems is to be capable of operating under any condition. For example, a computer vision system must be robust under a range of different scene types, and under unpredictable environmental conditions due to variable illumination and weather. The majority of existing literature tests algorithms under the assumption of ideal or 'normal' imaging conditions. To ensure robustness under as many circumstances as possible it is also important to evaluate the systems under adverse conditions. This thesis seeks to analyse the effects that variable illumination has on computer vision algorithms. To enable this analysis, test data is required to isolate weather and illumination effects, without other factors such as changes in viewpoint that would bias the results. A new dataset is presented which also allows controlled viewpoint differences in the presence of weather and illumination changes. This is achieved by capturing video from a camera undergoing a repeatable motion sequence. Ground truth data is stored per frame, allowing images from the same position under differing environmental conditions to be easily extracted from the videos. An in-depth analysis of six detection algorithms and five matching techniques demonstrates the impact that non-uniform illumination changes can have on vision algorithms. Specifically, shadows can degrade performance and reduce confidence in the system, decrease reliability, or even completely prevent successful operation. An investigation into approaches to improve performance yields techniques that can help reduce the impact of shadows. A novel algorithm is presented that merges reference data captured at different times, resulting in reference data with minimal shadow effects. This can significantly improve performance and reliability when operating on images containing shadow effects. These advances improve the robustness of computer vision systems and extend the range of conditions in which they can operate. This can increase the usefulness of the algorithms and the AR systems that employ them.

    Enriching remote labs with computer vision and drones

    165 p. With technological advances, new learning technologies are being developed to contribute to a better learning experience. In particular, remote labs constitute an interesting and practical way to motivate today's students to learn. The student can, at any time and from anywhere, access the remote lab and do his lab work. Despite many advantages, remote technologies in education create a distance between the student and the teacher. Without the presence of a teacher, students can have difficulties if no appropriate interventions are taken to help them. In this thesis, we aim to enrich an existing remote electronic lab made for engineering students, called "LaboREM" (for remote Laboratory), in two ways. First, we enable the student to send high-level commands to a mini-drone available in the remote lab facility. The objective is to examine the front panels of electronic measurement instruments using the camera embedded on the drone. Furthermore, we allow remote student-teacher communication using the drone, in case there is a teacher present in the remote lab facility. Finally, the drone has to go back home when the mission is over and land on a platform for automatic recharging of the batteries. Second, we propose an automatic system that estimates the affective state of the student (frustrated/confused/flow) in order to take appropriate interventions to ensure good learning outcomes. For example, if the student is having major difficulties, we can try to give him hints or reduce the difficulty level of the lab experiment. We propose to do this by using visual cues (head pose estimation and facial expression analysis). Many pieces of evidence on the state of the student can be acquired; however, this evidence is incomplete, sometimes inaccurate, and no single source covers all aspects of the student's state. This is why we propose to fuse the evidence using Dempster-Shafer theory, which allows the fusion of incomplete evidence.
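
    The evidence fusion mentioned in this abstract can be illustrated with Dempster's rule of combination. The sketch below is not taken from the thesis: the function name `combine`, the two evidence sources, and every mass value are hypothetical, chosen only to show how two incomplete mass functions over the three affective states might be merged.

```python
from itertools import product

# Frame of discernment: the three affective states named in the abstract.
THETA = frozenset({"frustrated", "confused", "flow"})


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory pairs is treated as conflict.
            conflict += mass_a * mass_b
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical evidence from head pose: partly ignorant (mass on THETA).
head_pose = {
    frozenset({"frustrated"}): 0.5,
    frozenset({"frustrated", "confused"}): 0.2,
    THETA: 0.3,
}
# Hypothetical evidence from facial expression.
facial_expression = {
    frozenset({"confused"}): 0.4,
    frozenset({"flow"}): 0.1,
    THETA: 0.5,
}

fused = combine(head_pose, facial_expression)
for hypothesis, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(hypothesis), round(mass, 3))
```

    Mass placed on the whole frame expresses ignorance, which is how an incomplete source can take part in the fusion without being forced to commit to a single state.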

    Scene representation and matching for visual localization in hybrid camera scenarios

    Scene representation and matching are crucial steps in a variety of tasks ranging from 3D reconstruction to virtual/augmented/mixed reality applications, to robotics, and others. While approaches exist that tackle these tasks, they mostly overlook the issue of efficiency in the scene representation, which is fundamental in resource-constrained systems and for increasing computing speed. Also, they normally assume the use of projective cameras, while performance on systems based on other camera geometries remains suboptimal. This dissertation contributes a new efficient scene representation method that dramatically reduces the number of 3D points. The approach sets up an optimization problem for the automated selection of the most relevant points to retain. This leads to a constrained quadratic program, which is solved optimally with a newly introduced variant of the sequential minimal optimization method. In addition, a new initialization approach is introduced for the fast convergence of the method. Extensive experimentation on public benchmark datasets demonstrates that the approach produces a compressed scene representation quickly while delivering accurate pose estimates. The dissertation also contributes new methods for scene matching that go beyond the use of projective cameras. Alternative camera geometries, like fisheye cameras, produce images with very high distortion, making current image feature point detectors and descriptors less effective, since they were designed for projective cameras. New methods based on deep learning are introduced to address this problem, where feature detectors and descriptors can overcome distortion effects and more effectively perform feature matching between pairs of fisheye images, and also between hybrid pairs of fisheye and perspective images. Due to the limited availability of fisheye-perspective image datasets, three datasets were collected for training and testing the methods. The results demonstrate an increase in detection and matching rates, outperforming the current state-of-the-art methods.

    A ROBUST RGB-D SLAM SYSTEM FOR 3D ENVIRONMENT WITH PLANAR SURFACES

    Simultaneous localization and mapping (SLAM) is a technique for constructing a 3D map of an unknown environment. With the increasing popularity of RGB-depth (RGB-D) sensors such as the Microsoft Kinect, there has been much research on capturing and reconstructing 3D environments using a movable RGB-D sensor. The key process behind these kinds of SLAM systems is the iterative closest point (ICP) algorithm, an iterative algorithm that estimates the rigid movement of the camera based on the captured 3D point clouds. While ICP is a well-studied algorithm, it is problematic when used to scan large planar regions such as wall surfaces in a room. The lack of depth variation on planar surfaces makes the global alignment an ill-conditioned problem. In this thesis, we present a novel approach for registering 3D point clouds by combining both color and depth information. Instead of directly searching for point correspondences among 3D data, the proposed method first extracts features from the RGB images, and then back-projects the features into 3D space to identify more reliable correspondences. These color correspondences form the initial input to the ICP procedure, which then proceeds to refine the alignment. Experimental results show that our proposed approach can achieve better accuracy than existing SLAM systems in reconstructing indoor environments with large planar surfaces.
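
    The back-projection step described in this abstract can be sketched with the standard pinhole camera model. This is an illustration only, not the thesis's code: the function names are hypothetical, and the intrinsics below (`FX`, `FY`, `CX`, `CY`) are assumed placeholder values for a 640x480 depth camera rather than calibrated parameters.

```python
# Assumed pinhole intrinsics for a 640x480 RGB-D camera (placeholders, in pixels).
FX, FY = 525.0, 525.0   # focal lengths
CX, CY = 319.5, 239.5   # principal point


def back_project(u, v, depth_m, fx=FX, fy=FY, cx=CX, cy=CY):
    """Map pixel (u, v) with metric depth to a 3D point in the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def back_project_features(keypoints, depth_lookup):
    """Back-project 2D feature locations, skipping pixels without valid depth.

    keypoints    -- iterable of (u, v) pixel coordinates of detected features
    depth_lookup -- callable returning depth in metres at (u, v), or 0.0 if
                    the sensor produced no reading there
    """
    points = []
    for u, v in keypoints:
        depth_m = depth_lookup(u, v)
        if depth_m > 0.0:  # depth sensors commonly report 0 for missing data
            points.append(back_project(u, v, depth_m))
    return points


# Toy example: a flat wall 2 m away, with one pixel lacking a depth reading.
def toy_depth(u, v):
    return 0.0 if (u, v) == (100, 100) else 2.0


features = [(319.5, 239.5), (100, 100), (844.5, 239.5)]
points_3d = back_project_features(features, toy_depth)
print(points_3d)
```

    Matching features between two frames and back-projecting both sides yields 3D-3D correspondences, which is the kind of input the abstract says is used to initialise ICP.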

    Augmented reality applications for cultural heritage using Kinect

    This paper explores the use of data from the Kinect sensor for performing augmented reality, with emphasis on cultural heritage applications. It is shown that the combination of depth and image correspondences from the Kinect can yield a reliable estimate of the location and pose of the camera, though noise from the depth sensor introduces an unpleasant jittering of the rendered view. Kalman filtering of the camera position was found to yield a much more stable view. Results show that the system is accurate enough for in situ augmented reality applications. Skeleton tracking using Kinect data allows the appearance of participants to be augmented, and together these facilitate the development of cultural heritage applications.
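
    To illustrate how Kalman filtering can damp jitter in an estimated camera position, here is a minimal sketch. The abstract does not state the paper's motion model or noise parameters, so the constant-position model, the independent per-axis filters, and the values of `q` and `r` are all assumptions, and the class and function names are hypothetical.

```python
class ScalarKalman:
    """One-dimensional Kalman filter with a constant-position model."""

    def __init__(self, x0, p0=1.0, q=1e-4, r=1e-2):
        self.x = x0  # state estimate (one position coordinate)
        self.p = p0  # variance of the estimate
        self.q = q   # process noise variance (assumed)
        self.r = r   # measurement noise variance (assumed)

    def update(self, z):
        # Predict: position is assumed unchanged, uncertainty grows by q.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = self.p / (self.p + self.r)
        self.x += gain * (z - self.x)
        self.p *= 1.0 - gain
        return self.x


def smooth_positions(positions, q=1e-4, r=1e-2):
    """Filter a sequence of (x, y, z) camera positions, one filter per axis."""
    filters = [ScalarKalman(c, q=q, r=r) for c in positions[0]]
    return [
        tuple(f.update(c) for f, c in zip(filters, pos))
        for pos in positions
    ]


# Toy input: a stationary camera whose x coordinate jitters by +/- 5 cm.
raw = [(1.0 + (0.05 if i % 2 else -0.05), 2.0, 0.5) for i in range(20)]
smoothed = smooth_positions(raw)

raw_range = max(p[0] for p in raw) - min(p[0] for p in raw)
smoothed_range = max(p[0] for p in smoothed) - min(p[0] for p in smoothed)
print(raw_range, smoothed_range)
```

    A larger `r` relative to `q` gives a steadier view at the cost of more lag when the camera really moves; a constant-velocity model would be a natural alternative for a moving camera.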

    Computational Multimedia for Video Self Modeling

    Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of themselves. This is the idea behind the psychological theory of self-efficacy: you can learn or model how to perform certain tasks because you see yourself doing them, which provides the most ideal form of behavior modeling. The effectiveness of VSM has been demonstrated for many different types of disabilities and behavioral problems, ranging from stuttering, inappropriate social behaviors, autism, and selective mutism to sports training. However, there is an inherent difficulty associated with the production of VSM material. Prolonged and persistent video recording is required to capture the rare snippets, if they exist at all, that can be strung together to form novel video sequences of the target skill. To solve this problem, in this dissertation, we use computational multimedia techniques to facilitate the creation of synthetic visual content for self-modeling that can be used by a learner and his/her therapist with a minimum amount of training data. There are three major technical contributions in my research. First, I developed an Adaptive Video Re-sampling algorithm to synthesize realistic lip-synchronized video with minimal motion jitter. Second, to denoise and complete the depth map captured by structured-light sensing systems, I introduced a layer-based probabilistic model to account for various types of uncertainties in the depth measurement. Third, I developed a simple and robust bundle-adjustment-based framework for calibrating a network of multiple wide-baseline RGB and depth cameras.

    Egocentric Perception of Hands and Its Applications
