1,578 research outputs found

    Ultimate SLAM? Combining Events, Images, and IMU for Robust Visual SLAM in HDR and High Speed Scenarios

    Full text link
    Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. These cameras do not suffer from motion blur and have a very high dynamic range, which enables them to provide reliable visual information during high speed motions or in scenes characterized by high dynamic range. However, event cameras output only little information when the amount of motion is limited, such as in the case of almost still motion. Conversely, standard cameras provide instant and rich information about the environment most of the time (in low-speed and good lighting scenarios), but they fail severely in case of fast motions, or difficult lighting such as high dynamic range or low light scenes. In this paper, we present the first state estimation pipeline that leverages the complementary advantages of these two sensors by fusing in a tightly-coupled manner events, standard frames, and inertial measurements. We show on the publicly available Event Camera Dataset that our hybrid pipeline leads to an accuracy improvement of 130% over event-only pipelines, and 85% over standard-frames-only visual-inertial systems, while still being computationally tractable. Furthermore, we use our pipeline to demonstrate - to the best of our knowledge - the first autonomous quadrotor flight using an event camera for state estimation, unlocking flight scenarios that were not reachable with traditional visual-inertial odometry, such as low-light environments and high-dynamic range scenes.Comment: 8 pages, 9 figures, 2 table

    A Robust Head Tracking System Based on Monocular Vision and Planar Templates

    Get PDF
    This paper details the implementation of a head tracking system suitable for its use in teleoperation stations or control centers, taking into account the limitations and constraints usually associated to those environments. The paper discusses and justifies the selection of the different methods and sensors to build the head tracking system, detailing also the processing steps of the system in operation. A prototype to validate the proposed approach is also presented along with several tests in a real environment with promising results

    Low-latency Cloud-based Volumetric Video Streaming Using Head Motion Prediction

    Full text link
    Volumetric video is an emerging key technology for immersive representation of 3D spaces and objects. Rendering volumetric video requires lots of computational power which is challenging especially for mobile devices. To mitigate this, we developed a streaming system that renders a 2D view from the volumetric video at a cloud server and streams a 2D video stream to the client. However, such network-based processing increases the motion-to-photon (M2P) latency due to the additional network and processing delays. In order to compensate the added latency, prediction of the future user pose is necessary. We developed a head motion prediction model and investigated its potential to reduce the M2P latency for different look-ahead times. Our results show that the presented model reduces the rendering errors caused by the M2P latency compared to a baseline system in which no prediction is performed.Comment: 7 pages, 4 figure

    Autonomous Sweet Pepper Harvesting for Protected Cropping Systems

    Full text link
    In this letter, we present a new robotic harvester (Harvey) that can autonomously harvest sweet pepper in protected cropping environments. Our approach combines effective vision algorithms with a novel end-effector design to enable successful harvesting of sweet peppers. Initial field trials in protected cropping environments, with two cultivar, demonstrate the efficacy of this approach achieving a 46% success rate for unmodified crop, and 58% for modified crop. Furthermore, for the more favourable cultivar we were also able to detach 90% of sweet peppers, indicating that improvements in the grasping success rate would result in greatly improved harvesting performance

    SILVR: A Synthetic Immersive Large-Volume Plenoptic Dataset

    Full text link
    In six-degrees-of-freedom light-field (LF) experiences, the viewer's freedom is limited by the extent to which the plenoptic function was sampled. Existing LF datasets represent only small portions of the plenoptic function, such that they either cover a small volume, or they have limited field of view. Therefore, we propose a new LF image dataset "SILVR" that allows for six-degrees-of-freedom navigation in much larger volumes while maintaining full panoramic field of view. We rendered three different virtual scenes in various configurations, where the number of views ranges from 642 to 2226. One of these scenes (called Zen Garden) is a novel scene, and is made publicly available. We chose to position the virtual cameras closely together in large cuboid and spherical organisations (2.2m32.2m^3 to 48m348m^3), equipped with 180{\deg} fish-eye lenses. Every view is rendered to a color image and depth map of 2048px ×\times 2048px. Additionally, we present the software used to automate the multi-view rendering process, as well as a lens-reprojection tool that converts between images with panoramic or fish-eye projection to a standard rectilinear (i.e., perspective) projection. Finally, we demonstrate how the proposed dataset and software can be used to evaluate LF coding/rendering techniques(in this case for training NeRFs with instant-ngp). As such, we provide the first publicly-available LF dataset for large volumes of light with full panoramic field of viewComment: In 13th ACM Multimedia Systems Conference (MMSys '22), June 14-17, 2022, Athlone, Ireland. ACM, New York, NY, USA, 6 page

    Realnav: Exploring Natural User Interfaces For Locomotion In Video Games

    Get PDF
    We present an exploration into realistic locomotion interfaces in video games using spatially convenient input hardware. In particular, we use Nintendo Wii Remotes to create natural mappings between user actions and their representation in a video game. Targeting American Football video games, we used the role of the quarterback as an exemplar since the game player needs to maneuver effectively in a small area, run down the field, and perform evasive gestures such as spinning, jumping, or the juke . In our study, we developed three locomotion techniques. The first technique used a single Wii Remote, placed anywhere on the user\u27s body, using only the acceleration data. The second technique just used the Wii Remote\u27s infrared sensor and had to be placed on the user\u27s head. The third technique combined a Wii Remote\u27s acceleration and infrared data using a Kalman filter. The Wii Motion Plus was also integrated to add the orientation of the user into the video game. To evaluate the different techniques, we compared them with a cost effective six degree of freedom (6DOF) optical tracker and two Wii Remotes placed on the user\u27s feet. Experiments were performed comparing each to this technique. Finally, a user study was performed to determine if a preference existed among these techniques. The results showed that the second and third technique had the same location accuracy as the cost effective 6DOF tracker, but the first was too inaccurate for video game players. Furthermore, the range of the Wii remote infrared and Motion Plus exceeded the optical tracker of the comparison technique. Finally, the user study showed that video game players preferred the third method over the second, but were split on the use of the Motion Plus when the tasks did not require it

    DeepFactors: Real-time probabilistic dense monocular SLAM

    Get PDF
    The ability to estimate rich geometry and camera motion from monocular imagery is fundamental to future interactive robotics and augmented reality applications. Different approaches have been proposed that vary in scene geometry representation (sparse landmarks, dense maps), the consistency metric used for optimising the multi-view problem, and the use of learned priors. We present a SLAM system that unifies these methods in a probabilistic framework while still maintaining real-time performance. This is achieved through the use of a learned compact depth map representation and reformulating three different types of errors: photometric, reprojection and geometric, which we make use of within standard factor graph software. We evaluate our system on trajectory estimation and depth reconstruction on real-world sequences and present various examples of estimated dense geometry
    corecore