276 research outputs found

    Random access prediction structures for light field video coding with MV-HEVC

    Computational imaging and light field technology promise to deliver the required six degrees of freedom for natural scenes in virtual reality. Existing extensions of standardized video coding formats, such as multi-view coding and multi-view plus depth, are currently the most conventional light field video coding solutions. The latest multi-view coding format, a direct extension of the high efficiency video coding (HEVC) standard, is called multi-view HEVC (MV-HEVC). MV-HEVC treats each light field view as a separate video sequence and uses syntax elements similar to standard HEVC to exploit redundancies between neighboring views. To achieve this, inter-view and temporal prediction schemes are deployed with the aim of finding the best trade-off between coding performance and reconstruction quality. The number of possible prediction structures is unlimited, and many have been proposed in the literature. Although some of them are efficient in terms of compression ratio, they complicate random access because of their dependencies on previously decoded pixels or frames. Random access is an important feature in video delivery and a crucial requirement in multi-view video coding. In this work, we propose and compare different prediction structures for coding light field video using MV-HEVC, with a focus on both compression efficiency and random accessibility. Experiments on three different short-baseline light field video sequences show the trade-off between bit-rate and distortion, as well as the average number of decoded views/frames necessary for displaying any random frame at any time instance. The findings of this work indicate the most appropriate prediction structure depending on the available bandwidth and the required degree of random access.
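
The random-access metric mentioned in this abstract (the average number of frames that must be decoded to display an arbitrary frame) can be illustrated with a small sketch. The two prediction structures below, `chain_structure` and `star_structure`, are hypothetical toy examples and are not the structures proposed in the paper; frames are modeled as `(view, time)` pairs and the dependency lists are assumptions made only for illustration.

```python
from typing import Dict, List, Set, Tuple

Frame = Tuple[int, int]  # (view index, time index)
Deps = Dict[Frame, List[Frame]]  # frame -> reference frames it is predicted from


def decode_set(frame: Frame, deps: Deps) -> Set[Frame]:
    """Return every frame that must be decoded to display `frame`, itself included."""
    needed: Set[Frame] = set()
    stack = [frame]
    while stack:
        current = stack.pop()
        if current in needed:
            continue
        needed.add(current)
        stack.extend(deps.get(current, []))
    return needed


def average_decode_cost(deps: Deps) -> float:
    """Average number of decoded frames over all possible random-access targets."""
    return sum(len(decode_set(f, deps)) for f in deps) / len(deps)


def chain_structure(views: int, times: int) -> Deps:
    """Hypothetical: each frame references the previous view and the previous time instant."""
    deps: Deps = {}
    for v in range(views):
        for t in range(times):
            refs = []
            if v > 0:
                refs.append((v - 1, t))
            if t > 0:
                refs.append((v, t - 1))
            deps[(v, t)] = refs
    return deps


def star_structure(views: int, times: int) -> Deps:
    """Hypothetical: view 0 is the base view; other views reference only view 0."""
    deps: Deps = {}
    for v in range(views):
        for t in range(times):
            refs = []
            if v > 0:
                refs.append((0, t))
            if t > 0:
                refs.append((v, t - 1))
            deps[(v, t)] = refs
    return deps


if __name__ == "__main__":
    print("chain:", average_decode_cost(chain_structure(3, 2)))
    print("star: ", average_decode_cost(star_structure(3, 2)))
```

In this toy setting the star layout needs fewer decoded frames on average than the chain layout, which reflects the kind of trade-off the abstract describes; the sketch says nothing about bit-rate or distortion.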

    Stereoscopic bimanual interaction for 3D visualization

    Virtual environments (VEs) have been widely used for several decades in fields such as 3D visualization, education, training, and games. VEs have the potential to enhance visualization and to act as a general medium for human-computer interaction (HCI). However, limited research has evaluated virtual reality (VR) display technologies, along with monocular and binocular depth cues, for human depth perception of volumetric (non-polygonal) datasets. In addition, a lack of standardization of three-dimensional (3D) user interfaces (UIs) makes it challenging to interact with many VE systems. To address these issues, this dissertation evaluates the effects of stereoscopic and head-coupled displays on depth judgment of volumetric datasets. It also evaluates two-handed view manipulation techniques that support simultaneous 7 degree-of-freedom (DOF) navigation (x, y, z + yaw, pitch, roll + scale) in a multi-scale virtual environment (MSVE). Furthermore, this dissertation evaluates techniques for auto-adjusting stereo view parameters to address stereoscopic fusion problems in an MSVE. Next, this dissertation presents a bimanual, hybrid user interface that combines traditional tracking devices with computer-vision-based "natural" 3D inputs for multi-dimensional visualization in a semi-immersive desktop VR system. In conclusion, this dissertation provides a guideline for designing research that evaluates UIs and interaction techniques.

    Microdrone-Based Indoor Mapping with Graph SLAM

    Unmanned aerial vehicles offer a safe and fast approach to producing three-dimensional spatial data on the surrounding space. In this article, we present a low-cost SLAM-based drone for creating exploration maps of building interiors. The focus is on emergency response mapping in inaccessible or potentially dangerous places. For this purpose, we used a quadcopter microdrone equipped with six laser rangefinders (1D scanners) and an optical sensor for mapping and positioning. The employed SLAM is designed to map indoor spaces with planar structures through graph optimization. It performs loop-closure detection and correction to recognize previously visited places and to correct the drift accumulated over time. The proposed methodology was validated in several indoor environments. We compared the performance of our drone against a multilayer LiDAR-carrying macrodrone, a vision-aided navigation helmet, and ground truth obtained with a terrestrial laser scanner. The experimental results indicate that our SLAM system is capable of creating quality exploration maps of small indoor spaces and of handling the loop-closure problem. The accumulated drift without loop closure was on average 1.1% (0.35 m) over a 31-m-long acquisition trajectory. Moreover, the comparison demonstrated that our flying microdrone performed comparably to the multilayer LiDAR-based macrodrone, given the low deviation between the point clouds built by both drones. Approximately 85% of the cloud-to-cloud distances were less than 10 cm.
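
The two figures quoted in this abstract can be read as simple ratios. The sketch below checks the drift percentage (0.35 m over 31 m) and shows one common way a cloud-to-cloud statistic could be computed. The abstract does not say how the distances were measured, so the brute-force nearest-neighbor definition in `fraction_within` and the toy point clouds are assumptions made for illustration only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def drift_percent(drift_m: float, trajectory_m: float) -> float:
    """Accumulated drift expressed as a percentage of the trajectory length."""
    return 100.0 * drift_m / trajectory_m


def fraction_within(cloud_a: Sequence[Point], cloud_b: Sequence[Point],
                    threshold_m: float) -> float:
    """Fraction of points in cloud_a whose nearest neighbor in cloud_b
    lies closer than threshold_m (brute force, assumed definition)."""
    close = 0
    for p in cloud_a:
        nearest = min(math.dist(p, q) for q in cloud_b)
        if nearest < threshold_m:
            close += 1
    return close / len(cloud_a)


if __name__ == "__main__":
    # Values taken from the abstract: 0.35 m of drift over a 31 m trajectory.
    print(round(drift_percent(0.35, 31.0), 1), "%")

    # Toy clouds (hypothetical data): three points agree within 5 cm, one is 50 cm off.
    cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    cloud_b = [(0.0, 0.0, 0.05), (1.0, 0.0, 0.05), (2.0, 0.0, 0.05), (3.0, 0.0, 0.5)]
    print(fraction_within(cloud_a, cloud_b, 0.10))
```

For real point clouds a spatial index such as a k-d tree would replace the quadratic loop; the brute-force version is kept here only because it is short and self-contained.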

    Realistic Haptics Interaction in Complex Virtual Environments


    A Robust Approach for Monocular Visual Odometry in Underwater Environments

    This work presents a visual odometry system for camera tracking in underwater seafloor scenarios that are strongly perturbed by sunlight caustics and cloudy water. In particular, we focus on the performance and robustness of the system, which structurally couples a deflickering filter with a visual tracker. Two state-of-the-art trackers are employed in our study, one pixel-oriented and the other feature-based. The inner workings of the trackers were broken down and their suitability for underwater environments was analyzed comparatively. To this end, real underwater footage recorded in perturbed environments was used. Sociedad Argentina de Informática e Investigación Operativ

    GRID: Scene-Graph-based Instruction-driven Robotic Task Planning

    Recent works have shown that Large Language Models (LLMs) can help ground instructions in robotic task planning. Despite this progress, most existing works have focused on using raw images to help LLMs understand environmental information, which not only limits the observation scope but also typically requires massive multimodal data collection and large-scale models. In this paper, we propose a novel approach called Graph-based Robotic Instruction Decomposer (GRID), which leverages scene graphs instead of images to perceive global scene information and continuously plans a subtask at each stage for a given instruction. Our method encodes object attributes and relationships in graphs through an LLM and Graph Attention Networks, integrating instruction features to predict subtasks consisting of pre-defined robot actions and target objects in the scene graph. This strategy enables robots to acquire semantic knowledge widely observed in the environment from the scene graph. To train and evaluate GRID, we build a dataset construction pipeline to generate synthetic datasets for graph-based robotic task planning. Experiments show that our method outperforms GPT-4 by over 25.4% in subtask accuracy and 43.6% in task accuracy. Experiments conducted on datasets of unseen scenes and scenes with different numbers of objects showed that the task accuracy of GRID declined by at most 3.8%, which demonstrates its good cross-scene generalization ability. We validate our method in both physical simulation and the real world.