
    InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure

    Volumetric models have become a popular representation for 3D scenes in recent years. One breakthrough leading to their popularity was KinectFusion, which focuses on 3D reconstruction using RGB-D sensors; monocular SLAM has since been tackled with very similar approaches. Representing the reconstruction volumetrically as a TSDF accounts for much of the simplicity and efficiency that GPU implementations of these systems achieve. However, this representation is memory-intensive and limits applicability to small-scale reconstructions. Several avenues have been explored to overcome this. With the aim of summarizing them and providing a fast, flexible 3D reconstruction pipeline, we propose a new, unifying framework called InfiniTAM. The idea is that steps such as camera tracking, scene representation and integration of new data can easily be replaced and adapted to the user's needs. This report describes the technical implementation details of InfiniTAM v3, the third version of our InfiniTAM system. We have added various new features and made numerous enhancements to the low-level code that significantly improve our camera tracking performance. The new features that we expect to be of most interest are (i) a robust camera tracking module; (ii) an implementation of Glocker et al.'s keyframe-based random-ferns camera relocaliser; (iii) a novel approach to globally-consistent TSDF-based reconstruction, based on dividing the scene into rigid submaps and optimising the relative poses between them; and (iv) an implementation of Keller et al.'s surfel-based reconstruction approach.

    Comment: This article largely supersedes arXiv:1410.0925 (it describes version 3 of the InfiniTAM framework).
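    The TSDF update at the heart of KinectFusion-style systems can be summarized in a short sketch. The Python fragment below is a minimal illustration with assumed array shapes and parameter names, not InfiniTAM's actual code: each voxel is projected into a depth frame and the truncated signed distance is folded into a running weighted average.

        import numpy as np

        def integrate_tsdf(tsdf, weights, depth, K, cam_pose, voxel_size, trunc):
            """Fuse one depth frame into a dense TSDF volume (KinectFusion-style).

            tsdf, weights : contiguous (X, Y, Z) arrays, updated in place
            depth         : (H, W) depth image in metres
            K             : 3x3 intrinsics; cam_pose : 4x4 world-from-camera pose
            """
            H, W = depth.shape
            # World coordinates of every voxel centre.
            xs, ys, zs = np.meshgrid(*[np.arange(n) for n in tsdf.shape], indexing="ij")
            pts_w = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3) * voxel_size
            # Transform into the camera frame and project with the pinhole model.
            T = np.linalg.inv(cam_pose)
            pts_c = pts_w @ T[:3, :3].T + T[:3, 3]
            z = pts_c[:, 2]
            safe_z = np.maximum(z, 1e-6)
            u = np.round(K[0, 0] * pts_c[:, 0] / safe_z + K[0, 2]).astype(int)
            v = np.round(K[1, 1] * pts_c[:, 1] / safe_z + K[1, 2]).astype(int)
            valid = (z > 1e-6) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
            d = np.zeros_like(z)
            d[valid] = depth[v[valid], u[valid]]
            # Truncated signed distance along the viewing ray.
            sdf = np.clip(d - z, -trunc, trunc)
            update = valid & (d > 0) & (d - z > -trunc)
            f, w = tsdf.reshape(-1), weights.reshape(-1)
            # Running weighted average of observations (Curless & Levoy).
            f[update] = (f[update] * w[update] + sdf[update]) / (w[update] + 1)
            w[update] += 1

    The dense grid above is the simplest baseline and makes the memory cost obvious; voxel hashing or the submap scheme of feature (iii) allocate storage only near the observed surface.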

    A perceptual comparison of empirical and predictive region-of-interest video

    When viewing multimedia presentations, a user attends to only a relatively small part of the video display at any one point in time. By shifting the allocation of bandwidth from peripheral areas to those locations where a user's gaze is more likely to rest, attentive displays can be produced. Attentive displays aim to reduce resource requirements while minimizing negative user perception, understood in this paper as not only a user's ability to assimilate and understand information but also his/her subjective satisfaction with the video content. This paper introduces and discusses a perceptual comparison between two region-of-interest display (RoID) adaptation techniques. A RoID is an attentive display in which bandwidth has been preallocated around measured or highly probable areas of user gaze. In this paper, video content was manipulated using two sources of data: empirically measured data (captured using eye-tracking technology) and predictive data (calculated from the physical characteristics of the video). Results show that display adaptation causes significant variation in users' understanding of specific multimedia content. Interestingly, both RoID adaptation and the type of video being presented affect user perception of video quality. Moreover, the use of frame rates below 15 frames per second, for any video adaptation technique, caused a significant reduction in user-perceived quality, suggesting that users are not only aware of the video quality reduction but that it also impacts their level of information assimilation and understanding. Results also highlight that a user's level of enjoyment is significantly affected by the type of video, yet not as strongly by the quality or type of video adaptation, an interesting implication for the field of entertainment.
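    To make the idea of preallocating bandwidth around a point of regard concrete, here is a minimal sketch that derives a per-macroblock quantization map from a single gaze point. The block size, QP range and Gaussian fall-off are illustrative assumptions, not the adaptation schemes compared in the paper.

        import numpy as np

        def roi_quant_map(frame_h, frame_w, gaze_xy, block=16,
                          qp_best=20, qp_worst=40, sigma=120.0):
            """Per-macroblock quantization map for a region-of-interest display.

            Blocks near the gaze point (measured by an eye tracker or predicted
            from the video content) receive the lowest QP, i.e. the highest
            quality; quality falls off with distance, shifting bandwidth from
            the periphery to the likely point of regard.
            """
            rows, cols = frame_h // block, frame_w // block
            cy = (np.arange(rows) + 0.5) * block   # block-centre coordinates
            cx = (np.arange(cols) + 0.5) * block
            dy = cy[:, None] - gaze_xy[1]
            dx = cx[None, :] - gaze_xy[0]
            weight = np.exp(-(dx**2 + dy**2) / (2 * sigma**2))  # 1 at gaze, ~0 far away
            qp = qp_worst - (qp_worst - qp_best) * weight       # low QP = high quality
            return np.round(qp).astype(int)

        # Example: 640x480 frame with the gaze resting near the centre.
        qmap = roi_quant_map(480, 640, gaze_xy=(320, 240))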

    A Novel Rate Control Algorithm for Onboard Predictive Coding of Multispectral and Hyperspectral Images

    Predictive coding is attractive for onboard compression on spacecraft thanks to its low computational complexity, modest memory requirements and the ability to control quality accurately on a pixel-by-pixel basis. Traditionally, predictive compression has focused on the lossless and near-lossless modes of operation, where the maximum error can be bounded but the rate of the compressed image is variable. Rate control is considered a challenging problem for predictive encoders due to the dependencies between quantization and prediction in the feedback loop, and the lack of a signal representation that packs the signal's energy into few coefficients. In this paper, we show that it is possible to design a rate control scheme intended for onboard implementation. In particular, we propose a general framework to select quantizers in each spatial and spectral region of an image so as to achieve the desired target rate while minimizing distortion. The rate control algorithm achieves lossy and near-lossless compression, as well as any in-between type of compression, e.g., lossy compression with a near-lossless constraint. While this framework is independent of the specific predictor used, to show its performance we tailor it here to the predictor adopted by the CCSDS-123 lossless compression standard, obtaining an extension that performs lossless, near-lossless and lossy compression in a single package. We show that the rate controller has excellent accuracy in the output rate and excellent rate-distortion characteristics, and is extremely competitive with state-of-the-art transform coding.
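    A common way to select per-region quantizers against a rate budget is a Lagrangian relaxation with bisection on the multiplier. The sketch below illustrates that generic textbook scheme under assumed per-region rate/distortion estimates; the paper's actual algorithm must additionally cope with the quantization-prediction feedback loop, which this toy version ignores.

        import numpy as np

        def allocate_quantizers(rates, dists, target_rate, iters=40):
            """Choose one quantizer per region to meet a rate budget with
            minimum total distortion, via Lagrangian relaxation.

            rates, dists : (n_regions, n_quantizers) estimated rate (bits) and
                           distortion for each candidate quantizer per region.
            Assumes the coarsest quantizers can reach the target rate.
            """
            def pick(lam):
                # Each region independently minimizes D + lam * R.
                choice = np.argmin(dists + lam * rates, axis=1)
                idx = np.arange(rates.shape[0])
                return choice, rates[idx, choice].sum(), dists[idx, choice].sum()

            lo, hi = 0.0, 1.0
            while pick(hi)[1] > target_rate and hi < 1e12:  # grow lam until the budget fits
                hi *= 2.0
            for _ in range(iters):                          # bisect on the multiplier
                mid = 0.5 * (lo + hi)
                if pick(mid)[1] > target_rate:
                    lo = mid
                else:
                    hi = mid
            return pick(hi)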

    Scalable software architecture for on-line multi-camera video processing

    In this paper we present a scalable software architecture for on-line multi-camera video processing that guarantees a good trade-off between computational power, scalability and flexibility. The software system is modular and its main blocks are the Processing Units (PUs) and the Central Unit. The Central Unit works as a supervisor of the running PUs, and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application is presented. In this paper, as a case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions, such as the number of cameras and image sizes. The results show that the software architecture scales well with the number of cameras and can easily work with different image formats while respecting the real-time constraints. Moreover, the parallelization approach can be used to speed up the processing tasks with a low level of overhead.
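    The PU/Central Unit split maps naturally onto one worker process per camera plus a supervising parent. The following Python sketch is a toy stand-in with stubbed acquisition and detection phases, not the authors' implementation; it only illustrates the structure.

        import multiprocessing as mp
        import time

        FRAMES_PER_CAMERA = 5  # toy run length

        def processing_unit(cam_id, results):
            """One PU: runs the acquisition phase and the processing phase
            for a single camera (both stubbed out here)."""
            for frame_no in range(FRAMES_PER_CAMERA):
                frame = f"cam{cam_id}-frame{frame_no}"  # stand-in for image acquisition
                time.sleep(0.01)                        # stand-in for 2D object detection
                results.put((cam_id, frame_no, f"detections({frame})"))

        def central_unit(num_cameras):
            """Central Unit: spawns one PU per camera, supervises them and
            gathers their outputs."""
            results = mp.Queue()
            pus = [mp.Process(target=processing_unit, args=(c, results))
                   for c in range(num_cameras)]
            for p in pus:
                p.start()
            for _ in range(num_cameras * FRAMES_PER_CAMERA):  # drain before joining
                print(results.get())
            for p in pus:
                p.join()

        if __name__ == "__main__":
            central_unit(num_cameras=3)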