1,159 research outputs found

    Foveated Video Streaming for Cloud Gaming

    Good user experience with interactive cloud-based multimedia applications, such as cloud gaming and cloud-based VR, requires both low end-to-end latency and large amounts of downstream network bandwidth. In this paper, we present a foveated video streaming system for cloud gaming. The system adapts video stream quality by adjusting the encoding parameters on the fly to match the player's gaze position. We conduct measurements with a prototype that we developed for a cloud gaming system in conjunction with eye tracker hardware. Evaluation results suggest that such foveated streaming can reduce bandwidth requirements by more than 50%, depending on the parametrization of the foveated video coding, and that it is feasible from the latency perspective. Comment: Submitted to: IEEE 19th International Workshop on Multimedia Signal Processing
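
The idea of matching encoding parameters to the gaze position can be illustrated with a minimal sketch. It assumes a per-block quantization offset that grows with distance from the gaze point following a Gaussian-shaped curve; the function name `qp_offset_map` and the parameters `block`, `max_offset`, and `sigma` are hypothetical and are not taken from the paper.

```python
import math

def qp_offset_map(width, height, gaze_x, gaze_y,
                  block=16, max_offset=10.0, sigma=120.0):
    """Return a rows x cols grid of quantization offsets (hypothetical scheme).

    Blocks near the gaze point get an offset close to 0 (highest quality);
    blocks far away approach max_offset (coarser quantization).
    """
    cols = math.ceil(width / block)
    rows = math.ceil(height / block)
    offsets = []
    for r in range(rows):
        cy = (r + 0.5) * block  # vertical centre of this block row, in pixels
        row = []
        for c in range(cols):
            cx = (c + 0.5) * block  # horizontal centre of this block
            d2 = (cx - gaze_x) ** 2 + (cy - gaze_y) ** 2
            # Assumed Gaussian-shaped falloff of quality around the gaze point
            row.append(max_offset * (1.0 - math.exp(-d2 / (2.0 * sigma ** 2))))
        offsets.append(row)
    return offsets

if __name__ == "__main__":
    grid = qp_offset_map(640, 480, gaze_x=320, gaze_y=240)
    print(len(grid), "rows x", len(grid[0]), "cols")
```

In such a scheme, a smaller `sigma` would shrink the high-quality region and save more bandwidth at the cost of visible peripheral degradation, which is the kind of parametrization trade-off the abstract refers to.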

    Perceptually optimised sign language video coding

    A video coding system for sign language communication at low bit rates

    Adaptive foveated single-pixel imaging with dynamic super-sampling

    As an alternative to conventional multi-pixel cameras, single-pixel cameras enable images to be recorded using a single detector that measures the correlations between the scene and a set of patterns. However, fully sampling a scene in this way requires at least as many correlation measurements as there are pixels in the reconstructed image. Therefore, single-pixel imaging systems typically exhibit low frame-rates. To mitigate this, a range of compressive sensing techniques have been developed that rely on a priori knowledge of the scene to reconstruct images from an under-sampled set of measurements. In this work we take a different approach and adopt a strategy inspired by the foveated vision systems found in the animal kingdom - a framework that exploits the spatio-temporal redundancy present in many dynamic scenes. In our single-pixel imaging system a high-resolution foveal region follows motion within the scene, but unlike a simple zoom, every frame delivers new spatial information from across the entire field-of-view. Using this approach we demonstrate a four-fold reduction in the time taken to record the detail of rapidly evolving features, whilst simultaneously accumulating detail of more slowly evolving regions over several consecutive frames. This tiered super-sampling technique enables the reconstruction of video streams in which both the resolution and the effective exposure-time spatially vary and adapt dynamically in response to the evolution of the scene. The methods described here can complement existing compressive sensing approaches and may be applied to enhance a variety of computational imagers that rely on sequential correlation measurements. Comment: 13 pages, 5 figures
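
The claim that full sampling needs as many correlation measurements as pixels can be made concrete with a small sketch. It assumes Hadamard patterns, a common choice for single-pixel imaging but not one stated in the abstract; the helper names `hadamard`, `measure`, and `reconstruct` are illustrative only.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def measure(scene, patterns):
    """One detector reading per pattern: the correlation of pattern and scene."""
    return [sum(p * s for p, s in zip(pattern, scene)) for pattern in patterns]

def reconstruct(measurements, patterns):
    """Invert the measurements using the orthogonality of Hadamard rows."""
    n = len(patterns[0])
    return [
        sum(m * pattern[i] for m, pattern in zip(measurements, patterns)) / n
        for i in range(n)
    ]

if __name__ == "__main__":
    scene = [0.0, 1.0, 0.5, 0.25]   # a toy 4-pixel scene
    patterns = hadamard(4)          # 4 pixels, so 4 patterns for full sampling
    readings = measure(scene, patterns)
    print(reconstruct(readings, patterns))
```

Dropping any of the four patterns leaves the toy scene under-determined, which is the situation that compressive sensing, or the foveated sampling strategy described above, is meant to address.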

    A Biologically Motivated Software Retina for Robotic Sensors for ARM-Based Mobile Platform Technology

    A key issue in designing robotics systems is the cost of an integrated camera sensor that meets the bandwidth/processing requirement for many advanced robotics applications, especially lightweight robotics applications, such as visual surveillance or SLAM in autonomous aerial vehicles. There is currently much work going on to adapt smartphones to provide complete robot vision systems, as the smartphone is so exquisitely integrated, having camera(s), inertial sensing, sound I/O and excellent wireless connectivity. Mass market production makes this a very low-cost platform, and manufacturers, from quadrotor drone suppliers to makers of children’s toys such as the Meccanoid robot [5], employ a smartphone to provide a vision system/control system [7,8]. Accordingly, many research groups are attempting to optimise image analysis, computer vision and machine learning libraries for the smartphone platform. However, current approaches to robot vision remain highly demanding for mobile processors such as the ARM, and while a number of algorithms have been developed, these are very stripped down, i.e. highly compromised in function or performance. For example, the semi-dense visual odometry implementation of [1] operates on images of only 320x240 pixels. In our research we have been developing biologically motivated foveated vision algorithms based on a model of the mammalian retina [2], potentially 100 times more efficient than their conventional counterparts. Accordingly, vision systems based on the foveated architectures found in mammals also have the potential to reduce bandwidth and processing requirements by a factor of about 100 - it has been estimated that our brains would weigh ~60 kg if we were to process all our visual input at uniform high resolution.
We have reported a foveated visual architecture [2,3,4] that implements a functional model of the retina-visual cortex to produce feature vectors that can be matched/classified using conventional methods, or indeed could be adapted to employ Deep Convolutional Neural Nets for the classification/interpretation stage. Given the above processing/bandwidth limitations, a viable way forward would be to perform off-line learning and implement the forward recognition path on the mobile platform, returning simple object labels, or sparse hierarchical feature symbols, and gaze control commands to the host robot vision system and controller. We are now at the early stages of investigating how best to port our foveated architecture onto an ARM-based smartphone platform. To achieve the required levels of performance we propose to port and optimise our retina model to the mobile ARM processor architecture in conjunction with their integrated GPUs. We will then be in a position to provide a foveated smart vision system on a smartphone with the advantage of processing speed gains and bandwidth optimisations. Our approach will be to develop efficient parallelising compilers and perhaps propose new processor architectural features to support this approach to computer vision, e.g. efficient processing of hexagonally sampled foveated images. Our current goal is to have a foveated system running in real-time on at least a 1080p input video stream to serve as a front-end robot sensor for tasks such as general purpose object recognition and reliable dense SLAM using a commercial off-the-shelf smartphone. Initially this system would communicate a symbol stream to conventional hardware performing back-end visual classification/interpretation, although simple object detection and recognition tasks should be possible on-board the device.
We propose that, as in Nature, foveated vision is the key to achieving the necessary data reduction to be able to implement complete visual recognition and learning processes on the smartphone itself.

    A Software Retina for Egocentric & Robotic Vision Applications on Mobile Platforms

    We present work in progress to develop a low-cost highly integrated camera sensor for egocentric and robotic vision. Our underlying approach is to address current limitations to image analysis by Deep Convolutional Neural Networks, such as the requirement to learn simple scale and rotation transformations, which contribute to the large computational demands for training and opaqueness of the learned structure, by applying structural constraints based on known properties of the human visual system. We propose to apply a version of the retino-cortical transform to reduce the dimensionality of the input image space by a factor of approximately 100, and map this spatially to transform rotations and scale changes into spatial shifts. By reducing the input image size accordingly, and therefore the learning requirements, we aim to develop a compact and lightweight egocentric and robot vision sensor using a smartphone as the target platform.
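
How a retino-cortical style mapping can turn rotations and scale changes into spatial shifts can be sketched with a plain log-polar transform. This is a simplified stand-in chosen for illustration, not the authors' retina model; the function name `log_polar` is hypothetical, and coordinates are assumed to be measured relative to the fixation point.

```python
import math

def log_polar(x, y):
    """Map a point (relative to the fixation point) to (log radius, angle).

    Undefined at the origin itself, where the radius is zero.
    """
    r = math.hypot(x, y)
    return math.log(r), math.atan2(y, x)

if __name__ == "__main__":
    u1, t1 = log_polar(3.0, 4.0)

    # Scaling the image by 2 about the fixation point shifts only the
    # log-radius coordinate, by log(2).
    u2, t2 = log_polar(6.0, 8.0)
    print("scale shift:", u2 - u1, "angle change:", t2 - t1)

    # Rotating by 90 degrees shifts only the angle coordinate, by pi/2.
    u3, t3 = log_polar(-4.0, 3.0)
    print("radius change:", u3 - u1, "rotation shift:", t3 - t1)
```

Because a convolutional network is already tolerant of spatial shifts, presenting the input in such coordinates is one way scale and rotation changes need not be learned separately.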

    Integrating a Non-Uniformly Sampled Software Retina with a Deep CNN Model

    We present a biologically inspired method for pre-processing images applied to CNNs that reduces their memory requirements while increasing their invariance to scale and rotation changes. Our method is based on the mammalian retino-cortical transform: a mapping between a pseudo-randomly tessellated retina model (used to sample an input image) and a CNN. The aim of this first pilot study is to demonstrate a functional retina-integrated CNN implementation, which produced the following results: a network using the full retino-cortical transform yielded an F1 score of 0.80 on a test set during a 4-way classification task, while an identical network not using the proposed method yielded an F1 score of 0.86 on the same task. The method reduced the visual data by a factor of approximately 7, the input data to the CNN by 40% and the number of CNN training epochs by 64%. These results demonstrate the viability of our method and hint at the potential of exploiting functional traits of natural vision systems in CNNs.