1,158 research outputs found

    Evaluation of optimisation techniques for multiscopic rendering

    Get PDF
    A thesis submitted to the University of Bedfordshire in fulfilment of the requirements for the degree of Master of Science by ResearchThis project evaluates different performance optimisation techniques applied to stereoscopic and multiscopic rendering for interactive applications. The artefact features a robust plug-in package for the Unity game engine. The thesis provides background information for the performance optimisations, outlines all the findings, evaluates the optimisations and provides suggestions for future work. Scrum development methodology is used to develop the artefact and quantitative research methodology is used to evaluate the findings by measuring performance. This project concludes that the use of each performance optimisation has specific use case scenarios in which performance benefits. Foveated rendering provides greatest performance increase for both stereoscopic and multiscopic rendering but is also more computationally intensive as it requires an eye tracking solution. Dynamic resolution is very beneficial when overall frame rate smoothness is needed and frame drops are present. Depth optimisation is beneficial for vast open environments but can lead to decreased performance if used inappropriately

    Human saccadic eye movements and tracking by active foveation in log polar space

    Full text link
    One of the possible models of the human visual system (HVS) in the computer vision literature has a high resolution fovea and exponentially decreasing resolution periphery. The high resolution fovea is used to extract necessary information in order to solve a vision task and the periphery may be used to detect motion. To obtain the desired information, the fovea is guided by the contents of the scene and other knowledge to position the fovea over areas of interest. These eye movements are called saccades and corrective saccades. A two stage process has been implemented as a mechanism for changing foveation in log polar space. Initially, the open loop stage roughly foveates on the best interest feature and then the closed loop stage is invoked to accurately iteratively converge onto the foveation point. The open loop stage developed for the foveation algorithm is applied to saccadic eye movements and a tracking system. Log polar space is preferred over Cartesian space as: (1) it simultaneously provides high resolution and a wide viewing angle; and (2) feature invariance occurs in the fovea which simplifies the foveation process

    Object Detection Through Exploration With A Foveated Visual Field

    Get PDF
    We present a foveated object detector (FOD) as a biologically-inspired alternative to the sliding window (SW) approach which is the dominant method of search in computer vision object detection. Similar to the human visual system, the FOD has higher resolution at the fovea and lower resolution at the visual periphery. Consequently, more computational resources are allocated at the fovea and relatively fewer at the periphery. The FOD processes the entire scene, uses retino-specific object detection classifiers to guide eye movements, aligns its fovea with regions of interest in the input image and integrates observations across multiple fixations. Our approach combines modern object detectors from computer vision with a recent model of peripheral pooling regions found at the V1 layer of the human visual system. We assessed various eye movement strategies on the PASCAL VOC 2007 dataset and show that the FOD performs on par with the SW detector while bringing significant computational cost savings.Comment: An extended version of this manuscript was published in PLOS Computational Biology (October 2017) at https://doi.org/10.1371/journal.pcbi.100574

    Foveated Video Streaming for Cloud Gaming

    Full text link
    Good user experience with interactive cloud-based multimedia applications, such as cloud gaming and cloud-based VR, requires low end-to-end latency and large amounts of downstream network bandwidth at the same time. In this paper, we present a foveated video streaming system for cloud gaming. The system adapts video stream quality by adjusting the encoding parameters on the fly to match the player's gaze position. We conduct measurements with a prototype that we developed for a cloud gaming system in conjunction with eye tracker hardware. Evaluation results suggest that such foveated streaming can reduce bandwidth requirements by even more than 50% depending on parametrization of the foveated video coding and that it is feasible from the latency perspective.Comment: Submitted to: IEEE 19th International Workshop on Multimedia Signal Processin

    Adaptive foveated single-pixel imaging with dynamic super-sampling

    Get PDF
    As an alternative to conventional multi-pixel cameras, single-pixel cameras enable images to be recorded using a single detector that measures the correlations between the scene and a set of patterns. However, to fully sample a scene in this way requires at least the same number of correlation measurements as there are pixels in the reconstructed image. Therefore single-pixel imaging systems typically exhibit low frame-rates. To mitigate this, a range of compressive sensing techniques have been developed which rely on a priori knowledge of the scene to reconstruct images from an under-sampled set of measurements. In this work we take a different approach and adopt a strategy inspired by the foveated vision systems found in the animal kingdom - a framework that exploits the spatio-temporal redundancy present in many dynamic scenes. In our single-pixel imaging system a high-resolution foveal region follows motion within the scene, but unlike a simple zoom, every frame delivers new spatial information from across the entire field-of-view. Using this approach we demonstrate a four-fold reduction in the time taken to record the detail of rapidly evolving features, whilst simultaneously accumulating detail of more slowly evolving regions over several consecutive frames. This tiered super-sampling technique enables the reconstruction of video streams in which both the resolution and the effective exposure-time spatially vary and adapt dynamically in response to the evolution of the scene. The methods described here can complement existing compressive sensing approaches and may be applied to enhance a variety of computational imagers that rely on sequential correlation measurements.Comment: 13 pages, 5 figure

    Saccadic Predictive Vision Model with a Fovea

    Full text link
    We propose a model that emulates saccades, the rapid movements of the eye, called the Error Saccade Model, based on the prediction error of the Predictive Vision Model (PVM). The Error Saccade Model carries out movements of the model's field of view to regions with the highest prediction error. Comparisons of the Error Saccade Model on Predictive Vision Models with and without a fovea show that a fovea-like structure in the input level of the PVM improves the Error Saccade Model's ability to pursue detailed objects in its view. We hypothesize that the improvement is due to poorer resolution in the periphery causing higher prediction error when an object passes, triggering a saccade to the next location.Comment: 10 pages, 6 figure, Accepted in International Conference of Neuromorphic Computing (2018

    Human Attention in Image Captioning: Dataset and Analysis

    Get PDF
    In this work, we present a novel dataset consisting of eye movements and verbal descriptions recorded synchronously over images. Using this data, we study the differences in human attention during free-viewing and image captioning tasks. We look into the relationship between human attention and language constructs during perception and sentence articulation. We also analyse attention deployment mechanisms in the top-down soft attention approach that is argued to mimic human attention in captioning tasks, and investigate whether visual saliency can help image captioning. Our study reveals that (1) human attention behaviour differs in free-viewing and image description tasks. Humans tend to fixate on a greater variety of regions under the latter task, (2) there is a strong relationship between described objects and attended objects (97%97\% of the described objects are being attended), (3) a convolutional neural network as feature encoder accounts for human-attended regions during image captioning to a great extent (around 78%78\%), (4) soft-attention mechanism differs from human attention, both spatially and temporally, and there is low correlation between caption scores and attention consistency scores. These indicate a large gap between humans and machines in regards to top-down attention, and (5) by integrating the soft attention model with image saliency, we can significantly improve the model's performance on Flickr30k and MSCOCO benchmarks. The dataset can be found at: https://github.com/SenHe/Human-Attention-in-Image-Captioning.Comment: To appear at ICCV 201
    • …
    corecore