1,158 research outputs found

Evaluation of optimisation techniques for multiscopic rendering

Author: Todorov Grigor
Publication venue: University of Bedfordshire
Publication date: 01/10/2015

A thesis submitted to the University of Bedfordshire in fulfilment of the requirements for the degree of Master of Science by Research. This project evaluates different performance optimisation techniques applied to stereoscopic and multiscopic rendering for interactive applications. The artefact features a robust plug-in package for the Unity game engine. The thesis provides background information for the performance optimisations, outlines all the findings, evaluates the optimisations and provides suggestions for future work. Scrum development methodology is used to develop the artefact and quantitative research methodology is used to evaluate the findings by measuring performance. This project concludes that each performance optimisation has specific use-case scenarios in which it benefits performance. Foveated rendering provides the greatest performance increase for both stereoscopic and multiscopic rendering but is also more computationally intensive as it requires an eye tracking solution. Dynamic resolution is very beneficial when overall frame rate smoothness is needed and frame drops are present. Depth optimisation is beneficial for vast open environments but can lead to decreased performance if used inappropriately.
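The dynamic resolution idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the thesis's Unity plug-in: the function name, the 16.7 ms target, the step size and the clamping range are all assumptions chosen for illustration.

```python
def next_resolution_scale(scale, frame_ms, target_ms=16.7, step=0.05,
                          min_scale=0.5, max_scale=1.0):
    """Return the render-resolution scale to use for the next frame.

    All parameter names and default values are hypothetical.
    """
    if frame_ms > target_ms:
        # The last frame missed its budget: render fewer pixels next time.
        scale -= step
    elif frame_ms < 0.9 * target_ms:
        # Comfortable headroom: move back towards full resolution.
        scale += step
    # Keep the scale inside the allowed range.
    return max(min_scale, min(max_scale, scale))
```

Calling this once per frame with the measured frame time lowers resolution during frame drops and restores it when there is headroom, which is the smoothing behaviour the abstract attributes to dynamic resolution.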

University of Bedfordshire Repository

Human saccadic eye movements and tracking by active foveation in log polar space

Author: Lim Fee-Lee
Venkatesh Svetha
West Geoffrey A.
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 01/01/1996

One of the possible models of the human visual system (HVS) in the computer vision literature has a high resolution fovea and exponentially decreasing resolution periphery. The high resolution fovea is used to extract necessary information in order to solve a vision task and the periphery may be used to detect motion. To obtain the desired information, the fovea is guided by the contents of the scene and other knowledge so that it is positioned over areas of interest. These eye movements are called saccades and corrective saccades. A two stage process has been implemented as a mechanism for changing foveation in log polar space. Initially, the open loop stage roughly foveates on the best interest feature and then the closed loop stage is invoked to converge iteratively and accurately onto the foveation point. The open loop stage developed for the foveation algorithm is applied to saccadic eye movements and a tracking system. Log polar space is preferred over Cartesian space as: (1) it simultaneously provides high resolution and a wide viewing angle; and (2) feature invariance occurs in the fovea which simplifies the foveation process.
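The log polar representation described in this abstract can be sketched as follows. This is a generic log polar mapping and not the authors' implementation: the function names, the `r_min` clamp and the ring growth factor are assumptions.

```python
import math

def to_log_polar(x, y, cx=0.0, cy=0.0, r_min=1.0):
    """Map a Cartesian point to (rho, theta) around the fixation point (cx, cy).

    r_min is a hypothetical inner radius that avoids log(0) at the fovea centre.
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    rho = math.log(max(r, r_min) / r_min)   # log-radial coordinate
    theta = math.atan2(dy, dx)              # angular coordinate
    return rho, theta

def ring_radii(n_rings, r_min=1.0, growth=1.5):
    """Radii of sampling rings that grow exponentially away from the fovea."""
    return [r_min * growth ** i for i in range(n_rings)]
```

Because ring radii grow geometrically, sampling is dense near the fixation point and sparse in the periphery, so a fixed number of rings covers a wide viewing angle.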

Deakin Research Online

Crossref

Object Detection Through Exploration With A Foveated Visual Field

Author: A Borji
A Lewis
A Torralba
B Alexe
BR Beutter
BW Tatler
C Bradley
C Morvan
CA Curcio
CH Lampert
CJ Ludwig
DG Lowe
DM Dacey
DM Levi
Emre Akbas
GJ Zelinsky
GL Malcolm
H Larochelle
H Strasburger
H Yamamoto
I Kokkinos
J Elder
J Freeman
J Hosang
J Najemnik
J Rovamo
JH Elder
JM Findlay
K Koehler
L Itti
L Zhaoping
LW Renninger
MB Neider
MF Land
Miguel P. Eckstein
MJ Choi
MP Eckstein
ND Bruce
NJ Butko
NJ Marshall
P Azzopardi
P Kontschieder
P Verghese
P Viola
PF Felzenszwalb
R Rosenholtz
S Ren
S Zhang
SC Mack
T Malisiewicz
T Wertheim
TJ Preston
W Zhang
Wolfgang Einhäuser
X Chen
Z Li
ZP Li
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/10/2017

We present a foveated object detector (FOD) as a biologically-inspired alternative to the sliding window (SW) approach which is the dominant method of search in computer vision object detection. Similar to the human visual system, the FOD has higher resolution at the fovea and lower resolution at the visual periphery. Consequently, more computational resources are allocated at the fovea and relatively fewer at the periphery. The FOD processes the entire scene, uses retino-specific object detection classifiers to guide eye movements, aligns its fovea with regions of interest in the input image and integrates observations across multiple fixations. Our approach combines modern object detectors from computer vision with a recent model of peripheral pooling regions found at the V1 layer of the human visual system. We assessed various eye movement strategies on the PASCAL VOC 2007 dataset and show that the FOD performs on par with the SW detector while bringing significant computational cost savings. Comment: An extended version of this manuscript was published in PLOS Computational Biology (October 2017) at https://doi.org/10.1371/journal.pcbi.100574

arXiv.org e-Print Archive

CiteSeerX

Crossref

Directory of Open Access Journals

OpenMETU (Middle East Technical University)

Foveated Video Streaming for Cloud Gaming

Author: Illahi Gazi
Masala Enrico
Siekkinen Matti
Publication date: 15/06/2017

Good user experience with interactive cloud-based multimedia applications, such as cloud gaming and cloud-based VR, requires low end-to-end latency and large amounts of downstream network bandwidth at the same time. In this paper, we present a foveated video streaming system for cloud gaming. The system adapts video stream quality by adjusting the encoding parameters on the fly to match the player's gaze position. We conduct measurements with a prototype that we developed for a cloud gaming system in conjunction with eye tracker hardware. Evaluation results suggest that such foveated streaming can reduce bandwidth requirements by more than 50%, depending on the parametrization of the foveated video coding, and that it is feasible from the latency perspective. Comment: Submitted to: IEEE 19th International Workshop on Multimedia Signal Processin
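One way to picture "adjusting the encoding parameters to match the player's gaze position" is a per-macroblock quantiser offset that grows with distance from the gaze point. The sketch below assumes a Gaussian falloff. The function name, `max_offset` and `sigma` are hypothetical and are not taken from the paper or from any encoder API.

```python
import math

def qp_offset_map(width_mb, height_mb, gaze_x, gaze_y,
                  max_offset=10.0, sigma=8.0):
    """Build a grid of quantiser offsets, one per macroblock.

    The offset is 0 at the gaze position (best quality) and approaches
    max_offset far from it (coarser quantisation, fewer bits).
    gaze_x and gaze_y are given in macroblock units.
    """
    offsets = []
    for row in range(height_mb):
        line = []
        for col in range(width_mb):
            d2 = (col - gaze_x) ** 2 + (row - gaze_y) ** 2
            weight = math.exp(-d2 / (2.0 * sigma ** 2))  # 1 at gaze, decays outwards
            line.append(max_offset * (1.0 - weight))
        offsets.append(line)
    return offsets
```

In a real system such a map would be recomputed whenever the eye tracker reports a new gaze position and handed to the encoder. How that hand-off works depends on the encoder and is not shown here.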

arXiv.org e-Print Archive

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Adaptive foveated single-pixel imaging with dynamic super-sampling

Author: Barnett Stephen M.
Edgar Matthew P.
Gibson Graham G.
Padgett Miles J.
Phillips David B.
Sun Ming-Jie
Taylor Jonathan M.
Publication date: 27/07/2016

As an alternative to conventional multi-pixel cameras, single-pixel cameras enable images to be recorded using a single detector that measures the correlations between the scene and a set of patterns. However, to fully sample a scene in this way requires at least the same number of correlation measurements as there are pixels in the reconstructed image. Therefore single-pixel imaging systems typically exhibit low frame-rates. To mitigate this, a range of compressive sensing techniques have been developed which rely on a priori knowledge of the scene to reconstruct images from an under-sampled set of measurements. In this work we take a different approach and adopt a strategy inspired by the foveated vision systems found in the animal kingdom - a framework that exploits the spatio-temporal redundancy present in many dynamic scenes. In our single-pixel imaging system a high-resolution foveal region follows motion within the scene, but unlike a simple zoom, every frame delivers new spatial information from across the entire field-of-view. Using this approach we demonstrate a four-fold reduction in the time taken to record the detail of rapidly evolving features, whilst simultaneously accumulating detail of more slowly evolving regions over several consecutive frames. This tiered super-sampling technique enables the reconstruction of video streams in which both the resolution and the effective exposure-time spatially vary and adapt dynamically in response to the evolution of the scene. The methods described here can complement existing compressive sensing approaches and may be applied to enhance a variety of computational imagers that rely on sequential correlation measurements. Comment: 13 pages, 5 figure
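The statement that full sampling needs as many correlation measurements as pixels can be illustrated with a toy one-dimensional example. The sketch assumes Hadamard patterns (Sylvester construction) as the pattern set. That choice and all function names are illustrative assumptions, and the foveated cell layout from the paper is not modelled.

```python
def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def measure(scene, patterns):
    """One detector reading per pattern: the correlation of pattern and scene."""
    return [sum(p * s for p, s in zip(pattern, scene)) for pattern in patterns]

def reconstruct(measurements, patterns):
    """Invert the measurements using the orthogonality of Hadamard rows."""
    n = len(patterns[0])
    return [sum(m * pattern[i] for m, pattern in zip(measurements, patterns)) / n
            for i in range(n)]

scene = [3, 0, 1, 2]                      # a 4-pixel "image"
patterns = hadamard(len(scene))           # 4 patterns for 4 pixels
readings = measure(scene, patterns)
full = reconstruct(readings, patterns)            # all 4 measurements
partial = reconstruct(readings[:2], patterns[:2])  # under-sampled: 2 measurements
```

With all four readings the scene is recovered exactly. With only two, pixel detail is lost, which is the gap that compressive sensing or the foveated sampling strategy described above aims to narrow.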

arXiv.org e-Print Archive

Enlighten: Research Data (University of Glasgow)

Enlighten

Saccadic Predictive Vision Model with a Fovea

Author: Hazoglou Michael
Hylton Todd
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/08/2018

We propose a model that emulates saccades, the rapid movements of the eye, called the Error Saccade Model, based on the prediction error of the Predictive Vision Model (PVM). The Error Saccade Model carries out movements of the model's field of view to regions with the highest prediction error. Comparisons of the Error Saccade Model on Predictive Vision Models with and without a fovea show that a fovea-like structure in the input level of the PVM improves the Error Saccade Model's ability to pursue detailed objects in its view. We hypothesize that the improvement is due to poorer resolution in the periphery causing higher prediction error when an object passes, triggering a saccade to the next location. Comment: 10 pages, 6 figure, Accepted in International Conference of Neuromorphic Computing (2018
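The core rule stated in this abstract, moving the field of view to the region with the highest prediction error, can be sketched as a simple argmax over a grid of per-region errors. The grid representation and the function name are assumptions, and the Predictive Vision Model that would produce the errors is not modelled.

```python
def next_fixation(error_map):
    """Return (row, col) of the region with the highest prediction error.

    error_map is a hypothetical 2D list of non-negative per-region errors.
    Ties are resolved in favour of the first region encountered.
    """
    best_error, best_pos = None, None
    for r, row in enumerate(error_map):
        for c, err in enumerate(row):
            if best_error is None or err > best_error:
                best_error, best_pos = err, (r, c)
    return best_pos
```

For example, `next_fixation([[0.1, 0.2], [0.9, 0.3]])` selects the lower-left region, where the error is largest.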

arXiv.org e-Print Archive

Crossref

Human Attention in Image Captioning: Dataset and Analysis

Author: Borji Ali
He Sen
Pugeault Nicolas
Tavakoli Hamed R.
Publication date: 07/08/2019

In this work, we present a novel dataset consisting of eye movements and verbal descriptions recorded synchronously over images. Using this data, we study the differences in human attention during free-viewing and image captioning tasks. We look into the relationship between human attention and language constructs during perception and sentence articulation. We also analyse attention deployment mechanisms in the top-down soft attention approach that is argued to mimic human attention in captioning tasks, and investigate whether visual saliency can help image captioning. Our study reveals that (1) human attention behaviour differs in free-viewing and image description tasks. Humans tend to fixate on a greater variety of regions under the latter task, (2) there is a strong relationship between described objects and attended objects (97% of the described objects are being attended), (3) a convolutional neural network as feature encoder accounts for human-attended regions during image captioning to a great extent (around 78%), (4) soft-attention mechanism differs from human attention, both spatially and temporally, and there is low correlation between caption scores and attention consistency scores. These indicate a large gap between humans and machines in regards to top-down attention, and (5) by integrating the soft attention model with image saliency, we can significantly improve the model's performance on Flickr30k and MSCOCO benchmarks. The dataset can be found at: https://github.com/SenHe/Human-Attention-in-Image-Captioning. Comment: To appear at ICCV 201

arXiv.org e-Print Archive

Crossref

Open Research Exeter

Enlighten