79 research outputs found

    Space-variant picture coding

    PhD
    Space-variant picture coding techniques exploit the strong spatial non-uniformity of the human visual system in order to increase coding efficiency in terms of perceived quality per bit. This thesis extends space-variant coding research in two directions. The first of these directions is in foveated coding. Past foveated coding research has been dominated by the single-viewer, gaze-contingent scenario. However, for research into the multi-viewer and probability-based scenarios, this thesis presents a missing piece: an algorithm for computing an additive multi-viewer sensitivity function based on an established eye resolution model, and, from this, a blur map that is optimal in the sense of discarding frequencies in least-noticeable-first order. Furthermore, for the application of a blur map, a novel algorithm is presented for the efficient computation of high-accuracy smoothly space-variant Gaussian blurring, using a specialised filter bank which approximates perfect space-variant Gaussian blurring to arbitrarily high accuracy and at greatly reduced cost compared to the brute force approach of employing a separate low-pass filter at each image location. The second direction is that of artificially increasing the depth-of-field of an image, an idea borrowed from photography with the advantage of allowing an image to be reduced in bitrate while retaining or increasing overall aesthetic quality. Two synthetic depth-of-field algorithms are presented herein, with the desirable properties of aiming to mimic occlusion effects as occur in natural blurring, and of handling any number of blurring and occlusion levels with the same level of computational complexity. The merits of this coding approach have been investigated by subjective experiments to compare it with single-viewer foveated image coding. The results found the depth-based preblurring to generally be significantly preferable to the same level of foveation blurring.
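
The idea of applying a blur map through a bank of filters, rather than a separate low-pass filter at each pixel, can be illustrated with a rough sketch. This is not the thesis's specialised filter bank: it assumes a small set of uniformly blurred copies of the image and linearly interpolates between the two nearest blur levels at each pixel, which only approximates true space-variant Gaussian blurring. The function names, the `levels` parameter and the `sigma_map` input are all hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D normalised Gaussian kernel; sigma <= 0 gives the identity."""
    if sigma <= 0:
        return np.array([1.0])
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur_uniform(image, sigma):
    """Separable Gaussian blur of a 2-D array with edge replication."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def blur_space_variant(image, sigma_map, levels):
    """Approximate per-pixel blurring by blending a bank of uniform blurs.

    Assumption: linear interpolation between the two nearest blur levels,
    chosen here for simplicity and not taken from the thesis.
    `levels` must be sorted ascending and contain at least two values.
    """
    levels = np.asarray(levels, dtype=float)
    bank = np.stack([blur_uniform(image, s) for s in levels])
    s = np.clip(sigma_map, levels[0], levels[-1])
    hi = np.clip(np.searchsorted(levels, s, side="right"), 1, len(levels) - 1)
    lo = hi - 1
    w = (s - levels[lo]) / (levels[hi] - levels[lo])
    rows, cols = np.indices(image.shape)
    return (1 - w) * bank[lo, rows, cols] + w * bank[hi, rows, cols]
```

The cost is one uniform blur per level plus a per-pixel blend, so it grows with the number of levels rather than with the number of distinct blur radii in the map. Accuracy between levels depends on how densely the levels are spaced.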

    A new type of eye movement model based on recurrent neural networks for simulating the gaze behavior of human reading.

    Traditional eye movement models are based on psychological assumptions and empirical data, and are not able to simulate eye movement on previously unseen text data. To address this problem, a new type of eye movement model is presented and tested in this paper. In contrast to conventional psychology-based eye movement models, ours is based on a recurrent neural network (RNN) that generates a gaze point prediction sequence by combining convolutional neural networks (CNN), bidirectional long short-term memory networks (LSTM), and conditional random fields (CRF). The model uses the eye movement data of a reader reading some texts as training data to predict the eye movements of the same reader reading a previously unseen text. A theoretical analysis of the model is presented to show its excellent convergence performance. Experimental results are then presented to demonstrate that the proposed model can achieve similar prediction accuracy while requiring fewer features than current machine learning models.

    The Contribution of the Magnocellular Visual Pathway to the Process of Visual Word Recognition

    Previous research on visual word recognition has uncovered a variety of factors which influence how easily this process is achieved. Some factors are intrinsic to the word itself (e.g., length, frequency, regularity) and some are environmental factors (e.g., stimuli contrast or visual field position). Any proposed account of visual word recognition must consider not only the properties of the word itself, but also the properties of the visual system that processes the words. This thesis tested the hypothesis that the magnocellular visual pathway contributes to the processing of words and that this contribution is most evident when words are presented in parafoveal vision. Experiments 1 and 2 investigated the effect on the recognition of isolated words of limiting input to the visual system by occluding one eye. We looked at the effect of visual field presentation position and word length. Previous research using binocular viewing had shown a large length effect in the left visual field. We found that occluding the right eye reduced the left visual field length effect. Experiments 3, 4 and 5 looked at the impact of varying presentation position on competent readers and dyslexics. Numerous studies in sentence processing have shown that phonological information can be extracted during parafoveal preview. We asked whether dyslexics’ well attested phonological impairment will hinder their ability to extract phonological information in parafoveal vision. Experiments 3 and 4 demonstrated that only the dyslexic group showed an effect of word regularity. Experiment 5 used a rhyme-matching task to show that only dyslexic readers have a problem in extracting phonological information from word pairs presented to the right visual field. We relate this to magnocellular functioning. 
    Experiments 6, 7 and 8 used isoluminant stimuli to directly test the consequences of inhibiting the magnocellular visual pathway on the recognition of words presented both foveally and parafoveally. The results of these experiments show that blocking the magnocellular pathway affects parafoveal areas of the visual field more than the foveal area and that words are affected by this whereas non-words are not. In conclusion, we demonstrated that the magnocellular pathway does contribute significantly to the recognition of words and that the parafoveal area of the retina is more heavily dependent on the magnocellular pathway compared to the foveal area of the retina. We go on to propose plans for future research looking at the role of the magnocellular pathway in parafoveal preview in sentence reading.

    Biologically inspired feature extraction for rotation and scale tolerant pattern analysis

    Biologically motivated information processing has been an important area of scientific research for decades. The central topic addressed in this dissertation is the utilization of lateral inhibition and, more generally, linear networks with recurrent connectivity, along with complex-log conformal mapping, in machine-based implementations of information encoding, feature extraction and pattern recognition. The reasoning behind, and a method for, spatially uniform implementation of an inhibitory/excitatory network model in the framework of the non-uniform log-polar transform are presented. For the space-invariant connectivity model characterized by a Toeplitz-Block-Toeplitz matrix, the overall network response is obtained without matrix inverse operations, provided the connection matrix generating function is bounded by unity. It is shown that for a network whose inter-neuron connection function is expandable in a Fourier series in polar angle, the overall network response is steerable. The decorrelating/whitening characteristics of networks with lateral inhibition are used to develop space-invariant pre-whitening kernels specialized for specific categories of input signals. These filters have an extremely small memory footprint and are successfully utilized to improve the performance of adaptive neural whitening algorithms. Finally, a method for feature extraction based on a localized Independent Component Analysis (ICA) transform in the log-polar domain, aided by the previously developed pre-whitening filters, is implemented. Since the output codes produced by ICA are very sparse, a small number of non-zero coefficients was sufficient to encode input data and obtain reliable pattern recognition performance.
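
The claim that a space-invariant recurrent network can be solved without a matrix inverse can be made concrete in a simplified setting. The sketch below is an illustration under stated assumptions, not the dissertation's derivation: it assumes a 1-D network with periodic boundaries (so the connection matrix is circulant rather than Toeplitz-Block-Toeplitz) and a steady state of the form y = x + k * y, where * is circular convolution and negative kernel weights act as inhibition. Under those assumptions the response follows from a division in the Fourier domain, and the same result is the limit of the series x + Kx + K²x + ..., which converges when the kernel's Fourier transform has magnitude below one. Function names are hypothetical.

```python
import numpy as np

def steady_state_response(x, kernel):
    """Solve y = x + kernel (*) y (circular convolution) via the FFT."""
    K = np.fft.fft(kernel, n=len(x))
    if np.max(np.abs(K)) >= 1.0:
        raise ValueError("kernel transform must be bounded by unity")
    return np.real(np.fft.ifft(np.fft.fft(x) / (1.0 - K)))

def neumann_response(x, kernel, terms=200):
    """Same response from the truncated series x + Kx + K^2 x + ..."""
    K = np.fft.fft(kernel, n=len(x))
    y = np.zeros(len(x), dtype=float)
    term = np.asarray(x, dtype=float)
    for _ in range(terms):
        y += term
        # Apply the connection operator once more to get the next term.
        term = np.real(np.fft.ifft(K * np.fft.fft(term)))
    return y

# Usage: each unit inhibits its two nearest neighbours with weight -0.2.
n = 16
x = np.zeros(n)
x[5] = 1.0
kernel = np.zeros(n)
kernel[1] = kernel[-1] = -0.2
y = steady_state_response(x, kernel)
```

Both routes avoid forming or inverting the n-by-n connection matrix; the FFT route costs O(n log n) per solve.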