
    Information recovery from rank-order encoded images

    The time for the primate eye to detect a visual stimulus has been recorded at 100–150 ms. This near-instantaneous recognition occurs in spite of the considerable processing required by the several stages of the visual pathway to recognise and react to a visual scene. How this is achieved is still a matter of speculation. Rank-order codes have been proposed as a means of encoding by the primate eye in the rapid transmission of the initial burst of information from the sensory neurons to the brain. We study the efficiency of rank-order codes in encoding perceptually important information in an image. VanRullen and Thorpe built a model of the ganglion cell layers of the retina to simulate and study the viability of rank-order as a means of encoding by retinal neurons. We validate their model and quantify the information retrieved from rank-order encoded images in terms of the visually important information recovered. Towards this goal, we apply the ‘perceptual information preservation algorithm’ proposed by Petrovic and Xydeas, after slight modification. We observe a low information recovery due to losses suffered during the rank-order encoding and decoding processes. We propose to minimise these losses to recover maximum information in minimum time from rank-order encoded images. We first maximise information recovery by using the pseudo-inverse of the filter-bank matrix to minimise losses during rank-order decoding. We then apply the biological principle of lateral inhibition to minimise losses during rank-order encoding. In doing so, we propose the Filter-overlap Correction algorithm. To test the performance of rank-order codes in a biologically realistic model, we design and simulate a model of the foveal-pit ganglion cells of the retina, keeping close to biological parameters. We use this as a rank-order encoder and analyse its performance relative to VanRullen and Thorpe’s retinal model.
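
    The basic idea of rank-order coding described in the abstract above can be sketched in a few lines. This is a toy illustration only, not VanRullen and Thorpe's retinal model and not the pseudo-inverse decoder the authors propose: the function names, the toy coefficients, and the geometric `decay` look-up table are all assumptions made for the example. It keeps only the order (and polarity) in which units would fire, then rebuilds magnitudes from rank alone, which shows where information is lost.

```python
def rank_order_encode(coeffs):
    """Return unit indices sorted by decreasing response magnitude, plus signs.

    Only the firing order and polarity are kept; magnitudes are discarded.
    """
    order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))
    signs = [1 if coeffs[i] >= 0 else -1 for i in order]
    return order, signs


def rank_order_decode(order, signs, n, decay=0.8):
    """Rebuild approximate coefficients from rank alone.

    The k-th unit to fire is assigned magnitude decay**k. This geometric
    table is a hypothetical stand-in for an empirically measured one.
    """
    recon = [0.0] * n
    for rank, (idx, sign) in enumerate(zip(order, signs)):
        recon[idx] = sign * decay ** rank
    return recon


coeffs = [0.1, -0.9, 0.4, 0.0, 0.7]  # toy filter responses (assumed values)
order, signs = rank_order_encode(coeffs)
recon = rank_order_decode(order, signs, len(coeffs))
```

    The reconstruction preserves the ordering of the responses but not their values; for instance, the unit whose true response was zero still receives a non-zero magnitude. Losses of this kind are what the decoding and encoding corrections in the abstract aim to reduce.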

    Bio-inspired log-polar based color image pattern analysis in multiple frequency channels

    The main topic addressed in this thesis is the implementation of color image pattern recognition based on the lateral inhibition subtraction phenomenon combined with a complex log-polar mapping in multiple spatial frequency channels. It is shown that the individual red, green and blue channels have different recognition performances when put in the context of former work done by Dragan Vidacic. It is observed that the green channel performs better than the other two channels, with the blue channel having the poorest performance. Following the application of a contrast stretching function, the object recognition performance is improved in all channels. Multiple spatial frequency filters were designed to simulate the filtering channels that occur in the human visual system. Following these preprocessing steps, Dragan Vidacic's methodology is followed in order to determine the benefits that are obtained from the preprocessing steps being investigated. It is shown that performance gains are realized by using such preprocessing steps.
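
    The reason a complex log-polar mapping is attractive for pattern recognition can be shown with a short sketch. It is an illustration of the mapping w = ln(z) only, not the thesis's pipeline; the helper names and the sample point are assumptions. Scaling an image point about the origin becomes a shift along the log-radius axis, and rotating it becomes a shift along the angle axis.

```python
import cmath
import math


def log_polar(x, y):
    """Map a Cartesian point (x, y) != (0, 0) to (log radius, angle)."""
    w = cmath.log(complex(x, y))  # w = ln|z| + i*arg(z)
    return w.real, w.imag


def scale_and_rotate(x, y, scale, angle):
    """Scale and rotate a point about the origin."""
    z = complex(x, y) * scale * cmath.exp(1j * angle)
    return z.real, z.imag


u0, v0 = log_polar(1.0, 1.0)
x1, y1 = scale_and_rotate(1.0, 1.0, scale=2.0, angle=math.pi / 6)
u1, v1 = log_polar(x1, y1)

shift_u = u1 - u0  # shift along the log-radius axis, approximately ln(2)
shift_v = v1 - v0  # shift along the angle axis, approximately pi/6
```

    The angle returned by `cmath.log` lies in (-pi, pi], so a rotation that crosses the negative real axis appears as a wrap-around rather than a plain shift; the sample point here avoids that case.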

    Visionary Ophthalmics: Confluence of Computer Vision and Deep Learning for Ophthalmology

    Ophthalmology is a medical field ripe with opportunities for meaningful application of computer vision algorithms. The field utilizes data from multiple disparate imaging techniques, ranging from conventional cameras to tomography, comprising a diverse set of computer vision challenges. Computer vision has a rich history of techniques that can adequately meet many of these challenges. However, the field has undergone something of a revolution in recent times as deep learning techniques have sprung into the forefront following advances in GPU hardware. This development raises important questions regarding how to best leverage insights from both modern deep learning approaches and more classical computer vision approaches for a given problem. In this dissertation, we tackle challenging computer vision problems in ophthalmology using methods all across this spectrum. Perhaps our most significant work is a highly successful iris registration algorithm for use in laser eye surgery. This algorithm relies on matching features extracted from the structure tensor and a Gabor wavelet – a classically driven approach that does not utilize modern machine learning. However, drawing on insight from the deep learning revolution, we demonstrate successful application of backpropagation to optimize the registration significantly faster than the alternative of relying on finite differences. Towards the other end of the spectrum, we also present a novel framework for improving RANSAC segmentation algorithms by utilizing a convolutional neural network (CNN) trained on a RANSAC-based loss function. Finally, we apply state-of-the-art deep learning methods to solve the problem of pathological fluid detection in optical coherence tomography images of the human retina, using a novel retina-specific data augmentation technique to greatly expand the data set. 
Altogether, our work demonstrates the benefits of applying a holistic view of computer vision, which leverages deep learning and associated insights without neglecting techniques and insights from the previous era.

    Retinal vessel segmentation using textons

    Segmenting vessels from retinal images, like segmentation in many other medical image domains, is a challenging task, as there is no unified way that can be adopted to extract the vessels accurately. However, it is the most critical stage in the automatic assessment of various diseases (e.g. glaucoma, age-related macular degeneration, diabetic retinopathy and cardiovascular diseases). Our research aims to investigate retinal image segmentation approaches based on textons, as they provide a compact description of texture that can be learnt from a training set. This thesis presents a brief review of those diseases, including their current situations, future trends and the techniques used for their automatic diagnosis in routine clinical applications. The importance of retinal vessel segmentation is particularly emphasized in such applications. An extensive review of previous work on retinal vessel segmentation and salient texture analysis methods is presented. Five automatic retinal vessel segmentation methods are proposed in this thesis. The first method addresses the problem of removing pathological anomalies (drusen, exudates) for retinal vessel segmentation, which have been identified by other researchers as a problem and a common source of error. The results show that the modified method achieves some improvement compared to a previously published method. The second, novel supervised segmentation method employs textons. We propose a new filter bank (MR11) that includes bar detectors for vascular feature extraction and other kernels to detect edges and photometric variations in the image. The k-means clustering algorithm is adopted for texton generation based on the vessel and non-vessel elements which are identified by ground truth. 
The third, improved supervised method is developed based on the second one; in it, textons are generated by k-means clustering and texton maps representing vessels are derived by back-projecting pixel clusters onto hand-labelled ground truth. A further step is implemented to ensure that the best combinations of textons are represented in the map and subsequently used to identify vessels in the test set. The experimental results on two benchmark datasets show that our proposed method performs well compared to other published work and the results of human experts. A further test of our system on an independent set of optical fundus images verified its consistent performance. The statistical analysis of the experimental results also reveals that it is possible to train unified textons for retinal vessel segmentation. In the fourth method, a novel scheme using a Gabor filter bank for vessel feature extraction is proposed. The method is inspired by the human visual system. Machine learning is used to optimize the Gabor filter parameters. The experimental results demonstrate that our method significantly enhances the true positive rate while maintaining a level of specificity that is comparable with other approaches. Finally, we propose a new unsupervised texton-based retinal vessel segmentation method using a derivative of SIFT and multi-scale Gabor filters. The lack of sufficient quantities of hand-labelled ground truth and the high level of variability in ground truth labels amongst experts provide the motivation for this approach. The evaluation results reveal that our unsupervised segmentation method is comparable with the best supervised methods and other state-of-the-art methods.

    Biologically inspired feature extraction for rotation and scale tolerant pattern analysis

    Biologically motivated information processing has been an important area of scientific research for decades. The central topic addressed in this dissertation is the utilization of lateral inhibition and, more generally, linear networks with recurrent connectivity, along with complex-log conformal mapping, in machine-based implementations of information encoding, feature extraction and pattern recognition. The reasoning behind and method for spatially uniform implementation of an inhibitory/excitatory network model in the framework of the non-uniform log-polar transform is presented. For the space-invariant connectivity model characterized by a Toeplitz-Block-Toeplitz matrix, the overall network response is obtained without matrix inverse operations, provided the connection matrix generating function is bounded by unity. It is shown that for a network with an inter-neuron connection function expandable in a Fourier series in polar angle, the overall network response is steerable. The decorrelating/whitening characteristics of networks with lateral inhibition are used to develop space-invariant pre-whitening kernels specialized for a specific category of input signals. These filters have an extremely small memory footprint and are successfully utilized to improve the performance of adaptive neural whitening algorithms. Finally, a method for feature extraction based on a localized Independent Component Analysis (ICA) transform in the log-polar domain, aided by the previously developed pre-whitening filters, is implemented. Since the output codes produced by ICA are very sparse, a small number of non-zero coefficients was sufficient to encode input data and obtain reliable pattern recognition performance.
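
    One standard way to obtain the steady-state response of a linear recurrent network without an explicit matrix inverse is the Neumann series, sketched below. Whether the dissertation uses this exact route is an assumption; the abstract only states that no inverse is needed when the connection function is bounded by unity. The network model r = x + W r, the four-unit ring, and the inhibition weight of 0.2 are all illustrative choices, and the ring is a toy stand-in for the Toeplitz-Block-Toeplitz case. When the spectral radius of W is below one, r = (I - W)^-1 x equals the sum of W^k x over k.

```python
def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def network_response(weights, inputs, n_terms=60):
    """Approximate r = (I - W)^-1 x by summing the series x + Wx + W^2 x + ..."""
    term = list(inputs)
    response = list(inputs)
    for _ in range(n_terms):
        term = mat_vec(weights, term)  # next power of W applied to x
        response = [r + t for r, t in zip(response, term)]
    return response


# Toy ring of 4 units; each unit inhibits its two neighbours (assumed weights).
W = [
    [0.0, -0.2, 0.0, -0.2],
    [-0.2, 0.0, -0.2, 0.0],
    [0.0, -0.2, 0.0, -0.2],
    [-0.2, 0.0, -0.2, 0.0],
]
x = [1.0, 0.0, 0.0, 0.0]
r = network_response(W, x)

# Residual of the fixed-point equation r = x + W r; should be close to zero.
residual = [ri - xi - wi for ri, xi, wi in zip(r, x, mat_vec(W, r))]
```

    In this toy case the stimulated unit's neighbours end up with negative responses, which is the contrast-enhancing effect usually associated with lateral inhibition.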

    The computational magic of the ventral stream: sketch of a theory (and why some deep architectures work).

    This paper explores the theoretical consequences of a simple assumption: the computational goal of the feedforward path in the ventral stream -- from V1, V2 and V4 to IT -- is to discount image transformations, after learning them during development.

    A Spiking Neural Network Based Cortex-Like Mechanism and Application to Facial Expression Recognition

    In this paper, we present a quantitative, highly structured cortex-simulated model, which can be simply described as a feedforward, hierarchical simulation of the ventral stream of the visual cortex using a biologically plausible, computationally convenient spiking neural network system. The motivation comes directly from recent pioneering works on detailed functional decomposition analysis of the feedforward pathway of the ventral stream of the visual cortex and from developments in artificial spiking neural networks (SNNs). By combining the logical structure of the cortical hierarchy and the computing power of the spiking neuron model, a practical framework is presented. As a proof of principle, we demonstrate our system on several facial expression recognition tasks. The proposed cortex-like feedforward hierarchy framework is capable of dealing with complicated pattern recognition problems. This suggests that, by combining cognitive models with modern neurocomputational approaches, the neurosystematic approach to the study of cortex-like mechanisms has the potential to extend our knowledge of the brain mechanisms underlying cognitive analysis and to advance theoretical models of how we recognize faces or, more specifically, perceive other people’s facial expressions in a rich, dynamic, and complex environment, providing a new starting point for improved models of visual cortex-like mechanisms.

    Content-prioritised video coding for British Sign Language communication.

    Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements for high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. 
The research community benefits from a new approach to video coding optimisation and a better understanding of the communication needs of deaf people.

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) the argument that objects may be stored in and retrieved from a pre-attentional store during this task gains further weight.

    The fundamentals of unimodal palmprint authentication based on a biometric system: A review

    A biometric system can be defined as an automated method of identifying or authenticating the identity of a living person based on physiological or behavioral traits. Palmprint-based biometric authentication has gained considerable attention in recent years. Globally, enterprises have been exploring biometric authorization for some time, for purposes such as security, payment processing, law enforcement CCTV systems, and access to offices, buildings, and gyms via entry doors. Palmprint biometric systems can be divided into unimodal and multimodal systems. This paper investigates the biometric system and provides a detailed overview of palmprint technology and existing recognition approaches. Finally, we review previous works based on unimodal palmprint systems using different databases.