285 research outputs found

    Digital system for bio-inspired visual attention processing fast and efficient information theoretic modelling of saliency

    Visual attention is a biological mechanism that lets human vision systems cope with rich and fast-changing visual information in the surrounding environment. Visual saliency is a strategy that recommends attentive spots to be visited in descending order of interest or information content. This thesis aims to apply information theory to computational saliency models, on the assumption that more attention is drawn toward more informative locations. Because visual media, i.e. images and videos, are high-dimensional data, information estimation is often computationally infeasible owing to its enormous demand for computation and data samples. This thesis proposes and analyses three practical and innovative information-based saliency models. The first model, the entropy-based saliency method (ENT), measures salient information with a centre-surround operation using conditional entropy (ENT-CON) or Kullback-Leibler divergence (ENT-KLD). However, ENT only estimates information from local features of fixed-size windows; it does not use the multi-scale and global information of visual media, which has been shown to be important in biological visual attention. To utilise multi-scale information, the second model, Wavelet-based Scale-Saliency (WSS), estimates information from the power distribution of data across wavelet sub-band basis descriptors at multiple dyadic scales. Although WSS benefits from local features at multiple scales, it does not integrate information about global context or the statistical characteristics of natural images. The third model, Multiscale Discriminant Saliency (MDIS), adopts the Wavelet Hidden Markov Tree (WHMT) to unify multi-scale and global information in a comprehensive saliency method.
All three models, ENT, WSS and MDIS, are evaluated and compared against well-known saliency methods such as PSS, AIM and DIS, both quantitatively, using standard numerical tools (Normalized Scanpath Saliency (NSS), Linear Correlation Coefficient (LCC) and Area Under Curve (AUC)) on N. Bruce's, Kootstra's and Judd's databases with human eye-tracking ground truth, and qualitatively, by visual examination of individual cases. The performance and comprehensiveness of the three models are reflected in the numerical results of an experiment on Bruce's database. As each later model is designed in a more comprehensive and computationally complex manner than the previous one, the three quantitative scores (LCC, NSS, AUC) generally improve, and computational time increases, in that order.

Table 1: Quantitative results of ENT, WSS and MDIS on N. Bruce's database

                 ENT        WSS        MDIS
LCC              0.02263    -0.01731   0.02382
NSS              -0.17533   0.31782    0.48019
AUC              0.78167    0.70292    0.88335
TIME (s/frame)   0.87040    1.26889    2.32734
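For readers unfamiliar with the NSS and LCC scores named in this abstract, here is a minimal sketch of how they are commonly computed. It assumes the saliency map and fixation data are NumPy arrays of the same shape; the function names are chosen for this example and the thesis's exact evaluation protocol may differ.

```python
import numpy as np

def nss(saliency, fixations):
    """Mean z-scored saliency at fixated pixels.

    `fixations` is assumed to be a boolean mask marking fixated pixels.
    """
    z = (saliency - saliency.mean()) / saliency.std()
    return float(z[fixations].mean())

def lcc(saliency, fixation_density):
    """Pearson correlation between a saliency map and a fixation density map."""
    return float(np.corrcoef(saliency.ravel(), fixation_density.ravel())[0, 1])
```

Both scores are higher when the model assigns more saliency to locations that observers actually looked at; an NSS near zero or an LCC near zero indicates chance-level agreement.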

    Biologically Inspired Computer Vision/ Applications of Computational Models of Primate Visual Systems in Computer Vision and Image Processing

    Reza Hojjaty Saeedy. Abstract: Biological vision systems are remarkable at extracting and analyzing the information that is essential for vital functional needs. They perform all these tasks with both high sensitivity and strong reliability. They can efficiently and quickly solve most of the difficult computational problems that are still challenging for artificial systems, such as scene segmentation, 3D/depth perception and motion recognition. It is therefore no surprise that biological vision systems have been a source of inspiration for computer vision. In this research, we aim to build a task-centric computer vision framework out of models that originate primarily in biological vision studies. We address two specific tasks: saliency detection and object classification. In both tasks we use features extracted from computational models of biological vision systems as a starting point for further processing. Saliency maps are 2D topographic maps that capture the most conspicuous regions of a scene, i.e. the pixels in an image that stand out against their neighboring pixels. These maps can be thought of as representations of the human attention process and thus have many applications in computer vision. We propose a cascade that combines two well-known computational models for the perception of color and orientation in order to simulate the responses of the primary areas of the primate visual cortex. We use these responses as inputs to a spiking neural network (SNN), and the output of this SNN serves as the input to our post-processing algorithm for saliency detection. Object classification/detection is the most studied task in computer vision and machine learning; interestingly, while it looks trivial for humans, it is a difficult problem for artificial systems.
For this part of the thesis we also design a pipeline that comprises feature extraction using biologically inspired systems, manifold learning for dimensionality reduction and a self-organizing (vector quantization) neural network as a supervised method for prototype learning.

    Biologically motivated keypoint detection for RGB-D data

    With the emerging interest in active vision, computer vision researchers have become increasingly concerned with the mechanisms of attention. As a result, several computational models of visual attention inspired by the human visual system have been developed, aiming at the detection of regions of interest in images. This thesis focuses on selective visual attention, which provides a mechanism for the brain to focus computational resources on one object at a time, guided by low-level image properties (bottom-up attention). The task of recognizing objects in different locations is achieved by focusing on different locations, one at a time. Given the computational requirements of the proposed models, research in this area has been mainly of theoretical interest. More recently, psychologists, neurobiologists and engineers have begun to collaborate, with considerable benefits. The first objective of this doctoral work is to bring together concepts and ideas from these different research areas, providing a study of the biological research on the human visual system and a discussion of the interdisciplinary knowledge in this area, as well as the state of the art in computational models of (bottom-up) visual attention. Engineers normally refer to visual attention as saliency: when people fix their gaze on a particular region of an image, it is because that region is salient. In this work, saliency methods are presented according to their classification (biologically plausible, computational or hybrid) and in chronological order. A few salient structures can be used, instead of the whole object, in applications such as object registration, retrieval or data simplification, and these few salient structures can be treated as keypoints when performing object recognition.
Generally, object recognition algorithms use a large number of descriptors extracted at a dense set of points, which incurs a very high computational cost and prevents real-time processing. To avoid this computational complexity, the features have to be extracted from a small set of points, usually called keypoints. Keypoint-based detectors reduce both the processing time and the redundancy in the data. Local descriptors extracted from images have been extensively reported in the computer vision literature. Because there is a large set of keypoint detectors, a comparative evaluation of them is needed. We therefore describe 2D and 3D keypoint detectors and 3D descriptors, and evaluate the existing 3D keypoint detectors in a publicly available point cloud library using real 3D objects. The invariance of the 3D keypoint detectors was evaluated with respect to rotations, scale changes and translations. This evaluation reports the robustness of each detector to changes of point of view; the criteria used are the absolute and the relative repeatability rate. In our experiments, the method that achieved the best repeatability rate was ISS3D. The analysis of the human visual system and of biologically inspired saliency map detectors led to the idea of extending a keypoint detector with the color information in the retina. This proposal produced a 2D keypoint detector inspired by the behavior of the early visual system. Our method is a color extension of the BIMP keypoint detector that includes both the color and intensity channels of an image: color information is included in a biologically plausible way, and multi-scale image features are combined into a single keypoint map. This detector is compared against state-of-the-art detectors and found to be particularly well suited to tasks such as category and object recognition.
The recognition process is performed by comparing the 3D descriptors extracted at the locations indicated by the keypoints, after mapping the 2D keypoint locations to 3D space. This mapping is possible because the dataset used provides the location of each point in both 2D and 3D space. The evaluation allowed us to identify the best keypoint detector/descriptor pair on an RGB-D object dataset. Using our keypoint detector with the SHOTCOLOR descriptor, we obtain a good category recognition rate, while for object recognition the best results are obtained with the PFHRGB descriptor. A 3D recognition system involves the choice of a keypoint detector and a descriptor. A new method for the detection of 3D keypoints on point clouds is therefore presented, and each pair of 3D keypoint detector and 3D descriptor is benchmarked to evaluate its performance on object and category recognition. These evaluations are done on a public database of real 3D objects. Our keypoint detector is inspired by the behavior and neural architecture of the primate visual system: the 3D keypoints are extracted from a bottom-up 3D saliency map, that is, a map that encodes the saliency of objects in the visual environment. The saliency map is determined by computing conspicuity maps (a combination across different modalities) of the orientation, intensity and color information, in a bottom-up and purely stimulus-driven manner. These three conspicuity maps are fused into a 3D saliency map and, finally, the focus of attention (or "keypoint location") is sequentially directed to the most salient points in this map. Inhibiting the current location automatically allows the system to attend to the next most salient location. The main conclusions are as follows: with a similar average number of keypoints, our 3D keypoint detector outperforms the other eight 3D keypoint detectors evaluated, achieving the best result in 32 of the evaluated metrics in the category and object recognition experiments, whereas the second-best detector obtained the best result in only 8 of these metrics.
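The sequential selection described above (attend to the most salient point, inhibit it, move on to the next) can be sketched as a winner-take-all loop with inhibition of return. This example works on a 2D array for simplicity, whereas the thesis operates on a 3D saliency map; the function name, the circular inhibition region and its radius are assumptions made for illustration.

```python
import numpy as np

def select_keypoints(saliency, n_points, radius):
    """Pick keypoints by repeatedly taking the saliency maximum and
    suppressing a disc of the given radius around it."""
    sal = saliency.astype(float).copy()
    yy, xx = np.indices(sal.shape)
    points = []
    for _ in range(n_points):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        points.append((int(y), int(x)))
        # Inhibition of return: the attended neighbourhood cannot win again.
        sal[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = -np.inf
    return points
```

The inhibition radius trades keypoint density against coverage: a larger radius spreads the keypoints over the map, a smaller one lets them cluster around strong peaks.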
The only drawback is the computational time, since our detector, BIK-BUS, is slower than the other detectors. Given the large differences in recognition performance, size and time requirements, the choice of keypoint detector and descriptor has to be matched to the desired task, and we give some guidance to facilitate this choice. After proposing the 3D keypoint detector, the research focused on a robust detection and tracking method for 3D objects that uses keypoint information in a particle filter. This method consists of three distinct steps: Segmentation, Tracking Initialization and Tracking. The segmentation removes all the background information, reducing the number of points for further processing. In the initialization, we use a biologically inspired keypoint detector. The information about the object to be followed is given by the extracted keypoints. The particle filter tracks the keypoints, so that we can predict where they will be in the next frame. One of the problems of a recognition system is the computational cost of keypoint detectors, and this approach is intended to address it. The experiments with the PFBIK-Tracking method are done indoors in an office/home environment, where personal robots are expected to operate. We quantitatively evaluate the method using a "Tracking Error", which measures the stability of the general tracking method and is computed from the centroids of the keypoints and of the particles. Compared with the tracking method available in the Point Cloud Library, our system achieves better results with a much smaller number of points and less computational time. Our method is faster and more robust to occlusion than the OpenniTracker.
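The abstract states only that the tracking error is computed from the keypoint centroid and the particle centroid. One plausible reading, sketched below, is the Euclidean distance between the two centroids; this interpretation, the function name and the optional particle weights are assumptions, not the thesis's definition.

```python
import numpy as np

def tracking_error(keypoints, particles, weights=None):
    """Distance between the keypoint centroid and the (optionally weighted)
    particle centroid. Inputs are arrays of shape (n, 3)."""
    kp_centroid = np.asarray(keypoints, dtype=float).mean(axis=0)
    pt_centroid = np.average(np.asarray(particles, dtype=float),
                             axis=0, weights=weights)
    return float(np.linalg.norm(kp_centroid - pt_centroid))
```

Under this reading, a small and stable value over a sequence would indicate that the particle cloud stays centred on the tracked keypoints.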

    A computational model of visual attention.

    Visual attention is a process by which the Human Visual System (HVS) selects the most important information from a scene. Visual attention models are computational or mathematical models developed to predict this information. The performance of state-of-the-art visual attention models is limited in terms of prediction accuracy and computational complexity. Despite a significant amount of active research in this area, modelling visual attention remains an open research challenge. This thesis proposes a novel computational model of visual attention that achieves higher prediction accuracy with low computational complexity. A new bottom-up visual attention model based on in-focus regions is proposed. To develop the model, an image dataset is created by capturing images with in-focus and out-of-focus regions. The Discrete Cosine Transform (DCT) spectrum of these images is investigated qualitatively and quantitatively to discover the key frequency coefficients that correspond to the in-focus regions. The model detects these key coefficients by formulating a novel relation between the in-focus and out-of-focus regions in the frequency domain. These frequency coefficients are then used to detect the salient in-focus regions. The simulation results show that this attention model achieves good prediction accuracy with low complexity. The prediction accuracy of the proposed in-focus visual attention model is further improved by incorporating the sensitivity of the HVS towards the image centre and human faces. Moreover, the computational complexity is further reduced by using the Integer Cosine Transform (ICT). The model's parameters are tuned using a hill-climbing approach to optimise accuracy. Performance has been analysed qualitatively and quantitatively using two large image datasets with eye-tracking fixation ground truth.
The results show that the model achieves higher prediction accuracy with lower computational complexity than state-of-the-art visual attention models. The proposed model is useful for predicting human fixations in computationally constrained environments, mainly in applications such as perceptual video coding, image quality assessment, object recognition and image segmentation.
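To make the frequency-domain idea concrete, the sketch below computes a block-wise DCT and uses the share of non-DC energy at higher frequencies as a crude focus measure. This is a stand-in chosen for illustration: the thesis's "key frequency coefficients" and its relation between in-focus and out-of-focus regions are not given in the abstract, and the block size, cutoff and function names here are assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def focus_map(image, block=8, cutoff=2):
    """Per-block share of non-DC DCT energy at frequencies with u + v >= cutoff."""
    d = dct_matrix(block)
    u, v = np.indices((block, block))
    high = (u + v) >= cutoff
    rows, cols = image.shape[0] // block, image.shape[1] // block
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            patch = image[r * block:(r + 1) * block,
                          c * block:(c + 1) * block].astype(float)
            energy = (d @ patch @ d.T) ** 2   # 2D DCT coefficients, squared
            ac = energy.sum() - energy[0, 0]  # ignore the DC term
            out[r, c] = energy[high].sum() / ac if ac > 1e-12 else 0.0
    return out
```

Sharp, detailed blocks concentrate their energy at higher frequencies and score close to 1, while smooth or blurred blocks score lower; flat blocks are assigned 0.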

    Predicting human behavior in smart environments: theory and application to gaze prediction

    Predicting human behavior is desirable in many application scenarios in smart environments. Existing models of eye movements do not take contextual factors into account. This thesis addresses that gap with a systematic machine-learning approach in which user profiles for eye-movement behaviors are learned from data. In addition, a theoretical innovation is presented that goes beyond pure data analysis: the thesis proposes modeling eye movements as a Markov Decision Process and uses the Inverse Reinforcement Learning paradigm to infer users' eye-movement behaviors.

    Contribution to study and implementation of a bio-inspired perception system based on visual and auditory attention

    The main goal of this research is the design of an artificial perception system able to identify events or scenes in a complex environment. The work carried out during this thesis focused on the study and design of a bio-inspired perception system based on both visual and auditory saliency. The main contributions are auditory saliency with sound recognition and visual saliency with object recognition. Auditory saliency is computed by merging information from both the temporal and spectral representations of the signal with a saliency map of its spectrogram. The visual perception system is composed of two mechanisms: the first is based on visual saliency and the second identifies the foreground object. In addition, the originality of the proposed approach is that it allows the coherence between visual and auditory observations to be evaluated using the features extracted from both visual and auditory patterns. The experimental results confirm the value of this method for scene identification in a complex environment.

    Neutro-Connectedness Theory, Algorithms and Applications

    Connectedness is an important topological property and has been widely studied in digital topology. However, three main challenges exist in applying connectedness to real-world problems: (1) the definitions of connectedness based on classic and fuzzy logic cannot model the “hidden factors” that could influence our decision-making; (2) these definitions are too general to be applied to complex problems; and (3) many measurements of connectedness depend heavily on the shape (spatial distribution of vertices) of the graph and violate the intuitive idea of connectedness. This research focused on solving these challenges by redesigning the connectedness theory, developing fast algorithms for connectedness computation, and applying the newly proposed theory and algorithms to real problems. The newly proposed Neutro-Connectedness (NC) generalizes the conventional definitions of connectedness, can model uncertainty and can describe the relationship between the part and the whole. Using a dynamic programming strategy, a fast algorithm was proposed to calculate NC for general datasets. The algorithm does more than compute the NC map: its output, the NC forest, can reveal a dataset’s topological structure with respect to connectedness. In the first application, interactive image segmentation, two approaches were proposed to address the two most difficult challenges: dependence on user interaction and intense interaction. The first approach, named NC-Cut, models the global topological properties among image regions and reduces the dependence of segmentation performance on the appearance models generated by user interactions. It is less sensitive to the initial region of interest (ROI) than four state-of-the-art ROI-based methods. The second approach, named EISeg, provides the user with visual clues based on NC to guide the interaction process. It greatly reduces user interaction by guiding the user to where interaction can produce the best segmentation results.
In the second application, NC was used to address the weak-boundary problem in breast ultrasound image segmentation. The approach models the indeterminacy resulting from weak boundaries better than fuzzy connectedness, and achieved more accurate and robust results on our dataset of 131 breast tumor cases.

    Texture and Colour in Image Analysis

    Research in colour and texture has experienced major changes in the last few years. This book presents some recent advances in the field, specifically in the theory and applications of colour texture analysis. This volume also features benchmarks, comparative evaluations and reviews