44 research outputs found

    Visual attention and swarm cognition for off-road robots

    Get PDF
    Tese de doutoramento, Informática (Engenharia Informática), Universidade de Lisboa, Faculdade de Ciências, 2011Esta tese aborda o problema da modelação de atenção visual no contexto de robôs autónomos todo-o-terreno. O objectivo de utilizar mecanismos de atenção visual é o de focar a percepção nos aspectos do ambiente mais relevantes à tarefa do robô. Esta tese mostra que, na detecção de obstáculos e de trilhos, esta capacidade promove robustez e parcimónia computacional. Estas são características chave para a rapidez e eficiência dos robôs todo-o-terreno. Um dos maiores desafios na modelação de atenção visual advém da necessidade de gerir o compromisso velocidade-precisão na presença de variações de contexto ou de tarefa. Esta tese mostra que este compromisso é resolvido se o processo de atenção visual for modelado como um processo auto-organizado, cuja operação é modulada pelo módulo de selecção de acção, responsável pelo controlo do robô. Ao fechar a malha entre o processo de selecção de acção e o de percepção, o último é capaz de operar apenas onde é necessário, antecipando as acções do robô. Para fornecer atenção visual com propriedades auto-organizadas, este trabalho obtém inspiração da Natureza. Concretamente, os mecanismos responsáveis pela capacidade que as formigas guerreiras têm de procurar alimento de forma auto-organizada, são usados como metáfora na resolução da tarefa de procurar, também de forma auto-organizada, obstáculos e trilhos no campo visual do robô. A solução proposta nesta tese é a de colocar vários focos de atenção encoberta a operar como um enxame, através de interacções baseadas em feromona. Este trabalho representa a primeira realização corporizada de cognição de enxame. Este é um novo campo de investigação que procura descobrir os princípios básicos da cognição, inspeccionando as propriedades auto-organizadas da inteligência colectiva exibida pelos insectos sociais. Logo, esta tese contribui para a robótica como disciplina de engenharia e para a robótica como disciplina de modelação, capaz de suportar o estudo do comportamento adaptável.Esta tese aborda o problema da modelação de atenção visual no contexto de robôs autónomos todo-o-terreno. O objectivo de utilizar mecanismos de atenção visual é o de focar a percepção nos aspectos do ambiente mais relevantes à tarefa do robô. Esta tese mostra que, na detecção de obstáculos e de trilhos, esta capacidade promove robustez e parcimónia computacional. Estas são características chave para a rapidez e eficiência dos robôs todo-o-terreno. Um dos maiores desafios na modelação de atenção visual advém da necessidade de gerir o compromisso velocidade-precisão na presença de variações de contexto ou de tarefa. Esta tese mostra que este compromisso é resolvido se o processo de atenção visual for modelado como um processo auto-organizado, cuja operação é modulada pelo módulo de selecção de acção, responsável pelo controlo do robô. Ao fechar a malha entre o processo de selecção de acção e o de percepção, o último é capaz de operar apenas onde é necessário, antecipando as acções do robô. Para fornecer atenção visual com propriedades auto-organizadas, este trabalho obtém inspi- ração da Natureza. Concretamente, os mecanismos responsáveis pela capacidade que as formi- gas guerreiras têm de procurar alimento de forma auto-organizada, são usados como metáfora na resolução da tarefa de procurar, também de forma auto-organizada, obstáculos e trilhos no campo visual do robô. A solução proposta nesta tese é a de colocar vários focos de atenção encoberta a operar como um enxame, através de interacções baseadas em feromona. Este trabalho representa a primeira realização corporizada de cognição de enxame. Este é um novo campo de investigação que procura descobrir os princípios básicos da cognição, ins- peccionando as propriedades auto-organizadas da inteligência colectiva exibida pelos insectos sociais. Logo, esta tese contribui para a robótica como disciplina de engenharia e para a robótica como disciplina de modelação, capaz de suportar o estudo do comportamento adaptável.Fundação para a Ciência e a Tecnologia (FCT,SFRH/BD/27305/2006); Laboratory of Agent Modelling (LabMag

    Image Processing Pipeline Based on Coupled Oscillator Models

    Get PDF
    For decades, researchers have been developing algorithms for image processing pipelines. Image Processing Pipelines (IPPs) are algorithmic constructions built to iteratively modify an input image into a series of abstractions for the purposes of decoding its contents into a higher level representation. There have been many proposed IPPs, varying in both physical construction, and in algorithmic paradigm, but by and large these propositions have been based in Boolean computation and arithmetic. Studies and trends have shown that Boolean computers are hitting a theoretical ceiling on their performance in terms of transistor size, energy consumption/heat dissipation, clock rates, and by extension computational time. Due to these issues, researchers have proposed using non-Boolean approaches, where possible, for various computations in common algorithms. One of the emerging technologies in the field of non-Boolean computation has been the use of coupled oscillators. A proposed use of coupled oscillators is for pattern matching, which can also be interpreted as a high-dimensional distance measurement. Using an approach based on the use of coupled oscillators as a basic computational primitive, this work aims to utilize the benefits gained from this new computational paradigm to gain performance in terms of both speed and power with respect to IPPs, without decreasing the accuracy of their algorithms

    Spatial and temporal background modelling of non-stationary visual scenes

    Get PDF
    PhDThe prevalence of electronic imaging systems in everyday life has become increasingly apparent in recent years. Applications are to be found in medical scanning, automated manufacture, and perhaps most significantly, surveillance. Metropolitan areas, shopping malls, and road traffic management all employ and benefit from an unprecedented quantity of video cameras for monitoring purposes. But the high cost and limited effectiveness of employing humans as the final link in the monitoring chain has driven scientists to seek solutions based on machine vision techniques. Whilst the field of machine vision has enjoyed consistent rapid development in the last 20 years, some of the most fundamental issues still remain to be solved in a satisfactory manner. Central to a great many vision applications is the concept of segmentation, and in particular, most practical systems perform background subtraction as one of the first stages of video processing. This involves separation of ‘interesting foreground’ from the less informative but persistent background. But the definition of what is ‘interesting’ is somewhat subjective, and liable to be application specific. Furthermore, the background may be interpreted as including the visual appearance of normal activity of any agents present in the scene, human or otherwise. Thus a background model might be called upon to absorb lighting changes, moving trees and foliage, or normal traffic flow and pedestrian activity, in order to effect what might be termed in ‘biologically-inspired’ vision as pre-attentive selection. This challenge is one of the Holy Grails of the computer vision field, and consequently the subject has received considerable attention. This thesis sets out to address some of the limitations of contemporary methods of background segmentation by investigating methods of inducing local mutual support amongst pixels in three starkly contrasting paradigms: (1) locality in the spatial domain, (2) locality in the shortterm time domain, and (3) locality in the domain of cyclic repetition frequency. Conventional per pixel models, such as those based on Gaussian Mixture Models, offer no spatial support between adjacent pixels at all. At the other extreme, eigenspace models impose a structure in which every image pixel bears the same relation to every other pixel. But Markov Random Fields permit definition of arbitrary local cliques by construction of a suitable graph, and 3 are used here to facilitate a novel structure capable of exploiting probabilistic local cooccurrence of adjacent Local Binary Patterns. The result is a method exhibiting strong sensitivity to multiple learned local pattern hypotheses, whilst relying solely on monochrome image data. Many background models enforce temporal consistency constraints on a pixel in attempt to confirm background membership before being accepted as part of the model, and typically some control over this process is exercised by a learning rate parameter. But in busy scenes, a true background pixel may be visible for a relatively small fraction of the time and in a temporally fragmented fashion, thus hindering such background acquisition. However, support in terms of temporal locality may still be achieved by using Combinatorial Optimization to derive shortterm background estimates which induce a similar consistency, but are considerably more robust to disturbance. A novel technique is presented here in which the short-term estimates act as ‘pre-filtered’ data from which a far more compact eigen-background may be constructed. Many scenes entail elements exhibiting repetitive periodic behaviour. Some road junctions employing traffic signals are among these, yet little is to be found amongst the literature regarding the explicit modelling of such periodic processes in a scene. Previous work focussing on gait recognition has demonstrated approaches based on recurrence of self-similarity by which local periodicity may be identified. The present work harnesses and extends this method in order to characterize scenes displaying multiple distinct periodicities by building a spatio-temporal model. The model may then be used to highlight abnormality in scene activity. Furthermore, a Phase Locked Loop technique with a novel phase detector is detailed, enabling such a model to maintain correct synchronization with scene activity in spite of noise and drift of periodicity. This thesis contends that these three approaches are all manifestations of the same broad underlying concept: local support in each of the space, time and frequency domains, and furthermore, that the support can be harnessed practically, as will be demonstrated experimentally

    Visual attention in primates and for machines - neuronal mechanisms

    Get PDF
    Visual attention is an important cognitive concept for the daily life of humans, but still not fully understood. Due to this, it is also rarely utilized in computer vision systems. However, understanding visual attention is challenging as it has many and seemingly-different aspects, both at neuronal and behavioral level. Thus, it is very hard to give a uniform explanation of visual attention that can account for all aspects. To tackle this problem, this thesis has the goal to identify a common set of neuronal mechanisms, which underlie both neuronal and behavioral aspects. The mechanisms are simulated by neuro-computational models, thus, resulting in a single modeling approach to explain a wide range of phenomena at once. In the thesis, the chosen aspects are multiple neurophysiological effects, real-world object localization, and a visual masking paradigm (OSM). In each of the considered fields, the work also advances the current state-of-the-art to better understand this aspect of attention itself. The three chosen aspects highlight that the approach can account for crucial neurophysiological, functional, and behavioral properties, thus the mechanisms might constitute the general neuronal substrate of visual attention in the cortex. As outlook, our work provides for computer vision a deeper understanding and a concrete prototype of attention to incorporate this crucial aspect of human perception in future systems.:1. General introduction 2. The state-of-the-art in modeling visual attention 3. Microcircuit model of attention 4. Object localization with a model of visual attention 5. Object substitution masking 6. General conclusionVisuelle Aufmerksamkeit ist ein wichtiges kognitives Konzept für das tägliche Leben des Menschen. Es ist aber immer noch nicht komplett verstanden, so dass es ein langjähriges Ziel der Neurowissenschaften ist, das Phänomen grundlegend zu durchdringen. Gleichzeitig wird es aufgrund des mangelnden Verständnisses nur selten in maschinellen Sehsystemen in der Informatik eingesetzt. Das Verständnis von visueller Aufmerksamkeit ist jedoch eine komplexe Herausforderung, da Aufmerksamkeit äußerst vielfältige und scheinbar unterschiedliche Aspekte besitzt. Sie verändert multipel sowohl die neuronalen Feuerraten als auch das menschliche Verhalten. Daher ist es sehr schwierig, eine einheitliche Erklärung von visueller Aufmerksamkeit zu finden, welche für alle Aspekte gleichermaßen gilt. Um dieses Problem anzugehen, hat diese Arbeit das Ziel, einen gemeinsamen Satz neuronaler Mechanismen zu identifizieren, welche sowohl den neuronalen als auch den verhaltenstechnischen Aspekten zugrunde liegen. Die Mechanismen werden in neuro-computationalen Modellen simuliert, wodurch ein einzelnes Modellierungsframework entsteht, welches zum ersten Mal viele und verschiedenste Phänomene von visueller Aufmerksamkeit auf einmal erklären kann. Als Aspekte wurden in dieser Dissertation multiple neurophysiologische Effekte, Realwelt Objektlokalisation und ein visuelles Maskierungsparadigma (OSM) gewählt. In jedem dieser betrachteten Felder wird gleichzeitig der State-of-the-Art verbessert, um auch diesen Teilbereich von Aufmerksamkeit selbst besser zu verstehen. Die drei gewählten Gebiete zeigen, dass der Ansatz grundlegende neurophysiologische, funktionale und verhaltensbezogene Eigenschaften von visueller Aufmerksamkeit erklären kann. Da die gefundenen Mechanismen somit ausreichend sind, das Phänomen so umfassend zu erklären, könnten die Mechanismen vielleicht sogar das essentielle neuronale Substrat von visueller Aufmerksamkeit im Cortex darstellen. Für die Informatik stellt die Arbeit damit ein tiefergehendes Verständnis von visueller Aufmerksamkeit dar. Darüber hinaus liefert das Framework mit seinen neuronalen Mechanismen sogar eine Referenzimplementierung um Aufmerksamkeit in zukünftige Systeme integrieren zu können. Aufmerksamkeit könnte laut der vorliegenden Forschung sehr nützlich für diese sein, da es im Gehirn eine Aufgabenspezifische Optimierung des visuellen Systems bereitstellt. Dieser Aspekt menschlicher Wahrnehmung fehlt meist in den aktuellen, starken Computervisionssystemen, so dass eine Integration in aktuelle Systeme deren Leistung sprunghaft erhöhen und eine neue Klasse definieren dürfte.:1. General introduction 2. The state-of-the-art in modeling visual attention 3. Microcircuit model of attention 4. Object localization with a model of visual attention 5. Object substitution masking 6. General conclusio

    Computational visual attention systems and their cognitive foundation: A survey

    Get PDF
    Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee. (c) 2010 ACMBased on concepts of the human visual system, computational visual attention systems aim to detect regions of interest in images. Psychologists, neurobiologists, and computer scientists have investigated visual attention thoroughly during the last decades and profited considerably from each other. However, the interdisciplinarity of the topic holds not only benefits but also difficulties: concepts of other fields are usually hard to access due to differences in vocabulary and lack of knowledge of the relevant literature. This paper aims to bridge this gap and bring together concepts and ideas from the different research areas. It provides an extensive survey of the grounding psychological and biological research on visual attention as well as the current state of the art of computational systems. Furthermore, it presents a broad range of applications of computational attention systems in fields like computer vision, cognitive systems and mobile robotics. We conclude with a discussion on the limitations and open questions in the field

    Multimodal Computational Attention for Scene Understanding

    Get PDF
    Robotic systems have limited computational capacities. Hence, computational attention models are important to focus on specific stimuli and allow for complex cognitive processing. For this purpose, we developed auditory and visual attention models that enable robotic platforms to efficiently explore and analyze natural scenes. To allow for attention guidance in human-robot interaction, we use machine learning to integrate the influence of verbal and non-verbal social signals into our models

    Computational principles for an autonomous active vision system

    Full text link
    Vision research has uncovered computational principles that generalize across species and brain area. However, these biological mechanisms are not frequently implemented in computer vision algorithms. In this thesis, models suitable for application in computer vision were developed to address the benefits of two biologically-inspired computational principles: multi-scale sampling and active, space-variant, vision. The first model investigated the role of multi-scale sampling in motion integration. It is known that receptive fields of different spatial and temporal scales exist in the visual cortex; however, models addressing how this basic principle is exploited by species are sparse and do not adequately explain the data. The developed model showed that the solution to a classical problem in motion integration, the aperture problem, can be reframed as an emergent property of multi-scale sampling facilitated by fast, parallel, bi-directional connections at different spatial resolutions. Humans and most other mammals actively move their eyes to sample a scene (active vision); moreover, the resolution of detail in this sampling process is not uniform across spatial locations (space-variant). It is known that these eye-movements are not simply guided by image saliency, but are also influenced by factors such as spatial attention, scene layout, and task-relevance. However, it is seldom questioned how previous eye movements shape how one learns and recognizes an object in a continuously-learning system. To explore this question, a model (CogEye) was developed that integrates active, space-variant sampling with eye-movement selection (the where visual stream), and object recognition (the what visual stream). The model hypothesizes that a signal from the recognition system helps the where stream select fixation locations that best disambiguate object identity between competing alternatives. The third study used eye-tracking coupled with an object disambiguation psychophysics experiment to validate the second model, CogEye. While humans outperformed the model in recognition accuracy, when the model used information from the recognition pathway to help select future fixations, it was more similar to human eye movement patterns than when the model relied on image saliency alone. Taken together these results show that computational principles in the mammalian visual system can be used to improve computer vision models
    corecore