An enactive approach to perceptual augmentation in mobility
Event predictions are an important constituent of situation awareness, which is a key objective for many applications in human-machine interaction, in particular in driver assistance. This work focuses on facilitating event predictions in dynamic environments. Its primary contributions are 1) the theoretical development of an approach for enabling people to expand their sampling and understanding of spatiotemporal information, 2) the introduction of exemplary systems that are guided by this approach, 3) the empirical investigation of effects functional prototypes of these systems have on human behavior and safety in a range of simulated road traffic scenarios, and 4) a connection of the investigated approach to work on cooperative human-machine systems. More specific contents of this work are summarized as follows:
The first part introduces several challenges for the formation of situation awareness as a requirement for safe traffic participation. It reviews existing work on these challenges in the domain of driver assistance, resulting in an identification of the need to better inform drivers about dynamically changing aspects of a scene, including event probabilities, spatial and temporal distances, as well as a suggestion to expand the scope of assistance systems to start informing drivers about relevant scene elements at an early stage. Novel forms of assistance can be guided by different fundamental approaches that target either replacement, distribution, or augmentation of driver competencies. A subsequent differentiation of these approaches concludes that an augmentation-guided paradigm, characterized by an integration of machine capabilities into human feedback loops, can be advantageous for tasks that rely on active user engagement, the preservation of awareness and competence, and the minimization of complexity in human-machine interaction. Consequently, findings and theories about human sensorimotor processes are connected to develop an enactive approach that is consistent with an augmentation perspective on human-machine interaction. The approach is characterized by enabling drivers to exercise new sensorimotor processes through which safety-relevant spatiotemporal information may be sampled.
In the second part of this work, a concept and functional prototype for augmenting the perception of traffic dynamics is introduced as a first example for applying principles of this enactive approach. As a loose expression of functional biomimicry, the prototype utilizes a tactile interface that communicates temporal distances to potential hazards continuously through stimulus intensity. In a driving simulator study, participants quickly gained an intuitive understanding of the assistance without instructions and demonstrated higher driving safety in safety-critical highway scenarios. But this study also raised new questions, such as whether the benefits are due to a continuous time-intensity encoding and whether utility generalizes to intersection scenarios or highway driving with low-criticality events. Effects of an expanded assistance prototype with lane-independent risk assessment and an option for binary signaling were thus investigated in a separate driving simulator study. Subjective responses confirmed quick signal understanding and a perception of spatial and temporal stimulus characteristics. Surprisingly, even for a binary assistance variant with a constant intensity level, participants reported perceiving a danger-dependent variation in stimulus intensity. They further felt supported by the system in the driving task, especially in difficult situations. But in contrast to the first study, this support was not expressed by changes in driving safety, suggesting that perceptual demands of the low-criticality scenarios could be satisfied by existing driver capabilities. But what happens if such basic capabilities are impaired, e.g., due to poor visibility conditions or other situations that introduce perceptual uncertainty? In a third driving simulator study, the driver assistance was employed specifically in such ambiguous situations and produced substantial safety advantages over unassisted driving.
Additionally, an assistance variant that adds an encoding of spatial uncertainty was investigated in these scenarios. Participants had no difficulty understanding and utilizing this added signal dimension to improve safety. Despite being inherently less informative than spatially precise signals, uncertainty-encoding signals were rated by users as equally useful and satisfying. This appreciation for transparency of variable assistance reliability is a promising indicator for the feasibility of an adaptive trust calibration in human-machine interaction and marks one step towards a closer integration of driver and vehicle capabilities.
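The two signaling variants compared above (continuous time-intensity encoding versus binary signaling at a constant level) can be illustrated with a minimal Python sketch. The function names, the linear mapping, the onset and saturation thresholds, and the constant level are all illustrative assumptions; the abstract does not state the actual encoding used by the prototype.

```python
def continuous_intensity(time_to_hazard_s, onset_s=8.0, saturation_s=1.0):
    """Map a temporal distance to a hazard (seconds) to an intensity in [0, 1].

    Assumed linear ramp: no stimulus at or beyond onset_s, full intensity
    at or below saturation_s. Both thresholds are hypothetical values.
    """
    if time_to_hazard_s >= onset_s:
        return 0.0
    if time_to_hazard_s <= saturation_s:
        return 1.0
    return (onset_s - time_to_hazard_s) / (onset_s - saturation_s)


def binary_intensity(time_to_hazard_s, onset_s=8.0, level=0.6):
    """Binary variant: a constant (assumed) level whenever a hazard is within onset_s."""
    return level if time_to_hazard_s < onset_s else 0.0
```

With these assumed parameters, a hazard 4.5 s away yields an intensity of 0.5 in the continuous variant and the fixed level 0.6 in the binary variant.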
A complementary step on the driver side would be to increase transparency about the driver’s mental states and thus allow for mutual adaptation. The final part of this work discusses how such prerequisites of cooperation may be achieved by monitoring mental state correlates observable in human behavior, especially in eye movements. Furthermore, the outlook for an addition of cooperative features also raises new questions about the bounds of identity as well as practical consequences of human-machine systems in which co-adapting agents may exercise sensorimotor processes through one another.
The prediction of events is a component of situation awareness, the support of which is a central goal of various applications in human-machine interaction, particularly in driver assistance. This work presents ways of supporting people in making predictions in dynamic road traffic situations. Its central contributions are 1) a theoretical examination of the task of expanding human perception and understanding of spatiotemporal information in road traffic, 2) the introduction of exemplary systems that emerge from this examination, 3) the empirical investigation of the effects of these systems on user behavior and driving safety in simulated traffic situations, and 4) the connection of the investigated approaches to work on cooperative human-machine systems. The work is divided into three parts:
The first part presents several challenges in the formation of situation awareness, which is necessary for safe participation in road traffic. A comparison of this overview with earlier work shows that there is a need to better inform drivers about dynamic aspects of driving situations. This includes, among other things, event probabilities, spatial and temporal distances, and earlier signaling of relevant elements in the environment.
New forms of assistance can be guided by different fundamental approaches to human-machine interaction that aim at either a replacement, a distribution, or an augmentation of driver competencies. Differentiating these approaches suggests that an approach guided by competence augmentation is advantageous for tasks that require active user engagement, the preservation of existing competencies, and situation awareness. Subsequently, findings and theories about human sensorimotor processes are connected to develop an enactive approach to human-machine interaction that accommodates an augmentation-guided perspective. This approach is intended to enable drivers to acquire safety-relevant spatiotemporal information through new sensorimotor processes.
In the second part of this work, a concept and functional prototype for augmenting the perception of traffic dynamics is presented as a first example of applying the principles of this enactive approach. This prototype uses vibrotactile actuators to communicate directions and temporal distances to potential hazards through actuator position and intensity. Participants in a driving simulator study were able to develop an intuitive understanding of this assistance within a short time, without having been instructed about its functionality beforehand. They also showed an increased level of driving safety in critical traffic situations. But this study also raises new questions, for example whether the safety gain is attributable to continuous distance encoding and whether a benefit would also exist in other scenarios, such as intersections and less critical longitudinal traffic. To pursue these questions, the effects of an expanded prototype with lane-independent collision prediction and an option for binary communication of possible collision directions were investigated in a further driving simulator study. In this study, too, the subjective ratings confirm quick understanding of the signals and a perception of spatial and temporal signal components. Surprisingly, even after using a binary assistance variant, most participants reported having perceived a danger-dependent variation in the intensity of tactile stimuli. Participants felt supported in the driving task by both variants, especially in situations they judged to be critical. In contrast to the first study, this perceived support was reflected only marginally in a measurable change in safety.
This result suggests that the perceptual demands of the low-criticality scenarios could be met with existing driver capacities.
But what happens if these capabilities are impaired, for example by poor visibility conditions or situations with increased ambiguity? In a third driving simulator study, the assistance system was employed specifically in such situations, which led to substantial safety advantages over unassisted driving. In addition to the previously introduced form, a new variant of the prototype was investigated that encodes spatial uncertainties of the vehicle's perception in tactile signals. Study participants had no difficulty understanding this additional signal dimension and using the information to improve driving safety. Although they are inherently less informative than spatially precise signals, participants rated the uncertainty-conveying signals as equally useful and satisfying. Such appreciation for the transparency of variable information reliability is a promising indication of the feasibility of adaptive trust calibration in human-machine interaction. This is one step towards a closer integration of the capabilities of driver and vehicle.
A complementary step would be to increase the transparency of the driver's mental states, which would enable mutual adaptation of human and machine.
The final part of this work discusses how this transparency and further prerequisites of human-machine cooperation could be met, for instance by monitoring correlates of mental states, especially via gaze behavior. Furthermore, with a view to additional cooperative capabilities, new questions arise about the definition of identity as well as about the practical consequences of human-machine systems in which co-adaptive agents can exercise sensorimotor processes through one another.
An Evaluation Schema for the Ethical Use of Autonomous Robotic Systems in Security Applications
We propose a multi-step evaluation schema designed to help procurement agencies and others to examine the ethical dimensions of autonomous systems to be applied in the security sector, including autonomous weapons systems.
Biologically motivated keypoint detection for RGB-D data
With the emerging interest in active vision, computer vision researchers have been increasingly
concerned with the mechanisms of attention. As a result, several computational models of
visual attention, inspired by the human visual system, have been developed, aiming at
the detection of regions of interest in images.
This thesis is focused on selective visual attention, which provides a mechanism for the
brain to focus computational resources on one object at a time, guided by low-level image properties
(bottom-up attention). The task of recognizing objects in different locations is achieved
by focusing on different locations, one at a time. Given the computational requirements of the
models proposed, the research in this area has been mainly of theoretical interest. More recently,
psychologists, neurobiologists and engineers have developed collaborations, and this has
resulted in considerable benefits. The first objective of this doctoral work is to bring together
concepts and ideas from these different research areas, providing a study of the biological research
on the human visual system and a discussion of the interdisciplinary knowledge in this area, as
well as the state of the art in computational models of (bottom-up) visual attention. Engineers
normally refer to visual attention as saliency: when people fix their gaze on a particular
region of the image, it is because that region is salient. In this research work, saliency methods
are presented according to their classification (biologically plausible, computational or hybrid)
and in chronological order.
A few salient structures can be used for applications like object registration, retrieval or
data simplification, and these few salient structures can be regarded as keypoints when
performing object recognition. Generally, object recognition algorithms use a large
number of descriptors extracted from a dense set of points, which entails a very high computational
cost and prevents real-time processing. To avoid this computational
complexity, the features have to be extracted from a small set of points, usually called
keypoints. The use of keypoint-based detectors reduces the processing time and
the redundancy in the data. Local descriptors extracted from images have been extensively
reported in the computer vision literature. Since there is a large set of keypoint detectors, a
comparative evaluation between them is needed. We therefore provide a description of 2D and
3D keypoint detectors and of 3D descriptors, and an evaluation of existing 3D keypoint
detectors from a publicly available point cloud library, using real 3D objects. The invariance of
the 3D keypoint detectors was evaluated with respect to rotations, scale changes and translations.
This evaluation reports the robustness of a particular detector to changes of viewpoint, and
the criteria used are the absolute and the relative repeatability rates. In our experiments, the
method that achieved the best repeatability rate was the ISS3D method.
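As an illustration of the two criteria, the following Python sketch computes an absolute and a relative repeatability under one common definition: a reference keypoint counts as repeated if, after undoing the known transformation, some keypoint detected on the transformed cloud lies within a distance `eps` of it. The exact definition used in the thesis, the threshold value, and the function names are assumptions made for this sketch.

```python
import math


def invert_rigid(point, rotation, translation):
    """Undo x' = R x + t for one 3D point, i.e. return R^T (x' - t)."""
    d = [point[i] - translation[i] for i in range(3)]
    return [sum(rotation[k][i] * d[k] for k in range(3)) for i in range(3)]


def repeatability(ref_kps, test_kps, rotation, translation, eps=0.01):
    """Return (absolute, relative) repeatability.

    ref_kps:  keypoints detected on the original cloud.
    test_kps: keypoints detected on the transformed cloud.
    absolute: number of reference keypoints with a mapped-back test
              keypoint within eps (eps is a hypothetical threshold).
    relative: absolute divided by the number of reference keypoints.
    """
    if not ref_kps:
        return 0, 0.0
    mapped_back = [invert_rigid(p, rotation, translation) for p in test_kps]
    absolute = sum(
        1 for r in ref_kps
        if any(math.dist(r, b) <= eps for b in mapped_back)
    )
    return absolute, absolute / len(ref_kps)
```

The same function covers rotations and translations; a scale change could be handled by folding the scale factor into the inverse mapping, which is omitted here for brevity.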
The analysis of the human visual system and of biologically inspired saliency map detectors
led to the idea of extending a keypoint detector based on the color
information in the retina. This proposal produced a 2D keypoint detector inspired by the behavior
of the early visual system. Our method is a color extension of the BIMP keypoint detector,
in which we include both the color and intensity channels of an image: color information is included
in a biologically plausible way and multi-scale image features are combined into a single keypoint
map. This detector is compared against state-of-the-art detectors and found to be particularly
well suited for tasks such as category and object recognition. The recognition process is performed
by comparing the 3D descriptors extracted at the locations indicated by the keypoints, after
mapping the 2D keypoint locations to 3D space. The evaluation allowed us to obtain
the best keypoint detector/descriptor pair on an RGB-D object dataset. Using our keypoint detector
and the SHOTCOLOR descriptor, a good category recognition rate was obtained, while for object
recognition it is with the PFHRGB descriptor that we obtain the best results.
A 3D recognition system involves the choice of a keypoint detector and a descriptor. A new
method for the detection of 3D keypoints on point clouds is presented, and a benchmark is
performed between each pair of 3D keypoint detector and 3D descriptor to evaluate their performance
on object and category recognition. These evaluations are done on a public database
of real 3D objects. Our keypoint detector is inspired by the behavior and neural architecture
of the primate visual system: the 3D keypoints are extracted based on a bottom-up 3D saliency
map, which is a map that encodes the saliency of objects in the visual environment. The saliency
map is determined by computing conspicuity maps (a combination across different modalities)
of the orientation, intensity and color information, in a bottom-up and purely stimulus-driven
manner. These three conspicuity maps are fused into a 3D saliency map and, finally, the
focus of attention (or "keypoint location") is sequentially directed to the most salient points in
this map. Inhibiting this location automatically allows the system to attend to the next most
salient location. The main conclusions are: with a similar average number of keypoints, our 3D
keypoint detector outperforms the other eight 3D keypoint detectors evaluated by achieving the
best result in 32 of the evaluated metrics in the category and object recognition experiments,
whereas the second-best detector obtained the best result in only 8 of these metrics. The only
drawback is the computational time, since BIK-BUS is slower than the other detectors. Given
that there are large differences in terms of recognition performance, size and time requirements, the
selection of the keypoint detector and descriptor has to be matched to the desired task, and we
give some directions to facilitate this choice.
After proposing the 3D keypoint detector, the research focused on a robust detection and
tracking method for 3D objects that uses keypoint information in a particle filter. This method
consists of three distinct steps: Segmentation, Tracking Initialization and Tracking. The segmentation
removes all the background information, reducing the number of points for
further processing. In the initialization, we use a keypoint detector with biological inspiration.
The information about the object that we want to follow is given by the extracted keypoints. The
particle filter tracks the keypoints, so that we can predict where the keypoints
will be in the next frame. One of the problems in a recognition system is the computational cost
of keypoint detectors, which this approach is intended to address. The experiments with the PFBIK-Tracking
method are done indoors in an office/home environment, where personal robots are
expected to operate. We also quantitatively evaluate this method using a "Tracking Error",
which evaluates the stability of the general tracking method. Our evaluation is done by
computing the centroids of the keypoints and of the particles. Comparing our system with the tracking
method available in the Point Cloud Library, we achieve better results, with a much smaller
number of points and less computational time. Our method is faster and more robust to occlusion
when compared to the OpenniTracker.
With the emerging interest in active vision, computer vision researchers have been increasingly concerned with the mechanisms of attention. For this reason, a number of computational models of visual attention, inspired by the human visual system, have been developed. These models aim to detect regions of interest in images.
This thesis focuses on selective visual attention, which provides a mechanism for the brain to concentrate computational resources on one object at a time, guided by low-level image properties (bottom-up attention). The task of recognizing objects in different locations is achieved by concentrating on different locations, one at a time. Given the computational requirements of the proposed models, research in this area has been mainly of theoretical interest. More recently, psychologists, neurobiologists and engineers have developed collaborations, and this has resulted in considerable benefits. At the start of this work, the objective is to bring together concepts and ideas from these different research areas. To this end, a study of research on the biology of the human visual system and a discussion of the interdisciplinary knowledge on the subject are provided, as well as a state of the art of computational models of (bottom-up) visual attention. Engineers normally refer to visual attention as saliency: if people fix their gaze on a particular region of the image, it is because that region is salient. In this research work, saliency methods are presented according to their classification (biologically plausible, computational or hybrid) and in chronological order.
Some salient structures can be used, instead of the whole object, in applications such as object registration, retrieval or data simplification. These few salient structures can be regarded as keypoints for the purpose of performing object recognition. In general, object recognition algorithms use a large number of descriptors extracted from a dense set of points. As a result, they have a very high computational cost, which prevents real-time processing. To avoid this computational complexity, features must be extracted from a small set of points, usually called keypoints. The use of keypoint detectors reduces the processing time and the amount of redundancy in the data. Local descriptors extracted from images have been widely reported in the computer vision literature. Since there is a large set of keypoint detectors, a comparative evaluation between them is needed. We therefore provide a description of 2D and 3D keypoint detectors and of 3D descriptors, and an evaluation of the 3D keypoint detectors available in a publicly available library, using real 3D objects. The invariance of the 3D keypoint detectors was evaluated with respect to rotations, scale changes and translations. This evaluation portrays the robustness of a given detector to changes in viewpoint, and the criteria used are the absolute and relative repeatability rates. In the experiments carried out, the method with the best repeatability rate was ISS3D.
The analysis of the human visual system and of biologically inspired saliency map detectors led to the idea of extending a keypoint detector based on the color information in the retina. The proposal produced a 2D keypoint detector inspired by the behavior of the visual system. Our method is a color-based extension of the BIMP keypoint detector, in which the color and intensity channels of an image are included. Color information is included in a biologically plausible way, and the multi-scale image features are combined into a single keypoint map. This detector is compared with state-of-the-art detectors and is particularly well suited to tasks such as category and object recognition. The recognition process is performed by comparing the 3D descriptors extracted at the locations indicated by the keypoints. To do this, the 2D keypoint locations have to be converted to 3D space. This was possible because the dataset used contains the location of each point in both 2D and 3D space. The evaluation allowed us to obtain the best keypoint detector/descriptor pair on an RGB-D object dataset. Using our keypoint detector and the SHOTCOLOR descriptor, we obtain a good category recognition rate, and for object recognition it is with the PFHRGB descriptor that we obtain the best results.
A 3D recognition system involves the choice of a keypoint detector and a descriptor. For this reason, a new method for detecting keypoints in 3D point clouds is presented, and a comparative analysis is carried out between each pair of 3D keypoint detector and 3D descriptor to evaluate performance in category and object recognition. These evaluations are done on a public database of real 3D objects. Our keypoint detector is inspired by the behavior and neural architecture of the primate visual system. The 3D keypoints are extracted based on a bottom-up 3D saliency map, that is, a map that encodes the saliency of objects in the visual environment. The saliency map is determined by computing conspicuity maps (a combination across different modalities) of the orientation, intensity and color information, in a bottom-up and purely stimulus-driven manner. These three conspicuity maps are fused into a 3D saliency map and, finally, the focus of attention (or "keypoint location") is sequentially directed to the most salient points of this map. Inhibiting this location automatically directs the system to the next most salient location. The main conclusions are: with a similar average number of keypoints, our 3D keypoint detector outperforms the other eight 3D keypoint detectors evaluated, obtaining the best result in 32 of the metrics evaluated in the category and object recognition experiments, whereas the second-best detector obtained the best result in only 8 of these metrics. The only drawback is the computational time, since BIK-BUS is slower than the other detectors. Given that there are large differences in recognition performance, size and time, the selection of the keypoint detector and descriptor has to be matched to the desired task, and we give some guidance in this research work to facilitate this choice.
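The sequential selection described above (attend to the most salient point, inhibit it, move to the next) can be sketched in Python as follows. This is a simplified illustration and not the BIK-BUS implementation: fusing the three conspicuity values by plain averaging, inhibiting all points within a fixed radius, and the function names are assumptions made for this sketch.

```python
import math


def fuse(orientation, intensity, color):
    """Fuse per-point conspicuity values into one saliency value (assumed: plain average)."""
    return [(o + i + c) / 3.0 for o, i, c in zip(orientation, intensity, color)]


def select_keypoints(points, saliency, num_keypoints, inhibition_radius):
    """Return indices of keypoints chosen by winner-take-all with inhibition of return.

    points:   list of (x, y, z) coordinates.
    saliency: one saliency value per point.
    After each winner is chosen, every point within inhibition_radius of it
    is suppressed so that attention shifts to the next most salient region.
    """
    remaining = list(saliency)
    selected = []
    for _ in range(min(num_keypoints, len(points))):
        winner = max(range(len(points)), key=lambda idx: remaining[idx])
        if remaining[winner] == -math.inf:
            break  # every point has already been inhibited
        selected.append(winner)
        for idx, p in enumerate(points):
            if math.dist(p, points[winner]) <= inhibition_radius:
                remaining[idx] = -math.inf
    return selected
```

The inhibition radius trades keypoint density against coverage: a larger radius spreads the selected keypoints further apart over the object.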
After proposing a 3D keypoint detector, the research focused on a robust method for detecting and tracking 3D objects using keypoint information in a particle filter. This method consists of three distinct steps: Segmentation, Tracking Initialization and Tracking. Segmentation is performed to remove all background information, in order to reduce the number of points for further processing. In the initialization, we use a biologically inspired keypoint detector. The information about the object we want to follow is given by the extracted keypoints. The particle filter tracks the keypoints, so that it is possible to predict where the keypoints will be in the next frame. The experiments with the PFBIK-Tracking method are done indoors, in an office/home environment, where personal robots are expected to operate. This method is also evaluated quantitatively using a "Tracking Error". The evaluation involves computing the centroids of the keypoints and of the particles. Comparing our system with the tracking method available in the library used for development, we obtain better results, with a much smaller number of points and lower computational cost. Our method is faster and more robust to occlusion when compared with the OpenniTracker.
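The abstract states only that the "Tracking Error" is computed from the keypoint and particle centroids. One plausible reading, shown in the Python sketch below, is the Euclidean distance between the two centroids in a given frame; this formula and the function names are assumptions, not the definition given in the thesis.

```python
import math


def centroid(points):
    """Mean position of a non-empty list of (x, y, z) points."""
    if not points:
        raise ValueError("centroid of an empty point set is undefined")
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def tracking_error(keypoints, particles):
    """Assumed tracking error: distance between keypoint and particle centroids."""
    return math.dist(centroid(keypoints), centroid(particles))
```

Under this reading, a small and stable error over a sequence would indicate that the particle cloud stays centered on the tracked keypoints.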
Primate-inspired Autonomous Navigation Using Mental Rotation and Advice-Giving
© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
DOI: 10.1109/MFI.2015.7295817
The cognitive process that enables many primate
species to efficiently traverse their environment has been a
subject of numerous studies. Mental rotation is hypothesized
to be one such process. The evolutionary causes for the dominance
in primates of mental rotation over its counterpart, rotational
invariance, are still not conclusively understood. Advice-giving
offers a possible explanation for this dominance in more evolved
primate species such as humans. This project aims at exploring
the relationship between advice-giving and mental rotation by
designing a system that combines the two processes in order to
achieve successful navigation to a goal location. Two approaches
to visual advice-giving were explored, namely segment-based
and object-based advice-giving. The results obtained upon
execution of the navigation algorithm on a Pioneer 2-DX robotic
platform offer evidence of a linkage between advice-giving
and mental rotation. Overall navigational accuracies of
90.9% and 71.43% were obtained for the segment-based
and object-based methods, respectively. These results also indicate how
the two processes can function together in order to accomplish
a navigational task in the absence of any external aid, as is the
case with primates.
Brain Functional Architecture and Human Understanding
The opening line in Aristotle’s Metaphysics asserts that “humans desire to understand”, establishing understanding as the defining characteristic of the human mind and human species. What is understanding and what role does it play in cognition, what advantages does it confer, what brain mechanisms are involved? The Webster’s Dictionary defines understanding as “apprehending general relations in a multitude of particulars.” A proposal discussed in this chapter defines understanding as a form of active inference in self-adaptive systems seeking to expand their inference domains while minimizing metabolic costs incurred in the expansions. Under the same proposal, understanding is viewed as an advanced adaptive mechanism involving self-directed construction of mental models establishing relations between domain entities. Understanding complements learning and serves to overcome the inertia of learned behavior when conditions are unfamiliar or deviate from those experienced in the past. While learning is common across all animals, understanding is unique to the human species. This chapter will unpack these notions, focusing on different facets of understanding. The proposal formulates hypotheses regarding the underlying neuronal mechanisms, attempting to assess their plausibility and reconcile them with the recent ideas and findings concerning brain functional architecture
Biologically Inspired Visual Control of Flying Robots
Insects possess an incredible ability to navigate their environment at high speed, despite
having small brains and limited visual acuity. Through selective pressure they have
evolved computationally efficient means for simultaneously performing navigation tasks
and instantaneous control responses. The insect’s main source of information is visual,
and through a hierarchy of processes this information is used for perception; at the
lowest level are local neurons for detecting image motion and edges, at the higher level
are interneurons to spatially integrate the output of previous stages. These higher
level processes could be considered as models of the insect's environment, reducing the
amount of information to only that which evolution has determined relevant. The scope
of this thesis is experimenting with biologically inspired visual control of flying robots
through information processing, models of the environment, and flight behaviour.
In order to test these ideas I developed a custom quadrotor robot and experimental
platform; the 'wasp' system. All algorithms ran on the robot, in real-time or better,
and hypotheses were always verified with flight experiments.
I developed a new optical flow algorithm that is computationally efficient, and able
to be applied in a regular pattern to the image. This technique is used later in my
work when considering patterns in the image motion field.
Using optical flow in the log-polar coordinate system I developed attitude estimation
and time-to-contact algorithms. I find that the log-polar domain is useful for
analysing global image motion, and in many ways equivalent to the retinotopic
arrangement of neurons in the optic lobe of insects, used for the same task.
I investigated the role of depth in insect flight using two experiments. In the first
experiment, to study how concurrent visual control processes might be combined, I
developed a control system using the combined output of two algorithms. The first
algorithm was a wide-field optical flow balance strategy and the second an obstacle
avoidance strategy which used inertial information to estimate the depth to objects in
the environment - objects whose depth was significantly different to their
surroundings. In the second experiment I created an altitude control system which used a model
of the environment in the Hough space, and a biologically inspired sampling strategy,
to efficiently detect the ground. Both control systems were used to control the flight
of a quadrotor in an indoor environment.
The methods that insects use to perceive edges and control their flight in response
had not been applied to artificial systems before. I developed a quadrotor control
system that used the distribution of edges in the environment to regulate the robot
height and avoid obstacles. I also developed a model that predicted the distribution of
edges in a static scene, and using this prediction was able to estimate the quadrotor
altitude.
Perception-driven approaches to real-time remote immersive visualization
In remote immersive visualization systems, real-time 3D perception through RGB-D cameras, combined with modern Virtual Reality (VR) interfaces, enhances the user’s sense of presence in a remote scene by rendering a 3D reconstruction of it. This is particularly valuable when there is a need to visualize, explore, and perform tasks in environments that are inaccessible, too hazardous, or too distant. However, a remote visualization system requires that the entire pipeline, from 3D data acquisition to VR rendering, satisfy demands for speed, throughput, and high visual realism. Particularly when using point clouds, there is a fundamental quality difference between the acquired data of the physical world and the displayed data because of network latency and throughput limitations, which negatively impact the sense of presence and provoke cybersickness. This thesis presents state-of-the-art research to address these problems by taking the human visual system as inspiration, from sensor data acquisition to VR rendering. The human visual system does not have uniform vision across the field of view; it has the sharpest visual acuity at the center of the field of view, and acuity falls off towards the periphery. Peripheral vision provides lower resolution to guide eye movements so that central vision visits all the interesting and crucial parts of the scene. As a first contribution, the thesis developed remote visualization strategies that utilize the acuity fall-off to facilitate the processing, transmission, buffering, and rendering in VR of 3D reconstructed scenes while simultaneously reducing throughput requirements and latency. As a second contribution, the thesis looked into attentional mechanisms to select and draw user engagement to specific information from the dynamic spatio-temporal environment.
It proposed a strategy to analyze the remote scene in terms of its 3D structure, its layout, and the spatial, functional, and semantic relationships between objects in the scene. The strategy primarily focuses on analyzing the scene with models that human visual perception uses. It allocates a larger proportion of computational resources to objects of interest and creates a more realistic visualization. As a supplementary contribution, a new volumetric point-cloud density-based Peak Signal-to-Noise Ratio (PSNR) metric is proposed to evaluate the introduced techniques. An in-depth evaluation of the presented systems, a comparative examination of the proposed point-cloud metric, user studies, and experiments demonstrated that the methods introduced in this thesis are visually superior while significantly reducing latency and throughput.
Visual Neuroscience of Robotic Grasping
Supporting Information
Virtual Reality Games for Motor Rehabilitation
This paper presents a fuzzy-logic-based method to track user satisfaction without the need for devices to monitor users’ physiological conditions. User satisfaction is the key to any product’s acceptance; computer applications and video games offer a unique opportunity to provide a tailored environment for each user to better suit their needs. We have implemented a non-adaptive fuzzy logic model of emotion, based on the emotional component of the Fuzzy Logic Adaptive Model of Emotion (FLAME) proposed by El-Nasr, to estimate player emotion in Unreal Tournament 2004. In this paper we describe the implementation of this system and present the results of one of several play tests. Our research contradicts the current literature that suggests physiological measurements are needed. We show that it is possible to use a software-only method to estimate user emotion.