Attention and Anticipation in Fast Visual-Inertial Navigation
We study a Visual-Inertial Navigation (VIN) problem in which a robot needs to
estimate its state using an on-board camera and an inertial sensor, without any
prior knowledge of the external environment. We consider the case in which the
robot can allocate limited resources to VIN, due to tight computational
constraints. Therefore, we answer the following question: under limited
resources, what are the most relevant visual cues to maximize the performance
of visual-inertial navigation? Our approach has four key ingredients. First, it
is task-driven, in that the selection of the visual cues is guided by a metric
quantifying the VIN performance. Second, it exploits the notion of
anticipation, since it uses a simplified model for forward-simulation of robot
dynamics, predicting the utility of a set of visual cues over a future time
horizon. Third, it is efficient and easy to implement, since it leads to a
greedy algorithm for the selection of the most relevant visual cues. Fourth, it
provides formal performance guarantees: we leverage submodularity to prove that
the greedy selection cannot be far from the optimal (combinatorial) selection.
Simulations and real experiments on agile drones show that our approach ensures
state-of-the-art VIN performance while maintaining a lean processing time. In
the easy scenarios, our approach outperforms appearance-based feature selection
in terms of localization errors. In the most challenging scenarios, it enables
accurate visual-inertial navigation while appearance-based feature selection
fails to track the robot's motion during aggressive maneuvers.
Comment: 20 pages, 7 figures, 2 tables
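The guarantee mentioned above comes from submodularity: for a monotone submodular utility, greedy selection achieves at least a (1 - 1/e) fraction of the optimal combinatorial selection. The following is a minimal sketch of that greedy step, with a toy "coverage" utility standing in for the paper's VIN performance metric; all names and data here are illustrative, not the authors' implementation.

```python
# Hedged sketch: greedy selection of k items under a monotone submodular
# utility. The (1 - 1/e) approximation guarantee holds for any such utility;
# the toy coverage function below merely stands in for the VIN metric.

def greedy_select(candidates, utility, k):
    """Pick k candidates greedily, adding the largest marginal gain each round."""
    selected = []
    for _ in range(k):
        base = utility(selected)
        best, best_gain = None, float("-inf")
        for c in candidates:
            if c in selected:
                continue
            gain = utility(selected + [c]) - base
            if gain > best_gain:
                best, best_gain = c, gain
        selected.append(best)
    return selected

# Toy utility: each "feature" observes a set of landmark directions;
# utility = number of distinct directions covered (monotone and submodular).
features = {"f1": {1, 2}, "f2": {2, 3}, "f3": {4}, "f4": {1, 2, 3}}

def coverage(sel):
    return len(set().union(*(features[f] for f in sel)) if sel else set())

print(greedy_select(list(features), coverage, 2))  # -> ['f4', 'f3']
```

The greedy pass costs O(k·n) utility evaluations, which is what makes the approach practical under the tight computational budget the paper assumes.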
Drift and stabilization of cortical response selectivity
Synaptic turnover and long-term functional stability are two seemingly contradictory features of neuronal networks, with varying expression across different brain regions. Recent studies have shown that both are strongly expressed in the hippocampus, raising the question of how this can be reconciled within a biological network.
In this work, I use a data set of neuron activity from mice behaving within a virtual environment, recorded over up to several months, to extend and develop methods showing how the activity of hundreds of neurons per individual animal can be reliably tracked and characterized. I employ these methods to analyze network- and individual-neuron behavior during the initial formation of a place map from the activity of individual place cells while the animal learns to navigate in a new environment, as well as under the condition of a constant environment over several weeks.
In a published study included in this work, we find that map formation is driven by selective stabilization of place cells coding for salient regions, with distinct characteristics for neurons coding for landmark, reward, or other locations. Strikingly, we find that in mice lacking Shank2, an autism spectrum disorder (ASD)-linked gene encoding an excitatory postsynaptic scaffold protein, a characteristic overrepresentation of visual landmarks is missing while the overrepresentation of reward location remains intact, suggesting different underlying mechanisms in the stabilization.
Under the condition of a constant environment, I find that turnover dynamics largely decouple from the location of a place field and are governed by a strong decorrelation of population activity on short time scales (hours to days), followed by long-lasting correlations (days to months) above chance level. In agreement with earlier studies, I find a slow, constant drift in the population of active neurons, while – contrary to earlier results – place fields within the active population are assumed approximately at random. Place field movement across days is governed by periods of stability around an anchor position, interrupted by random, long-range relocation. The data does not suggest the existence of populations of neurons showing distinct properties of stability, but rather shows a continuous range from highly unstable to very stable functional and non-functional activity. Average timescales of reliable contributions to the neural code are on the order of a few days, in agreement with earlier reported timescales of synaptic turnover in the hippocampus.
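The decorrelation of population activity described above is commonly quantified with a population-vector correlation: the Pearson correlation between the activity vectors of the same tracked neurons on two sessions. A minimal sketch, with purely illustrative toy rates (this is not the thesis's analysis code):

```python
# Hedged sketch: population-vector correlation between sessions.
# High correlation at short lags and low correlation at long lags is the
# signature of the short-timescale decorrelation described in the text.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy firing rates (Hz) of five tracked neurons across three sessions.
day0 = [1.0, 0.2, 3.1, 0.0, 2.2]
day1 = [0.9, 0.3, 2.8, 0.1, 2.0]   # one day later: similar population vector
day7 = [0.1, 2.5, 0.3, 1.9, 0.2]   # a week later: largely remapped

print(round(pearson(day0, day1), 2))  # high: short-timescale stability
print(round(pearson(day0, day7), 2))  # low/negative: long-timescale turnover
```

Repeating this over all session pairs, binned by time lag, yields the correlation-versus-interval curves from which the few-day timescales mentioned above can be read off.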
Computational visual attention systems and their cognitive foundation: A survey
(c) 2010 ACM
Based on concepts of the human visual system, computational visual attention systems aim to
detect regions of interest in images. Psychologists, neurobiologists, and computer scientists have
investigated visual attention thoroughly during the last decades and profited considerably from
each other. However, the interdisciplinarity of the topic holds not only benefits but also difficulties:
concepts of other fields are usually hard to access due to differences in vocabulary and lack of
knowledge of the relevant literature. This paper aims to bridge this gap and bring together
concepts and ideas from the different research areas. It provides an extensive survey of the
grounding psychological and biological research on visual attention as well as the current state
of the art of computational systems. Furthermore, it presents a broad range of applications
of computational attention systems in fields like computer vision, cognitive systems and mobile
robotics. We conclude with a discussion of the limitations and open questions in the field.
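The computational systems this survey covers are largely built from center-surround contrast operations in the spirit of the Itti-Koch architecture. A minimal sketch of that building block on a single intensity channel (real systems run it on image pyramids across several feature channels; the grid and names below are illustrative):

```python
# Hedged sketch of center-surround contrast, the core operation of
# bottom-up saliency systems: a pixel is salient when it differs from
# the mean of its local neighbourhood.

def center_surround(img, r=1):
    """Saliency of each pixel = |value - mean of its (2r+1)^2 neighbourhood|."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = abs(img[y][x] - sum(vals) / len(vals))
    return out

# A bright pop-out pixel on a dark background attracts the most "attention".
img = [[0, 0, 0, 0],
       [0, 9, 0, 0],
       [0, 0, 0, 0]]
sal = center_surround(img)
peak = max((v, (y, x)) for y, row in enumerate(sal) for x, v in enumerate(row))
print(peak[1])  # -> (1, 1), the pop-out location
```

The maximum of the resulting saliency map is the model's predicted first fixation; inhibition-of-return then suppresses it so attention can shift to the next-most-salient region.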
Local Accuracy and Global Consistency for Efficient SLAM
This thesis is concerned with the problem of Simultaneous Localisation and
Mapping (SLAM) using visual data only. Given the video stream of a moving
camera, we wish to estimate the structure of the environment and the motion
of the device most accurately and in real-time.
Two effective approaches were presented in the past. Filtering methods
marginalise out past poses and summarise the information gained over time
with a probability distribution. Keyframe methods rely on the optimisation
approach of bundle adjustment, but computationally must select only a small
number of past frames to process. We perform a rigorous comparison between
the two approaches for visual SLAM. In particular, we show that accuracy comes
from a large number of points, while the number of intermediate frames only
has a minor impact. We conclude that keyframe bundle adjustment is superior
to filtering due to its smaller computational cost.
Based on these experimental results, we develop an efficient framework for
large-scale visual SLAM using the keyframe strategy. We demonstrate that
SLAM using a single camera drifts not only in rotation and translation,
but also in scale. In particular, we perform large-scale loop closure correction
using a novel variant of pose-graph optimisation which also takes scale drift
into account. Starting from this two stage approach which tackles local motion
estimation and loop closures separately, we develop a unified framework
for real-time visual SLAM. By employing a novel double window scheme, we
present a constant-time approach which enables the local accuracy of bundle
adjustment while ensuring global consistency. Furthermore, we suggest a new
scheme for local registration using metric loop closures and present several improvements
for the visual front-end of SLAM. Our contributions are evaluated
exhaustively on a number of synthetic experiments and real-image datasets from
single cameras and range-imaging devices.
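The scale drift described above can be made concrete with a toy calculation. When a loop closure reveals that the mapped loop returned at the wrong scale, one simple correction (a deliberately simplified stand-in for the thesis's scale-aware pose-graph optimisation, not its actual method) spreads the measured scale error geometrically over the loop's keyframes:

```python
# Hedged sketch: monocular SLAM accumulates multiplicative scale error per
# keyframe. A loop closure measures the total drift; distributing its inverse
# geometrically over n keyframes restores consistency around the loop.
import math

def distribute_scale(ratio, n):
    """Per-keyframe correction factors whose product equals `ratio`."""
    step = ratio ** (1.0 / n)
    return [step] * n

# Five keyframes, each shrinking the map scale by 5%; the loop closure
# measures the accumulated drift and spreads the inverse correction evenly.
drift_per_frame = 0.95
n = 5
total_drift = drift_per_frame ** n
corrections = distribute_scale(1.0 / total_drift, n)
residual = total_drift * math.prod(corrections)
print(round(residual, 6))  # ~1.0: loop scale restored
```

The thesis's pose-graph formulation generalises this idea by optimising full similarity transforms (rotation, translation, and scale) jointly, rather than scale alone.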