11 research outputs found

    Attention and Anticipation in Fast Visual-Inertial Navigation

    We study a Visual-Inertial Navigation (VIN) problem in which a robot needs to estimate its state using an on-board camera and an inertial sensor, without any prior knowledge of the external environment. We consider the case in which the robot can allocate limited resources to VIN, due to tight computational constraints. Therefore, we answer the following question: under limited resources, what are the most relevant visual cues to maximize the performance of visual-inertial navigation? Our approach has four key ingredients. First, it is task-driven, in that the selection of the visual cues is guided by a metric quantifying the VIN performance. Second, it exploits the notion of anticipation, since it uses a simplified model for forward-simulation of robot dynamics, predicting the utility of a set of visual cues over a future time horizon. Third, it is efficient and easy to implement, since it leads to a greedy algorithm for the selection of the most relevant visual cues. Fourth, it provides formal performance guarantees: we leverage submodularity to prove that the greedy selection cannot be far from the optimal (combinatorial) selection. Simulations and real experiments on agile drones show that our approach ensures state-of-the-art VIN performance while maintaining a lean processing time. In the easy scenarios, our approach outperforms appearance-based feature selection in terms of localization errors. In the most challenging scenarios, it enables accurate visual-inertial navigation while appearance-based feature selection fails to track the robot's motion during aggressive maneuvers.
    Comment: 20 pages, 7 figures, 2 tables
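    The greedy selection described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the log-determinant of an information matrix stands in for the VIN performance metric, each visual cue is assumed to contribute a rank-one positive semidefinite term, and the names `logdet`, `greedy_select`, `prior`, and `features` are hypothetical.

```python
import numpy as np


def logdet(matrix):
    """Log-determinant of a positive-definite matrix."""
    _sign, value = np.linalg.slogdet(matrix)
    return value


def greedy_select(prior_info, contributions, budget):
    """Pick up to `budget` features, one at a time, each time taking the
    feature whose addition yields the largest log det of the information matrix."""
    selected = []
    info = prior_info.copy()
    remaining = list(range(len(contributions)))
    for _ in range(min(budget, len(remaining))):
        # Evaluate the objective for every remaining candidate and keep the best.
        best = max(remaining, key=lambda i: logdet(info + contributions[i]))
        selected.append(best)
        remaining.remove(best)
        info = info + contributions[best]
    return selected, info


# Hypothetical toy data: 8 candidate features in a 3-dimensional state space.
rng = np.random.default_rng(0)
prior = np.eye(3)                                   # assumed prior information
directions = rng.normal(size=(8, 3))
features = [np.outer(d, d) for d in directions]     # rank-one PSD contributions

chosen, posterior = greedy_select(prior, features, budget=3)
print(chosen, logdet(posterior))
```

    Because log det of a prior plus a sum of positive semidefinite terms is monotone and submodular, the classical guarantee applies to this toy objective: the greedy gain over the prior is at least (1 - 1/e) of the best achievable gain for the same budget.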

    Drift and stabilization of cortical response selectivity

    Synaptic turnover and long-term functional stability are two seemingly contradictory features of neuronal networks, which show varying expressions across different brain regions. Recent studies have shown how both of these are strongly expressed in the hippocampus, raising the question of how this can be reconciled within a biological network. In this work, I use a data set of neuron activity from mice behaving within a virtual environment, recorded over up to several months, to extend and develop methods, showing how the activity of hundreds of neurons per individual animal can be reliably tracked and characterized. I employ these methods to analyze network and individual neuron behavior during the initial formation of a place map from the activity of individual place cells while the animal learns to navigate in a new environment, as well as during the condition of a constant environment over several weeks. In a published study included in this work, we find that map formation is driven by selective stabilization of place cells coding for salient regions, with distinct characteristics for neurons coding for landmark, reward, or other locations. Strikingly, we find that in mice lacking Shank2, an autism spectrum disorder (ASD)-linked gene encoding an excitatory postsynaptic scaffold protein, a characteristic overrepresentation of visual landmarks is missing while the overrepresentation of reward location remains intact, suggesting different underlying mechanisms in the stabilization. In the condition of a constant environment, I find that turnover dynamics largely decouple from the location of a place field and are governed by a strong decorrelation of population activity on short time scales (hours to days), followed by long-lasting correlations (days to months) above chance level.
    In agreement with earlier studies, I find a slow, constant drift in the population of active neurons, while, contrary to earlier results, place fields within the active population are assumed approximately randomly. Place field movement across days is governed by periods of stability around an anchor position, interrupted by random, long-range relocation. The data do not suggest the existence of populations of neurons showing distinct properties of stability, but rather show a continuous range from highly unstable to very stable functional and non-functional activity. Average timescales of reliable contributions to the neural code are on the order of a few days, in agreement with earlier reported timescales of synaptic turnover in the hippocampus.
    2021-08-0

    Computational visual attention systems and their cognitive foundation: A survey

    Based on concepts of the human visual system, computational visual attention systems aim to detect regions of interest in images. Psychologists, neurobiologists, and computer scientists have investigated visual attention thoroughly during the last decades and profited considerably from each other. However, the interdisciplinarity of the topic holds not only benefits but also difficulties: concepts of other fields are usually hard to access due to differences in vocabulary and lack of knowledge of the relevant literature. This paper aims to bridge this gap and bring together concepts and ideas from the different research areas. It provides an extensive survey of the grounding psychological and biological research on visual attention as well as the current state of the art of computational systems. Furthermore, it presents a broad range of applications of computational attention systems in fields like computer vision, cognitive systems and mobile robotics. We conclude with a discussion on the limitations and open questions in the field.
    (c) 2010 ACM

    Local Accuracy and Global Consistency for Efficient SLAM

    This thesis is concerned with the problem of Simultaneous Localisation and Mapping (SLAM) using visual data only. Given the video stream of a moving camera, we wish to estimate the structure of the environment and the motion of the device as accurately as possible and in real time. Two effective approaches were presented in the past. Filtering methods marginalise out past poses and summarise the information gained over time with a probability distribution. Keyframe methods rely on the optimisation approach of bundle adjustment, but for computational reasons must select only a small number of past frames to process. We perform a rigorous comparison between the two approaches for visual SLAM. In particular, we show that accuracy comes from a large number of points, while the number of intermediate frames only has a minor impact. We conclude that keyframe bundle adjustment is superior to filtering due to a smaller computational cost. Based on these experimental results, we develop an efficient framework for large-scale visual SLAM using the keyframe strategy. We demonstrate that SLAM using a single camera drifts not only in rotation and translation, but also in scale. In particular, we perform large-scale loop closure correction using a novel variant of pose-graph optimisation which also takes scale drift into account. Starting from this two-stage approach, which tackles local motion estimation and loop closures separately, we develop a unified framework for real-time visual SLAM. By employing a novel double window scheme, we present a constant-time approach which enables the local accuracy of bundle adjustment while ensuring global consistency. Furthermore, we suggest a new scheme for local registration using metric loop closures and present several improvements for the visual front-end of SLAM. Our contributions are evaluated exhaustively on a number of synthetic experiments and real-image datasets from single cameras and range imaging devices.
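    The scale drift mentioned in the abstract above can be illustrated with a small sketch. It is a toy model under stated assumptions, not the thesis's method: each relative motion is represented as a similarity transform (scale, rotation, translation) acting as x -> s * R @ x + t, and drift is modelled, hypothetically, as a constant 2% scale error per step. The names `compose`, `rot_z`, and `run_loop` are illustrative only.

```python
import numpy as np


def compose(a, b):
    """Compose two similarity transforms (a after b), each given as
    (scale, R, t) and acting on points as x -> scale * R @ x + t."""
    sa, Ra, ta = a
    sb, Rb, tb = b
    return sa * sb, Ra @ Rb, sa * (Ra @ tb) + ta


def rot_z(angle):
    """Rotation matrix about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def run_loop(step, n=4):
    """Chain the same relative motion n times, starting from the identity."""
    pose = (1.0, np.eye(3), np.zeros(3))
    for _ in range(n):
        pose = compose(pose, step)
    return pose


# One side of a square: move one unit forward, then turn 90 degrees.
true_step = (1.0, rot_z(np.pi / 2), np.array([1.0, 0.0, 0.0]))
# Assumed drift model: every relative estimate is 2% too large in scale.
drift_step = (1.02, rot_z(np.pi / 2), np.array([1.0, 0.0, 0.0]))

true_scale, _, true_t = run_loop(true_step)
drift_scale, _, drift_t = run_loop(drift_step)

print("true loop:   scale", true_scale, "translation", true_t)
print("drifted loop: scale", drift_scale, "translation", drift_t)
print("log-scale loop-closure residual:", np.log(drift_scale))
```

    With exact steps the loop closes at the identity. With the assumed scale error, both the accumulated scale and the final position are off, which is why a loop-closure constraint over rigid-body poses alone cannot absorb the error and a scale term is needed in the pose-graph residual.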
