
    Human visual exploration reduces uncertainty about the sensed world

    In previous papers, we introduced a normative scheme for scene construction and epistemic (visual) searches based upon active inference. This scheme provides a principled account of how people decide where to look when categorising a visual scene based on its contents. In this paper, we use active inference to explain the visual searches of normal human subjects, enabling us to answer some key questions about visual foraging and salience attribution. First, we asked whether there is any evidence for ‘epistemic foraging’, i.e. exploration that resolves uncertainty about a scene. In brief, we used Bayesian model comparison to compare Markov decision process (MDP) models of scan-paths with and without the epistemic, uncertainty-resolving imperatives for action selection. In the course of this model comparison, we discovered that it was necessary to include non-epistemic (heuristic) policies to explain observed behaviour (e.g., a reading-like strategy that involved scanning from left to right). Despite this use of heuristic policies, model comparison showed that there is substantial evidence for epistemic foraging in the visual exploration of even simple scenes. Second, we compared MDP models that did and did not allow for changes in prior expectations over successive blocks of the visual search paradigm. We found that implicit prior beliefs about the speed and accuracy of visual searches changed systematically with experience. Finally, we characterised intersubject variability in terms of subject-specific prior beliefs. Specifically, we used canonical correlation analysis to see whether there were any mixtures of prior expectations that could predict between-subject differences in performance, thereby establishing a quantitative link between different behavioural phenotypes and Bayesian belief updating. We demonstrated that better scene categorisation performance is consistently associated with lower reliance on heuristics, i.e. a greater use of a generative model of the scene to direct exploration. © 2018 Mirza et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
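
    The model comparison described above lends itself to a compact illustration. The sketch below assumes per-subject log evidences (e.g. negative variational free energies) for two candidate MDP models of scan-paths, with made-up numbers, and performs a fixed-effects comparison by summing log evidences and converting them to posterior model probabilities; it illustrates the idea rather than the authors' pipeline.

```python
import numpy as np

# Hypothetical per-subject log evidences for two candidate MDP models of
# scan-paths: one with an epistemic (uncertainty-resolving) term in policy
# selection, one without.  The numbers are made up for illustration.
log_evidence = {
    "epistemic":     np.array([-101.2, -98.7, -110.4, -95.3]),
    "non_epistemic": np.array([-109.8, -104.1, -115.0, -102.6]),
}

# Fixed-effects Bayesian model comparison: sum log evidences over subjects,
# then convert to posterior model probabilities under a flat prior over models.
total = np.array([log_evidence["epistemic"].sum(),
                  log_evidence["non_epistemic"].sum()])
posterior = np.exp(total - total.max())
posterior /= posterior.sum()

print("log Bayes factor (epistemic vs non-epistemic):", total[0] - total[1])
print("posterior model probabilities:", posterior)
```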

    An autonomous active vision system for complete and accurate 3D scene reconstruction

    We propose in this paper an active vision approach for performing the 3D reconstruction of static scenes. The perception-action cycles are handled at various levels: from the definition of perception strategies for scene exploration down to the automatic generation of camera motions using visual servoing. To perform the reconstruction, we use a structure-from-controlled-motion method, which allows an optimal estimation of geometrical primitive parameters. As this method is based on particular camera motions, perceptual strategies able to appropriately perform a succession of such individual primitive reconstructions are proposed in order to recover the complete spatial structure of the scene. Two algorithms are proposed to ensure the exploration of the scene. The former is an incremental reconstruction algorithm based on the use of a prediction/verification scheme managed using decision theory and Bayes nets; it allows the visual system to obtain a high-level description of the observed part of the scene. The latter, based on the computation of new viewpoints, ensures the complete reconstruction of the scene. Experiments carried out on a robotic cell have demonstrated the validity of our approach.
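
    The second algorithm above computes new viewpoints until the scene is fully reconstructed. The following is a minimal, greedy next-best-view sketch under assumed data structures (candidate views with precomputed visibility sets over surface cells); it illustrates the general idea, not the authors' viewpoint-selection criterion.

```python
def next_best_view(candidate_views, observed):
    """Pick the candidate view that would reveal the most unseen surface cells.

    candidate_views: dict mapping a view id to the set of surface cells
                     predicted to be visible from that viewpoint.
    observed:        set of surface cells already reconstructed.
    """
    def gain(view_id):
        # Expected information gain is approximated by the number of cells
        # visible from this view that have not yet been observed.
        return len(candidate_views[view_id] - observed)
    return max(candidate_views, key=gain)

# Toy usage: three candidate viewpoints and a partially observed scene.
views = {"v1": {1, 2, 3}, "v2": {3, 4, 5, 6}, "v3": {6, 7}}
seen = {1, 2, 3}
print(next_best_view(views, seen))   # -> "v2", it uncovers the most new cells
```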

    Computational Modelling of Information Gathering

    This thesis describes computational modelling of information-gathering behaviour under active inference, a framework for describing Bayes-optimal behaviour. Under active inference, perception, attention and action all serve the same purpose: minimising variational free energy. Variational free energy is an upper bound on surprise, and minimising it maximises an agent’s evidence for its survival. An agent achieves this by acquiring information (resolving uncertainty) about the hidden states of the world and using the acquired information to act on the outcomes it prefers. In this work I placed special emphasis on the resolution of uncertainty about the states of the world. I first created a visual search task called the scene construction task. In this task, one accumulates evidence for competing hypotheses (different visual scenes) by sequentially sampling the scene, and categorises it once there is sufficient evidence. I showed that a computational agent attends to the most salient (epistemically valuable) locations in this task. In the next study, this task was performed by healthy human subjects, whose exploration strategies provided evidence for uncertainty-driven exploration. I also showed how different exploratory behaviours can be characterised using canonical correlation analysis. In the following study, I showed how exploration of a visual scene under different instructions could be explained by appealing to computational mechanisms that may correspond to attention. This entailed manipulating the precision of task-irrelevant cues and their hidden causes as a function of the instructions. In the final work, I characterised impulsive behaviour using a patch-leaving paradigm. By varying the parameters of a Markov decision process (MDP) model, I showed that there could be at least three distinct causes of impulsive behaviour, namely a lower depth of planning, a lower capacity to maintain and process information, and an increased perceived value of immediate rewards.
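
    Salience (epistemic value) under active inference can be illustrated with a one-step expected free energy calculation. The sketch below uses a toy two-state scene and made-up likelihood matrices to show that an uncertainty-resolving fixation attains a lower expected free energy than an uninformative one; it is a minimal sketch, not the thesis's MDP implementation.

```python
import numpy as np

def expected_free_energy(A, q_s, log_c):
    """One-step expected free energy for a discrete active-inference agent.

    A:     likelihood matrix, A[o, s] = p(o | s)
    q_s:   predicted hidden-state distribution under a policy
    log_c: log prior preferences over outcomes
    Returns G = -(epistemic value) - (pragmatic value); lower is better.
    """
    q_o = A @ q_s                                  # predicted outcome distribution
    joint = A * q_s                                # p(o, s) under the policy
    # Epistemic value: mutual information between outcomes and hidden states,
    # i.e. the expected reduction of uncertainty about states after sampling.
    info = np.sum(joint * (np.log(joint) - np.log(np.outer(q_o, q_s))))
    pragmatic = q_o @ log_c                        # expected log preference
    return -info - pragmatic

# Toy scene with two hidden states; one fixation samples an informative cue,
# the other an uninformative one (numbers are illustrative).
A_informative = np.array([[0.9, 0.1], [0.1, 0.9]])
A_uninformative = np.array([[0.5, 0.5], [0.5, 0.5]])
q_s = np.array([0.5, 0.5])
log_c = np.log(np.array([0.5, 0.5]))               # no outcome preference
G = [expected_free_energy(A_informative, q_s, log_c),
     expected_free_energy(A_uninformative, q_s, log_c)]
print(np.round(G, 3))                              # informative fixation has lower G
```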

    Incremental Learning for Robot Perception through HRI

    Scene understanding and object recognition are difficult to achieve yet crucial skills for robots. Recently, Convolutional Neural Networks (CNNs) have shown success in these tasks. However, there is still a gap between their performance on image datasets and in real-world robotics scenarios. We present a novel paradigm for incrementally improving a robot's visual perception through active human interaction. In this paradigm, the user introduces novel objects to the robot by means of pointing and voice commands. Given this information, the robot visually explores the object and adds images of it to re-train the perception module. Our base perception module builds on recent developments in object detection and recognition using deep learning. Our method combines state-of-the-art CNNs obtained from off-line batch learning with human guidance, robot exploration and incremental on-line learning.
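
    A minimal sketch of such an incremental on-line learning step might look like the following: images gathered while the robot explores a newly introduced object are used to fine-tune only the classification head of a pretrained CNN. The class count, tensor shapes and optimiser settings are illustrative assumptions, not the paper's actual training recipe.

```python
import torch
import torch.nn as nn
from torchvision import models

num_known_classes = 5                      # objects the robot already knows
model = models.resnet18(weights=None)      # a pretrained backbone in practice
model.fc = nn.Linear(model.fc.in_features, num_known_classes + 1)  # + novel object

# Freeze the backbone; only the new classification head is updated on-line.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc")

optimiser = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for the images collected during the robot's visual exploration.
images = torch.randn(8, 3, 224, 224)
labels = torch.full((8,), num_known_classes)   # all belong to the novel class

model.train()
for _ in range(3):                         # a few quick incremental passes
    optimiser.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimiser.step()
```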

    Learning to look in different environments: an active-vision model which learns and readapts visual routines

    One of the main claims of the active vision framework is that finding data on the basis of task requirements is more efficient than reconstructing the whole scene by performing a complete visual scan. To be successful, this approach requires that agents learn visual routines to direct overt attention to locations containing the information needed to accomplish the task. In ecological conditions, learning such visual routines is difficult due to the partial observability of the world, changes in the environment, and the fact that learning signals may be indirect. This paper uses a reinforcement-learning actor-critic model to study how visual routines can be formed, and then adapted when the environment changes, in a system endowed with controllable gaze and reaching capabilities. The tests of the model show that: (a) the autonomously developed visual routines are strongly dependent on the task and the statistical properties of the environment; (b) when the statistics of the environment change, the performance of the system remains rather stable thanks to the re-use of previously discovered visual routines, while the visual exploration policy remains sub-optimal for a long time. We conclude that the model behaves robustly, but that acquiring an optimal visual exploration policy is particularly hard given its complex dependence on the statistical properties of the environment, which highlights another of the difficulties that adaptive active vision agents must face.
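
    To make the actor-critic formulation concrete, here is a toy tabular version in which actions are candidate fixation locations and the reward signals whether the fixated location contained the task-relevant cue. The environment statistics, learning rates and reward scheme are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_locations = 4
target_probs = np.array([0.1, 0.7, 0.1, 0.1])   # where the cue tends to appear

preferences = np.zeros(n_locations)              # actor: fixation preferences
value = 0.0                                      # critic: value of the task state
alpha_actor, alpha_critic = 0.1, 0.1

for trial in range(2000):
    # Softmax policy over candidate fixation locations.
    policy = np.exp(preferences - preferences.max())
    policy /= policy.sum()
    fixation = rng.choice(n_locations, p=policy)
    reward = float(rng.random() < target_probs[fixation])

    td_error = reward - value                    # one-step, undiscounted task
    value += alpha_critic * td_error             # critic update
    grad = -policy
    grad[fixation] += 1.0                        # d log pi(a) / d preferences
    preferences += alpha_actor * td_error * grad # actor update

print("learned fixation policy:", np.round(policy, 2))
```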

    Data-Efficient Learning of Semantic Segmentation

    Semantic segmentation is a fundamental problem in visual perception with a wide range of applications, from robotics to autonomous vehicles, and recent approaches based on deep learning have achieved excellent performance. However, training such systems generally requires very large datasets of annotated images. In this thesis we investigate and propose methods and setups for which it is possible to use unlabelled data to increase performance, or to use limited application-specific data to reduce the need for large datasets when learning semantic segmentation.

    In the first paper we study semantic video segmentation. We present a deep end-to-end trainable model that uses propagated labelling information in unlabelled frames, in addition to sparsely labelled frames, to predict semantic segmentation. Extensive experiments on the CityScapes and CamVid datasets show that the model can improve accuracy and temporal consistency by using extra unlabelled video frames in training and testing.

    In the second, third and fourth paper we study active learning for semantic segmentation in an embodied context where navigation is part of the problem. A navigable agent should explore a building and query for the labelling of informative views that increase the visual perception of the agent. In the second paper we introduce the embodied visual active learning problem, and propose and evaluate a range of methods, from heuristic baselines to a fully trainable agent using reinforcement learning (RL), on the Matterport3D dataset. We show that the learned agent outperforms several comparable pre-specified baselines. In the third paper we study the embodied visual active learning problem in a lifelong setup, where the visual learning spans the exploration of multiple buildings, and the learning in one scene should influence the active learning in the next, e.g. by not annotating already accurately segmented object classes. We introduce new methodology to encourage global exploration of scenes, via an RL formulation that combines local navigation with global frontier exploration. We show that the RL agent can learn adaptable behaviour, such as annotating less frequently when it has already explored a number of buildings. Finally, in the fourth paper we study the embodied visual active learning problem with region-based active learning: instead of querying for annotations of a whole image, an agent can query for annotations of just parts of images, and we show that it is significantly more labelling-efficient to annotate regions rather than full images.
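
    As a concrete illustration of region-based querying, the following sketch scores fixed-size regions of an image by the mean entropy of the network's per-pixel class posterior and selects the most uncertain ones for annotation. The region size, query budget and uncertainty measure are illustrative assumptions rather than the thesis's acquisition function.

```python
import numpy as np

def select_regions(probs, region=64, budget=4):
    """probs: (C, H, W) softmax output of a segmentation network."""
    # Per-pixel predictive entropy over the C classes.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=0)     # (H, W)
    scores = []
    for y in range(0, entropy.shape[0], region):
        for x in range(0, entropy.shape[1], region):
            patch = entropy[y:y + region, x:x + region]
            scores.append((patch.mean(), (y, x)))
    # Request annotation for the `budget` most uncertain regions.
    scores.sort(reverse=True)
    return [coords for _, coords in scores[:budget]]

# Toy usage with random class scores normalised into probabilities.
rng = np.random.default_rng(1)
logits = rng.normal(size=(5, 256, 256))
probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
print(select_regions(probs))       # top-entropy regions to send for labelling
```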

    A Portable Active Binocular Robot Vision Architecture for Scene Exploration

    We present a portable active binocular robot vision architecture that integrates a number of visual behaviours. This vision architecture inherits the abilities of vergence, localisation, recognition and simultaneous identification of multiple target object instances. To demonstrate the portability of our vision architecture, we carry out qualitative and comparative analyses under two different hardware robotic settings, feature extraction techniques and viewpoints. Our portable active binocular robot vision architecture achieved average recognition rates of 93.5% for fronto-parallel viewpoints and 83% for anthropomorphic viewpoints.
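
    Of the behaviours listed above, vergence is the simplest to sketch: a proportional controller turns the cameras until the horizontal disparity of the attended target vanishes. The gain, stopping criterion and simulated disparity measurement below are illustrative assumptions, not the architecture's actual controller.

```python
def verge(measure_disparity, vergence_angle, gain=0.3, steps=50, tol=1e-3):
    """Drive the vergence angle until the target's horizontal disparity is ~0."""
    for _ in range(steps):
        d = measure_disparity(vergence_angle)   # horizontal disparity of target
        if abs(d) < tol:
            break
        vergence_angle += gain * d              # turn the cameras toward the target
    return vergence_angle

# Simulated target: disparity is proportional to the vergence error.
target_angle = 5.0
print(verge(lambda a: target_angle - a, vergence_angle=0.0))
```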