
    Attention and Anticipation in Fast Visual-Inertial Navigation

    We study a Visual-Inertial Navigation (VIN) problem in which a robot needs to estimate its state using an on-board camera and an inertial sensor, without any prior knowledge of the external environment. We consider the case in which the robot can allocate limited resources to VIN, due to tight computational constraints. Therefore, we answer the following question: under limited resources, what are the most relevant visual cues to maximize the performance of visual-inertial navigation? Our approach has four key ingredients. First, it is task-driven, in that the selection of the visual cues is guided by a metric quantifying the VIN performance. Second, it exploits the notion of anticipation, since it uses a simplified model for forward-simulation of robot dynamics, predicting the utility of a set of visual cues over a future time horizon. Third, it is efficient and easy to implement, since it leads to a greedy algorithm for the selection of the most relevant visual cues. Fourth, it provides formal performance guarantees: we leverage submodularity to prove that the greedy selection cannot be far from the optimal (combinatorial) selection. Simulations and real experiments on agile drones show that our approach ensures state-of-the-art VIN performance while maintaining a lean processing time. In easy scenarios, our approach outperforms appearance-based feature selection in terms of localization errors. In the most challenging scenarios, it enables accurate visual-inertial navigation while appearance-based feature selection fails to track the robot's motion during aggressive maneuvers. (Comment: 20 pages, 7 figures, 2 tables)
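
    As a rough illustration of the greedy, submodularity-based selection described above, the Python sketch below (not taken from the paper) picks a budgeted subset of visual cues by maximizing a log-determinant information gain over a predicted information matrix; the choice of utility, the variable names and the matrix shapes are assumptions made for illustration only.

        # Illustrative sketch (not the paper's code): budgeted greedy selection of
        # visual cues using a log-det information-gain utility. Log-det gain is
        # monotone submodular, which is what yields the greedy near-optimality bound.
        import numpy as np

        def greedy_select(feature_jacobians, prior_info, budget):
            """Greedily pick up to `budget` cues maximizing the log-det of the
            information matrix predicted over the forward-simulated horizon."""
            info = prior_info.copy()
            selected = []
            remaining = list(range(len(feature_jacobians)))
            _, base = np.linalg.slogdet(info)
            for _ in range(min(budget, len(remaining))):
                gains = []
                for idx in remaining:
                    H = feature_jacobians[idx]          # measurement Jacobian of cue idx
                    _, cand = np.linalg.slogdet(info + H.T @ H)
                    gains.append(cand - base)
                best = remaining[int(np.argmax(gains))]
                H = feature_jacobians[best]
                info = info + H.T @ H                   # commit the selected cue
                _, base = np.linalg.slogdet(info)
                selected.append(best)
                remaining.remove(best)
            return selected

    Because the log-det gain is monotone submodular, the greedy subset is guaranteed to be within a constant factor (1 - 1/e) of the best subset of the same size, which is the flavour of guarantee the paper leverages.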

    On the assessment of landmark salience for human navigation

    In this paper, we propose a conceptual framework for assessing the salience of landmarks for navigation. Landmark salience is derived as a result of the observer's point of view, both physical and cognitive, the surrounding environment, and the objects contained therein. This is in contrast to the currently held view that salience is an inherent property of some spatial feature. Salience, in our approach, is expressed as a three-valued Saliency Vector. The components that determine this vector are Perceptual Salience, which defines the exogenous (or passive) potential of an object or region for acquisition of visual attention, Cognitive Salience, which is an endogenous (or active) mode of orienting attention, triggered by informative cues providing advance information about the target location, and Contextual Salience, which is tightly coupled to the modality and task to be performed. This separation between voluntary and involuntary direction of visual attention, depending on the context, allows defining a framework that accounts for the interaction between observer, environment, and landmark. We identify the low-level factors that contribute to each type of salience and suggest a probabilistic approach for their integration. Finally, we discuss the implications, consider restrictions, and explore the scope of the framework.
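
    As a purely illustrative aid (not the integration scheme proposed in the paper), the short Python sketch below shows one plausible way to fuse the three components of the Saliency Vector into a single score; the weighted-product rule, the independence assumption and the example values are all assumptions.

        # Illustrative sketch only: a weighted-product fusion of the three salience
        # components. The rule and weights are assumptions, not the paper's method.
        from dataclasses import dataclass

        @dataclass
        class SaliencyVector:
            perceptual: float   # exogenous (passive) potential to attract visual attention, in [0, 1]
            cognitive: float    # endogenous (active) orienting driven by advance cues, in [0, 1]
            contextual: float   # fit to the current modality and task, in [0, 1]

        def overall_salience(s, weights=(1.0, 1.0, 1.0)):
            """Weighted-product fusion, treating the three components as independent evidence."""
            wp, wc, wx = weights
            return (s.perceptual ** wp) * (s.cognitive ** wc) * (s.contextual ** wx)

        # A visually striking landmark that is irrelevant to the task scores low overall.
        print(overall_salience(SaliencyVector(perceptual=0.9, cognitive=0.4, contextual=0.1)))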

    The emergence of active perception - seeking conceptual foundations

    The aim of this thesis is to explain the emergence of active perception. It takes an interdisciplinary approach, by providing the necessary conceptual foundations for active perception research - the key notions that bridge the conceptual gaps remaining in understanding emergent behaviours of active perception in the context of robotic implementations. On the one hand, the autonomous agent approach to mobile robotics claims that perception is active. On the other hand, while explanations of emergence have been extensively pursued in Artificial Life, these explanations have not yet successfully accounted for active perception. The main question dealt with in this thesis is how active perception systems, as behaviour-based autonomous systems, are capable of providing relatively optimal perceptual guidance in response to environmental challenges, which are somewhat unpredictable. The answer is: task-level emergence on the grounds of complicatedly combined computational strategies, but this notion needs further explanation. To study the computational strategies undertaken in active perception research, the thesis surveys twelve implementations. On the basis of the surveyed implementations, discussions in this thesis show that the perceptual task executed in support of bodily actions does not arise from the intentionality of a homunculus, but is identified automatically on the basis of the dynamic small modules of particular robotic architectures. The identified tasks are accomplished by quasi-functional modules and quasi-action modules, which maintain transformations of perceptual inputs, compute critical variables, and provide guidance of sensory-motor movements to the most relevant positions for fetching further needed information. Given the nature of these modules, active perception emerges in a different fashion from the global behaviour seen in other autonomous agent research. The quasi-functional modules and quasi-action modules cooperate by estimating the internal cohesion of various sources of information in support of the envisaged task. Specifically, such modules basically reflect various computational facilities for a species to single out the most important characteristics of its ecological niche. These facilities help to achieve internal cohesion by maintaining a stepwise evaluation over the previously computed information, the required task, and the most relevant features presented in the environment. Apart from the above exposition of active perception, the process of task-level emergence is understood with certain principles extracted from four models of life origin. First, the fundamental structure of active perception is identified as stepwise computation. Second, stepwise computation is promoted from baseline to elaborate patterns, i.e. from a simple system to a combinatory system. Third, a core requirement for all stepwise computational processes is the comparison between collected and needed information in order to ensure the contribution to the required task. Interestingly, this point indicates that active perception has an inherent pragmatist dimension. The understanding of emergence in the present thesis goes beyond the distinction between external processes and internal representations, which some current philosophers argue is required to explain emergence. The additional factors are links of various knowledge sources, in which the role of conceptual foundations is two-fold. On the one hand, those conceptual foundations elucidate how various knowledge sources can be linked. On the other, they make possible an interdisciplinary view of emergence. Given this two-fold role, this thesis shows the unity of task-level emergence. Thus, the thesis demonstrates a cooperation between science and philosophy for the purpose of understanding the integrity of emergent cognitive phenomena.

    Scene understanding by robotic interactive perception

    This thesis presents a novel and generic visual architecture for scene understanding by robotic interactive perception. The proposed visual architecture is fully integrated into autonomous systems performing object perception and manipulation tasks. It uses interaction with the scene in order to improve scene understanding substantially over non-interactive models. Specifically, this thesis presents two experimental validations of an autonomous system interacting with the scene: firstly, an autonomous gaze control model is investigated, where the vision sensor directs its gaze to satisfy a scene exploration task; secondly, autonomous interactive perception is investigated, where objects in the scene are repositioned by robotic manipulation. The proposed visual architecture for scene understanding involving perception and manipulation tasks has four components: 1) a reliable vision system, 2) camera and hand-eye calibration to integrate the vision system into an autonomous robot's kinematic frame chain, 3) a visual model performing perception tasks and providing the knowledge required for interaction with the scene, and finally, 4) a manipulation model which, using knowledge received from the perception model, chooses an appropriate action (from a set of simple actions) to satisfy a manipulation task. This thesis presents contributions for each of the aforementioned components. Firstly, a portable active binocular robot vision architecture that integrates a number of visual behaviours is presented. This active vision architecture has the ability to verge, localise, recognise and simultaneously identify multiple target object instances. The portability and functional accuracy of the proposed vision architecture are demonstrated by carrying out both qualitative and comparative analyses using different robot hardware configurations, feature extraction techniques and scene perspectives. Secondly, a camera and hand-eye calibration methodology for integrating an active binocular robot head within a dual-arm robot is described. For this purpose, the forward kinematic model of the active robot head is derived and the methodology for calibrating and integrating the robot head is described in detail. A rigid calibration methodology has been implemented to provide a closed-form hand-to-eye calibration chain, and this has been extended with a mechanism that allows the camera external parameters to be updated dynamically for optimal 3D reconstruction, so as to meet the requirements of robotic tasks such as grasping and manipulating rigid and deformable objects. Experimental results show that the robot head achieves an overall accuracy of less than 0.3 millimetres while recovering the 3D structure of a scene. In addition, a comparative study between current RGB-D cameras and our active stereo head within two dual-arm robotic test-beds is reported, demonstrating the accuracy and portability of the proposed methodology. Thirdly, this thesis proposes a visual perception model for the task of category-wise object sorting, based on Gaussian Process (GP) classification, that is capable of recognising object categories from point cloud data. In this approach, Fast Point Feature Histogram (FPFH) features are extracted from point clouds to describe the local 3D shape of objects, and a Bag-of-Words coding method is used to obtain an object-level vocabulary representation. Multi-class Gaussian Process classification is employed to provide a probability estimate of the identity of the object and serves the key role of modelling perception confidence in the interactive perception cycle. The interaction stage is responsible for invoking the appropriate action skills as required to confirm the identity of an observed object with high confidence as a result of executing multiple perception-action cycles. The recognition accuracy of the proposed perception model has been validated on simulated input data using both Support Vector Machine (SVM) and GP-based multi-class classifiers. Results obtained during this investigation demonstrate that, by using a GP-based classifier, it is possible to obtain true positive classification rates of up to 80%. Experimental validation of the above semi-autonomous object sorting system shows that the proposed GP-based interactive sorting approach outperforms random sorting by up to 30% when applied to scenes comprising configurations of household objects. Finally, a fully autonomous visual architecture is presented that has been developed to accommodate manipulation skills, allowing an autonomous system to interact with the scene through object manipulation. This visual architecture consists of two main stages: 1) a perception stage, which is a modified version of the aforementioned visual interaction model, and 2) an interaction stage, which performs a set of ad-hoc actions relying on the information received from the perception stage. More specifically, the interaction stage reasons over the information (class label and associated probabilistic confidence score) received from the perception stage to choose one of two actions: 1) if an object class has been identified with high confidence, the object is removed from the scene and placed in the designated basket/bin for that class; 2) if an object class has been identified with lower probabilistic confidence then, inspired by the human behaviour of inspecting doubtful objects, an action is chosen to further investigate that object and confirm its identity by capturing more images from different views in isolation. The perception stage then processes these views, so that multiple perception-action/interaction cycles take place. From an application perspective, the task of autonomous category-based object sorting is performed and the experimental design for the task is described in detail.
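
    To make the interaction stage's decision rule concrete, the hypothetical Python sketch below gates the two actions on the classifier's confidence; the threshold value, function name and action labels are placeholders and are not taken from the thesis.

        # Hypothetical sketch of the confidence-gated decision in the
        # perception-interaction cycle; threshold and action names are placeholders.
        CONFIDENCE_THRESHOLD = 0.8  # assumed value, not one reported in the thesis

        def choose_action(class_label, confidence):
            """Map the classifier's (label, confidence) output to an interaction."""
            if confidence >= CONFIDENCE_THRESHOLD:
                # Trusted identity: remove the object and place it in its class bin.
                return ("place_in_bin", class_label)
            # Doubtful identity: isolate the object and capture views from new angles,
            # feeding them back to the perception stage for another cycle.
            return ("inspect_in_isolation", class_label)

        print(choose_action("mug", 0.92))   # ('place_in_bin', 'mug')
        print(choose_action("bowl", 0.55))  # ('inspect_in_isolation', 'bowl')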

    Sonar attentive underwater navigation in structured environment

    One of the fundamental requirements of a persistently autonomous underwater vehicle (AUV) is a robust navigation system. The success of most complex robotic tasks depends on the accuracy of a vehicle's navigation system. In its basic form, an AUV estimates its position using on-board navigation sensors through Dead Reckoning (DR). However, DR navigation systems tend to drift over time due to accumulated measurement errors. One way of mitigating this problem is Simultaneous Localization and Mapping (SLAM), which concurrently maps external environment features. The performance of a SLAM navigation system depends on the availability of enough good features in the environment. In contrast, a typical structured underwater environment (harbour, pier or oilfield) offers a limited number of sonar features in a limited set of locations, hence the exploitation of good features is key for effective underwater SLAM. This thesis develops a novel attentive sonar line-feature based SLAM framework that improves SLAM navigation performance by steering a multibeam sonar sensor, mounted on a pan-and-tilt unit, towards feature-rich regions of the environment. A sonar salience map is generated at each vehicle pose to identify highly informative and stable regions of the environment. Results from a simulated test and a real AUV experiment show that attentive SLAM performs better than its passive counterpart by repeatedly visiting good sonar landmarks.
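
    The attentive steering idea can be illustrated with the small Python sketch below, which points the pan-and-tilt sonar at the candidate direction with the highest predicted salience; the angle grid and the stand-in salience values are assumptions, not the thesis's actual salience-map computation.

        # Illustrative sketch: choose the sonar gaze direction with the highest
        # predicted salience at the current vehicle pose. Values are placeholders.
        import numpy as np

        def select_gaze(salience_map, pan_angles, tilt_angles):
            """salience_map[i, j]: predicted feature richness when looking towards
            (pan_angles[i], tilt_angles[j]) from the current pose."""
            i, j = np.unravel_index(np.argmax(salience_map), salience_map.shape)
            return pan_angles[i], tilt_angles[j]

        pans = np.linspace(-60.0, 60.0, 7)    # assumed pan range, degrees
        tilts = np.linspace(-20.0, 20.0, 5)   # assumed tilt range, degrees
        salience = np.random.rand(len(pans), len(tilts))  # stand-in for the real map
        print(select_gaze(salience, pans, tilts))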
