493 research outputs found

    Deep Affordance-grounded Sensorimotor Object Recognition

    It is well-established by cognitive neuroscience that human perception of objects constitutes a complex process, where object appearance information is combined with evidence about the so-called object "affordances", namely the types of actions that humans typically perform when interacting with them. This fact has recently motivated the "sensorimotor" approach to the challenging task of automatic object recognition, where both information sources are fused to improve robustness. In this work, the aforementioned paradigm is adopted, surpassing current limitations of sensorimotor object recognition research. Specifically, the deep learning paradigm is introduced to the problem for the first time, developing a number of novel neuro-biologically and neuro-physiologically inspired architectures that utilize state-of-the-art neural networks for fusing the available information sources in multiple ways. The proposed methods are evaluated using a large RGB-D corpus, which is specifically collected for the task of sensorimotor object recognition and is made publicly available. Experimental results demonstrate the utility of affordance information for object recognition, achieving up to a 29% relative error reduction by its inclusion.
    Comment: 9 pages, 7 figures, dataset link included, accepted to CVPR 201
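
    As a rough illustration of the two-stream fusion idea described in this abstract, the sketch below encodes an appearance input and an affordance-related input separately and fuses them by concatenation before classification. It is a minimal PyTorch sketch with hypothetical layer choices and dimensions, not the architecture from the paper.

```python
# Minimal sketch of appearance/affordance fusion for object recognition.
# Hypothetical dimensions and layers; not the architecture from the paper.
import torch
import torch.nn as nn

class SensorimotorFusionNet(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 256):
        super().__init__()
        # Appearance stream: encodes an RGB crop of the object.
        self.appearance = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Affordance stream: encodes a feature vector summarising the
        # observed hand-object interaction (e.g. pooled motion features).
        self.affordance = nn.Sequential(
            nn.Linear(128, feat_dim), nn.ReLU(),
        )
        # Late fusion by concatenation, followed by a classifier.
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb, affordance_feat):
        a = self.appearance(rgb)
        b = self.affordance(affordance_feat)
        return self.classifier(torch.cat([a, b], dim=1))

# Example forward pass with random tensors standing in for real data.
model = SensorimotorFusionNet(num_classes=10)
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 10])
```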

    CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation

    The classical human-robot interface in uncalibrated image-based visual servoing (UIBVS) relies on either human annotations or semantic segmentation with categorical labels. Both methods fail to match natural human communication and do not convey rich semantics in manipulation tasks as effectively as natural language expressions. In this paper, we tackle this problem by using referring expression segmentation, which is a prompt-based approach, to provide more in-depth information for robot perception. To generate high-quality segmentation predictions from referring expressions, we propose CLIPUNetr, a new CLIP-driven referring expression segmentation network. CLIPUNetr leverages CLIP's strong vision-language representations to segment regions from referring expressions, while utilizing its "U-shaped" encoder-decoder architecture to generate predictions with sharper boundaries and finer structures. Furthermore, we propose a new pipeline to integrate CLIPUNetr into UIBVS and apply it to control robots in real-world environments. In experiments, our method improves boundary and structure measurements by an average of 120% and can successfully assist real-world UIBVS control in an unstructured manipulation environment.
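
    The general idea of conditioning a U-shaped decoder on image and text embeddings can be sketched as below. The feature maps and sentence embedding here stand in for outputs of pretrained vision-language encoders (e.g. CLIP); the module itself is a hypothetical illustration, not the CLIPUNetr architecture.

```python
# Sketch of a U-shaped-style decoder conditioned on image and text embeddings,
# in the spirit of referring expression segmentation. The input tensors stand
# in for features from pretrained encoders; this is not CLIPUNetr itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextConditionedDecoder(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512):
        super().__init__()
        self.fuse = nn.Conv2d(img_dim + txt_dim, 256, 1)
        self.up1 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.head = nn.Conv2d(64, 1, 1)  # per-pixel mask logit

    def forward(self, img_feat_map, txt_embed):
        # Broadcast the sentence embedding over spatial positions and fuse.
        b, _, h, w = img_feat_map.shape
        txt = txt_embed[:, :, None, None].expand(b, -1, h, w)
        x = F.relu(self.fuse(torch.cat([img_feat_map, txt], dim=1)))
        x = F.relu(self.up1(x))
        x = F.relu(self.up2(x))
        return self.head(x)

# Random tensors stand in for CLIP-style image features and a text embedding.
decoder = TextConditionedDecoder()
mask_logits = decoder(torch.randn(1, 512, 16, 16), torch.randn(1, 512))
print(mask_logits.shape)  # torch.Size([1, 1, 64, 64])
```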

    Push to know! -- Visuo-Tactile based Active Object Parameter Inference with Dual Differentiable Filtering

    For robotic systems to interact with objects in dynamic environments, it is essential to perceive the physical properties of the objects, such as shape, friction coefficient, mass, center of mass, and inertia. This not only eases the selection of manipulation actions but also ensures the task is performed as desired. However, estimating the physical properties of novel objects in particular is a challenging problem, whether using vision or tactile sensing. In this work, we propose a novel framework to estimate key object parameters through non-prehensile manipulation with vision and tactile sensing. Our proposed active dual differentiable filtering (ADDF) approach, as part of this framework, learns the object-robot interaction during non-prehensile object pushes to infer the object's parameters. The method enables the robotic system to employ vision and tactile information to interactively explore a novel object via non-prehensile pushes. The proposed N-step active formulation within the differentiable filtering facilitates efficient learning of the object-robot interaction model and, during inference, selection of the next best exploratory push actions (where to push, and how to push). We extensively evaluated our framework in simulation and real-robot scenarios, yielding superior performance to the state-of-the-art baseline.
    Comment: 8 pages. Accepted at IROS 202
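
    The notion of choosing "where and how to push" next can be illustrated with a toy active-exploration loop: maintain a Gaussian belief over object parameters, evaluate candidate pushes under assumed measurement models, and pick the push expected to shrink uncertainty the most. The sketch below uses a plain Kalman-style update with made-up linear models; it is not the ADDF algorithm from the paper.

```python
# Toy illustration of active push selection: keep a Gaussian belief over
# object parameters and pick the push whose (linear) measurement model is
# expected to reduce posterior uncertainty the most. Models are made up.
import numpy as np

def posterior_cov(P, H, R):
    """Covariance after a linear-Gaussian measurement update."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return (np.eye(P.shape[0]) - K @ H) @ P

# Belief over [mass, friction coefficient]: covariance of the current estimate.
P = np.diag([1.0, 0.5])

# Each candidate push direction has its own (hypothetical) observation model,
# i.e. how informative the resulting motion is about each parameter.
candidate_pushes = {
    "push_left":  np.array([[1.0, 0.1]]),
    "push_right": np.array([[0.2, 1.0]]),
    "push_top":   np.array([[0.6, 0.6]]),
}
R = np.array([[0.2]])  # measurement noise

# Choose the push that minimises expected posterior uncertainty (trace).
best = min(candidate_pushes,
           key=lambda a: np.trace(posterior_cov(P, candidate_pushes[a], R)))
print("next best exploratory push:", best)
```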

    The Potentiation of Actions by Visual Objects

    This thesis examines the relation between visual objects and the actions they afford. It is proposed that viewing an object results in the potentiation of the actions that can be made towards it. The proposal is consistent with neurophysiological evidence that suggests that no clear divide exists between visual and motor representation in the dorsal visual pathway, a processing stream that neuropsychological evidence strongly implicates in the visual control of actions. The experimental work presented examines motor system involvement in visual representation when no intention to perform a particular action is present. It is argued that the representation of action-relevant visual object properties, such as size and orientation, has a motor component. Thus representing the location of a graspable object involves representations of the motor commands necessary to bring the hand to the object. The proposal was examined in a series of eight experiments that employed a Stimulus-Response Compatibility paradigm in which the relation between responses and stimulus properties was never made explicit. Subjects had to make choice reaction time responses that mimicked a component of an action that a viewed object afforded. The action-relevant stimulus property was always irrelevant to response determination and consisted of components of the reach and grasp movement. The results are not consistent with explanations based on the abstract coding of stimulus-response properties and strongly implicate the involvement of the action system. They provide evidence that merely viewing an object results in the activation of the motor patterns necessary to interact with it. The actions an object affords are an intrinsic part of its visual representation, not merely on account of the association between objects and familiar actions but because the motor system is directly involved in the representation of visuo-spatial object properties.
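
    As a worked example of how a Stimulus-Response Compatibility effect of this kind is typically quantified, the snippet below compares mean reaction times for compatible versus incompatible trials. The data are fabricated for illustration and are not results from the thesis.

```python
# Illustrative computation of a compatibility effect on reaction times
# (fabricated data, not results from the thesis).
import statistics

# Each trial: (condition, reaction time in ms). "compatible" means the
# irrelevant object property (e.g. handle orientation) matches the hand
# used to respond.
trials = [
    ("compatible", 512), ("compatible", 498), ("compatible", 505),
    ("incompatible", 547), ("incompatible", 561), ("incompatible", 539),
]

mean_rt = {
    cond: statistics.mean(rt for c, rt in trials if c == cond)
    for cond in ("compatible", "incompatible")
}
effect = mean_rt["incompatible"] - mean_rt["compatible"]
print(mean_rt, f"compatibility effect = {effect:.1f} ms")
```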

    Perceptual global processing and hierarchically organized affordances : the lack of interaction between vision-for-perception and vision-for-action

    In visual information processing, two kinds of vision are distinguished: vision-for-perception, related to the conscious identification of objects, and vision-for-action, which deals with the visual control of movements. Neuroscience suggests that these two functions are performed by two separate brain neural systems, the ventral and dorsal pathways (Milner and Goodale, 1995). Two experiments using behavioural measures were conducted with the objective of exploring any potential interaction between these two functions of vision. The aim was to combine in one task methods allowing for the simultaneous capture of both perceptual global processing and affordance extraction, and to check whether they influence each other. This aim was achieved by employing the paradigms of Navon (1977) and Tucker and Ellis (1998). A compound figure was created, made up of objects with handles whose orientation might or might not be congruent between levels. The results revealed that while the affordance effect occurred every time, the Navon effect appeared only when subjects focused their attention on the object elements responsible for the inconsistency within the compound figure. Most importantly, even when these two effects occurred at once, they had no effect on each other. Results from the study failed to confirm the hypothesis of an interaction and support the view that vision-for-perception and vision-for-action tend to act as separate systems.
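
    The key claim of "no interaction" between the two effects can be illustrated by checking whether the affordance effect is the same size within each Navon-congruency condition, i.e. whether the two factors are additive. The cell means below are fabricated, not data from the study.

```python
# Toy check for an interaction between two congruency factors
# (Navon global/local congruency x affordance compatibility),
# using fabricated cell means rather than data from the study.
cell_means = {  # mean RT (ms) per condition
    ("navon_congruent", "afford_compatible"): 500,
    ("navon_congruent", "afford_incompatible"): 540,
    ("navon_incongruent", "afford_compatible"): 520,
    ("navon_incongruent", "afford_incompatible"): 560,
}

# The affordance effect within each Navon condition; equal differences
# mean the two effects are additive, i.e. no interaction.
effect_congruent = (cell_means[("navon_congruent", "afford_incompatible")]
                    - cell_means[("navon_congruent", "afford_compatible")])
effect_incongruent = (cell_means[("navon_incongruent", "afford_incompatible")]
                      - cell_means[("navon_incongruent", "afford_compatible")])
interaction = effect_incongruent - effect_congruent
print(effect_congruent, effect_incongruent, "interaction contrast:", interaction)
```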

    Towards Contextual Action Recognition and Target Localization with Active Allocation of Attention

    Exploratory gaze movements are fundamental for gathering the most relevant information about a partner during social interactions. We have designed and implemented a system for dynamic attention allocation which is able to actively control gaze movements during a visual action recognition task. During the observation of a partner's reaching movement, the robot is able to contextually estimate the goal position of the partner's hand and the location in space of the candidate targets, while moving its gaze around with the purpose of optimizing the gathering of information relevant to the task. Experimental results in a simulated environment show that active gaze control provides a relevant advantage with respect to typical passive observation, both in terms of estimation precision and of the time required for action recognition. © 2012 Springer-Verlag
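
    One common way to formalise this kind of active gaze allocation is to pick the next fixation that minimises the expected entropy of the belief over candidate targets. The sketch below does this for a hypothetical three-target scene with made-up observation models; it is illustrative only, not the controller from the paper.

```python
# Toy active gaze allocation: choose the fixation that minimises the
# expected entropy of the belief over candidate targets. Numbers are
# fabricated; this is not the system described in the paper.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

belief = np.array([0.5, 0.3, 0.2])  # P(goal = target i)

# likelihoods[f][o, i] = P(observation o | goal is target i, fixation f).
# Fixating near a target makes observations about it more reliable.
likelihoods = {
    "fix_target_0": np.array([[0.9, 0.2, 0.2], [0.1, 0.8, 0.8]]),
    "fix_target_1": np.array([[0.2, 0.9, 0.2], [0.8, 0.1, 0.8]]),
    "fix_midpoint": np.array([[0.6, 0.6, 0.3], [0.4, 0.4, 0.7]]),
}

def expected_posterior_entropy(L, prior):
    total = 0.0
    for o in range(L.shape[0]):
        joint = L[o] * prior
        p_o = joint.sum()
        if p_o > 0:
            total += p_o * entropy(joint / p_o)
    return total

best = min(likelihoods,
           key=lambda f: expected_posterior_entropy(likelihoods[f], belief))
print("next fixation:", best)
```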

    ShapeStacks: Learning Vision-Based Physical Intuition for Generalised Object Stacking

    Physical intuition is pivotal for intelligent agents to perform complex tasks. In this paper we investigate the passive acquisition of an intuitive understanding of physical principles, as well as the active utilisation of this intuition, in the context of generalised object stacking. To this end, we provide a simulation-based dataset featuring 20,000 stack configurations composed of a variety of elementary geometric primitives, richly annotated regarding semantics and structural stability. We train visual classifiers for binary stability prediction on the ShapeStacks data and scrutinise their learned physical intuition. Due to the richness of the training data, our approach also generalises favourably to real-world scenarios, achieving state-of-the-art stability prediction on a publicly available benchmark of block towers. We then leverage the physical intuition learned by our model to actively construct stable stacks and observe the emergence of an intuitive notion of stackability (an inherent object affordance) induced by the active stacking task. Our approach performs well even in challenging conditions where it considerably exceeds the stack height observed during training or where initially unstable structures must be stabilised via counterbalancing.
    Comment: revised version to appear at ECCV 201
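
    Binary stability prediction of the kind described here amounts to training an image classifier whose single output is the probability that a rendered stack will remain standing. The following is a minimal PyTorch sketch with hypothetical layers and random stand-in data, not the models trained on ShapeStacks in the paper.

```python
# Minimal sketch of a binary "is this stack stable?" classifier over
# rendered stack images. Hypothetical layers and fabricated data.
import torch
import torch.nn as nn

stability_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1),  # single logit: P(stable) after a sigmoid
)

images = torch.randn(8, 3, 128, 128)          # stand-in for rendered stacks
labels = torch.randint(0, 2, (8, 1)).float()  # 1 = stable, 0 = unstable
loss = nn.BCEWithLogitsLoss()(stability_net(images), labels)
loss.backward()  # one illustrative backward pass (optimizer step omitted)
print(float(loss))
```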

    The neural bases of event monitoring across domains: a simultaneous ERP-fMRI study.

    The ability to check and evaluate the environment over time with the aim of detecting the occurrence of target stimuli is supported by sustained/tonic as well as transient/phasic control processes, which overall might be referred to as event monitoring. The neural underpinning of sustained control processes involves a fronto-parietal network. However, it has not yet been well defined whether this cortical circuit acts irrespective of the specific material to be monitored and whether it mediates sustained as well as transient monitoring processes. In the current study, the functional activity of the brain during an event monitoring task was investigated and compared between two cognitive domains whose processing is mediated by differently lateralized areas. Namely, participants were asked to monitor sequences of either faces (supported by right-hemisphere regions) or tools (left-hemisphere). In order to disentangle sustained from transient components of monitoring, a simultaneous EEG-fMRI technique was adopted within a block design. When contrasting monitoring versus control blocks, the conventional fMRI analysis revealed the sustained involvement of bilateral fronto-parietal regions in both task domains. Event-related potentials (ERPs) showed a more positive amplitude over frontal sites in monitoring compared to control blocks, providing evidence of a transient monitoring component. The joint ERP-fMRI analysis showed that, in the case of face monitoring, these transient processes rely on right-lateralized areas, including the inferior parietal lobule and the middle frontal gyrus. In the case of tools, no fronto-parietal areas correlated with the transient ERP activity, suggesting that in this domain phasic monitoring processes were masked by tonic ones. Overall, the present findings highlight the role of bilateral fronto-parietal regions in sustained monitoring, independently of the specific task requirements, and suggest that right-lateralized areas subtend transient monitoring processes, at least in some task contexts.
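
    A simple version of the kind of joint ERP-fMRI step mentioned above is to relate, across subjects, the frontal ERP difference between monitoring and control blocks to the BOLD difference in a candidate region. The numbers below are fabricated and the snippet is only a schematic illustration, not the actual analysis pipeline of the study.

```python
# Illustrative joint ERP-fMRI step: correlate, across subjects, the frontal
# ERP difference (monitoring minus control) with the BOLD difference in a
# candidate region. All values are fabricated.
import numpy as np

erp_diff = np.array([1.2, 0.8, 1.5, 0.4, 1.1, 0.9])        # microvolts
bold_diff = np.array([0.30, 0.22, 0.41, 0.10, 0.28, 0.25])  # % signal change

r = np.corrcoef(erp_diff, bold_diff)[0, 1]
print(f"ERP-BOLD correlation across subjects: r = {r:.2f}")
```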