Deep Affordance-grounded Sensorimotor Object Recognition
It is well-established by cognitive neuroscience that human perception of
objects constitutes a complex process, where object appearance information is
combined with evidence about the so-called object "affordances", namely the
types of actions that humans typically perform when interacting with them. This
fact has recently motivated the "sensorimotor" approach to the challenging task
of automatic object recognition, where both information sources are fused to
improve robustness. In this work, the aforementioned paradigm is adopted,
surpassing current limitations of sensorimotor object recognition research.
Specifically, the deep learning paradigm is introduced to the problem for the
first time, developing a number of novel neuro-biologically and
neuro-physiologically inspired architectures that utilize state-of-the-art
neural networks for fusing the available information sources in multiple ways.
The proposed methods are evaluated using a large RGB-D corpus, which is
specifically collected for the task of sensorimotor object recognition and is
made publicly available. Experimental results demonstrate the utility of
affordance information for object recognition, achieving up to a 29% relative
error reduction from its inclusion.
Comment: 9 pages, 7 figures, dataset link included, accepted to CVPR 201
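Although the abstract only sketches the approach, the underlying idea of fusing an appearance classifier with an affordance classifier can be illustrated at its simplest, the score level. The following is a minimal toy sketch, not the paper's deep architectures; the fusion weight and all names are assumptions:

```python
# Illustrative sketch only: combine two per-class probability vectors
# (appearance stream and affordance stream) with a fixed weighted average,
# then predict the highest-scoring class. The weight w is an invented
# hyperparameter, not taken from the paper.

def fuse_scores(p_appearance, p_affordance, w=0.7):
    """Weighted score-level fusion of two classifiers' class probabilities."""
    assert len(p_appearance) == len(p_affordance)
    return [w * a + (1.0 - w) * b for a, b in zip(p_appearance, p_affordance)]

def predict(p_appearance, p_affordance, w=0.7):
    """Index of the class with the highest fused score."""
    fused = fuse_scores(p_appearance, p_affordance, w)
    return max(range(len(fused)), key=fused.__getitem__)
```

In this toy setting, an ambiguous appearance prediction can be tipped by a confident affordance cue, which is the intuition behind the reported error reduction.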
CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation
The classical human-robot interface in uncalibrated image-based visual
servoing (UIBVS) relies on either human annotations or semantic segmentation
with categorical labels. Both methods fail to match natural human communication
and convey rich semantics in manipulation tasks as effectively as natural
language expressions. In this paper, we tackle this problem by using referring
expression segmentation, which is a prompt-based approach, to provide more
in-depth information for robot perception. To generate high-quality
segmentation predictions from referring expressions, we propose CLIPUNetr - a
new CLIP-driven referring expression segmentation network. CLIPUNetr leverages
CLIP's strong vision-language representations to segment regions from referring
expressions, while utilizing its "U-shaped" encoder-decoder architecture to
generate predictions with sharper boundaries and finer structures. Furthermore,
we propose a new pipeline to integrate CLIPUNetr into UIBVS and apply it to
control robots in real-world environments. In experiments, our method improves
boundary and structure measurements by an average of 120% and can successfully
assist real-world UIBVS control in an unstructured manipulation environment.
Push to know! -- Visuo-Tactile based Active Object Parameter Inference with Dual Differentiable Filtering
For robotic systems to interact with objects in dynamic environments, it is
essential to perceive the physical properties of the objects such as shape,
friction coefficient, mass, center of mass, and inertia. This not only eases
selecting manipulation action but also ensures the task is performed as
desired. However, estimating the physical properties of objects, especially
novel ones, is a challenging problem using either vision or tactile sensing
alone. In this work, we propose a novel framework to estimate key object
parameters via non-prehensile manipulation with vision and tactile sensing. Our proposed
active dual differentiable filtering (ADDF) approach as part of our framework
learns the object-robot interaction during non-prehensile object push to infer
the object's parameters. Our proposed method enables the robotic system to
employ vision and tactile information to interactively explore a novel object
via non-prehensile object push. The proposed N-step active formulation within
the differentiable filtering selects the next best exploratory push actions
(where to push, and how to push), facilitating efficient learning of the
object-robot interaction model during inference. We extensively
evaluated our framework in simulation and real-robotic scenarios, yielding
superior performance to the state-of-the-art baseline.Comment: 8 pages. Accepted at IROS 202
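The abstract's core idea, choosing the next push that is expected to be most informative about the object's parameters, can be reduced to a toy example. The sketch below is not ADDF itself: it shrinks the belief to a scalar Kalman filter over one hypothetical parameter (e.g. friction) and greedily picks from invented candidate pushes by expected posterior variance:

```python
# Hedged toy sketch of active action selection in a filtering loop.
# Belief over a single object parameter is Gaussian with some variance;
# each candidate push yields a measurement with its own noise variance.
# We choose the push that minimizes the variance remaining after the
# scalar Kalman update. All numbers here are illustrative assumptions.

def expected_posterior_variance(prior_var, obs_noise_var):
    """Variance left after a scalar Kalman update with the given noise."""
    return prior_var * obs_noise_var / (prior_var + obs_noise_var)

def select_push(prior_var, candidate_noise_vars):
    """Index of the candidate push that shrinks the belief the most."""
    return min(
        range(len(candidate_noise_vars)),
        key=lambda i: expected_posterior_variance(prior_var, candidate_noise_vars[i]),
    )
```

Here "where to push, and how to push" collapses to picking the candidate with the lowest effective measurement noise; the real method learns this interaction model with differentiable filtering over N steps.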
THE POTENTIATION OF ACTIONS BY VISUAL OBJECTS
This thesis examines the relation between visual objects and the actions they afford. It
is proposed that viewing an object results in the potentiation of the actions that can be made
towards it. The proposal is consistent with neurophysiological evidence that suggests that
no clear divide exists between visual and motor representation in the dorsal visual pathway,
a processing stream that neuropsychological evidence strongly implicates in the visual
control of actions. The experimental work presented examines motor system involvement
in visual representation when no intention to perform a particular action is present. It is
argued that the representation of action-relevant visual object properties, such as size and
orientation, has a motor component. Thus representing the location of a graspable object
involves representations of the motor commands necessary to bring the hand to the object.
The proposal was examined in a series of eight experiments that employed a Stimulus-
Response Compatibility paradigm in which the relation between responses and stimulus
properties was never made explicit. Subjects had to make choice reaction time responses
that mimicked a component of an action that a viewed object afforded. The action-relevant
stimulus property was always irrelevant to response determination and consisted of
components of the reach and grasp movement. The results found are not consistent with
explanations based on the abstract coding of stimulus-response properties and strongly
implicate the involvement of the action system. They provide evidence that merely viewing
an object results in the activation of the motor patterns necessary to interact with it.
The actions an object affords are an intrinsic part of its visual representation, not merely on
account of the association between objects and familiar actions but because the motor
system is directly involved in the representation of visuo-spatial object properties.
Perceptual global processing and hierarchically organized affordances : the lack of interaction between vision-for-perception and vision-for-action
In visual information processing, two kinds of vision are distinguished: vision-for-perception, related to the
conscious identification of objects, and vision-for-action, which deals with the visual control of movements. Neuroscience
suggests that these two functions are performed by two separate brain neural systems, the ventral and dorsal pathways
(Milner and Goodale, 1995). Two experiments using behavioural measures were conducted with the objective of exploring
any potential interaction between these two functions of vision. The aim was to combine in one task methods allowing
for the simultaneous capture of both perceptual global processing and affordance extraction, and to check whether they
influence each other. This aim was achieved by employing the paradigms of Navon (1977) and Tucker and Ellis (1998).
A compound figure was created, made up of objects with handles that might or might not have orientation congruent
between levels. The results revealed that while the affordance effect occurred every time, the Navon effect appeared only
when subjects focused their attention on the object elements responsible for inconsistency within the compound figure. Most
importantly, even when these two effects occurred at once, they had no effect on each other. The results failed
to confirm the hypothesis about interaction and give support to the view that vision-for-perception and vision-for-action
tend to act as separate systems.
Towards Contextual Action Recognition and Target Localization with Active Allocation of Attention
Exploratory gaze movements are fundamental for gathering the most relevant information regarding the partner during social interactions. We have designed and implemented a system for dynamic attention allocation which is able to actively control gaze movements during a visual action recognition task. During the observation of a partner's reaching movement, the robot is able to contextually estimate the goal position of the partner's hand and the location in space of the candidate targets, while moving its gaze around with the purpose of optimizing the gathering of information relevant for the task. Experimental results on a simulated environment show that active gaze control provides a relevant advantage with respect to typical passive observation, both in terms of estimation precision and of time required for action recognition. © 2012 Springer-Verlag
ShapeStacks: Learning Vision-Based Physical Intuition for Generalised Object Stacking
Physical intuition is pivotal for intelligent agents to perform complex
tasks. In this paper we investigate the passive acquisition of an intuitive
understanding of physical principles as well as the active utilisation of this
intuition in the context of generalised object stacking. To this end, we
provide ShapeStacks: a simulation-based dataset featuring 20,000 stack
configurations composed of a variety of elementary geometric primitives,
richly annotated with semantics and structural stability. We train visual classifiers for
binary stability prediction on the ShapeStacks data and scrutinise their
learned physical intuition. Due to the richness of the training data our
approach also generalises favourably to real-world scenarios achieving
state-of-the-art stability prediction on a publicly available benchmark of
block towers. We then leverage the physical intuition learned by our model to
actively construct stable stacks and observe the emergence of an intuitive
notion of stackability - an inherent object affordance - induced by the active
stacking task. Our approach performs well even in challenging conditions where
it considerably exceeds the stack height observed during training or in cases
where initially unstable structures must be stabilised via counterbalancing.
Comment: revised version to appear at ECCV 201
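The structural-stability labels that the abstract mentions follow from elementary statics. As a hedged toy, not the ShapeStacks annotation pipeline, a 1-D stack of unit-mass blocks is stable exactly when, for every block, the center of mass of everything above it lies over that block's support:

```python
# Toy 1-D stability check for illustration only. A stack is given bottom
# to top as (x_center, width) pairs with unit mass per block. The stack
# is stable iff, for each block, the combined center of mass of all
# blocks above it falls within that block's horizontal extent.

def stack_is_stable(blocks):
    for i in range(len(blocks) - 1):
        above = blocks[i + 1:]
        com = sum(x for x, _ in above) / len(above)  # unit masses
        x, w = blocks[i]
        if not (x - w / 2 <= com <= x + w / 2):
            return False
    return True
```

A counterbalancing move in this toy world is simply placing a new block so that the updated center of mass of the overhanging part is pulled back over the support.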
The neural bases of event monitoring across domains: a simultaneous ERP-fMRI study.
The ability to check and evaluate the environment over time with the aim of detecting the occurrence of target stimuli is supported by sustained/tonic as well as transient/phasic control processes, which overall might be referred to as event monitoring. The neural underpinning of sustained control processes involves a fronto-parietal network. However, it has not yet been well defined whether this cortical circuit acts irrespective of the specific material to be monitored and whether it mediates sustained as well as transient monitoring processes. In the current study, the functional activity of the brain during an event monitoring task was investigated and compared between two cognitive domains whose processing is mediated by differently lateralized areas. Namely, participants were asked to monitor sequences of either faces (supported by right-hemisphere regions) or tools (left-hemisphere). In order to disentangle sustained from transient components of monitoring, a simultaneous EEG-fMRI technique was adopted within a block design. When contrasting monitoring versus control blocks, the conventional fMRI analysis revealed the sustained involvement of bilateral fronto-parietal regions in both task domains. Event-related potentials (ERPs) showed a more positive amplitude over frontal sites in monitoring compared to control blocks, providing evidence of a transient monitoring component. The joint ERP-fMRI analysis showed that, in the case of face monitoring, these transient processes rely on right-lateralized areas, including the inferior parietal lobule and the middle frontal gyrus. In the case of tools, no fronto-parietal areas correlated with the transient ERP activity, suggesting that in this domain phasic monitoring processes were masked by tonic ones.
Overall, the present findings highlight the role of bilateral fronto-parietal regions in sustained monitoring, independently of the specific task requirements, and suggest that right-lateralized areas subtend transient monitoring processes, at least in some task contexts.
The cognitive representation of action: modulation effects between action and perception as mediated by event coding
Gaspare Galati, Teresa Scalisi, Pierluigi Zoccolott