Embodied learning for visual recognition
The field of visual recognition in recent years has come to rely on large, expensively curated, and manually labeled "bags of disembodied images". In the wake of this, my focus has been on understanding and exploiting alternate "free" sources of supervision available to visual learning agents situated within real environments. For example, even simply moving from orderless image collections to continuous visual observations offers opportunities to understand the dynamics and other physical properties of the visual world. Further, embodied agents may be able to move around their environment and/or effect changes within it, in which case these abilities offer new means to acquire useful supervision. In this dissertation, I present my work along these and related directions.
SeekNet: Improved Human Instance Segmentation via Reinforcement Learning Based Optimized Robot Relocation
Amodal recognition is the ability of a system to detect occluded objects. Most state-of-the-art visual recognition systems lack the ability to perform amodal recognition. A few studies have achieved amodal recognition through passive prediction or embodied recognition approaches; however, these approaches face challenges in real-world applications, such as dynamic objects. We propose SeekNet, an improved optimization method for amodal recognition through embodied visual recognition. Additionally, we implement SeekNet for social robots, which interact repeatedly with humans in crowded environments. Hence, we focus on occluded human detection and tracking and showcase the superiority of our algorithm over other baselines. We also experiment with using SeekNet to improve the confidence of COVID-19 symptom pre-screening algorithms via our efficient embodied recognition system.
Bridging Between Computer and Robot Vision Through Data Augmentation: A Case Study on Object Recognition
Despite the impressive progress brought by deep networks in visual object recognition, robot vision is still far from being a solved problem. The most successful convolutional architectures are developed starting from ImageNet, a large-scale collection of images of object categories downloaded from the Web. These images are very different from the situated and embodied visual experience of robots deployed in unconstrained settings. To reduce the gap between these two visual experiences, this paper proposes a simple yet effective data augmentation layer that zooms on the object of interest and simulates the object detection outcome of a robot vision system. The layer, which can be used with any convolutional deep architecture, yields an increase in object recognition performance of up to 7% in experiments performed over three different benchmark databases. An implementation of our robot data augmentation layer has been made publicly available.
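The zoom-on-object augmentation described above can be sketched as a simple crop with a margin around the detected box. The function name, box format, and margin parameter below are illustrative assumptions, not the paper's released implementation:

```python
import numpy as np

def zoom_on_object(image, box, margin=0.2):
    """Crop a region around an object box (x0, y0, x1, y1), enlarged by a
    relative margin, to simulate the crop a robot's object detector would
    hand to a recognition network. `image` is an HxWxC numpy array."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    mx = int((x1 - x0) * margin)  # horizontal margin in pixels
    my = int((y1 - y0) * margin)  # vertical margin in pixels
    # Clamp the enlarged box to the image bounds before slicing.
    cx0, cy0 = max(0, x0 - mx), max(0, y0 - my)
    cx1, cy1 = min(w, x1 + mx), min(h, y1 + my)
    return image[cy0:cy1, cx0:cx1]
```

In training, such a crop would typically be resized back to the network's input resolution before being fed to the convolutional architecture.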
Transcranial magnetic stimulation disrupts the perception and embodiment of facial expressions
Copyright © 2008 Society for Neuroscience and the authors. The Journal of Neuroscience uses a Creative Commons Attribution-NonCommercial-ShareAlike licence: http://creativecommons.org/licenses/by-nc-sa/4.0/.

Theories of embodied cognition propose that recognizing facial expressions requires visual processing followed by simulation of the somatovisceral responses associated with the perceived expression. To test this proposal, we targeted the right occipital face area (rOFA) and the face region of right somatosensory cortex (rSC) with repetitive transcranial magnetic stimulation (rTMS) while participants discriminated facial expressions. rTMS selectively impaired discrimination of facial expressions at both sites but had no effect on a matched face identity task. Site specificity within the rSC was demonstrated by targeting rTMS at the face and finger regions while participants performed the expression discrimination task. rTMS targeted at the face region impaired task performance relative to rTMS targeted at the finger region. To establish the temporal course of visual and somatosensory contributions to expression processing, double-pulse TMS was delivered at different times to rOFA and rSC during expression discrimination. Accuracy dropped when pulses were delivered at 60–100 ms at rOFA and at 100–140 and 130–170 ms at rSC. These sequential impairments at rOFA and rSC support embodied accounts of expression recognition as well as hierarchical models of face processing. The results also demonstrate that nonvisual cortical areas contribute during early stages of expression processing.
Multimodal Speech Recognition for Language-Guided Embodied Agents
Benchmarks for language-guided embodied agents typically assume text-based instructions, but deployed agents will encounter spoken instructions. While Automatic Speech Recognition (ASR) models can bridge the input gap, erroneous ASR transcripts can hurt the agents' ability to complete tasks. In this work, we propose training a multimodal ASR model to reduce errors in transcribing spoken instructions by considering the accompanying visual context. We train our model on a dataset of spoken instructions, synthesized from the ALFRED task completion dataset, where we simulate acoustic noise by systematically masking spoken words. We find that utilizing visual observations facilitates masked word recovery, with multimodal ASR models recovering up to 30% more masked words than unimodal baselines. We also find that a text-trained embodied agent successfully completes tasks more often when following transcribed instructions from multimodal ASR models. Code: github.com/Cylumn/embodied-multimodal-asr

Comment: 5 pages, 5 figures, 24th ISCA Interspeech Conference (INTERSPEECH 2023).
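The "systematically masking spoken words" setup above can be sketched as a transcript-corruption step. The function name, mask token, masking rate, and use of uniform random word selection are assumptions for illustration; the paper's actual masking scheme may differ:

```python
import random

def mask_words(transcript, mask_rate=0.3, mask_token="<mask>", seed=0):
    """Replace a fraction of the words in a spoken-instruction transcript
    with a mask token, simulating acoustic noise that renders those words
    unrecognizable to an ASR model."""
    rng = random.Random(seed)  # fixed seed keeps the corruption reproducible
    words = transcript.split()
    n_mask = max(1, int(len(words) * mask_rate))
    masked_idx = set(rng.sample(range(len(words)), n_mask))
    return " ".join(mask_token if i in masked_idx else w
                    for i, w in enumerate(words))
```

A multimodal ASR model would then be trained to recover the masked words from the remaining text plus the accompanying visual observation.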
An embodied model for handwritten digits recognition in a cognitive robot
This paper presents an embodied model for recognition of handwritten digits in a cognitive developmental robot scenario. Inspired by neuro-psychological data, the model integrates three modules: a stacked auto-encoder network to process the visual information, a feedforward neural controller for the fingers, and a generalized regression network that associates number digits to finger configurations. Results from developmental learning experiments show an improvement in the digits' recognition rate thanks to the inclusion of the robot fingers in the training, especially in its early stages (epochs) or with a low number of examples. This behaviour can be linked to that observed in psychological studies with children, who seem to benefit from finger counting only in the initial stage of mathematical learning. These results suggest the potential of the embodied approach to favour the creation of a psychologically plausible developmental model for mathematical cognition in robots and to support the creation of more complex models of human-like behaviours.
The Whole World in Your Hand: Active and Interactive Segmentation
Object segmentation is a fundamental problem in computer vision and a powerful resource for development. This paper presents three embodied approaches to the visual segmentation of objects. Each approach to segmentation is aided by the presence of a hand or arm in the proximity of the object to be segmented. The first approach is suitable for a robotic system, where the robot can use its arm to evoke object motion. The second method operates on a wearable system, viewing the world from a human's perspective, with instrumentation to help detect and segment objects that are held in the wearer's hand. The third method operates when observing a human teacher, locating periodic motion (finger/arm/object waving or tapping) and using it as a seed for segmentation. We show that object segmentation can serve as a key resource for development by demonstrating methods that exploit high-quality object segmentations to develop both low-level vision capabilities (specialized feature detectors) and high-level vision capabilities (object recognition and localization).
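The third method's use of periodic motion as a segmentation seed can be sketched by measuring per-pixel temporal energy in a "waving" frequency band. The function name, the 2–6 Hz band, and the grayscale frame-stack input are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def periodicity_map(frames, fps=30.0, band=(2.0, 6.0)):
    """Per-pixel energy in a waving/tapping frequency band, usable as a
    seed for segmentation. `frames` is a (T, H, W) array of grayscale
    intensities; returns an (H, W) band-energy map."""
    t = frames.shape[0]
    # Remove the static component, then take the temporal FFT per pixel.
    spectrum = np.abs(np.fft.rfft(frames - frames.mean(axis=0), axis=0))
    freqs = np.fft.rfftfreq(t, d=1.0 / fps)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Sum magnitude over the band: pixels covering waved objects score high.
    return spectrum[in_band].sum(axis=0)
```

Thresholding this map would mark the oscillating region, which could then seed a standard region-growing or graph-based segmenter.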