Embodied learning for visual recognition
The field of visual recognition in recent years has come to rely on large, expensively curated, and manually labeled "bags of disembodied images". In the wake of this, my focus has been on understanding and exploiting alternate "free" sources of supervision available to visual learning agents situated within real environments. For example, even simply moving from orderless image collections to continuous visual observations offers opportunities to understand the dynamics and other physical properties of the visual world. Further, embodied agents may be able to move around their environment and/or effect changes within it, in which case these abilities offer new means to acquire useful supervision. In this dissertation, I present my work along these and related directions.
SeekNet: Improved Human Instance Segmentation via Reinforcement Learning Based Optimized Robot Relocation
Amodal recognition is the ability of a system to detect occluded objects. Most state-of-the-art visual recognition systems lack the ability to perform amodal recognition. A few studies have achieved amodal recognition through passive prediction or embodied recognition approaches; however, these approaches face challenges in real-world applications, such as dynamic objects. We propose SeekNet, an improved optimization method for amodal recognition through embodied visual recognition. Additionally, we implement SeekNet for social robots, which interact repeatedly with humans in crowded environments. Hence, we focus on occluded human detection and tracking and showcase the superiority of our algorithm over other baselines. We also experiment with using SeekNet to improve the confidence of COVID-19 symptom pre-screening algorithms via our efficient embodied recognition system.
Bridging Between Computer and Robot Vision Through Data Augmentation: A Case Study on Object Recognition
Despite the impressive progress brought by deep networks in visual object recognition, robot vision is still far from being a solved problem. The most successful convolutional architectures are developed starting from ImageNet, a large-scale collection of images of object categories downloaded from the Web. These images are very different from the situated and embodied visual experience of robots deployed in unconstrained settings. To reduce the gap between these two visual experiences, this paper proposes a simple yet effective data augmentation layer that zooms on the object of interest and simulates the object detection outcome of a robot vision system. The layer, which can be used with any convolutional deep architecture, yields an increase in object recognition performance of up to 7% in experiments performed over three different benchmark databases. An implementation of our robot data augmentation layer has been made publicly available.
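The zoom-on-object augmentation described above can be sketched as a simple crop with a margin around the detected box. The function name, box format, and margin parameter below are illustrative assumptions, not the paper's released implementation:

```python
import numpy as np

def zoom_on_object(image, box, margin=0.2):
    """Crop a region around an object box (x0, y0, x1, y1), enlarged by a
    relative margin, to simulate the crop a robot's object detector would
    hand to a recognition network. `image` is an HxWxC numpy array."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    mx = int((x1 - x0) * margin)  # horizontal margin in pixels
    my = int((y1 - y0) * margin)  # vertical margin in pixels
    # Clamp the enlarged box to the image bounds before slicing.
    cx0, cy0 = max(0, x0 - mx), max(0, y0 - my)
    cx1, cy1 = min(w, x1 + mx), min(h, y1 + my)
    return image[cy0:cy1, cx0:cx1]
```

In training, such a crop would typically be resized back to the network's input resolution before being fed to the convolutional architecture.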
Transcranial magnetic stimulation disrupts the perception and embodiment of facial expressions
Copyright © 2008 Society for Neuroscience and the authors. The Journal of Neuroscience uses a Creative Commons Attribution-NonCommercial-ShareAlike licence: http://creativecommons.org/licenses/by-nc-sa/4.0/.

Theories of embodied cognition propose that recognizing facial expressions requires visual processing followed by simulation of the somatovisceral responses associated with the perceived expression. To test this proposal, we targeted the right occipital face area (rOFA) and the face region of right somatosensory cortex (rSC) with repetitive transcranial magnetic stimulation (rTMS) while participants discriminated facial expressions. rTMS selectively impaired discrimination of facial expressions at both sites but had no effect on a matched face identity task. Site specificity within the rSC was demonstrated by targeting rTMS at the face and finger regions while participants performed the expression discrimination task. rTMS targeted at the face region impaired task performance relative to rTMS targeted at the finger region. To establish the temporal course of visual and somatosensory contributions to expression processing, double-pulse TMS was delivered at different times to rOFA and rSC during expression discrimination. Accuracy dropped when pulses were delivered at 60–100 ms at rOFA and at 100–140 and 130–170 ms at rSC. These sequential impairments at rOFA and rSC support embodied accounts of expression recognition as well as hierarchical models of face processing. The results also demonstrate that nonvisual cortical areas contribute during early stages of expression processing.
Multimodal Speech Recognition for Language-Guided Embodied Agents
Benchmarks for language-guided embodied agents typically assume text-based instructions, but deployed agents will encounter spoken instructions. While Automatic Speech Recognition (ASR) models can bridge the input gap, erroneous ASR transcripts can hurt the agents' ability to complete tasks. In this work, we propose training a multimodal ASR model to reduce errors in transcribing spoken instructions by considering the accompanying visual context. We train our model on a dataset of spoken instructions, synthesized from the ALFRED task completion dataset, where we simulate acoustic noise by systematically masking spoken words. We find that utilizing visual observations facilitates masked word recovery, with multimodal ASR models recovering up to 30% more masked words than unimodal baselines. We also find that a text-trained embodied agent successfully completes tasks more often when following transcribed instructions from multimodal ASR models. Code: github.com/Cylumn/embodied-multimodal-asr

Comment: 5 pages, 5 figures, 24th ISCA Interspeech Conference (INTERSPEECH 2023).
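The "systematically masking spoken words" setup above can be sketched as a transcript-corruption step. The function name, mask token, masking rate, and use of uniform random word selection are assumptions for illustration; the paper's actual masking scheme may differ:

```python
import random

def mask_words(transcript, mask_rate=0.3, mask_token="<mask>", seed=0):
    """Replace a fraction of the words in a spoken-instruction transcript
    with a mask token, simulating acoustic noise that renders those words
    unrecognizable to an ASR model."""
    rng = random.Random(seed)  # fixed seed keeps the corruption reproducible
    words = transcript.split()
    n_mask = max(1, int(len(words) * mask_rate))
    masked_idx = set(rng.sample(range(len(words)), n_mask))
    return " ".join(mask_token if i in masked_idx else w
                    for i, w in enumerate(words))
```

A multimodal ASR model would then be trained to recover the masked words from the remaining text plus the accompanying visual observation.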
An embodied model for handwritten digits recognition in a cognitive robot
This paper presents an embodied model for recognition of handwritten digits in a cognitive developmental robot scenario. Inspired by neuro-psychological data, the model integrates three modules: a stacked auto-encoder network to process the visual information, a feedforward neural controller for the fingers, and a generalized regression network that associates number digits to finger configurations. Results from developmental learning experiments show an improvement in the digits' recognition rate thanks to the inclusion of the robot fingers in the training, especially in its early stages (epochs) or with a low number of examples. This behaviour can be linked to that observed in psychological studies with children, who seem to benefit from finger counting only in the initial stage of mathematical learning. These results suggest the potential of the embodied approach to favour the creation of a psychologically plausible developmental model for mathematical cognition in robots and to support the creation of more complex models of human-like behaviours.
The Whole World in Your Hand: Active and Interactive Segmentation
Object segmentation is a fundamental problem in computer vision and a powerful resource for development. This paper presents three embodied approaches to the visual segmentation of objects. Each approach to segmentation is aided by the presence of a hand or arm in the proximity of the object to be segmented. The first approach is suitable for a robotic system, where the robot can use its arm to evoke object motion. The second method operates on a wearable system, viewing the world from a human's perspective, with instrumentation to help detect and segment objects that are held in the wearer's hand. The third method operates when observing a human teacher, locating periodic motion (finger/arm/object waving or tapping) and using it as a seed for segmentation. We show that object segmentation can serve as a key resource for development by demonstrating methods that exploit high-quality object segmentations to develop both low-level vision capabilities (specialized feature detectors) and high-level vision capabilities (object recognition and localization).
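The third method's use of periodic motion as a segmentation seed can be sketched by measuring per-pixel temporal energy in a "waving" frequency band. The function name, the 2–6 Hz band, and the grayscale frame-stack input are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def periodicity_map(frames, fps=30.0, band=(2.0, 6.0)):
    """Per-pixel energy in a waving/tapping frequency band, usable as a
    seed for segmentation. `frames` is a (T, H, W) array of grayscale
    intensities; returns an (H, W) band-energy map."""
    t = frames.shape[0]
    # Remove the static component, then take the temporal FFT per pixel.
    spectrum = np.abs(np.fft.rfft(frames - frames.mean(axis=0), axis=0))
    freqs = np.fft.rfftfreq(t, d=1.0 / fps)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Sum magnitude over the band: pixels covering waved objects score high.
    return spectrum[in_band].sum(axis=0)
```

Thresholding this map would mark the oscillating region, which could then seed a standard region-growing or graph-based segmenter.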