An Active Pattern Recognition Architecture for Mobile Robots

Author: Aguilar J. Mario
Contreras-Vidal Jose L.
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/01/1993

An active, attentionally-modulated recognition architecture is proposed for object recognition and scene analysis. The proposed architecture forms part of navigation and trajectory planning modules for mobile robots. Key characteristics of the system include movement planning and execution based on environmental factors and internal goal definitions. Real-time implementation of the system is based on space-variant representation of the visual field, as well as an optimal visual processing scheme utilizing separate and parallel channels for the extraction of boundaries and stimulus qualities. A spatial and temporal grouping module (VWM) allows for scene scanning, multi-object segmentation, and featural/object priming. VWM is used to modulate a trajectory formation module capable of redirecting the focus of spatial attention. Finally, an object recognition module based on adaptive resonance theory is interfaced through VWM to the visual processing module. The system is capable of using information from different modalities to disambiguate sensory input. Defense Advanced Research Projects Agency (90-0083); Office of Naval Research (N00014-92-J-1309); Consejo Nacional de Ciencia y Tecnología (63462

Boston University Institutional Repository (OpenBU)

Telepath: Understanding Users from a Human Vision Perspective in Large-Scale Recommender Systems

Author: He Yang
Hu Jinghe
Li Mantian
Wang Yu
Wu Aohan
Xu Jixing
Yan Weipeng P.
Publication date: 04/09/2017

Designing an e-commerce recommender system that serves hundreds of millions of active users is a daunting challenge. From a human vision perspective, there are two key factors that affect users' behaviors: items' attractiveness and their matching degree with users' interests. This paper proposes Telepath, a vision-based bionic recommender system model, which understands users from such a perspective. Telepath is a combination of a convolutional neural network (CNN), a recurrent neural network (RNN) and deep neural networks (DNNs). Its CNN subnetwork simulates the human vision system to extract key visual signals of items' attractiveness and generate corresponding activations. Its RNN and DNN subnetworks simulate the cerebral cortex to understand users' interests based on the activations generated from browsed items. In practice, the Telepath model has been launched in JD's recommender system and advertising system. For one of the major item recommendation blocks on the JD app, click-through rate (CTR), gross merchandise value (GMV) and orders have increased 1.59%, 8.16% and 8.71% respectively. For several major ads publishers of the JD demand-side platform, CTR, GMV and return on investment have increased 6.58%, 61.72% and 65.57% respectively by the first launch, and further increased 2.95%, 41.75% and 41.37% respectively by the second launch. Comment: 8 pages, 11 figures, 1 tabl

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Multi-Modal Human-Machine Communication for Instructing Robot Grasping Tasks

Author: Fink G. A.
Fritsch J.
McGuire P. C.
Ritter H.
Roethling F.
Sagerer G.
Steil J. J.
Wachsmuth S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

A major challenge for the realization of intelligent robots is to supply them with cognitive abilities in order to allow ordinary users to program them easily and intuitively. One approach to such programming is teaching work tasks by interactive demonstration. To make this effective and convenient for the user, the machine must be capable of establishing a common focus of attention and be able to use and integrate spoken instructions, visual perceptions, and non-verbal cues like gestural commands. We report progress in building a hybrid architecture that combines statistical methods, neural networks, and finite state machines into an integrated system for instructing grasping tasks by man-machine interaction. The system combines the GRAVIS-robot for visual attention and gestural instruction with an intelligent interface for speech recognition and linguistic interpretation, and a modality fusion module to allow multi-modal task-oriented man-machine communication with respect to dextrous robot manipulation of objects. Comment: 7 pages, 8 figure

arXiv.org e-Print Archive

Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks

Author: Grauman Kristen
Jayaraman Dinesh
Publication date: 21/12/2017

It is common to implicitly assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is itself a major challenge. We address the problem of learning to look around: if a visual agent has the ability to voluntarily acquire new views to observe its environment, how can it learn efficient exploratory behaviors to acquire informative observations? We propose a reinforcement learning solution, where the agent is rewarded for actions that reduce its uncertainty about the unobserved portions of its environment. Based on this principle, we develop a recurrent neural network-based approach to perform active completion of panoramic natural scenes and 3D object shapes. Crucially, the learned policies are not tied to any recognition task nor to the particular semantic content seen during training. As a result, 1) the learned "look around" behavior is relevant even for new tasks in unseen environments, and 2) training data acquisition involves no manual labeling. Through tests in diverse settings, we demonstrate that our approach learns useful generic policies that transfer to new unseen tasks and environments. Completion episodes are shown at https://goo.gl/BgWX3W

arXiv.org e-Print Archive

Who am I talking with? A face memory for social robots

Author: Hanheide Marc
Lang Christian
Sagerer Gerhard
Wrede Sebastian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

In order to provide personalized services and to develop human-like interaction capabilities, robots need to recognize their human partner. Face recognition has been studied exhaustively in the past decade in the context of security systems, with significant progress on huge datasets. However, these capabilities are not in focus when it comes to social interaction situations. Humans are able to remember people seen for a short moment in time and apply this knowledge directly in their engagement in conversation. In order to equip a robot with capabilities to recall human interlocutors and to provide user-aware services, we adopt human-human interaction schemes to propose a face memory on the basis of active appearance models integrated with the active memory architecture. This paper presents the concept of the interactive face memory, the applied recognition algorithms, and their embedding into the robot’s system architecture. Performance measures are discussed for general face databases as well as scenario-specific datasets.

CiteSeerX

Publications at Bielefeld University

How Does Our Visual System Achieve Shift and Size Invariance?

Author: Wiskott Laurenz
Publication venue: Oxford University Press
Publication date: 01/01/2004

The question of shift and size invariance in the primate visual system is discussed. After a short review of the relevant neurobiology and psychophysics, a more detailed analysis of computational models is given. The two main types of networks considered are the dynamic routing circuit model and invariant feature networks, such as the neocognitron. Some specific open questions in the context of these models are raised and possible solutions are discussed.

CogPrints Cognitive Sciences Eprint Archive

Towards binocular active vision in a robot head system

Author: Aragon-Camarasa G.
Haitham F.
Siebert J.P.
Publication date: 01/09/2008

This paper presents the first results of an investigation and pilot study into an active, binocular vision system that combines binocular vergence, object recognition and attention control in a unified framework. The prototype developed is capable of identifying, targeting, verging on and recognizing objects in a highly cluttered scene without the need for calibration or other knowledge of the camera geometry. This is achieved by implementing all image analysis in a symbolic space without creating explicit pixel-space maps. The system structure is based on the ‘searchlight metaphor’ of biological systems. We present results of a first pilot investigation that yield a maximum vergence error of 6.4 pixels, while seven of nine known objects were recognized in a highly cluttered environment. Finally, a “stepping stone” visual search strategy was demonstrated, taking a total of 40 saccades to find two known objects in the workspace, neither of which appeared simultaneously within the Field of View resulting from any individual saccade.

Enlighten