
    An Active Pattern Recognition Architecture for Mobile Robots

    An active, attentionally modulated recognition architecture is proposed for object recognition and scene analysis. The proposed architecture forms part of the navigation and trajectory planning modules for mobile robots. Key characteristics of the system include movement planning and execution based on environmental factors and internal goal definitions. Real-time implementation of the system is based on a space-variant representation of the visual field, as well as an optimal visual processing scheme that uses separate, parallel channels to extract boundaries and stimulus qualities. A spatial and temporal grouping module (VWM) allows for scene scanning, multi-object segmentation, and featural/object priming. VWM is used to modulate a trajectory formation module capable of redirecting the focus of spatial attention. Finally, an object recognition module based on adaptive resonance theory is interfaced through VWM to the visual processing module. The system is capable of using information from different modalities to disambiguate sensory input. Defense Advanced Research Projects Agency (90-0083); Office of Naval Research (N00014-92-J-1309); Consejo Nacional de Ciencia y Tecnología (63462
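    The abstract does not say which space-variant mapping the system uses. A common choice is a log-polar mapping centered on the fixation point, where resolution falls off with eccentricity. The sketch below assumes that mapping; the function name `to_log_polar` and the `r_min` foveal clamp are hypothetical. It shows the property that makes such representations attractive: scaling a point about the fixation point only shifts its log-radius coordinate.

```python
import math


def to_log_polar(x: float, y: float, cx: float = 0.0, cy: float = 0.0,
                 r_min: float = 1.0) -> tuple:
    """Map an image point (x, y) to (log-radius, angle) about fixation (cx, cy).

    Hypothetical sketch: points closer than r_min to the fixation point are
    clamped to r_min, a crude stand-in for a uniformly sampled fovea.
    """
    dx, dy = x - cx, y - cy
    r = max(math.hypot(dx, dy), r_min)
    rho = math.log(r / r_min)    # eccentricity on a logarithmic scale
    theta = math.atan2(dy, dx)   # polar angle in radians
    return rho, theta


# Doubling the distance from fixation shifts rho by about log(2);
# the angle stays (numerically) the same.
rho1, th1 = to_log_polar(3.0, 4.0)
rho2, th2 = to_log_polar(6.0, 8.0)
print(rho2 - rho1, th2 - th1)
```

    Under this assumed mapping, a change in object size becomes a translation along the rho axis, which downstream modules can handle more easily than a change in pixel extent.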

    Telepath: Understanding Users from a Human Vision Perspective in Large-Scale Recommender Systems

    Designing an e-commerce recommender system that serves hundreds of millions of active users is a daunting challenge. From a human vision perspective, two key factors affect users' behavior: how attractive items are and how well they match users' interests. This paper proposes Telepath, a vision-based bionic recommender system model that understands users from this perspective. Telepath combines a convolutional neural network (CNN), a recurrent neural network (RNN), and deep neural networks (DNNs). Its CNN subnetwork simulates the human visual system to extract key visual signals of item attractiveness and generate corresponding activations. Its RNN and DNN subnetworks simulate the cerebral cortex to infer users' interests from the activations generated by browsed items. In practice, the Telepath model has been deployed in JD's recommender system and advertising system. For one of the major item recommendation blocks on the JD app, click-through rate (CTR), gross merchandise value (GMV), and orders increased by 1.59%, 8.16%, and 8.71%, respectively. For several major ad publishers on the JD demand-side platform, CTR, GMV, and return on investment increased by 6.58%, 61.72%, and 65.57%, respectively, after the first launch, and by a further 2.95%, 41.75%, and 41.37%, respectively, after the second launch. Comment: 8 pages, 11 figures, 1 table

    Multi-Modal Human-Machine Communication for Instructing Robot Grasping Tasks

    A major challenge in realizing intelligent robots is to supply them with cognitive abilities that allow ordinary users to program them easily and intuitively. One way of programming them is to teach work tasks by interactive demonstration. To make this effective and convenient for the user, the machine must be capable of establishing a common focus of attention and of using and integrating spoken instructions, visual perceptions, and non-verbal cues such as gestural commands. We report progress in building a hybrid architecture that combines statistical methods, neural networks, and finite state machines into an integrated system for instructing grasping tasks through man-machine interaction. The system combines the GRAVIS robot for visual attention and gestural instruction with an intelligent interface for speech recognition and linguistic interpretation, and a modality fusion module that allows multi-modal, task-oriented man-machine communication for dextrous robot manipulation of objects. Comment: 7 pages, 8 figures

    Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks

    It is common to implicitly assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is itself a major challenge. We address the problem of learning to look around: if a visual agent has the ability to voluntarily acquire new views to observe its environment, how can it learn efficient exploratory behaviors to acquire informative observations? We propose a reinforcement learning solution, where the agent is rewarded for actions that reduce its uncertainty about the unobserved portions of its environment. Based on this principle, we develop a recurrent neural network-based approach to perform active completion of panoramic natural scenes and 3D object shapes. Crucially, the learned policies are not tied to any recognition task nor to the particular semantic content seen during training. As a result, 1) the learned "look around" behavior is relevant even for new tasks in unseen environments, and 2) training data acquisition involves no manual labeling. Through tests in diverse settings, we demonstrate that our approach learns useful generic policies that transfer to new unseen tasks and environments. Completion episodes are shown at https://goo.gl/BgWX3W
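    The reward principle described above (reward actions that reduce uncertainty about unobserved parts of the environment) can be illustrated with a simple proxy. The sketch below assumes that uncertainty is measured as the mean squared error between the agent's current reconstruction of the full scene and the true scene; that proxy and the helper names `mse` and `uncertainty_rewards` are hypothetical, not the paper's exact formulation.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized flattened scenes."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def uncertainty_rewards(target: Sequence[float],
                        reconstructions: List[Sequence[float]]) -> List[float]:
    """Reward for each glimpse = drop in reconstruction error it produced.

    reconstructions[t] is the agent's estimate of the full scene after t
    glimpses (a hypothetical proxy for its uncertainty about unseen parts).
    """
    errors = [mse(target, rec) for rec in reconstructions]
    return [prev - cur for prev, cur in zip(errors, errors[1:])]


# Toy 4-pixel "panorama": each new view reveals more of the scene.
target = [1.0, 1.0, 1.0, 1.0]
recs = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 0.0]]
print(uncertainty_rewards(target, recs))  # [0.5, 0.25]
```

    Because the per-step rewards telescope, their sum equals the total error reduction over the episode, so a policy maximizing return is pushed toward views that reveal the most about the unobserved scene. No task labels are involved, which matches the abstract's point that training requires no manual labeling.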

    Who am I talking with? A face memory for social robots

    In order to provide personalized services and to develop human-like interaction capabilities, robots need to recognize their human partners. Face recognition has been studied exhaustively over the past decade in the context of security systems, with significant progress on huge datasets. However, these capabilities are not the focus in social interaction situations. Humans are able to remember people seen only briefly and apply this knowledge directly when engaging in conversation. To equip a robot with the capability to recall human interlocutors and to provide user-aware services, we adopt human-human interaction schemes to propose a face memory based on active appearance models integrated with the active memory architecture. This paper presents the concept of the interactive face memory, the applied recognition algorithms, and their embedding into the robot's system architecture. Performance measures are discussed for general face databases as well as scenario-specific datasets.

    How Does Our Visual System Achieve Shift and Size Invariance?

    The question of shift and size invariance in the primate visual system is discussed. After a short review of the relevant neurobiology and psychophysics, a more detailed analysis of computational models is given. The two main types of networks considered are the dynamic routing circuit model and invariant feature networks, such as the neocognitron. Some specific open questions in the context of these models are raised, and possible solutions are discussed.
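    The core idea behind invariant feature networks can be shown in miniature: a feature detector applied at every position, followed by pooling over positions, yields a response that does not depend on where the pattern appears. The sketch below is a deliberately simplified, hypothetical 1-D illustration using circular correlation and a single global max pool; the neocognitron itself uses a hierarchy of local pooling stages and gains tolerance gradually rather than in one step. The names `correlate_circular` and `invariant_feature` are assumptions for illustration.

```python
from typing import List, Sequence


def correlate_circular(signal: Sequence[int], kernel: Sequence[int]) -> List[int]:
    """Apply the same feature detector (kernel) at every position, wrapping around."""
    n = len(signal)
    return [sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
            for i in range(n)]


def invariant_feature(signal: Sequence[int], kernel: Sequence[int]) -> int:
    """Global max pooling: keep the strongest response, discard its position."""
    return max(correlate_circular(signal, kernel))


pattern = [0, 1, 2, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 0]   # same pattern, moved two positions to the right
kernel = [1, 2]
print(invariant_feature(pattern, kernel), invariant_feature(shifted, kernel))  # 5 5
```

    A circular shift of the input only permutes the detector responses, so the pooled maximum is unchanged. Size invariance does not follow from this construction; it requires additional machinery, such as detectors at multiple scales or, as in dynamic routing models, explicit rescaling of the attended region.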

    Towards binocular active vision in a robot head system

    This paper presents the first results of an investigation and pilot study into an active binocular vision system that combines binocular vergence, object recognition, and attention control in a unified framework. The prototype is capable of identifying, targeting, verging on, and recognizing objects in a highly cluttered scene without calibration or other knowledge of the camera geometry. This is achieved by performing all image analysis in a symbolic space, without creating explicit pixel-space maps. The system structure is based on the 'searchlight metaphor' of biological systems. We present results of a first pilot investigation that yielded a maximum vergence error of 6.4 pixels, while seven of nine known objects were recognized in a highly cluttered environment. Finally, a "stepping stone" visual search strategy was demonstrated, taking a total of 40 saccades to find two known objects in the workspace, neither of which appeared within the field of view of any individual saccade.