
    Symbol Emergence in Robotics: A Survey

    Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and obtain semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions and developing a robot that can smoothly communicate with human users in the long term both require an understanding of the dynamics of symbol systems, which makes this understanding crucially important. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system. The emergent symbol system is socially self-organized through both semiotic communication and physical interaction among autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe state-of-the-art research topics in SER, e.g., multimodal categorization, word discovery, and double articulation analysis, which enable a robot to obtain words and their embodied meanings from raw sensory-motor information, including visual, haptic, and auditory information and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER. Comment: submitted to Advanced Robotics.
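
    The unsupervised multimodal categorization mentioned above can be illustrated with a minimal sketch: cluster concatenated visual, haptic, and auditory features into emergent object categories. The feature dimensions, the scikit-learn pipeline, and the function name categorize_multimodal are illustrative assumptions, not the specific methods surveyed in the paper.

```python
# Minimal sketch (not the surveyed implementations): unsupervised multimodal
# categorization by clustering concatenated visual/haptic/auditory features.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture

def categorize_multimodal(visual, haptic, auditory, n_categories=10):
    """Cluster per-object multimodal observations into emergent categories.

    visual, haptic, auditory: arrays of shape (n_objects, d_modality).
    Returns an integer category label per object.
    """
    # Normalize each modality so no single one dominates the distances.
    features = [StandardScaler().fit_transform(m) for m in (visual, haptic, auditory)]
    fused = np.hstack(features)
    gmm = GaussianMixture(n_components=n_categories, covariance_type="diag")
    return gmm.fit_predict(fused)

# Example with synthetic observations from 50 objects.
rng = np.random.default_rng(0)
labels = categorize_multimodal(rng.normal(size=(50, 64)),
                               rng.normal(size=(50, 16)),
                               rng.normal(size=(50, 32)))
```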

    PLM-IPE: A Pixel-Landmark Mutual Enhanced Framework for Implicit Preference Estimation

    In this paper, we are interested in understanding how customers perceive fashion recommendations, in particular when observing a proposed combination of garments composing an outfit. Automatically understanding how a suggested item is perceived, without any kind of active engagement, is an essential building block for interactive applications. We propose a pixel-landmark mutual enhanced framework for implicit preference estimation, named PLM-IPE, which is capable of inferring the user's implicit preferences from visual cues, without any active or conscious engagement. PLM-IPE consists of three key modules: a pixel-based estimator, a landmark-based estimator, and mutual-learning-based optimization. The first two modules capture the implicit reaction of the user at the pixel level and the landmark level, respectively. The last module transfers knowledge between the two parallel estimators. For evaluation, we collected a real-world dataset, named SentiGarment, which contains 3,345 facial reaction videos paired with suggested outfits and human-labeled reaction scores. Extensive experiments show the superiority of our model over state-of-the-art approaches.
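
    A minimal sketch of the mutual-learning idea described above, assuming a deep-mutual-learning style objective in PyTorch: two parallel estimators (pixel-based and landmark-based) each fit the reaction score while also being pulled toward the other's prediction. The module layouts and the loss weighting are illustrative assumptions, not the released PLM-IPE code.

```python
# Minimal sketch (assumed structure): two parallel preference estimators trained
# with a mutual-learning objective that makes each mimic its peer's prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelEstimator(nn.Module):          # operates on face crops
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(16, 1))
    def forward(self, x):
        return self.net(x).squeeze(-1)

class LandmarkEstimator(nn.Module):       # operates on flattened landmark coords
    def __init__(self, n_landmarks=68):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_landmarks * 2, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, x):
        return self.net(x).squeeze(-1)

def mutual_loss(pred_a, pred_b, target, alpha=0.5):
    """Regression loss plus a term pulling the two estimators together."""
    task = F.mse_loss(pred_a, target) + F.mse_loss(pred_b, target)
    mimicry = F.mse_loss(pred_a, pred_b.detach()) + F.mse_loss(pred_b, pred_a.detach())
    return task + alpha * mimicry
```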

    Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences

    We propose a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents. Our alignment-based encoder-decoder model with long short-term memory recurrent neural networks (LSTM-RNN) translates natural language instructions to action sequences based upon a representation of the observable world state. We introduce a multi-level aligner that empowers our model to focus on sentence "regions" salient to the current world state by using multiple abstractions of the input sentence. In contrast to existing methods, our model uses no specialized linguistic resources (e.g., parsers) or task-specific annotations (e.g., seed lexicons). It is therefore generalizable, yet still achieves the best results reported to date on a benchmark single-sentence dataset and competitive results for the limited-training multi-sentence setting. We analyze our model through a series of ablations that elucidate the contributions of its primary components. Comment: To appear at AAAI 2016 (an extended version of a NIPS 2015 Multimodal Machine Learning workshop paper).
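
    A minimal sketch of an alignment-based encoder-decoder of the kind described above: an LSTM encodes the instruction, and at each step the decoder attends over the encoded words, conditioned on the current world-state vector, to emit an action. Layer sizes and the exact attention form are illustrative assumptions rather than the paper's reported architecture.

```python
# Minimal sketch (illustrative, not the paper's exact model): instruction encoder
# plus a decoder that aligns over instruction words at every action step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstructionFollower(nn.Module):
    def __init__(self, vocab, n_actions, world_dim, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(hidden + world_dim, hidden)
        self.attn = nn.Linear(hidden * 2, 1)
        self.out = nn.Linear(hidden, n_actions)

    def forward(self, instruction, world_states):
        enc, _ = self.encoder(self.embed(instruction))      # (B, T, H)
        h = enc.new_zeros(enc.size(0), enc.size(2))
        c = torch.zeros_like(h)
        actions = []
        for w in world_states.unbind(dim=1):                # one step per action
            # Align: score each instruction word against the decoder state.
            scores = self.attn(torch.cat([enc, h.unsqueeze(1).expand_as(enc)], -1))
            context = (F.softmax(scores, dim=1) * enc).sum(dim=1)
            h, c = self.decoder(torch.cat([context, w], dim=-1), (h, c))
            actions.append(self.out(h))
        return torch.stack(actions, dim=1)                  # (B, steps, n_actions)
```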

    Neural Correlates of Social Behavior in Mushroom Body Extrinsic Neurons of the Honeybee Apis mellifera

    The social behavior of honeybees (Apis mellifera) has been extensively investigated, but little is known about its neuronal correlates. We developed a method that allowed us to record extracellularly from mushroom body extrinsic neurons (MB ENs) in a freely moving bee within a small but functioning mini colony of approximately 1,000 bees. This study aimed to correlate the neuronal activity of multimodal high-order MB ENs with social behavior in a close-to-natural setting. The behavior of all bees in the colony was video recorded. The behavior of the recorded animal was compared with that of its hive mates, and no significant differences were found. Changes in spike rate appeared before, during, or after social interactions. The time window of the strongest effect on spike rate changes ranged from 1 s to 2 s before and after the interaction, depending on the individual animal and the recorded neuron. The highest spike rates occurred when the experimental animal was situated close to a hive mate. The variance of the spike rates was analyzed as a proxy for high-order multi-unit processing. Comparing randomly selected time windows with those in which the recorded animal performed social interactions showed a significantly increased spike rate variance during social interactions. The experimental set-up employed for this study offers a powerful opportunity to correlate neuronal activity with the intrinsically motivated behavior of socially interacting animals. We conclude that the recorded MB ENs are potentially involved in initiating and controlling social interactions in honeybees.
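
    The variance analysis described above can be sketched as follows: compute spike rates in fixed-width windows around social interactions and in randomly placed windows of the same length, then compare their variances. The window width and sampling scheme here are illustrative assumptions, not the authors' analysis code.

```python
# Minimal sketch (assumed analysis): spike-rate variance during interaction
# windows versus randomly placed windows of the same length.
import numpy as np

def spike_rate(spike_times, start, width):
    """Spikes per second inside [start, start + width)."""
    n = np.count_nonzero((spike_times >= start) & (spike_times < start + width))
    return n / width

def variance_comparison(spike_times, interaction_starts, recording_length,
                        width=2.0, n_random=1000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    interaction_var = np.var([spike_rate(spike_times, t, width)
                              for t in interaction_starts])
    random_starts = rng.uniform(0, recording_length - width, size=n_random)
    random_var = np.var([spike_rate(spike_times, t, width) for t in random_starts])
    return interaction_var, random_var
```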

    A Review on Human-Computer Interaction and Intelligent Robots

    In the field of artificial intelligence, human–computer interaction (HCI) technology and the related intelligent robot technologies are essential and active areas of research. These technologies aim to build a natural HCI environment from the perspectives of software algorithms and hardware systems. The purpose of this research is to provide an overview of HCI and intelligent robots. It highlights existing technologies for listening, speaking, reading, writing, and other senses that are widely used in human interaction. Based on these technologies, the research introduces some intelligent robot systems and platforms. The paper also outlines key challenges for research on HCI and intelligent robots. The authors hope that this work will help researchers in the field acquire the information and technologies needed to conduct more advanced research.

    Toward an affect-sensitive multimodal human-computer interaction

    This paper argues that next-generation human-computer interaction (HCI) designs need to include the essence of emotional intelligence -- the ability to recognize a user's affective states -- in order to become more human-like, more effective, and more efficient. Affective arousal modulates all nonverbal communicative cues (facial expressions, body movements, and vocal and physiological reactions). In face-to-face interaction, humans detect and interpret these interactive signals from their communicator with little or no effort, yet designing and developing an automated system that accomplishes the same tasks is rather difficult. This paper surveys past work on solving these problems by computer and provides a set of recommendations for developing the first part of an intelligent multimodal HCI -- an automatic personalized analyzer of a user's nonverbal affective feedback.
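
    As a toy illustration of combining nonverbal cues, the sketch below fuses per-modality affect estimates (face, voice, physiology) with a weighted late-fusion rule; the modalities, weights, and function name fuse_affect are assumptions for illustration, not a design proposed by the paper.

```python
# Minimal sketch (assumed late-fusion scheme): combine per-modality affect
# probabilities from face, voice, and physiology into one estimate.
import numpy as np

def fuse_affect(face_probs, voice_probs, physio_probs, weights=(0.5, 0.3, 0.2)):
    """Weighted late fusion of per-modality class probabilities.

    Each argument is a probability vector over affective states (e.g. neutral,
    positive, negative); the weights are illustrative and would be tuned per
    user for a personalized analyzer.
    """
    stacked = np.stack([face_probs, voice_probs, physio_probs])
    fused = np.average(stacked, axis=0, weights=weights)
    return fused / fused.sum()

# Example: three modalities disagree mildly; fusion favors the face channel.
print(fuse_affect(np.array([0.2, 0.7, 0.1]),
                  np.array([0.4, 0.4, 0.2]),
                  np.array([0.3, 0.5, 0.2])))
```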

    Practical aspects of designing and developing a multimodal embodied agent

    2021 Spring. Includes bibliographical references. This thesis reviews key elements that went into the design and construction of the CSU CwC Embodied agent, also known as the Diana System. The Diana System has been developed over five years by a joint team of researchers at three institutions: Colorado State University, Brandeis University, and the University of Florida. Over that time, I contributed to this overall effort, and in this thesis I present a practical review of the key elements involved in designing and constructing the system. Particular attention is paid to Diana's multimodal capabilities, which engage asynchronously and concurrently to support realistic interactions with the user. Diana can communicate in visual as well as auditory modalities. She can understand a variety of hand gestures for object manipulation, deixis, etc., and can gesture in return. Diana can also hold a conversation with the user in spoken and/or written English. Gestures and speech are often at play simultaneously, supplementing and complementing each other. Diana conveys her attention through several non-verbal cues, such as blinking more slowly when inattentive and keeping her gaze on the subject of her attention. Finally, her ability to express emotions through facial expressions adds another crucial human element to any user interaction with the system. Central to Diana's capabilities is a blackboard architecture coordinating a hierarchy of modular components, each controlling a part of Diana's perceptual, cognitive, and motor abilities. The modular design facilitates contributions from multiple disciplines, namely VoxSim/VoxML with text-to-speech/automatic speech recognition systems for natural language understanding, deep neural networks for gesture recognition, 3D computer animation systems, etc., all integrated within the Unity game engine to create the embodied, intelligent agent that is Diana. The primary contribution of this thesis is to provide a detailed explanation of Diana's internal workings, along with a thorough background of the research that supports these technologies.
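
    A minimal sketch of a blackboard coordinating modular components, in the spirit of the architecture described above: modules post results to a shared store and subscribe to the keys they care about, so perception, language, and motor components can run independently. The class and the wiring example are illustrative assumptions, not the Diana System's implementation.

```python
# Minimal sketch (illustrative, not the Diana System source): a blackboard that
# modules read from and write to, with change notifications per key.
from collections import defaultdict
from typing import Any, Callable

class Blackboard:
    def __init__(self):
        self._state: dict[str, Any] = {}
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, key: str, callback: Callable[[Any], None]) -> None:
        """Register a module to be notified whenever `key` changes."""
        self._subscribers[key].append(callback)

    def post(self, key: str, value: Any) -> None:
        """Write a value and notify every module watching that key."""
        self._state[key] = value
        for callback in self._subscribers[key]:
            callback(value)

    def get(self, key: str, default: Any = None) -> Any:
        return self._state.get(key, default)

# Hypothetical wiring: a gesture recognizer posts results, an action module reacts.
board = Blackboard()
board.subscribe("user_gesture", lambda g: board.post("agent_action", f"acknowledge {g}"))
board.post("user_gesture", "point_at_block")
print(board.get("agent_action"))   # -> "acknowledge point_at_block"
```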