
    Symbol Emergence in Robotics: A Survey

    Humans can learn to use language through physical interaction with their environment and semiotic communication with other people. Obtaining a computational understanding of how humans form a symbol system and acquire semiotic skills through autonomous mental development is therefore very important. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions, and developing a robot that can smoothly communicate with human users over the long term, requires an understanding of the dynamics of symbol systems: the embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system, one that is socially self-organized through both semiotic communication and physical interaction among autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe state-of-the-art research topics in SER, e.g., multimodal categorization, word discovery, and double articulation analysis, that enable a robot to obtain words and their embodied meanings from raw sensory-motor information, including visual, haptic, and auditory information and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER. Comment: submitted to Advanced Robotics
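
    As a toy illustration of what "multimodal categorization" involves, the sketch below clusters fused visual, haptic, and auditory feature vectors without labels. The work surveyed uses richer generative models (e.g., multimodal latent Dirichlet allocation); the feature dimensions and cluster count here are invented placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Illustrative per-object observations (all dimensions are arbitrary).
n_objects = 120
visual = rng.normal(size=(n_objects, 16))  # e.g. color/shape descriptors
haptic = rng.normal(size=(n_objects, 8))   # e.g. hardness, weight
audio  = rng.normal(size=(n_objects, 12))  # e.g. impact-sound spectrum

# Standardize each modality, fuse, and cluster without labels: each
# cluster plays the role of an emergent "category" that a word
# discovered from speech could later be grounded in.
fused = np.hstack([StandardScaler().fit_transform(m)
                   for m in (visual, haptic, audio)])
categories = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(fused)
print(categories[:10])
```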

    Real Time Animation of Virtual Humans: A Trade-off Between Naturalness and Control

    Virtual humans are employed in many interactive applications that use 3D virtual environments, including (serious) games. The motion of such virtual humans should look realistic (or ‘natural’) and allow interaction with the surroundings and other (virtual) humans. Current animation techniques differ in the trade-off they offer between motion naturalness and the control that can be exerted over the motion. We show mechanisms to parametrize, combine (on different body parts), and concatenate motions generated by different animation techniques. We discuss several aspects of motion naturalness and show how it can be evaluated. We conclude by showing how combining different animation paradigms can enhance both naturalness and control.
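
    To make the "combine (on different body parts)" mechanism concrete, here is a minimal sketch of per-joint blending between two motion sources. The joint list, the upper-body mask, and the linear angle blend (a naive stand-in for quaternion slerp) are illustrative assumptions, not the paper's actual technique.

```python
import numpy as np

JOINTS = ["hips", "spine", "head", "l_arm", "r_arm", "l_leg", "r_leg"]  # illustrative

def combine(pose_a, pose_b, upper_body_weight=1.0):
    """Blend two poses (joint -> 3 Euler angles) per body part:
    drive arms/spine/head from pose_b while legs/hips keep pose_a.
    Linear angle blending is a naive stand-in for quaternion slerp."""
    upper = {"spine", "head", "l_arm", "r_arm"}
    out = {}
    for j in JOINTS:
        w = upper_body_weight if j in upper else 0.0
        out[j] = (1 - w) * pose_a[j] + w * pose_b[j]
    return out

walk = {j: np.zeros(3) for j in JOINTS}                 # base locomotion
wave = {j: np.array([0.0, 0.0, 45.0]) for j in JOINTS}  # waving gesture
pose = combine(walk, wave, upper_body_weight=0.8)
print(pose["r_arm"], pose["l_leg"])  # arm follows the wave, leg the walk
```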

    Articulation rate in Swedish child-directed speech increases as a function of the age of the child even when surprisal is controlled for

    In earlier work, we have shown that articulation rate in Swedish child-directed speech (CDS) increases as a function of the age of the child, even when utterance length and differences in articulation rate between subjects are controlled for. In this paper we show, at the utterance level in spontaneous Swedish speech, that i) for the youngest children, articulation rate in CDS is lower than in adult-directed speech (ADS), ii) there is a significant negative correlation between articulation rate and surprisal (the negative log probability) in ADS, and iii) the increase in articulation rate in Swedish CDS as a function of the age of the child holds even when surprisal, along with utterance length and differences in articulation rate between speakers, is controlled for. These results indicate that adults adjust their articulation rate to fit the linguistic capacity of the child. Comment: 5 pages, Interspeech 2017
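
    Since surprisal is just the negative log probability of a word given its context, it can be estimated from any language model. Below is a minimal sketch using an add-one-smoothed bigram model; the toy corpus and the per-utterance averaging are assumptions for illustration, not the paper's actual Swedish data or model.

```python
import math
from collections import Counter

def bigram_surprisal(corpus, utterance):
    """Mean per-word surprisal, -log2 P(w_i | w_{i-1}), under an
    add-one-smoothed bigram model (a toy stand-in)."""
    tokens = [w for sent in corpus for w in ["<s>"] + sent.split()]
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = len(unigrams)

    words = ["<s>"] + utterance.split()
    surprisals = []
    for prev, curr in zip(words, words[1:]):
        p = (bigrams[(prev, curr)] + 1) / (unigrams[prev] + vocab)
        surprisals.append(-math.log2(p))
    return sum(surprisals) / len(surprisals)

corpus = ["the cat sat", "the cat ran", "a dog ran"]
print(bigram_surprisal(corpus, "the cat ran"))  # frequent bigrams: low surprisal
print(bigram_surprisal(corpus, "a cat sat"))    # unseen bigram: higher surprisal
```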

    Fast, invariant representation for human action in the visual system

    Humans can effortlessly recognize others' actions in the presence of complex transformations, such as changes in viewpoint. Several studies have located the brain regions involved in invariant action recognition; however, the underlying neural computations remain poorly understood. We use magnetoencephalography (MEG) decoding and a dataset of well-controlled, naturalistic videos of five actions (run, walk, jump, eat, drink) performed by different actors at different viewpoints to study the computational steps used to recognize actions across complex transformations. In particular, we ask when the brain discounts changes in 3D viewpoint relative to when it initially discriminates between actions. We measure the latency difference between invariant and non-invariant action decoding when subjects view full videos as well as form-depleted and motion-depleted stimuli. Our results show no difference in decoding latency or temporal profile between invariant and non-invariant action recognition in full videos. However, when either form or motion information is removed from the stimulus set, we observe a decrease and delay in invariant action decoding. Our results suggest that the brain recognizes actions and builds invariance to complex transformations at the same time, and that both form and motion information are crucial for fast, invariant action recognition.
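
    Time-resolved MEG decoding of this kind is commonly implemented by training a separate classifier at each time point and reading off when accuracy rises above chance. The sketch below illustrates that logic on synthetic data; the array shapes, classifier, and onset threshold are illustrative assumptions, not the study's pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic MEG: (trials, sensors, timepoints), 5 action classes.
n_trials, n_sensors, n_times, n_classes = 200, 50, 40, 5
X = rng.normal(size=(n_trials, n_sensors, n_times))
y = rng.integers(0, n_classes, size=n_trials)
# Inject a class-dependent signal from timepoint 15 onward.
X[np.arange(n_trials), y, 15:] += 1.0

# Decode the action class at each timepoint; the onset latency is the
# first time accuracy clears chance. An "invariant" variant would train
# on one viewpoint and test on another instead of plain k-fold CV.
accuracy = np.array([
    cross_val_score(LogisticRegression(max_iter=1000), X[:, :, t], y, cv=5).mean()
    for t in range(n_times)
])
chance = 1.0 / n_classes
onset = np.argmax(accuracy > chance + 0.1)  # crude threshold for the demo
print(f"decoding onset ~ timepoint {onset}")
```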

    The Evolution of First Person Vision Methods: A Survey

    The emergence of new wearable technologies such as action cameras and smart glasses has increased the interest of computer vision scientists in the first-person perspective. Nowadays, this field is attracting the attention and investment of companies aiming to develop commercial devices with First Person Vision recording capabilities. Given this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches present particular combinations of image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition, and user-machine interaction. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges, and opportunities within the field. Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-machine Interaction

    Synthesis of variable dancing styles based on a compact spatiotemporal representation of dance

    Dance, as a complex expressive form of motion, is able to convey emotion, meaning, and social idiosyncrasies; it opens channels for non-verbal communication and promotes rich cross-modal interactions with music and the environment. As such, realistic dancing characters may incorporate cross-modal information and the variability of dance forms through compact representations that describe the movement structure in terms of its spatial and temporal organization. In this paper, we propose a novel method for synthesizing beat-synchronous dancing motions based on a compact topological model of dance styles, previously captured with a motion capture system. The model is based on Topological Gesture Analysis (TGA), which conveys a discrete three-dimensional point-cloud representation of the dance by describing the spatiotemporal variability of its gestural trajectories as uniform spherical distributions, organized according to classes of the musical meter. The synthesis methodology traces the topological representations, constrained by definable metrical and spatial parameters, back into complete dance instances whose variability is controlled by stochastic processes that consider both the TGA distributions and the kinematic constraints of the body morphology. To assess how well each parameter reproduces the style of the captured dance, we correlated captured and synthesized trajectories of samba dancing sequences in relation to the level of compression of the model used, and we report on a subjective evaluation over a set of six tests. The results validate our approach, suggesting that a periodic dancing style, and its musical synchrony, can be feasibly reproduced from a suitably parametrized discrete spatiotemporal representation of the gestural motion trajectories, with a notable degree of compression.
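
    As a rough illustration of the synthesis step, the sketch below samples joint targets from one 3D distribution per metrical class and interpolates them into a beat-synchronous, periodic trajectory. The Gaussian point clouds, joint set, and frame counts are invented stand-ins for the actual TGA model, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# One 3D distribution (mean, isotropic std) per metrical class (beat) per
# joint, standing in for the TGA point-cloud model learned from mocap.
n_beats, frames_per_beat = 4, 24
model = {
    "r_hand": [(rng.normal(size=3), 0.05) for _ in range(n_beats)],
    "l_hand": [(rng.normal(size=3), 0.05) for _ in range(n_beats)],
}

def synthesize(joint, n_cycles=2):
    """Sample a target from each beat's distribution (stochastic
    variability), then interpolate targets into a beat-synchronous
    trajectory that closes into a periodic dance cycle."""
    targets = [mu + std * rng.normal(size=3)
               for mu, std in model[joint] * n_cycles]
    targets.append(targets[0])  # close the loop for periodicity
    frames = []
    for a, b in zip(targets, targets[1:]):
        for t in np.linspace(0.0, 1.0, frames_per_beat, endpoint=False):
            frames.append((1 - t) * a + t * b)
    return np.array(frames)

traj = synthesize("r_hand")
print(traj.shape)  # (n_beats * n_cycles * frames_per_beat, 3) -> (192, 3)
```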