
    Learning to recognize parallel combinations of human motion primitives with linguistic descriptions using non-negative matrix factorization

    We present an approach, based on non-negative matrix factorization, for learning to recognize parallel combinations of initially unknown human motion primitives that are associated with ambiguous sets of linguistic labels during training. In the training phase, the learner observes a human producing complex motions that are parallel combinations of initially unknown motion primitives. Each time the human demonstrates a complex motion, they also provide a high-level linguistic description: a set of labels naming the primitives contained in the complex motion. From these multimodal combinations of high-level labels with high-dimensional, continuous, unsegmented motion data, the learner must later be able to recognize, by producing the adequate set of labels, which motion primitives are present in a novel complex motion, even if that combination was never observed during training. We explain how this problem, as well as natural extensions of it, can be addressed using non-negative matrix factorization. In an experiment in which a learner must recognize the primitive motions of complex human dance choreographies, we then show that this technique allows the system to infer, with good performance, the combinatorial structure of parallel combinations of unknown primitives.
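
    The NMF construction described in the abstract can be sketched in a few lines. The toy below (all sizes, the synthetic data, and the nnls-based decoding step are illustrative assumptions, not the paper's setup) stacks motion features and label vectors into one matrix, factorizes it, then recovers a label set for a motion-only test input:

        import numpy as np
        from scipy.optimize import nnls
        from sklearn.decomposition import NMF

        rng = np.random.default_rng(0)
        n_demos, n_motion, n_labels = 60, 120, 6
        k = n_labels                                            # one component per primitive

        # Toy data (assumption): each demo mixes a random subset of 6 latent primitives.
        W_true = rng.random((k, n_motion))                      # primitive -> motion features
        acts = (rng.random((n_demos, k)) < 0.4).astype(float)   # which primitives are active
        V_motion = acts @ W_true                                # unsegmented motion features
        V_labels = acts                                         # label sets given in training

        # Training: factorize the stacked [motion | labels] matrix.
        nmf = NMF(n_components=k, init="nndsvda", max_iter=500)
        H = nmf.fit_transform(np.hstack([V_motion, V_labels]))
        W_m = nmf.components_[:, :n_motion]                     # motion part of the dictionary
        W_l = nmf.components_[:, n_motion:]                     # label part of the dictionary

        # Test: from motion alone, infer activations, then read off the label set,
        # even for a combination of primitives never seen during training.
        test_acts = np.array([1.0, 0, 1.0, 0, 0, 1.0])
        h, _ = nnls(W_m.T, test_acts @ W_true)                  # nonnegative coding of motion
        print("predicted labels:", (h @ W_l) > 0.5)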

    Learning Semantic Components from Subsymbolic Multimodal Perception

    Perceptual systems often include sensors from several modalities. However, existing robots do not yet sufficiently discover patterns that are spread across the flow of multimodal data they receive. In this paper we present a framework that learns a dictionary of words from full spoken utterances, together with a set of gestures from human demonstrations and the semantic connection between words and gestures. We explain how a non-negative matrix factorization algorithm can be used to learn a dictionary of components representing meaningful elements present in the multimodal perception, without providing the system with a symbolic representation of the semantics. We illustrate this framework by showing how a learner discovers word-like components from the observation of gestures made by a human together with spoken descriptions of those gestures, and how it captures the semantic association between the two.
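
    A minimal sketch of this idea, under toy assumptions (synthetic acoustic histograms standing in for spoken utterances, one-hot gesture features, and arbitrary sizes), factorizes the stacked speech+gesture matrix and inspects how each learned component binds an acoustic pattern to a gesture, with no symbolic semantics provided:

        import numpy as np
        from sklearn.decomposition import NMF

        rng = np.random.default_rng(1)
        n_obs, n_speech, k = 80, 40, 10                         # toy sizes (assumptions)

        # Paired toy data: one gesture per observation, plus a noisy acoustic histogram
        # whose dominant bins play the role of the spoken word for that gesture.
        gesture_ids = rng.integers(0, k, n_obs)
        G = np.eye(k)[gesture_ids]                              # gesture features
        word_protos = rng.random((k, n_speech))                 # one acoustic prototype per word
        S = word_protos[gesture_ids] + 0.05 * rng.random((n_obs, n_speech))

        nmf = NMF(n_components=k, init="nndsvda", max_iter=500)
        H = nmf.fit_transform(np.hstack([S, G]))                # joint speech+gesture coding
        W_speech = nmf.components_[:, :n_speech]
        W_gesture = nmf.components_[:, n_speech:]

        # Each learned component should bind one acoustic pattern to one gesture.
        for j in range(k):
            print(f"component {j} -> gesture {W_gesture[j].argmax()}")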

    Learning the Combinatorial Structure of Demonstrated Behaviors with Inverse Feedback Control

    In many applications, such as virtual agents or humanoid robots, it is difficult to represent complex human behaviors and the full range of skills necessary to achieve them. Real-life human behaviors are often combinations of several parts and are never reproduced in exactly the same way. In this work we introduce a new algorithm that learns behaviors by assuming that the observed complex motions can be represented in a smaller dictionary of concurrent tasks. We present an optimization formalism and show how the dictionary and the mixture coefficients representing each demonstration can be learned simultaneously. We present results on an idealized model in which a set of potential functions represents human objectives or preferences for achieving a task.
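
    The paper's objective comes from inverse feedback control; the sketch below substitutes a plain quadratic reconstruction loss to illustrate just the joint estimation of a dictionary and per-demonstration mixture coefficients by alternating projected gradient steps (sizes, data, learning rate, and the simplex projection are all assumptions):

        import numpy as np

        rng = np.random.default_rng(2)
        n_demos, n_feats, k = 40, 25, 5                         # toy sizes (assumptions)

        # Toy demonstrations: each one mixes a few latent task-cost feature vectors.
        D_true = rng.random((k, n_feats))
        A_true = rng.dirichlet(np.ones(k), size=n_demos)        # per-demo mixture coefficients
        X = A_true @ D_true

        # Alternating projected gradient on 0.5 * ||A D - X||^2, jointly over A and D.
        D = rng.random((k, n_feats))
        A = np.full((n_demos, k), 1.0 / k)
        lr = 0.01
        for _ in range(5000):
            R = A @ D - X                                       # reconstruction residual
            A = np.clip(A - lr * R @ D.T, 0.0, None)
            A /= A.sum(axis=1, keepdims=True)                   # keep coefficients on the simplex
            D = np.clip(D - lr * A.T @ R, 0.0, None)

        print("reconstruction error:", np.linalg.norm(A @ D - X))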

    SERKET: An Architecture for Connecting Stochastic Models to Realize a Large-Scale Cognitive Model

    To realize human-like robot intelligence, a large-scale cognitive architecture is required for robots to understand their environment through the variety of sensors with which they are equipped. In this paper, we propose a novel framework named Serket that enables a large-scale generative model to be constructed, and its inference performed, easily by connecting sub-modules, allowing robots to acquire various capabilities through interaction with their environments and with others. We consider that large-scale cognitive models can be constructed by connecting smaller fundamental models hierarchically while maintaining their programmatic independence. However, connected modules depend on each other, and their parameters must be optimized as a whole. Conventionally, the equations for parameter estimation have to be derived and implemented separately for each model, which becomes harder as the model grows. To solve these problems, we propose a method for parameter estimation that communicates only the minimal parameters between modules while maintaining their programmatic independence. Serket therefore makes it easy to construct large-scale models and estimate their parameters by connecting modules. Experimental results demonstrate that such a model can be constructed by connecting modules, that its parameters can be optimized as a whole, and that its performance is comparable with that of the original models we previously proposed.
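
    A rough sketch of the module-connection idea (not Serket's actual API): each module exposes only a minimal message to its neighbour and is updated independently, so modules remain programmatically independent while influencing each other. Here a hypothetical clustering module feeds soft assignments to a downstream statistics module, and alternating updates stand in for joint inference:

        import numpy as np
        from sklearn.mixture import GaussianMixture

        class GMMModule:
            """Clusters raw observations; its message is the soft cluster assignments."""
            def __init__(self, n_components):
                self.model = GaussianMixture(n_components=n_components)

            def update(self, observations):
                self.model.fit(observations)
                return self.model.predict_proba(observations)   # message to the next module

        class HistogramModule:
            """Consumes upstream assignments; its summary could be fed back upstream."""
            def update(self, message):
                return message.mean(axis=0)                     # per-cluster usage statistics

        rng = np.random.default_rng(3)
        sensor_data = rng.normal(size=(200, 4))                 # stand-in sensor features
        m1, m2 = GMMModule(n_components=3), HistogramModule()
        for _ in range(3):                  # alternating updates, a stand-in for joint inference
            message = m1.update(sensor_data)
            stats = m2.update(message)
        print("cluster usage:", stats)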

    A SENSORY-MOTOR LINGUISTIC FRAMEWORK FOR HUMAN ACTIVITY UNDERSTANDING

    We empirically discovered that the space of human actions has a linguistic structure. This is a sensory-motor space consisting of the evolution of the joint angles of the human body in movement. The space of human activity has its own phonemes, morphemes, and sentences. We present a Human Activity Language (HAL) for the symbolic, non-arbitrary representation of sensory and motor information of human activity. This language was learned from large amounts of motion capture data. Kinetology, the phonology of human movement, finds basic primitives for human motion (segmentation) and associates them with symbols (symbolization). In this way, kinetology provides a symbolic representation for human movement that allows synthesis, analysis, and symbolic manipulation. We introduce a kinetological system and propose five basic principles on which such a system should be based: compactness, view-invariance, reproducibility, selectivity, and reconstructivity. We demonstrate the kinetological properties of our sensory-motor primitives. Further evaluation is accomplished with experiments on compression and decompression of motion data. The morphology of a human action relates to the inference of essential parts of movement (morpho-kinetology) and its structure (morpho-syntax). To learn morphemes and their structure, we present a grammatical inference methodology and introduce a parallel learning algorithm to induce a grammar system representing a single action. The algorithm infers the components of the grammar system as a subset of essential actuators, a context-free grammar (CFG) for the language of each component representing the motion pattern performed by a single actuator, and synchronization rules modeling coordination among actuators. The syntax of human activities involves the construction of sentences using action morphemes. A sentence may range from a single action morpheme (nuclear syntax) to a sequence of sets of morphemes. A single morpheme is decomposed into analogs of lexical categories: nouns, adjectives, verbs, and adverbs. Sets of morphemes represent simultaneous actions (parallel syntax), and a sequence of movements is related to the concatenation of activities (sequential syntax). We demonstrate this linguistic framework on real motion capture data from a large-scale database containing around 200 different actions corresponding to English verbs associated with voluntary, meaningful, observable movement.
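
    The segmentation-and-symbolization step of kinetology can be illustrated with a toy: cut a joint-angle trajectory wherever the velocity changes sign, then label each segment by its velocity/acceleration signs. The four-letter alphabet and the midpoint sampling are illustrative assumptions, not the thesis's actual scheme:

        import numpy as np

        def symbolize(joint_angle, dt=1.0):
            """Toy kinetology: segment a joint-angle trajectory at velocity sign
            changes and name each segment by its velocity/acceleration signs."""
            v = np.gradient(joint_angle, dt)
            a = np.gradient(v, dt)
            cuts = np.where(np.diff(np.sign(v)) != 0)[0] + 1    # velocity zero crossings
            alphabet = {(True, True): "A", (True, False): "B",
                        (False, True): "C", (False, False): "D"}
            word = []
            for seg in np.split(np.arange(len(joint_angle)), cuts):
                mid = seg[len(seg) // 2]                        # sample the segment midpoint
                word.append(alphabet[(bool(v[mid] >= 0), bool(a[mid] >= 0))])
            return "".join(word)

        t = np.linspace(0, 4 * np.pi, 200)
        print(symbolize(np.sin(t)))                             # one symbol per half-cycle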

    Sensorimotor input as a language generalisation tool: a neurorobotics model for generation and generalisation of noun-verb combinations with sensorimotor inputs

    The paper presents a neurorobotics cognitive model explaining the understanding and generalisation of noun-verb combinations when a vocal command consisting of a verb-noun sentence is given to a humanoid robot. The dataset used for training was obtained from object-manipulation tasks with a humanoid robot platform; it includes 9 motor actions and 9 objects placed in 6 different locations, which enables the robot to learn to handle real-world objects and actions. Based on multiple time-scale recurrent neural networks, this study demonstrates the model's generalisation capability using a large dataset, with which the robot was able to generalise the semantic representation of novel combinations of noun-verb sentences and therefore produce the corresponding motor behaviours. This generalisation is achieved via a grounding process: different objects are interacted with, and associated with, different motor behaviours, following a learning approach inspired by developmental language acquisition in infants. Further analyses of the learned network dynamics and representations also demonstrate how generalisation is made possible by exploiting this functional hierarchical recurrent network.
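
    The core of a multiple time-scale recurrent network is leaky-integrator units with different time constants: fast units track motor detail while slow units hold sentence-level context. The sketch below shows only the forward dynamics; sizes, time constants, and the constant command input are assumptions, and training is omitted:

        import numpy as np

        class ToyMTRNN:
            """Minimal multiple time-scale RNN: leaky integrators, two time constants."""
            def __init__(self, n_in, n_fast, n_slow, tau_fast=2.0, tau_slow=30.0, seed=0):
                rng = np.random.default_rng(seed)
                n = n_fast + n_slow
                self.tau = np.concatenate([np.full(n_fast, tau_fast),
                                           np.full(n_slow, tau_slow)])
                self.W = rng.normal(0.0, 0.1, (n, n))           # recurrent weights
                self.W_in = rng.normal(0.0, 0.1, (n, n_in))     # input weights
                self.u = np.zeros(n)                            # membrane potentials

            def step(self, x):
                y = np.tanh(self.u)
                # Leaky integration: u <- (1 - 1/tau) * u + (1/tau) * (W y + W_in x)
                self.u = (1 - 1 / self.tau) * self.u + (self.W @ y + self.W_in @ x) / self.tau
                return y

        net = ToyMTRNN(n_in=5, n_fast=40, n_slow=10)
        command = np.random.default_rng(1).normal(size=5)       # stand-in verb-noun encoding
        trajectory = np.array([net.step(command) for _ in range(50)])
        print(trajectory.shape)                                 # (50, 50) unrolled activations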

    SEMANTIC ANALYSIS AND UNDERSTANDING OF HUMAN BEHAVIOUR IN VIDEO STREAMING

    This thesis investigates the semantic analysis of human behaviour captured in video streams, from both theoretical and technological points of view. Video analysis based on semantic content is in fact still an open issue for the computer vision research community, especially when real-time analysis of complex scenes is concerned. Automated video analysis can be described and performed at different abstraction levels, from pixel analysis up to human behaviour understanding. Similarly, the organisation of computer vision systems is often hierarchical, with low-level image processing techniques feeding into tracking algorithms and then into higher-level scene analysis and/or behaviour analysis modules. Each level of this hierarchy has its open issues, among which the main ones are:
    - motion and object detection: dynamic background modelling, ghosts, sudden changes in illumination conditions;
    - object tracking: modelling and estimating the dynamics of moving objects, presence of occlusions;
    - human behaviour identification: human behaviour patterns are characterized by ambiguity, inconsistency and time-variance.
    Researchers have proposed various approaches that partially address some aspects of these issues from the perspective of the semantic analysis and understanding of video streams. Much progress has been achieved, but usually not in a comprehensive way and often without reference to actual operating situations. A popular class of approaches enhances the quality of the semantic analysis by exploiting background knowledge about the scene and/or the human behaviour, thus narrowing the huge variety of possible behavioural patterns by focusing on a specific narrow domain. In general, the main drawback of existing approaches to semantic analysis of human behaviour, even in narrow domains, is inefficiency due to the high computational complexity of the models representing the dynamics of moving objects and the patterns of human behaviours. From this perspective, this thesis explores an innovative approach to human behaviour analysis and understanding based on the syntactical symbolic analysis of images and video streams described by means of strings of symbols. A symbol is associated with each area of the analysed scene. When a moving object enters an area, the corresponding symbol is appended to the string describing the motion. This approach characterizes the motion of a moving object as a word composed of symbols; by studying and classifying these words we can categorize and understand the various behaviours. The main advantage of this approach is the simplicity of the scene and motion descriptions, so that the behaviour analysis has limited computational complexity due to the intrinsic nature of both the representations and the operations used to manipulate them. Moreover, the structure of the representations is well suited to parallel processing, allowing the analysis to be sped up when appropriate hardware architectures are available. The theoretical background, the original theoretical results underlying this approach, the human behaviour analysis methodology, the possible implementations, and the related performance are presented and discussed in the thesis. To show the effectiveness of the proposed approach, a demonstrative system has been implemented and applied to a real indoor environment with valuable results.
    Furthermore, this thesis proposes an innovative method to improve the overall performance of the object tracking algorithm. The method uses two cameras recording the same scene from different points of view, without introducing any constraint on the cameras' positions. The image fusion task is performed by solving the correspondence problem only for a few relevant points, which reduces the problem of partial occlusions in crowded scenes. Since the method works at a level lower than that of the semantic analysis, it can also be applied in other systems for human behaviour analysis and can be seen as an optional way to improve the semantic analysis by reducing partial occlusions.
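
    The string-based representation at the heart of the thesis can be sketched directly: partition the scene into regions, give each region a symbol, and emit a symbol whenever a tracked object enters a new region. The grid layout and the lookup table of known words below are illustrative assumptions:

        import numpy as np

        def trajectory_to_word(points, grid=(3, 3), bounds=(0.0, 1.0)):
            """Toy scene symbolization: map (x, y) points to grid-cell symbols and
            append a symbol only when the object changes region."""
            lo, hi = bounds
            symbols = []
            for x, y in points:
                col = min(int((x - lo) / (hi - lo) * grid[0]), grid[0] - 1)
                row = min(int((y - lo) / (hi - lo) * grid[1]), grid[1] - 1)
                s = chr(ord("A") + row * grid[0] + col)
                if not symbols or symbols[-1] != s:             # region change only
                    symbols.append(s)
            return "".join(symbols)

        walk = [(0.1, 0.1), (0.5, 0.1), (0.9, 0.1)]             # toy tracked trajectory
        word = trajectory_to_word(walk)
        known = {"ABC": "cross left-to-right", "CBA": "cross right-to-left"}
        print(word, "->", known.get(word, "unknown behaviour"))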

    Cognitive Robots for Social Interactions

    One of my goals is to work towards developing cognitive robots, especially with regard to improving the functionalities that facilitate interaction with human beings and the objects surrounding them. Any cognitive system designed to serve human beings must be capable of processing social signals and ultimately enable efficient prediction and planning of appropriate responses. The main focus of my PhD study is to bridge the gap between the motor space and the visual space. The discovery of mirror neurons [RC04] shows that the visual perception of human motion (visual space) is directly associated with the motor control of the human body (motor space). This discovery poses a large number of challenges in different fields such as computer vision, robotics and neuroscience. One of the fundamental challenges is understanding the mapping between the 2D visual space and 3D motor control, and further developing building blocks (primitives) of human motion in the visual space as well as in the motor space. First, I present my study of the visual-motor mapping of human actions, which aims at mapping human actions in 2D videos to a 3D skeletal representation. Second, I present an automatic algorithm to decompose motion capture (MoCap) sequences into synergies, along with the times at which they are executed (or "activated") for each joint. Third, I propose to use Granger causality as a tool to study coordinated actions performed by at least two units; recent scientific studies suggest that the "action mirroring circuit" might be tuned to action coordination rather than single-action mirroring. Fourth, I present the extraction of key poses in visual space, which facilitates further study of the "action mirroring circuit". I conclude the dissertation by describing the future of cognitive robotics research.
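
    The Granger-causality tool mentioned in the third contribution can be sketched as two nested regressions: unit A "Granger-causes" unit B if adding A's lagged values to B's own lags reduces B's prediction error. The lag order, the toy series, and the variance-ratio score below are assumptions, not the dissertation's actual procedure:

        import numpy as np

        def granger_gain(x, y, lags=3):
            """Relative drop in y's residual variance when x's lags join y's own lags."""
            T = len(y)
            Y = y[lags:]
            own = np.column_stack([y[lags - j:T - j] for j in range(1, lags + 1)])
            full = np.column_stack([own] + [x[lags - j:T - j] for j in range(1, lags + 1)])
            r_own = Y - own @ np.linalg.lstsq(own, Y, rcond=None)[0]
            r_full = Y - full @ np.linalg.lstsq(full, Y, rcond=None)[0]
            return 1.0 - r_full.var() / r_own.var()

        rng = np.random.default_rng(4)
        leader = rng.normal(size=500).cumsum()                       # toy joint-angle series
        follower = np.roll(leader, 2) + 0.1 * rng.normal(size=500)   # trails by two steps
        print("leader -> follower:", granger_gain(leader, follower))  # should be large
        print("follower -> leader:", granger_gain(follower, leader))  # should be near zero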