6 research outputs found

    Learning Hierarchical Models of Complex Daily Activities from Annotated Videos

    Effective recognition of complex long-term activities is becoming an increasingly important task in artificial intelligence. In this paper, we propose a novel approach for building models of complex long-term activities. First, we automatically learn the hierarchical structure of activities by learning the 'parent-child' relations of activity components in a video from the variability in annotations acquired from multiple annotators. This variability allows the inherent hierarchical structure of the activity in a video to be extracted. We consolidate hierarchical structures of the same activity from different videos into a unified stochastic grammar describing the overall activity. We then describe an inference mechanism for interpreting new instances of activities. We use three datasets of daily activity videos, each annotated by multiple annotators, to demonstrate the effectiveness of our system.
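
    As a rough illustration of the 'parent-child' idea only (not the paper's algorithm, which this abstract does not detail), the sketch below pools annotated time intervals from several annotators and treats the tightest enclosing interval as a component's parent; the interval format and the containment heuristic are assumptions.

```python
# Illustrative sketch: derive a crude activity hierarchy from pooled annotations
# by interval containment. Data format and heuristic are assumptions.
from collections import defaultdict

def build_hierarchy(annotations):
    """annotations: (label, start, end) tuples pooled from all annotators."""
    # Visit longer spans first so potential parents come before their children.
    spans = sorted(annotations, key=lambda a: a[2] - a[1], reverse=True)
    children = defaultdict(list)
    for i, (label, s, e) in enumerate(spans):
        parent = None
        for p_label, ps, pe in spans[:i]:
            # The tightest earlier span that fully contains (s, e) is the parent.
            if ps <= s and e <= pe and (parent is None or pe - ps < parent[2] - parent[1]):
                parent = (p_label, ps, pe)
        children[parent[0] if parent else None].append(label)
    return dict(children)

# One annotator labels the whole activity; another labels its sub-steps.
pooled = [("make_tea", 0, 120), ("boil_water", 0, 60), ("pour_water", 60, 90)]
print(build_hierarchy(pooled))
# {None: ['make_tea'], 'make_tea': ['boil_water', 'pour_water']}
```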

    Grounding of Human Environments and Activities for Autonomous Robots

    With the recent proliferation of robotic applications in domestic and industrial scenarios, it is vital for robots to continually learn about their environments and about the humans they share them with. In this paper, we present a framework for autonomous, unsupervised learning of useful human ‘concepts’ from various sensory sources, including colours, people's names, usable objects and simple activities. This is achieved by integrating state-of-the-art object segmentation, pose estimation, activity analysis and language grounding into a continual learning framework. Learned concepts are grounded to natural language when commentary is available, allowing the robot to communicate in a human-understandable way. We show, using a challenging real-world dataset of human activities, that our framework is able to extract useful concepts, ground natural language descriptions to them, and, as a proof of concept, generate simple sentences from templates to describe people and activities.
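
    As a minimal, hypothetical proxy for the unsupervised concept-learning step (the actual framework integrates segmentation, pose estimation and activity analysis), the sketch below clusters simple colour features into colour 'concepts'; the feature choice and cluster count are illustrative assumptions.

```python
# Minimal sketch (not the paper's pipeline): cluster per-object colour features
# into unsupervised colour 'concepts' that could later be grounded to words.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in features: mean RGB of segmented object regions, one row per detection.
features = np.vstack([
    rng.normal([220, 30, 30], 10.0, size=(50, 3)),   # reddish detections
    rng.normal([30, 30, 220], 10.0, size=(50, 3)),   # bluish detections
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
concept_ids = kmeans.labels_          # concept assignment per detection
prototypes = kmeans.cluster_centers_  # concept prototypes
print(prototypes.round(1))
```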

    Multiscale Topological Trajectory Classification with Persistent Homology

    Topological approaches to studying equivalence classes of trajectories in a configuration space have recently received attention in robotics, since they allow a robot to reason about trajectories at a high level of abstraction. While recent work has approached the problem of topological motion planning under the assumption that the configuration space and the obstacles within it are explicitly described in a noise-free manner, we focus on trajectory classification and present a sampling-based approach which can handle noise, is applicable to general configuration spaces, and relies only on the availability of collision-free samples. Unlike previous sampling-based approaches in robotics, which use graphs to capture information about the path-connectedness of a configuration space, we construct a multiscale approximation of neighborhoods of the collision-free configurations based on filtrations of simplicial complexes. Our approach thereby extracts additional homological information which is essential for topological trajectory classification. By computing a basis for the first persistent homology groups, we obtain a multiscale classification algorithm for trajectories in configuration spaces of arbitrary dimension. We furthermore show how an augmented filtration of simplicial complexes based on a cost function can be defined to incorporate additional constraints. We present an evaluation of our approach in 2-, 3-, 4- and 6-dimensional configuration spaces, in simulation and using a Baxter robot.
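
    A minimal sketch of the kind of computation involved, using the GUDHI library: a Vietoris-Rips filtration over collision-free configuration samples and its dimension-1 persistent homology. The toy annular sampling is an assumption, and the paper's actual classifier (tracking which persistent H1 cycles separate two trajectories) and its cost-augmented filtration are not reproduced here.

```python
# Sketch only: Rips filtration over collision-free samples and persistent H1 (GUDHI).
import numpy as np
import gudhi

rng = np.random.default_rng(1)
# Toy 2-D configuration space: collision-free samples around one circular obstacle.
theta = rng.uniform(0.0, 2.0 * np.pi, 300)
radius = rng.uniform(1.0, 2.0, 300)
samples = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

rips = gudhi.RipsComplex(points=samples, max_edge_length=1.0)
st = rips.create_simplex_tree(max_dimension=2)
st.persistence()                                  # compute all persistence pairs
h1 = st.persistence_intervals_in_dimension(1)     # (birth, death) bars for H1
# Long-lived H1 bars correspond to obstacle-induced holes that separate
# trajectory classes at the corresponding range of scales.
print("H1 bars:", len(h1))
```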

    Beyond RMSE: Do machine-learned models of road user interaction produce human-like behavior?

    Autonomous vehicles use a variety of sensors and machine-learned models to predict the behavior of surrounding road users. Most of the machine-learned models in the literature focus on quantitative error metrics like the root mean square error (RMSE) to learn and to report their models' capabilities. This focus on quantitative error metrics tends to ignore the more important behavioral aspect of the models, raising the question of whether these models really predict human-like behavior. Thus, we propose to analyze the output of machine-learned models much as we would analyze human data in conventional behavioral research. We introduce quantitative metrics to demonstrate the presence of three behavioral phenomena in a naturalistic highway driving dataset: 1) the kinematics-dependence of who passes a merging point first; 2) lane changes by an on-highway vehicle to accommodate an on-ramp vehicle (courtesy lane changes); and 3) lane changes by vehicles on the highway to avoid lead-vehicle conflicts. We then analyze the behavior of three machine-learned models using the same metrics. Even though the models' RMSE values differed, all of them captured the kinematics-dependent merging behavior but struggled, to varying degrees, to capture the more nuanced courtesy lane change and highway lane change behaviors. Additionally, the collision-aversion analysis during lane changes showed that the models struggled to capture a physical aspect of human driving: leaving an adequate gap between vehicles. Our analysis thus highlights the inadequacy of simple quantitative metrics and the need to take a broader behavioral perspective when analyzing machine-learned models of human driving.
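
    A toy numerical illustration of the paper's point (hypothetical data and metric, not the paper's dataset): two predictions can have identical RMSE yet disagree on a behavioural question such as which vehicle reaches a merge point first.

```python
# Toy illustration (hypothetical numbers): identical RMSE, different merging behaviour.
import numpy as np

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def first_to_merge(positions, merge_point=100.0):
    """Row index of the vehicle whose longitudinal position reaches merge_point first.
    Assumes every vehicle crosses the merge point within the horizon."""
    crossing_step = np.argmax(positions >= merge_point, axis=1)
    return int(np.argmin(crossing_step))

t = np.arange(0.0, 10.0, 0.1)
truth = np.vstack([12.0 * t, 11.5 * t + 2.0])          # vehicle 0 merges first
model_a = truth + np.array([[1.5], [-1.5]])            # offsets preserve the ordering
model_b = truth + np.array([[-1.5], [1.5]])            # same RMSE, ordering flips

print(rmse(model_a, truth), rmse(model_b, truth))      # identical errors (1.5, 1.5)
print(first_to_merge(truth), first_to_merge(model_a), first_to_merge(model_b))
# -> 0 0 1: model B matches model A on RMSE but not on the merging behaviour.
```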

    Natural Language Grounding and Grammar Induction for Robotic Manipulation Commands

    We present a cognitively plausible system capable of acquiring knowledge in language and vision from pairs of short video clips and linguistic descriptions. The aim of this work is to teach a robot manipulator how to execute natural language commands by demonstration. This is achieved by, first, learning a set of visual `concepts' that abstract the visual feature spaces into concepts with human-level meaning; second, learning the mapping (grounding) between words and the extracted visual concepts; and third, inducing grammar rules via a semantic representation known as Robot Control Language (RCL). We evaluate our approach against state-of-the-art supervised and unsupervised grounding and grammar induction systems, and show that a robot can learn to execute never-before-seen commands from pairs of unlabelled linguistic and visual inputs.
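
    As a minimal, hypothetical sketch of the grounding step (the paper's actual model is not reproduced here), the snippet below grounds words to visual concepts by simple co-occurrence counting over (description, detected-concepts) pairs; the data and scoring rule are illustrative assumptions.

```python
# Minimal sketch: word-to-concept grounding by co-occurrence counting.
# The (sentence, detected-concepts) pairs and the scoring rule are illustrative.
from collections import Counter, defaultdict

pairs = [
    ("pick up the red block",  {"action:pick", "colour:red",   "shape:block"}),
    ("pick up the blue block", {"action:pick", "colour:blue",  "shape:block"}),
    ("drop the red ball",      {"action:drop", "colour:red",   "shape:ball"}),
    ("push the green block",   {"action:push", "colour:green", "shape:block"}),
]

word_concept = defaultdict(Counter)
for sentence, concepts in pairs:
    for word in sentence.split():
        word_concept[word].update(concepts)

def ground(word):
    """Most frequently co-occurring visual concept for a word."""
    counts = word_concept[word]
    return counts.most_common(1)[0][0] if counts else None

print(ground("red"))    # colour:red
print(ground("block"))  # shape:block
```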