
    Learning to Sit: Synthesizing Human-Chair Interactions via Hierarchical Control

    Recent progress on physics-based character animation has shown impressive breakthroughs in human motion synthesis, achieved by imitating motion capture data via deep reinforcement learning. However, results have mostly been demonstrated on imitating a single distinct motion pattern, and do not generalize to interactive tasks that require flexible motion patterns due to varying human-object spatial configurations. To bridge this gap, we focus on one class of interactive tasks: sitting onto a chair. We propose a hierarchical reinforcement learning framework which relies on a collection of subtask controllers trained to imitate simple, reusable mocap motions, and a meta controller trained to execute the subtasks properly to complete the main task. We experimentally demonstrate the strength of our approach over different non-hierarchical and hierarchical baselines. We also show that our approach can be applied to motion prediction given an image input. A supplementary video can be found at https://youtu.be/3CeN0OGz2cA.
    Comment: Accepted to AAAI 202
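The hierarchical scheme in the abstract can be sketched as a two-level control loop: a meta controller chooses which pretrained subtask controller to run at each decision step, and the chosen controller produces the low-level action. This is a minimal illustrative sketch, not the paper's implementation; all class names, the distance-based scheduling rule, and the string "actions" are assumptions.

```python
class SubtaskController:
    """A low-level policy trained to imitate one reusable mocap clip."""
    def __init__(self, name):
        self.name = name

    def act(self, state):
        # A real controller would map the state to joint torques; here we
        # just tag the action with the subtask that produced it.
        return f"{self.name}-action"


class MetaController:
    """A high-level policy that schedules subtasks to complete the task."""
    def __init__(self, subtasks):
        self.subtasks = subtasks

    def select(self, state):
        # Placeholder scheduling rule (hypothetical): walk toward the chair
        # until close enough, then switch to the sitting subtask.
        return "walk" if state["dist_to_chair"] > 0.5 else "sit"


def run_episode(meta, controllers, state, steps=3):
    """Roll out the two-level loop: select a subtask, then act with it."""
    trace = []
    for _ in range(steps):
        name = meta.select(state)
        trace.append(controllers[name].act(state))
        # Toy dynamics: each step closes some distance to the chair.
        state["dist_to_chair"] = max(0.0, state["dist_to_chair"] - 0.4)
    return trace
```

In the paper the meta controller is itself learned with reinforcement learning rather than hand-coded as above; the sketch only shows the division of labor between the two levels.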

    Visual Recognition and Synthesis of Human-Object Interactions

    The ability to perceive and understand people's actions enables humans to efficiently communicate and collaborate in society. Endowing machines with such ability is an important step for building assistive and socially-aware robots. Despite such significance, the problem poses a great challenge and the current state of the art is still nowhere close to human-level performance. This dissertation drives progress on visual action understanding in the scope of human-object interactions (HOI), a major branch of human actions that dominates our everyday life. Specifically, we address the challenges of two important tasks: visual recognition and visual synthesis. The first part of this dissertation considers the recognition task. The main bottleneck of current research is the lack of a proper benchmark, since existing action datasets contain only a small number of categories with limited diversity. To this end, we set out to construct a large-scale benchmark for HOI recognition. We first tackle the problem of establishing the vocabulary for human-object interactions, by investigating a variety of automatic approaches as well as a crowdsourcing approach that collects human-labeled categories. Given the vocabulary, we then construct a large-scale image dataset of human-object interactions by annotating web images through online crowdsourcing. The new "HICO" dataset surpasses prior datasets by an order of magnitude in terms of both the number of images and the number of action categories. The introduction of HICO enables us to benchmark state-of-the-art recognition approaches and also sheds light on new challenges in the realm of large-scale HOI recognition. We further discover that visual features of humans, objects, and their spatial relations play a central role in the representation of interaction, and the combination of the three can improve the recognition outcome. 
The second part of this dissertation considers the synthesis task, and focuses particularly on the synthesis of body motion. The central goal is: given an image of a scene, synthesize the course of an action conditioned on the observed scene. Such capability can predict possible actions afforded by the scene, and will facilitate efficient reactions in human-robot interactions. We investigate two types of synthesis tasks: semantic-driven synthesis and goal-driven synthesis. For semantic-driven synthesis, we study the forecasting of human dynamics from a static image. We propose a novel deep neural network architecture that extracts semantic information from the image and uses it to predict future body movement. For goal-driven synthesis, we study the synthesis of motion defined by human-object interactions. We focus on one particular class of interactions: a person sitting onto a chair. To ensure realistic motion from physical interactions, we leverage a physics-simulated environment that contains a humanoid and a chair model. We propose a novel reinforcement learning framework, and show that the synthesized motion can generalize to different initial human-chair configurations. At the end of this dissertation, we also contribute a new approach to temporal action localization, an essential task in video action understanding. We address the shortcomings of prior Faster R-CNN based approaches, and show state-of-the-art performance on standard benchmarks.
PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/150045/1/ywchao_1.pd
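The first part of the abstract notes that features of the human, the object, and their spatial relation combine to improve HOI recognition. A minimal sketch of that late-fusion idea, with toy per-category scores standing in for learned stream outputs (the function names and score values are illustrative assumptions, not the dissertation's model):

```python
def fuse_streams(human_scores, object_scores, spatial_scores):
    """Late fusion: sum per-category scores from the three streams."""
    return [h + o + s
            for h, o, s in zip(human_scores, object_scores, spatial_scores)]


def predict(scores, categories):
    """Return the category with the highest fused score."""
    return max(zip(scores, categories))[1]
```

In practice each stream would be a learned network over a cropped human region, a cropped object region, and an encoding of their relative layout; the point of the sketch is only that the three evidence sources are combined before the final decision.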

    Robotic object manipulation via hierarchical and affordance learning

    With the rise of computation power and machine learning techniques, a shift in research interest is happening among roboticists. Against this background, this thesis seeks to develop or enhance learning-based grasping and manipulation systems. This thesis first proposes a method, named A2, to improve the sample efficiency of end-to-end deep reinforcement learning algorithms for long-horizon, multi-step and sparse-reward manipulation. The name A2 comes from the fact that it uses Abstract demonstrations to guide the learning process and Adaptively adjusts exploration according to online performance. Experiments in a series of multi-step grid-world tasks and manipulation tasks demonstrate significant performance gains over baselines. Then, this thesis develops a hierarchical reinforcement learning approach towards solving long-horizon manipulation tasks. Specifically, the proposed universal option framework integrates the knowledge-sharing advantage of goal-conditioned reinforcement learning into hierarchical reinforcement learning. An analysis of the parallel-training non-stationarity problem is also conducted, and the A2 method is employed to address the issue. Experiments in a series of continuous multi-step, multi-outcome block-stacking tasks demonstrate significant performance gains as well as reductions in memory use and repeated computation over baselines. Finally, this thesis studies the interplay between grasp generation and manipulation motion generation, arguing that selecting a good grasp before manipulation is essential for contact-rich manipulation tasks. A theory of general affordances based on the reinforcement learning paradigm is developed and used to represent the relationship between grasp generation and manipulation performance. This leads to the general affordance-aware manipulation framework, which selects task-agnostic grasps for downstream manipulation based on the predicted manipulation performance. 
Experiments on a series of contact-rich hook separation tasks demonstrate the effectiveness of the proposed framework and showcase significant performance gains from filtering away unsatisfactory grasps.
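The affordance-aware selection step described above can be sketched as ranking task-agnostic grasp candidates by a predicted manipulation score and discarding the low-scoring ones before motion generation. This is a hypothetical illustration: the function names, the threshold, and the stand-in predictor are assumptions, with a learned model taking the predictor's place in the thesis.

```python
def select_grasp(grasps, predict_performance, threshold=0.5):
    """Return the candidate with the best predicted manipulation outcome,
    keeping only grasps whose predicted performance clears the threshold."""
    scored = [(predict_performance(g), g) for g in grasps]
    # Filter away unsatisfactory grasps before committing to one.
    kept = [(s, g) for s, g in scored if s >= threshold]
    if not kept:
        return None  # No grasp is predicted to support the manipulation.
    return max(kept)[1]
```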

    Agents for educational games and simulations

    This book consists mainly of revised papers that were presented at the Agents for Educational Games and Simulation (AEGS) workshop held on May 2, 2011, as part of the Autonomous Agents and MultiAgent Systems (AAMAS) conference in Taipei, Taiwan. The 12 full papers presented were carefully reviewed and selected from various submissions. The papers are organized in topical sections on middleware applications, dialogues and learning, adaptation and convergence, and agent applications.

    The role of approximate negators in modeling the automatic detection of negation in tweets

    Although improvements have been made in the performance of sentiment analysis tools, the automatic detection of negated text (which affects negative sentiment prediction) still presents challenges. More research is needed on new forms of negation beyond prototypical negation cues such as “not” or “never.” The present research reports findings on the role of a set of words called “approximate negators,” namely “barely,” “hardly,” “rarely,” “scarcely,” and “seldom,” which, on specific occasions (such as when attached to a word from the non-affirmative “any” family), can operationalize negation styles not yet explored. Using a corpus of 6,500 tweets, human annotation allowed for the identification of 17 recurrent usages of these words as negatives (such as “very seldom”) which, along with findings from the literature, helped engineer specific features that guided a machine learning classifier in predicting negated tweets. The machine learning experiments also modeled negation scope (i.e., which specific words in the text are negated) by employing lexical and dependency-graph information. Promising results included F1 values for negation detection ranging from 0.71 to 0.89 and for scope detection from 0.79 to 0.88. Future work will be directed to the application of these findings in automatic sentiment classification, further exploration of patterns in the data (such as part-of-speech recurrences for these new types of negation), and the investigation of sarcasm, formal language, and exaggeration as themes that emerged from observations during corpus annotation.
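The core cue pattern the abstract describes, an approximate negator attached to a word from the "any" family, can be sketched as a simple candidate detector. This is a minimal illustration only: the word lists and the adjacency rule are assumptions, not the paper's full feature set, which also uses lexical and dependency-graph information.

```python
import re

# Illustrative word lists (assumed, not the paper's exhaustive inventory).
APPROX_NEGATORS = {"barely", "hardly", "rarely", "scarcely", "seldom"}
ANY_FAMILY = {"any", "anything", "anyone", "anybody", "anywhere"}


def is_negation_candidate(tweet):
    """Flag a tweet when an approximate negator is immediately followed
    by an "any"-family word (e.g. "hardly anything")."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return any(a in APPROX_NEGATORS and b in ANY_FAMILY
               for a, b in zip(tokens, tokens[1:]))
```

A real classifier would use such a match as one feature among many rather than as a hard rule, since these words are not negative in every context.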

    A hotkey interaction technique that promotes hotkeys

    Hotkeys provide fast interactions to support expert performance. Compared to the traditional pointer-based selection of commands, hotkeys have the advantage of reducing task completion time. However, research shows that users tend to favor menu selections. This is partially caused by how hotkeys are displayed in most linear and toolbar menus. This thesis provides a review of key findings from the literature that aim to promote hotkeys. On the basis of these findings, this thesis develops design criteria for hotkey displays that promote hotkey use. This thesis also proposes a new interaction technique which displays hotkeys on the keyboard. Finally, a cognitive model is constructed to describe a user’s decision-making process of choosing between hotkeys and pointer-based selections when this new hotkey display technique is presented.
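One way to picture the hotkey-versus-pointer decision the abstract mentions is as an expected-time comparison: an unpracticed hotkey carries a recall-failure cost that shrinks with practice. All parameters and the functional form below are assumptions for illustration, not the thesis's cognitive model.

```python
def expected_time(method, recall_prob=0.5,
                  hotkey_time=0.5, menu_time=2.0, recall_penalty=3.0):
    """Expected completion time (seconds, toy values) for one command."""
    if method == "menu":
        return menu_time
    # A failed hotkey recall forces a fallback to the menu plus a penalty.
    return (recall_prob * hotkey_time
            + (1 - recall_prob) * (recall_penalty + menu_time))


def choose(recall_prob):
    """Pick the method with the lower expected completion time."""
    return min(("hotkey", "menu"),
               key=lambda m: expected_time(m, recall_prob=recall_prob))
```

Under this toy model, better hotkey displays raise the recall probability and thereby tip the decision toward hotkey use, which matches the thesis's motivation.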