
    A Cloud based Reinforcement Learning Framework for humanoid grasping

    This work presents a novel approach to a common task in robotics: grasping a set of objects. Autonomously grasping a previously unknown object remains a challenging problem. The thesis presents a new framework, inspired by the classical sense-model-act architecture and the knowledge processing of Cognitive Robotics, that aims to generalize the grasping task to…
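    The sense-model-act architecture the abstract refers to is a classical robotics control loop. Below is a minimal Python sketch of that generic pattern; the `Perception`, `WorldModel`, `GraspPlanner`, and `Gripper` components are hypothetical placeholders standing in for the thesis's actual modules, not its implementation.

```python
# Minimal sketch of a generic sense-model-act grasping loop.
# All component interfaces here are hypothetical placeholders.

class SenseModelActAgent:
    def __init__(self, perception, world_model, planner, gripper):
        self.perception = perception    # sense: cameras, depth sensors, ...
        self.world_model = world_model  # model: object poses, affordances
        self.planner = planner          # decide: choose a grasp
        self.gripper = gripper          # act: execute the grasp

    def step(self):
        observation = self.perception.sense()        # SENSE the scene
        self.world_model.update(observation)         # MODEL the world state
        grasp = self.planner.plan(self.world_model)  # plan a grasp pose
        return self.gripper.execute(grasp)           # ACT on the plan
```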

    Benchmarking Deep Reinforcement Learning for Continuous Control

    Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers. (Comment: 14 pages, ICML 2016)
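    Since the abstract points to the released benchmark, a brief usage sketch may help. This follows the example scripts published in the rllab repository (training TRPO on a cart-pole task); the module paths and parameters are taken from that codebase and may differ across versions.

```python
# Sketch of rllab usage, adapted from the examples in
# https://github.com/rllab/rllab; module paths may vary by version.
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

env = normalize(CartpoleEnv())  # one of the benchmark's classic tasks
policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))
baseline = LinearFeatureBaseline(env_spec=env.spec)

algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,      # environment samples per iteration
    max_path_length=100,  # episode horizon
    n_itr=40,             # number of training iterations
    discount=0.99,
    step_size=0.01,       # KL-divergence constraint for TRPO
)
algo.train()
```

    Swapping in another benchmark task (e.g. a MuJoCo humanoid locomotion environment) only requires changing the environment import, which is the point of a commonly adopted suite.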

    Making friends on the fly: advances in ad hoc teamwork

    Given continuing improvements in design and manufacturing processes, together with advances in artificial intelligence, robots are being deployed in an increasing variety of environments for longer periods of time. As the number of robots grows, they can be expected to encounter and interact with other robots. Moreover, the number of companies and research laboratories producing these robots is increasing, leading to situations where robots may not share a common communication or coordination protocol. While standards for coordination and communication may be created, any standards will likely lag behind state-of-the-art protocols, and robots will need to reason intelligently about their teammates with limited information. This problem motivates the area of ad hoc teamwork, in which an agent may need to cooperate with a variety of teammates in order to achieve a shared goal. We argue that agents that reason effectively about ad hoc teamwork need to exhibit three capabilities: 1) robustness to teammate variety, 2) robustness to diverse tasks, and 3) fast adaptation.

    This thesis addresses all three of these challenges. In particular, it introduces algorithms for quickly adapting to unknown teammates, enabling agents to react to new teammates without extensive observations. The majority of existing multiagent algorithms focus on scenarios in which all agents share coordination and communication protocols, and while previous research on ad hoc teamwork considers some of these three challenges, this thesis introduces a new algorithm, PLASTIC, that is the first to address all three in a single algorithm. PLASTIC adapts quickly to unknown teammates by reusing knowledge learned about previous teammates and exploiting any available expert knowledge. Given this knowledge, PLASTIC selects online which previous teammates are most similar to the current ones and uses this information to adapt to their behaviors. The thesis introduces two instantiations of PLASTIC. The first is a model-based approach, PLASTIC-Model, which builds models of previous teammates' behaviors and plans online to determine the best course of action. The second is a policy-based approach, PLASTIC-Policy, which learns policies for cooperating with past teammates and selects among these policies online. Furthermore, we introduce a new transfer learning algorithm, TwoStageTransfer, that transfers knowledge from many past teammates while considering how similar each teammate is to the current ones.

    We theoretically analyze the computational tractability of PLASTIC-Model in a number of scenarios with unknown teammates. Additionally, we empirically evaluate PLASTIC in three domains that cover a spread of possible settings. Our evaluations show that PLASTIC can learn to communicate with unknown teammates using a limited set of messages, coordinate with externally created teammates that do not reason about ad hoc teams, and act intelligently in domains with continuous states and actions. These evaluations also show that TwoStageTransfer outperforms existing transfer learning algorithms and enables PLASTIC to adapt even better to new teammates. We further identify three dimensions that we argue best describe ad hoc teamwork scenarios, and hypothesize that these dimensions are useful for analyzing similarities among domains, for determining which domains can be tackled by similar algorithms, and for identifying avenues for future research. The work presented in this thesis represents an important step towards enabling agents to adapt to unknown teammates in the real world: PLASTIC significantly broadens the robustness of robots to their teammates and allows them to adapt quickly to new teammates by reusing previously learned knowledge.
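    The core step the abstract describes, selecting online which previously learned teammate models best match the current teammates, can be sketched as a belief update over prior models. The following is a hedged Python illustration using a multiplicative-weights update, one plausible reading of that selection step rather than the thesis's exact algorithm; the `TeammateModel` interface is a hypothetical placeholder.

```python
# Hedged sketch of PLASTIC-style teammate selection: maintain a belief
# over previously learned teammate models and update it online from
# observed teammate behavior. TeammateModel is a hypothetical interface.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TeammateModel:
    name: str
    # P(action | state) under this prior teammate's learned behavior
    action_prob: Callable[[object, object], float]


def update_beliefs(beliefs: Dict[str, float],
                   models: List[TeammateModel],
                   state, observed_action,
                   eta: float = 0.1) -> Dict[str, float]:
    """Multiplicative-weights update: models that predicted the observed
    teammate action poorly lose belief mass."""
    for m in models:
        loss = 1.0 - m.action_prob(state, observed_action)
        beliefs[m.name] *= (1.0 - eta * loss)
    total = sum(beliefs.values())
    return {k: v / total for k, v in beliefs.items()}  # renormalize


def select_model(beliefs: Dict[str, float],
                 models: List[TeammateModel]) -> TeammateModel:
    """Act using the currently most probable prior teammate model
    (beliefs start uniform over all stored models)."""
    best = max(beliefs, key=beliefs.get)
    return next(m for m in models if m.name == best)
```

    In a model-based instantiation the selected model would feed a planner; in a policy-based one the same belief would select among stored cooperation policies, mirroring the PLASTIC-Model and PLASTIC-Policy split described above.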

    Law and Ethics of Morally Significant Machines: The case for pre-emptive prevention

    Interest in the ethics of artificial intelligence systems is dominated by the question of how these technologies will benefit or harm human individuals and societies. Much less attention is given to the ethics of our interaction with AI systems from the perspective of what may harm or benefit the systems themselves. Despite this, there is potential for future AI systems to be designed in a way that makes them either morally significant entities, or gives them the tools with which to develop degrees of moral significance, perhaps even personhood in the moral sense. This thesis proposes how certain contemporary paradigms in AI might in the future create a morally significant machine, perhaps even a machine person: one which can be harmed to a degree similar to ourselves. Such a system would be the first technology for which the design of law and policy would be obliged to consider not just human best interests, but the best interests of the technology itself: how it is designed, what we can use it for, what can be done to it, and what we are duty-bound to provide it with.

    The thesis surveys a wide range of legal and social problems that the invention of such a system would engender, particularly in relation to paradigms such as property, legal personality, and rights of both positive and negative kinds. It also explores the fraught line-drawing problem of establishing which systems matter and which do not, and what the legal implications of that distinction would be. It establishes that the net demands such a machine would place upon humans support an argument for a pre-emptive policy preventing their creation, so as to mitigate harms to both human society and the machines themselves. When closely examined, the reality of a social partnership between persons, both human and machine, is too problematic and too profoundly challenging to the conception of anthropocentric hegemony to be justifiable.