
    Decoupling Behavior, Perception, and Control for Autonomous Learning of Affordances

    ©2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

    Presented at the 2013 IEEE International Conference on Robotics and Automation (ICRA), 6-10 May 2013, Karlsruhe, Germany. DOI: 10.1109/ICRA.2013.6631290

    A novel behavior representation is introduced that permits a robot to systematically explore the best methods by which to successfully execute an affordance-based behavior for a particular object. The approach decomposes affordance-based behaviors into three components. We first define controllers that specify how to achieve a desired change in object state through changes in the agent's state. For each controller we develop at least one behavior primitive that determines how the controller outputs translate to specific movements of the agent. Additionally, we provide multiple perceptual proxies that define the representation of the object that is to be computed as input to the controller during execution. A variety of proxies may be selected for a given controller, and a given proxy may provide input for more than one controller. When developing an appropriate affordance-based behavior strategy for a given object, the robot can systematically vary these elements as well as note the impact of additional task variables such as location in the workspace. We demonstrate the approach using a PR2 robot that explores different combinations of controller, behavior primitive, and proxy to perform a push or pull positioning behavior on a selection of household objects, learning which methods work best for each object.
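    To make the controller / behavior primitive / perceptual proxy decomposition concrete, the following Python sketch enumerates combinations of the three components for one object; the class names, fields, and scoring scheme are illustrative assumptions and do not come from the paper.

    # Illustrative sketch (not the authors' code): enumerate combinations of
    # perceptual proxy, controller, and behavior primitive for a single object.
    from dataclasses import dataclass
    from itertools import product
    from typing import Callable, Dict, List, Tuple

    @dataclass
    class PerceptualProxy:
        """Computes the object representation fed to a controller."""
        name: str
        compute: Callable[[object], object]          # raw sensor data -> object state

    @dataclass
    class Controller:
        """Maps (object state, goal) to a desired change in the agent's state."""
        name: str
        step: Callable[[object, object], object]

    @dataclass
    class BehaviorPrimitive:
        """Translates controller output into concrete robot motion; returns a score."""
        name: str
        execute: Callable[[object], float]

    def explore(proxies: List[PerceptualProxy],
                controllers: List[Controller],
                primitives: List[BehaviorPrimitive],
                sense: Callable[[], object],
                goal: object) -> Dict[Tuple[str, str, str], float]:
        """Try every (proxy, controller, primitive) combination and record its outcome."""
        scores: Dict[Tuple[str, str, str], float] = {}
        for proxy, ctrl, prim in product(proxies, controllers, primitives):
            obj_state = proxy.compute(sense())        # perception
            agent_delta = ctrl.step(obj_state, goal)  # control
            scores[(proxy.name, ctrl.name, prim.name)] = prim.execute(agent_delta)
        return scores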

    Towards Cognitive Bots: Architectural Research Challenges

    Software bots operating in multiple virtual digital platforms must understand the platforms' affordances and behave like human users. Platform affordances or features differ from one application platform to another and over a platform's life cycle, requiring such bots to be adaptable. Moreover, bots in such platforms could cooperate with humans or other software agents for work or to learn specific behavior patterns. However, beyond language processing and prediction, present-day bots, particularly chatbots, are far from reaching a human user's level of behavior within complex business information systems. They lack the cognitive capabilities to sense and act in such virtual environments, rendering their development a challenge to artificial general intelligence research. In this study, we problematize and investigate assumptions in conceptualizing software bot architecture by directing attention to significant architectural research challenges in developing cognitive bots endowed with complex behavior for operation on information systems. As an outlook, we propose alternative architectural assumptions to consider in future bot design and bot development frameworks.

    Building Affordance Relations for Robotic Agents - A Review


    The Problem of Mental Action

    In mental action there is no motor output to be controlled and no sensory input vector that could be manipulated by bodily movement. It is therefore unclear whether this specific target phenomenon can be accommodated under the predictive processing framework at all, or if the concept of “active inference” can be adapted to this highly relevant explanatory domain. This contribution puts the phenomenon of mental action into explicit focus by introducing a set of novel conceptual instruments and developing a first positive model, concentrating on epistemic mental actions and epistemic self-control. On this model, action initiation is a functionally adequate form of self-deception, and mental actions are a specific form of predictive control of effective connectivity, accompanied and possibly even functionally mediated by a conscious “epistemic agent model”. The overall process is aimed at increasing the epistemic value of pre-existing states in the conscious self-model, without causally looping through sensory sheets or using the non-neural body as an instrument for active inference.

    Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer

    An important goal of research in Deep Reinforcement Learning in mobile robotics is to train agents capable of solving complex tasks that require a high level of scene understanding and reasoning from an egocentric perspective. When trained from simulations, optimal environments should satisfy a currently unobtainable combination of high-fidelity photographic observations, massive amounts of different environment configurations, and fast simulation speeds. In this paper we argue that research on training agents capable of complex reasoning can be simplified by decoupling it from the requirement of high-fidelity photographic observations. We present a suite of tasks requiring complex reasoning and exploration in continuous, partially observable 3D environments. The objective is to provide challenging scenarios and a robust baseline agent architecture that can be trained on mid-range consumer hardware in under 24 hours. Our scenarios combine two key advantages: (i) they are based on a simple but highly efficient 3D environment (ViZDoom) which allows high-speed simulation (12,000 fps); (ii) the scenarios provide the user with a range of difficulty settings, in order to identify the limitations of current state-of-the-art algorithms and network architectures. We aim to increase accessibility to the field of Deep RL by providing baselines for challenging scenarios where new ideas can be iterated on quickly. We argue that the community should be able to address challenging problems in the reasoning of mobile agents without the need for a large compute infrastructure.
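    As a rough illustration of the kind of fast, low-fidelity simulation loop the abstract refers to, the sketch below runs a random policy in a ViZDoom scenario using the standard vizdoom Python API; the scenario config and action set are assumptions and do not reproduce the paper's benchmark suite or agent architecture.

    # Minimal ViZDoom interaction loop with a random policy (illustrative only).
    # Assumes the `vizdoom` package is installed and that the bundled "basic.cfg"
    # scenario is available; the paper's own scenarios are not reproduced here.
    import os
    import random
    import vizdoom as vzd

    game = vzd.DoomGame()
    game.load_config(os.path.join(vzd.scenarios_path, "basic.cfg"))  # assumed scenario
    game.set_window_visible(False)   # headless rendering for maximum simulation speed
    game.init()

    actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # one-hot choice over three buttons

    for _ in range(10):
        game.new_episode()
        while not game.is_episode_finished():
            state = game.get_state()              # screen buffer + game variables
            game.make_action(random.choice(actions))
        print("episode return:", game.get_total_reward())

    game.close()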

    Tool mastering today – an interdisciplinary perspective

    Tools have shaped human life, living conditions, and culture. Recognizing the cognitive architecture underlying tool use would allow us to comprehend its evolution, development, and physiological basis. However, the cognitive underpinnings of tool mastering remain little understood despite long-standing research in the neuroscientific, psychological, behavioral, and technological fields. Moreover, the recent transition of tool use to the digital domain poses new challenges for explaining the underlying processes. In this interdisciplinary review, we propose three building blocks of tool mastering: (A) perceptual and motor abilities integrate into tool manipulation knowledge, (B) perceptual and cognitive abilities into functional tool knowledge, and (C) motor and cognitive abilities into means-end knowledge about tool use. This framework allows for integrating and structuring research findings and theoretical assumptions regarding the functional architecture of tool mastering via behavior in humans and non-human primates, brain networks, as well as computational and robotic models. An interdisciplinary perspective also helps to identify open questions and to inspire innovative research approaches. The framework can be applied to studies on the transition from classical to modern, non-mechanical tools and from analogue to digital user-tool interactions in virtual reality, which come with increased functional opacity and sensorimotor decoupling between tool user, tool, and target. By working towards an integrative theory of the cognitive architecture underlying the use of tools and technological assistants, this review aims at stimulating future interdisciplinary research avenues.

    PlanT: Explainable Planning Transformers via Object-Level Representations

    Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene. While human drivers prioritize important objects and ignore details not relevant to the decision, learning-based planners typically extract features from dense, high-dimensional grid representations containing all vehicle and road context information. In this paper, we propose PlanT, a novel approach for planning in the context of self-driving that uses a standard transformer architecture. PlanT is based on imitation learning with a compact object-level input representation. On the Longest6 benchmark for CARLA, PlanT outperforms all prior methods (matching the driving score of the expert) while being 5.3x faster than equivalent pixel-based planning baselines during inference. Combining PlanT with an off-the-shelf perception module provides a sensor-based driving system that is more than 10 points better in terms of driving score than the existing state of the art. Furthermore, we propose an evaluation protocol to quantify the ability of planners to identify relevant objects, providing insights regarding their decision-making. Our results indicate that PlanT can focus on the most relevant object in the scene, even when this object is geometrically distant.
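    To make the "compact object-level input representation" more concrete, here is a hedged PyTorch sketch that feeds per-object feature vectors as tokens to a standard transformer encoder and regresses future waypoints; the feature layout, dimensions, and pooling are illustrative assumptions, not the PlanT implementation.

    # Illustrative sketch: scene objects (vehicles, route segments) as transformer
    # tokens for waypoint prediction. Sizes and features are assumptions.
    import torch
    import torch.nn as nn

    class ObjectLevelPlanner(nn.Module):
        def __init__(self, obj_dim: int = 8, d_model: int = 128, n_waypoints: int = 4):
            super().__init__()
            self.n_waypoints = n_waypoints
            self.embed = nn.Linear(obj_dim, d_model)           # per-object token embedding
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=3)
            self.head = nn.Linear(d_model, n_waypoints * 2)    # (x, y) per future waypoint

        def forward(self, objects: torch.Tensor) -> torch.Tensor:
            # objects: (batch, num_objects, obj_dim), e.g. position, extent, heading, speed
            tokens = self.encoder(self.embed(objects))
            pooled = tokens.mean(dim=1)                        # simple pooling over objects
            return self.head(pooled).view(-1, self.n_waypoints, 2)

    planner = ObjectLevelPlanner()
    waypoints = planner(torch.randn(2, 10, 8))                 # 2 scenes, 10 objects each
    print(waypoints.shape)                                     # torch.Size([2, 4, 2])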

    RLAD: Reinforcement Learning from Pixels for Autonomous Driving in Urban Environments

    Current approaches to Reinforcement Learning (RL) applied in urban Autonomous Driving (AD) focus on decoupling the perception training from the driving policy training. The main reason is to avoid training a convolutional encoder alongside a policy network, which is known to suffer from issues related to sample efficiency, degenerated feature representations, and catastrophic self-overfitting. However, this paradigm can lead to representations of the environment that are not aligned with the downstream task, which may result in suboptimal performance. To address this limitation, this paper proposes RLAD, the first Reinforcement Learning from Pixels (RLfP) method applied in the urban AD domain. We propose several techniques to enhance the performance of an RLfP algorithm in this domain, including: i) an image encoder that leverages both image augmentations and Adaptive Local Signal Mixing (A-LIX) layers; ii) WayConv1D, a waypoint encoder that harnesses the 2D geometrical information of the waypoints using 1D convolutions; and iii) an auxiliary loss to increase the significance of the traffic lights in the latent representation of the environment. Experimental results show that RLAD significantly outperforms all state-of-the-art RLfP methods on the NoCrash benchmark. We also present an infraction analysis on the NoCrash-regular benchmark, which indicates that RLAD performs better than all other methods in terms of both collision rate and red-light infractions.
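    A minimal sketch of the waypoint-encoder idea follows: the route's (x, y) waypoints are treated as a short two-channel sequence and summarized with 1D convolutions. The module name, layer sizes, and pooling are assumptions for illustration and are not the RLAD implementation of WayConv1D.

    # Hedged sketch of a waypoint encoder in the spirit of WayConv1D (not the
    # RLAD code): summarize N (x, y) waypoints with 1D convolutions.
    import torch
    import torch.nn as nn

    class WaypointEncoder(nn.Module):
        def __init__(self, out_dim: int = 64):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv1d(in_channels=2, out_channels=32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv1d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),          # pool over the waypoint dimension
            )
            self.fc = nn.Linear(64, out_dim)

        def forward(self, waypoints: torch.Tensor) -> torch.Tensor:
            # waypoints: (batch, n_waypoints, 2) -> (batch, 2, n_waypoints) for Conv1d
            x = self.conv(waypoints.transpose(1, 2)).squeeze(-1)
            return self.fc(x)

    enc = WaypointEncoder()
    print(enc(torch.randn(4, 10, 2)).shape)       # torch.Size([4, 64])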

    A Reference Software Architecture for Social Robots

    Social Robotics poses tough challenges to software designers, who are required to address difficult architectural drivers such as acceptability and trust in robots, as well as to guarantee that robots establish a personalised interaction with their users. Moreover, in this context recurrent software design issues also arise, such as ensuring interoperability and improving the reusability and customizability of software components. Designing and implementing social robotic software architectures is a time-intensive activity requiring multi-disciplinary expertise, which makes it difficult to rapidly develop, customise, and personalise robotic solutions. These challenges may be mitigated at design time by choosing certain architectural styles, implementing specific architectural patterns, and using particular technologies. Leveraging our experience in the MARIO project, in this paper we propose a series of principles that social robots may benefit from. These principles also lay the foundations for the design of a reference software architecture for Social Robots. The ultimate goal of this work is to establish a common ground based on a reference software architecture that allows robotic software components to be easily reused, in order to rapidly develop, implement, and personalise Social Robots.