Decoupling Behavior, Perception, and Control for Autonomous Learning of Affordances
Presented at the 2013 IEEE International Conference on Robotics and Automation (ICRA), 6-10 May 2013, Karlsruhe, Germany. DOI: 10.1109/ICRA.2013.6631290
A novel behavior representation is introduced that permits a robot to systematically explore the best methods by which to successfully execute an affordance-based behavior for a particular object. The approach decomposes affordance-based behaviors into three components. We first define controllers that specify how to achieve a desired change in object state through changes in the agent's state. For each controller we develop at least one behavior primitive that determines how the controller outputs translate to specific movements of the agent. Additionally, we provide multiple perceptual proxies that define the representation of the object that is to be computed as input to the controller during execution. A variety of proxies may be selected for a given controller, and a given proxy may provide input for more than one controller. When developing an appropriate affordance-based behavior strategy for a given object, the robot can systematically vary these elements as well as note the impact of additional task variables such as location in the workspace. We demonstrate the approach using a PR2 robot that explores different combinations of controller, behavior primitive, and proxy to perform a push or pull positioning behavior on a selection of household objects, learning which methods work best for each object.
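As a rough illustration of this decomposition, the following Python sketch frames perceptual proxies, controllers, and behavior primitives as interchangeable components that a robot can vary combinatorially; all type and function names here are hypothetical, not taken from the paper's implementation.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

@dataclass
class ObjectState:   # hypothetical minimal object representation
    x: float
    y: float

@dataclass
class AgentDelta:    # hypothetical change in the agent's state
    dx: float
    dy: float

# A perceptual proxy maps raw sensor data to an ObjectState; a controller maps
# (current, goal) object states to a change in agent state; a behavior
# primitive turns controller output into concrete movements.
Proxy = Callable[[object], ObjectState]
Controller = Callable[[ObjectState, ObjectState], AgentDelta]
Primitive = Callable[[AgentDelta], None]

def explore_combinations(
    proxies: List[Proxy],
    controllers: List[Controller],
    primitives: List[Primitive],
    run_trial: Callable[[Proxy, Controller, Primitive], float],
) -> Dict[Tuple[int, int, int], float]:
    """Systematically try every (proxy, controller, primitive) combination
    and record a success score for each, mirroring the exploration above."""
    results = {}
    for (i, p), (j, c), (k, b) in product(
        enumerate(proxies), enumerate(controllers), enumerate(primitives)
    ):
        results[(i, j, k)] = run_trial(p, c, b)  # e.g., push/pull success rate
    return results
```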
Towards Cognitive Bots: Architectural Research Challenges
Software bots operating across multiple virtual digital platforms must understand the platforms' affordances and behave like human users. Platform affordances or features differ from one application platform to another and over a platform's life cycle, requiring such bots to be adaptable. Moreover, bots on such platforms could cooperate with humans or other software agents for work or to learn specific behavior patterns. However, beyond language processing and prediction, present-day bots, particularly chatbots, are far from reaching a human user's behavior level within complex business information systems. They lack the cognitive capabilities to sense and act in such virtual environments, rendering their development a challenge for artificial general intelligence research. In this study, we problematize and investigate assumptions in conceptualizing software bot architecture by directing attention to significant architectural research challenges in developing cognitive bots endowed with complex behavior for operation on information systems. As an outlook, we propose alternative architectural assumptions to consider in future bot design and bot development frameworks.
The Problem of Mental Action
In mental action there is no motor output to be controlled and no sensory input vector that could be manipulated by bodily movement. It is therefore unclear whether this specific target phenomenon can be accommodated under the predictive processing framework at all, or whether the concept of "active inference" can be adapted to this highly relevant explanatory domain. This contribution puts the phenomenon of mental action into explicit focus by introducing a set of novel conceptual instruments and developing a first positive model, concentrating on epistemic mental actions and epistemic self-control. On this model, action initiation is a functionally adequate form of self-deception, and mental actions are a specific form of predictive control of effective connectivity, accompanied and possibly even functionally mediated by a conscious "epistemic agent model". The overall process is aimed at increasing the epistemic value of pre-existing states in the conscious self-model, without causally looping through sensory sheets or using the non-neural body as an instrument for active inference.
Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer
An important goal of research in Deep Reinforcement Learning for mobile robotics is to train agents capable of solving complex tasks that require a high level of scene understanding and reasoning from an egocentric perspective. When training from simulation, optimal environments would satisfy a currently unobtainable combination of high-fidelity photographic observations, massive amounts of different environment configurations, and fast simulation speeds. In this paper we argue that research on training agents capable of complex reasoning can be simplified by decoupling it from the requirement of high-fidelity photographic observations. We present a suite of tasks requiring complex reasoning and exploration in continuous, partially observable 3D environments. The objective is to provide challenging scenarios and a robust baseline agent architecture that can be trained on mid-range consumer hardware in under 24 hours. Our scenarios combine two key advantages: (i) they are based on a simple but highly efficient 3D environment (ViZDoom) that allows high-speed simulation (12,000 fps); (ii) the scenarios provide the user with a range of difficulty settings in order to identify the limitations of current state-of-the-art algorithms and network architectures. We aim to increase accessibility to the field of Deep RL by providing baselines for challenging scenarios where new ideas can be iterated on quickly. We argue that the community should be able to address challenging problems in reasoning for mobile agents without the need for a large compute infrastructure.
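Since the scenarios build on ViZDoom, a minimal interaction loop through its Python API illustrates why such environments are cheap to drive at high frame rates; the config file name, skill level, and random policy below are placeholders, not the paper's released task suite.

```python
import random
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("my_scenario.cfg")  # placeholder scenario + available buttons
game.set_screen_resolution(vzd.ScreenResolution.RES_160X120)  # small frames keep simulation fast
game.set_window_visible(False)       # headless rendering for training throughput
game.set_doom_skill(3)               # difficulty knob, 1 (easy) to 5 (hard)
game.init()

n_buttons = game.get_available_buttons_size()
for _ in range(5):
    game.new_episode()
    while not game.is_episode_finished():
        state = game.get_state()     # state.screen_buffer is the egocentric frame
        action = [random.randint(0, 1) for _ in range(n_buttons)]  # random policy
        reward = game.make_action(action, 4)  # repeat action for 4 tics (frame skip)
game.close()
```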
Tool mastering today – an interdisciplinary perspective
Tools have shaped human life, living conditions, and culture. Understanding the cognitive architecture underlying tool use would allow us to comprehend its evolution, development, and physiological basis. However, the cognitive underpinnings of tool mastering remain little understood despite long-standing research in the neuroscientific, psychological, behavioral, and technological fields. Moreover, the recent transition of tool use to the digital domain poses new challenges for explaining the underlying processes. In this interdisciplinary review, we propose three building blocks of tool mastering: (A) perceptual and motor abilities integrate into tool manipulation knowledge, (B) perceptual and cognitive abilities into functional tool knowledge, and (C) motor and cognitive abilities into means-end knowledge about tool use. This framework allows for integrating and structuring research findings and theoretical assumptions regarding the functional architecture of tool mastering via behavior in humans and non-human primates, brain networks, as well as computational and robotic models. An interdisciplinary perspective also helps to identify open questions and to inspire innovative research approaches. The framework can be applied to studies on the transition from classical to modern, non-mechanical tools and from analogue to digital user-tool interactions in virtual reality, which come with increased functional opacity and sensorimotor decoupling between tool user, tool, and target. By working towards an integrative theory of the cognitive architecture underlying the use of tools and technological assistants, this review aims at stimulating future interdisciplinary research avenues.
PlanT: Explainable Planning Transformers via Object-Level Representations
Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene. While human drivers prioritize important objects and ignore details not relevant to the decision, learning-based planners typically extract features from dense, high-dimensional grid representations containing all vehicle and road context information. In this paper, we propose PlanT, a novel approach for planning in the context of self-driving that uses a standard transformer architecture. PlanT is based on imitation learning with a compact object-level input representation. On the Longest6 benchmark for CARLA, PlanT outperforms all prior methods (matching the driving score of the expert) while being 5.3x faster than equivalent pixel-based planning baselines during inference. Combining PlanT with an off-the-shelf perception module provides a sensor-based driving system that is more than 10 points better in terms of driving score than the existing state of the art. Furthermore, we propose an evaluation protocol to quantify the ability of planners to identify relevant objects, providing insights regarding their decision-making. Our results indicate that PlanT can focus on the most relevant object in the scene, even when this object is geometrically distant.
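A minimal sketch of what an object-level transformer planner of this kind can look like, assuming PyTorch; the token feature layout, layer sizes, pooling, and waypoint head are illustrative assumptions, not the published PlanT architecture.

```python
import torch
import torch.nn as nn

class ObjectLevelPlanner(nn.Module):
    """Each scene object (vehicle or route segment) becomes one transformer
    token instead of a dense grid cell -- assumed feature layout below."""
    def __init__(self, obj_dim=6, d_model=128, n_heads=4, n_layers=4, n_waypoints=4):
        super().__init__()
        self.n_waypoints = n_waypoints
        self.embed = nn.Linear(obj_dim, d_model)  # e.g. (x, y, yaw, w, l, speed)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_waypoints * 2)  # future (x, y) waypoints

    def forward(self, objects, pad_mask=None):
        # objects: (batch, n_objects, obj_dim); pad_mask: True where padded
        tokens = self.encoder(self.embed(objects), src_key_padding_mask=pad_mask)
        pooled = tokens.mean(dim=1)  # simple pooling over object tokens
        return self.head(pooled).view(-1, self.n_waypoints, 2)

waypoints = ObjectLevelPlanner()(torch.randn(2, 10, 6))  # -> (2, 4, 2)
```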
RLAD: Reinforcement Learning from Pixels for Autonomous Driving in Urban Environments
Current approaches to Reinforcement Learning (RL) in urban Autonomous Driving (AD) focus on decoupling perception training from driving-policy training. The main reason is to avoid training a convolutional encoder alongside a policy network, which is known to suffer from poor sample efficiency, degenerated feature representations, and catastrophic self-overfitting. However, this paradigm can lead to representations of the environment that are not aligned with the downstream task, which may result in suboptimal performance. To address this limitation, this paper proposes RLAD, the first Reinforcement Learning from Pixels (RLfP) method applied in the urban AD domain. We propose several techniques to enhance the performance of an RLfP algorithm in this domain, including: (i) an image encoder that leverages both image augmentations and Adaptive Local Signal Mixing (A-LIX) layers; (ii) WayConv1D, a waypoint encoder that harnesses the 2D geometric information of the waypoints using 1D convolutions; and (iii) an auxiliary loss that increases the significance of traffic lights in the latent representation of the environment. Experimental results show that RLAD significantly outperforms all state-of-the-art RLfP methods on the NoCrash benchmark. We also present an infraction analysis on the NoCrash-regular benchmark, which indicates that RLAD performs better than all other methods in terms of both collision rate and red-light infractions.
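The abstract's WayConv1D idea, 1D convolutions over the waypoint sequence, can be sketched as follows in PyTorch; the kernel size, channel counts, and output dimension are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class WayConv1D(nn.Module):
    """Waypoint encoder: treat the 2D coordinates as channels and the waypoint
    index as the spatial axis, so the convolution sees local route geometry."""
    def __init__(self, n_waypoints=10, out_dim=64):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=2, out_channels=16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16 * n_waypoints, out_dim)

    def forward(self, waypoints):
        # waypoints: (batch, n_waypoints, 2) in ego-vehicle coordinates
        x = waypoints.transpose(1, 2)   # -> (batch, 2, n_waypoints)
        x = torch.relu(self.conv(x))
        return self.fc(x.flatten(1))    # compact waypoint embedding

emb = WayConv1D()(torch.randn(4, 10, 2))  # -> (4, 64)
```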
A Reference Software Architecture for Social Robots
Social Robotics poses tough challenges to software designers, who are required to address difficult architectural drivers such as acceptability and trust of robots, as well as to guarantee that robots establish a personalised interaction with their users. Moreover, recurrent software design issues such as ensuring interoperability and improving the reusability and customizability of software components also arise in this context.
Designing and implementing social robotic software architectures is a time-intensive activity requiring multi-disciplinary expertise, which makes it difficult to rapidly develop, customise, and personalise robotic solutions. These challenges may be mitigated at design time by choosing certain architectural styles, implementing specific architectural patterns, and using particular technologies.
Leveraging our experience in the MARIO project, in this paper we propose a series of principles that social robots may benefit from. These principles also lay the foundations for the design of a reference software architecture for Social Robots. The ultimate goal of this work is to establish a common ground based on a reference software architecture that allows robotic software components to be easily reused in order to rapidly develop, implement, and personalise Social Robots.
- …