Self-organisation of internal models in autonomous robots
Internal Models (IMs) play a significant role in autonomous robotics. They are mechanisms
able to represent the input-output characteristics of the sensorimotor loop. In
developmental robotics, open-ended learning of skills and knowledge serves to react to unexpected inputs, to explore the environment, and to acquire new behaviours. The development of the robot includes self-exploration of the state-action space and learning of the environmental dynamics.
In this dissertation, we explore the properties and benefits of the self-organisation
of robot behaviour based on the homeokinetic learning paradigm. A homeokinetic
robot explores the environment in a coherent way without prior knowledge of its
configuration or the environment itself. First, we propose a novel approach to self-organisation
of behaviour by artificial curiosity in the sensorimotor loop. Second, we
study how different forward model settings alter the behaviour of both exploratory and goal-oriented robots. Models of diverse complexity, size, and learning rule are compared to assess their importance for the robot's exploratory behaviour. We define self-organised behaviour performance in terms of simultaneous environment coverage and best prediction of future sensory inputs. Among our findings, models with a fast response that minimise the prediction error by local gradients achieve the best performance.
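A forward model of the kind compared here maps the current sensor and motor values to a prediction of the next sensor values, and is updated online by a local gradient step on the squared prediction error. The linear model, the names, and the toy dynamics below are illustrative assumptions for the sketch, not the dissertation's implementation:

```python
import numpy as np

class LinearForwardModel:
    """Predicts the next sensor values from current sensors and motor command."""
    def __init__(self, n_sensors, n_motors, lr=0.1):
        self.W = np.zeros((n_sensors, n_sensors + n_motors))
        self.lr = lr  # step size of the local gradient update

    def predict(self, s, a):
        return self.W @ np.concatenate([s, a])

    def update(self, s, a, s_next):
        # One local gradient step minimising ||s_next - prediction||^2.
        x = np.concatenate([s, a])
        err = s_next - self.W @ x
        self.W += self.lr * np.outer(err, x)
        return float(err @ err)  # squared prediction error before the update

# Toy usage: learn the dynamics s' = 0.5*s + a for a 1-D "robot".
model = LinearForwardModel(n_sensors=1, n_motors=1, lr=0.1)
rng = np.random.default_rng(0)
for _ in range(500):
    s = rng.uniform(-1, 1, 1)
    a = rng.uniform(-1, 1, 1)
    model.update(s, a, 0.5 * s + a)
```

Because the update uses only the locally available prediction error, it matches the "minimisation of the prediction error by local gradients" setting in spirit; the fast response comes from the model being updated at every time step.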
Third, we study how self-organisation of behaviour can be exploited to learn IMs
for goal-oriented tasks. An IM acquires coherent self-organised behaviours that are
then used to achieve high-level goals by reinforcement learning (RL). Our results
demonstrate that learning of an inverse model in this context yields faster reward maximisation
and a higher final reward. We show that an initial exploration of the environment
in a goal-less yet coherent way improves learning.
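An inverse model in this setting can be sketched as a map from a current and a desired next state to the action that connects them, fitted from the transitions gathered during goal-less exploration. The linear form, the names, and the toy dynamics are illustrative assumptions, not the dissertation's architecture:

```python
import numpy as np

class LinearInverseModel:
    """Infers the action that takes the robot from state s to a desired state."""
    def __init__(self, n_sensors, n_motors, lr=0.1):
        self.W = np.zeros((n_motors, 2 * n_sensors))
        self.lr = lr

    def infer(self, s, s_desired):
        return self.W @ np.concatenate([s, s_desired])

    def update(self, s, a, s_next):
        # Gradient step on ||a - inferred action||^2 for an observed transition.
        x = np.concatenate([s, s_next])
        err = a - self.W @ x
        self.W += self.lr * np.outer(err, x)

# Toy usage: dynamics s' = s + a, so the correct inverse is a = s_desired - s.
inv = LinearInverseModel(n_sensors=1, n_motors=1, lr=0.1)
rng = np.random.default_rng(0)
for _ in range(2000):
    s = rng.uniform(-1, 1, 1)
    a = rng.uniform(-1, 1, 1)
    inv.update(s, a, s + a)
```

Once fitted, a goal-oriented layer (e.g. an RL policy over high-level goals) can query `infer` for the low-level action, which is one way the goal-less exploration phase can pay off later.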
In the same context, we analyse the self-organisation of central pattern generators
(CPGs) by reward maximisation. Our results show that CPGs can learn reward-favourable behaviour on high-dimensional robots using the self-organised interaction between degrees of freedom. Finally, we examine an online dual control architecture that combines an Actor-Critic RL agent with the homeokinetic controller. In this configuration, the probing signal is generated from the robot's embodied experience of interacting with the environment. This set-up solves the problem of designing task-dependent probing signals through the emergence of intrinsically motivated, comprehensible behaviour. Faster improvement of the reward signal compared to classic RL is achievable with this configuration.
Higher coordination with less control - A result of information maximization in the sensorimotor loop
This work presents a novel learning method in the context of embodied
artificial intelligence and self-organization, which has as few assumptions and
restrictions as possible about the world and the underlying model. The learning
rule is derived from the principle of maximizing the predictive information in
the sensorimotor loop. It is evaluated on robot chains of varying length with
individually controlled, non-communicating segments. The comparison of the
results shows that maximizing the predictive information per wheel leads to more coordinated behavior of the physically connected robots than maximization per robot. Another focus of this paper is the analysis of the effect of the robot chain length on the overall behavior of the robots. It is shown that longer chains with less capable controllers outperform shorter chains with more complex controllers. The reason is found and discussed in the information-geometric interpretation of the learning process.
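The objective behind the learning rule, the predictive information of the sensor stream, can be illustrated on a discrete time series: it is the mutual information between the current and the next sensor value. The plug-in estimator below only shows the objective itself, not the paper's gradient-based learning rule, and all names are illustrative:

```python
import numpy as np
from collections import Counter

def predictive_information(seq):
    """Empirical predictive information I(S_t ; S_{t+1}) of a discrete stream, in bits."""
    pairs = list(zip(seq[:-1], seq[1:]))
    n = len(pairs)
    joint = Counter(pairs)
    now = Counter(s for s, _ in pairs)   # marginal counts of the current symbol
    nxt = Counter(s for _, s in pairs)   # marginal counts of the next symbol
    mi = 0.0
    for (s, s1), c in joint.items():
        # p(s, s1) * log2( p(s, s1) / (p(s) * p(s1)) ), written with counts.
        mi += (c / n) * np.log2(c * n / (now[s] * nxt[s1]))
    return mi
```

A perfectly predictable alternating stream such as `[0, 1, 0, 1, ...]` carries close to 1 bit of predictive information per step, while an unstructured i.i.d. stream carries almost none; the learning rule rewards sensor streams of the first kind.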
The motivational role of affect in an ecological model
Drawing from empirical literature on ecological psychology, affective neuroscience, and philosophy of mind, this article describes a model of affect-as-motivation in the intentional bond between organism and environment. An epistemological justification for the motivating role of emotions is provided through articulating the perceptual context of emotions as embodied, situated, and functional, and positing perceptual salience as a biasing signal in an affordance competition model. The motivational role of affect is pragmatically integrated into discussions of action selection in the neurosciences.
Using MapReduce Streaming for Distributed Life Simulation on the Cloud
Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway's life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR's applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms' performance on Amazon's Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
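The strip-partitioning idea can be illustrated outside MapReduce: split the grid into horizontal strips, and each strip can be updated independently given only one "halo" row from each neighbouring strip, which is what keeps inter-worker communication low. The toroidal wrap-around and all names below are assumptions made for this sketch:

```python
import numpy as np

def life_step(grid):
    # Reference update of the full grid with toroidal wrap-around.
    n = sum(np.roll(np.roll(grid, i, 0), j, 1)
            for i in (-1, 0, 1) for j in (-1, 0, 1)) - grid
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(int)

def step_by_strips(grid, n_strips):
    # Same update, but each strip is processed with only local information,
    # as an MR "worker" would: its own rows plus one halo row on each side.
    out = np.empty_like(grid)
    for idx in np.array_split(np.arange(grid.shape[0]), n_strips):
        halo = np.take(grid, range(idx[0] - 1, idx[-1] + 2), axis=0, mode='wrap')
        n = sum(np.roll(np.roll(halo, i, 0), j, 1)
                for i in (-1, 0, 1) for j in (-1, 0, 1)) - halo
        new = ((n == 3) | ((halo == 1) & (n == 2))).astype(int)
        out[idx] = new[1:-1]  # keep only the strip's interior rows
    return out
```

The per-strip update is exact for the interior rows because every cell there sees its true neighbours; only the halo rows themselves are recomputed elsewhere, so the two functions agree on the whole grid.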
Self-Motivated Composition of Strategic Action Policies
In the last 50 years computers have made dramatic progress in their capabilities, but at the same time their failings have demonstrated that we, as designers, do not yet understand the nature of intelligence. Chess playing, for example, was long offered up as an example of the unassailability of the human mind to Artificial Intelligence, but now a chess engine on a smartphone can beat a grandmaster. Yet, at the same time, computers struggle to beat amateur players in simpler games, such as Stratego, where sheer processing power cannot substitute for a lack of deeper understanding.
The task of developing that deeper understanding is overwhelming, and has previously been underestimated. There are many threads and all must be investigated. This dissertation explores one of those threads, namely asking the question "How might an artificial agent decide on a sensible course of action, without being told what to do?".
To this end, this research builds upon empowerment, a universal utility which provides an entirely general method for allowing an agent to measure the preferability of one state over another. Empowerment requires no explicit goals, and instead favours states that maximise an agent's control over its environment.
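For a discrete agent, empowerment is the channel capacity from actions to resulting states, i.e. the maximum over action distributions p(a) of the mutual information I(A; S'). A minimal sketch using the standard Blahut-Arimoto iteration follows; the transition matrices in the usage notes are illustrative, not from the dissertation:

```python
import numpy as np

def empowerment(p_s_given_a, iters=200):
    """Channel capacity max over p(a) of I(A ; S'), in bits.
    p_s_given_a[a, s] = probability of reaching state s after action a."""
    n_a = p_s_given_a.shape[0]
    p_a = np.full(n_a, 1.0 / n_a)        # start from a uniform action choice

    def kl_rows(p_s):
        # D( p(s'|a) || p(s') ) for every action a, with 0 * log 0 taken as 0.
        safe = np.where(p_s > 0, p_s, 1.0)
        with np.errstate(divide='ignore'):
            log_ratio = np.where(p_s_given_a > 0,
                                 np.log2(p_s_given_a / safe), 0.0)
        return (p_s_given_a * log_ratio).sum(axis=1)

    for _ in range(iters):
        d = kl_rows(p_a @ p_s_given_a)   # divergence of each action's outcome
        p_a = p_a * np.exp2(d)           # Blahut-Arimoto reweighting
        p_a /= p_a.sum()
    return float((p_a * kl_rows(p_a @ p_s_given_a)).sum())
```

With a noiseless channel where each of four actions leads to a distinct state (`np.eye(4)`), empowerment is 2 bits: the agent fully controls which of four futures occurs. Noise in the transitions reduces that control and, with it, the empowerment value.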
Several extensions to the empowerment framework are proposed, which drastically increase the array of scenarios to which it can be applied, and allow it to evaluate actions in addition to states. These extensions are motivated by concepts such as bounded rationality, sub-goals, and anticipated future utility.
In addition, the novel concept of strategic affinity is proposed as a general method for measuring the strategic similarity between two (or more) potential sequences of actions. It does this in a general fashion, by examining how similar the distribution of future possible states would be in the case of enacting either sequence. This allows an agent to group action sequences, even in an unknown task space, into "strategies".
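The idea can be sketched in a toy discrete world: roll each action sequence forward through a stochastic model of the environment and compare the resulting distributions over future states. The corridor world and the overlap measure below are illustrative assumptions, not the dissertation's exact formulation:

```python
import numpy as np

def rollout_distribution(p0, transitions, actions):
    """Distribution over states after enacting an action sequence from p0;
    transitions[a][s, s2] = P(s2 | s, a)."""
    p = p0.copy()
    for a in actions:
        p = p @ transitions[a]
    return p

def strategic_affinity(p0, transitions, seq1, seq2):
    # Compare the futures the two sequences lead to via total distribution
    # overlap: 1.0 means identical futures (same "strategy"), 0.0 disjoint.
    p = rollout_distribution(p0, transitions, seq1)
    q = rollout_distribution(p0, transitions, seq2)
    return float(np.minimum(p, q).sum())

# Toy corridor of 5 cells with deterministic left/right moves (clamped at the
# walls); the agent starts in the middle cell.
n = 5
left, right = np.zeros((n, n)), np.zeros((n, n))
for s in range(n):
    left[s, max(s - 1, 0)] = 1.0
    right[s, min(s + 1, n - 1)] = 1.0
T = {'L': left, 'R': right}
p0 = np.zeros(n)
p0[2] = 1.0
```

In this toy world, the sequences `['R', 'L']` and `['L', 'R']` both return the agent to the middle cell and so count as the same strategy, while `['R', 'R']` and `['L', 'L']` lead to disjoint futures and are strategically distinct.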
Strategic affinity is combined with the empowerment extensions to form soft-horizon empowerment, which is capable of composing action policies in a variety of unknown scenarios.
A Pac-Man-inspired prey game and the Gambler's Problem are used to demonstrate this self-motivated action selection, and a Sokoban-inspired box-pushing scenario is used to highlight the capability to pick strategically diverse actions.
The culmination of this is that soft-horizon empowerment demonstrates a variety of "intuitive" behaviours, which are not dissimilar to what we might expect a human to try.
This line of thinking demonstrates compelling results, and it is suggested there are a couple of avenues for immediate further research.
One of the most promising of these would be applying the self-motivated methodology and the strategic affinity method to a wider range of scenarios, with a view to developing improved heuristic approximations that generate similar results. Replicating those results whilst reducing the computational overhead could help drive an improved understanding of how we may get closer to replicating a human-like approach.