Learning to soar: exploration strategies in reinforcement learning for resource-constrained missions
An unpowered aerial glider learning to soar in a wind field presents a new manifestation of the exploration-exploitation trade-off. This thesis proposes a directed, adaptive and nonmyopic exploration strategy in a temporal difference reinforcement learning framework for tackling the resource-constrained exploration-exploitation task of this autonomous soaring problem. The complete learning algorithm is developed in a SARSA(λ) framework, which uses a Gaussian process with a squared exponential covariance function to approximate the value function. The three key contributions of this thesis form the proposed exploration-exploitation strategy. Firstly, a new information measure is derived from the change in the variance volume surrounding the Gaussian process estimate. This measure of information gain is used to define the exploration reward of an observation. Secondly, a nonmyopic information value is presented that captures both the immediate exploration reward due to taking an action and the future exploration opportunities that result. Finally, this information value is combined with the state-action value of SARSA(λ) through a dynamic weighting factor to produce an exploration-exploitation management scheme for resource-constrained learning systems. The proposed learning strategy encourages either exploratory or exploitative behaviour depending on the requirements of the learning task and the available resources. The performance of the learning algorithms presented in this thesis is compared against other SARSA(λ) methods. Results show that actively directing exploration to regions of the state-action space with high uncertainty improves the rate of learning, while dynamic management of the exploration-exploitation behaviour according to the available resources produces prudent learning behaviour in resource-constrained systems.
A society of mind approach to cognition and metacognition in a cognitive architecture
This thesis investigates the concept of mind as a control system using the "Society of Agents" metaphor. "Society of Agents" describes collective behaviours of simple and intelligent agents. "Society of Mind" is more than a collection of task-oriented and deliberative agents; it is a powerful concept for mind research and can benefit from the use of metacognition. The aim is to develop a self-configurable computational model using the concept of metacognition. A six-tiered SMCA (Society of Mind Cognitive Architecture) control model is designed that relies on a society of agents operating using metrics associated with the principles of artificial economics in animal cognition. This research investigates the concept of metacognition as a powerful catalyst for control, unification and self-reflection. Metacognition is used on BDI models with respect to planning, reasoning, decision making, self-reflection, problem solving, learning and the general process of cognition to improve performance. One perspective on how to develop metacognition in an SMCA model is based on the differentiation between metacognitive strategies and metacomponents or metacognitive aids. Metacognitive strategies denote activities such as metacomprehension (remedial action), metamanagement (self-management) and schema training (meaningful learning over cognitive structures). Metacomponents are aids for the representation of thoughts. Developing an efficient, intelligent and optimal agent through the use of metacognition requires the design of a multi-layered control model that includes simple to complex levels of agent action and behaviour. This SMCA model has been designed and implemented with six layers: reflexive, reactive, deliberative (BDI), learning (Q-learner), metacontrol and metacognition.
Cognitive Architectures for Language Agents
Recent efforts have incorporated large language models (LLMs) with external
resources (e.g., the Internet) or internal control flows (e.g., prompt
chaining) for tasks requiring grounding or reasoning. However, these efforts
have largely been piecemeal, lacking a systematic framework for constructing a
fully-fledged language agent. To address this challenge, we draw on the rich
history of agent design in symbolic artificial intelligence to develop a
blueprint for a new wave of cognitive language agents. We first show that LLMs
have many of the same properties as production systems, and recent efforts to
improve their grounding or reasoning mirror the development of cognitive
architectures built around production systems. We then propose Cognitive
Architectures for Language Agents (CoALA), a conceptual framework to
systematize diverse methods for LLM-based reasoning, grounding, learning, and
decision making as instantiations of language agents in the framework. Finally,
we use the CoALA framework to highlight gaps and propose actionable directions
toward more capable language agents in the future.
Comment: 16 pages of main content, 10 pages of references, 5 figures. Equal
contribution among the first two authors, order decided by coin flip. A
CoALA-based repo of recent work on language agents:
https://github.com/ysymyth/awesome-language-agent
How does rumination impact cognition? A first mechanistic model.
Rumination is a process of uncontrolled, narrowly focused negative thinking that is often self-referential, and that is a hallmark of depression. Despite its importance, little is known about its cognitive mechanisms. Rumination can be thought of as a specific, constrained form of mind-wandering. Here, we introduce a cognitive model of rumination that we developed on the basis of our existing model of mind-wandering. The rumination model implements the hypothesis that rumination is caused by maladaptive habits of thought. These habits of thought are modelled by adjusting the number of memory chunks and their associative structure, which changes the sequence of memories that are retrieved during mind-wandering, such that during rumination the same set of negative memories is retrieved repeatedly. The implementation of habits of thought was guided by empirical data from an experience sampling study in healthy and depressed participants. On the basis of this empirically derived memory structure, our model naturally predicts the declines in cognitive task performance that are typically observed in depressed patients. This study demonstrates how we can use cognitive models to better understand the cognitive mechanisms underlying rumination and depression.