    Intelligent Cooperative Control Architecture: A Framework for Performance Improvement Using Safe Learning

    Planning for multi-agent systems, such as task assignment for teams of fuel-limited unmanned aerial vehicles (UAVs), is challenging due to uncertainties in the assumed models and the very large size of the planning space. Researchers have developed fast cooperative planners based on simple models (e.g., linear, deterministic dynamics), yet inaccuracies in the assumed models degrade the resulting performance. Learning techniques can adapt the model and asymptotically provide better policies than cooperative planners, yet their exploratory nature often violates the safety conditions of the system, and they frequently require an impractically large number of interactions to perform well. This paper introduces the intelligent Cooperative Control Architecture (iCCA) as a framework for combining cooperative planners with reinforcement learning techniques. iCCA improves the policy of the cooperative planner while reducing the risk and sample complexity of the learner. Empirical results in gridworld and fuel-limited UAV task-assignment domains with problem sizes up to 9 billion state-action pairs verify the advantage of iCCA over pure learning and pure planning strategies.
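    The planner/learner combination described above can be sketched as a safety filter around the learner's suggestions. This is a minimal illustrative toy, not the paper's implementation; all names (`estimated_risk`, `risk_threshold`, the toy grid) are assumptions for the example.

    ```python
    # Toy sketch of the iCCA idea: a learner suggests actions, but a safety
    # filter falls back to the baseline planner's action when the suggestion
    # is judged too risky. The risk model and policies are illustrative only.

    def planner_action(state):
        # Deterministic baseline policy: always move right toward the goal.
        return "right"

    def estimated_risk(state, action):
        # Toy risk model: moving "down" from row 0 would leave the grid.
        row, col = state
        return 1.0 if (action == "down" and row == 0) else 0.0

    def icca_step(state, learner_action, risk_threshold=0.5):
        """Accept the learner's action only if it is deemed safe."""
        if estimated_risk(state, learner_action) > risk_threshold:
            return planner_action(state)  # override with the safe planner action
        return learner_action

    print(icca_step((0, 0), "down"))  # risky -> planner override: "right"
    print(icca_step((0, 0), "up"))    # safe -> learner's choice kept: "up"
    ```

    The learner still observes the executed (safe) action, which is how such schemes reduce risk during exploration without discarding what the learner proposes.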

    Learning Classical Planning Strategies with Policy Gradient

    A common paradigm in classical planning is heuristic forward search. Forward-search planners often rely on simple best-first search, which remains fixed throughout the search process. In this paper, we introduce a novel search framework capable of alternating between several forward-search approaches while solving a particular planning problem. The approach is selected by a trainable stochastic policy that maps the state of the search to a probability distribution over the approaches. This enables the use of policy gradient to learn search strategies tailored to a specific distribution of planning problems and a selected performance metric, e.g. the IPC score. We instantiate the framework by constructing a policy space consisting of five search approaches and a two-dimensional representation of the planner's state. We then train the system on randomly generated problems from five IPC domains using three different performance metrics. Our experimental results show that the learner is able to discover domain-specific search strategies, improving the planner's performance relative to the baselines of plain best-first search and a uniform policy. Comment: Accepted for ICAPS 201
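    The core mechanism, a softmax policy over search approaches trained with a policy gradient, can be sketched with a REINFORCE-style update. This is a hedged illustration, not the paper's code: the approach names, the toy reward, and the scalar feature-free policy are all assumptions made for brevity.

    ```python
    import math
    import random

    # Illustrative REINFORCE sketch: a softmax policy picks one of several
    # search approaches; the gradient of log-softmax nudges probability mass
    # toward approaches that earned reward. Approach names are invented.

    APPROACHES = ["greedy_bfs", "weighted_astar", "epsilon_greedy", "lookahead", "restart"]

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        s = sum(exps)
        return [e / s for e in exps]

    def sample(probs, rng):
        r, acc = rng.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if r < acc:
                return i
        return len(probs) - 1

    def reinforce_update(theta, chosen, probs, reward, lr=0.1):
        # grad of log softmax wrt theta[i] is 1[i == chosen] - probs[i]
        for i in range(len(theta)):
            grad = (1.0 if i == chosen else 0.0) - probs[i]
            theta[i] += lr * reward * grad

    rng = random.Random(0)
    theta = [0.0] * len(APPROACHES)
    for _ in range(200):
        probs = softmax(theta)
        a = sample(probs, rng)
        # Toy reward: pretend approach 1 works best on this problem distribution.
        reward = 1.0 if a == 1 else 0.0
        reinforce_update(theta, a, probs, reward)

    print(APPROACHES[max(range(len(theta)), key=theta.__getitem__)])
    ```

    In the paper's setting the reward would come from a performance metric such as the IPC score, and the policy would condition on the two-dimensional search-state representation rather than being state-free as here.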

    Simulating the use of macro-actions through action reordering

    The use of macro-actions in planning introduces a trade-off: macro-actions can offer search guidance by suggesting sequences of actions, but can also make search more expensive by increasing the branching factor. In this paper we present a technique for simulating the use of macro-actions by altering the order in which actions are considered for application during enforced hill-climbing search. Actions are ordered by the number of times they have followed, in past solution plans, the last action added to the plan. We demonstrate that this action-reordering technique can offer improved search performance without the negative performance impacts often observed when using macro-actions.
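    The counting-and-reordering step described above can be sketched in a few lines. This is a minimal illustration under assumed inputs (the toy plans and action names are invented), not the paper's planner.

    ```python
    from collections import Counter, defaultdict

    # Sketch of the reordering idea: count, in past solution plans, how often
    # each action immediately followed each other action, then try applicable
    # actions in descending order of that count during node expansion.

    past_plans = [
        ["pick", "move", "drop"],
        ["pick", "move", "move", "drop"],
        ["move", "pick", "move", "drop"],
    ]

    follows = defaultdict(Counter)
    for plan in past_plans:
        for prev, nxt in zip(plan, plan[1:]):
            follows[prev][nxt] += 1

    def reorder(applicable, last_action):
        """Try actions seen most often after `last_action` first."""
        return sorted(applicable, key=lambda a: -follows[last_action][a])

    print(reorder(["drop", "move", "pick"], "pick"))  # -> ['move', 'drop', 'pick']
    ```

    Because this only changes the order in which successors are tried, the branching factor is unchanged, which is why the technique avoids the blow-up macro-actions can cause.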

    Evolving macro-actions for planning

    Domain re-engineering through macro-actions (i.e., macros) provides one potential avenue for research into learning for planning. However, most existing work learns macros that are reusable plan fragments, and hence observable either from planner behaviour online or from plan characteristics offline; other methods learn macros from domain analysis. Most of these methods explore restricted macro spaces and exploit specific features of planners or domains. Moreover, the learning examples, especially those used to acquire previous experience, might not cover many aspects of the system or might not always reflect that better choices were made during the search, and planner- or domain-specific properties are unlikely to carry over to other planners or domains. This paper presents an offline evolutionary method that learns macros for arbitrary planners and domains. Our method explores a wider macro space and can learn macros that are not directly observable in the examples. It also constitutes a generalised macro-learning framework, as it does not discover or exploit any specific structural properties of planners or domains.
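    An offline evolutionary search over candidate macros can be sketched as a simple genetic loop. Everything here is an assumption made for illustration: the action names, the stand-in fitness (a real system would run the target planner and measure speedup), and the mutation-only evolution scheme.

    ```python
    import random

    # Toy sketch of offline evolutionary macro learning: individuals are
    # action sequences (candidate macros); fitness is a stand-in for the
    # speedup a planner would gain when given the macro.

    ACTIONS = ["load", "drive", "unload", "refuel"]
    rng = random.Random(1)

    def random_macro():
        return [rng.choice(ACTIONS) for _ in range(rng.randint(2, 4))]

    def fitness(macro):
        # Stand-in objective: reward macros containing the useful pair
        # load -> drive; a real system would time the planner instead.
        score = sum(1 for a, b in zip(macro, macro[1:]) if (a, b) == ("load", "drive"))
        return score - 0.1 * len(macro)  # mild penalty for long macros

    def mutate(macro):
        m = list(macro)
        m[rng.randrange(len(m))] = rng.choice(ACTIONS)
        return m

    pop = [random_macro() for _ in range(20)]
    for _ in range(50):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:10]            # elitist selection
        pop = survivors + [mutate(rng.choice(survivors)) for _ in range(10)]

    best = max(pop, key=fitness)
    print(best)
    ```

    Because fitness is evaluated by running (or here, simulating) the planner rather than by inspecting example plans, the search can reach macros that never appear verbatim in any training solution, which is the point the abstract makes about exploring a wider macro space.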

    Strategic principles and capacity building for a whole-of-systems approach to physical activity

    Planning through Automatic Portfolio Configuration: The PbP Approach

    In the field of domain-independent planning, several powerful planners implementing different techniques have been developed. However, none of these systems outperforms all the others in every known benchmark domain. In this work, we propose a multi-planner approach that automatically configures a portfolio of planning techniques for each given domain. The configuration process for a given domain uses a set of training instances to: (i) compute and analyse some alternative sets of macro-actions for each planner in the portfolio, identifying a (possibly empty) useful set; (ii) select a cluster of planners, each with its identified useful set of macro-actions, that is expected to perform best; and (iii) derive additional information for configuring the execution scheduling of the selected planners at planning time. The resulting planning system, called PbP (Portfolio-based Planner), has two variants focusing on speed and on plan quality. Different versions of PbP entered and won the learning track of the sixth and seventh International Planning Competitions. In this paper, we experimentally analyse PbP in depth with respect to planning speed and plan quality. We provide a collection of results that help to understand PbP's behaviour and demonstrate the effectiveness of our approach to configuring a portfolio of planners with macro-actions.
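    The selection step (ii) above can be sketched as ranking planner/macro-set configurations by training performance. This is a hedged toy in the spirit of portfolio configuration, not PbP's actual algorithm; the planner names, macro-set labels, and runtimes are invented.

    ```python
    # Toy portfolio configuration: use training runtimes to pick the
    # planner/macro-set configurations that solve the most instances,
    # breaking ties by lower mean runtime. All data below is illustrative.

    # configuration -> runtimes (seconds) on training instances; None = unsolved.
    training = {
        ("lama", "macros_A"): [2.0, 5.0, None, 3.0],
        ("lama", "no_macros"): [4.0, 6.0, 9.0, 7.0],
        ("ff", "macros_B"): [1.0, None, None, 2.0],
    }

    def score(runtimes):
        solved = [t for t in runtimes if t is not None]
        mean = sum(solved) / len(solved) if solved else float("inf")
        # More solved instances first; among ties, lower mean runtime wins.
        return (len(solved), -mean)

    def configure(training, k=2):
        ranked = sorted(training, key=lambda cfg: score(training[cfg]), reverse=True)
        return ranked[:k]

    portfolio = configure(training)
    print(portfolio)  # -> [('lama', 'no_macros'), ('lama', 'macros_A')]
    ```

    A real configurator would additionally derive a runtime schedule (step iii), e.g. time slices per selected planner, from the same training statistics rather than running the portfolio members blindly in sequence.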