402 research outputs found

Temporal Difference Learning in Complex Domains

Author: Smith Martin C.
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/1999

PhD. This thesis adapts and improves on the methods of TD(λ) (Sutton 1988) that were successfully used for backgammon (Tesauro 1994) and applies them to other complex games that are less amenable to simple pattern-matching approaches. The games investigated are chess and shogi, both of which (unlike backgammon) require significant amounts of computational effort to be expended on search in order to achieve expert play. The improved methods are also tested in a non-game domain. In the chess domain, the adapted TD(λ) method is shown to successfully learn the relative values of the pieces, and matches using these learnt piece values indicate that they perform at least as well as piece values widely quoted in elementary chess books. The adapted TD(λ) method is also shown to work well in shogi, considered by many researchers to be the next challenge for computer game-playing, and for which there is no standardised set of piece values. An original method to automatically set and adjust the major control parameters used by TD(λ) is presented. The main performance advantage comes from the learning rate adjustment, which is based on a new concept called temporal coherence. Experiments in both chess and a random-walk domain show that the temporal coherence algorithm produces faster learning and more stable values than either human-chosen parameters or an earlier method for learning rate adjustment. The methods presented in this thesis allow programs to learn with as little input of external knowledge as possible, exploring the domain on their own rather than by being taught. Further experiments show that the method is capable of handling many hundreds of weights, and that it is not necessary to perform deep searches during the learning phase in order to learn effective weights.
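As a rough illustration of the kind of update the abstract above refers to, the following is a minimal sketch of tabular TD(λ) with accumulating eligibility traces on a small random walk. It is not the thesis's adapted method and does not include temporal coherence; the function name, the five-state walk, and all parameter values are assumptions chosen for illustration.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=5000, alpha=0.01, lam=0.5, seed=0):
    """Estimate state values of a symmetric random walk with TD(lambda).

    Hypothetical setup: states 0..n_states-1 are non-terminal. Stepping off
    the left end ends the episode with outcome 0, stepping off the right end
    ends it with outcome 1. No discounting is used.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states              # one weight per state (tabular case)
    for _ in range(episodes):
        traces = [0.0] * n_states          # eligibility traces, reset per episode
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                target, done = 0.0, True   # left terminal outcome
            elif next_state >= n_states:
                target, done = 1.0, True   # right terminal outcome
            else:
                target, done = values[next_state], False
            td_error = target - values[state]
            # Decay all traces by lambda, then mark the state just visited.
            traces = [lam * e for e in traces]
            traces[state] += 1.0
            # Every state is adjusted in proportion to its trace.
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    for s, v in enumerate(td_lambda_random_walk()):
        print(f"state {s}: estimated value {v:.3f}")
```

For this walk the exact values are (s + 1) / (n_states + 1), so the estimates can be checked against 1/6, 2/6, and so on. A fixed learning rate such as the `alpha` used here is the kind of hand-chosen parameter that the thesis proposes to set automatically.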

Queen Mary Research Online

Temporal Difference Learning in Complex Domains

Author: Smith Martin C.
Publication date: 30/12/2013

Submitted to the University of London for the Degree of Doctor of Philosophy in Computer Science.

Queen Mary Research Online

Decentralized Multi-Agent Production Control through Economic Model Bidding for Matrix Production Systems

Author: Kiefer Lars
Kuhnle Andreas
Lanza Gisela
May Marvin Carl
Stricker Nicole
Publication venue: Elsevier
Publication date: 15/02/2021

Due to increasing demand for unique products, large variety in product portfolios and the associated rise in individualization, the efficient use of resources in traditional line production dwindles. One answer to these new challenges is the application of matrix-shaped layouts with multiple production cells, called Matrix Production Systems. The cycle time independence and redundancy of production cell capabilities within a Matrix Production System enable individual production paths per job for Flexible Mass Customisation. However, the increased degrees of freedom strengthen the need for reliable production control systems compared to traditional production systems such as line production. Beyond reliability, there is a need for intelligent production within a smart factory to ensure goal-oriented production control under ever-changing manufacturing conditions. Learning-based methods can leverage condition-based reactions for goal-oriented production control. While centralized control performs well in single-objective situations, it struggles to achieve contradictory targets for individual products or resources. Hence, in order to master these challenges, a production control concept based on a decentralized multi-agent bidding system is presented. In this price-based model, individual production agents (jobs, production cells and the transport system) interact based on an economic model and attempt to maximize monetary revenues. Evaluating the application of learning and priority-based control policies shows that decentralized multi-agent production control can outperform traditional approaches for certain control objectives. The introduction of decentralized multi-agent reinforcement learning systems is a starting point for further research in this area of intelligent production control within smart manufacturing.
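The bidding idea in the abstract above can be pictured with a very small sketch. Everything here is an assumption made for illustration: the `Cell` class, the pricing rule (committed queue time plus processing time) and the lowest-price-wins allocation are hypothetical and are not taken from the paper's economic model.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical production cell agent that prices the work it is offered."""
    name: str
    processing_time: dict        # operation name -> time units needed
    queue_time: float = 0.0      # work already committed to this cell

    def bid(self, operation):
        """Return a price for the operation, or None if the cell cannot do it."""
        if operation not in self.processing_time:
            return None
        # Assumed pricing rule: busier and slower cells ask for more.
        return self.queue_time + self.processing_time[operation]


def allocate(operation, cells):
    """Let a job collect bids for one operation and award it to the cheapest cell."""
    bids = [(cell.bid(operation), cell) for cell in cells]
    bids = [(price, cell) for price, cell in bids if price is not None]
    if not bids:
        return None                              # no capable cell
    price, winner = min(bids, key=lambda pc: pc[0])
    winner.queue_time += winner.processing_time[operation]
    return winner.name, price


if __name__ == "__main__":
    cells = [Cell("A", {"drill": 3.0}), Cell("B", {"drill": 2.0}, queue_time=2.0)]
    print(allocate("drill", cells))   # cell A is idle, so it underbids B
    print(allocate("drill", cells))   # A is now busy, so B wins
```

Because each cell prices only its own state, no central scheduler is needed; in this toy rule the redundancy of capabilities across cells is what lets a job choose its own path.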

KITopen

Conditional Partial Plans for Rational Situated Agents Capable of Deductive Reasoning and Inductive Learning

Author: Nowaczyk Slawomir
Publication date: 01/01/2008

Rational, autonomous agents that are able to achieve their goals in dynamic, partially observable environments have been the ultimate dream of Artificial Intelligence research since its beginning. The goal of this PhD thesis is to propose, develop and evaluate a framework well suited for creating intelligent agents that would be able to learn from experience, thus becoming more efficient at solving their tasks. We aim to create an agent able to function in adverse environments that it only partially understands. We are convinced that symbolic knowledge representations are the best way to achieve such versatility. In order to balance deliberation and acting, our agent needs to be time-aware, i.e. it needs to have the means to estimate its own reasoning and acting time. One of the crucial challenges is to ensure smooth interactions between the agent's internal reasoning mechanism and the learning system used to improve its behaviour. In order to address it, our agent will create several different conditional partial plans and reason about the potential usefulness of each one. Moreover, it will generalise whatever experience it gathers and use it when solving subsequent, similar problem instances. In this thesis we present, on the conceptual level, an architecture for rational agents, as well as implementation-based experimental results confirming that successful lifelong learning of an autonomous artificial agent can be achieved using it.

Lund University Publications

Learning programs with magic values

Author: Cropper Andrew
Hocquette Céline
Publication date: 01/10/2022

A magic value in a program is a constant symbol that is essential for the execution of the program but has no clear explanation for its choice. Learning programs with magic values is difficult for existing program synthesis approaches. To overcome this limitation, we introduce an inductive logic programming approach to efficiently learn programs with magic values. Our experiments on diverse domains, including program synthesis, drug design, and game playing, show that our approach can (i) outperform existing approaches in terms of predictive accuracies and learning times, (ii) learn magic values from infinite domains, such as the value of pi, and (iii) scale to domains with millions of constant symbols.

arXiv.org e-Print Archive

Mind the Gap between Demand and Supply. A behavioral perspective on demand forecasting

Author: Protzner Stefanie
Publication venue: Erasmus University Rotterdam (EUR)
Publication date: 08/01/2016

EUR Research Repository

Efficient instance and hypothesis space revision in Meta-Interpretive Learning

Author: Hocquette Céline
Publication venue: Computing, Imperial College London
Publication date: 01/05/2022

Inductive Logic Programming (ILP) is a form of Machine Learning. The goal of ILP is to induce hypotheses, as logic programs, that generalise training examples. ILP is characterised by high expressivity, generalisation ability and interpretability. Meta-Interpretive Learning (MIL) is a state-of-the-art sub-field of ILP. However, current MIL approaches have limited efficiency: the sample and learning complexity are, respectively, polynomial and exponential in the number of clauses. My thesis is that improvements over the sample and learning complexity can be achieved in MIL through instance and hypothesis space revision. Specifically, we investigate 1) methods that revise the instance space, 2) methods that revise the hypothesis space and 3) methods that revise both the instance and the hypothesis spaces for achieving more efficient MIL. First, we introduce a method for building training sets with active learning in Bayesian MIL. Instances are selected to maximise the entropy. We demonstrate this method can reduce the sample complexity and supports efficient learning of agent strategies. Second, we introduce a new method for revising the MIL hypothesis space with predicate invention. Our method generates predicates bottom-up from the background knowledge related to the training examples. We demonstrate this method is complete and can reduce the learning and sample complexity. Finally, we introduce a new MIL system called MIGO for learning optimal two-player game strategies. MIGO learns from playing: its training sets are built from the sequence of actions it chooses. Moreover, MIGO revises its hypothesis space with Dependent Learning: it first solves simpler tasks and can reuse any learned solution for solving more complex tasks. We demonstrate MIGO significantly outperforms both classical and deep reinforcement learning.
The methods presented in this thesis open exciting perspectives for efficiently learning theories with MIL in a wide range of applications including robotics, modelling of agent strategies and game playing.

Open Access
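The entropy-based instance selection mentioned in the abstract above can be sketched generically as follows. This is an assumed, simplified setting (a finite list of candidate hypotheses with posterior weights and binary labels), not the Bayesian MIL procedure from the thesis; `binary_entropy` and `select_instance` are hypothetical names.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a yes/no outcome that is positive with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_instance(instances, hypotheses, posterior):
    """Pick the unlabelled instance whose predicted label is most uncertain.

    hypotheses: callables mapping an instance to True or False.
    posterior:  one weight per hypothesis, assumed to sum to 1.
    """
    def prob_positive(x):
        # Posterior mass of the hypotheses that label x as positive.
        return sum(w for h, w in zip(hypotheses, posterior) if h(x))

    return max(instances, key=lambda x: binary_entropy(prob_positive(x)))


if __name__ == "__main__":
    # Three threshold hypotheses over the integers 1..7 (illustrative only).
    hypotheses = [lambda x: x >= 2, lambda x: x >= 4, lambda x: x >= 6]
    posterior = [0.2, 0.3, 0.5]
    print(select_instance(range(1, 8), hypotheses, posterior))
```

Querying the label of the selected instance splits the posterior mass as evenly as possible, which is why this criterion tends to reduce the number of examples needed.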

Spiral - Imperial College Digital Repository

Sparkle: toward accessible meta-algorithmics for improving the state of the art in solving challenging problems

Author: Blom K. van der
Hoos H.H.
Luo C.
Rook J.G.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2022

Many fields of computational science advance through improvements in the algorithms used for solving key problems. These advancements are often facilitated by benchmarks and competitions that enable performance comparisons and rankings of solvers. Simultaneously, meta-algorithmic techniques, such as automated algorithm selection and configuration, enable performance improvements by utilizing the complementary strengths of different algorithms or configurable algorithm components. In fact, meta-algorithms have become major drivers in advancing the state of the art in solving many prominent computational problems. However, meta-algorithmic techniques are complex and difficult to use correctly, while their incorrect use may reduce their efficiency, or in extreme cases, even lead to performance losses. Here, we introduce the Sparkle platform, which aims to make meta-algorithmic techniques more accessible to nonexpert users, and to make these techniques more broadly available in the context of competitions, to further enable the assessment and advancement of the true state of the art in solving challenging computational problems. To achieve this, Sparkle implements standard protocols for algorithm selection and configuration that support easy and correct use of these techniques. Following an experiment, Sparkle generates a report containing results, problem instances, algorithms, and other relevant information, for convenient use in scientific publications.

Algorithms and the Foundations of Software Technology
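To make the notion of per-instance algorithm selection from the abstract above concrete, here is a minimal nearest-neighbour sketch. It does not use or imitate the Sparkle platform's interface; the function name, the feature vectors, the solver names and the runtimes are all made up for illustration.

```python
def select_algorithm(features, training_data):
    """Choose the solver that was fastest on the most similar known instance.

    training_data: list of (feature_vector, {solver_name: runtime}) pairs,
    a hypothetical record of earlier benchmark runs.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Find the previously seen instance closest to the new one in feature space.
    _, runtimes = min(training_data, key=lambda item: squared_distance(features, item[0]))
    # Reuse whichever solver had the lowest runtime on that neighbour.
    return min(runtimes, key=runtimes.get)


if __name__ == "__main__":
    history = [
        ((10.0, 0.1), {"solver_a": 1.0, "solver_b": 5.0}),
        ((1000.0, 0.9), {"solver_a": 30.0, "solver_b": 2.0}),
    ]
    print(select_algorithm((20.0, 0.2), history))
    print(select_algorithm((900.0, 0.8), history))
```

The sketch exploits complementary strengths in the simplest possible way: no solver is best everywhere, so the choice is deferred until the instance's features are known.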

Leiden University Scholarly Publications

University of Twente Research Information