98 research outputs found

Reinforcement Learning: A Survey

Authors: Kaelbling L. P., Littman M. L., Moore A. W.
Publication date: 01/01/1996

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.

Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

A combinatorial flow-based formulation for temporal bin packing problems

Authors: Furini F., Martinovic J., Strasdat N., Valerio de Carvalho J.
Publication date: 01/01/2023

We consider two neighboring generalizations of the classical bin packing problem: the temporal bin packing problem (TBPP) and the temporal bin packing problem with fire-ups (TBPP-FU). In both cases, the task is to arrange a set of given jobs, each characterized by a resource consumption and an activity window, on homogeneous servers of limited capacity. To keep both operational costs and energy consumption low, TBPP is concerned with minimizing the number of servers in use, whereas TBPP-FU additionally takes into account the switch-on processes required for their operation. Either way, challenging integer optimization problems are obtained, which can differ significantly from each other despite the seemingly marginal variation between the problems. In the literature, a branch-and-price method enriched with many preprocessing steps (for TBPP) and compact formulations benefiting from numerous reduction methods (for TBPP-FU) have emerged as the currently most promising solution methods. In this paper, we introduce, in a sense, a unified solution framework for both problems (and, in fact, a wide variety of further interval scheduling applications) based on graph theory. Scientific contributions in this direction have failed so far because of the exponential size of the associated networks. The approach we present in this article does not change the theoretical exponentiality itself, but makes it controllable through clever construction of the resulting graphs. In particular, for the first time all classical benchmark instances (and even larger ones) for the two problems can be solved, in times that significantly improve on those of the previous approaches.

Archivio della ricerca - Università di Roma La Sapienza

Coping polynomially with numerous but identical elements within planning problems

Authors: Kanovich Max, Vauzeilles Jacqueline
Publication venue: Lecture Notes in Computer Science
Publication date: 01/01/2003

The typical AI problem of planning the actions a robot must perform to reach a set of final situations from a given initial situation is generally exponential (it is even EXPTIME-complete in the case of games of "Robot against Nature"). Planners are therefore very sensitive to the number of variables, the inherent symmetry of the problem, and the nature of the logic formalisms being used. The paper shows that linear logic provides a convenient tool for representing planning problems. In particular, the paper focuses on planning problems with an unbounded number of functionally identical objects. We show that for such problems linear logic is especially effective and leads to a dramatic contraction of the search space (polynomial instead of exponential). The paper addresses the key issue, "How to automatically recognize functional similarity among objects and break the extreme combinatorial explosion caused by this symmetry," by replacing the unbounded number of specific names of objects with one generic name. This contracts the exponential search space over "real" objects to a small polynomial search space over the "generic" one, and provides a more abstract formulation whose solutions are proved to be directly translatable into (optimal) polytime solutions to the original planning problem.

HAL-Paris 13

35th Symposium on Theoretical Aspects of Computer Science: STACS 2018, February 28-March 3, 2018, Caen, France

Author: STACS
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/02/2018

Digitale Bibliothek Thüringen

A non-deterministic approach to dynamic layout planning of flexible manufacturing systems

Author: Hatami Khosrowshahi S. R.

A new approach to the dynamic layout planning problem is proposed which provides solutions to highly variable material flow patterns occurring over a multi-period planning horizon and is especially suitable for flexible manufacturing systems. A non-deterministic environment is considered in which there is assumed to be uncertainty in the future material flow data. The performance of the method is assessed by comparing the solution it produces with a data set from the literature for which the claimed optimal solution is known. There is close agreement with the stated solution, and the result is obtained with a fraction of the computational effort. The computational efficiency is due to a new construction method for generating static layout solutions. This method uses an algorithm in which the number of stages is proportional to the number of facilities, rather than growing exponentially as in most other methods. The method also uses an element of forward planning to ensure that early location assignments place minimum restriction on assignments made later in the procedure. Results of extensive tests show that the new static layout planning procedure produces solutions generally better than existing construction techniques and comparable with improvement techniques such as CRAFT. The execution speed of the procedure makes it possible to solve large-scale problems (>30) in very short time scales on microcomputers. Incorporating the fast new construction method into dynamic layout planning allows decisions to be made about when and how to re-layout facilities in response to changes in predicted material flow.

Warwick Research Archives Portal Repository

Models and algorithms for telecommunication network design.

Author: Leensel Robert Laurentius Maria Johannes van de

Research Papers in Economics

Tools and Techniques for Decision Tree Learning

Author: Elomaa Tapio
Publication venue: Helsingfors universitet
Publication date: 01/05/1996

Decision tree learning is an important field of machine learning. In this study we examine both formal and practical aspects of decision tree learning. We aim to answer two important needs: the need for better-motivated decision tree learners and the need for an environment that facilitates experimentation with inductive learning algorithms. As results we obtain new practical tools and useful techniques for decision tree learning. First, we derive the practical decision tree learner Rank, based on the Findmin protocol of Ehrenfeucht and Haussler. The motivation for the changes introduced to the method comes from empirical experience, but we prove the correctness of the modifications in the probably approximately correct learning framework. The algorithm is enhanced by extending it to operate in multiclass situations, making it capable of working in the incremental setting, and adding noise tolerance to it. Together these modifications entail practicability through a formal development.

CiteSeerX

Helsingin yliopiston digitaalinen arkisto