
    Risk-sensitive reinforcement learning applied to control under constraints

    In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are states that are undesirable or dangerous to enter. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the originally given value function. We show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims to find good deterministic policies. It is based on weighting the original value function against the risk. The weight parameter is adapted in order to find a feasible solution to the constrained problem that performs well with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints and solved under certain assumptions on the model to obtain an optimal solution. The strength of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.
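    To make the two criteria concrete, the following sketch computes the value and the risk of a deterministic policy on a hypothetical toy MDP. The risk is written as a cumulative return whose per-step "risk reward" is 1 on a transition into an error state and 0 otherwise; since error states are terminal, this return equals the probability of entering one. Unlike the model-free algorithm described above, the sketch assumes a known transition model and uses dynamic programming so both quantities are exact. The weighted criterion xi * V - rho, the halving rule for xi, the MDP itself, and all function names are illustrative assumptions rather than the paper's definitions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP: state -> action -> [(probability, next_state, reward)].
# "goal" and "error" are terminal and therefore have no entry in MODEL.
Model = Dict[str, Dict[str, List[Tuple[float, str, float]]]]
MODEL: Model = {
    "start": {
        "safe": [(1.0, "goal", 1.0)],
        "risky": [(0.8, "goal", 10.0), (0.2, "error", 0.0)],
    },
}
ERROR_STATES = {"error"}


def risk_reward(next_state: str) -> float:
    """1 for a transition into an error state, 0 otherwise."""
    return 1.0 if next_state in ERROR_STATES else 0.0


def evaluate(policy: Dict[str, str], model: Model, sweeps: int = 100):
    """Undiscounted iterative policy evaluation of both criteria.

    V[s]   = expected cumulative reward from s,
    rho[s] = expected cumulative risk reward from s; because error states are
             terminal, this is the probability of ever entering one.
    """
    V = {s: 0.0 for s in model}
    rho = {s: 0.0 for s in model}
    for _ in range(sweeps):
        for s, a in policy.items():
            V[s] = sum(p * (r + V.get(s2, 0.0)) for p, s2, r in model[s][a])
            rho[s] = sum(p * (risk_reward(s2) + rho.get(s2, 0.0))
                         for p, s2, _ in model[s][a])
    return V, rho


def greedy_weighted(model: Model, xi: float, sweeps: int = 100) -> Dict[str, str]:
    """Value iteration on the assumed combined criterion xi * V - rho;
    returns the deterministic greedy policy."""
    W = {s: 0.0 for s in model}

    def q(s: str, a: str) -> float:
        return sum(p * (xi * r - risk_reward(s2) + W.get(s2, 0.0))
                   for p, s2, r in model[s][a])

    for _ in range(sweeps):
        for s in model:
            W[s] = max(q(s, a) for a in model[s])
    return {s: max(model[s], key=lambda a: q(s, a)) for s in model}


def find_feasible(model: Model, start: str, omega: float,
                  xi: float = 1.0, shrink: float = 0.5, max_iter: int = 50):
    """Shrink xi (putting relatively more weight on the risk) until the
    greedy policy's risk at `start` is at most omega. Hypothetical rule."""
    for _ in range(max_iter):
        policy = greedy_weighted(model, xi)
        V, rho = evaluate(policy, model)
        if rho[start] <= omega:
            return xi, policy, V[start], rho[start]
        xi *= shrink
    raise RuntimeError("no feasible policy found within max_iter steps")


xi, policy, value, risk = find_feasible(MODEL, "start", omega=0.1)
print(xi, policy, value, risk)
```

    With a threshold of 0.1, the risky action (value 8, risk 0.2) stays preferred while xi is halved from 1 down to 1/32; at xi = 1/64 the safe action (value 1, risk 0) wins and the constraint is met. Lowering the weight on the value function thus trades performance for feasibility.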

    Synthesis of control programs by classification learning, illustrated by the stabilization control of communication satellites

    Machine learning methods have now reached a level of maturity that has led to the first successful industrial applications. In process diagnosis and control, learning methods make it possible to classify and evaluate operating states, i.e., to build a coarse model of a process when the process cannot be described mathematically, or only partially. Moreover, learning methods allow the automatic generation of classification procedures that are executed deterministically. For real-time diagnosis and control, such procedures may therefore be more time-efficient than inference mechanisms based on logic or production rules, since the latter always involve time-consuming search processes.
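    One common form of such a deterministic classification procedure is a learned decision tree. The sketch below assumes a tree has already been induced and shows the two properties the abstract appeals to: classifying a state follows a single root-to-leaf path with no search or backtracking, and the tree can be emitted as a plain nested if/else control program. The attributes `attitude_error` and `angular_rate`, the thresholds, and the action labels are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    attribute: str
    threshold: float
    low: "Tree"   # branch taken when state[attribute] <= threshold
    high: "Tree"  # branch taken otherwise


Tree = Union[Leaf, Node]


def classify(tree: Tree, state: Dict[str, float]) -> str:
    """Follow exactly one root-to-leaf path: at most depth(tree)
    comparisons, no search and no backtracking."""
    while isinstance(tree, Node):
        tree = tree.low if state[tree.attribute] <= tree.threshold else tree.high
    return tree.label


def to_source(tree: Tree, depth: int = 1) -> str:
    """Emit the tree as the body of a nested if/else function."""
    pad = "    " * depth
    if isinstance(tree, Leaf):
        return f"{pad}return {tree.label!r}\n"
    return (f"{pad}if state[{tree.attribute!r}] <= {tree.threshold!r}:\n"
            + to_source(tree.low, depth + 1)
            + f"{pad}else:\n"
            + to_source(tree.high, depth + 1))


# Hypothetical tree: attributes, thresholds and labels are illustrative only.
TREE = Node("attitude_error", -0.5,
            Leaf("correct_positive"),
            Node("angular_rate", 0.1, Leaf("hold"), Leaf("correct_negative")))

program = "def control(state):\n" + to_source(TREE)
namespace: Dict[str, object] = {}
exec(program, namespace)
control = namespace["control"]
print(program)
```

    The generated `control` function and `classify` make the same decisions; the former simply fixes the tree's structure as straight-line code.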

    Induction of recursive program schemes and analogical learning

    We present an approach to the acquisition of problem-solving skills that is based on the principle of inducing recursive program schemes, developed in the field of automatic programming. This approach makes it possible to model three levels of generalization. In a first step, a conditional (initial) program is built from concrete problem-solving experience; it generalizes over the solutions for the already explored initial states of a problem (learning by doing). In the second step, a recursive program scheme is inferred from this initial program. This corresponds to a generalization over recursively enumerable problem spaces. In a third step, the concrete meaning of the operation symbols of a program scheme can be abstracted away. The structure of the recursive program scheme then generalizes over the class of structurally identical problems (learning by analogy). With this approach, learning by doing and learning by analogy can be described within a unified framework. Not only the use but also the construction of (program) schemes is modeled. Finally, the inductive program synthesis approach provides a theoretically grounded formal basis for the acquisition of problem-solving skills.
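    The relationship between the first two generalization levels can be illustrated on a hypothetical blocks-world-style task (clearing a block by putting every block stacked on it onto the table). The sketch does not implement the induction (folding) step itself; it only shows what the two levels produce: an initial conditional program that is defined solely on the explored start states, and a recursive scheme whose finite unfoldings reproduce it while also covering unexplored states. The domain and all function names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical domain: the blocks stacked on a target block, listed top-first.
State = Tuple[str, ...]
Plan = List[Tuple[str, str]]


def is_clear(s: State) -> bool:
    return len(s) == 0


def top_op(s: State) -> Tuple[str, str]:
    return ("puttable", s[0])


def rest(s: State) -> State:
    return s[1:]


def initial_program(s: State) -> Optional[Plan]:
    """Level 1: conditional program covering only the explored start states
    (towers of height 0, 1 and 2). None stands for 'undefined'."""
    if is_clear(s):
        return []
    if is_clear(rest(s)):
        return [top_op(s)]
    if is_clear(rest(rest(s))):
        return [top_op(s), top_op(rest(s))]
    return None


def recursive_scheme(s: State) -> Plan:
    """Level 2: recursive scheme whose unfoldings up to depth 2 coincide
    with initial_program."""
    if is_clear(s):
        return []
    return [top_op(s)] + recursive_scheme(rest(s))


for tower in [(), ("A",), ("A", "B"), ("A", "B", "C", "D")]:
    print(tower, initial_program(tower), recursive_scheme(tower))
```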

    Skill acquisition can be regarded as program synthesis: An integrative approach to learning by doing and learning by analogy

    In this paper, we propose an approach to skill acquisition that is based on a technique for inductive program synthesis developed in the domain of automatic programming. This approach enables us to model skill acquisition as generalization on three levels. In a first step, learning by doing is performed by generalizing over problem states that were explored when solving a given problem. This process is similar to compilation or chunking of production rules, but in contrast to these approaches, we represent procedural knowledge as conditional programs. In a second step, descriptive generalization of the initial conditional program is performed: a recursive program scheme is constructed that generalizes over recursively enumerable problem spaces. In a third step, learning by analogy is performed by abstracting from the concrete semantics of the operation symbols contained in a recursive program scheme. The abstract scheme represents the class of structurally identical problems. By describing how problem schemes can be constructed as generalizations over knowledge gained while solving concrete problems, our approach provides a unifying framework for describing learning by doing and learning by analogy. Additionally, we consider the acquisition of some types of motor and process control behavior as a special variant of the acquisition of problem-solving skills, and demonstrate how the acquisition of behavioral skills can be integrated into our framework.
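    The third level, abstracting from the concrete semantics of the operation symbols, can be pictured as turning a recursive scheme into a higher-order function whose operations are parameters. The sketch below assumes a simple linear-recursion scheme; instantiating it with different hypothetical operations yields structurally identical programs for unrelated problems, which is the sense in which the abstract scheme supports learning by analogy. The scheme shape and the three instantiations are illustrative assumptions, not the schemes from the paper.

```python
from typing import Callable, TypeVar

X = TypeVar("X")
R = TypeVar("R")


def linear_recursion_scheme(
    is_base: Callable[[X], bool],
    base: Callable[[X], R],
    combine: Callable[[X, R], R],
    step: Callable[[X], X],
) -> Callable[[X], R]:
    """Abstract scheme: f(x) = base(x) if is_base(x) else combine(x, f(step(x))).
    The operation symbols are parameters, so no concrete semantics is fixed."""
    def f(x: X) -> R:
        if is_base(x):
            return base(x)
        return combine(x, f(step(x)))
    return f


# Hypothetical instantiations of the same abstract scheme.
factorial = linear_recursion_scheme(
    lambda n: n == 0, lambda n: 1, lambda n, r: n * r, lambda n: n - 1)
list_sum = linear_recursion_scheme(
    lambda xs: not xs, lambda xs: 0, lambda xs, r: xs[0] + r, lambda xs: xs[1:])
unstack = linear_recursion_scheme(
    lambda s: not s, lambda s: [],
    lambda s, r: [("puttable", s[0])] + r, lambda s: s[1:])

print(factorial(5), list_sum([1, 2, 3]), unstack(("A", "B")))
```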

    Relational Learning with Decision Trees

    In this paper, we describe two different learning tasks for relational structures. When learning a classifier for structures, the relational structures in the training set are classified as a whole. In contrast, when learning a context-dependent classifier for elementary objects, the elementary objects of the relational structures in the training set are classified. We investigate the question of how such classifications can be induced automatically from a given training set containing classified structures or classified elementary objects, respectively. We present an algorithm based on fast graph isomorphism testing that describes the objects in the training set by automatically constructed attributes. This allows us to employ well-known methods of decision tree induction to construct a hypothesis. We describe new simplification and structure reconstruction techniques for the learned structural decision tree. We present the system INDIGO and evaluate it on the Mesh and Mutagenicity datasets.
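    The abstract does not spell out how the attributes are constructed. As an assumed stand-in, the sketch below uses one-dimensional Weisfeiler-Lehman color refinement, a standard ingredient of fast graph isomorphism heuristics, to derive both kinds of attributes: a node's final color serves as a context-dependent attribute of that elementary object, and the color histogram serves as an attribute vector for the whole structure. Either could then be passed to an ordinary decision tree learner. The structures, names, and iteration count are hypothetical, and INDIGO's actual construction may differ; note also that 1-WL cannot distinguish every pair of non-isomorphic graphs.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A relational structure: node labels (e.g. atom types) and undirected edges.
Structure = Tuple[Dict[int, str], List[Tuple[int, int]]]


def refine(structures: List[Structure], iterations: int) -> List[Dict[int, str]]:
    """Assumed stand-in: 1-WL color refinement with a color table shared by
    all structures, so the same local context gets the same attribute
    in every structure of the training set."""
    table: Dict[tuple, str] = {}
    all_colors = []
    for labels, edges in structures:
        nbrs: Dict[int, List[int]] = {v: [] for v in labels}
        for u, w in edges:
            nbrs[u].append(w)
            nbrs[w].append(u)
        colors = dict(labels)  # initial attributes are the node labels
        for it in range(iterations):
            new = {}
            for v in labels:
                # Signature: iteration, own color, sorted multiset of neighbor colors.
                sig = (it, colors[v], tuple(sorted(colors[u] for u in nbrs[v])))
                new[v] = table.setdefault(sig, f"c{len(table)}")
            colors = new
        all_colors.append(colors)
    return all_colors


def structure_attributes(colors: Dict[int, str]) -> Counter:
    """Attributes for classifying a whole structure: color counts."""
    return Counter(colors.values())


triangle = ({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2), (2, 0)])
path = ({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2)])
triangle_renamed = ({5: "C", 7: "C", 9: "C"}, [(7, 9), (5, 7), (9, 5)])
tri, pth, tri2 = refine([triangle, path, triangle_renamed], iterations=2)
print(structure_attributes(tri), structure_attributes(pth), pth)
```

    Isomorphic structures with different node identifiers receive identical attributes, while the triangle and the path are told apart, and the two symmetric end nodes of the path share a context-dependent attribute.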