5 research outputs found

    Argumentation accelerated reinforcement learning

    Reinforcement Learning (RL) is a popular statistical Artificial Intelligence (AI) technique for building autonomous agents, but it suffers from the curse of dimensionality: the computational requirement for obtaining the optimal policies grows exponentially with the size of the state space. Integrating heuristics into RL has proven to be an effective approach to combat this curse, but deriving high-quality heuristics from people's (typically conflicting) domain knowledge is challenging and has received little research attention. Argumentation theory is a logic-based AI technique well known for its conflict resolution capability and intuitive appeal. In this thesis, we investigate the integration of argumentation frameworks into RL algorithms, so as to improve the convergence speed of RL algorithms. In particular, we propose a variant of the Value-based Argumentation Framework (VAF) to represent domain knowledge and to derive heuristics from this knowledge. We prove that the heuristics derived from this framework can effectively instruct individual learning agents as well as multiple cooperative learning agents. In addition, we propose the Argumentation Accelerated RL (AARL) framework to integrate these heuristics into different RL algorithms via Potential Based Reward Shaping (PBRS) techniques: we use classical PBRS techniques for flat RL (e.g. SARSA(λ)) based AARL, and propose a novel PBRS technique for MAXQ-0, a hierarchical RL (HRL) algorithm, so as to implement HRL based AARL. We empirically test two AARL implementations (SARSA(λ)-based AARL and MAXQ-based AARL) in multiple application domains, including single-agent and multi-agent learning problems. Empirical results indicate that AARL can improve the convergence speed of RL, and can also be easily used by people who have little background in Argumentation and RL.
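The classical PBRS idea mentioned in this abstract can be sketched as follows. This is a minimal illustration of the standard shaping term F(s, s') = γ·Φ(s') − Φ(s), not the thesis's implementation: the names `shaped_reward`, `distance_potential` and `GOAL`, and the one-dimensional toy state space, are assumptions made for the example. In AARL the potential would instead be derived from the argumentation framework.

```python
from typing import Callable

# Hypothetical toy domain: states are integer positions on a line,
# and the goal position is fixed. Both are assumptions for illustration.
GOAL = 5


def distance_potential(state: int) -> float:
    """Hypothetical potential: higher (less negative) closer to the goal."""
    return -float(abs(GOAL - state))


def shaped_reward(
    reward: float,
    state: int,
    next_state: int,
    potential: Callable[[int], float],
    gamma: float = 0.99,
) -> float:
    """Return the environment reward plus the classical shaping term
    F(s, s') = gamma * potential(s') - potential(s)."""
    return reward + gamma * potential(next_state) - potential(state)


# A step towards the goal receives a positive bonus,
# a step away from it receives a penalty.
towards = shaped_reward(0.0, 3, 4, distance_potential, gamma=1.0)
away = shaped_reward(0.0, 4, 3, distance_potential, gamma=1.0)
```

The shaped value would replace the raw reward in the update rule of the learner (for example SARSA(λ)); because the bonus is a difference of potentials, it steers exploration without changing which policies are optimal.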

    Argumentation for machine learning: a survey

    Existing approaches using argumentation to aid or improve machine learning differ in the type of machine learning technique they consider, in their use of argumentation, and in their choice of argumentation framework and semantics. This paper presents a survey of this relatively young field, highlighting in particular its achievements to date, the applications it has been used for, and the benefits brought about by the use of argumentation, with an eye towards its future.

    Postulates for logic-based argumentation systems

    Logic-based argumentation systems are developed for reasoning with inconsistent information. Starting from a knowledge base encoded in a logical language, they define arguments and attacks between them using the consequence operator associated with the language. Finally, a semantics is used for evaluating the arguments. In this paper, we focus on systems that are based on deductive logics and that use Dung's semantics. We investigate rationality postulates that such systems should satisfy. We define five intuitive postulates: consistency and closure under the consequence operator of the underlying logic of the set of conclusions of arguments of each extension, closure under sub-arguments and exhaustiveness of the extensions, and a free precedence postulate ensuring that the free formulas of the knowledge base (i.e., the ones that are not involved in inconsistency) are conclusions of arguments in every extension. We study the links between the postulates and explore conditions under which they are guaranteed or violated.

    Proceedings of The Multi-Agent Logics, Languages, and Organisations Federated Workshops (MALLOW 2010)

    MALLOW-2010 is the third edition of a series initiated in 2007 in Durham and continued in 2009 in Turin. The objective, as initially stated, is to "provide a venue where: the cost of participation was minimum; participants were able to attend various workshops, so fostering collaboration and cross-fertilization; there was a friendly atmosphere and plenty of time for networking, by maximizing the time participants spent together". Proceedings: http://ceur-ws.org/Vol-627/allproceedings.pdf

    Lernbeiträge im Rahmen einer kognitiven Architektur für die intelligente Prozessführung [Learning contributions within a cognitive architecture for intelligent process control]

    In this thesis, important aspects of a cognitive architecture for learning control tasks are discussed. The focus is on feature extraction, reinforcement learning and learning management in the context of the perception-action cycle. The main contributions are several residual-based approaches to hybrid feature selection, an algorithm for handling the exploration-exploitation dilemma in continuous action spaces, investigations of the reward decomposition problem, and the integration of the individual components into a working architecture. For feature extraction, information-theoretic measures such as mutual information are used to formulate new hybrid feature extraction algorithms. The novel idea is to search for features that are explicitly linked to the errors made by a learning system, and it is shown that this residual-based approach is superior to classical methods. The thesis also examines which estimators of mutual information are suitable for feature extraction. As the decision-making component of the overall architecture, state-of-the-art reinforcement learning methods are investigated for their suitability for challenging applications. This work also addresses issues of learning management, such as the exploration-exploitation dilemma, the stability-plasticity dilemma and the reward decomposition problem. New contributions are made in the form of the diffusion tree-based reinforcement learning algorithm and the SMILE approach. Likewise, an architectural extension built around a process map is proposed to organize the learning process. Experimental evidence that the proposed system can learn the solution to real problems is presented in the challenging scenario of intelligent combustion control, where the complete system is used to learn a control strategy for a hard-coal-fired power plant. The achieved results clearly surpass existing systems and human experts.