5 research outputs found
Argumentation accelerated reinforcement learning
Reinforcement Learning (RL) is a popular statistical Artificial Intelligence (AI) technique for building autonomous agents, but it suffers from the curse of dimensionality: the computational requirement for obtaining optimal policies grows exponentially with the size of the state space. Integrating heuristics into RL has proven to be an effective way to combat this curse, but deriving high-quality heuristics from people's (typically conflicting) domain knowledge is challenging and has received little research attention. Argumentation theory is a logic-based AI technique well known for its conflict resolution capability and intuitive appeal. In this thesis, we investigate the integration of argumentation frameworks into RL algorithms, so as to improve the convergence speed of RL algorithms. In particular, we propose a variant of the Value-based Argumentation Framework (VAF) to represent domain knowledge and to derive heuristics from this knowledge. We prove that the heuristics derived from this framework can effectively instruct individual learning agents as well as multiple cooperative learning agents. In addition, we propose the Argumentation Accelerated RL (AARL) framework to integrate these heuristics into different RL algorithms via Potential Based Reward Shaping (PBRS) techniques: we use classical PBRS techniques for AARL based on flat RL (e.g. SARSA(λ)), and propose a novel PBRS technique for MAXQ-0, a hierarchical RL (HRL) algorithm, so as to implement HRL-based AARL. We empirically test two AARL implementations (SARSA(λ)-based AARL and MAXQ-based AARL) in multiple application domains, including single-agent and multi-agent learning problems. Empirical results indicate that AARL can improve the convergence speed of RL, and that it can also be used easily by people who have little background in argumentation and RL.
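As a rough illustration of the classical PBRS idea mentioned in this abstract, the sketch below adds the shaping term F(s, s') = γΦ(s') − Φ(s) to an environment reward. The corridor task, the `GOAL` constant and the potential `phi` are hypothetical stand-ins chosen for this example; the abstract does not say how AARL turns arguments into potentials, so this is not the thesis's method.

```python
from typing import Callable, Sequence

# Hypothetical toy task: states are integer positions in a 1-D corridor.
GOAL = 5


def phi(state: int) -> float:
    """Assumed heuristic potential: negative distance to the goal."""
    return -float(abs(GOAL - state))


def shaping_term(potential: Callable[[int], float], s: int, s_next: int,
                 gamma: float) -> float:
    """Classical potential-based shaping term: gamma * phi(s') - phi(s)."""
    return gamma * potential(s_next) - potential(s)


def shaped_reward(reward: float, potential: Callable[[int], float], s: int,
                  s_next: int, gamma: float) -> float:
    """Environment reward plus the shaping term."""
    return reward + shaping_term(potential, s, s_next, gamma)


def total_shaping(potential: Callable[[int], float],
                  trajectory: Sequence[int], gamma: float) -> float:
    """Sum of shaping terms along a trajectory of visited states."""
    return sum(shaping_term(potential, s, s2, gamma)
               for s, s2 in zip(trajectory, trajectory[1:]))
```

With γ = 1 the shaping terms telescope, so the total along any trajectory depends only on its first and last state. This is the intuition behind why potential-based shaping can guide exploration without changing which policies are optimal.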
Argumentation for machine learning: a survey
Existing approaches using argumentation to aid or improve machine learning differ in the type of machine learning technique they consider, in their use of argumentation and in their choice of argumentation framework and semantics. This paper presents a survey of this relatively young field highlighting, in particular, its achievements to date, the applications it has been used for as well as the benefits brought about by the use of argumentation, with an eye towards its future
Postulates for logic-based argumentation systems
Logic-based argumentation systems are developed for reasoning with inconsistent information. Starting from a knowledge base encoded in a logical language, they define arguments and attacks between them using the consequence operator associated with the language. Finally, a semantics is used for evaluating the arguments. In this paper, we focus on systems that are based on deductive logics and that use Dung's semantics. We investigate rationality postulates that such systems should satisfy. We define five intuitive postulates: consistency of the set of conclusions of the arguments of each extension, closure of that set under the consequence operator of the underlying logic, closure of the extensions under sub-arguments, exhaustiveness of the extensions, and a free precedence postulate ensuring that the free formulas of the knowledge base (i.e., the ones that are not involved in inconsistency) are conclusions of arguments in every extension. We study the links between the postulates and explore conditions under which they are guaranteed or violated.
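To make the evaluation step concrete, here is a minimal sketch of one of Dung's semantics, the grounded extension, computed on an abstract attack graph. It assumes arguments are opaque labels and attacks are given as (attacker, target) pairs; building arguments from a logical knowledge base, as the paper does, is outside the scope of this example, and the function name is illustrative.

```python
from typing import Hashable, Iterable, Set, Tuple

Argument = Hashable


def grounded_extension(arguments: Iterable[Argument],
                       attacks: Set[Tuple[Argument, Argument]]) -> Set[Argument]:
    """Least fixed point of the characteristic function of a finite framework."""
    args = set(arguments)
    # Map each argument to the set of arguments attacking it.
    attackers = {a: set() for a in args}
    for attacker, target in attacks:
        attackers[target].add(attacker)

    extension: Set[Argument] = set()
    while True:
        # An argument is defended if each of its attackers is itself
        # attacked by some member of the current extension.
        defended = {
            a for a in args
            if all(any((d, b) in attacks for d in extension)
                   for b in attackers[a])
        }
        if defended == extension:
            return extension
        extension = defended
```

For a chain in which `a` attacks `b` and `b` attacks `c`, the unattacked argument `a` is accepted first and then defends `c`. Two arguments that only attack each other leave the grounded extension empty.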
Proceedings of The Multi-Agent Logics, Languages, and Organisations Federated Workshops (MALLOW 2010)
http://ceur-ws.org/Vol-627/allproceedings.pdf
MALLOW-2010 is the third edition of a series initiated in 2007 in Durham and continued in 2009 in Turin. The objective, as initially stated, is to "provide a venue where: the cost of participation was minimum; participants were able to attend various workshops, so fostering collaboration and cross-fertilization; there was a friendly atmosphere and plenty of time for networking, by maximizing the time participants spent together".
Lernbeiträge im Rahmen einer kognitiven Architektur für die intelligente Prozessführung (Learning contributions within a cognitive architecture for intelligent process control)
This thesis examines important aspects of a cognitive architecture for learning control tasks. It is primarily concerned with feature extraction, reinforcement learning and learning management within the perception-action cycle. Key contributions include several residual-based approaches to hybrid feature selection, an algorithm for handling the exploration-exploitation dilemma in continuous action spaces, investigations of the reward decomposition problem, and the integration of the individual components into a working architecture. Experimental evidence that the presented system can learn solutions to real problems is provided in the challenging scenario of intelligent combustion control, where the complete system is used to control a hard-coal-fired power plant. The results achieved clearly surpass existing systems as well as human experts.
In this thesis, important aspects of a cognitive architecture for learning
control tasks are discussed. Highlighted are the topics of feature
extraction, reinforcement learning and learning management in the context
of the perception-action cycle. The contributions in the field of feature
extraction use information-theoretic measures such as mutual information
to formulate new hybrid feature extraction algorithms. The focus is on
finding features that are explicitly linked with the errors made by a
learning system. It is shown that this residual-based approach is superior to
classical methods. Another topic of interest is the estimation of mutual
information in the context of feature extraction. State-of-the-art
reinforcement learning methods are investigated for their suitability for
challenging applications. This work addresses issues of learning
management, such as the exploration-exploitation dilemma, the
plasticity-stability dilemma and the reward decomposition problem. New
contributions are made in the form of the diffusion tree-based
reinforcement learning algorithm and the SMILE approach. Likewise, an
architectural extension is proposed to organize the learning process. It
uses a process map as the core piece to achieve this organization.
Experimental evidence that the proposed system can learn the solution to
real problems is presented in the challenging scenario of intelligent
combustion control. The system is used to learn a control strategy in a
coal-fired power plant. The achieved results surpass existing systems and
human experts.
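The residual-based feature extraction idea in this abstract, looking for features that carry information about a learner's errors, can be sketched with a simple plug-in estimate of mutual information over discrete samples. The thesis's actual algorithms and estimators are not given in the abstract; the function names, the discretised inputs and the ranking rule below are assumptions made purely for illustration.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_by_residual_information(
        features: Dict[str, Sequence[Hashable]],
        residuals: Sequence[Hashable]) -> List[Tuple[str, float]]:
    """Rank candidate features by estimated information about the residuals."""
    scores = [(name, mutual_information(values, residuals))
              for name, values in features.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A candidate feature that perfectly predicts a binary, evenly split residual scores one bit, while a statistically independent feature scores zero. Continuous signals would first need to be discretised or handled with a different estimator.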