    Learning from Scarce Experience

    Searching the space of policies directly for the optimal policy has been one popular method for solving partially observable reinforcement learning problems. Typically, with each change of the target policy, its value is estimated from the results of following that very policy. This requires a large number of interactions with the environment as different policies are considered. We present a family of algorithms based on likelihood ratio estimation that use data gathered while executing one policy (or collection of policies) to estimate the value of a different policy. The algorithms combine an estimation stage and an optimization stage. The former uses the gathered experience to build a non-parametric representation of the function being optimized; the latter performs optimization on this estimate. We show positive empirical results and provide a sample complexity bound.
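
    The likelihood-ratio idea described here is, at its core, the standard importance-sampling estimator for off-policy evaluation: trajectories collected under a behaviour policy are reweighted by the ratio of the target policy's action probabilities to the behaviour policy's. Below is a minimal sketch of that estimator, not the authors' implementation; the trajectory format and the two policy functions are assumed placeholders.

    import numpy as np

    def trajectory_return(rewards, gamma=0.95):
        """Discounted return of a single trajectory."""
        return sum(r * gamma**t for t, r in enumerate(rewards))

    def importance_sampling_value(trajectories, target_policy, behaviour_policy, gamma=0.95):
        """Estimate the value of target_policy from data collected under behaviour_policy.

        trajectories: list of trajectories, each a list of (observation, action, reward).
        target_policy(obs, act) / behaviour_policy(obs, act): probability of act given obs.
        """
        estimates = []
        for traj in trajectories:
            # Likelihood ratio of the whole action sequence under the two policies.
            ratio = 1.0
            for obs, act, _ in traj:
                ratio *= target_policy(obs, act) / behaviour_policy(obs, act)
            rewards = [r for _, _, r in traj]
            estimates.append(ratio * trajectory_return(rewards, gamma))
        # Monte-Carlo average of the reweighted returns.
        return float(np.mean(estimates))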

    Reinforcement learning for efficient network penetration testing

    Penetration testing (also known as pentesting or PT) is a common practice for actively assessing the defenses of a computer network by planning and executing all possible attacks to discover and exploit existing vulnerabilities. Current penetration testing methods are increasingly becoming non-standard, composite and resource-consuming despite the use of evolving tools. In this paper, we propose and evaluate an AI-based pentesting system which uses machine learning techniques, namely reinforcement learning (RL), to learn and reproduce average and complex pentesting activities. The proposed system, named the Intelligent Automated Penetration Testing System (IAPTS), consists of a module that integrates with industrial PT frameworks to enable them to capture information, learn from experience, and reproduce tests in future similar testing cases. IAPTS aims to save human resources while producing much-enhanced results in terms of time consumption, reliability and frequency of testing. IAPTS takes the approach of modeling PT environments and tasks as a partially observable Markov decision process (POMDP), which is solved with a POMDP solver. Although the scope of this paper is limited to network infrastructure PT planning and not the entire practice, the obtained results support the hypothesis that RL can enhance PT beyond the capabilities of any human PT expert in terms of time consumed, covered attack vectors, accuracy and reliability of the outputs. In addition, this work tackles the complex problem of expertise capturing and re-use by allowing the IAPTS learning module to store and re-use PT policies in the same way that a human PT expert would learn, but in a more efficient way.
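
    Modelling pentesting as a POMDP means the agent maintains a belief (a probability distribution over hidden network states, e.g. whether a host is vulnerable) and updates it after each action and observation. The sketch below shows only the generic tabular belief update; the state space, transition and observation matrices are illustrative assumptions, not IAPTS's actual model.

    import numpy as np

    def belief_update(belief, action, observation, T, O):
        """Bayesian belief update for a tabular POMDP.

        belief: (S,) prior over hidden states.
        T[action]: (S, S) matrix with T[a][s, s'] = P(s' | s, a).
        O[action]: (S, O) matrix with O[a][s', o] = P(o | s', a).
        """
        predicted = belief @ T[action]                   # predict: sum_s b(s) P(s'|s,a)
        updated = predicted * O[action][:, observation]  # weight by observation likelihood
        return updated / updated.sum()                   # normalise

    # Toy example: two hidden states ("patched", "vulnerable"), one probe action.
    T = {0: np.array([[1.0, 0.0], [0.0, 1.0]])}          # probing does not change the host
    O = {0: np.array([[0.9, 0.1], [0.2, 0.8]])}          # noisy "exploit succeeded" signal
    b0 = np.array([0.5, 0.5])
    b1 = belief_update(b0, action=0, observation=1, T=T, O=O)  # ~[0.11, 0.89]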

    A principled information valuation for communications during multi-agent coordination

    Decentralised coordination in multi-agent systems is typically achieved using communication. However, in many cases communication is expensive to use because bandwidth is limited, communicating may be dangerous, or communication may simply be unavailable at times. In this context, we argue for a rational approach to communication: if it has a cost, the agents should be able to calculate the value of communicating. By doing this, the agents can balance the need to communicate with the cost of doing so. In this research, we present a novel model of rational communication that uses information theory to value communications, and employ this valuation in a decision-theoretic coordination mechanism. A preliminary empirical evaluation of the benefits of this approach is presented in the context of the RoboCupRescue simulator.
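
    The paper values messages information-theoretically; a closely related and simpler way to make "the value of communicating" concrete is the decision-theoretic gain: how much better the receiver's best action becomes once the message has updated its belief, minus the cost of sending it. The sketch below uses that decision-theoretic form under assumed discrete beliefs and a known utility table; the numbers and names are illustrative, not the paper's model.

    import numpy as np

    def value_of_message(prior, posterior, utility, cost):
        """Value of a specific message for the receiver.

        utility[a, s]: payoff of action a when the hidden state is s.
        The gain is how much better the posterior-optimal action is than the action
        the agent would have chosen under the prior, both evaluated under the
        posterior belief, minus the cost of communicating.
        """
        action_without_message = int(np.argmax(utility @ prior))
        expected_with = np.max(utility @ posterior)
        expected_without = (utility @ posterior)[action_without_message]
        return float(expected_with - expected_without - cost)

    # Toy example: two hidden states, two actions.
    utility = np.array([[10.0, 0.0],     # action 0 pays off only in state 0
                        [0.0, 10.0]])    # action 1 pays off only in state 1
    prior = np.array([0.6, 0.4])         # without the message the agent would pick action 0
    posterior = np.array([0.1, 0.9])     # the message makes state 1 far more likely
    print(value_of_message(prior, posterior, utility, cost=2.0))  # 9 - 1 - 2 = 6.0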

    Monte Carlo Planning method estimates planning horizons during interactive social exchange

    Reciprocating interactions represent a central feature of all human exchanges. They have been the target of various recent experiments, with healthy participants and psychiatric populations engaging as dyads in multi-round exchanges such as a repeated trust task. Behaviour in such exchanges involves complexities related to each agent's preference for equity with their partner, beliefs about the partner's appetite for equity, beliefs about the partner's model of their partner, and so on. Agents may also plan different numbers of steps into the future. Providing a computationally precise account of the behaviour is an essential step towards understanding what underlies choices. A natural framework for this is that of an interactive partially observable Markov decision process (IPOMDP). However, the various complexities make IPOMDPs inordinately computationally challenging. Here, we show how to approximate the solution for the multi-round trust task using a variant of the Monte-Carlo tree search algorithm. We demonstrate that the algorithm is efficient and effective, and therefore can be used to invert observations of behavioural choices. We use generated behaviour to elucidate the richness and sophistication of interactive inference.
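
    To make the "variant of the Monte-Carlo tree search algorithm" concrete, here is a minimal generic UCT loop. The generative-model interface (actions, step, is_terminal) is an assumed placeholder, and none of the paper's IPOMDP-specific machinery (nested beliefs about the partner, planning-horizon inference) is reproduced here.

    import math, random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children = {}               # action -> child Node
            self.visits, self.total = 0, 0.0

    def uct_child(node, c=1.4):
        """Pick the child maximising the UCT score."""
        def score(child):
            return child.total / child.visits + c * math.sqrt(math.log(node.visits) / child.visits)
        return max(node.children.values(), key=score)

    def rollout(model, state, depth):
        """Random playout from state, returning the accumulated reward."""
        total = 0.0
        for _ in range(depth):
            if model.is_terminal(state):
                break
            state, reward = model.step(state, random.choice(model.actions(state)))
            total += reward
        return total

    def mcts(model, root_state, n_iter=1000, depth=10):
        root = Node(root_state)
        for _ in range(n_iter):
            node = root
            # 1. Selection: descend while the node is fully expanded.
            while node.children and len(node.children) == len(model.actions(node.state)):
                node = uct_child(node)
            # 2. Expansion: try one untried action.
            untried = [a for a in model.actions(node.state) if a not in node.children]
            if untried and not model.is_terminal(node.state):
                action = random.choice(untried)
                next_state, _ = model.step(node.state, action)
                node.children[action] = Node(next_state, parent=node)
                node = node.children[action]
            # 3. Simulation: a random playout estimates the value of the new node.
            value = rollout(model, node.state, depth)
            # 4. Backpropagation: update statistics along the path to the root.
            while node is not None:
                node.visits += 1
                node.total += value
                node = node.parent
        # Recommend the most visited action at the root.
        return max(root.children, key=lambda a: root.children[a].visits)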

    Efficient Bayesian Planning

    Artificial Intelligence (AI) is a long-studied and yet very active field of research. The list of things differentiating humans from AI grows thinner, but the dream of an artificial general intelligence remains elusive. Sequential Decision Making is a subfield of AI that poses a seemingly benign question: "How to act optimally in an unknown environment?" This requires the AI agent to learn about its environment as well as plan an action sequence given its current knowledge about it. The two common problem settings are partial observability and unknown environment dynamics. Bayesian planning deals with these issues by defining a single planning problem that considers the simultaneous effects of an action on both learning and goal search. The technique involves dealing with infinite tree data structures which are hard to store but essential for computing the optimal plan. Finally, we consider the minimax setting, where the Bayesian prior is chosen by an adversary and therefore a worst-case policy needs to be found. In this thesis, we present novel Bayesian planning algorithms. First, we propose DSS (Deeper, Sparser Sampling) for the case of unknown environment dynamics. It is a meta-algorithm derived from a simple insight about the Bayes rule, which beats the state of the art across the board from discrete to continuous state settings. A theoretical analysis provides a high-probability bound on its performance. Our analysis differs from previous approaches in the literature in terms of problem formulation and formal guarantees. The result also contrasts with those of previous comparable BRL algorithms, which typically provide asymptotic convergence guarantees. Suitable Bayesian models and their corresponding planners are proposed for implementing the discrete and continuous versions of DSS. We then address the issue of partial observability via our second algorithm, FMP (Finite Memory Planner). This uses depth-dependent partitioning of the infinite planning tree. Experimental results demonstrate performance comparable to the current state of the art for both discrete and continuous settings. Finally, we propose algorithms for finding the best policy for the worst-case belief in the minimax Bayesian setting.
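
    For orientation, the classical sparse-sampling recursion that this line of work builds on looks roughly as follows: estimate an action's value by drawing a small number of successor samples from a generative model and recursing to a fixed depth. This is a generic sketch under an assumed model interface, not the thesis's DSS algorithm; in a Bayesian planning setting the "state" would be an augmented state-and-belief pair and the samples would come from the posterior model.

    import random

    def q_estimate(model, state, action, depth, width, gamma=0.95):
        """Sparse-sampling estimate of Q(state, action) using `width` samples per node."""
        if depth == 0:
            return 0.0
        total = 0.0
        for _ in range(width):
            next_state, reward = model.step(state, action)   # sample from the generative model
            total += reward + gamma * v_estimate(model, next_state, depth - 1, width, gamma)
        return total / width

    def v_estimate(model, state, depth, width, gamma=0.95):
        """Value is the best Q-estimate over the available actions."""
        if depth == 0 or model.is_terminal(state):
            return 0.0
        return max(q_estimate(model, state, a, depth, width, gamma)
                   for a in model.actions(state))

    def plan(model, state, depth=3, width=5):
        """Return the action with the highest sparse-sampling Q-estimate."""
        return max(model.actions(state),
                   key=lambda a: q_estimate(model, state, a, depth, width))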
