Convergent Learning Algorithms for Unknown Reward Games

Author: Alex Rogers
Archie C. Chapman
David S. Leslie
Foster D. P.
Gottlob G.
Nicholas R. Jennings
Wolpert D. H.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'

Crossref

Essays in learning, optimization and game theory

Author: Duvocelle Benoit (Georges Philippe)
Publication venue: 'University of Maastricht'
Publication date: 01/01/2021

Maastricht University Research Portal

A survey of random processes with reinforcement

Author: Pemantle Robin
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2006

The models surveyed include generalized Pólya urns, reinforced random walks, interacting urn models, and continuous reinforced processes. Emphasis is on methods and results, with sketches provided of some proofs. Applications are discussed in statistics, biology, economics and a number of other areas. Comment: Published at http://dx.doi.org/10.1214/07-PS094 in the Probability Surveys (http://www.i-journals.org/ps/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref
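
As an illustration of the simplest model in this family, the classical two-color Pólya urn can be simulated in a few lines. This is only a minimal sketch and is not taken from the survey: the function name polya_urn, its parameters, and the starting composition of one red and one blue ball are assumptions made for the example.

```python
import random


def polya_urn(steps, red=1, blue=1, seed=None):
    """Simulate a classical two-color Polya urn (hypothetical helper).

    At each step one ball is drawn uniformly at random and returned to the
    urn together with one extra ball of the same color, so colors that were
    drawn often become more likely to be drawn again (reinforcement).
    """
    rng = random.Random(seed)
    for _ in range(steps):
        # Probability of drawing red equals the current fraction of red balls.
        if rng.random() < red / (red + blue):
            red += 1
        else:
            blue += 1
    return red, blue


# Example run: 1000 draws starting from one red and one blue ball.
red, blue = polya_urn(1000, seed=0)
fraction_red = red / (red + blue)
```

Repeating the run with different seeds gives different limiting fractions of red balls, which is the characteristic behavior of reinforced processes of this kind.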

Faster Algorithms for Quantitative Analysis of Markov Chains and Markov Decision Processes with Small Treewidth

Author: Asadi Ali
Chatterjee Krishnendu
Goharshady Amir Kafshdar
Mohammadi Kiarash
Pavlogiannis Andreas
Publication date: 06/04/2020

Discrete-time Markov Chains (MCs) and Markov Decision Processes (MDPs) are two standard formalisms in system analysis. Their main associated quantitative objectives are hitting probabilities, discounted sum, and mean payoff. Although there are many techniques for computing these objectives in general MCs/MDPs, they have not been thoroughly studied in terms of parameterized algorithms, particularly when treewidth is used as the parameter. This is in sharp contrast to qualitative objectives for MCs, MDPs and graph games, for which treewidth-based algorithms yield significant complexity improvements. In this work, we show that treewidth can also be used to obtain faster algorithms for the quantitative problems. For an MC with $n$ states and $m$ transitions, we show that each of the classical quantitative objectives can be computed in $O((n+m)\cdot t^2)$ time, given a tree decomposition of the MC that has width $t$. Our results also imply a bound of $O(\kappa\cdot (n+m)\cdot t^2)$ for each objective on MDPs, where $\kappa$ is the number of strategy-iteration refinements required for the given input and objective. Finally, we make an experimental evaluation of our new algorithms on low-treewidth MCs and MDPs obtained from the DaCapo benchmark suite. Our experimental results show that on MCs and MDPs with small treewidth, our algorithms outperform existing well-established methods by one or more orders of magnitude.

arXiv.org e-Print Archive

Hal-Diderot
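
To make the hitting-probability objective from the abstract above concrete, the sketch below approximates it on a small Markov chain by repeated sweeps of the defining equations. This is a generic iterative baseline, not the treewidth-based algorithm of the paper; the function name hitting_probabilities, the dictionary encoding of the chain, and the gambler's-ruin example are all assumptions made for illustration.

```python
def hitting_probabilities(transitions, targets, sweeps=10_000, tol=1e-12):
    """Approximate the probability of eventually reaching `targets`.

    transitions: dict mapping each state to a list of (next_state, probability)
                 pairs; every next_state must itself be a key of the dict.
    targets:     set of target states.
    Starts from 0 on non-target states and iterates x(s) = sum_t P(s, t) * x(t),
    which converges to the hitting probabilities.
    """
    prob = {s: (1.0 if s in targets else 0.0) for s in transitions}
    for _ in range(sweeps):
        delta = 0.0
        for s, edges in transitions.items():
            if s in targets:
                continue  # target states keep probability 1
            new = sum(p * prob[t] for t, p in edges)
            delta = max(delta, abs(new - prob[s]))
            prob[s] = new
        if delta < tol:
            break  # values have stopped changing
    return prob


# Hypothetical example: fair gambler's ruin on states 0..4,
# where states 0 and 4 are absorbing and 4 is the target.
chain = {
    0: [(0, 1.0)],
    1: [(0, 0.5), (2, 0.5)],
    2: [(1, 0.5), (3, 0.5)],
    3: [(2, 0.5), (4, 0.5)],
    4: [(4, 1.0)],
}
result = hitting_probabilities(chain, targets={4})
```

For this chain the exact hitting probability from state i is i/4, so the returned values can be checked against 0, 0.25, 0.5, 0.75 and 1.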

Multi-Robot Path Planning for Persistent Monitoring in Stochastic and Adversarial Environments

Author: Asghar Ahmad Bilal
Publication venue: 'University of Waterloo'
Publication date: 15/04/2020

In this thesis, we study multi-robot path planning problems for persistent monitoring tasks. The goal of such persistent monitoring tasks is to deploy a team of cooperating mobile robots in an environment to continually observe locations of interest in the environment. Robots patrol the environment in order to detect events arriving at the locations of the environment. The events stay at those locations for a certain amount of time before leaving and can only be detected if one of the robots visits the location of an event while the event is there. In order to detect all possible events arriving at a vertex, the maximum time spent by the robots between visits to that vertex should be less than the duration of the events arriving at that vertex. We consider the problem of finding the minimum number of robots to satisfy these revisit time constraints, also called latency constraints. The decision version of this problem is PSPACE-complete. We provide an O(log p) approximation algorithm for this problem where p is the ratio of the maximum and minimum latency constraints. We also present heuristic algorithms to solve the problem and show through simulations that a proposed orienteering-based heuristic algorithm gives better solutions than the approximation algorithm. We additionally provide an algorithm for the problem of minimizing the maximum weighted latency given a fixed number of robots. In case the event stay durations are not fixed but are drawn from a known distribution, we consider the problem of maximizing the expected number of detected events. We motivate randomized patrolling paths for such scenarios and use Markov chains to represent those random patrolling paths. We characterize the expected number of detected events as a function of the Markov chains used for patrolling and show that the objective function is submodular for randomly arriving events. 
We propose an approximation algorithm for the case where the event duration for all the vertices is a constant. We also propose a centralized and an online distributed algorithm to find the random patrolling policies for the robots. We also consider the case where the events are adversarial and can choose where and when to appear in order to maximize their chances of remaining undetected. The last problem we study in this thesis considers events triggered by a learning adversary. The adversary has a limited time to observe the patrolling policy before it decides when and where events should appear. We study the single-robot version of this problem and model it as a multi-stage two-player game. The adversary observes the patroller’s actions for a finite amount of time to learn the patroller’s strategy and then either chooses a location for the event to appear or reneges based on its confidence in the learned strategy. We characterize the expected payoffs for the players and propose a search algorithm to find a patrolling policy in such scenarios. We illustrate the trade-off between hard-to-learn and hard-to-attack strategies through simulations.

University of Waterloo's Institutional Repository

Intelligent Agents for Active Malware Analysis

Author: SARTEA RICCARDO
Publication date: 01/01/2020

The main contribution of this thesis is to give a novel perspective on Active Malware Analysis, modeled as a decision-making process between intelligent agents. We propose solutions aimed at extracting the behaviors of malware agents with advanced Artificial Intelligence techniques. In particular, we devise novel action selection strategies for the analyzer agents that allow malware to be analyzed by selecting sequences of triggering actions aimed at maximizing the information acquired. The goal is to create informative models representing the behaviors of the malware agents observed while interacting with them during the analysis process. Such models can then be used to effectively compare a malware against others and to correctly identify the malware family

Catalogo dei prodotti della ricerca

Multi-Automata Learning

Author: Nowe Ann
Peeters Maarten
Verbeeck Katja
Vrancx Peter
Publication venue: 'IntechOpen'
Publication date: 01/01/2008

IntechOpen