    Opponent modelling in the game of tron using reinforcement learning

    In this paper we propose the use of vision grids as state representation to learn to play the game Tron using neural networks and reinforcement learning. This approach speeds up learning by significantly reducing the number of unique states. Furthermore, we introduce a novel opponent modelling technique, which is used to predict the opponent's next move. The learned model of the opponent is subsequently used in Monte-Carlo roll-outs, in which the game is simulated n steps ahead in order to determine the expected value of conducting a certain action. Finally, we compare the performance of two different activation functions in the multi-layer perceptron, namely the sigmoid and the exponential linear unit (ELU). The results show that the ELU activation function outperforms the sigmoid activation function in most cases. Furthermore, vision grids significantly increase learning speed, and in most cases also increase the agent's performance compared to using the full grid as state representation. Finally, the opponent modelling technique allows the agent to learn a predictive model of the opponent's actions, which in combination with Monte-Carlo roll-outs significantly increases the agent's performance.
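
    The roll-out procedure can be summarised in a short sketch. Below is a minimal, hypothetical Python version of the n-step Monte-Carlo roll-out: simulate, evaluate and opponent_model stand in for the game engine, the trained value network and the learned opponent model, and are not taken from the paper's code.

        import random

        ACTIONS = ["up", "down", "left", "right"]

        def rollout_value(state, action, opponent_model, simulate, evaluate, n=5):
            """Play `action`, then simulate up to n-1 further steps, sampling
            the opponent's moves from the learned model; score the end state."""
            state = simulate(state, action, opponent_model.sample(state))
            for _ in range(n - 1):
                if state.is_terminal:
                    break
                own = random.choice(ACTIONS)        # or greedy w.r.t. the value network
                opp = opponent_model.sample(state)  # predicted opponent move
                state = simulate(state, own, opp)
            return evaluate(state)                  # e.g. the MLP's value estimate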

    Local Optimal Sets and Bounded Archiving on Multi-objective NK-Landscapes with Correlated Objectives

    The properties of local optimal solutions in multi-objective combinatorial optimization problems are crucial for the effectiveness of local search algorithms, particularly when these algorithms are based on Pareto dominance. Such local search algorithms typically return a set of mutually nondominated Pareto local optimal (PLO) solutions, that is, a PLO-set. This paper investigates two aspects of PLO-sets by means of experiments with Pareto local search (PLS). First, we examine the impact of several problem characteristics on the properties of PLO-sets for multi-objective NK-landscapes with correlated objectives. In particular, we report that either increasing the number of objectives or decreasing the correlation between objectives leads to an exponential increase in the size of PLO-sets, whereas the variable correlation has only a minor effect. Second, we study the running time and the quality reached when using bounded archiving methods to limit the size of the archive handled by PLS, and thus the maximum size of the PLO-set found. We argue that there is a clear relationship between the running time of PLS and the difficulty of a problem instance. (Appears in Parallel Problem Solving from Nature, PPSN XIII, Ljubljana, Slovenia, 2014.)
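
    The core archive operation behind a PLO-set is easy to sketch. The following is a minimal Python illustration of maintaining a mutually nondominated archive with an optional size bound; the random-drop rule is a deliberately crude placeholder for the bounded archiving methods the paper actually evaluates.

        import random

        def dominates(a, b):
            """True if objective vector a Pareto-dominates b (maximisation)."""
            return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

        def update_archive(archive, candidate, max_size=None):
            """Keep the archive mutually nondominated; optionally bound its size."""
            if any(dominates(a, candidate) for a in archive):
                return archive                     # candidate is dominated: reject
            archive = [a for a in archive if not dominates(candidate, a)]
            archive.append(candidate)
            if max_size is not None and len(archive) > max_size:
                archive.pop(random.randrange(len(archive)))  # placeholder bounding rule
            return archive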

    A Bayesian model for anomaly detection in SQL databases for security systems

    We focus on automatic anomaly detection in SQL databases for security systems. Many logs of database systems, here the Townhall database, contain detailed information about users, such as the SQL queries issued and the responses of the database. A log is treated as a list of instances, where each instance is a Cartesian product of feature values with an attached anomaly score; all instances with an anomaly score in the top percentile are flagged as anomalous. Our contribution is several-fold. We define a model for anomaly detection in SQL databases that learns the structure of Bayesian networks from data. Our method for automatic feature extraction generates the maximal spanning tree to detect the strongest similarities between features. Novel anomaly scores, based on the joint probability distribution of the database features and on the log-likelihood of the maximal spanning tree, detect both point and contextual anomalies, and multiple anomaly scores are combined within a robust anomaly analysis algorithm. We validate our method on the Townhall database, demonstrating the performance of our anomaly detection algorithm.
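
    One plausible reading of the feature-extraction step is a Chow-Liu-style construction: weight feature pairs by mutual information and keep the maximal spanning tree. The Python sketch below assumes the log is available as a pandas DataFrame of categorical columns (log_df is hypothetical, and the paper's similarity measure may differ). Instances whose likelihood under the resulting tree model falls in the flagged percentile would then be reported as anomalies.

        import networkx as nx
        from sklearn.metrics import mutual_info_score

        def feature_tree(log_df):
            """Maximal spanning tree over pairwise mutual information between
            log features, linking the most strongly related columns."""
            g = nx.Graph()
            cols = list(log_df.columns)
            for i, a in enumerate(cols):
                for b in cols[i + 1:]:
                    g.add_edge(a, b, weight=mutual_info_score(log_df[a], log_df[b]))
            return nx.maximum_spanning_tree(g)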

    Synergies between evolutionary algorithms and reinforcement learning

    A recent trend in evolutionary algorithms (EAs) transfers expertise to and from other areas of machine learning. An interesting novel symbiosis pairs: i) reinforcement learning (RL), which learns difficult, dynamic tasks on-line and off-line but requires substantial computational resources, and ii) EAs, whose main strengths are their elegance and computational efficiency. These two techniques address the same problem of reward maximisation in difficult environments that can include stochasticity. They exchange techniques in order to improve their theoretical and empirical efficiency, such as computational speed for on-line learning and robust behaviour for off-line optimisation algorithms. For example, multi-objective RL uses tuples of rewards instead of a single reward value, and techniques from multi-objective EAs can be integrated for an efficient exploration/exploitation trade-off. Conversely, the problem of selecting the best genetic operator is similar to the problem an agent faces when choosing between alternatives in pursuit of its goal of maximising cumulative expected reward; practical approaches apply RL methods to solve this on-line operator selection problem, as the sketch below illustrates.
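
    As a minimal illustration of that analogy, the Python sketch below treats each genetic operator as a bandit arm and rewards it with the fitness improvement it produces; the epsilon-greedy rule and the learning-rate constant are illustrative choices, not the specific methods surveyed.

        import random

        class OperatorSelector:
            """Bandit-style adaptive operator selection: each genetic operator
            is an arm, rewarded by the fitness improvement it produces."""
            def __init__(self, operators, epsilon=0.1, alpha=0.2):
                self.ops = list(operators)
                self.q = {op: 0.0 for op in self.ops}  # running value estimates
                self.epsilon, self.alpha = epsilon, alpha

            def select(self):
                if random.random() < self.epsilon:     # explore
                    return random.choice(self.ops)
                return max(self.ops, key=self.q.get)   # exploit the best operator

            def update(self, op, reward):
                self.q[op] += self.alpha * (reward - self.q[op])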

    Generating QAP instances with known optimum solution and additively decomposable cost function

    The quadratic assignment problem (QAP) is an NP-hard combinatorial optimization problem, and QAPs are often used to compare the performance of meta-heuristics. In this paper, we propose a QAP instance generator that can be used for benchmarking heuristic algorithms. Our generator combines small QAPs with known optimal solutions into a larger QAP instance. We call these instances composite QAPs (cQAPs), and we show that the cost function of cQAPs is additively decomposable. We give mild conditions under which a cQAP instance has a known optimal solution. We generate cQAP instances using uniform distributions with different bounds for the component QAPs and for the remaining cQAP elements. We introduce numerical and analytical techniques that measure the difficulty of the cQAP instances in comparison with other QAPs from the literature. These methods indicate that some cQAP instances are difficult for local search, with many local optima of various values, low epistasis and non-trivial asymptotic behaviour.
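
    A schematic of the composition step, in Python with NumPy: each component is a (flow, distance) pair with a known optimum, placed on the diagonal of the larger matrices, while the off-diagonal blocks are filled uniformly at random. The bounds off_low and off_high stand in for the paper's mild optimality-preserving conditions and are not the actual ones.

        import numpy as np

        def compose_cqap(components, off_low, off_high, rng=None):
            """Assemble a composite QAP from small (flow, distance) component
            instances placed block-diagonally; off-block entries are uniform."""
            rng = rng or np.random.default_rng()
            n = sum(f.shape[0] for f, _ in components)
            F = rng.uniform(off_low, off_high, size=(n, n))
            D = rng.uniform(off_low, off_high, size=(n, n))
            pos = 0
            for f, d in components:
                k = f.shape[0]
                F[pos:pos + k, pos:pos + k] = f   # component flow block
                D[pos:pos + k, pos:pos + k] = d   # component distance block
                pos += k
            return F, D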

    Approximative Pareto front identification

    Stochastic pareto local search for many objective quadratic assignment problem instances

    Optimising in many-objective search spaces, i.e. search spaces with more than three objectives, is a challenging task. Scalarization functions transform a multi-objective search space into a single-objective one. To scale up optimisation in many-objective search spaces, we use Cartesian products of scalarization functions, or simply product functions, to reduce the number of objectives of the search space. Stochastic product local search (SprLS) uses product functions to evaluate solutions within a local search run, with the goal of generating the entire Pareto front. To improve the performance of SprLS algorithms, we either: 1) search for a fixed set of product functions that most improves the performance of the algorithm, or 2) adapt the directions of the component scalarization functions using solutions in the current (possibly suboptimal) Pareto front. We compare the performance of these local search algorithms on many-objective quadratic assignment instances with correlated flow matrices.
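
    The product construction is compact enough to sketch directly. Below, each scalarization function maps a many-objective vector to a scalar, and their Cartesian product maps it to a lower-dimensional vector on which Pareto dominance can then be applied; the weights are illustrative, not taken from the paper.

        def weighted_sum(weights):
            """A single scalarization: project an objective vector onto `weights`."""
            return lambda obj: sum(w * o for w, o in zip(weights, obj))

        def product_function(scalarizations):
            """Cartesian product of scalarizations: maps a many-objective vector
            to a lower-dimensional one, one coordinate per scalarization."""
            return lambda obj: tuple(s(obj) for s in scalarizations)

        # e.g. reduce 5 objectives to 2 and compare solutions by dominance
        # on the reduced vectors (weights below are illustrative)
        reduce_to_2d = product_function([
            weighted_sum([0.5, 0.5, 0.0, 0.0, 0.0]),
            weighted_sum([0.0, 0.0, 0.4, 0.3, 0.3]),
        ])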

    Correlated Gaussian multi-objective multi-armed bandit across arms algorithm

    The stochastic multi-objective multi-armed bandit problem (MOMAB) is a stochastic multi-armed bandit problem in which each arm generates a vector of rewards instead of a single scalar reward. The goal in MOMAB is to minimise the regret of playing suboptimal arms while playing the Pareto optimal arms fairly. In this paper, we consider Gaussian correlations across arms, meaning that the reward vector generated by an arm gives us information not only about that arm itself but also about all the other arms. We call this framework the correlated MOMAB problem. We extend the Gittins index policy to correlated MOMAB, since the Gittins index has previously been used to model correlation between arms. We empirically compare the Gittins index policy with a multi-objective upper confidence bound policy on a test suite of correlated MOMAB problems, and conclude that the performance of these policies depends on the number of arms and objectives.
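
    The notion of playing the Pareto optimal arms fairly can be sketched as follows in Python: compute the empirical Pareto front of the arms' mean reward vectors and choose uniformly among them. This deliberately ignores the confidence bounds and the Gittins-style correlation updates that the paper studies.

        import random

        def dominates(a, b):
            """a Pareto-dominates b (maximisation of each reward component)."""
            return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

        def pick_arm(mean_rewards):
            """Choose uniformly among the empirically Pareto optimal arms."""
            front = [i for i, m in enumerate(mean_rewards)
                     if not any(dominates(o, m)
                                for j, o in enumerate(mean_rewards) if j != i)]
            return random.choice(front)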

    Comparing exploration strategies for Q-learning in random stochastic mazes

    Balancing the ratio between exploration and exploitation is an important problem in reinforcement learning. This paper evaluates four different exploration strategies combined with Q-learning, using random stochastic mazes to investigate their performance. We compare UCB-1, softmax, ε-greedy, and pursuit; for this purpose we adapted the UCB-1 and pursuit strategies for use in the Q-learning algorithm. The mazes consist of a single optimal goal state and two suboptimal goal states that lie closer to the agent's starting position, which makes efficient exploration an important part of the learning task. Furthermore, we evaluate two kinds of reward function: a normalised one with rewards between 0 and 1, and an unnormalised one that penalises the agent with a negative reward for each step. We performed an extensive grid search to find the best parameters for each method, and used the best parameters on novel, randomly generated maze problems of different sizes. The results show that softmax exploration outperforms the other strategies, although its temperature parameter is harder to tune. The worst performing exploration strategy is ε-greedy.
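
    For reference, a minimal Python sketch of the winning softmax (Boltzmann) rule; the temperature is the parameter the abstract notes is hard to tune.

        import math
        import random

        def softmax_action(q_values, temperature):
            """Softmax exploration: P(a) proportional to exp(Q(s,a)/temperature);
            low temperature is near-greedy, high temperature is near-uniform."""
            m = max(q_values)  # subtract the max for numerical stability
            prefs = [math.exp((q - m) / temperature) for q in q_values]
            total = sum(prefs)
            r, acc = random.random() * total, 0.0
            for action, p in enumerate(prefs):
                acc += p
                if r <= acc:
                    return action
            return len(prefs) - 1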