    Opponent modelling in the game of tron using reinforcement learning

    In this paper we propose the use of vision grids as state representation to learn to play the game Tron using neural networks and reinforcement learning. This approach speeds up learning by significantly reducing the number of unique states. Furthermore, we introduce a novel opponent modelling technique, which is used to predict the opponent's next move. The learned model of the opponent is subsequently used in Monte-Carlo roll-outs, in which the game is simulated n steps ahead in order to determine the expected value of conducting a certain action. Finally, we compare the performance of two different activation functions in the multi-layer perceptron, namely the sigmoid and the exponential linear unit (ELU). The results show that the ELU activation function outperforms the sigmoid activation function in most cases. Furthermore, vision grids significantly increase learning speed, and in most cases also increase the agent's performance compared to using the full grid as state representation. Finally, the opponent modelling technique allows the agent to learn a predictive model of the opponent's actions, which in combination with Monte-Carlo roll-outs significantly increases the agent's performance.
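
    The roll-out procedure can be summarised in a short sketch. Below is a minimal, hypothetical Python version of the n-step Monte-Carlo roll-out: simulate, evaluate and opponent_model stand in for the game engine, the trained value network and the learned opponent model, and are not taken from the paper's code.

        import random

        ACTIONS = ["up", "down", "left", "right"]

        def rollout_value(state, action, opponent_model, simulate, evaluate, n=5):
            """Play `action`, then simulate up to n-1 further steps, sampling
            the opponent's moves from the learned model; score the end state."""
            state = simulate(state, action, opponent_model.sample(state))
            for _ in range(n - 1):
                if state.is_terminal:
                    break
                own = random.choice(ACTIONS)        # or greedy w.r.t. the value network
                opp = opponent_model.sample(state)  # predicted opponent move
                state = simulate(state, own, opp)
            return evaluate(state)                  # e.g. the MLP's value estimate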

    Local Optimal Sets and Bounded Archiving on Multi-objective NK-Landscapes with Correlated Objectives

    The properties of local optimal solutions in multi-objective combinatorial optimization problems are crucial for the effectiveness of local search algorithms, particularly when these algorithms are based on Pareto dominance. Such local search algorithms typically return a set of mutually nondominated Pareto local optimal (PLO) solutions, that is, a PLO-set. This paper investigates two aspects of PLO-sets by means of experiments with Pareto local search (PLS). First, we examine the impact of several problem characteristics on the properties of PLO-sets for multi-objective NK-landscapes with correlated objectives. In particular, we report that either increasing the number of objectives or decreasing the correlation between objectives leads to an exponential increase in the size of PLO-sets, whereas the variable correlation has only a minor effect. Second, we study the running time and the quality reached when using bounded archiving methods to limit the size of the archive handled by PLS, and thus the maximum size of the PLO-set found. We argue that there is a clear relationship between the running time of PLS and the difficulty of a problem instance. (Appears in Parallel Problem Solving from Nature, PPSN XIII, Ljubljana, Slovenia, 2014.)
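
    The core archive operation behind a PLO-set is easy to sketch. The following is a minimal Python illustration of maintaining a mutually nondominated archive with an optional size bound; the random-drop rule is a deliberately crude placeholder for the bounded archiving methods the paper actually evaluates.

        import random

        def dominates(a, b):
            """True if objective vector a Pareto-dominates b (maximisation)."""
            return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

        def update_archive(archive, candidate, max_size=None):
            """Keep the archive mutually nondominated; optionally bound its size."""
            if any(dominates(a, candidate) for a in archive):
                return archive                     # candidate is dominated: reject
            archive = [a for a in archive if not dominates(candidate, a)]
            archive.append(candidate)
            if max_size is not None and len(archive) > max_size:
                archive.pop(random.randrange(len(archive)))  # placeholder bounding rule
            return archive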

    A Bayesian model for anomaly detection in SQL databases for security systems

    We focus on automatic anomaly detection in SQL databases for security systems. Many logs of database systems, here the Townhall database, contain detailed information about users, such as the SQL queries issued and the responses of the database. A log is treated as a list of instances, where each instance is a Cartesian product of feature values with an attached anomaly score; all instances with an anomaly score in the top percentile are flagged as anomalous. Our contribution is several-fold. We define a model for anomaly detection in SQL databases that learns the structure of Bayesian networks from data. Our method for automatic feature extraction generates the maximal spanning tree to detect the strongest similarities between features. Novel anomaly scores, based on the joint probability distribution of the database features and on the log-likelihood of the maximal spanning tree, detect both point and contextual anomalies, and multiple anomaly scores are combined within a robust anomaly analysis algorithm. We validate our method on the Townhall database, demonstrating the performance of our anomaly detection algorithm.
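
    One plausible reading of the feature-extraction step is a Chow-Liu-style construction: weight feature pairs by mutual information and keep the maximal spanning tree. The Python sketch below assumes the log is available as a pandas DataFrame of categorical columns (log_df is hypothetical, and the paper's similarity measure may differ). Instances whose likelihood under the resulting tree model falls in the flagged percentile would then be reported as anomalies.

        import networkx as nx
        from sklearn.metrics import mutual_info_score

        def feature_tree(log_df):
            """Maximal spanning tree over pairwise mutual information between
            log features, linking the most strongly related columns."""
            g = nx.Graph()
            cols = list(log_df.columns)
            for i, a in enumerate(cols):
                for b in cols[i + 1:]:
                    g.add_edge(a, b, weight=mutual_info_score(log_df[a], log_df[b]))
            return nx.maximum_spanning_tree(g)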

    Synergies between evolutionary algorithms and reinforcement learning

    A recent trend in evolutionary algorithms (EAs) transfers expertise to and from other areas of machine learning. An interesting novel symbiosis pairs: i) reinforcement learning (RL), which learns difficult, dynamic tasks on-line and off-line but requires substantial computational resources, and ii) EAs, whose main strengths are their elegance and computational efficiency. These two techniques address the same problem of reward maximisation in difficult environments that can include stochasticity. They exchange techniques in order to improve their theoretical and empirical efficiency, such as computational speed for on-line learning and robust behaviour for off-line optimisation algorithms. For example, multi-objective RL uses tuples of rewards instead of a single reward value, and techniques from multi-objective EAs can be integrated for an efficient exploration/exploitation trade-off. Conversely, the problem of selecting the best genetic operator is similar to the problem an agent faces when choosing between alternatives in pursuit of its goal of maximising cumulative expected reward; practical approaches apply RL methods to solve this on-line operator selection problem, as the sketch below illustrates.
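
    As a minimal illustration of that analogy, the Python sketch below treats each genetic operator as a bandit arm and rewards it with the fitness improvement it produces; the epsilon-greedy rule and the learning-rate constant are illustrative choices, not the specific methods surveyed.

        import random

        class OperatorSelector:
            """Bandit-style adaptive operator selection: each genetic operator
            is an arm, rewarded by the fitness improvement it produces."""
            def __init__(self, operators, epsilon=0.1, alpha=0.2):
                self.ops = list(operators)
                self.q = {op: 0.0 for op in self.ops}  # running value estimates
                self.epsilon, self.alpha = epsilon, alpha

            def select(self):
                if random.random() < self.epsilon:     # explore
                    return random.choice(self.ops)
                return max(self.ops, key=self.q.get)   # exploit the best operator

            def update(self, op, reward):
                self.q[op] += self.alpha * (reward - self.q[op])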

    Generating QAP instances with known optimum solution and additively decomposable cost function

    The quadratic assignment problem (QAP) is an NP-hard combinatorial optimization problem, and QAPs are often used to compare the performance of meta-heuristics. In this paper, we propose a QAP instance generator that can be used for benchmarking heuristic algorithms. Our generator combines small QAPs with known optimal solutions into a larger QAP instance. We call these instances composite QAPs (cQAPs), and we show that the cost function of cQAPs is additively decomposable. We give mild conditions under which a cQAP instance has a known optimal solution. We generate cQAP instances using uniform distributions with different bounds for the component QAPs and for the remaining cQAP elements. We introduce numerical and analytical techniques that measure the difficulty of the cQAP instances in comparison with other QAPs from the literature. These methods indicate that some cQAP instances are difficult for local search, with many local optima of various values, low epistasis and non-trivial asymptotic behaviour.
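
    A schematic of the composition step, in Python with NumPy: each component is a (flow, distance) pair with a known optimum, placed on the diagonal of the larger matrices, while the off-diagonal blocks are filled uniformly at random. The bounds off_low and off_high stand in for the paper's mild optimality-preserving conditions and are not the actual ones.

        import numpy as np

        def compose_cqap(components, off_low, off_high, rng=None):
            """Assemble a composite QAP from small (flow, distance) component
            instances placed block-diagonally; off-block entries are uniform."""
            rng = rng or np.random.default_rng()
            n = sum(f.shape[0] for f, _ in components)
            F = rng.uniform(off_low, off_high, size=(n, n))
            D = rng.uniform(off_low, off_high, size=(n, n))
            pos = 0
            for f, d in components:
                k = f.shape[0]
                F[pos:pos + k, pos:pos + k] = f   # component flow block
                D[pos:pos + k, pos:pos + k] = d   # component distance block
                pos += k
            return F, D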

    Approximative Pareto front identification

    Stochastic pareto local search for many objective quadratic assignment problem instances

    Optimising in many-objective search spaces, i.e. search spaces with more than three objectives, is a challenging task. Scalarization functions transform a multi-objective search space into a single-objective one. To scale up optimisation in many-objective search spaces, we use Cartesian products of scalarization functions, or simply product functions, to reduce the number of objectives of the search space. Stochastic product local search (SprLS) uses product functions to evaluate solutions within a local search run, with the goal of generating the entire Pareto front. To improve the performance of SprLS algorithms, we either: 1) search for a fixed set of product functions that most improves the performance of the algorithm, or 2) adapt the directions of the component scalarization functions using solutions in the current (possibly suboptimal) Pareto front. We compare the performance of these local search algorithms on many-objective quadratic assignment instances with correlated flow matrices.
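
    The product construction is compact enough to sketch directly. Below, each scalarization function maps a many-objective vector to a scalar, and their Cartesian product maps it to a lower-dimensional vector on which Pareto dominance can then be applied; the weights are illustrative, not taken from the paper.

        def weighted_sum(weights):
            """A single scalarization: project an objective vector onto `weights`."""
            return lambda obj: sum(w * o for w, o in zip(weights, obj))

        def product_function(scalarizations):
            """Cartesian product of scalarizations: maps a many-objective vector
            to a lower-dimensional one, one coordinate per scalarization."""
            return lambda obj: tuple(s(obj) for s in scalarizations)

        # e.g. reduce 5 objectives to 2 and compare solutions by dominance
        # on the reduced vectors (weights below are illustrative)
        reduce_to_2d = product_function([
            weighted_sum([0.5, 0.5, 0.0, 0.0, 0.0]),
            weighted_sum([0.0, 0.0, 0.4, 0.3, 0.3]),
        ])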

    Correlated Gaussian multi-objective multi-armed bandit across arms algorithm

    The stochastic multi-objective multi-armed bandit problem (MOMAB) is a stochastic multi-armed bandit problem in which each arm generates a vector of rewards instead of a single scalar reward. The goal in MOMAB is to minimise the regret of playing suboptimal arms while playing the Pareto optimal arms fairly. In this paper, we consider Gaussian correlations across arms, meaning that the reward vector generated by an arm gives us information not only about that arm itself but also about all the other arms. We call this framework the correlated MOMAB problem. We extend the Gittins index policy to correlated MOMAB, since the Gittins index has previously been used to model correlation between arms. We empirically compare the Gittins index policy with a multi-objective upper confidence bound policy on a test suite of correlated MOMAB problems, and conclude that the performance of these policies depends on the number of arms and objectives.
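
    The notion of playing the Pareto optimal arms fairly can be sketched as follows in Python: compute the empirical Pareto front of the arms' mean reward vectors and choose uniformly among them. This deliberately ignores the confidence bounds and the Gittins-style correlation updates that the paper studies.

        import random

        def dominates(a, b):
            """a Pareto-dominates b (maximisation of each reward component)."""
            return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

        def pick_arm(mean_rewards):
            """Choose uniformly among the empirically Pareto optimal arms."""
            front = [i for i, m in enumerate(mean_rewards)
                     if not any(dominates(o, m)
                                for j, o in enumerate(mean_rewards) if j != i)]
            return random.choice(front)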

    Comparing exploration strategies for Q-learning in random stochastic mazes

    Balancing the ratio between exploration and exploitation is an important problem in reinforcement learning. This paper evaluates four different exploration strategies combined with Q-learning, using random stochastic mazes to investigate their performance. We compare UCB-1, softmax, ε-greedy, and pursuit; for this purpose we adapted the UCB-1 and pursuit strategies for use in the Q-learning algorithm. The mazes consist of a single optimal goal state and two suboptimal goal states that lie closer to the agent's starting position, which makes efficient exploration an important part of the learning task. Furthermore, we evaluate two kinds of reward function: a normalised one with rewards between 0 and 1, and an unnormalised one that penalises the agent with a negative reward for each step. We performed an extensive grid search to find the best parameters for each method, and used the best parameters on novel, randomly generated maze problems of different sizes. The results show that softmax exploration outperforms the other strategies, although its temperature parameter is harder to tune. The worst performing exploration strategy is ε-greedy.
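
    For reference, a minimal Python sketch of the winning softmax (Boltzmann) rule; the temperature is the parameter the abstract notes is hard to tune.

        import math
        import random

        def softmax_action(q_values, temperature):
            """Softmax exploration: P(a) proportional to exp(Q(s,a)/temperature);
            low temperature is near-greedy, high temperature is near-uniform."""
            m = max(q_values)  # subtract the max for numerical stability
            prefs = [math.exp((q - m) / temperature) for q in q_values]
            total = sum(prefs)
            r, acc = random.random() * total, 0.0
            for action, p in enumerate(prefs):
                acc += p
                if r <= acc:
                    return action
            return len(prefs) - 1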