216 research outputs found

    Automatic discovery of ranking formulas for playing with multi-armed bandits

    Full text link
    We propose an approach for discovering in an automatic way formulas for ranking arms while playing with multi-armed bandits. The approach works by de ning a grammar made of basic elements such as for example addition, subtraction, the max operator, the average values of rewards collected by an arm, their standard deviation etc., and by exploiting this grammar to generate and test a large number of formulas. The systematic search for good candidate formulas is carried out by a built-on-purpose optimization algorithm used to navigate inside this large set of candidate formulas towards those that give high performances when using them on some multi-armed bandit problems. We have applied this approach on a set of bandit problems made of Bernoulli, Gaussian and truncated Gaussian distributions and have identi ed a few simple ranking formulas that provide interesting results on every problem of this set. In particular, they clearly outperform several reference policies previously introduced in the literature. We argue that these newly found formulas as well as the procedure for generating them may suggest new directions for studying bandit problems.Peer reviewe

    Contributions to Monte Carlo Search

    Full text link
    This research is motivated by improving decision making under uncertainty and in particular for games and symbolic regression. The present dissertation gathers research contributions in the field of Monte Carlo Search. These contributions are focused around the selection, the simulation and the recommendation policies. Moreover, we develop a methodology to automatically generate an MCS algorithm for a given problem. For the selection policy, in most of the bandit literature, it is assumed that there is no structure or similarities between arms. Thus each arm is independent from one another. In several instances however, arms can be closely related. We show both theoretically and empirically, that a significant improvement over the state-of-the-art selection policies is possible. For the contribution on simulation policy, we focus on the symbolic regression problem and ponder on how to consistently generate different expressions by changing the probability to draw each symbol. We formalize the situation into an optimization problem and try different approaches. We show a clear improvement in the sampling process for any length. We further test the best approach by embedding it into a MCS algorithm and it still shows an improvement. For the contribution on recommendation policy, we study the most common in combination with selection policies. A good recommendation policy is a policy that works well with a given selection policy. We show that there is a trend that seems to favor a robust recommendation policy over a riskier one. We also present a contribution where we automatically generate several MCS algorithms from a list of core components upon which most MCS algorithms are built upon and compare them to generic algorithms. The results show that it often enables discovering new variants of MCS that significantly outperform generic MCS algorithms

    Supply Side Optimisation in Online Display Advertising

    Get PDF
    On the Internet there are publishers (the supply side) who provide free contents (e.g., news) and services (e.g., email) to attract users. Publishers get paid by selling ad displaying opportunities (i.e., impressions) to advertisers. Advertisers then sell products to users who are converted by ads. Better supply side revenue allows more free content and services to be created, thus, benefiting the entire online advertising ecosystem. This thesis addresses several optimisation problems for the supply side. When a publisher creates an ad-supported website, he needs to decide the percentage of ads first. The thesis reports a large-scale empirical study of Internet ad density over past seven years, then presents a model that includes many factors, especially the competition among similar publishers, and gives an optimal dynamic ad density that generates the maximum revenue over time. This study also unveils the tragedy of the commons in online advertising where users' attention has been overgrazed which results in a global sub-optimum. After deciding the ad density, the publisher retrieves ads from various sources, including contracts, ad networks, and ad exchanges. This forms an exploration-exploitation problem when ad sources are typically unknown before trail. This problem is modelled using Partially Observable Markov Decision Process (POMDP), and the exploration efficiency is increased by utilising the correlation of ads. The proposed method reports 23.4% better than the best performing baseline in the real-world data based experiments. Since some ad networks allow (or expect) an input of keywords, the thesis also presents an adaptive keyword extraction system using BM25F algorithm and the multi-armed bandits model. This system has been tested by a domain service provider in crowdsourcing based experiments. If the publisher selects a Real-Time Bidding (RTB) ad source, he can use reserve price to manipulate auctions for better payoff. This thesis proposes a simplified game model that considers the competition between seller and buyer to be one-shot instead of repeated and gives heuristics that can be easily implemented. The model has been evaluated in a production environment and reported 12.3% average increase of revenue. The documentation of a prototype system for reserve price optimisation is also presented in the appendix of the thesis

    Machine Learning for SAT Solvers

    Get PDF
    Boolean SAT solvers are indispensable tools in a variety of domains in computer science and engineering where efficient search is required. Not only does this relieve the burden on the users of implementing their own search algorithm, they also leverage the surprising effectiveness of modern SAT solvers. Thanks to many decades of cumulative effort, researchers have made persistent improvements to SAT technology to the point where nowadays the best solvers are routinely used to solve extremely large instances with millions of variables. Even though our current paradigm of SAT solvers runs in worst-case exponential time, it appears that the techniques and heuristics embedded in these solvers avert the worst-case exponential time in practice. The implementations of these various solver heuristics and techniques are vital to the solvers effectiveness in practice. The state-of-the-art heuristics and techniques gather data during the run of the solver to inform their choices like which variable to branch on next or when to invoke a restart. The goal of these choices is to minimize the solving time. The methods in which these heuristics and techniques process the data generally do not have theoretical underpinnings. Consequently, understanding why these heuristics and techniques perform so well in practice remains a challenge and systematically improving them is rather difficult. This goes to the heart of this thesis, that is to utilize machine learning to process the data as part of an optimization problem to minimize solving time. Research in machine learning exploded over the past decade due to its success in extracting useful information out of large volumes of data. Machine learning outclasses manual handcoding in a wide variety of complex tasks where data are plentiful. This is also the case in modern SAT solvers where propagations, conflict analysis, and clause learning produces plentiful of data to be analyzed, and exploiting this data to the fullest is naturally where machine learning comes in. Many machine learning techniques have a theoretical basis that makes them easy to analyze and understand why they perform well. The branching heuristic is the first target for injecting machine learning. First we studied extant branching heuristics to understand what makes a branching heuristics good empirically. The fundamental observation is that good branching heuristics cause lots of clause learning by triggering conflicts as quickly as possible. This suggests that variables that cause conflicts are a valuable source of data. Another important observation is that the state-of-the-art VSIDS branching heuristic internally implements an exponential moving average. This highlights the importance of accounting for the temporal nature of the data when deciding to branch. These observations led to our proposal of a series of machine learning-based branching heuristics with the common goal of selecting the branching variables to increase probability of inducing conflicts. These branching heuristics are shown empirically to either be on par or outcompete the current state-of-the art. The second area of interest for machine learning is the restart policy. Just like in the branching heuristic work, we first study restarts to observe why they are effective in practice. The important observation here is that restarts shrink the assignment stack as conjectured by other researchers. We show that this leads to better clause learning by lowering the LBD of learnt clauses. Machine learning is used to predict the LBD of the next clause, and a restart is triggered when the LBD is excessively high. This policy is shown to be on par with state-of-the-art. The success of incorporating machine learning into branching and restarts goes to show that machine learning has an important role in the future of heuristic and technique design for SAT solvers

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference

    Get PDF

    Dynamic Difficulty Adjustment

    Get PDF
    One of the challenges that a computer game developer faces when creating a new game is getting the difficulty right. Providing a game with an ability to automatically scale the difficulty depending on the current player would make the games more engaging over longer time. In this work we aim at a dynamic difficulty adjustment algorithm that can be used as a black box: universal, nonintrusive, and with guarantees on its performance. While there are a few commercial games that boast about having such a system, as well as a few published results on this topic, to the best of our knowledge none of them satisfy all three of these properties. On the way to our destination we first consider a game as an interaction between a player and her opponent. In this context, assuming their goals are mutually exclusive, difficulty adjustment consists of tuning the skill of the opponent to match the skill of the player. We propose a way to estimate the latter and adjust the former based on ranking the moves available to each player. Two sets of empirical experiments demonstrate the power, but also the limitations of this approach. Most importantly, the assumptions we make restrict the class of games it can be applied to. Looking for universality, we drop the constraints on the types of games we consider. We rely on the power of supervised learning and use the data collected from game testers to learn models of difficulty adjustment, as well as a mapping from game traces to models. Given a short game trace, the corresponding model tells the game what difficulty adjustment should be used. Using a self-developed game, we show that the predicted adjustments match players' preferences. The quality of the difficulty models depends on the quality of existing training data. The desire to dispense with the need for it leads us to the last approach. We propose a formalization of dynamic difficulty adjustment as a novel learning problem in the context of online learning and provide an algorithm to solve it, together with an upper bound on its performance. We show empirical results obtained in simulation and in two qualitatively different games with human participants. Due to its general nature, this algorithm can indeed be used as a black box for dynamic difficulty adjustment: It is applicable to any game with various difficulty states; it does not interfere with the player's experience; and it has a theoretical guarantee on how many mistakes it can possibly make

    Systems for AutoML Research

    Get PDF
    corecore