
    Nesting optimization with adversarial games, meta-learning, and deep equilibrium models

    Nested optimization, in which an optimization problem is constrained by the solutions of other optimization problems, has recently seen a surge of applications in deep learning. While the study of such problems started nearly a century ago in the context of market theory, many of the algorithms developed since do not scale to modern deep learning applications. In this thesis, I push the understanding and applicability of nested optimization to three machine learning domains: 1) adversarial games, 2) meta-learning, and 3) deep equilibrium models. For each domain, I tackle a particular goal: in 1) I adversarially learn model compression when training data isn't available; in 2) I meta-learn hyperparameters for long optimization processes without introducing greediness; and in 3) I use deep equilibrium models to improve temporal coherence in video landmark detection. The first part of my thesis casts model compression as an adversarial game. Performing knowledge transfer from a large teacher network to a smaller student is a popular task in deep learning. However, due to growing dataset sizes and stricter privacy regulations, it is increasingly common not to have access to the data that was used to train the teacher. I propose a novel method which trains a student to match the predictions of its teacher without using any data or metadata. This is achieved by nesting the training optimization of the student with that of an adversarial generator, which searches for images on which the student poorly matches the teacher. These images are used to train the student in an online fashion. The student closely approximates its teacher on simple datasets like SVHN, and on CIFAR10 I improve on the state of the art for few-shot distillation (with 100 images per class), despite using no data.
Finally, I also propose a metric to quantify the degree of belief matching between teacher and student in the vicinity of decision boundaries, and observe a significantly higher match between the zero-shot student and the teacher than between a student distilled with real data and the teacher. The second part of my thesis deals with meta-learning hyperparameters in the case where the nested optimization to be differentiated is itself solved by many gradient steps. Gradient-based hyperparameter optimization has gained widespread popularity in the context of few-shot meta-learning, but remains broadly impractical for tasks with long horizons (many gradient steps) due to memory scaling and gradient degradation issues. A common workaround is to learn hyperparameters online, but this introduces greediness, which comes with a significant performance drop. I propose forward-mode differentiation with sharing (FDS), a simple and efficient algorithm which tackles memory scaling issues with forward-mode differentiation, and gradient degradation issues by sharing hyperparameters that are contiguous in time. I provide theoretical guarantees about the noise reduction properties of my algorithm, and demonstrate its efficiency empirically by differentiating through ∼10⁴ gradient steps of unrolled optimization. I consider large hyperparameter search ranges on CIFAR-10, where I significantly outperform greedy gradient-based alternatives while achieving 20× speedups compared to state-of-the-art black-box methods. The third part of my thesis deals with converting deep equilibrium models to a form of nested optimization in order to perform robust video landmark detection. Cascaded computation, whereby predictions are recurrently refined over several stages, has been a persistent theme throughout the development of landmark detection models.
I show that the recently proposed deep equilibrium model (DEQ) can be naturally adapted to this form of computation, given appropriate regularization. My landmark model achieves state-of-the-art performance on the challenging WFLW facial landmark dataset, reaching 3.92 normalized mean error with fewer parameters and a training memory cost of O(1) in the number of recurrent modules. Furthermore, I show that DEQs are particularly suited for landmark detection in videos. In this setting, it is typical to train on still images due to the lack of labeled videos. This can lead to a "flickering" effect at inference time on video, whereby a model can rapidly oscillate between different plausible solutions across consecutive frames. I show that the DEQ root-solving problem can be turned into a constrained optimization problem in a way that emulates recurrence at inference time, despite not having access to temporal data at training time. I call this "Recurrence without Recurrence", and demonstrate that it helps reduce landmark flicker by introducing a new metric and contributing a new facial landmark video dataset targeting landmark uncertainty. On the hard subset of this new dataset, made up of 500 videos, my model improves accuracy and temporal coherence by 10 and 13% respectively, compared to the strongest previously published model using a hand-tuned conventional filter.
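The second part of the abstract relies on forward-mode differentiation through a long unrolled optimization, with hyperparameters shared across contiguous steps. The following is not the thesis's FDS implementation; it is a minimal illustration under stated assumptions: a hypothetical 1-D training loss f(w) = 0.5·a·w², plain SGD, and a learning-rate schedule that is piecewise constant over K contiguous blocks (the names `hypergradient_forward`, `a`, `lrs` and `steps_per_block` are made up for this example). The tangents dw/dη_k are carried forward alongside w, so memory grows with the number of shared hyperparameters K rather than with the number of unrolled steps.

```python
def hypergradient_forward(w0, a, lrs, steps_per_block):
    """Forward-mode hypergradient of the final loss w.r.t. shared learning rates.

    Hypothetical toy setup: loss f(w) = 0.5 * a * w**2, so the gradient is
    a * w and the Hessian is the constant a. lrs[k] is shared by the k-th
    block of `steps_per_block` contiguous SGD steps.
    """
    K = len(lrs)
    w = w0
    # tangents[k] holds dw/d(lrs[k]); memory is O(K), independent of the horizon
    tangents = [0.0] * K
    for k in range(K):
        lr = lrs[k]
        for _ in range(steps_per_block):
            g = a * w  # gradient of the training loss at the current w
            # Differentiate w_{t+1} = w_t - lr * a * w_t with respect to each lr_j:
            # dw_{t+1}/dlr_j = (1 - lr * a) * dw_t/dlr_j - [j == k] * g
            tangents = [(1.0 - lr * a) * z for z in tangents]
            tangents[k] -= g
            w = w - lr * g  # the actual SGD step
    final_loss = 0.5 * a * w ** 2
    # Chain rule: dL/dlr_j = f'(w_T) * dw_T/dlr_j
    hypergrads = [a * w * z for z in tangents]
    return final_loss, hypergrads
```

For example, `hypergradient_forward(1.0, 2.0, [0.1, 0.05, 0.02], 10)` returns the final loss and one hypergradient per block. Because this toy problem has the closed form w_T = w0·Π_k(1 − η_k·a)^n, each entry should equal −a²·n·w_T²/(1 − η_k·a), which makes a convenient correctness check. The cost of forward mode also grows with the number of hyperparameters being differentiated, so sharing one value across a block of steps keeps that number small.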

    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impose some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
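The core MCTS loop repeats selection, expansion, simulation and backpropagation; its best-known variant, UCT, performs the selection step by treating each child as a bandit arm and picking the one that maximizes UCB1, X̄_j + c·sqrt(ln n / n_j), where X̄_j is the child's mean reward, n_j its visit count and n the parent's visit count. A minimal sketch of just that step, assuming each child is stored as a hypothetical `(total_reward, visits)` pair; c = √2 reproduces the exploration term sqrt(2 ln n / n_j) of the original UCB1 formula, and in practice c is usually tuned.

```python
import math


def uct_select(children, total_visits, c=math.sqrt(2)):
    """Return the index of the child that maximizes UCB1 (the UCT tree policy).

    children: list of (total_reward, visits) pairs (hypothetical representation).
    total_visits: visit count of the parent node.
    Unvisited children are returned immediately, as their UCB1 value is
    conventionally treated as infinite.
    """
    best_index, best_value = None, -math.inf
    for i, (reward, visits) in enumerate(children):
        if visits == 0:
            return i
        exploit = reward / visits
        explore = c * math.sqrt(math.log(total_visits) / visits)
        if exploit + explore > best_value:
            best_index, best_value = i, exploit + explore
    return best_index
```

With `c=0.0` the rule degenerates to picking the child with the best mean reward; larger c shifts the search toward rarely visited children.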

    Simple Regret Optimization in Online Planning for Markov Decision Processes

    We consider online planning in Markov decision processes (MDPs). In online planning, the agent focuses on its current state only, deliberates about the set of possible policies from that state onwards and, when interrupted, uses the outcome of that exploratory deliberation to choose what action to perform next. The performance of algorithms for online planning is assessed in terms of simple regret, which is the agent's expected performance loss when the chosen action, rather than an optimal one, is followed. To date, state-of-the-art algorithms for online planning in general MDPs are either best effort or guarantee only a polynomial-rate reduction of simple regret over time. Here we introduce a new Monte Carlo tree search algorithm, BRUE, that guarantees an exponential-rate reduction of simple regret and error probability. This algorithm is based on a simple yet non-standard state-space sampling scheme, MCTS2e, in which different parts of each sample are dedicated to different exploratory objectives. Our empirical evaluation shows that BRUE not only provides superior performance guarantees, but is also very effective in practice and compares favorably to the state of the art. We then extend BRUE with a variant of "learning by forgetting." The resulting set of algorithms, BRUE(α), generalizes BRUE, improves the exponential factor in the upper bound on its reduction rate, and exhibits even more attractive empirical performance.
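The abstract defines simple regret as the expected loss from following the chosen action instead of an optimal one. For a single state s with action values Q(s, a), recommending a costs max_b Q(s, b) − Q(s, a); a planner whose recommendation is random (for example, because its estimates are noisy) has an expected simple regret and an error probability, the chance of recommending a suboptimal action. The sketch below only evaluates these two quantities and does not reproduce BRUE. It assumes the true action values are known, which a real planner would not have, and the function names are hypothetical.

```python
from typing import Sequence


def simple_regret(q_values: Sequence[float], chosen: int) -> float:
    """Loss from following `chosen` instead of an optimal action."""
    return max(q_values) - q_values[chosen]


def expected_simple_regret(q_values: Sequence[float],
                           recommendation_probs: Sequence[float]) -> float:
    """Average simple regret when action i is recommended with probability p_i."""
    best = max(q_values)
    return sum(p * (best - q) for p, q in zip(recommendation_probs, q_values))


def error_probability(q_values: Sequence[float],
                      recommendation_probs: Sequence[float]) -> float:
    """Probability that the recommended action is suboptimal."""
    best = max(q_values)
    return sum(p for p, q in zip(recommendation_probs, q_values) if q < best)
```

A guarantee of exponential-rate reduction, as claimed for BRUE, means that both the expected simple regret and the error probability of its recommendation shrink exponentially as planning time grows.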

    Generating Diverse and Competitive Play-Styles for Strategy Games

    Designing agents that are able to achieve different play-styles while maintaining a competitive level of play is a difficult task, especially in games where the research community has not yet achieved superhuman performance, such as strategy games. These require the AI to deal with large action spaces, long-term planning and partial observability, among other well-known factors that make decision-making a hard problem. On top of this, achieving distinct play-styles using a general algorithm without reducing playing strength is not trivial. In this paper, we propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes) and show how it can be parameterized so that a quality-diversity algorithm (MAP-Elites) can be used to achieve different play-styles while keeping a competitive level of play. Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
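MAP-Elites, the quality-diversity algorithm named above, keeps an archive with one elite per cell of a discretized behavior space and replaces an elite only with a fitter solution that lands in the same cell. A generic sketch follows. In the paper's setting the solution would presumably be a parameterization of Portfolio MCTS, the descriptor a measure of play-style and the fitness playing strength; however, the callables here (`evaluate`, `describe`, `random_solution`, `mutate`) are hypothetical placeholders, and the toy usage afterwards is not taken from the paper.

```python
import random


def try_insert(archive, cell, fitness, solution):
    """MAP-Elites replacement rule: keep only the fittest solution per cell."""
    incumbent = archive.get(cell)
    if incumbent is None or fitness > incumbent[0]:
        archive[cell] = (fitness, solution)
        return True
    return False


def map_elites(evaluate, describe, random_solution, mutate,
               iterations, n_init, seed=0):
    """Generic MAP-Elites loop (sketch).

    evaluate(x) -> fitness; describe(x) -> hashable behavior cell;
    random_solution(rng) -> new x; mutate(x, rng) -> perturbed copy of x.
    Returns a dict mapping each reached cell to its (fitness, solution) elite.
    """
    rng = random.Random(seed)
    archive = {}
    for i in range(iterations):
        if i < n_init or not archive:
            # Bootstrap the archive with random solutions
            candidate = random_solution(rng)
        else:
            # Pick a random elite and perturb it
            _, parent = rng.choice(list(archive.values()))
            candidate = mutate(parent, rng)
        try_insert(archive, describe(candidate), evaluate(candidate), candidate)
    return archive
```

For a toy run, one could use 2-D solutions in [0, 1]², bin the first coordinate into five cells as the descriptor, and use −(x₂ − 0.5)² as the fitness; the returned dictionary then holds the best solution found for each reached cell, i.e. one competitive representative per "style".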

    Planning spatial networks with Monte Carlo tree search

    We tackle the problem of goal-directed graph construction: given a starting graph, find a set of edges whose addition maximally improves a global objective function. This problem emerges in many transportation and infrastructure networks that are of critical importance to society. We identify two significant shortcomings of present reinforcement learning methods: their exclusive focus on topology to the detriment of spatial characteristics (which are known to influence the growth and density of links), and the rapid growth in their action spaces and model-training costs. Our formulation as a deterministic Markov decision process allows us to adopt the Monte Carlo tree search framework, an artificial intelligence decision-time planning method. We propose improvements over the standard upper confidence bounds for trees (UCT) algorithm for this family of problems that address their single-agent nature, the trade-off between the cost of edges and their contribution to the objective, and an action space linear in the number of nodes. Our approach yields substantial improvements over UCT for increasing the efficiency and attack resilience of synthetic networks and real-world Internet backbone and metro systems, while using a wall-clock time budget similar to other search-based algorithms. We also demonstrate that our approach scales to significantly larger networks than previous reinforcement learning methods, since it does not require training a model.
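Goal-directed graph construction scores each candidate edge by how much it improves a global objective such as efficiency. The sketch below assumes the standard unweighted global efficiency, the mean of 1/d(i, j) over ordered node pairs with d the shortest-path hop count. The paper's spatial setting may instead weight paths by geometric length, so treat this objective as an illustrative stand-in. `best_single_edge` is a hypothetical one-step greedy baseline, not the MCTS method described in the abstract.

```python
from collections import deque
from itertools import combinations


def global_efficiency(n, edges):
    """Mean of 1/d(i, j) over ordered pairs i != j, d = hop distance.

    Nodes are 0..n-1, edges are undirected; unreachable pairs contribute 0.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for source in range(n):
        # Breadth-first search gives hop distances from `source`
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1)) if n > 1 else 0.0


def best_single_edge(n, edges):
    """Return (gain, edge) for the absent edge that most increases efficiency."""
    present = {frozenset(e) for e in edges}
    base = global_efficiency(n, edges)
    best = (0.0, None)
    for u, v in combinations(range(n), 2):
        if frozenset((u, v)) in present:
            continue
        gain = global_efficiency(n, list(edges) + [(u, v)]) - base
        if best[1] is None or gain > best[0]:
            best = (gain, (u, v))
    return best
```

On a 5-node path 0-1-2-3-4, `best_single_edge(5, [(0, 1), (1, 2), (2, 3), (3, 4)])` picks (0, 4), closing the path into a cycle and raising efficiency from about 0.642 to 0.75. The loop scans every absent node pair, O(n²) candidates per step, which hints at why the abstract emphasizes an action space linear in the number of nodes.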