
    On Monte Carlo tree search and reinforcement learning

    Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread adoption within the games community. Its links to traditional reinforcement learning (RL) methods have been outlined in the past; however, the use of RL techniques within tree search has not yet been thoroughly studied. In this paper we re-examine in depth this close relation between the two fields; our goal is to improve the cross-awareness between the two communities. We show that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new algorithms, of which traditional MCTS is only one variant. We confirm that planning methods inspired by RL, in conjunction with online search, demonstrate encouraging results on several classic board games and in arcade video game competitions, where our algorithm recently ranked first. Our study promotes a unified view of learning, planning, and search.
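
    To illustrate the point of contact between the two fields, the sketch below (hypothetical names, not the paper's code) replaces the usual Monte Carlo averaging backup in a UCT-style tree with a TD(0)-style update, assuming a generic node representation; it is a minimal example of how RL semantics can be dropped into tree search.

import math

class Node:
    """Tree node holding a visit count and a running value estimate V(s)."""
    def __init__(self):
        self.visits = 0
        self.value = 0.0
        self.children = {}          # action -> Node

def ucb_select(node, actions, c=1.4):
    """UCB1 selection over the node's children; untried actions are explored first."""
    def score(a):
        child = node.children.get(a)
        if child is None or child.visits == 0:
            return float("inf")
        return child.value + c * math.sqrt(math.log(node.visits + 1) / child.visits)
    return max(actions, key=score)

def td_backup(node, child, reward, alpha=0.1, gamma=1.0):
    """TD(0)-style backup: move V(s) toward r + gamma * V(s') instead of
    averaging complete rollout returns as traditional MCTS does."""
    node.value += alpha * (reward + gamma * child.value - node.value)
    node.visits += 1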

    Accelerating Heuristic Search for AI Planning

    AI Planning is an important research field. Heuristic search is the most commonly used method for solving planning problems. Despite recent advances in improving the quality of heuristics and devising better search strategies, the high computational cost of heuristic search remains a barrier that severely limits its application to real-world problems. In this dissertation, we propose theories, algorithms, and systems that accelerate heuristic search for AI planning. We make four major contributions. First, we propose a state-space reduction method called Stratified Planning to accelerate heuristic search. Stratified Planning can be combined with any heuristic search to prune redundant paths in the state space without sacrificing the optimality or completeness of the search algorithm. Second, we propose a general theory of partial order reduction in planning. The proposed theory unifies previous reduction algorithms for planning and ushers in new partial order reduction algorithms that further accelerate heuristic search by pruning more nodes in the state space than previously proposed algorithms. Third, we study the local structure of the state space and propose using random walks to accelerate plateau exploration during heuristic search; we also implement two state-of-the-art planners that performed competitively in the Seventh International Planning Competition. Last, we utilize cloud computing to further accelerate search for planning: we propose a portfolio stochastic search algorithm that takes advantage of the cloud, and we implement a cloud-based planning system to which users can submit planning tasks and make full use of the computational resources provided by the cloud. We push the state of the art in AI planning by developing theories and algorithms that accelerate heuristic search for planning, and we implement state-of-the-art planning systems with strong performance in both speed and solution quality.
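
    As a reference point for the kind of search these techniques accelerate, here is a minimal greedy best-first search skeleton in Python; successors() and heuristic() are assumed callables, and the code is an illustrative sketch, not one of the dissertation's planners.

import heapq
import itertools

def best_first_search(start, is_goal, successors, heuristic):
    """Greedy best-first search: expand the state with the lowest heuristic value.
    `successors(state)` yields (action, next_state) pairs; `heuristic(state)` -> float."""
    counter = itertools.count()               # tie-breaker so the heap never compares states
    frontier = [(heuristic(start), next(counter), start, [])]
    visited = {start}
    while frontier:
        _, _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan                       # sequence of actions reaching the goal
        for action, nxt in successors(state):
            if nxt not in visited:            # duplicate detection prunes revisited states
                visited.add(nxt)
                heapq.heappush(frontier,
                               (heuristic(nxt), next(counter), nxt, plan + [action]))
    return None                               # search space exhausted, no plan found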

    Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

    Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with a highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and failure states which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability of encountering a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states. Comment: Published at AAAI 2020.
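
    The final action-selection step lends itself to a small linear program; the sketch below (illustrative only, with made-up value and risk estimates rather than the paper's learned predictor) picks a stochastic policy over actions that maximises expected payoff subject to a bound on expected risk, using scipy.optimize.linprog.

import numpy as np
from scipy.optimize import linprog

def risk_constrained_policy(values, risks, threshold):
    """Maximise sum_a p[a]*values[a] s.t. sum_a p[a]*risks[a] <= threshold, sum_a p[a] = 1."""
    values = np.asarray(values, dtype=float)
    risks = np.asarray(risks, dtype=float)
    res = linprog(
        c=-values,                        # linprog minimises, so negate the payoff
        A_ub=[risks], b_ub=[threshold],   # expected risk kept below the threshold
        A_eq=[np.ones_like(values)], b_eq=[1.0],
        bounds=[(0.0, 1.0)] * len(values),
    )
    return res.x if res.success else None

# Hypothetical estimates for three actions: the risky high-value action is mixed
# with safer ones so that the expected risk stays under the 0.15 threshold.
print(risk_constrained_policy(values=[1.0, 0.6, 0.2], risks=[0.3, 0.1, 0.0], threshold=0.15))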

    Language Models Meet World Models: Embodied Experiences Enhance Language Models

    While large language models (LMs) have shown remarkable capabilities across numerous tasks, they often struggle with simple reasoning and planning in physical environments, such as understanding object permanence or planning household activities. The limitation arises from the fact that LMs are trained only on written text and miss essential embodied knowledge and skills. In this paper, we propose a new paradigm of enhancing LMs by finetuning them with world models, to gain diverse embodied knowledge while retaining their general language capabilities. Our approach deploys an embodied agent in a world model, particularly a simulator of the physical world (VirtualHome), and acquires a diverse set of embodied experiences through both goal-oriented planning and random exploration. These experiences are then used to finetune LMs to teach them diverse abilities of reasoning and acting in the physical world, e.g., planning and completing goals, object permanence and tracking, etc. Moreover, it is desirable to preserve the generality of LMs during finetuning, which facilitates generalizing the embodied knowledge across tasks rather than being tied to specific simulations. We thus further introduce classical elastic weight consolidation (EWC) for selective weight updates, combined with low-rank adapters (LoRA) for training efficiency. Extensive experiments show our approach substantially improves base LMs on 18 downstream tasks by 64.28% on average. In particular, the small LMs (1.3B, 6B, and 13B) enhanced by our approach match or even outperform much larger LMs (e.g., ChatGPT).
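
    To make the regularisation concrete, here is a hedged PyTorch sketch of an EWC penalty added to the task loss; the parameter names and the Fisher-information dictionary are assumptions, not the paper's implementation, and in the paper this term is combined with LoRA adapters so that only low-rank update matrices are trained.

import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    """(lam / 2) * sum_i F_i * (theta_i - theta*_i)^2 over the model's named parameters.
    `fisher` and `old_params` map parameter names to tensors saved before finetuning."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# During finetuning on embodied experiences (task_loss computed elsewhere):
#   total_loss = task_loss + ewc_penalty(model, fisher, old_params, lam=0.5)
#   total_loss.backward()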

    Optimisation of Mobile Communication Networks - OMCO NET

    The mini-conference “Optimisation of Mobile Communication Networks” focuses on advanced methods for search and optimisation applied to wireless communication networks. It is sponsored by the Research & Enterprise Fund of Southampton Solent University. The conference strives to widen knowledge of advanced search methods capable of optimising wireless communications networks. The aim is to provide a forum for the exchange of recent knowledge, new ideas, and trends in this progressive and challenging area. The conference will popularise new, successful approaches to resolving hard tasks such as minimisation of transmit power and cooperative and optimal routing.

    Particle filtering in compartmental projection models

    Simulation models are important tools for real-time forecasting of pandemics. Models help health decision makers examine interventions and secure strong guidance when anticipating outbreak evolution. However, models usually diverge from the real observations. Stochastic factors in pandemic systems, such as changes in human contact patterns, play a substantial role in disease transmission and are not usually captured in traditional dynamic models. In addition, models of emerging diseases face the challenge of limited epidemiological knowledge about the natural history of the disease. Even when information about the natural history is available -- for example, for endemic seasonal diseases -- transmission models are often simplified and subject to omissions. Available data streams can provide a view of the early days of a pandemic, but fail to predict how the pandemic will evolve. Recent developments in computational statistics algorithms, such as Sequential Monte Carlo and Markov Chain Monte Carlo, make it possible to create models based on historical data and to re-ground those models as new observations arrive. The objective of this thesis is to combine particle filtering -- a Sequential Monte Carlo algorithm -- with system dynamics models of pandemics. We developed particle filtering models that can recurrently be re-grounded as new observations become available. To this end, we also examined the effectiveness of this arrangement, which is subject to specifics of the configuration (e.g., frequency of data sampling). While clinically diagnosed cases are a valuable incoming data stream during an outbreak, a new generation of geospatially specific data sources, such as search volumes, can serve as a complementary data resource to clinical data. As another contribution, we used particle filtering in a model that can be re-grounded based on both clinical and search volume data. Our results indicate that particle filtering in combination with compartmental models provides an accurate projection system for the estimation of model states and model parameters (particularly compared to traditional calibration methodologies and in the context of emerging communicable diseases). The results also suggest that more frequent sampling of clinical data markedly improves predictive accuracy. Furthermore, the assumptions made regarding the parameters of the particle filter itself and the changes in contact rate were robust to the amount of empirical data available since the beginning of the outbreak and to the inter-observation interval. The results also support the use of data from the Google search API along with clinical data.
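
    The re-grounding idea can be illustrated with a minimal bootstrap particle filter wrapped around a toy stochastic SIR compartmental model; the parameter values, noise model, and observation likelihood below are assumptions for the sake of the sketch, not the thesis configuration.

import numpy as np

rng = np.random.default_rng(0)

def sir_step(state, beta, gamma, dt=1.0):
    """One Euler step of a simple SIR model."""
    s, i, r = state
    n = s + i + r
    new_inf = beta * s * i / n * dt
    new_rec = gamma * i * dt
    return np.array([s - new_inf, i + new_inf - new_rec, r + new_rec])

def particle_filter(observations, n_particles=500, beta=0.4, gamma=0.1, obs_noise=20.0):
    """Bootstrap particle filter: propagate, weight by the observation, resample."""
    particles = np.tile([990.0, 10.0, 0.0], (n_particles, 1))   # initial S, I, R per particle
    estimates = []
    for y in observations:                                      # y = observed case count
        # Propagate each particle with stochastic variation in the contact rate.
        noisy_beta = beta * rng.lognormal(0.0, 0.1, n_particles)
        particles = np.array([sir_step(p, b, gamma) for p, b in zip(particles, noisy_beta)])
        # Weight particles by the likelihood of the observation given their I compartment.
        weights = np.exp(-0.5 * ((y - particles[:, 1]) / obs_noise) ** 2) + 1e-12
        weights /= weights.sum()
        # Re-ground: resample particles in proportion to their weights.
        particles = particles[rng.choice(n_particles, size=n_particles, p=weights)]
        estimates.append(particles[:, 1].mean())                # filtered infectious count
    return estimates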

    Quantum-enhanced reinforcement learning

    Master's dissertation in Physics Engineering (Engenharia Física). The field of Artificial Intelligence has lately witnessed extraordinary results. The ability to design a system capable of beating the world champion of Go, an ancient Chinese game regarded as the holy grail of AI, caused a spark worldwide, making people believe that something revolutionary was about to happen. A different flavor of learning, called Reinforcement Learning, is at the core of this revolution. In parallel, we are witnessing the emergence of a new field, Quantum Machine Learning, which has already shown promising results in supervised/unsupervised learning. In this dissertation, we explore the interplay between Quantum Computing and Reinforcement Learning. Learning by interaction was made possible in the quantum setting using the concept of oraculization of task environments suggested by Dunjko in 2015. We extended the oracular instances previously suggested so that they work in more general stochastic environments. On top of this quantum agent-environment paradigm, we developed a novel quantum algorithm for near-optimal decision-making based on the Reinforcement Learning paradigm known as Sparse Sampling, obtaining a quantum speedup compared to the classical counterpart. The result is a quantum algorithm whose complexity is independent of the number of states of the environment. This independence guarantees its suitability for large state spaces, where planning may otherwise be intractable. The most important open questions are whether the oracular instances of task environments can be improved to handle even more general environments, in particular whether negative rewards can be represented as a natural mechanism for negative feedback instead of relying on a normalization of the reward, and whether the algorithm can be extended to perform an informed tree-based search instead of the uninformed search proposed here. Improvements on these points would allow a comparison between the algorithm and more recent classical Reinforcement Learning algorithms.
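
    For reference, the classical Sparse Sampling planner that the quantum algorithm accelerates can be sketched as follows; the generative-model interface sample(state, action) -> (next_state, reward) and the parameter names are assumptions for illustration.

def sparse_sampling(state, actions, sample, depth, width, gamma=0.95):
    """Kearns-Mansour-Ng style Sparse Sampling: estimate Q(s, a) by drawing `width`
    successor samples per action and recursing to `depth`; the cost is independent
    of the number of states. Returns (best_action, value_estimate)."""
    if depth == 0:
        return None, 0.0
    best_action, best_q = None, float("-inf")
    for a in actions:
        q = 0.0
        for _ in range(width):                       # i.i.d. samples of the next state
            next_state, reward = sample(state, a)
            _, future = sparse_sampling(next_state, actions, sample,
                                        depth - 1, width, gamma)
            q += (reward + gamma * future) / width
        if q > best_q:
            best_action, best_q = a, q
    return best_action, best_q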

    Solving planning problems with deep reinforcement learning and tree search

    Deep reinforcement learning methods are capable of learning complex heuristics starting with no prior knowledge, but struggle in environments where the learning signal is sparse. In contrast, planning methods can discover the optimal path to a goal in the absence of external rewards, but often require a hand-crafted heuristic function to be effective. In this thesis, we describe a model-based reinforcement learning method that occupies the middle ground between these two approaches. When evaluated on the complex domain of Sokoban, the model-based method was found to be more performant, stable, and sample-efficient than a model-free baseline.
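
    A hedged sketch of the general recipe described above: the environment model drives a best-first search while a learned value network, rather than a hand-crafted heuristic, supplies the ordering; the model(state) and value_fn(state) interfaces are assumptions, not the thesis code.

import heapq
import itertools

def model_based_search(start, model, value_fn, is_goal, max_expansions=10_000):
    """Best-first search over the model, expanding high-value states first.
    `model(state)` yields (action, next_state) pairs; `value_fn(state)` -> float."""
    counter = itertools.count()
    frontier = [(-value_fn(start), next(counter), start, [])]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan                  # goal reached without any external reward signal
        for action, nxt in model(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier,
                               (-value_fn(nxt), next(counter), nxt, plan + [action]))
    return None                          # expansion budget exhausted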