On monte carlo tree search and reinforcement learning
Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread
adoption within the games community. Its links to traditional reinforcement learning (RL)
methods have been outlined in the past; however, the use of RL techniques within tree search has
not been thoroughly studied yet. In this paper we re-examine in depth this close relation between
the two fields; our goal is to improve the cross-awareness between the two communities. We show
that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new
algorithms, of which traditional MCTS is only one variant. We confirm that planning
methods inspired by RL in conjunction with online search demonstrate encouraging results on
several classic board games and in arcade video game competitions, where our algorithm recently
ranked first. Our study promotes a unified view of learning, planning, and search.
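The link between MCTS and RL is concrete in the UCT selection rule at the heart of standard MCTS: it applies UCB1, a bandit-style exploration rule, at every tree node. A minimal sketch (function names and the toy statistics are illustrative, not the paper's implementation):

```python
import math

def ucb1(node_value, node_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score used by standard MCTS (UCT) to select a child node.

    Balances exploitation (mean value per visit) against exploration
    (a visit-count bonus), mirroring the exploration/exploitation
    trade-off central to reinforcement learning.
    """
    if node_visits == 0:
        return float("inf")  # always try unvisited children first
    return node_value / node_visits + c * math.sqrt(
        math.log(parent_visits) / node_visits
    )

# Selecting among three children with (total value, visit count):
children = [(3.0, 5), (1.0, 2), (0.0, 0)]
scores = [ucb1(v, n, parent_visits=7) for v, n in children]
best = max(range(len(children)), key=lambda i: scores[i])  # unvisited child wins
```

Swapping this rule for other RL-derived value estimates (e.g. temporal-difference backups) is the kind of adaptation the abstract describes.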
Accelerating Heuristic Search for AI Planning
AI Planning is an important research field. Heuristic search is the most commonly used method in solving planning problems. Despite recent advances in improving the quality of heuristics and devising better search strategies, the high computational cost of heuristic search remains a barrier that severely limits its application to real world problems. In this dissertation, we propose theories, algorithms and systems to accelerate heuristic search for AI planning.
We make four major contributions in this dissertation. First, we propose a state-space reduction method called Stratified Planning to accelerate heuristic search. Stratified Planning can be combined with any heuristic search to prune redundant paths in state space, without sacrificing the optimality and completeness of search algorithms.
Second, we propose a general theory for partial order reduction in planning. The proposed theory unifies previous reduction algorithms for planning, and ushers in new partial order reduction algorithms that can further accelerate heuristic search by pruning more nodes in state space than previously proposed algorithms.
Third, we study the local structure of state space and propose using random walks to accelerate plateau exploration for heuristic search. We also implement two state-of-the-art planners that perform competitively in the Seventh International Planning Competition.
Last, we utilize cloud computing to further accelerate search for planning. We propose a portfolio stochastic search algorithm that takes advantage of the cloud. We also implement a cloud-based planning system to which users can submit planning tasks and make full use of the computational resources provided by the cloud.
We push the state of the art in AI planning by developing theories and algorithms that accelerate heuristic search for planning. We implement state-of-the-art planning systems that deliver strong performance in both speed and solution quality.
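For reference, the heuristic search being accelerated is typically A*-style best-first search over the planning state space. A minimal sketch, on a toy domain with illustrative names (not the dissertation's planners):

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Minimal A* search. `neighbors(s)` yields (successor, cost) pairs;
    `h(s)` is an admissible heuristic estimate of the cost to `goal`.
    Returns (path, cost) or (None, inf) if no path exists."""
    frontier = [(h(start), 0, start, [start])]
    best_g = {}
    while frontier:
        f, g, s, path = heapq.heappop(frontier)
        if s == goal:
            return path, g
        if best_g.get(s, float("inf")) <= g:
            continue  # already expanded via a cheaper path
        best_g[s] = g
        for t, c in neighbors(s):
            heapq.heappush(frontier, (g + c + h(t), g + c, t, path + [t]))
    return None, float("inf")

# Toy example: states are integers on a line, goal is 4, unit step costs.
path, cost = a_star(0, 4, lambda s: [(s + 1, 1), (s - 1, 1)],
                    lambda s: abs(4 - s))
```

Techniques like Stratified Planning prune redundant paths before they ever enter the frontier, which is where the speedups described above come from.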
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff with failure states which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability to encounter a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with a risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with an order of 10^6 states.
Comment: Published at AAAI 202
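The risk-constrained action-selection step can be illustrated in its simplest case: with two actions, the linear program over action probabilities reduces to a closed-form mixture. This toy solver illustrates the general idea (maximize expected payoff subject to a risk bound, which may require a randomized policy); it is not the paper's implementation:

```python
def risk_constrained_mix(v_hi, r_hi, v_lo, r_lo, threshold):
    """Two-action illustration of risk-constrained action selection:
    maximize expected payoff subject to expected failure risk <= threshold.

    v_hi, r_hi: payoff and failure risk of the high-payoff action.
    v_lo, r_lo: a safer alternative with r_lo <= threshold.
    Returns (probability of the high-payoff action, expected payoff).
    """
    if r_hi <= threshold:
        return 1.0, v_hi  # high-payoff action is already safe enough
    # Otherwise mix so the risk constraint holds with equality:
    #   p * r_hi + (1 - p) * r_lo = threshold
    p = (threshold - r_lo) / (r_hi - r_lo)
    return p, p * v_hi + (1 - p) * v_lo

p, value = risk_constrained_mix(v_hi=10.0, r_hi=0.5,
                                v_lo=2.0, r_lo=0.0, threshold=0.1)
# With a 10% risk budget, the risky action can only be played 20% of the time.
```

In the general multi-action case this is a small linear program solved at each decision point, with the learned predictor supplying the value and risk estimates.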
Language Models Meet World Models: Embodied Experiences Enhance Language Models
While large language models (LMs) have shown remarkable capabilities across
numerous tasks, they often struggle with simple reasoning and planning in
physical environments, such as understanding object permanence or planning
household activities. This limitation arises because LMs are trained
only on written text and miss essential embodied knowledge and skills. In this
paper, we propose a new paradigm of enhancing LMs by finetuning them with world
models, to gain diverse embodied knowledge while retaining their general
language capabilities. Our approach deploys an embodied agent in a world model,
particularly a simulator of the physical world (VirtualHome), and acquires a
diverse set of embodied experiences through both goal-oriented planning and
random exploration. These experiences are then used to finetune LMs to teach
diverse abilities of reasoning and acting in the physical world, e.g., planning
and completing goals, object permanence and tracking, etc. Moreover, it is
desirable to preserve the generality of LMs during finetuning, which
facilitates generalizing the embodied knowledge across tasks rather than being
tied to specific simulations. We thus further introduce classic elastic weight consolidation (EWC) for
selective weight updates, combined with low-rank adapters (LoRA) for training
efficiency. Extensive experiments show our approach substantially improves base
LMs on 18 downstream tasks by 64.28% on average. In particular, the small LMs
(1.3B, 6B, and 13B) enhanced by our approach match or even outperform much
larger LMs (e.g., ChatGPT).
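The EWC regularizer used for selective weight updates can be sketched as follows, assuming the usual diagonal Fisher approximation; parameter names and toy numbers are illustrative:

```python
def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Elastic Weight Consolidation regularizer.

    Penalizes the drift of each parameter from its pre-finetuning value,
    weighted by its (diagonal) Fisher information: parameters important
    to the original language task are anchored, so embodied finetuning
    preserves general language ability.
    """
    return 0.5 * lam * sum(
        f * (p - p0) ** 2 for p, p0, f in zip(params, old_params, fisher)
    )

# A high-Fisher parameter that has not moved contributes nothing;
# a low-Fisher parameter may drift cheaply.
penalty = ewc_penalty(params=[1.0, 2.0], old_params=[1.0, 0.0],
                      fisher=[5.0, 0.5], lam=2.0)
```

In practice this penalty is added to the task loss during finetuning, while LoRA restricts which weights are trainable at all.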
Optimisation of Mobile Communication Networks - OMCO NET
The mini conference “Optimisation of Mobile Communication Networks” focuses on advanced methods for search and optimisation applied to wireless communication networks. It is sponsored by Research & Enterprise Fund Southampton Solent University.
The conference strives to widen knowledge on advanced search methods capable of optimising wireless communications networks. The aim is to provide a forum for the exchange of recent knowledge, new ideas and trends in this progressive and challenging area. The conference will popularise successful new approaches to resolving hard problems such as minimisation of transmit power and cooperative and optimal routing.
Particle filtering in compartmental projection models
Simulation models are important tools for real-time forecasting of pandemics. Models help health decision makers examine interventions and provide guidance when anticipating outbreak evolution. However, models usually diverge from real observations. Stochastic factors in pandemic systems, such as changes in human contact patterns, play a substantial role in disease transmission and are not usually captured in traditional dynamic models. In addition, models of emerging diseases face the challenge of limited epidemiological knowledge about the natural history of the disease. Even when information about the natural history is available -- for example, for endemic seasonal diseases -- transmission models are often simplified and contain omissions. Available data streams can provide a view of the early days of a pandemic, but they fail to predict how the pandemic will evolve. Recent developments in computational statistics, such as Sequential Monte Carlo and Markov Chain Monte Carlo algorithms, make it possible to create models based on historical data and to re-ground models with ongoing data observations. The objective of this thesis is to combine particle filtering -- a Sequential Monte Carlo algorithm -- with system dynamics models of pandemics. We developed particle filtering models that can recurrently be re-grounded as new observations become available. We also examined the effectiveness of this arrangement, which is subject to specifics of the configuration (e.g., the frequency of data sampling). While clinically diagnosed cases are a valuable incoming data stream during an outbreak, a new generation of geo-spatially specific data sources, such as search volumes, can serve as a complementary resource to clinical data. As another contribution, we used particle filtering in a model that can be re-grounded based on both clinical and search volume data.
Our results indicate that particle filtering in combination with compartmental models provides accurate projection systems for estimating model states as well as model parameters (particularly compared to traditional calibration methodologies, and in the context of emerging communicable diseases). The results also suggest that more frequent sampling from clinical data substantially improves predictive accuracy. Moreover, the assumptions made regarding the parameters of the particle filter itself and regarding changes in the contact rate proved robust across varying amounts of empirical data available since the beginning of the outbreak and across inter-observation intervals. Finally, the results support the use of data from the Google search API alongside clinical data.
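The core arrangement -- a bootstrap particle filter wrapped around a compartmental model -- can be sketched as follows. The SIR dynamics, process noise, and observation model here are simplified illustrations, not the thesis's actual configuration:

```python
import math
import random

def sir_step(state, beta, gamma, dt=1.0):
    """One stochastic Euler step of an SIR compartmental model."""
    s, i, r = state
    n = s + i + r
    new_inf = beta * s * i / n * dt * random.uniform(0.8, 1.2)  # crude noise
    new_inf = min(new_inf, s)
    new_rec = gamma * i * dt
    return (s - new_inf, i + new_inf - new_rec, r + new_rec)

def particle_filter_update(particles, observed, beta, gamma, obs_sd=5.0):
    """Bootstrap particle filter step: propagate each particle through the
    model, weight by a Gaussian observation model on prevalence, resample."""
    moved = [sir_step(p, beta, gamma) for p in particles]
    weights = [math.exp(-((p[1] - observed) ** 2) / (2 * obs_sd ** 2))
               for p in moved]
    total = sum(weights)
    if total == 0.0:
        weights = [1.0] * len(moved)  # degenerate case: fall back to uniform
    else:
        weights = [w / total for w in weights]
    return random.choices(moved, weights=weights, k=len(moved))

random.seed(0)
particles = [(990.0, 10.0, 0.0)] * 200
for obs in [15, 25, 40]:  # hypothetical daily prevalence observations
    particles = particle_filter_update(particles, obs, beta=0.4, gamma=0.1)
est_i = sum(p[1] for p in particles) / len(particles)  # filtered prevalence
```

Re-grounding on a second stream (e.g. search volumes) amounts to multiplying in a second likelihood term in the weighting step.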
Quantum-enhanced reinforcement learning
Master's dissertation in Engineering Physics (Engenharia Física).
The field of Artificial Intelligence has lately witnessed extraordinary results. The ability to design a system capable of beating the world champion of Go, an ancient Chinese game known as the holy grail of AI, caused a spark worldwide, making people believe that something revolutionary is about to happen. A different flavor of learning, called Reinforcement Learning, is at the core of this revolution. In parallel, we are witnessing the emergence of a new field, that of Quantum Machine Learning, which has already shown promising results in supervised/unsupervised learning. In this dissertation, we explore the interplay between Quantum Computing and Reinforcement Learning.
This learning by interaction was made possible in the quantum setting using the concept of oraculization of task environments suggested by Dunjko in 2015. In this dissertation, we extended the oracular instances previously suggested to work in more general stochastic environments. On top of this quantum agent-environment paradigm, we developed a novel quantum algorithm for near-optimal decision-making based on the Reinforcement Learning paradigm known as Sparse Sampling, obtaining a quantum speedup over the classical counterpart. The result is a quantum algorithm whose complexity is independent of the number of states of the environment. This independence makes it suitable for large state spaces where planning may otherwise be intractable.
The most important open questions remain whether it is possible to improve the oracular instances of task environments to deal with even more general environments, especially the ability to represent negative rewards as a natural mechanism for negative feedback instead of some normalization of the reward, and whether the algorithm can be extended to perform an informed tree-based search instead of the uninformed search proposed. Improvements on this result would allow comparison between the algorithm and more recent classical Reinforcement Learning algorithms.
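For reference, the classical Sparse Sampling paradigm (Kearns, Mansour and Ng) that the quantum algorithm builds on can be sketched as follows; the toy environment is illustrative, and the quantum version is not reproduced here:

```python
def sparse_sampling_q(env_sample, state, actions, depth, width, gamma=0.9):
    """Classical Sparse Sampling: estimate Q-values by drawing `width`
    sampled transitions per action and recursing to horizon `depth`.

    Runtime grows with (width * len(actions)) ** depth but does NOT
    depend on the number of environment states -- the property the
    quantum algorithm preserves while adding a speedup.
    `env_sample(s, a)` returns one sampled (next_state, reward) pair.
    """
    if depth == 0:
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            s2, r = env_sample(state, a)
            child_q = sparse_sampling_q(env_sample, s2, actions,
                                        depth - 1, width, gamma)
            total += r + gamma * max(child_q.values())
        q[a] = total / width
    return q

# Toy deterministic environment: action 1 always pays 1, action 0 pays 0.
q = sparse_sampling_q(lambda s, a: (s, float(a)), 0, [0, 1],
                      depth=2, width=3)
```

The near-optimal action is simply the arg-max of the returned Q-estimates; the dissertation's open question is replacing this uninformed expansion with an informed tree search.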
Solving planning problems with deep reinforcement learning and tree search
Deep reinforcement learning methods are capable of learning complex heuristics starting with no prior knowledge, but struggle in environments where the learning signal is sparse. In contrast, planning methods can discover the optimal path to a goal in the absence of external rewards, but often require a hand-crafted heuristic function to be effective. In this thesis, we describe a model-based reinforcement learning method that bridges the middle ground between these two approaches. When evaluated on the complex domain of Sokoban, the model-based method was found to be more performant, stable, and sample-efficient than a model-free baseline.
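The middle ground described -- search guided by a learned value estimate in place of a hand-crafted heuristic -- can be sketched as best-first search ordered by the model's predictions. Names and the toy domain are illustrative, not the thesis's Sokoban setup:

```python
import heapq

def value_guided_search(start, is_goal, successors, value_fn, budget=1000):
    """Best-first search where a learned state-value estimate replaces the
    hand-crafted heuristic: higher `value_fn(s)` means the model believes
    `s` is closer to the goal, so it is expanded sooner."""
    frontier = [(-value_fn(start), start, [start])]
    visited = set()
    while frontier and budget > 0:
        budget -= 1
        _, s, path = heapq.heappop(frontier)
        if is_goal(s):
            return path
        if s in visited:
            continue
        visited.add(s)
        for t in successors(s):
            if t not in visited:
                heapq.heappush(frontier, (-value_fn(t), t, path + [t]))
    return None  # search budget exhausted without reaching the goal

# Toy example: walk to 5 on the integer line; the "learned" value is
# stood in for by negative distance to the goal.
path = value_guided_search(0, lambda s: s == 5,
                           lambda s: [s - 1, s + 1],
                           lambda s: -abs(5 - s))
```

In the sparse-reward setting this is exactly where planning helps: the search reaches the goal without any external reward signal, and the trajectories found can then train the value model further.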