Search CORE

198 research outputs found

Delay Sensitive Communications over Cognitive Radio Networks

Author: Huang Jianwei
Wang Feng
Zhao Yuping
Publication date: 01/01/2012

Supporting the quality of service of unlicensed users in cognitive radio networks is very challenging, mainly due to dynamic resource availability caused by the licensed users' activities. In this paper, we study the optimal admission control and channel allocation decisions in cognitive overlay networks in order to support delay-sensitive communications of unlicensed users. We formulate the problem as a Markov decision process and solve it by transforming the original formulation into a stochastic shortest path problem. We then propose a simple heuristic control policy, which includes a threshold-based admission control scheme and a largest-delay-first channel allocation scheme, and prove the optimality of the largest-delay-first channel allocation scheme. We further propose an improved policy using the rollout algorithm. By comparing the performance of both proposed policies with an upper bound on the maximum revenue, we show that our policies achieve close-to-optimal performance with low complexity. Comment: 11 pages, 8 figures

arXiv.org e-Print Archive
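
The heuristic policy in the cognitive radio abstract above combines threshold-based admission with largest-delay-first channel allocation. The sketch below shows only those two decision rules, under assumed names: users are a hypothetical `User` record holding the number of slots already waited, and `threshold` and `idle_channels` are illustrative parameters. The licensed-user dynamics, the revenue model, and the MDP and rollout machinery of the paper are not modelled.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical record for an admitted unlicensed user."""
    name: str
    delay: int  # number of time slots the user has already waited


def admit(num_admitted: int, threshold: int) -> bool:
    """Threshold-based admission: accept a new request only while fewer than
    `threshold` unlicensed users are already admitted."""
    return num_admitted < threshold


def largest_delay_first(users: list[User], idle_channels: int) -> list[User]:
    """Assign the channels left idle by licensed users to the users that have
    waited longest (ties keep their original order because sorted() is stable)."""
    ranked = sorted(users, key=lambda u: u.delay, reverse=True)
    return ranked[:idle_channels]


if __name__ == "__main__":
    users = [User("a", 2), User("b", 5), User("c", 1)]
    served = largest_delay_first(users, idle_channels=2)
    print([u.name for u in served])  # ['b', 'a']
    print(admit(len(users), threshold=3))  # False
```

In the paper the threshold would be tuned against the MDP's revenue objective; here it is a fixed constant chosen only for illustration.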

A Minimum Relative Entropy Principle for Learning and Acting

Author: Braun Daniel A.
Ortega Pedro A.
Publication date: 10/04/2010

This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is an agent that has been designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, then the optimal solution is the well-known Bayesian predictor. However, if the agent is active, then its past actions need to be treated as causal interventions on the I/O stream rather than as ordinary probabilistic conditions. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert. Comment: 36 pages, 11 figures

arXiv.org e-Print Archive

MPG.PuRe
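
The Bayesian control rule described above can be illustrated on a toy problem. In the sketch below, the class of experts is an assumed pair of two-armed Bernoulli bandits (`ENVIRONMENTS`, with hypothetical success rates), and each expert simply pulls the best arm of its own environment. At each step the agent samples an expert from its posterior and follows it. The point stressed in the abstract appears in step 3: the posterior is updated only with the likelihood of the observation given the action, because the agent's own action is an intervention, not evidence. In this bandit setting the rule behaves much like Thompson sampling; it is an illustrative sketch, not the paper's general formulation over I/O streams.

```python
import random

# Hypothetical class of environments; the expert for each one knows its rates.
ENVIRONMENTS = {
    "env0": (0.8, 0.2),  # arm 0 is better
    "env1": (0.2, 0.8),  # arm 1 is better
}


def expert_action(env: str) -> int:
    """The expert designed for `env` always pulls that environment's best arm."""
    rates = ENVIRONMENTS[env]
    return max(range(len(rates)), key=lambda a: rates[a])


def bayesian_control_rule(true_env: str, steps: int, seed: int = 0) -> dict[str, float]:
    rng = random.Random(seed)
    posterior = {env: 1 / len(ENVIRONMENTS) for env in ENVIRONMENTS}
    for _ in range(steps):
        # 1. Sample an expert from the current posterior and issue its action.
        envs = list(posterior)
        chosen = rng.choices(envs, weights=[posterior[e] for e in envs])[0]
        action = expert_action(chosen)
        # 2. The unknown true environment produces a binary reward.
        reward = 1 if rng.random() < ENVIRONMENTS[true_env][action] else 0
        # 3. Condition on the observation only: P(reward | action, env).
        #    The action is an intervention, so no P(action | env) factor appears.
        for env in posterior:
            p = ENVIRONMENTS[env][action]
            posterior[env] *= p if reward else 1 - p
        total = sum(posterior.values())
        posterior = {env: w / total for env, w in posterior.items()}
    return posterior
```

Calling `bayesian_control_rule("env1", steps=200)` concentrates nearly all posterior mass on `"env1"`, so the agent ends up following that environment's expert. Treating the action as an ordinary observation instead would multiply in a factor of 1 or 0 per expert and wrongly collapse the posterior onto whichever expert was sampled.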

Delays in Reinforcement Learning

Author: Liotet Pierre
Publication date: 20/09/2023

Delays are inherent to most dynamical systems. Besides shifting the process in time, they can significantly affect the system's performance. For this reason, it is usually valuable to study the delay and account for it. Because they are dynamical systems, it is no surprise that sequential decision-making problems such as Markov decision processes (MDPs) can also be affected by delays. These processes are the foundational framework of reinforcement learning (RL), a paradigm whose goal is to create artificial agents capable of learning to maximise their utility by interacting with their environment. RL has achieved strong, sometimes astonishing, empirical results, but delays are seldom explicitly accounted for, and the understanding of the impact of delay on the MDP is limited. In this dissertation, we propose to study delays in the agent's observation of the state of the environment or in the execution of the agent's actions. We will repeatedly change our point of view on the problem to reveal some of its structure and peculiarities. A wide spectrum of delays will be considered, and potential solutions will be presented. This dissertation also aims to draw links between celebrated frameworks of the RL literature and the framework of delays.

arXiv.org e-Print Archive
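
A common way to handle a constant observation delay d, found in the delayed-MDP literature, is state augmentation: the agent acts on the last state it has actually observed, s_{t-d}, together with the d actions it has taken since, which restores the Markov property. The sketch below tracks that augmented state; the class name and the `noop_action` used to pad the buffer at start-up are assumptions for illustration, and the dissertation's own methods are not reproduced here.

```python
from collections import deque


class ConstantDelayAugmentation:
    """Tracks the augmented state (s_{t-d}, a_{t-d}, ..., a_{t-1}) for a constant
    observation delay d, so that the delayed problem becomes Markov again."""

    def __init__(self, delay: int, initial_state, noop_action):
        # Before any real action is taken, the buffer is padded with a
        # hypothetical no-op action.
        self.pending = deque([noop_action] * delay, maxlen=delay)
        self.last_observed = initial_state

    def augmented_state(self) -> tuple:
        return (self.last_observed, *self.pending)

    def step(self, action, delayed_observation) -> tuple:
        """Record the action just applied and the observation that arrived at this
        step (which describes the environment as it was `delay` steps earlier)."""
        self.pending.append(action)  # maxlen drops the oldest action automatically
        self.last_observed = delayed_observation
        return self.augmented_state()
```

A policy is then learned on `augmented_state()` instead of the raw observation. The augmented state grows linearly with d, and the number of possible augmented states grows exponentially with it, which is one reason large delays make learning harder.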

Computational mechanisms of curiosity and goal-directed exploration

Author: Agrawal
Agrawal
Auer
Auer
Badre
Barto
Beal
Bellemare
Blanchard
Blanchard
Bogacz
Boorman
Bromberg-Martin
Burda
Burda
Bush
Campagner
Chow
Cohen
Daw
Feldman
Findling
FitzGerald
Friston
Friston
Friston
Friston
Friston
Friston
Friston
Fu
Gershman
Gershman
Gershman
Gottlieb
Grant
Hauser
Hauser
Houthooft
Howard
Iglesias
Iigaya
Itti
Jones
Kaelbling
Kakade
Kidd
Kidd
Kidd
Koch
Kolling
Krebs
Laversanne-Finot
Ligneul
Luciw
Mnih
Montague
Moran
Morris
Muller
Nour
Ostrovski
Oudeyer
Oudeyer
Padoa-Schioppa
Parr
Pezzulo
Ranade
Rudebeck
Rushworth
Schmidhuber
Schultz
Schulz
Schwartenbeck
Schwartenbeck
Schwartenbeck
Schwartenbeck
Smith
Solopchuck
Speekenbrink
Srinivas
Stalnaker
Still
Sun
Sutton
Sutton
Takahashi
Takahashi
Takahashi
Tang
Thompson
van Lieshout
Vasconcelos
Waltz
Wang
Weickert
Wikenheiser
Wilson
Wilson
Yang
Yu
Zentall
Zentall
Publication venue: 'eLife Sciences Publications, Ltd'
Publication date: 01/01/2019

Successful behaviour depends on the right balance between maximising reward and soliciting information about the world. Here, we show how different types of information gain emerge when casting behaviour as surprise minimisation. We present two distinct mechanisms for goal-directed exploration that express separable profiles of active sampling to reduce uncertainty. 'Hidden state' exploration motivates agents to sample unambiguous observations to accurately infer the (hidden) state of the world. Conversely, 'model parameter' exploration compels agents to sample outcomes associated with high uncertainty, if they are informative for their representation of the task structure. We illustrate the emergence of these types of information gain, termed active inference and active learning, and show how these forms of exploration induce distinct patterns of 'Bayes-optimal' behaviour. Our findings provide a computational framework for understanding how distinct levels of uncertainty systematically affect the exploration-exploitation trade-off in decision-making.

Paris Lodron University of Salzburg

Crossref

UCL Discovery

University of East Anglia digital repository
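
The 'hidden state' exploration in the curiosity abstract above rewards observations that are informative about the state. One standard way to score this is the expected information gain, i.e. the mutual information between the hidden state and the next observation, I(S; O) = H(O) - E_s[H(O | s)], where the second term measures the ambiguity of the observation mapping. The sketch below computes it for a discrete likelihood matrix `likelihood[o][s] = P(o | s)` (an assumed representation). With two equiprobable states, an unambiguous (identity) mapping scores log 2 and a fully ambiguous one scores zero. It does not cover 'model parameter' exploration, which concerns uncertainty about the likelihood itself.

```python
import math


def entropy(p: list[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def state_information_gain(likelihood: list[list[float]], prior: list[float]) -> float:
    """Expected information gain about a hidden state from one observation:
    I(S; O) = H(O) - sum_s prior[s] * H(O | s), with likelihood[o][s] = P(o | s)."""
    n_obs, n_states = len(likelihood), len(prior)
    # Predictive distribution over observations, P(o) = sum_s P(o | s) P(s).
    p_obs = [sum(likelihood[o][s] * prior[s] for s in range(n_states)) for o in range(n_obs)]
    # Expected ambiguity: how noisy observations are given the true state.
    ambiguity = sum(
        prior[s] * entropy([likelihood[o][s] for o in range(n_obs)]) for s in range(n_states)
    )
    return entropy(p_obs) - ambiguity
```

An agent scoring candidate observation sources with this quantity would prefer the unambiguous one, which matches the behaviour the abstract attributes to hidden-state exploration.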

Search-based structured prediction

Author: D. M. Bikel
Daniel Marcu
F. Rosenblatt
Hal Daumé
I. Tsochantaridis
J. A. Bagnell
John Langford
L. R. Foulds
R. Ando
S. Russell
U. Germann
Y. Freund
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Markov and Semi-Markov Chains, Processes, Systems and Emerging Related Fields

Publication venue: 'MDPI AG'
Publication date: 11/01/2022

This book covers a broad range of research results in the field of Markov and semi-Markov chains, processes, systems and related emerging fields. The authors of the included research papers are well-known researchers in their fields. The book presents the state of the art and ideas for further research for theorists in these fields. It also provides directly applicable results for practitioners in diverse areas.

Directory of Open Access Books (DOAB)


A unified framework for resource-bounded autonomous agents interacting with unknown environments

Author: Ortega Pedro Alejandro Jr
Publication venue: University of Cambridge
Publication date: 12/07/2011

The aim of this thesis is to present a mathematical framework for conceptualizing and constructing adaptive autonomous systems under resource constraints. The first part of this thesis contains a concise presentation of the foundations of classical agency: namely, the formalizations of decision making and learning. Decision making includes: (a) subjective expected utility (SEU) theory, the framework of decision making under uncertainty; (b) the maximum SEU principle to choose the optimal solution; and (c) its application to the design of autonomous systems, culminating in the Bellman optimality equations. Learning includes: (a) Bayesian probability theory, the theory for reasoning under uncertainty that extends logic; and (b) Bayes-optimal agents, the application of Bayesian probability theory to the design of optimal adaptive agents. Then, two major problems of the maximum SEU principle are highlighted: (a) the prohibitive computational costs and (b) the need for the causal precedence of the choice of the policy. The second part of this thesis tackles these two problems. First, an information-theoretic notion of resources in autonomous systems is established. Second, a framework for resource-bounded agency is introduced. This includes: (a) a maximum bounded SEU principle that is derived from a set of axioms of utility; (b) an axiomatic model of probabilistic causality, which is applied to formalize autonomous systems having uncertainty over their policy and environment; and (c) the Bayesian control rule, which is derived from the maximum bounded SEU principle and the model of causality, implementing a stochastic adaptive control law that deals with the case where autonomous agents are uncertain about their policy and environment.

Apollo (Cambridge)
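
For a small, fully known MDP, the Bellman optimality equations mentioned in the first part of the thesis can be solved by value iteration, repeatedly applying V(s) <- max_a [R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s')] until the values stop changing. The sketch below is a generic textbook implementation with an assumed discount factor and a nested-list representation of `P` and `R`; it is not taken from the thesis and ignores the resource bounds that the second part of the thesis addresses.

```python
def value_iteration(
    P: list[list[list[float]]],
    R: list[list[float]],
    gamma: float = 0.9,
    tol: float = 1e-10,
) -> list[float]:
    """Iterate V(s) <- max_a [R[s][a] + gamma * sum_s' P[s][a][s'] * V(s')]
    until successive value vectors differ by less than `tol`.

    P[s][a][s'] is the transition probability and R[s][a] the expected reward;
    states may have different numbers of actions.
    """
    V = [0.0] * len(P)
    while True:
        new_V = [
            max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            )
            for s in range(len(P))
        ]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V
```

For example, with state 0 offering "stay" or "move to state 1" (both reward 0) and state 1 offering only "stay" with reward 1, the values converge to V(1) = 1 / (1 - 0.9) = 10 and V(0) = 0.9 * 10 = 9. Because the update is a contraction for gamma < 1, the loop always terminates.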

Dynamic optimisation for energy efficiency of injection moulding process

Author: Yin Kam Hoe
Publication date: 01/07/2015

A low-carbon economy has emerged as an important goal in China since energy-intensity and carbon-intensity reduction targets were explicitly prescribed in its recent Twelfth Five-Year Plan (2011-2015). While the largest enterprises have taken initial steps to reduce their energy consumption, small and medium-sized enterprises (SMEs) will need to share the responsibility for meeting the nation’s targets. However, there is no established structure for helping SMEs reduce their efficiency gap, and hence the implementation of energy-saving measures in SMEs remains patchy. To address this issue, this thesis seeks to understand the critical barriers faced by SMEs and aims to develop proprietary methodologies that can help manufacturing SMEs close their efficiency gap. Process parameter optimisation is perceived to be an effective “no-cost” strategy that SMEs can adopt to improve energy efficiency. A distinctive design of experiments (DOE) approach introduced by Dorian Shainin offers a simple framework for studying process optimisation, but its application is not widespread and it has been criticised for its working principles. To address the inherent limitations of Shainin’s method, it was integrated with multivariate statistical methods and the signal-response system in the empirical study. The nature of the research aim also requires a theoretical approach to evaluate the economic performance of energy efficiency investments. Hence, a spreadsheet-based decision support system (file SERP.xlsm) was created using the dynamic programming (DP) method. The main contributions of this thesis can be divided into an empirical level and a theoretical level. At the empirical level, a technically feasible yet economically viable approach called “multi-response dynamic Shainin DOE” was developed. An empirical study on the injection moulding process was presented to examine the validity of this novel integrated methodology.
The emphasis has been on the integration of multivariate techniques and signal-response analysis. The former successfully identified the factors critical to energy consumption and to the impact performance of moulded parts, despite large fluctuations in the impact response. The latter enables the end user to achieve different performance outputs depending on the particular intent. At the theoretical level, the “DP-based spreadsheet solution” provides a convenient platform to help rational decision makers evaluate energy efficiency investments. A simple hypothetical case study on the injection moulding industry illustrates how the decision-making process for equipment replacement can evolve over time. In summary, both proprietary methodologies enhance the dynamicity of the optimisation process to support the injection moulding industry in closing its efficiency gap. The study at the empirical level was limited by the absence of a real industrial case study, which is important for justifying the practicality of the proposed methodology. At the theoretical level, no dataset was available for initial validation of the spreadsheet solution. Finally, future work should address these research limitations in order to increase the cogency of the proprietary methodologies for common use in the industry.

Nottingham eTheses
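
The injection moulding thesis above builds its equipment-replacement decision support on dynamic programming in a spreadsheet. A minimal sketch of the same kind of keep-or-replace recursion, solved by backward induction, is shown below in Python rather than as a spreadsheet. All costs (`PURCHASE_COST`, `MAX_AGE`, `operating_cost`, `salvage_value`) are hypothetical placeholders, not values from the thesis or from SERP.xlsm.

```python
from functools import lru_cache

# Hypothetical numbers for illustration only (not from the thesis).
PURCHASE_COST = 100.0
MAX_AGE = 5  # a machine of this age must be replaced


def operating_cost(age: int) -> float:
    """Hypothetical yearly energy and operating cost, rising as the machine ages."""
    return 10.0 + 8.0 * age


def salvage_value(age: int) -> float:
    """Hypothetical resale value of a machine of the given age."""
    return max(0.0, 60.0 - 12.0 * age)


def plan_replacement(horizon: int, start_age: int) -> tuple[float, list[str]]:
    """Minimum total cost over `horizon` years and the keep/replace decisions."""

    @lru_cache(maxsize=None)
    def best(year: int, age: int) -> tuple[float, tuple[str, ...]]:
        if year == horizon:
            return -salvage_value(age), ()  # sell the machine at the end
        options = []
        if age < MAX_AGE:  # keep: pay this year's running cost, machine ages
            cost, plan = best(year + 1, age + 1)
            options.append((operating_cost(age) + cost, ("keep",) + plan))
        # replace: buy new, sell old, run the new machine for one year
        cost, plan = best(year + 1, 1)
        options.append(
            (PURCHASE_COST - salvage_value(age) + operating_cost(0) + cost, ("replace",) + plan)
        )
        return min(options)

    total, plan = best(0, start_age)
    return total, list(plan)
```

For example, `plan_replacement(1, 0)` returns `(-38.0, ['keep'])` under these made-up costs; the negative total reflects the salvage value recovered at the end of the horizon. Memoisation plays the role of the spreadsheet's tabulated stages: each (year, age) pair is evaluated once.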

Antecipação na tomada de decisão com múltiplos critérios sob incerteza (Anticipation in multiple-criteria decision-making under uncertainty)

Author: Azevedo Carlos Renato Belo, 1984-
Publication venue: [s.n.]
Publication date: 26/08/2018

Advisor: Fernando José Von Zuben. Thesis (doctorate), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.

Abstract: The presence of uncertainty in future outcomes can lead to indecision in choice processes, especially when eliciting the relative importance of multiple decision criteria and of long-term vs. near-term performance. Some decisions, however, must be made under incomplete information, which may result in hasty actions with unforeseen consequences. When a solution must be selected under multiple conflicting views for operating in time-varying and noisy environments, implementing flexible provisional alternatives can be critical to circumventing the lack of complete information by keeping future options open. Anticipatory engineering can then be regarded as the strategy of designing flexible solutions that enable decision makers to respond robustly to unpredictable scenarios. This strategy can thus mitigate the risks of strong unintended commitments to uncertain alternatives, while increasing adaptability to future changes. In this thesis, the roles of anticipation and of flexibility in automating sequential multiple-criteria decision-making processes under uncertainty are investigated. The dilemma of assigning relative importance to decision criteria and to immediate rewards under incomplete information is then handled by autonomously anticipating flexible decisions predicted to maximally preserve the diversity of future choices. An online anticipatory learning methodology is then proposed for improving the range and quality of future trade-off solution sets. This goal is achieved by predicting maximal expected hypervolume sets, for which the anticipation capabilities of multi-objective metaheuristics are augmented with Bayesian tracking in both the objective and search spaces. The methodology has been applied to obtain investment decisions that are shown to significantly improve the future hypervolume of trade-off financial portfolios for out-of-sample stock data, when compared to a myopic strategy. Moreover, implementing flexible portfolio rebalancing decisions was confirmed as a significantly better strategy than randomly choosing an investment decision from the evolved stochastic efficient frontier in all tested artificial and real-world markets. Finally, the results suggest that anticipating flexible choices led to portfolio compositions that are significantly correlated with the observed improvements in out-of-sample future expected hypervolume.

Degree: Doctorate, Computer Engineering; Doctor of Electrical Engineering.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio da Producao Cientifica e Intelectual da Unicamp
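
The anticipation thesis above scores trade-off portfolio sets by their (expected) hypervolume. For two objectives, the deterministic hypervolume indicator is the area dominated by the set and bounded by a reference point, which a single sweep after sorting can compute. The sketch below assumes portfolios are given as (risk, expected return) pairs and that the user chooses the reference point (`max_risk`, `min_return`); it does not reproduce the Bayesian tracking or the prediction of expected hypervolume that the thesis proposes.

```python
def hypervolume_2d(points: list[tuple[float, float]], reference: tuple[float, float]) -> float:
    """Area dominated by `points` and bounded by `reference`, with BOTH objectives minimised."""
    rx, ry = reference
    # Only points that strictly dominate the reference point contribute.
    front = sorted((x, y) for x, y in points if x < rx and y < ry)
    area = 0.0
    best_y = ry
    for x, y in front:
        if y < best_y:  # not dominated by an earlier point: adds a horizontal strip
            area += (rx - x) * (best_y - y)
            best_y = y
    return area


def portfolio_hypervolume(
    portfolios: list[tuple[float, float]], max_risk: float, min_return: float
) -> float:
    """Portfolios are (risk, expected_return) pairs. Risk is minimised and return
    maximised, so return is negated to reuse the minimisation routine."""
    points = [(risk, -ret) for risk, ret in portfolios]
    return hypervolume_2d(points, (max_risk, -min_return))
```

For example, the staircase set `[(1, 3), (2, 2), (3, 1)]` with reference `(4, 4)` has hypervolume 6, and adding a dominated point such as `(2, 3.5)` leaves it unchanged. That insensitivity to dominated points is what makes the indicator a natural score for the quality and spread of a trade-off set.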