198 research outputs found

    Delay Sensitive Communications over Cognitive Radio Networks

    Supporting the quality of service of unlicensed users in cognitive radio networks is very challenging, mainly because resource availability changes dynamically with the licensed users' activities. In this paper, we study the optimal admission control and channel allocation decisions in cognitive overlay networks in order to support delay-sensitive communications of unlicensed users. We formulate the problem as a Markov decision process and solve it by transforming the original formulation into a stochastic shortest path problem. We then propose a simple heuristic control policy, which includes a threshold-based admission control scheme and a largest-delay-first channel allocation scheme, and prove the optimality of the largest-delay-first channel allocation scheme. We further propose an improved policy using the rollout algorithm. By comparing the performance of both proposed policies with the upper bound of the maximum revenue, we show that our policies achieve close-to-optimal performance with low complexity. Comment: 11 pages, 8 figures
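
    The heuristic policy named in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so the function names, the user-count threshold, and the sample delays below are assumptions made purely for illustration.

```python
from typing import List


def admit(num_active: int, threshold: int) -> bool:
    """Threshold-based admission: accept a new user only below the threshold."""
    return num_active < threshold


def largest_delay_first(delays: List[int], free_channels: int) -> List[int]:
    """Return the indices of the users served this slot, longest-waiting first."""
    order = sorted(range(len(delays)), key=lambda i: delays[i], reverse=True)
    return order[:max(free_channels, 0)]


# Hypothetical example: four queued users (delays in slots), two idle channels.
delays = [3, 7, 1, 5]
served = largest_delay_first(delays, free_channels=2)
print(served)  # the two users that have waited longest
```

    Ties are broken by queue position here because Python's sort is stable; the paper may break ties differently.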

    A Minimum Relative Entropy Principle for Learning and Acting

    This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is an agent that has been designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, then the optimal solution is the well-known Bayesian predictor. However, if the agent is active, then its past actions need to be treated as causal interventions on the I/O stream rather than normal probability conditions. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert. Comment: 36 pages, 11 figures
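
    As a rough illustration of the idea described in this abstract, the sketch below samples an expert from the posterior, acts as that expert would, and updates the posterior using only the observation likelihood, so that the agent's own action is treated as an intervention rather than as evidence. The two-armed bandit, the table `P_REWARD`, and the function names are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence

# Hypothetical setup: P_REWARD[h][a] is the probability of reward 1
# for arm a under environment hypothesis h.
P_REWARD = [[0.8, 0.2], [0.2, 0.8]]


def update_posterior(posterior: Sequence[float],
                     likelihoods: Sequence[float]) -> List[float]:
    """Bayes update from observation likelihoods only."""
    unnorm = [p * l for p, l in zip(posterior, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def run(true_env: int, steps: int, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    posterior = [0.5, 0.5]
    for _ in range(steps):
        # Sample a hypothesis from the posterior and act as its expert would.
        h = rng.choices([0, 1], weights=posterior, k=1)[0]
        action = h  # expert h always pulls the arm it believes is better
        reward = 1 if rng.random() < P_REWARD[true_env][action] else 0
        # The action is an intervention: only P(reward | hypothesis, action)
        # enters the update, never the probability of choosing the action.
        likelihoods = [
            P_REWARD[k][action] if reward else 1.0 - P_REWARD[k][action]
            for k in range(2)
        ]
        posterior = update_posterior(posterior, likelihoods)
    return posterior


print(run(true_env=1, steps=200))
```

    In this toy setting the posterior concentrates on the hypothesis that matches the true environment, which mirrors the convergence claim in the abstract without reproducing its proof or assumptions.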

    Delays in Reinforcement Learning

    Delays are inherent to most dynamical systems. Besides shifting the process in time, they can significantly affect a system's performance. For this reason, it is usually valuable to study the delay and account for it. Because they are dynamical systems, it is no surprise that sequential decision-making problems such as Markov decision processes (MDPs) can also be affected by delays. These processes are the foundational framework of reinforcement learning (RL), a paradigm whose goal is to create artificial agents capable of learning to maximise their utility by interacting with their environment. RL has achieved strong, sometimes astonishing, empirical results, but delays are seldom explicitly accounted for, and the understanding of their impact on the MDP is limited. In this dissertation, we propose to study the delay in the agent's observation of the state of the environment or in the execution of the agent's actions. We will repeatedly change our point of view on the problem to reveal some of its structure and peculiarities. A wide spectrum of delays will be considered, and potential solutions will be presented. This dissertation also aims to draw links between celebrated frameworks of the RL literature and that of delays.
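
    One common way to reason about a constant observation delay is to augment the last state the agent has actually seen with the actions taken since then. The sketch below shows that bookkeeping only; the class name `DelayedObserver` and its interface are assumptions for illustration and are not taken from the dissertation.

```python
from collections import deque
from typing import Hashable, Tuple


class DelayedObserver:
    """Tracks what an agent sees when states arrive `delay` steps late."""

    def __init__(self, delay: int, initial_state: Hashable) -> None:
        # Holds the delay + 1 most recent true states; index 0 is the
        # state that becomes visible to the agent now.
        self.states = deque([initial_state] * (delay + 1), maxlen=delay + 1)
        # Holds the last `delay` actions, i.e. those taken since that state.
        self.actions = deque(maxlen=delay)

    def record(self, action: Hashable, next_state: Hashable) -> Tuple:
        """Store one transition and return the augmented state."""
        self.actions.append(action)
        self.states.append(next_state)
        return self.states[0], tuple(self.actions)


# Hypothetical trajectory: states 0 -> 1 -> 2 -> 3 under actions a, b, c.
obs = DelayedObserver(delay=2, initial_state=0)
for action, state in [("a", 1), ("b", 2), ("c", 3)]:
    augmented = obs.record(action, state)
print(augmented)  # last visible state plus the actions taken since it
```

    Treating the pair returned by `record` as the state is one standard way of recovering the Markov property under a fixed delay; it is not the only approach the dissertation may consider.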

    Computational mechanisms of curiosity and goal-directed exploration

    Successful behaviour depends on the right balance between maximising reward and soliciting information about the world. Here, we show how different types of information-gain emerge when casting behaviour as surprise minimisation. We present two distinct mechanisms for goal-directed exploration that express separable profiles of active sampling to reduce uncertainty. 'Hidden state' exploration motivates agents to sample unambiguous observations to accurately infer the (hidden) state of the world. Conversely, 'model parameter' exploration compels agents to sample outcomes associated with high uncertainty, if they are informative for their representation of the task structure. We illustrate the emergence of these types of information-gain, termed active inference and active learning, and show how these forms of exploration induce distinct patterns of 'Bayes-optimal' behaviour. Our findings provide a computational framework for understanding how distinct levels of uncertainty systematically affect the exploration-exploitation trade-off in decision-making.
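
    The preference for unambiguous observations in 'hidden state' exploration can be illustrated by the expected information gain about a hidden state, that is, the mutual information between state and observation. This is a generic sketch, not the paper's model: the function name and the two likelihood matrices are assumptions.

```python
import math
from typing import Sequence


def expected_information_gain(prior: Sequence[float],
                              likelihood: Sequence[Sequence[float]]) -> float:
    """Mutual information (in nats) between hidden state and observation.

    likelihood[s][o] is P(observation o | hidden state s).
    """
    gain = 0.0
    for o in range(len(likelihood[0])):
        p_o = sum(prior[s] * likelihood[s][o] for s in range(len(prior)))
        for s in range(len(prior)):
            joint = prior[s] * likelihood[s][o]
            if joint > 0.0:
                gain += joint * math.log(joint / (prior[s] * p_o))
    return gain


prior = [0.5, 0.5]
unambiguous = [[1.0, 0.0], [0.0, 1.0]]  # observation reveals the state
ambiguous = [[0.5, 0.5], [0.5, 0.5]]    # observation says nothing

print(expected_information_gain(prior, unambiguous))
print(expected_information_gain(prior, ambiguous))
```

    An agent that ranks actions by this quantity would prefer the unambiguous observation channel, which is the qualitative behaviour the abstract attributes to hidden-state exploration.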

    Markov and Semi-markov Chains, Processes, Systems and Emerging Related Fields

    This book covers a broad range of research results in the field of Markov and semi-Markov chains, processes, systems and related emerging fields. The authors of the included research papers are well-known researchers in their field. The book presents the state of the art and ideas for further research for theorists in these fields. It also provides directly applicable results for practitioners in diverse areas.

    Dynamic optimisation for energy efficiency of injection moulding process

    Building a low-carbon economy has emerged as an important task in China since energy intensity and carbon intensity reduction targets were clearly prescribed in its Twelfth Five-Year Plan for 2011-2015. While the largest enterprises have taken the initiative to reduce their energy consumption, small and medium-sized enterprises (SMEs) will need to share the responsibility for meeting the nation’s targets. However, there is no established structure for helping SMEs to close their efficiency gap, and hence the implementation of energy-saving measures in SMEs remains patchy. Addressing this issue, this thesis seeks to understand the critical barriers faced by SMEs and aims to develop proprietary methodologies that can help manufacturing SMEs to close their efficiency gap. Process parameter optimisation is perceived to be an effective “no-cost” strategy that SMEs can adopt to improve energy efficiency. A unique design of experiments (DOE) approach introduced by Dorian Shainin offers a simple framework for studying process optimisation, but its application is not widespread and it has been criticised for its working principles. To address the inherent limitations of Shainin’s method, it was integrated with multivariate statistical methods and the signal-response system in the empirical study. The research aim also requires a theoretical approach to evaluate the economic performance of energy efficiency investments. Hence, a spreadsheet-based decision support system (file SERP.xlsm) was created using the dynamic programming (DP) method. The main contributions of this thesis can be divided into an empirical level and a theoretical level. At the empirical level, a technically feasible yet economically viable approach called “multi-response dynamic Shainin DOE” was developed. An empirical study on the injection moulding process was presented to examine the validity of this novel integrated methodology. The emphasis has been on the integration of multivariate techniques and signal-response analysis. The former successfully identified the factors critical to energy consumption and to the moulded parts’ impact performance, despite the large fluctuation in the impact response. The latter enables the end user to achieve different performance outputs depending on the particular intent. At the theoretical level, the “DP-based spreadsheet solution” provides a convenient platform to help rational decision makers evaluate energy efficiency investments. A simple hypothetical case study on the injection moulding industry illustrated how the decision-making process for equipment replacement can evolve over time. To sum up, both proprietary methodologies make the optimisation process more dynamic to support the injection moulding industry in closing its efficiency gap. The study at the empirical level was limited by the absence of a real industrial case study, which is important for justifying the practicality of the proposed methodology. At the theoretical level, a dataset for initial validation of the spreadsheet solution was not available. Finally, it is important to continue future work on these limitations in order to increase the cogency of the proprietary methodologies for common use in industry.
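
    The equipment replacement decision mentioned in this abstract is a classic dynamic programming problem. The sketch below is a generic textbook-style recursion, not the thesis's spreadsheet model: the horizon, the replacement cost, and the linear `energy_cost` function are hypothetical values chosen only to make the example run.

```python
from functools import lru_cache
from typing import List, Tuple

HORIZON = 5          # planning horizon in years (assumed)
REPLACE_COST = 10.0  # cost of buying a new machine (assumed)


def energy_cost(age: int) -> float:
    """Hypothetical yearly energy cost that rises as the machine ages."""
    return 2.0 + 1.5 * age


@lru_cache(maxsize=None)
def min_cost(year: int, age: int) -> Tuple[float, str]:
    """Minimum cost from `year` to the horizon, and the best decision now."""
    if year == HORIZON:
        return 0.0, "end"
    keep = energy_cost(age) + min_cost(year + 1, age + 1)[0]
    replace = REPLACE_COST + energy_cost(0) + min_cost(year + 1, 1)[0]
    return (keep, "keep") if keep <= replace else (replace, "replace")


def plan(age: int) -> List[str]:
    """Follow the optimal decisions forward from the initial machine age."""
    decisions = []
    for year in range(HORIZON):
        decision = min_cost(year, age)[1]
        decisions.append(decision)
        age = 1 if decision == "replace" else age + 1
    return decisions


print(min_cost(0, 3)[0], plan(3))
```

    Changing `REPLACE_COST` or the slope in `energy_cost` shifts the year in which replacement becomes worthwhile, which is the kind of evolution over time the abstract refers to.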

    Antecipação na tomada de decisão com múltiplos critérios sob incerteza [Anticipation in multiple criteria decision-making under uncertainty]

    Advisor: Fernando José Von Zuben. Thesis (doctorate), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    Abstract: The presence of uncertainty in future outcomes can lead to indecision in choice processes, especially when eliciting the relative importances of multiple decision criteria and of long-term vs. near-term performance. Some decisions, however, must be taken under incomplete information, which may result in hasty actions with unforeseen consequences. When a solution must be selected under multiple conflicting views for operating in time-varying and noisy environments, implementing flexible provisional alternatives can be critical to circumvent the lack of complete information by keeping future options open. Anticipatory engineering can then be regarded as the strategy of designing flexible solutions that enable decision makers to respond robustly to unpredictable scenarios. This strategy can thus mitigate the risks of strong unintended commitments to uncertain alternatives, while increasing adaptability to future changes. In this thesis, the roles of anticipation and of flexibility in automating sequential multiple criteria decision-making processes under uncertainty are investigated. The dilemma of assigning relative importances to decision criteria and to immediate rewards under incomplete information is then handled by autonomously anticipating flexible decisions predicted to maximally preserve diversity of future choices. An online anticipatory learning methodology is then proposed for improving the range and quality of future trade-off solution sets. This goal is achieved by predicting maximal expected hypervolume sets, for which the anticipation capabilities of multi-objective metaheuristics are augmented with Bayesian tracking in both the objective and search spaces. The methodology has been applied for obtaining investment decisions that are shown to significantly improve the future hypervolume of trade-off financial portfolios for out-of-sample stock data, when compared to a myopic strategy. Moreover, implementing flexible portfolio rebalancing decisions was confirmed as a significantly better strategy than randomly choosing an investment decision from the evolved stochastic efficient frontier in all tested artificial and real-world markets. Finally, the results suggest that anticipating flexible choices has led to portfolio compositions that are significantly correlated with the observed improvements in out-of-sample future expected hypervolume.
    Degree: Doctorate, Computer Engineering. Doctor of Electrical Engineering.
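
    The hypervolume indicator that this abstract relies on measures the region dominated by a set of trade-off solutions relative to a reference point. The sketch below computes it for two objectives that are both maximised; the function name, the reference point, and the sample points are assumptions for illustration and do not come from the thesis.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by `points` above `ref`, both objectives maximised."""
    area = 0.0
    best_y = ref[1]
    # Sweep from the largest first objective down; each point adds the
    # horizontal strip it covers above everything seen so far.
    for x, y in sorted(points, reverse=True):
        if x > ref[0] and y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


# Hypothetical trade-off set, e.g. (return, negated risk) after rescaling.
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front))
```

    A dominated point such as `(1.0, 1.0)` adds no area, so a larger hypervolume indicates a set of trade-offs that is both better and more spread out.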