
    Affinity-Based Reinforcement Learning: A New Paradigm for Agent Interpretability

    The steady increase in complexity of reinforcement learning (RL) algorithms is accompanied by a corresponding increase in opacity that obfuscates insights into their devised strategies. Methods in explainable artificial intelligence seek to mitigate this opacity by either creating transparent algorithms or extracting explanations post hoc. A third category exists that allows the developer to affect what agents learn: constrained RL has been used in safety-critical applications and prohibits agents from visiting certain states; preference-based RL agents have been used in robotics applications and learn state-action preferences instead of traditional reward functions. We propose a new affinity-based RL paradigm in which agents learn strategies that are partially decoupled from reward functions. Unlike entropy regularisation, we regularise the objective function with a distinct action distribution that represents a desired behaviour; we encourage the agent to act according to a prior while learning to maximise rewards. The result is an inherently interpretable agent that solves problems with an intrinsic affinity for certain actions. We demonstrate the utility of our method in a financial application: we learn continuous time-variant compositions of prototypical policies, each interpretable by its action affinities, that are globally interpretable according to customers' financial personalities. Our method combines advantages from both constrained RL and preference-based RL: it retains the reward function but generalises the policy to match a defined behaviour, thus avoiding problems such as reward shaping and hacking. Unlike Boolean task composition, our method is a fuzzy superposition of different prototypical strategies to arrive at a more complex, yet interpretable, strategy.
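    To make the regularisation idea concrete, the following minimal sketch (our reading of the abstract, not the authors' implementation) trains a softmax bandit policy with REINFORCE, replacing the usual entropy bonus by a KL penalty towards a fixed "affinity" distribution over actions; `affinity_prior`, the weight `lam`, and the toy rewards are illustrative assumptions.

```python
import numpy as np

# Sketch: REINFORCE on a softmax bandit policy with a KL penalty towards a
# fixed "affinity" prior over actions (replacing the usual entropy bonus).
rng = np.random.default_rng(0)
n_actions = 4
theta = np.zeros(n_actions)                       # policy logits
affinity_prior = np.array([0.7, 0.1, 0.1, 0.1])   # desired behaviour (assumed)
lam, lr = 0.5, 0.05                               # regularisation weight, step size

def policy(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reward(action):
    # toy bandit: action 2 pays best, but the prior prefers action 0
    return [0.2, 0.4, 1.0, 0.1][action] + 0.05 * rng.standard_normal()

for _ in range(5000):
    pi = policy(theta)
    a = rng.choice(n_actions, p=pi)
    r = reward(a)
    grad_logp = -pi.copy()
    grad_logp[a] += 1.0                                  # d/dtheta log pi(a)
    kl = np.sum(pi * np.log(pi / affinity_prior))
    kl_grad = pi * (np.log(pi / affinity_prior) - kl)    # exact d KL / d theta for softmax
    theta += lr * (r * grad_logp - lam * kl_grad)        # ascend E[r] - lam * KL

print("learned policy:", np.round(policy(theta), 3))
print("affinity prior:", affinity_prior)
```

    The learned action distribution ends up between the reward-greedy policy and the affinity prior, which is the behaviour the abstract describes: rewards are still maximised, but subject to an interpretable preference over actions.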

    Markov Decision Processes with Risk-Sensitive Criteria: An Overview

    The paper provides an overview of the theory and applications of risk-sensitive Markov decision processes. The term 'risk-sensitive' refers here to the use of the Optimized Certainty Equivalent as a means to measure expectation and risk. This comprises the well-known entropic risk measure and Conditional Value-at-Risk. We restrict our considerations to stationary problems with an infinite time horizon. Conditions are given under which optimal policies exist, and solution procedures are explained. We present both the theory when the Optimized Certainty Equivalent is applied recursively and the case where it is applied to the cumulated reward. Discounted as well as non-discounted models are reviewed.
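    For reference, one common statement of the Optimized Certainty Equivalent (the Ben-Tal and Teboulle form, written here for a reward X; sign and level conventions differ across papers) and the two special cases named in the abstract:

```latex
% Optimized Certainty Equivalent for a reward X and a concave, nondecreasing
% utility u with u(0) = 0 and 1 \in \partial u(0):
\[
  \mathrm{OCE}_u(X) \;=\; \sup_{\eta \in \mathbb{R}}
  \Bigl\{ \eta + \mathbb{E}\bigl[u(X-\eta)\bigr] \Bigr\}.
\]
% Entropic risk measure, obtained with u(t) = (1 - e^{-\gamma t})/\gamma:
\[
  \mathrm{OCE}_u(X) \;=\; -\tfrac{1}{\gamma}\,\log \mathbb{E}\bigl[e^{-\gamma X}\bigr].
\]
% Lower-tail Conditional Value-at-Risk at level \alpha, obtained with
% u(t) = \tfrac{1}{\alpha}\min(t, 0):
\[
  \mathrm{OCE}_u(X) \;=\;
  \sup_{\eta}\Bigl\{ \eta - \tfrac{1}{\alpha}\,\mathbb{E}\bigl[(\eta - X)^{+}\bigr] \Bigr\}.
\]
```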

    Cognitive finance: Behavioural strategies of spending, saving, and investing.

    Research in economics is increasingly open to empirical results. The advances in behavioural approaches are expanded here by applying cognitive methods to financial questions. The field of "cognitive finance" is approached through the exploration of decision strategies in the financial settings of spending, saving, and investing. Individual strategies in these different domains are identified and elaborated to derive explanations for observed irregularities in financial decision making. Strong context-dependency and adaptive learning form the basis for this cognition-based approach to finance. Experiments, ratings, and real-world data analysis are carried out in specific financial settings, combining different research methods to improve the understanding of natural financial behaviour. People use various strategies in the domains of spending, saving, and investing. Specific spending profiles can be elaborated for a better understanding of individual spending differences. It was found that people differ along four dimensions of spending, which can be labelled: General Leisure, Regular Maintenance, Risk Orientation, and Future Orientation. Saving behaviour is strongly dependent on how people mentally structure their finances and on their self-control attitude towards decision-space restrictions, environmental cues, and contingency structures. Investment strategies depend on how the companies in which investments are placed are evaluated on factors such as Honesty, Prestige, Innovation, and Power. Furthermore, different information-integration strategies can be learned in decision situations with direct feedback. The mapping of cognitive processes in financial decision making is discussed, and adaptive learning mechanisms are proposed for the observed behavioural differences. The construal of a "financial personality" is proposed, in accordance with other dimensions of personality measures, to better acknowledge and predict variations in financial behaviour. This perspective enriches economic theories and provides a useful ground for improving individual financial services.

    Cryptocurrency trading as a Markov Decision Process

    Portfolio management is the problem of looking at a portfolio, or a set of assets as a whole, rather than at single assets. The objective is to hold the best portfolio at each given time while trying to maximize profits at the end of a trading session. This thesis addresses the problem by employing Deep Reinforcement Learning algorithms in an environment that simulates a cryptocurrency trading session. The implementation of the proposed methodology, applied to 11 cryptocurrencies and five Deep Reinforcement Learning algorithms, is also presented. Three types of market conditions were evaluated: up-trending (bullish), down-trending (bearish), and lateralization (sideways). Each market condition and each algorithm was evaluated using three different reward functions in the trading environment, and all scenarios were backtested against classical portfolio management strategies such as follow-the-winner, follow-the-loser, and equally weighted portfolios. The results seem to indicate that an equally weighted portfolio is a hard-to-beat strategy in all market conditions: it was the best-performing benchmark, and the models that produced the best results followed a similar approach, diversify and hold. Deep Deterministic Policy Gradient proved to be the most stable algorithm, along with its extension, Twin Delayed Deep Deterministic Policy Gradient. Proximal Policy Optimization was the only algorithm that could not produce decent results when compared with the benchmark strategies and the other Deep Reinforcement Learning algorithms.
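    The classical benchmarks mentioned above are easy to state in code. The sketch below backtests equally weighted, follow-the-winner, and follow-the-loser allocations on synthetic daily returns; the asset count, horizon, and return model are illustrative placeholders rather than the data used in the thesis.

```python
import numpy as np

# Toy backtest of the three classical portfolio benchmarks on synthetic returns.
rng = np.random.default_rng(42)
n_assets, n_days = 11, 365
returns = rng.normal(0.0005, 0.03, size=(n_days, n_assets))  # daily simple returns

def backtest(weight_rule):
    """Compound wealth over the horizon, updating weights after each day."""
    wealth = 1.0
    w = np.full(n_assets, 1.0 / n_assets)
    for t in range(n_days):
        wealth *= 1.0 + np.dot(w, returns[t])
        w = weight_rule(returns[: t + 1])
    return wealth

equal_weight  = lambda hist: np.full(n_assets, 1.0 / n_assets)      # rebalance to 1/N daily
follow_winner = lambda hist: np.eye(n_assets)[hist[-1].argmax()]    # all-in yesterday's best
follow_loser  = lambda hist: np.eye(n_assets)[hist[-1].argmin()]    # all-in yesterday's worst

for name, rule in [("equally weighted", equal_weight),
                   ("follow-the-winner", follow_winner),
                   ("follow-the-loser", follow_loser)]:
    print(f"{name:18s} final wealth: {backtest(rule):.3f}")
```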

    Semi-Cooperative Learning in Smart Grid Agents


    Reinforcement learning for sequential decision-making: a data driven approach for finance

    This work presents a variety of reinforcement learning applications to the domain of finance. It is composed of two parts. The first one is a technical overview of the basic concepts in machine learning that are required to understand and work with the reinforcement learning paradigm and are shared among the domains of application. Chapter 1 outlines the fundamental principles of machine learning reasoning before introducing the neural network model as a central component of every algorithm presented in this work. Chapter 2 introduces the idea of reinforcement learning from its roots, focusing on the mathematical formalism generally employed in every application. We focus on integrating the reinforcement learning framework with neural networks, and we explain their critical role in the field's development. After the technical part, we present our original contribution, articulated in three different essays. The narrative line follows the idea of introducing the use of various reinforcement learning algorithms through a trading application (Brini and Tantari, 2021) in Chapter 3. In Chapter 4 we focus on one of the presented reinforcement learning algorithms and aim at improving its performance and scalability in solving the trading problem by leveraging prior knowledge of the setting. In Chapter 5 of the second part, we use the same reinforcement learning algorithm to solve the problem of exchanging liquidity in a system of banks that can borrow and lend money, highlighting the flexibility and effectiveness of the reinforcement learning paradigm in the broad financial domain. We conclude with some remarks and ideas for further research in reinforcement learning applied to finance.

    Machine learning applications for censored data

    The amount of data being gathered has increased tremendously as many aspects of our lives are becoming increasingly digital. Data alone is not useful, because the ultimate goal is to use the data to obtain new insights and create new applications. The largest challenge in computer science has been on the algorithmic front: how can we create machines that help us do useful things with the data? To address this challenge, the field of data science has emerged as the systematic and interdisciplinary study of how knowledge can be extracted from both structured and unstructured data sets. Machine learning is a subfield of data science in which the task of building predictive models from data has been automated by a general learning algorithm and high prediction accuracy is the primary goal. Many practical problems can be formulated as questions, and there is often data that describes the problem. The solution therefore seems simple: formulate a data set of inputs and outputs, and then apply machine learning to these examples in order to learn to predict the outputs. However, many practical problems are such that the correct outputs are not available because it takes years to collect them. For example, if one wants to predict the total amount of money spent by different customers, in principle one has to wait until all customers have decided to stop buying before adding all of the purchases together to get the answers. We say that the data is 'censored'; the correct answers are only partially available because we cannot wait potentially years to collect a data set of historical inputs and outputs. This thesis presents new applications of machine learning to censored data sets, with the goal of answering the most relevant question in each application. These applications include digital marketing, peer-to-peer lending, unemployment, and game recommendation. Our solution takes the censoring in the data set into account, where previous applications have obtained biased results or used older data sets where censoring is not a problem. The solution is based on a three-stage process that combines a mathematical description of the problem with machine learning: 1) deconstruct the problem as pairwise data, 2) apply machine learning to predict the missing pairs, 3) reconstruct the correct answer from these pairs. The abstract solution is similar in all domains, but the specific machine learning model and the pairwise description of the problem depend on the application.
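    The three-stage recipe can be illustrated on the customer-spending example above. In this hedged sketch (synthetic data and a generic regressor, not the thesis code), each customer's total spend is deconstructed into (customer, month) pairs, a model predicts the unobserved pairs, and the lifetime total is reconstructed by summing observed and predicted months.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic censored-spend example: customers are observed only up to a cutoff month.
rng = np.random.default_rng(1)
n_customers, n_months = 200, 24
X = rng.normal(size=(n_customers, 5))                 # customer features (assumed)
latent = np.exp(X @ rng.normal(size=5) * 0.3)         # latent monthly spend level
spend = rng.poisson(latent[:, None], size=(n_customers, n_months)).astype(float)
observed_until = rng.integers(6, n_months, size=n_customers)  # censoring point per customer

# Stage 1: deconstruct into (customer, month) pairs, keeping only the observed ones.
rows, cols, y = [], [], []
for c in range(n_customers):
    for m in range(observed_until[c]):
        rows.append(c); cols.append(m); y.append(spend[c, m])
pair_features = np.column_stack([X[rows], np.array(cols)])

# Stage 2: learn to predict spend for any (customer, month) pair.
model = Ridge().fit(pair_features, y)

# Stage 3: reconstruct each customer's total from observed + predicted months.
totals = np.zeros(n_customers)
for c in range(n_customers):
    totals[c] = spend[c, : observed_until[c]].sum()
    future = np.column_stack([np.tile(X[c], (n_months - observed_until[c], 1)),
                              np.arange(observed_until[c], n_months)])
    totals[c] += model.predict(future).sum()

print("estimated mean lifetime spend:", totals.mean().round(2))
print("true mean lifetime spend:     ", spend.sum(axis=1).mean().round(2))
```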

    Artificial Intelligence and Machine Learning Approaches to Energy Demand-Side Response: A Systematic Review

    Recent years have seen an increasing interest in Demand Response (DR) as a means to provide flexibility, and hence improve the reliability of energy systems in a cost-effective way. Yet the high complexity of the tasks associated with DR, combined with their use of large-scale data and the frequent need for near real-time decisions, means that Artificial Intelligence (AI) and Machine Learning (ML), a branch of AI, have recently emerged as key technologies for enabling demand-side response. AI methods can be used to tackle various challenges, ranging from selecting the optimal set of consumers to respond, learning their attributes and preferences, dynamic pricing, and scheduling and control of devices, to learning how to incentivise participants in DR schemes and how to reward them in a fair and economically efficient way. This work provides an overview of AI methods utilised for DR applications, based on a systematic review of over 160 papers, 40 companies and commercial initiatives, and 21 large-scale projects. The papers are classified with regard to both the AI/ML algorithm(s) used and the application area in energy DR. Next, commercial initiatives (including both start-ups and established companies) and large-scale innovation projects where AI methods have been used for energy DR are presented. The paper concludes with a discussion of the advantages and potential limitations of the reviewed AI techniques for different DR tasks, and outlines directions for future research in this fast-growing area.
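    As a concrete instance of one DR sub-task listed above, selecting the set of consumers to respond, the sketch below uses a simple greedy cost-per-kW heuristic; the consumer data and the heuristic itself are illustrative assumptions, while the reviewed papers apply a range of AI/ML methods to this selection step.

```python
from dataclasses import dataclass

@dataclass
class Consumer:
    name: str
    flexible_kw: float    # load the consumer can shed (forecast, assumed known)
    incentive_eur: float  # payment requested for responding

def select_consumers(consumers, target_kw):
    """Greedily pick consumers with the lowest cost per shed kW until the target is met."""
    ranked = sorted(consumers, key=lambda c: c.incentive_eur / c.flexible_kw)
    chosen, shed = [], 0.0
    for c in ranked:
        if shed >= target_kw:
            break
        chosen.append(c)
        shed += c.flexible_kw
    return chosen, shed

fleet = [Consumer("office-A", 40.0, 30.0),
         Consumer("factory-B", 120.0, 150.0),
         Consumer("homes-C", 15.0, 5.0),
         Consumer("mall-D", 60.0, 90.0)]

picked, total = select_consumers(fleet, target_kw=100.0)
print([c.name for c in picked], f"shed {total:.0f} kW")
```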