
    A Neural Networks Committee for the Contextual Bandit Problem

    This paper presents a new contextual bandit algorithm, NeuralBandit, which requires no stationarity hypothesis on contexts and rewards. Several neural networks are trained to model the value of rewards given the context. Two variants, based on a multi-expert approach, are proposed to choose the parameters of the multi-layer perceptrons online. The proposed algorithms are successfully tested on a large dataset with and without stationarity of rewards.

    Comment: 21st International Conference on Neural Information Processing
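As a much-simplified sketch of the setup this abstract describes, the following trains one per-arm reward model online and selects arms greedily with a little exploration. The class name, the linear per-arm models (standing in for the paper's multi-layer perceptrons), and all parameters are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

class CommitteeBandit:
    """Per-arm online reward models for a contextual bandit (toy sketch)."""

    def __init__(self, n_arms, ctx_dim, lr=0.05, eps=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w = np.zeros((n_arms, ctx_dim))  # one reward model per arm
        self.lr, self.eps = lr, eps

    def select(self, ctx):
        if self.rng.random() < self.eps:            # explore
            return int(self.rng.integers(len(self.w)))
        return int(np.argmax(self.w @ ctx))         # exploit best estimate

    def update(self, arm, ctx, reward):
        # SGD step on squared error, only for the arm that was pulled
        err = reward - self.w[arm] @ ctx
        self.w[arm] += self.lr * err * ctx


rng = np.random.default_rng(1)
bandit = CommitteeBandit(n_arms=2, ctx_dim=3)
true_w = np.array([[1.0, 0.0, 0.0],   # hypothetical reward structure
                   [0.0, 1.0, 0.0]])
for _ in range(2000):
    ctx = rng.normal(size=3)
    arm = bandit.select(ctx)
    reward = float(true_w[arm] @ ctx) + 0.01 * rng.normal()
    bandit.update(arm, ctx, reward)
```

After enough interactions, each per-arm model approximates its arm's reward function from the contexts it was pulled on, which is the role the paper's neural networks play.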

    The Assistive Multi-Armed Bandit

    Learning the preferences implicit in the choices humans make is a well-studied problem in both economics and computer science. However, most work assumes that humans act (noisily) optimally with respect to their preferences. Such approaches can fail when people are themselves still learning what they want. In this work, we introduce the assistive multi-armed bandit, in which a robot assists a human playing a bandit task to maximize cumulative reward. In this problem, the human does not know the reward function but can learn it through the rewards received from arm pulls; the robot observes only which arms the human pulls, not the reward associated with each pull. We give necessary and sufficient conditions for successfully assisting the human in this framework. Surprisingly, better human performance in isolation does not necessarily lead to better performance when assisted by the robot: a human policy can do better by effectively communicating its observed rewards to the robot. We conduct proof-of-concept experiments that support these results. We see this work as contributing towards a theory behind algorithms for human-robot interaction.

    Comment: Accepted to HRI 201
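The key information structure here — the human sees rewards, the robot sees only pulls — can be illustrated with a toy simulation. Everything below (the human's epsilon-greedy policy, the arm means, the robot inferring the preferred arm from pull counts) is an assumed illustration of that structure, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8, 0.5])  # hypothetical arm rewards

human_est = np.zeros(3)        # human's running reward estimates
human_counts = np.zeros(3)
robot_pull_counts = np.zeros(3)

for t in range(1000):
    # Human: epsilon-greedy on its own estimates; it observes rewards.
    if rng.random() < 0.1:
        arm = int(rng.integers(3))
    else:
        arm = int(np.argmax(human_est))
    reward = rng.normal(true_means[arm], 0.1)
    human_counts[arm] += 1
    human_est[arm] += (reward - human_est[arm]) / human_counts[arm]
    # Robot: observes only the pull, never the reward.
    robot_pull_counts[arm] += 1

# The robot's belief about the human's preferred arm comes entirely
# from which arms the human chose to pull.
robot_guess = int(np.argmax(robot_pull_counts))
```

The point the abstract makes is that the human's pull sequence is the robot's only channel, so a human policy that "signals" its observed rewards through its pulls can help the robot more than a policy that is individually optimal.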

    Hedging using reinforcement learning: Contextual k-Armed Bandit versus Q-learning

    The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton, is not only unrealistic but also undesirable due to high transaction costs. Over the last decades, stochastic optimal-control methods have been developed to balance effective replication against losses. More recently, with the rise of artificial intelligence, temporal-difference reinforcement learning, in particular variations of Q-learning in conjunction with deep neural networks, has attracted significant interest. From a practical point of view, however, such methods are often relatively sample-inefficient, hard to train, and lack performance guarantees. This motivates the investigation of a stable benchmark algorithm for hedging. In this article, the hedging problem is viewed as an instance of a risk-averse contextual k-armed bandit problem, for which a large body of theoretical results and well-studied algorithms are available. We find that the k-armed bandit model naturally fits the P&L formulation of hedging, providing a more accurate and sample-efficient approach than Q-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks.

    Comment: 15 pages, 7 figures
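A much-reduced sketch of the bandit view of hedging: each candidate hedge ratio is an "arm", and a risk-averse mean-variance score of the resulting P&L picks the best one. The exposure, cost, and objective below are illustrative assumptions, not the paper's P&L formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
hedge_ratios = np.linspace(0.0, 1.0, 5)   # the k "arms"
lam, cost = 1.0, 0.01                     # risk aversion, transaction cost

def pnl(h, n):
    # One-period P&L of holding hedge ratio h against a short exposure
    # with delta 0.6 (illustrative numbers, unit price volatility).
    dS = rng.normal(0.0, 1.0, size=n)
    return (h - 0.6) * dS - cost * h

# Evaluate each arm from samples and pick the mean-variance-best one.
scores = []
for h in hedge_ratios:
    x = pnl(h, n=5000)
    scores.append(x.mean() - lam * x.var())
best = float(hedge_ratios[int(np.argmax(scores))])
```

With a transaction cost penalizing larger hedges, the risk-averse score selects a hedge ratio near (but not above) the exposure's delta, which mirrors the trade-off the abstract describes between replication quality and friction.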

    Gated Linear Networks for Continual Learning in a Class-Incremental with Repetition Scenario

    Continual learning, which involves the incremental acquisition of knowledge over time, is a challenging problem in complex environments where the data distribution may change over time. Despite the strong results obtained by neural networks on a wide variety of tasks, they still struggle to show the same performance in a continual learning setting, suffering from a problem known as catastrophic forgetting. This problem, which consists in a model's tendency to overwrite old knowledge when new knowledge is presented, has been addressed through a variety of strategies that adapt the model at different levels. Among these, this work focuses on Gated Linear Networks (GLNs), a class of models that rely on a gating mechanism to improve the storage and retrieval of information over time. GLNs have already been applied to continual learning with promising results so far, but always in extremely simplified frameworks. In this work we define a more complex continual learning environment and adapt GLNs to the increased challenges it presents, evaluating their strengths and limitations. In particular, we find that an encoding step can help make a complex dataset more spatially separable, and therefore make GLNs more effective, and that switching to a Class-Incremental with Repetition scenario both increases the realism of the framework and eases the learning difficulty.
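To make the gating mechanism concrete, here is a single halfspace-gated linear unit: random hyperplanes partition context space into regions, each region owns its own weight row, and an update touches only the active region. This is a toy in the spirit of GLNs — real GLNs gate every neuron and train on a log-loss, which this sketch omits.

```python
import numpy as np

class GatedLinearUnit:
    """One halfspace-gated linear unit (toy sketch of the GLN gating idea)."""

    def __init__(self, ctx_dim, in_dim, n_bits=3, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Random hyperplanes carve context space into 2**n_bits regions.
        self.hyperplanes = rng.normal(size=(n_bits, ctx_dim))
        self.w = np.zeros((2 ** n_bits, in_dim))  # one weight row per region
        self.lr = lr

    def region(self, ctx):
        bits = (self.hyperplanes @ ctx > 0).astype(int)
        return int(bits @ (2 ** np.arange(len(bits))))

    def predict(self, ctx, x):
        return float(self.w[self.region(ctx)] @ x)

    def update(self, ctx, x, target):
        r = self.region(ctx)  # only the active region's weights change
        self.w[r] += self.lr * (target - self.w[r] @ x) * x


glu = GatedLinearUnit(ctx_dim=2, in_dim=2)
ctx, x = np.array([1.0, -0.5]), np.array([1.0, 2.0])
glu.update(ctx, x, target=3.0)
```

Because the update modified exactly one region's weight row, knowledge stored for contexts that gate to other regions is left intact — the locality that makes this family of models interesting for mitigating catastrophic forgetting.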

    Orchestrating energy-efficient vRANs: Bayesian learning and experimental results

    Virtualized base stations (vBS) can be implemented on diverse commodity platforms and are expected to bring unprecedented operational flexibility and cost efficiency to the next generation of cellular networks. However, their widespread adoption is hampered by complex configuration options that affect both their performance and their power consumption in non-traditional ways. Following an in-depth experimental analysis in a bespoke testbed, we characterize the vBS power-cost profile and reveal previously unknown couplings between its various control knobs. Motivated by these findings, we develop a Bayesian learning framework for the orchestration of vBSs and design two novel algorithms: (i) BP-vRAN, which employs online learning to balance vBS performance and energy consumption, and (ii) SBP-vRAN, which augments our optimization approach with safe controls that maximize performance while respecting hard power constraints. We show that our approaches are data-efficient, i.e., they converge an order of magnitude faster than state-of-the-art deep reinforcement learning methods, and achieve optimal performance. We demonstrate the efficacy of these solutions in an experimental prototype using real traffic traces.

    This work has been supported by the European Commission through Grant No. 101017109 (DAEMON project), and the CERCA Programme/Generalitat de Catalunya.
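The safe-control idea — maximize performance only over configurations whose power stays under a hard cap — can be illustrated with a deliberately simple stand-in. All numbers and the margin rule below are assumptions; the paper's SBP-vRAN uses Bayesian models and online exploration rather than this batch estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
true_perf = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # e.g. throughput units
true_power = np.array([1.5, 2.0, 3.0, 4.0, 5.0])   # e.g. watts
power_cap = 3.5                                    # hard power constraint
n_samples = 50

# Estimate performance and power per configuration from noisy measurements.
perf_est = np.array([np.mean(p + 0.1 * rng.normal(size=n_samples))
                     for p in true_perf])
power_est = np.array([np.mean(p + 0.1 * rng.normal(size=n_samples))
                      for p in true_power])

margin = 0.1  # conservative slack so estimation noise cannot break the cap
safe = power_est + margin <= power_cap
best = int(np.argmax(np.where(safe, perf_est, -np.inf)))
```

The filter discards the higher-performing but cap-violating configurations and keeps the best one that provably (up to the margin) respects the constraint, which is the behavior the abstract attributes to the safe variant.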