A Neural Networks Committee for the Contextual Bandit Problem
This paper presents a new contextual bandit algorithm, NeuralBandit, which
does not require stationarity assumptions on contexts or rewards. Several
neural networks are trained to model the value of rewards given the
context. Two variants, based on a multi-expert approach, are proposed to
choose the parameters of multi-layer perceptrons online. The proposed
algorithms are successfully tested on a large dataset with and without
stationarity of rewards.

Comment: 21st International Conference on Neural Information Processing
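As a rough illustration of the setting (not the paper's algorithm), a contextual bandit can be sketched with one online linear reward model per arm standing in for the trained neural networks, and epsilon-greedy exploration; all names and parameters here are invented for illustration:

```python
import random

class ContextualBandit:
    """Epsilon-greedy contextual bandit with one online linear reward
    model per arm (a simplified stand-in for per-arm neural networks)."""

    def __init__(self, n_arms, n_features, epsilon=0.1, lr=0.05):
        self.epsilon = epsilon
        self.lr = lr
        # one weight vector per arm, initialised to zero
        self.weights = [[0.0] * n_features for _ in range(n_arms)]

    def predict(self, arm, context):
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def select_arm(self, context):
        if random.random() < self.epsilon:
            return random.randrange(len(self.weights))
        scores = [self.predict(a, context) for a in range(len(self.weights))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, context, reward):
        # one SGD step on the squared prediction error of the pulled arm
        err = reward - self.predict(arm, context)
        self.weights[arm] = [w + self.lr * err * x
                             for w, x in zip(self.weights[arm], context)]

# toy run: arm 1 pays off when the second context feature is high
random.seed(0)
bandit = ContextualBandit(n_arms=2, n_features=2)
for _ in range(2000):
    ctx = [random.random(), random.random()]
    arm = bandit.select_arm(ctx)
    reward = ctx[1] if arm == 1 else 1.0 - ctx[1]
    bandit.update(arm, ctx, reward)
```

NeuralBandit replaces these linear models with multi-layer perceptrons and adds the multi-expert layer on top to pick their hyperparameters online, which is what removes the need for stationarity assumptions.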
The Assistive Multi-Armed Bandit
Learning the preferences implicit in the choices humans make is a well-studied
problem in both economics and computer science. However, most work makes the
assumption that humans are acting (noisily) optimally with respect to their
preferences. Such approaches can fail when people are themselves learning about
what they want. In this work, we introduce the assistive multi-armed bandit,
where a robot assists a human playing a bandit task to maximize cumulative
reward. In this problem, the human does not know the reward function but can
learn it through the rewards received from arm pulls; the robot only observes
which arms the human pulls but not the reward associated with each pull. We
offer necessary and sufficient conditions for successfully assisting the human
in this framework. Surprisingly, better human performance in isolation does not
necessarily lead to better performance when assisted by the robot: a human
policy can do better by effectively communicating its observed rewards to the
robot. We conduct proof-of-concept experiments that support these results. We
see this work as contributing towards a theory behind algorithms for
human-robot interaction.

Comment: Accepted to HRI 201
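The information structure can be made concrete with a toy simulation (an illustrative sketch, not the paper's model): the human runs an epsilon-greedy learner on a two-armed Bernoulli task and sees its own rewards, while the robot observes only which arm was pulled:

```python
import random

random.seed(1)

# hypothetical two-armed Bernoulli task; arm 1 is the better arm
MEANS = [0.3, 0.7]

human_counts = [0, 0]       # the human's own statistics (rewards visible)
human_sums = [0.0, 0.0]
robot_pull_counts = [0, 0]  # the robot sees only which arm was pulled

def human_choose(eps=0.2):
    """Epsilon-greedy human policy over its empirical reward means."""
    if 0 in human_counts or random.random() < eps:
        return random.randrange(2)
    return max(range(2), key=lambda a: human_sums[a] / human_counts[a])

for _ in range(2000):
    arm = human_choose()
    reward = 1.0 if random.random() < MEANS[arm] else 0.0
    human_counts[arm] += 1
    human_sums[arm] += reward    # observed by the human only
    robot_pull_counts[arm] += 1  # the robot's entire observation

# the robot infers the better arm purely from pull frequencies
robot_guess = max(range(2), key=robot_pull_counts.__getitem__)
```

Even through this narrow channel the robot can recover the better arm from pull frequencies alone; the abstract's observation that a human policy can do better by "communicating" its observed rewards refers to exploiting exactly this channel.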
Hedging using reinforcement learning: Contextual k-Armed Bandit versus Q-learning
The construction of replication strategies for contingent claims in the
presence of risk and market friction is a key problem of financial engineering.
In real markets, continuous replication, such as in the model of Black, Scholes
and Merton, is not only unrealistic but it is also undesirable due to high
transaction costs. Over the last decades, stochastic optimal-control methods
have been developed to balance effective replication against losses. More
recently, with the rise of artificial intelligence, temporal-difference
Reinforcement Learning, in particular variations of Q-learning in conjunction
with Deep Neural Networks, has attracted significant interest. From a
practical point of view, however, such methods are often relatively sample
inefficient, hard to train, and lack performance guarantees. This motivates the
investigation of a stable benchmark algorithm for hedging. In this article, the
hedging problem is viewed as an instance of a risk-averse contextual k-armed
bandit problem, for which a large body of theoretical results and well-studied
algorithms are available. We find that the k-armed bandit model naturally
fits the formulation of hedging, providing a more accurate and
sample-efficient approach than Q-learning and reducing to the Black-Scholes
model in the absence of transaction costs and risks.

Comment: 15 pages, 7 figures
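The casting of hedging as a risk-averse bandit can be loosely illustrated as follows (a toy sketch, not the paper's formulation: the fixed delta of 0.5, the cost level, and the mean-variance score are all illustrative assumptions). Discrete hedge ratios act as arms and an empirical mean-variance score as the risk-adjusted objective:

```python
import random
import statistics

random.seed(2)

# arms: discrete hedge ratios for a short option position (toy setup)
ARMS = [0.0, 0.25, 0.5, 0.75, 1.0]
COST = 0.002           # proportional transaction cost (assumed)
RISK_AVERSION = 100.0  # mean-variance trade-off weight (assumed)
rewards = {a: [] for a in range(len(ARMS))}

def score(a):
    """Empirical risk-adjusted value: mean P&L minus a variance penalty."""
    rs = rewards[a]
    if len(rs) < 2:
        return float("inf")  # force initial exploration of each arm
    return statistics.fmean(rs) - RISK_AVERSION * statistics.pvariance(rs)

for _ in range(3000):
    # epsilon-greedy arm selection on the risk-adjusted score
    a = (random.randrange(len(ARMS)) if random.random() < 0.1
         else max(range(len(ARMS)), key=score))
    h = ARMS[a]
    move = random.gauss(0.0, 0.02)     # one-period return of the underlying
    pnl = (h - 0.5) * move - COST * h  # short-option delta fixed at 0.5 (toy)
    rewards[a].append(pnl)

best = max(range(len(ARMS)), key=score)
```

In this toy, setting COST to zero makes the variance term alone select the delta hedge, loosely mirroring the reduction to the Black-Scholes hedge noted in the abstract when costs and risks vanish.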
Gated Linear Networks for Continual Learning in a Class-Incremental with Repetition Scenario
Continual learning, which involves the incremental acquisition of knowledge over time, is a challenging problem in complex environments where the data distribution may change over time. Despite the great results obtained by neural networks in solving a wide variety of tasks, they still struggle to show the same strong performance in a continual learning setting, suffering from a problem known as catastrophic forgetting. This problem, which consists in a model's tendency to overwrite old knowledge when new knowledge is presented, has been addressed through a variety of strategies that adapt the model at different levels. Among these, in this work we focus on Gated Linear Networks (GLNs), a class of models that rely on a gating mechanism to improve the storage and retrieval of information over time. GLNs have already been applied to continual learning with promising results, but always in extremely simplified frameworks. In this work we define a more complex continual learning environment and adapt GLNs to the increased challenges it presents, evaluating their strengths and limitations. In particular, we found that performing an encoding step can help make a complex dataset more spatially separable and therefore make GLNs more effective, and that switching to a Class-Incremental with Repetition scenario both increases the realism of the framework and eases the learning difficulty.
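A single gated linear neuron can be sketched to make the gating mechanism concrete. This is a minimal sketch of the published GLN construction: random halfspaces over the side information select a weight vector, which mixes base predictions in logit space and is trained with a local online rule; the task and all constants here are invented for illustration:

```python
import math
import random

random.seed(3)
EPS = 0.01

def clip(p):
    return min(1.0 - EPS, max(EPS, p))

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GLNNeuron:
    """One gated linear neuron: random halfspaces over the side
    information pick which weight vector is active; that vector
    mixes the base predictions in logit space."""

    def __init__(self, n_inputs, n_halfspaces=3, lr=0.1):
        self.lr = lr
        self.n_inputs = n_inputs
        self.planes = [[random.gauss(0, 1) for _ in range(n_inputs)]
                       for _ in range(n_halfspaces)]
        self.weights = {}  # context id -> weight vector, created lazily

    def context(self, z):
        bits = 0
        for i, plane in enumerate(self.planes):
            if sum(a * b for a, b in zip(plane, z)) >= 0:
                bits |= 1 << i
        return bits

    def predict(self, base_probs, z):
        c = self.context(z)
        w = self.weights.setdefault(c, [1.0 / self.n_inputs] * self.n_inputs)
        p = clip(sigmoid(sum(wi * logit(q) for wi, q in zip(w, base_probs))))
        return p, c

    def update(self, base_probs, z, target):
        # local rule: one gradient step of the log loss on the active weights
        p, c = self.predict(base_probs, z)
        for i in range(self.n_inputs):
            self.weights[c][i] -= self.lr * (p - target) * logit(base_probs[i])

# toy online stream: the label is 1 exactly when the first feature > 0.5
neuron = GLNNeuron(n_inputs=2)
for _ in range(2000):
    x = [random.random(), random.random()]
    z = [x[0] - 0.5, x[1] - 0.5]  # centred side information for gating
    neuron.update([clip(x[0]), clip(x[1])], z, 1.0 if x[0] > 0.5 else 0.0)

correct = 0
for _ in range(500):
    x = [random.random(), random.random()]
    p, _ = neuron.predict([clip(x[0]), clip(x[1])], [x[0] - 0.5, x[1] - 0.5])
    correct += (p > 0.5) == (x[0] > 0.5)
accuracy = correct / 500
```

A full GLN stacks layers of such neurons, each trained with the same local rule rather than backpropagation; the per-context weight storage is the mechanism the GLN literature credits for resistance to catastrophic forgetting.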
Orchestrating energy-efficient vRANs: Bayesian learning and experimental results
Virtualized base stations (vBS) can be implemented on diverse commodity platforms and are expected to bring unprecedented operational flexibility and cost efficiency to the next generation of cellular networks. However, their widespread adoption is hampered by complex configuration options that affect both their performance and their power consumption in non-traditional ways. Following an in-depth experimental analysis in a bespoke testbed, we characterize the vBS power cost profile and reveal previously unknown couplings between their various control knobs. Motivated by these findings, we develop a Bayesian learning framework for the orchestration of vBSs and design two novel algorithms: (i) BP-vRAN, which employs online learning to balance vBS performance and energy consumption, and (ii) SBP-vRAN, which augments our optimization approach with safe controls that maximize performance while respecting hard power constraints. We show that our approaches are data-efficient, i.e., they converge an order of magnitude faster than state-of-the-art Deep Reinforcement Learning methods, and achieve optimal performance. We demonstrate the efficacy of these solutions in an experimental prototype using real traffic traces.

This work has been supported by the European Commission through Grant No. 101017109 (DAEMON project), and the CERCA Programme/Generalitat de Catalunya.
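The constrained-selection idea can be loosely illustrated with a toy sketch (invented numbers, and independent Thompson sampling per configuration rather than the paper's Bayesian framework): sample performance and power from each configuration's posterior and pick the best configuration whose sampled power respects the cap:

```python
import random

random.seed(4)

# hypothetical vBS configurations: (true performance, true power draw in W)
TRUE = [(0.4, 5.0), (0.7, 8.0), (0.9, 12.0), (0.95, 18.0)]
POWER_CAP = 15.0  # hard power budget (assumed)
NOISE = 0.05      # measurement noise (assumed)

counts = [0] * len(TRUE)
perf_sums = [0.0] * len(TRUE)
power_sums = [0.0] * len(TRUE)

def sample_posterior(i):
    """Thompson sample of (performance, power) for configuration i."""
    if counts[i] == 0:
        # broad prior so every configuration gets tried early on
        return random.gauss(0.5, 1.0), random.gauss(POWER_CAP / 2, 5.0)
    n = counts[i]
    sd = NOISE / n ** 0.5
    return (random.gauss(perf_sums[i] / n, sd),
            random.gauss(power_sums[i] / n, sd))

for _ in range(400):
    samples = [sample_posterior(i) for i in range(len(TRUE))]
    # keep only configurations whose sampled power respects the cap
    feasible = [i for i, (_, pw) in enumerate(samples) if pw <= POWER_CAP]
    pick = max(feasible or range(len(TRUE)), key=lambda j: samples[j][0])
    perf, power = TRUE[pick]
    counts[pick] += 1
    perf_sums[pick] += perf + random.gauss(0, NOISE)
    power_sums[pick] += power + random.gauss(0, NOISE)
```

The learner quickly concentrates on the best configuration under the cap. Note the simplification: SBP-vRAN enforces safety during exploration, whereas this sketch merely filters on posterior samples and may violate the cap while posteriors are still broad.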