Approximate Convex Optimization by Online Game Playing
Lagrangian relaxation and approximate optimization algorithms have received
much attention in the last two decades. Typically, the running time of these
methods to obtain an $\varepsilon$-approximate solution is proportional to
$1/\varepsilon^2$. Recently, Bienstock and Iyengar, following Nesterov,
gave an algorithm for fractional packing linear programs which runs in
$O(1/\varepsilon)$ iterations. The latter algorithm requires solving a
convex quadratic program in every iteration, an optimization subroutine which
dominates the theoretical running time.
We give an algorithm for convex programs with strictly convex constraints
which runs in time proportional to $1/\varepsilon$. The algorithm does NOT
require solving any quadratic program, but uses only gradient steps and
elementary operations. Problems with strictly convex constraints include
maximum entropy frequency estimation, portfolio optimization with loss risk
constraints, and various computational problems in signal processing.
As a side product, we also obtain a simpler version of Bienstock and
Iyengar's result for general linear programming, with similar running time.
We derive these algorithms using a new framework for deriving convex
optimization algorithms from online game playing algorithms, which may be of
independent interest.
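The framework pits a primal player taking gradient steps against a dual player reweighting the constraints; the time-average of the primal iterates is approximately feasible. Below is a minimal sketch of that generic game-playing template, assuming Euclidean-ball feasibility; it realizes the standard $1/\varepsilon^2$ rate, not the paper's accelerated rate for strictly convex constraints, and all function names and parameters are illustrative rather than the paper's.

```python
import numpy as np

def approx_feasibility(fs, grads, x0, radius, T, eta_x, eta_w):
    """Generic online-game sketch for convex feasibility: find x with
    f_i(x) small for all i. A primal player runs online gradient descent
    against a dual player running multiplicative weights over the
    constraints. This is the standard 1/eps^2 template, not the paper's
    accelerated variant for strictly convex constraints."""
    m = len(fs)
    x = x0.copy()
    w = np.ones(m) / m            # dual weights over constraints
    avg = np.zeros_like(x0)
    for _ in range(T):
        # primal player: gradient step on the weighted constraint
        g = sum(wi * gi(x) for wi, gi in zip(w, grads))
        x = x - eta_x * g
        # project back onto the Euclidean ball of the given radius
        n = np.linalg.norm(x)
        if n > radius:
            x *= radius / n
        # dual player: multiplicative weights on constraint violations
        losses = np.array([fi(x) for fi in fs])
        w *= np.exp(eta_w * losses)
        w /= w.sum()
        avg += x
    return avg / T                # average iterate is approximately feasible
```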
An Adversarial Interpretation of Information-Theoretic Bounded Rationality
Recently, there has been a growing interest in modeling planning with
information constraints. Accordingly, an agent maximizes a regularized expected
utility known as the free energy, where the regularizer is given by the
information divergence from a prior to a posterior policy. While this approach
can be justified in various ways, including from statistical mechanics and
information theory, it is still unclear how it relates to decision-making
against adversarial environments. This connection has previously been suggested
in work relating the free energy to risk-sensitive control and to extensive
form games. Here, we show that a single-agent free energy optimization is
equivalent to a game between the agent and an imaginary adversary. The
adversary can, by paying an exponential penalty, generate costs that diminish
the decision maker's payoffs. It turns out that the optimal strategy of the
adversary consists in choosing costs so as to render the decision maker
indifferent among its choices, which is a defining property of a Nash
equilibrium, thus tightening the connection between free energy optimization
and game theory.
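Concretely, the free energy $F[\pi] = \mathbb{E}_\pi[U] - \frac{1}{\beta}\,\mathrm{KL}(\pi \,\|\, \pi_0)$ is maximized by a Gibbs posterior, and its optimal value is a scaled log-partition function. A minimal numerical sketch (variable names are illustrative, not the paper's):

```python
import numpy as np

def free_energy_policy(prior, utility, beta):
    """Gibbs posterior maximizing E[U] - (1/beta) * KL(pi || prior).
    Returns the optimal policy pi*(a) ∝ prior(a) * exp(beta * U(a))
    and the optimal free energy, which equals
    (1/beta) * log sum_a prior(a) * exp(beta * U(a))."""
    logits = np.log(prior) + beta * np.asarray(utility, dtype=float)
    m = logits.max()
    pi = np.exp(logits - m)        # shift for numerical stability
    Z = pi.sum()
    return pi / Z, (m + np.log(Z)) / beta

# At the optimum, the paper's imaginary adversary chooses costs that leave
# the agent indifferent among the actions it plays with positive probability.
pi, F = free_energy_policy(prior=np.ones(3) / 3, utility=[1.0, 0.5, 0.0], beta=2.0)
```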
Learning To Play The Trading Game
Can we train a stock trading bot that can take decisions in high-entropy environments like stock markets to generate profits based on some optimal policy? Can we further extend this learning to any general trading problem? Quantitative algorithms are responsible for more than 75% of the stock trading around the world. Creating a stock market prediction model is comparatively easy, but creating a profitable prediction model is still considered a challenging task in machine learning and deep learning due to the unpredictability of financial markets. Using the biologically inspired computing techniques of reinforcement learning (RL) and artificial neural networks (ANN), this project attempts to train an agent that takes decisions based on the optimal decision policies it has learned. Different existing RL techniques and slightly modified variants of them are used to train the agent, and the trained model is then tested against different stock prices and stock portfolio settings to see whether the agent has learned the rules of the game and can act optimally irrespective of the testing data provided. This work aims to provide general users with simple recommendations about possible investment decisions for selected stocks in a portfolio. Results of the implemented approaches do work somewhat well on specific periods of stock market time series, but they are observed to be fragile: selected strategies neither guarantee similar results on all historical time periods, nor are they guaranteed to provide exceptional performance on unpredictable future stock market time-series data.
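As a deliberately toy illustration of the kind of RL loop described, here is a tabular Q-learning trader on a one-dimensional price series; the state and reward design are assumptions made for the sketch, not the project's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_q_trader(prices, episodes=200, alpha=0.1, gamma=0.95, eps=0.1):
    """Toy tabular Q-learning trader. State: sign of the last price
    move (down=0, up=1); actions: 0=hold, 1=long. Reward: the next-step
    return when long, zero otherwise."""
    returns = np.diff(prices) / prices[:-1]
    Q = np.zeros((2, 2))                      # Q[state, action]
    for _ in range(episodes):
        for t in range(1, len(returns)):
            s = int(returns[t - 1] > 0)
            # epsilon-greedy action selection
            a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
            r = returns[t] if a == 1 else 0.0
            s2 = int(returns[t] > 0)
            # standard Q-learning temporal-difference update
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    return Q
```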
Deep learning for video game playing
In this article, we review recent Deep Learning advances in the context of
how they have been applied to play different types of video games such as
first-person shooters, arcade games, and real-time strategy games. We analyze
the unique requirements that different game genres pose to a deep learning
system and highlight important open challenges in the context of applying these
machine learning methods to video games, such as general game playing, dealing
with extremely large decision spaces and sparse rewards.
Competitive portfolio selection using stochastic predictions
We study a portfolio selection problem where a player attempts to maximise a utility function that represents the growth rate of wealth. We show that, given some stochastic predictions of the asset prices in the next time step, a sublinear expected regret is attainable against an optimal greedy algorithm, subject to a tradeoff against the "accuracy" of such predictions, which learn (or improve) over time. We also study the effects of introducing transaction costs into the model.
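For intuition, the greedy benchmark picks the next-step portfolio that maximises expected log wealth (the growth rate) under the predicted distribution. A sketch under assumed inputs (an array of sampled predicted price relatives; all names are illustrative), using exponentiated-gradient ascent over the simplex:

```python
import numpy as np

def greedy_log_wealth(pred_relatives, steps=200, eta=0.05):
    """Greedy next-step portfolio: maximise E[log(b . x)] over the
    probability simplex, where pred_relatives is an (m, n) array of m
    sampled predictions of the price relatives of n assets."""
    m, n = pred_relatives.shape
    b = np.ones(n) / n                                # start uniform
    for _ in range(steps):
        w = pred_relatives @ b                        # wealth factor per sample
        grad = (pred_relatives / w[:, None]).mean(0)  # grad of E[log(b . x)]
        b *= np.exp(eta * grad)                       # multiplicative update
        b /= b.sum()                                  # stay on the simplex
    return b
```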
Affinity-Based Reinforcement Learning : A New Paradigm for Agent Interpretability
The steady increase in complexity of reinforcement learning (RL) algorithms is accompanied by a corresponding increase in opacity that obfuscates insights into their devised strategies. Methods in explainable artificial intelligence seek to mitigate this opacity by either creating transparent algorithms or extracting explanations post hoc. A third category exists that allows the developer to affect what agents learn: constrained RL has been used in safety-critical applications and prohibits agents from visiting certain states; preference-based RL agents have been used in robotics applications and learn state-action preferences instead of traditional reward functions. We propose a new affinity-based RL paradigm in which agents learn strategies that are partially decoupled from reward functions. Unlike entropy regularisation, we regularise the objective function with a distinct action distribution that represents a desired behaviour; we encourage the agent to act according to a prior while learning to maximise rewards. The result is an inherently interpretable agent that solves problems with an intrinsic affinity for certain actions. We demonstrate the utility of our method in a financial application: we learn continuous time-variant compositions of prototypical policies, each interpretable by its action affinities, that are globally interpretable according to customers’ financial personalities.
Our method combines advantages from both constrained RL and preference-based RL: it retains the reward function but generalises the policy to match a defined behaviour, thus avoiding problems such as reward shaping and hacking. Unlike Boolean task composition, our method is a fuzzy superposition of different prototypical strategies to arrive at a more complex, yet interpretable, strategy.
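A minimal sketch of the affinity-regularisation idea, assuming a softmax policy over discrete actions (the function names and the REINFORCE-style surrogate are illustrative, not the paper's implementation):

```python
import numpy as np

def affinity_regularised_loss(logits, actions, returns, affinity, lam):
    """Surrogate loss for: maximise E[R] - lam * KL(pi || affinity),
    where `affinity` is a fixed, interpretable action distribution
    encoding the desired behaviour. Unlike entropy regularisation,
    the prior need not be uniform."""
    z = logits - logits.max(axis=1, keepdims=True)    # stabilise softmax
    pi = np.exp(z)
    pi /= pi.sum(axis=1, keepdims=True)
    logp = np.log(pi[np.arange(len(actions)), actions])
    policy_term = -(returns * logp).mean()            # REINFORCE-style term
    kl = (pi * (np.log(pi) - np.log(affinity))).sum(axis=1).mean()
    return policy_term + lam * kl                     # minimise this loss
```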