Approximate Convex Optimization by Online Game Playing
Lagrangian relaxation and approximate optimization algorithms have received
much attention in the last two decades. Typically, the running time of these
methods to obtain an $\varepsilon$-approximate solution is proportional to
$1/\varepsilon^2$. Recently, Bienstock and Iyengar, following Nesterov,
gave an algorithm for fractional packing linear programs which runs in
$O(1/\varepsilon)$ iterations. The latter algorithm requires solving a
convex quadratic program in every iteration, an optimization subroutine which
dominates the theoretical running time.
We give an algorithm for convex programs with strictly convex constraints
which runs in time proportional to $1/\varepsilon$. The algorithm does NOT
require solving any quadratic program, but uses only gradient steps and
elementary operations. Problems with strictly convex constraints include
maximum entropy frequency estimation, portfolio optimization with loss risk
constraints, and various computational problems in signal processing.
As a side product, we also obtain a simpler version of Bienstock and
Iyengar's result for general linear programming, with similar running time.
We derive these algorithms using a new framework for deriving convex
optimization algorithms from online game playing algorithms, which may be of
independent interest.
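The framework pits a primal player taking gradient steps against a dual player reweighting the constraints; the time-average of the primal iterates is approximately feasible. Below is a minimal sketch of that generic game-playing template, assuming Euclidean-ball feasibility; it realizes the standard $1/\varepsilon^2$ rate, not the paper's accelerated rate for strictly convex constraints, and all function names and parameters are illustrative rather than the paper's.

```python
import numpy as np

def approx_feasibility(fs, grads, x0, radius, T, eta_x, eta_w):
    """Generic online-game sketch for convex feasibility: find x with
    f_i(x) small for all i. A primal player runs online gradient descent
    against a dual player running multiplicative weights over the
    constraints. This is the standard 1/eps^2 template, not the paper's
    accelerated variant for strictly convex constraints."""
    m = len(fs)
    x = x0.copy()
    w = np.ones(m) / m            # dual weights over constraints
    avg = np.zeros_like(x0)
    for _ in range(T):
        # primal player: gradient step on the weighted constraint
        g = sum(wi * gi(x) for wi, gi in zip(w, grads))
        x = x - eta_x * g
        # project back onto the Euclidean ball of the given radius
        n = np.linalg.norm(x)
        if n > radius:
            x *= radius / n
        # dual player: multiplicative weights on constraint violations
        losses = np.array([fi(x) for fi in fs])
        w *= np.exp(eta_w * losses)
        w /= w.sum()
        avg += x
    return avg / T                # average iterate is approximately feasible
```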
An Adversarial Interpretation of Information-Theoretic Bounded Rationality
Recently, there has been a growing interest in modeling planning with
information constraints. Accordingly, an agent maximizes a regularized expected
utility known as the free energy, where the regularizer is given by the
information divergence from a prior to a posterior policy. While this approach
can be justified in various ways, including from statistical mechanics and
information theory, it is still unclear how it relates to decision-making
against adversarial environments. This connection has previously been suggested
in work relating the free energy to risk-sensitive control and to extensive
form games. Here, we show that a single-agent free energy optimization is
equivalent to a game between the agent and an imaginary adversary. The
adversary can, by paying an exponential penalty, generate costs that diminish
the decision maker's payoffs. It turns out that the optimal strategy of the
adversary consists in choosing costs so as to render the decision maker
indifferent among its choices, which is a defining property of a Nash
equilibrium, thus tightening the connection between free energy optimization
and game theory.
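Concretely, the free energy $F[\pi] = \mathbb{E}_\pi[U] - \frac{1}{\beta}\,\mathrm{KL}(\pi \,\|\, \pi_0)$ is maximized by a Gibbs posterior, and its optimal value is a scaled log-partition function. A minimal numerical sketch (variable names are illustrative, not the paper's):

```python
import numpy as np

def free_energy_policy(prior, utility, beta):
    """Gibbs posterior maximizing E[U] - (1/beta) * KL(pi || prior).
    Returns the optimal policy pi*(a) ∝ prior(a) * exp(beta * U(a))
    and the optimal free energy, which equals
    (1/beta) * log sum_a prior(a) * exp(beta * U(a))."""
    logits = np.log(prior) + beta * np.asarray(utility, dtype=float)
    m = logits.max()
    pi = np.exp(logits - m)        # shift for numerical stability
    Z = pi.sum()
    return pi / Z, (m + np.log(Z)) / beta

# At the optimum, the paper's imaginary adversary chooses costs that leave
# the agent indifferent among the actions it plays with positive probability.
pi, F = free_energy_policy(prior=np.ones(3) / 3, utility=[1.0, 0.5, 0.0], beta=2.0)
```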
Learning To Play The Trading Game
Can we train a stock trading bot that can take decisions in high-entropy environments like stock markets to generate profits based on some optimal policy? Can we further extend this learning to any general trading problem? Quantitative algorithms are responsible for more than 75% of the stock trading around the world. Creating a stock market prediction model is comparatively easy, but creating a profitable prediction model is still considered a challenging task in machine learning and deep learning due to the unpredictability of financial markets. Using the biologically inspired computing techniques of reinforcement learning (RL) and artificial neural networks (ANN), this project attempts to train an agent that takes decisions based on the optimal decision policies it has learned. Different existing RL techniques and slightly modified variants of them are used to train the agent, and the trained model is then tested against different stock prices and stock portfolio settings to see whether the agent has learned the rules of the game and can act optimally irrespective of the testing data provided. This work aims to provide general users with simple recommendations about possible investment decisions for selected stocks in a portfolio. Results of the implemented approaches do work somewhat well on specific periods of stock market time series, but they are observed to be fragile: selected strategies neither guarantee similar results on all historical time periods, nor are they guaranteed to provide exceptional performance on unpredictable future stock market time-series data.
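As a deliberately toy illustration of the kind of RL loop described, here is a tabular Q-learning trader on a one-dimensional price series; the state and reward design are assumptions made for the sketch, not the project's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_q_trader(prices, episodes=200, alpha=0.1, gamma=0.95, eps=0.1):
    """Toy tabular Q-learning trader. State: sign of the last price
    move (down=0, up=1); actions: 0=hold, 1=long. Reward: the next-step
    return when long, zero otherwise."""
    returns = np.diff(prices) / prices[:-1]
    Q = np.zeros((2, 2))                      # Q[state, action]
    for _ in range(episodes):
        for t in range(1, len(returns)):
            s = int(returns[t - 1] > 0)
            # epsilon-greedy action selection
            a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
            r = returns[t] if a == 1 else 0.0
            s2 = int(returns[t] > 0)
            # standard Q-learning temporal-difference update
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    return Q
```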
Deep learning for video game playing
In this article, we review recent Deep Learning advances in the context of
how they have been applied to play different types of video games such as
first-person shooters, arcade games, and real-time strategy games. We analyze
the unique requirements that different game genres pose to a deep learning
system and highlight important open challenges in the context of applying these
machine learning methods to video games, such as general game playing, dealing
with extremely large decision spaces and sparse rewards.
Competitive portfolio selection using stochastic predictions
We study a portfolio selection problem where a player attempts to maximise a utility function that represents the growth rate of wealth. We show that, given some stochastic predictions of the asset prices in the next time step, a sublinear expected regret is attainable against an optimal greedy algorithm, subject to a tradeoff against the "accuracy" of such predictions, which learn (or improve) over time. We also study the effects of introducing transaction costs into the model.
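For intuition, the greedy benchmark picks the next-step portfolio that maximises expected log wealth (the growth rate) under the predicted distribution. A sketch under assumed inputs (an array of sampled predicted price relatives; all names are illustrative), using exponentiated-gradient ascent over the simplex:

```python
import numpy as np

def greedy_log_wealth(pred_relatives, steps=200, eta=0.05):
    """Greedy next-step portfolio: maximise E[log(b . x)] over the
    probability simplex, where pred_relatives is an (m, n) array of m
    sampled predictions of the price relatives of n assets."""
    m, n = pred_relatives.shape
    b = np.ones(n) / n                                # start uniform
    for _ in range(steps):
        w = pred_relatives @ b                        # wealth factor per sample
        grad = (pred_relatives / w[:, None]).mean(0)  # grad of E[log(b . x)]
        b *= np.exp(eta * grad)                       # multiplicative update
        b /= b.sum()                                  # stay on the simplex
    return b
```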
Affinity-Based Reinforcement Learning : A New Paradigm for Agent Interpretability
The steady increase in complexity of reinforcement learning (RL) algorithms is accompanied by a corresponding increase in opacity that obfuscates insights into their devised strategies. Methods in explainable artificial intelligence seek to mitigate this opacity by either creating transparent algorithms or extracting explanations post hoc. A third category exists that allows the developer to affect what agents learn: constrained RL has been used in safety-critical applications and prohibits agents from visiting certain states; preference-based RL agents have been used in robotics applications and learn state-action preferences instead of traditional reward functions. We propose a new affinity-based RL paradigm in which agents learn strategies that are partially decoupled from reward functions. Unlike entropy regularisation, we regularise the objective function with a distinct action distribution that represents a desired behaviour; we encourage the agent to act according to a prior while learning to maximise rewards. The result is an inherently interpretable agent that solves problems with an intrinsic affinity for certain actions. We demonstrate the utility of our method in a financial application: we learn continuous time-variant compositions of prototypical policies, each interpretable by its action affinities, that are globally interpretable according to customers’ financial personalities.
Our method combines advantages from both constrained RL and preference-based RL: it retains the reward function but generalises the policy to match a defined behaviour, thus avoiding problems such as reward shaping and hacking. Unlike Boolean task composition, our method is a fuzzy superposition of different prototypical strategies to arrive at a more complex, yet interpretable, strategy.
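A minimal sketch of the affinity-regularisation idea, assuming a softmax policy over discrete actions (the function names and the REINFORCE-style surrogate are illustrative, not the paper's implementation):

```python
import numpy as np

def affinity_regularised_loss(logits, actions, returns, affinity, lam):
    """Surrogate loss for: maximise E[R] - lam * KL(pi || affinity),
    where `affinity` is a fixed, interpretable action distribution
    encoding the desired behaviour. Unlike entropy regularisation,
    the prior need not be uniform."""
    z = logits - logits.max(axis=1, keepdims=True)    # stabilise softmax
    pi = np.exp(z)
    pi /= pi.sum(axis=1, keepdims=True)
    logp = np.log(pi[np.arange(len(actions)), actions])
    policy_term = -(returns * logp).mean()            # REINFORCE-style term
    kl = (pi * (np.log(pi) - np.log(affinity))).sum(axis=1).mean()
    return policy_term + lam * kl                     # minimise this loss
```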