Welfare and Fairness in Multi-objective Reinforcement Learning
We study fair multi-objective reinforcement learning in which an agent must
learn a policy that simultaneously achieves high reward on multiple dimensions
of a vector-valued reward. Motivated by the fair resource allocation
literature, we model this as an expected welfare maximization problem, for some
non-linear fair welfare function of the vector of long-term cumulative rewards.
One canonical example of such a function is the Nash Social Welfare, or
geometric mean, the log transform of which is also known as the Proportional
Fairness objective. We show that optimizing the expected Nash Social Welfare, even approximately, is computationally intractable even in the tabular
case. Nevertheless, we provide a novel adaptation of Q-learning that combines
non-linear scalarized learning updates and non-stationary action selection to
learn effective policies for optimizing nonlinear welfare functions. We show
that our algorithm is provably convergent, and we demonstrate experimentally
that our approach outperforms techniques based on linear scalarization,
mixtures of optimal linear scalarizations, or stationary action selection for
the Nash Social Welfare objective.
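As a concrete illustration (a minimal sketch, not code from the paper), the Nash Social Welfare of a vector of cumulative rewards is its geometric mean, and working in log space recovers the proportional-fairness objective:

```python
import numpy as np

def nash_social_welfare(returns):
    """Geometric mean of a vector of long-term cumulative rewards.

    Its log, mean_d log(r_d), is the proportional-fairness objective
    (up to the constant 1/D factor on the sum of logs).
    """
    r = np.asarray(returns, dtype=float)
    assert np.all(r > 0), "the geometric mean assumes strictly positive returns"
    return float(np.exp(np.mean(np.log(r))))  # computed in log space for stability

# A balanced reward vector scores higher than an unbalanced one with the
# same total, which is the fairness behaviour this welfare function rewards:
print(nash_social_welfare([5.0, 5.0]))  # 5.0
print(nash_social_welfare([9.0, 1.0]))  # 3.0
```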
IMM: An Imitative Reinforcement Learning Approach with Predictive Representation Learning for Automatic Market Making
Market making (MM) has attracted significant attention in financial trading
owing to its essential function in ensuring market liquidity. With strong
capabilities in sequential decision-making, Reinforcement Learning (RL)
technology has achieved remarkable success in quantitative trading.
Nonetheless, most existing RL-based MM methods focus on optimizing single-price level strategies, which suffer from frequent order cancellations and loss of queue priority. Strategies involving multiple price levels align better with actual trading scenarios. However, because multi-price level strategies involve a comprehensive trading action space, effectively training profitable RL agents for MM remains challenging. Inspired by the
efficient workflow of professional human market makers, we propose Imitative
Market Maker (IMM), a novel RL framework leveraging both knowledge from
suboptimal signal-based experts and direct policy interactions to develop
multi-price level MM strategies efficiently. The framework starts by
introducing effective state and action representations adept at encoding
information about multi-price level orders. Furthermore, IMM integrates a
representation learning unit capable of capturing both short- and long-term
market trends to mitigate adverse selection risk. Subsequently, IMM formulates
an expert strategy based on signals and trains the agent through the
integration of RL and imitation learning techniques, leading to efficient
learning. Extensive experimental results on four real-world market datasets
demonstrate that IMM outperforms current RL-based market making strategies in
terms of several financial criteria. The findings of the ablation study
substantiate the effectiveness of the model components.
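The abstract does not spell out IMM's training objective; as a hedged sketch of the general pattern it describes, blending an RL term with imitation of a suboptimal signal-based expert, one might write the following, where all names, shapes, and the weighting scheme are assumptions rather than IMM's actual loss:

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, actions, advantages, expert_actions, beta=0.5):
    """Hypothetical blend of a policy-gradient term and an imitation term.

    logits:         (B, A) policy logits over the trading action space
    actions:        (B,)   actions sampled from the current policy
    advantages:     (B,)   advantage estimates for those actions
    expert_actions: (B,)   actions the signal-based expert would take
    beta:           assumed weight on the imitation (behavior-cloning) term
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # RL term: raise the log-probability of high-advantage actions.
    pg = -(log_probs.gather(1, actions.unsqueeze(1)).squeeze(1) * advantages).mean()
    # Imitation term: cross-entropy toward the expert's actions.
    il = F.cross_entropy(logits, expert_actions)
    return pg + beta * il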
Comparing policy gradient and value function based reinforcement learning methods in simulated electrical power trade
In electrical power engineering, reinforcement learning algorithms can be used to model the strategies of electricity market participants. However, traditional value-function-based reinforcement learning algorithms suffer from convergence issues when used with value function approximators. Function approximation is required in this domain to capture the characteristics of the complex and continuous multivariate problem space. The contribution of this paper is the comparison of policy gradient reinforcement learning methods, using artificial neural networks for policy function approximation, with traditional value-function-based methods in simulations of electricity trade. The methods are compared using an AC optimal power flow based power exchange auction market model and a reference electric power system model.
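For context on the policy-gradient side of the comparison, a minimal REINFORCE update with a neural-network policy, a generic sketch rather than the paper's implementation, could look like:

```python
import torch

def reinforce_update(policy, optimizer, states, actions, returns):
    """One REINFORCE step: ascend the gradient of E[log pi(a|s) * G].

    states:  (T, obs_dim) float tensor of observations from one episode
    actions: (T,) long tensor of actions taken
    returns: (T,) float tensor of discounted returns-to-go
    """
    log_probs = torch.log_softmax(policy(states), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = -(chosen * returns).mean()  # negative expected return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```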
Model Learning for Look-ahead Exploration in Continuous Control
We propose an exploration method that incorporates look-ahead search over
basic learnt skills and their dynamics, and use it for reinforcement learning
(RL) of manipulation policies. Our skills are multi-goal policies learned in isolation in simpler environments using existing multi-goal RL formulations, analogous to options or macro-actions. Coarse skill dynamics, i.e., the state
transition caused by a (complete) skill execution, are learnt and are unrolled
forward during look-ahead search. Policy search benefits from temporal abstraction during exploration while itself operating over low-level primitive actions; thus the resulting policies do not suffer from the suboptimality and inflexibility caused by coarse skill chaining. We show that the proposed exploration strategy results in effective learning of complex manipulation policies faster than current state-of-the-art RL methods, and converges to better policies than methods that use options or parameterized skills as building blocks of the policy itself, as opposed to guiding exploration.
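As a rough illustration of the mechanism described (not the authors' code), unrolling learned coarse skill dynamics in a depth-limited look-ahead to pick the first skill might be sketched as follows, where the model and value-function interfaces are assumptions:

```python
import numpy as np

def best_first_skill(state, skill_models, value_fn, depth=2):
    """Depth-limited look-ahead search over learned coarse skill dynamics.

    skill_models: one callable per skill, mapping a state to the predicted
                  state after a complete execution of that skill
    value_fn:     heuristic value of a state, evaluated at the horizon
    Returns the index of the first skill on the best unrolled sequence.
    """
    def search(s, d):
        if d == 0:
            return value_fn(s)
        # Expand every skill's predicted outcome and keep the best branch.
        return max(search(m(s), d - 1) for m in skill_models)

    scores = [search(m(state), depth - 1) for m in skill_models]
    return int(np.argmax(scores))
```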