Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods
The literature on Inverse Reinforcement Learning (IRL) typically assumes that
humans take actions in order to minimize the expected value of a cost function,
i.e., that humans are risk neutral. Yet, in practice, humans are often far from
being risk neutral. To fill this gap, this paper devises a framework for
risk-sensitive IRL that explicitly accounts for a human's risk sensitivity. To
this end, we propose a flexible class of models based on
coherent risk measures, which allow us to capture an entire spectrum of risk
preferences from risk-neutral to worst-case. We propose efficient
non-parametric algorithms based on linear programming and semi-parametric
algorithms based on maximum likelihood for inferring a human's underlying risk
measure and cost function for a rich class of static and dynamic
decision-making settings. The resulting approach is demonstrated on a simulated
driving game with ten human participants. Our method is able to infer and mimic
a wide range of qualitatively different driving styles from highly risk-averse
to risk-neutral in a data-efficient manner. Moreover, comparisons of the
Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL
framework more accurately captures observed participant behavior both
qualitatively and quantitatively, especially in scenarios where catastrophic
outcomes such as collisions can occur.
Comment: Submitted to International Journal of Robotics Research; Revision 1:
(i) Clarified minor technical points; (ii) Revised proof for Theorem 3 to
hold under weaker assumptions; (iii) Added figures and expanded
discussions to improve readability
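The abstract does not say which coherent risk measures the framework uses. As a minimal illustration of how one coherent risk measure can sweep from risk-neutral to worst-case, the sketch below computes Conditional Value-at-Risk (CVaR) of a discrete cost distribution through its dual form: maximize the expected cost over reweighted distributions q with q_i <= p_i / alpha. The choice of CVaR, the function name `cvar`, and the example numbers are assumptions for illustration, not the paper's method.

```python
def cvar(costs, probs, alpha):
    """CVaR_alpha of a discrete cost distribution (illustrative sketch).

    Dual form: the largest expected cost over distributions q with
    q_i <= probs_i / alpha and sum(q) == 1. Greedily putting the allowed
    weight on the highest costs solves that maximization.
    alpha = 1 gives the ordinary expectation (risk-neutral);
    alpha -> 0 approaches the worst-case cost.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    remaining = alpha  # probability mass still to be assigned to the tail
    total = 0.0
    # Visit outcomes from highest to lowest cost.
    for cost, prob in sorted(zip(costs, probs), reverse=True):
        weight = min(prob, remaining)
        total += weight * cost
        remaining -= weight
        if remaining <= 0.0:
            break
    return total / alpha


if __name__ == "__main__":
    # Hypothetical outcomes: no incident, near miss, collision.
    costs = [0.0, 10.0, 100.0]
    probs = [0.7, 0.25, 0.05]
    for alpha in (1.0, 0.3, 0.05):
        print(alpha, cvar(costs, probs, alpha))
```

Up to floating-point rounding, the three calls return 7.5 (the plain expectation), 25.0 and 100.0 (the collision cost alone). This shows how lowering alpha moves a single coherent measure toward the worst-case end of the spectrum.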
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks have substantial potential to support a broad range
of complex, compelling applications in both military and civilian fields, where
users can enjoy high-rate, low-latency, low-cost and reliable information
services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have achieved great success in supporting big data
analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their use in compelling applications of
wireless networks, including heterogeneous networks (HetNets), cognitive radios
(CR), the Internet of Things (IoT), machine-to-machine (M2M) networks, and so
on. This article aims to help readers understand the motivation and methodology
of the various ML algorithms, so that they can apply them to hitherto unexplored
services and scenarios of future wireless networks.
Comment: 46 pages, 22 figures
Multi-user Resource Control with Deep Reinforcement Learning in IoT Edge Computing
By leveraging mobile edge computing (MEC), the massive amounts of data
generated by large numbers of Internet of Things (IoT) devices can be
offloaded to an MEC server at the edge of the wireless network for further
computationally intensive processing. However, because of the resource
constraints of IoT devices and the wireless network, both communication and
computation resources must be allocated and scheduled efficiently for better
system performance. In this paper, we propose a joint computation offloading and
multi-user scheduling algorithm for an IoT edge computing system that minimizes
the long-term average weighted sum of delay and power consumption under
stochastic traffic arrivals. We formulate the dynamic optimization problem as an
infinite-horizon average-reward continuous-time Markov decision process (CTMDP)
model. One critical challenge in solving this MDP problem for the multi-user
resource control is the curse-of-dimensionality problem, where the state space
of the MDP model and the computation complexity increase exponentially with the
growing number of users or IoT devices. To overcome this challenge, we
use deep reinforcement learning (RL) techniques and propose a neural
network architecture to approximate the value functions of the post-decision
system states. The proposed algorithm for the CTMDP supports a
semi-distributed, auction-based implementation in which the IoT devices submit
bids to the base station (BS), which makes the resource control decisions
centrally. Simulation results show that the proposed algorithm provides
significant performance improvements over the baseline algorithms and also
outperforms RL algorithms based on other neural network architectures.
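To make the curse of dimensionality concrete, the sketch below counts joint MDP states under a deliberately simple, hypothetical model. Each device's local state is a (queue length, channel state) pair, with queue lengths 0..max_queue and `num_channel_states` channel levels, and the system state is the Cartesian product over devices. The function name and parameters are illustrative assumptions, not the paper's state definition.

```python
def joint_state_count(num_devices: int, max_queue: int, num_channel_states: int) -> int:
    """Number of joint states in a hypothetical multi-device MDP.

    Assumed model: each device has a queue length in 0..max_queue and one of
    num_channel_states channel levels; the joint state is the Cartesian
    product of all per-device states, so the count is exponential in
    num_devices.
    """
    per_device = (max_queue + 1) * num_channel_states
    return per_device ** num_devices


if __name__ == "__main__":
    # Illustrative parameters: queues of up to 10 packets, 4 channel levels.
    for n in (1, 2, 5, 10):
        print(n, joint_state_count(n, max_queue=10, num_channel_states=4))
```

With these assumed parameters, one device has 44 local states, but five devices already give 44^5 = 164,916,224 joint states. Exact tabular value iteration becomes impractical at that scale, which motivates approximating the value functions with a neural network.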