Adversarial Imitation Learning from Incomplete Demonstrations
Imitation learning aims to derive a mapping from states to actions, a.k.a. a
policy, from expert demonstrations. Existing methods for imitation learning
typically require all actions in the demonstrations to be fully available,
which is hard to ensure in real applications. Though algorithms for learning
with unobservable actions have been proposed, they focus solely on state
information and overlook the fact that the action sequence could still be
partially available and provide useful information for deriving the policy. In this
paper, we propose a novel algorithm called Action-Guided Adversarial Imitation
Learning (AGAIL) that learns a policy from demonstrations with incomplete
action sequences, i.e., incomplete demonstrations. The core idea of AGAIL is to
separate demonstrations into state and action trajectories, and train a policy
with state trajectories while using actions as auxiliary information to guide
the training whenever applicable. Built upon Generative Adversarial
Imitation Learning (GAIL), AGAIL has three components: a generator, a discriminator,
and a guide. The generator learns a policy with rewards provided by the
discriminator, which tries to distinguish state distributions between
demonstrations and samples generated by the policy. The guide provides
additional rewards to the generator when demonstrated actions for specific
states are available. We compare AGAIL to other methods on benchmark tasks and
show that AGAIL consistently delivers comparable performance to the
state-of-the-art methods even when the action sequence in demonstrations is
only partially available.
Comment: Accepted to International Joint Conference on Artificial Intelligence
(IJCAI-19
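The reward structure described in the abstract (a discriminator reward on states, plus a guide reward only where a demonstrated action exists) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `-log(1 - D(s))` reward form, the log-likelihood guide term, and the `guide_weight` parameter are all assumptions chosen for illustration.

```python
import math
from typing import Optional


def discriminator_reward(d_state: float, eps: float = 1e-8) -> float:
    """Reward from D(s), the discriminator's estimated probability that
    state s comes from the demonstrations (assumed reward form)."""
    return -math.log(max(1.0 - d_state, eps))


def guide_reward(policy_action_prob: Optional[float], eps: float = 1e-8) -> float:
    """Hypothetical guide term: log-probability the policy assigns to the
    demonstrated action. Returns 0 when no action was recorded."""
    if policy_action_prob is None:
        return 0.0
    return math.log(max(policy_action_prob, eps))


def combined_reward(d_state: float,
                    policy_action_prob: Optional[float] = None,
                    guide_weight: float = 0.5) -> float:
    """Discriminator reward plus a weighted guide reward when available."""
    return discriminator_reward(d_state) + guide_weight * guide_reward(policy_action_prob)
```

For a state with no demonstrated action, `combined_reward(0.5)` reduces to the discriminator term alone; passing `policy_action_prob` adds the guide term for states where the action is known.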
Optimal Online Transmission Policy for Energy-Constrained Wireless-Powered Communication Networks
This work considers the design of online transmission policy in a
wireless-powered communication system with a given energy budget. The system
design objective is to maximize the long-term throughput of the system
exploiting the energy storage capability at the wireless-powered node. We
formulate the design problem as a constrained Markov decision process (CMDP)
problem and obtain the optimal policy of transmit power and time allocation in
each fading block via the Lagrangian approach. To investigate the system
performance in different scenarios, numerical simulations are conducted with
various system parameters. Our simulation results show that the optimal policy
significantly outperforms a myopic policy which only maximizes the throughput
in the current fading block. Moreover, the optimal allocation of transmit power
and time is shown to be insensitive to the change of modulation and coding
schemes, which facilitates its practical implementation.
Comment: 7 pages, accepted by ICC 2019. An extended version of this paper is
accepted by IEEE TW
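As a rough illustration of the Lagrangian approach to a CMDP mentioned in the abstract, the generic form can be written as below. The symbols are assumptions, not taken from the paper: $r$ is the per-block throughput, $c$ the per-block energy use, $\bar{E}$ the energy budget (assumed here to be a long-term average constraint), and $\lambda$ the Lagrange multiplier.

```latex
\begin{aligned}
&\max_{\pi} \; J_r(\pi) = \liminf_{T\to\infty} \frac{1}{T}\,
   \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T} r(s_t, a_t)\right]
 \quad \text{s.t.} \quad
 J_c(\pi) = \limsup_{T\to\infty} \frac{1}{T}\,
   \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T} c(s_t, a_t)\right] \le \bar{E}, \\[4pt]
&L(\pi, \lambda) = J_r(\pi) - \lambda \left( J_c(\pi) - \bar{E} \right),
 \qquad \lambda \ge 0 .
\end{aligned}
```

For a fixed $\lambda$, maximizing $L$ over $\pi$ is an unconstrained MDP with per-block reward $r - \lambda c$; $\lambda$ is then adjusted until the energy constraint is met. A myopic policy, by contrast, maximizes only $r(s_t, a_t)$ in the current block.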