TextGAIL: Generative Adversarial Imitation Learning for Text Generation
Generative Adversarial Networks (GANs) for text generation have recently
received many criticisms, as they perform worse than their MLE counterparts. We
suspect previous text GANs' inferior performance is due to the lack of a
reliable guiding signal in their discriminators. To address this problem, we
propose a generative adversarial imitation learning framework for text
generation that uses large pre-trained language models to provide more reliable
reward guidance. Our approach uses contrastive discriminator, and proximal
policy optimization (PPO) to stabilize and improve text generation performance.
For evaluation, we conduct experiments on a diverse set of unconditional and
conditional text generation tasks. Experimental results show that TextGAIL
achieves better performance in terms of both quality and diversity than the MLE
baseline. We also validate our intuition that TextGAIL's discriminator is
capable of providing reasonable rewards through an additional task.
Comment: AAAI 202
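The two ingredients named in the abstract, a contrastive reward and a PPO-style update, can be illustrated with a toy sketch. This is a hypothetical formulation for illustration only, not the paper's implementation; the scoring inputs and function names are assumptions:

```python
import math

# Toy sketch: a contrastive discriminator compares a generated text's score
# against a reference text's score; the softmax-normalized probability that
# the generated sample "wins" serves as the reward signal.
def contrastive_reward(score_generated, score_reference):
    m = max(score_generated, score_reference)  # subtract max for numerical stability
    e_g = math.exp(score_generated - m)
    e_r = math.exp(score_reference - m)
    return e_g / (e_g + e_r)

# Standard PPO clipped surrogate objective: min(r*A, clip(r, 1-eps, 1+eps)*A),
# which bounds how far a single update can move the policy.
def ppo_clipped_objective(ratio, advantage, eps=0.2):
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)
```

With equal scores the reward is 0.5 (the discriminator cannot tell generated from reference), and the clip keeps large policy-ratio updates bounded, which is the stabilizing effect the abstract attributes to PPO.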
Stateful Memory-Augmented Transformers for Dialogue Modeling
Transformer encoder-decoder models have shown impressive performance in
dialogue modeling. However, as Transformers are inefficient in processing long
sequences, dialogue history length often needs to be truncated. To address this
problem, we propose a new memory-augmented Transformer that is compatible with
existing pre-trained encoder-decoder models and enables efficient preservation
of history information. It incorporates a separate memory module alongside the
pre-trained Transformer to effectively interchange information between the
memory states and the current input context. We evaluate our model on three
dialogue datasets and two language modeling datasets. Experimental results show
that our method achieves superior efficiency and performance compared to
other pre-trained Transformer baselines.
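The memory-module idea can be illustrated with a toy, non-learned sketch: a fixed number of memory slots summarizes past dialogue turns, so the model consumes only the memory plus the current turn rather than the full history. Mean pooling here stands in for the model's learned read/write mechanism, and all names are hypothetical:

```python
# Toy sketch of a fixed-size turn memory (assumption: mean pooling replaces
# the learned attention-based interchange described in the abstract).
class TurnMemory:
    def __init__(self, slots=4, dim=3):
        self.slots = slots        # fixed memory budget, independent of history length
        self.dim = dim
        self.states = []          # each entry: one dim-sized turn summary

    def write(self, turn_vecs):
        # Compress a turn (a list of token vectors) into one slot by mean pooling.
        summary = [sum(v[i] for v in turn_vecs) / len(turn_vecs)
                   for i in range(self.dim)]
        self.states.append(summary)
        if len(self.states) > self.slots:
            self.states.pop(0)    # evict the oldest slot; memory size stays constant

    def read(self, current_vecs):
        # Model input = memory states + current context, so the sequence the
        # encoder sees is bounded even for long dialogues.
        return self.states + current_vecs
```

However long the dialogue grows, `read` returns at most `slots` summaries plus the current turn, which is the efficiency property the abstract claims for bounded-length processing.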
Energy-Efficient NOMA Enabled Heterogeneous Cloud Radio Access Networks
Heterogeneous cloud radio access networks (H-CRANs) are envisioned to be
promising in the fifth generation (5G) wireless networks. H-CRANs enable users
to enjoy diverse services with high energy efficiency, high spectral
efficiency, and low-cost operation, which are achieved by using cloud computing
and virtualization techniques. However, H-CRANs face many technical challenges
due to massive user connectivity, increasingly severe spectrum scarcity and
energy-constrained devices. These challenges may significantly decrease the
quality of service of users if not properly tackled. Non-orthogonal multiple
access (NOMA) schemes exploit non-orthogonal resources to provide services for
multiple users and are receiving increasing attention for their potential of
improving spectral and energy efficiency in 5G networks. In this article, a
framework for energy-efficient NOMA H-CRANs is presented. The enabling
technologies for NOMA H-CRANs are surveyed, and the challenges and open issues
in implementing these technologies are discussed. This article also presents a
performance evaluation of the energy efficiency of H-CRANs with NOMA.
Comment: This work has been accepted by IEEE Network. 18 pages, Figure
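The core NOMA mechanism the abstract refers to, power-domain superposition with successive interference cancellation (SIC), can be sketched with textbook Shannon-rate formulas. The two-user downlink setup and all parameter values below are illustrative assumptions, not taken from the article:

```python
import math

# Toy two-user downlink power-domain NOMA on one resource block.
# The weak user gets the larger power share and decodes treating the strong
# user's signal as interference; the strong user cancels the weak user's
# signal via SIC before decoding. Rates in bits/s/Hz (Shannon capacity).
def noma_rates(p_total, alpha_weak, g_weak, g_strong, noise=1.0):
    p_weak = alpha_weak * p_total          # larger share to the weak user
    p_strong = (1.0 - alpha_weak) * p_total
    r_weak = math.log2(1.0 + p_weak * g_weak / (p_strong * g_weak + noise))
    r_strong = math.log2(1.0 + p_strong * g_strong / noise)  # after SIC
    return r_weak, r_strong

# Energy efficiency as commonly defined: sum rate over total consumed power
# (transmit power plus a fixed circuit-power term).
def energy_efficiency(rates, p_total, p_circuit):
    return sum(rates) / (p_total + p_circuit)
```

Sweeping `alpha_weak` in such a model shows the basic trade-off NOMA exploits: both users are served on the same non-orthogonal resource, and the power split controls the balance between their rates and the resulting energy efficiency.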
Spatio-temporal Incentives Optimization for Ride-hailing Services with Offline Deep Reinforcement Learning
A fundamental question in any peer-to-peer ride-sharing system is how to
meet passengers' requests, both effectively and efficiently, so as to balance
supply and demand in real time. On the passenger side, traditional approaches
focus on pricing strategies that increase the probability of users' calls in
order to adjust the distribution of demand. However, previous methods do not
take into account the impact of strategy changes on future supply and demand:
passengers' calls reposition drivers to different destinations, which affects
those drivers' income for a period of time afterward. Motivated by this
observation, we attempt to address this problem by optimizing the distribution
of demand, learning long-term spatio-temporal values as a guideline for the
pricing strategy. In this study, we
propose an offline deep reinforcement learning based method focusing on the
demand side to improve the utilization of transportation resources and customer
satisfaction. We adopt a spatio-temporal learning method to learn the value
of different times and locations, and then incentivize passengers' ride
requests so as to adjust the distribution of demand and balance supply and
demand in the
system. In particular, we model the problem as a Markov Decision Process (MDP)
- …
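The spatio-temporal value learning described above can be sketched under toy assumptions: a tabular value function over (time, zone) states, fitted offline by TD(0) replay of logged transitions. This is a simplified stand-in for the paper's offline deep reinforcement learning method, and all names and parameters are hypothetical:

```python
# Toy sketch: learn V(time, zone) offline from logged transitions
# (t, zone, reward, next_t, next_zone) by repeatedly replaying the log
# with TD(0) updates. A deep RL method would replace this table with a
# function approximator; the long-term-value idea is the same.
def learn_values(transitions, n_times, n_zones, alpha=0.1, gamma=0.95, epochs=50):
    V = {(t, z): 0.0 for t in range(n_times) for z in range(n_zones)}
    for _ in range(epochs):
        for (t, z, reward, t2, z2) in transitions:   # offline: no new interaction
            # Episodes end at the time horizon; terminal states have zero value.
            target = reward + (gamma * V[(t2, z2)] if t2 < n_times else 0.0)
            V[(t, z)] += alpha * (target - V[(t, z)])
    return V
```

The learned values rank (time, location) pairs by long-term worth, which is the guideline the abstract proposes for pricing incentives on the demand side.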