TextGAIL: Generative Adversarial Imitation Learning for Text Generation

Authors: Li Lei; Wu Qingyang; Yu Zhou
Publication date: 16/02/2021

Generative Adversarial Networks (GANs) for text generation have recently drawn considerable criticism because they perform worse than their maximum likelihood estimation (MLE) counterparts. We suspect that the inferior performance of previous text GANs is due to the lack of a reliable guiding signal in their discriminators. To address this problem, we propose a generative adversarial imitation learning framework for text generation that uses large pre-trained language models to provide more reliable reward guidance. Our approach uses a contrastive discriminator and proximal policy optimization (PPO) to stabilize and improve text generation performance. For evaluation, we conduct experiments on a diverse set of unconditional and conditional text generation tasks. Experimental results show that TextGAIL achieves better performance in terms of both quality and diversity than the MLE baseline. We also validate our intuition that TextGAIL's discriminator is capable of providing reasonable rewards on an additional task.

Comment: AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Stateful Memory-Augmented Transformers for Dialogue Modeling

Authors: Wu Qingyang; Yu Zhou
Publication date: 15/09/2022

Transformer encoder-decoder models have shown impressive performance in dialogue modeling. However, because Transformers are inefficient at processing long sequences, the dialogue history often needs to be truncated. To address this problem, we propose a new memory-augmented Transformer that is compatible with existing pre-trained encoder-decoder models and enables efficient preservation of history information. It incorporates a separate memory module alongside the pre-trained Transformer to effectively exchange information between the memory states and the current input context. We evaluate our model on three dialogue datasets and two language modeling datasets. Experimental results show that our method achieves superior efficiency and performance compared to other pre-trained Transformer baselines.

arXiv.org e-Print Archive

Energy-Efficient NOMA Enabled Heterogeneous Cloud Radio Access Networks

Authors: Hu Rose Qingyang; Wang Yuhao; Wong Kai-Kit; Wu Yongpeng; Zhou Fuhui
Publication date: 06/01/2018

Heterogeneous cloud radio access networks (H-CRANs) are envisioned to be promising in fifth generation (5G) wireless networks. H-CRANs enable users to enjoy diverse services with high energy efficiency, high spectral efficiency, and low-cost operation, which are achieved by using cloud computing and virtualization techniques. However, H-CRANs face many technical challenges due to massive user connectivity, increasingly severe spectrum scarcity, and energy-constrained devices. If not properly tackled, these challenges may significantly degrade users' quality of service. Non-orthogonal multiple access (NOMA) schemes exploit non-orthogonal resources to serve multiple users and are receiving increasing attention for their potential to improve spectral and energy efficiency in 5G networks. In this article, a framework for energy-efficient NOMA H-CRANs is presented. The enabling technologies for NOMA H-CRANs are surveyed. Challenges in implementing these technologies and open issues are discussed. This article also presents a performance evaluation of the energy efficiency of H-CRANs with NOMA.

Comment: This work has been accepted by IEEE Network. Pages 18, Figure

arXiv.org e-Print Archive

UCL Discovery

Spatio-temporal Incentives Optimization for Ride-hailing Services with Offline Deep Reinforcement Learning

Authors: Li Qingyang; Qin Zhiwei; Wu Yanqiu
Publication date: 06/11/2022

A fundamental question in any peer-to-peer ride-sharing system is how to meet passengers' requests both effectively and efficiently so as to balance supply and demand in real time. On the passenger side, traditional approaches focus on pricing strategies that adjust the distribution of demand by increasing the probability of users' calls. However, previous methods do not take into account how a change in strategy affects future supply and demand: drivers are repositioned to different destinations by passengers' calls, which affects their income for some time afterwards. Motivated by this observation, we attempt to optimize the distribution of demand by learning long-term spatio-temporal values as a guideline for the pricing strategy. In this study, we propose an offline deep reinforcement learning method that focuses on the demand side to improve the utilization of transportation resources and customer satisfaction. We adopt a spatio-temporal learning method to learn the value of different times and locations, and then incentivize passengers' ride requests to adjust the distribution of demand and balance supply and demand in the system. In particular, we model the problem as a Markov Decision Process (MDP)

arXiv.org e-Print Archive

Perception Score, A Learned Metric for Open-ended Text Generation Evaluation

Authors: Gu Jing; Wu Qingyang; Yu Zhou
Publication date: 18/08/2020

Automatic evaluation for open-ended natural language generation tasks remains a challenge. Existing metrics such as BLEU show a low correlation with human judgment. We propose a novel and powerful learning-based evaluation metric: Perception Score. The method measures the overall quality of the generation and scores it holistically instead of focusing on only one evaluation criterion, such as word overlap. Moreover, it also shows the amount of uncertainty about its evaluation result. By connecting the uncertainty, Perception Score gives a more accurate evaluation of the generation system. Perception Score provides state-of-the-art results on two conditional generation tasks and two unconditional generation tasks.

Comment: 8 pages, 2 figure

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Towards Better Language Models: Algorithms, Architectures, and Applications

Author: Wu Qingyang
Publication date: 01/01/2024

This thesis explores the advancement of language models by focusing on three important perspectives: Algorithms, Architectures, and Applications. We aim to improve the performance, efficiency, and practical usage of these language models. Specifically, we studied reinforcement learning for language models, recurrent memory-augmented transformers, and practical applications in text generation and dialogue systems. Firstly, we address the limitations of the traditional training algorithm, maximum likelihood estimation (MLE). We propose TextGAIL, a generative adversarial imitation learning framework that combines large pre-trained language models with adversarial training to improve the quality and diversity of generated text. We further explore a modern reinforcement learning from human feedback (RLHF) pipeline to more effectively align language model outputs with human preferences. Next, we investigate architecture improvements with Recurrent Memory-Augmented Transformers. In this direction, we first introduce Memformer, an autoregressive model that utilizes an external dynamic memory for efficient long-sequence processing. We build upon Memformer and propose MemBART, a stateful memory-augmented Transformer encoder-decoder model. Recurrent Memory-Augmented Transformers demonstrate superior performance and efficiency in handling long contexts compared to traditional Transformer architectures. Finally, we make several contributions to effectively applying language models to dialogue systems in practice. We design task-oriented dialogue systems that leverage pre-trained language models to significantly reduce the need for human annotations. We also introduce DiactTOD, a novel approach to improving the out-of-distribution generalization ability of dialogue act-controlled generation in task-oriented systems. 
We also expand the scope of traditional task-oriented dialogue systems by proposing a novel paradigm that uses external knowledge tools to provide more accurate knowledge. Our penultimate application tackles the data-scarcity problem common in many real-world dialogue systems: we propose an automatic data augmentation technique to improve training efficacy. Lastly, we improve end-user experiences by presenting FaceChat, a multimodal dialogue framework that enables emotionally sensitive, face-to-face interactions and demonstrates the potential of multimodal language models in various applications. Our work highlights the significance of building better language models, demonstrating how these improvements can positively impact a wide range of downstream tasks and applications. It contributes to language model research by providing insights and methodologies for developing more powerful and efficient models.

Columbia University Academic Commons

DiactTOD: Learning Generalizable Latent Dialogue Acts for Controllable Task-Oriented Dialogue Systems

Authors: Gung James; Shu Raphael; Wu Qingyang; Zhang Yi
Publication date: 01/08/2023

Dialogue act annotations are important for improving response generation quality in task-oriented dialogue systems. However, it can be challenging to use dialogue acts to control response generation in a generalizable way because different datasets and tasks may have incompatible annotations. While alternative methods that use latent action spaces or reinforcement learning do not require explicit annotations, they may lack interpretability or face difficulties in defining task-specific rewards. In this work, we present a novel end-to-end latent dialogue act model (DiactTOD) that represents dialogue acts in a latent space. When pre-trained on a large corpus, DiactTOD can use these latent representations to predict and control dialogue acts and generate controllable responses in a zero-shot fashion. Our approach demonstrates state-of-the-art performance across a wide range of experimental settings on the MultiWOZ dataset, including zero-shot, few-shot, and full-data fine-tuning with both end-to-end and policy optimization configurations.

Comment: SIGDial 202

arXiv.org e-Print Archive