    TextGAIL: Generative Adversarial Imitation Learning for Text Generation

    Generative Adversarial Networks (GANs) for text generation have recently drawn criticism because they perform worse than their MLE counterparts. We suspect that the inferior performance of previous text GANs is due to the lack of a reliable guiding signal in their discriminators. To address this problem, we propose a generative adversarial imitation learning framework for text generation that uses large pre-trained language models to provide more reliable reward guidance. Our approach uses a contrastive discriminator and proximal policy optimization (PPO) to stabilize and improve text generation performance. For evaluation, we conduct experiments on a diverse set of unconditional and conditional text generation tasks. Experimental results show that TextGAIL outperforms the MLE baseline in both quality and diversity. We also validate, on an additional task, our intuition that TextGAIL's discriminator can provide reasonable rewards. Comment: AAAI 202
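The abstract names PPO as the stabilizing component but gives no details. As a rough illustration only, the sketch below computes the standard clipped surrogate objective of generic PPO. The function name `ppo_clipped_objective`, the clipping value 0.2, and the toy inputs are assumptions made for this example and are not taken from the TextGAIL paper.

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Generic PPO clipped surrogate objective (to be maximized).

    All names and the default clip_eps are illustrative assumptions;
    the abstract does not describe how TextGAIL configures PPO.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio between the updated policy and the sampling policy.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    # Clipping keeps the ratio inside [1 - eps, 1 + eps].
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The element-wise minimum gives a pessimistic bound on the improvement.
    return float(np.mean(np.minimum(unclipped, clipped)))

# Toy usage: two sampled tokens, one with positive and one with negative advantage.
toy_value = ppo_clipped_objective(
    logp_new=np.log([2.0, 0.5]), logp_old=[0.0, 0.0], advantages=[1.0, -1.0]
)
```

In a text-generation setting the advantages would be derived from the discriminator's rewards; how that is done in TextGAIL is not stated in the abstract.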

    Spatio-temporal Incentives Optimization for Ride-hailing Services with Offline Deep Reinforcement Learning

    A fundamental question in any peer-to-peer ride-sharing system is how to meet passenger requests both effectively and efficiently so as to balance supply and demand in real time. On the passenger side, traditional approaches focus on pricing strategies that adjust the distribution of demand by increasing the probability that users request a ride. However, previous methods do not take into account how a change in strategy affects future supply and demand: drivers are repositioned to different destinations as a result of passengers' calls, which affects their income for some time afterward. Motivated by this observation, we attempt to optimize the distribution of demand by learning long-term spatio-temporal values as a guideline for the pricing strategy. In this study, we propose an offline deep reinforcement learning method that focuses on the demand side to improve the utilization of transportation resources and customer satisfaction. We adopt a spatio-temporal learning method to learn the value of different times and locations, then incentivize passengers' ride requests to adjust the distribution of demand and balance supply and demand in the system. In particular, we model the problem as a Markov Decision Process (MDP)
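The abstract is cut off before the MDP is specified, so the following is only a minimal sketch of what learning a value for each time and location from logged data could look like. It swaps in plain tabular TD(0) for the paper's deep method. The state layout (time slot, zone), the transition tuple format, and every name and hyperparameter are hypothetical.

```python
import numpy as np

def td0_values(transitions, n_times, n_zones, gamma=0.9, lr=0.1, epochs=50):
    """Tabular TD(0) over (time slot, zone) states from logged transitions.

    Hypothetical illustration: each transition is
    (t, zone, reward, t_next, zone_next), and time index n_times is a
    terminal slot whose value stays at zero.
    """
    values = np.zeros((n_times + 1, n_zones))
    for _ in range(epochs):
        for t, zone, reward, t_next, zone_next in transitions:
            # One-step bootstrapped target from the logged next state.
            target = reward + gamma * values[t_next, zone_next]
            values[t, zone] += lr * (target - values[t, zone])
    return values

# Toy log: a single trip starting in zone 0 at slot 0, earning reward 1.0
# and ending in the terminal slot.
toy_values = td0_values([(0, 0, 1.0, 1, 0)], n_times=1, n_zones=1)
```

A table like this could then be read as a per-slot, per-zone value estimate; how the paper turns such values into passenger incentives is not recoverable from the truncated abstract.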

    A case study of the Lunger phenomenon based on multiple algorithms

    In this study, we examine the Runge phenomenon in detail. We first review the relevant literature to describe the origin and nature of the Runge phenomenon and to survey both conventional and contemporary algorithmic solutions. The paper then covers a range of resolution methods: classical numerical approaches, regularization techniques, mock-Chebyshev interpolation, the TISI (Three-Interval Interpolation Strategy), external pseudo-constraint interpolation, and interpolation strategies based on Singular Value Decomposition (SVD). For each method, we introduce the approach and also develop a novel algorithm to address the phenomenon. We carry out detailed numerical computations for each method and use visualizations to illustrate how well each strategy mitigates the Runge phenomenon. Our findings show that although traditional methods perform well in certain instances, newer approaches such as mock-Chebyshev interpolation and regularization-based methods are markedly superior in specific contexts. The paper also provides a critical analysis of these methods, highlighting in particular the constraints of SVD-based interpolation strategies and possible ways to improve them. In conclusion, we propose directions for future research and stress the need for further work on interpolation strategies, with an emphasis on validating them in practical applications. This article serves as a comprehensive resource on the Runge phenomenon for researchers and offers practical guidance for resolving real-world interpolation problems. Comment: 13 figures, 9 pages. After the first submission, the authorship order was revised as the result of joint discussion
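To make the phenomenon concrete, here is a minimal sketch that interpolates the classic test function 1/(1 + 25x^2) on [-1, 1] with 21 equispaced nodes and with 21 Chebyshev nodes, then compares the maximum errors. The abstract does not name a test function or node count, so both are assumptions. Plain Chebyshev nodes are used here as a simpler stand-in for the mock-Chebyshev and other strategies the paper studies, and all function names are illustrative.

```python
import numpy as np

def runge(x):
    # Classic test function for the Runge phenomenon (an assumption here).
    return 1.0 / (1.0 + 25.0 * x**2)

def barycentric_weights(nodes):
    # w_j = 1 / prod_{k != j} (x_j - x_k)
    diff = nodes[:, None] - nodes[None, :]
    np.fill_diagonal(diff, 1.0)
    return 1.0 / diff.prod(axis=1)

def interpolate(nodes, values, x):
    # Barycentric form of the Lagrange interpolating polynomial.
    weights = barycentric_weights(nodes)
    diff = x[:, None] - nodes[None, :]
    exact = diff == 0.0
    diff[exact] = 1.0  # avoid division by zero; fixed up below
    terms = weights / diff
    result = (terms * values).sum(axis=1) / terms.sum(axis=1)
    rows, cols = np.nonzero(exact)
    result[rows] = values[cols]  # evaluation points that coincide with nodes
    return result

def max_error(nodes, grid):
    approx = interpolate(nodes, runge(nodes), grid)
    return float(np.max(np.abs(approx - runge(grid))))

n = 21
grid = np.linspace(-1.0, 1.0, 2001)
equispaced = np.linspace(-1.0, 1.0, n)
chebyshev = np.cos((2 * np.arange(n) + 1) * np.pi / (2 * n))

err_equispaced = max_error(equispaced, grid)
err_chebyshev = max_error(chebyshev, grid)
print(f"equispaced: {err_equispaced:.3g}, Chebyshev: {err_chebyshev:.3g}")
```

With equispaced nodes the error is dominated by large oscillations near the interval ends, while clustering the nodes toward the ends keeps the error small, which is the behavior that the node-placement strategies in the abstract aim to reproduce or approximate.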