63 research outputs found

RLTF: Reinforcement Learning from Unit Test Feedback

Author: Fu Qiang
Han Xiao
Liu Jiate
Xiao Kaiwen
Yang Wei
Ye Deheng
Zhu Yiqin
Publication date: 10/07/2023

The goal of program synthesis, or code generation, is to generate executable code based on given descriptions. Recently, there has been an increasing number of studies employing reinforcement learning (RL) to improve the performance of large language models (LLMs) for code. However, these RL methods have only used offline frameworks, limiting their exploration of new sample spaces. Additionally, current approaches that utilize unit test signals are rather simple, not accounting for specific error locations within the code. To address these issues, we propose RLTF, i.e., Reinforcement Learning from Unit Test Feedback, a novel online RL framework with multi-granularity unit test feedback for refining code LLMs. Our approach generates data in real time during training and simultaneously utilizes fine-grained feedback signals to guide the model towards producing higher-quality code. Extensive experiments show that RLTF achieves state-of-the-art performance on the APPS and the MBPP benchmarks. Our code can be found at: https://github.com/Zyq-scut/RLTF

arXiv.org e-Print Archive

Interaction-Aware Decision-Making for Autonomous Vehicles in Forced Merging Scenario Leveraging Social Psychology Factors

Author: Girard Anouck
Kolmanovsky Ilya
Li Xiao
Liu Kaiwen
Tseng H. Eric
Publication date: 25/09/2023

Understanding the intention of vehicles in the surrounding traffic is crucial for an autonomous vehicle to successfully accomplish its driving tasks in complex traffic scenarios such as highway forced merging. In this paper, we consider a behavioral model that incorporates both social behaviors and personal objectives of the interacting drivers. Leveraging this model, we develop a receding-horizon control-based decision-making strategy that estimates the other drivers' intentions online using Bayesian filtering and incorporates predictions of nearby vehicles' behaviors under uncertain intentions. The effectiveness of the proposed decision-making strategy is demonstrated and evaluated based on simulation studies in comparison with a game-theoretic controller and a real-world traffic dataset.
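The online intention estimate mentioned in the abstract can be illustrated with a single discrete Bayes-filter step. This is only a minimal sketch: the intention labels (`yield`, `proceed`) and the likelihood values are hypothetical, and the paper's actual behavioral model and observation likelihoods are not given in the abstract.

```python
def bayes_update(prior, likelihoods):
    """One discrete Bayes-filter step: posterior is proportional to likelihood * prior."""
    unnormalized = {k: prior[k] * likelihoods[k] for k in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        # Observation is impossible under every hypothesis; keep the prior.
        return dict(prior)
    return {k: v / total for k, v in unnormalized.items()}


# Hypothetical intentions of a nearby driver, starting from a uniform belief.
belief = {"yield": 0.5, "proceed": 0.5}

# Hypothetical likelihoods of two successive observed maneuvers under each intention.
observations = [
    {"yield": 0.8, "proceed": 0.3},
    {"yield": 0.7, "proceed": 0.4},
]

for likelihood in observations:
    belief = bayes_update(belief, likelihood)
```

After each observation the belief shifts toward the intention that explains the observed maneuver better; a receding-horizon planner could then weight its predictions of the other vehicle by this belief.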

arXiv.org e-Print Archive

Grouped Knowledge Distillation for Deep Face Recognition

Author: Guo Kaiwen
Lei Zhen
Zhang Xiao-Yu
Zhao Weisong
Zhu Xiangyu
Publication date: 10/04/2023

Compared with the feature-based distillation methods, logits distillation can relax the requirement of consistent feature dimensions between teacher and student networks, while its performance is deemed inferior in face recognition. One major challenge is that the light-weight student network has difficulty fitting the target logits due to its low model capacity, which is attributed to the significant number of identities in face recognition. Therefore, we seek to probe the target logits to extract the primary knowledge related to face identity, and discard the others, to make the distillation more achievable for the student network. Specifically, there is a tail group with near-zero values in the prediction, containing minor knowledge for distillation. To provide a clear perspective of its impact, we first partition the logits into two groups, i.e., Primary Group and Secondary Group, according to the cumulative probability of the softened prediction. Then, we reorganize the Knowledge Distillation (KD) loss of grouped logits into three parts, i.e., Primary-KD, Secondary-KD, and Binary-KD. Primary-KD refers to distilling the primary knowledge from the teacher, Secondary-KD aims to refine minor knowledge but increases the difficulty of distillation, and Binary-KD ensures the consistency of knowledge distribution between teacher and student. We experimentally found that (1) Primary-KD and Binary-KD are indispensable for KD, and (2) Secondary-KD is the culprit restricting KD at the bottleneck. Therefore, we propose a Grouped Knowledge Distillation (GKD) that retains the Primary-KD and Binary-KD but omits Secondary-KD in the ultimate KD loss calculation. Extensive experimental results on popular face recognition benchmarks demonstrate the superiority of the proposed GKD over state-of-the-art methods. Comment: 9 pages, 2 figures, 7 tables, accepted by AAAI 202
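The grouping step described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the temperature, the cumulative-probability threshold `mass`, the use of KL divergence for each term, and the unweighted sum of the two retained terms are all assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Softened prediction: softmax of logits divided by a temperature."""
    peak = max(logits)
    exps = [math.exp((x - peak) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def primary_indices(probs, mass=0.9):
    """Smallest set of top classes whose cumulative probability reaches `mass`."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += probs[i]
        if cumulative >= mass:
            break
    return chosen


def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def grouped_kd_loss(teacher_logits, student_logits, temperature=4.0, mass=0.9):
    """Primary-KD + Binary-KD; the Secondary-KD term is deliberately omitted."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    idx = primary_indices(t, mass)  # groups are defined by the teacher's prediction
    t_mass = sum(t[i] for i in idx)
    s_mass = sum(s[i] for i in idx)
    # Binary-KD: match how much probability each network assigns to each group.
    binary = kl([t_mass, max(0.0, 1.0 - t_mass)], [s_mass, max(0.0, 1.0 - s_mass)])
    # Primary-KD: match the renormalized distribution inside the primary group.
    primary = kl([t[i] / t_mass for i in idx], [s[i] / s_mass for i in idx])
    return primary + binary
```

When teacher and student logits agree, both terms vanish; classes in the secondary (tail) group influence the loss only through their total mass in the binary term.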

arXiv.org e-Print Archive

Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration

Author: Dong Yuxiao
Liu Xiao
Men Kaiwen
Tang Jie
Yang Kejuan
Zeng Aohan
Publication date: 24/05/2023

We identify two crucial limitations in the evaluation of the recent parallel-integrated method Parallel Context Windows (PCW), which extends the maximum context lengths of language models, e.g., 2048 for LLaMA, by harnessing window-wise attention and positional embedding techniques. We first show that a simple yet strong baseline, weighted sum ensemble, is missing for in-context few-shot classification. Moreover, on more challenging Chain-of-Thought (CoT) reasoning (e.g., HotpotQA), PCW shows unexpected deterioration in the form of question miscomprehension and false inference. Based on our findings, we suggest that the existing PCW design may not guarantee sufficient improvement and practicality in handling lengthy documents in real-world applications. More community effort should be devoted to enabling language models' long-context understanding ability.
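A weighted sum ensemble for few-shot classification can be sketched in a few lines. This is a minimal illustration under stated assumptions: each context window is assumed to have already produced a probability distribution over the same labels, and the weighting scheme (uniform by default) is hypothetical, since the abstract does not specify how the weights are chosen.

```python
def weighted_sum_ensemble(window_probs, weights=None):
    """Combine per-window label distributions by a normalized weighted sum."""
    n = len(window_probs)
    if weights is None:
        weights = [1.0] * n  # assumption: uniform weights when none are given
    total = sum(weights)
    labels = list(window_probs[0].keys())
    return {
        label: sum(w * probs[label] for w, probs in zip(weights, window_probs)) / total
        for label in labels
    }


# Hypothetical predictions from two context windows holding different demonstrations.
windows = [
    {"positive": 0.6, "negative": 0.4},
    {"positive": 0.2, "negative": 0.8},
]
combined = weighted_sum_ensemble(windows, weights=[1.0, 3.0])
prediction = max(combined, key=combined.get)
```

Each window is run through the model independently, so no window exceeds the original context length; only the output distributions are merged.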

arXiv.org e-Print Archive

Convolutional Embedding for Edit Distance

Author: Cheng James
Dai Xinyan
Wang Yuxuan
Yan Xiao
Yang Han
Zhou Kaiwen
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/05/2020

Edit-distance-based string similarity search has many applications such as spell correction, data de-duplication, and sequence alignment. However, computing edit distance is known to have high complexity, which makes string similarity search challenging for large datasets. In this paper, we propose a deep learning pipeline (called CNN-ED) that embeds edit distance into Euclidean distance for fast approximate similarity search. A convolutional neural network (CNN) is used to generate fixed-length vector embeddings for a dataset of strings and the loss function is a combination of the triplet loss and the approximation error. To justify our choice of using CNN instead of other structures (e.g., RNN) as the model, theoretical analysis is conducted to show that some basic operations in our CNN model preserve edit distance. Experimental results show that CNN-ED outperforms data-independent CGK embedding and RNN-based GRU embedding in terms of both accuracy and efficiency by a large margin. We also show that string similarity search can be significantly accelerated using CNN-based embeddings, sometimes by orders of magnitude. Comment: Accepted by the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 202
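For reference, the quantity being approximated is the classic edit (Levenshtein) distance, computed below with the standard dynamic program whose cost is quadratic in the string lengths. The `combined_loss` function is only an assumed, generic form of "triplet loss plus approximation error"; the margin, the weight `alpha`, and the exact terms used in CNN-ED are not given in the abstract.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def combined_loss(emb_anchor, emb_pos, emb_neg, ed_pos, ed_neg, margin=1.0, alpha=0.1):
    """Assumed generic form: triplet loss plus weighted approximation error."""
    d_pos = euclidean(emb_anchor, emb_pos)
    d_neg = euclidean(emb_anchor, emb_neg)
    triplet = max(0.0, d_pos - d_neg + margin)
    # Penalize embedding distances that deviate from the true edit distances.
    approximation = abs(d_pos - ed_pos) + abs(d_neg - ed_neg)
    return triplet + alpha * approximation
```

Once strings are embedded as fixed-length vectors, candidate neighbors can be found with Euclidean distance alone, and the exact dynamic program is needed only to verify a short candidate list.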

arXiv.org e-Print Archive

Crossref

Stock option, contract elements design and corporate innovation output – an analyse based on risk-taking and performance-based incentives

Author: Chang Kaiwen
Shi Qi
Wu Jiaying
Xiao Shufang
Publication date: 01/11/2021

Purpose: With accelerating technological advancement, innovation has become a critical factor affecting the core competitiveness of a company. However, studies about the relationship between internal stock option mechanisms and innovation productivity remain limited. Therefore, this paper aims to examine the impact of stock options and the design of their elements on innovation output from an internal mechanism perspective. Design/methodology/approach: Using a sample of 302 stock option incentive plans announced and implemented between 2006 and 2016, this study uses propensity score matching and a difference-in-differences model to find out whether the implementation of stock options improves the innovation outputs of enterprises. Findings: Based on the statistical analysis, it is concluded that stock options can stimulate corporate innovation; that a stock option may drive innovation outputs in two ways, through performance-based incentives and risk-taking incentives, with the latter playing the more dominant role; and that the risk-taking incentives of stock options could be optimised when the non-executives granting proportion is larger, the granting range is limited, the incentive period is longer, the exercisable proportion is increasing, the price-to-strike ratio is lower and relatively loose performance assessment criteria are applied. Originality/value: The conclusion reached in the study may provide valuable information to listed firms in designing and implementing stock option plans.
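The difference-in-differences idea named in the methodology can be shown with a toy calculation. This is a bare-bones sketch of the basic two-group, two-period estimator only; the numbers are invented for illustration and are not from the study, and the propensity score matching step and any regression controls are omitted.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Invented innovation-output counts (e.g. patents per firm), for illustration only.
effect = did_estimate(
    treated_pre=[2, 4],   # firms that later adopt a stock option plan
    treated_post=[6, 8],
    control_pre=[3, 5],   # comparable firms without a plan
    control_post=[4, 6],
)
```

Subtracting the control group's change removes the trend shared by both groups, so the remaining difference is attributed to the plan under the usual parallel-trends assumption.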

Central Archive at the University of Reading

Nucleocapsid mutations R203K/G204R increase the infectivity, fitness, and virulence of SARS-CoV-2

Author: Dong Pan
Fu Beibei
Liu Gexin
Luo Haitao
Meng Kaiwen
Tang Wanyan
Wu Haibo
Xiao Yang
Xing Na
Xue Weiwei
Publication date: 01/01/2021

Previous work found that the co-occurring mutations R203K/G204R on the SARS-CoV-2 nucleocapsid (N) protein are increasing in frequency among emerging variants of concern or interest. Through a combination of in silico analyses, this study demonstrates that R203K/G204R are adaptive, while large-scale phylogenetic analyses indicate that R203K/G204R associate with the emergence of the high-transmissibility SARS-CoV-2 lineage B.1.1.7. Competition experiments suggest that the 203K/204R variants possess a replication advantage over the preceding R203/G204 variants, possibly related to ribonucleocapsid (RNP) assembly. Moreover, the 203K/204R virus shows increased infectivity in human lung cells and hamsters. Accordingly, we observe a positive association between increased COVID-19 severity and sample frequency of 203K/204R. Our work suggests that the 203K/204R mutations contribute to the increased transmission and virulence of select SARS-CoV-2 variants. In addition to mutations in the spike protein, mutations in the nucleocapsid protein are important for viral spreading during the pandemic.

Institutional Repository of the Freie Universität Berlin

Crowdsourcing Detection of Sampling Biases in Image Datasets

Author: Chen Shuo-Han
Dube Somesh
Hu Xiao
Kao Gore
Lu Yung-Hsiang
Thiruvathukal George K.
Vegesana Anirudh
Wang Haobo
Yin Ming
Yu Kaiwen
Publication venue: Loyola eCommons
Publication date: 01/04/2020

Despite many exciting innovations in computer vision, recent studies reveal a number of risks in existing computer vision systems, suggesting results of such systems may be unfair and untrustworthy. Many of these risks can be partly attributed to the use of a training image dataset that exhibits sampling biases and thus does not accurately reflect the real visual world. Being able to detect potential sampling biases in the visual dataset prior to model development is thus essential for mitigating the fairness and trustworthiness concerns in computer vision. In this paper, we propose a three-step crowdsourcing workflow to get humans into the loop for facilitating bias discovery in image datasets. Through two sets of evaluation studies, we find that the proposed workflow can effectively organize the crowd to detect sampling biases both in datasets that are artificially created with designed biases and in real-world image datasets that are widely used in computer vision research and system development.

Crossref

Loyola eCommons