RLTF: Reinforcement Learning from Unit Test Feedback
The goal of program synthesis, or code generation, is to generate executable
code based on given descriptions. Recently, there has been an increasing number
of studies employing reinforcement learning (RL) to improve the performance of
large language models (LLMs) for code. However, these RL methods have only used
offline frameworks, limiting their exploration of new sample spaces.
Additionally, current approaches that utilize unit test signals are rather
simple, not accounting for specific error locations within the code. To address
these issues, we propose RLTF, i.e., Reinforcement Learning from Unit Test
Feedback, a novel online RL framework with unit test feedback of
multi-granularity for refining code LLMs. Our approach generates data in
real-time during training and simultaneously utilizes fine-grained feedback
signals to guide the model towards producing higher-quality code. Extensive
experiments show that RLTF achieves state-of-the-art performance on the APPS
and the MBPP benchmarks. Our code can be found at:
https://github.com/Zyq-scut/RLTF
Interaction-Aware Decision-Making for Autonomous Vehicles in Forced Merging Scenario Leveraging Social Psychology Factors
Understanding the intention of vehicles in the surrounding traffic is crucial
for an autonomous vehicle to successfully accomplish its driving tasks in
complex traffic scenarios such as highway forced merging. In this paper, we
consider a behavioral model that incorporates both social behaviors and
personal objectives of the interacting drivers. Leveraging this model, we
develop a receding-horizon control-based decision-making strategy that
estimates online the other drivers' intentions using Bayesian filtering and
incorporates predictions of nearby vehicles' behaviors under uncertain
intentions. The effectiveness of the proposed decision-making strategy is
demonstrated and evaluated based on simulation studies in comparison with a
game-theoretic controller and a real-world traffic dataset.
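The online intention estimation described above can be illustrated with a discrete Bayes filter. This is only a minimal sketch, not the authors' model: the intention labels (`yield`, `proceed`) and all likelihood values are hypothetical, chosen purely to show how a belief over another driver's intention is updated after each observation.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over intentions after one observation.

    prior:      dict mapping intention label -> P(intention)
    likelihood: dict mapping intention label -> P(observation | intention)
    """
    unnormalized = {k: prior[k] * likelihood[k] for k in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under every intention")
    return {k: v / total for k, v in unnormalized.items()}


# Hypothetical example: two intention labels and made-up likelihoods.
belief = {"yield": 0.5, "proceed": 0.5}
observations = [
    {"yield": 0.8, "proceed": 0.3},  # e.g. the other vehicle slows down
    {"yield": 0.7, "proceed": 0.4},  # e.g. it keeps a larger gap
]
for likelihood in observations:
    belief = bayes_update(belief, likelihood)
```

In a receding-horizon controller, a belief of this kind would typically be recomputed at every control step and used to weight the predicted behaviors of nearby vehicles.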
Grouped Knowledge Distillation for Deep Face Recognition
Compared with the feature-based distillation methods, logits distillation can
liberalize the requirements of consistent feature dimension between teacher and
student networks, while the performance is deemed inferior in face recognition.
One major challenge is that the light-weight student network has difficulty
fitting the target logits due to its low model capacity, which is attributed to
the significant number of identities in face recognition. Therefore, we seek to
probe the target logits to extract the primary knowledge related to face
identity, and discard the others, to make the distillation more achievable for
the student network. Specifically, there is a tail group with near-zero values
in the prediction, containing minor knowledge for distillation. To provide a
clear perspective of its impact, we first partition the logits into two groups,
i.e., Primary Group and Secondary Group, according to the cumulative
probability of the softened prediction. Then, we reorganize the Knowledge
Distillation (KD) loss of grouped logits into three parts, i.e., Primary-KD,
Secondary-KD, and Binary-KD. Primary-KD refers to distilling the primary
knowledge from the teacher, Secondary-KD aims to refine minor knowledge but
increases the difficulty of distillation, and Binary-KD ensures the consistency
of knowledge distribution between teacher and student. We experimentally found
that (1) Primary-KD and Binary-KD are indispensable for KD, and (2)
Secondary-KD is the culprit restricting KD at the bottleneck. Therefore, we
propose a Grouped Knowledge Distillation (GKD) that retains the Primary-KD and
Binary-KD but omits Secondary-KD in the ultimate KD loss calculation. Extensive
experimental results on popular face recognition benchmarks demonstrate the
superiority of the proposed GKD over state-of-the-art methods.
Comment: 9 pages, 2 figures, 7 tables, accepted by AAAI 202
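The partition step described in this abstract, splitting the logits into a Primary Group and a Secondary Group by the cumulative probability of the softened prediction, might look roughly like the sketch below. The abstract does not give the exact rule, so the descending sort, the `temperature` and `threshold` defaults, and the function names are all assumptions made for illustration.

```python
import math


def softened_probs(logits, temperature):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def split_groups(logits, temperature=4.0, threshold=0.9):
    """Split class indices into (primary, secondary).

    Classes are visited from most to least probable. A class joins the
    primary group until the cumulative probability reaches `threshold`
    (an assumed rule); all remaining classes form the secondary group.
    """
    probs = softened_probs(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    primary, cumulative = [], 0.0
    for i in order:
        if cumulative >= threshold:
            break
        primary.append(i)
        cumulative += probs[i]
    secondary = [i for i in order if i not in primary]
    return primary, secondary


# Hypothetical teacher logits over five identities.
primary, secondary = split_groups([8.0, 0.0, 0.0, 0.0, 4.0],
                                  temperature=1.0, threshold=0.9)
```

With one dominant logit, the primary group here contains only that class, and the near-zero tail falls into the secondary group.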
Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration
We identify two crucial limitations in the evaluation of the recent
parallel-integrated method Parallel Context Windows (PCW), which extends the
maximum context lengths of language models, e.g., 2048 for LLaMA, by harnessing
window-wise attention and positional embedding techniques. We first show that a
simple yet strong baseline, weighted sum ensemble, is missing for the
in-context few-shot classification. Moreover, on more challenging
Chain-of-Thought (CoT) reasoning (e.g., HotpotQA), PCW shows unexpected
deterioration in the form of question miscomprehension and false inference. Based on
our findings, we suggest that the existing PCW design may not guarantee
sufficient improvement and practicality in handling lengthy documents in
real-world applications. More community effort should be devoted to enabling
long-context understanding in language models.
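A weighted sum ensemble for in-context few-shot classification can be sketched as follows. The abstract does not say how the weights are chosen, so this sketch assumes each context window yields its own label distribution and that the caller supplies the weights (uniform if omitted). The function names and the example numbers are hypothetical.

```python
def weighted_sum_ensemble(window_probs, weights=None):
    """Combine per-window label distributions into one distribution.

    window_probs: list of dicts, one per context window, label -> probability
    weights:      optional list of non-negative weights, one per window;
                  uniform weights are used when omitted (an assumption)
    """
    if weights is None:
        weights = [1.0] * len(window_probs)
    if len(weights) != len(window_probs):
        raise ValueError("need one weight per window")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = {}
    for probs, w in zip(window_probs, weights):
        for label, p in probs.items():
            combined[label] = combined.get(label, 0.0) + w * p / total_weight
    return combined


def predict(window_probs, weights=None):
    """Return the label with the highest combined probability."""
    combined = weighted_sum_ensemble(window_probs, weights)
    return max(combined, key=combined.get)


# Hypothetical label distributions from three separate context windows.
windows = [
    {"pos": 0.9, "neg": 0.1},
    {"pos": 0.4, "neg": 0.6},
    {"pos": 0.3, "neg": 0.7},
]
uniform_label = predict(windows)
weighted_label = predict(windows, weights=[1.0, 3.0, 3.0])
```

Each window is scored independently by the model, so no change to attention or positional embeddings is required.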
Convolutional Embedding for Edit Distance
Edit-distance-based string similarity search has many applications such as
spell correction, data de-duplication, and sequence alignment. However,
computing edit distance is known to have high complexity, which makes string
similarity search challenging for large datasets. In this paper, we propose a
deep learning pipeline (called CNN-ED) that embeds edit distance into Euclidean
distance for fast approximate similarity search. A convolutional neural network
(CNN) is used to generate fixed-length vector embeddings for a dataset of
strings and the loss function is a combination of the triplet loss and the
approximation error. To justify our choice of using CNN instead of other
structures (e.g., RNN) as the model, theoretical analysis is conducted to show
that some basic operations in our CNN model preserve edit distance.
Experimental results show that CNN-ED outperforms data-independent CGK
embedding and RNN-based GRU embedding in terms of both accuracy and efficiency
by a large margin. We also show that string similarity search can be
significantly accelerated using CNN-based embeddings, sometimes by orders of
magnitude.
Comment: Accepted by the 43rd International ACM SIGIR Conference on Research
and Development in Information Retrieval, 202
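For reference, the quantity that CNN-ED approximates is the edit distance, which the classic dynamic program below computes in time proportional to the product of the two string lengths. The `triplet_loss` helper shows the standard margin-based form of a triplet loss. The paper's exact loss, its margin, and the approximation-error term are not given in the abstract, so the helper and its default `margin` are illustrative assumptions only.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, O(len(a) * len(b)) time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def triplet_loss(d_pos, d_neg, margin=1.0):
    """Standard margin-based triplet loss on two embedding distances.

    d_pos: distance from the anchor to the more similar string's embedding
    d_neg: distance from the anchor to the less similar string's embedding
    """
    return max(0.0, d_pos - d_neg + margin)
```

The quadratic cost of `edit_distance` per pair is what makes an embedding into Euclidean space attractive for large datasets: distances between fixed-length vectors are much cheaper to compute and to index.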
Stock option, contract elements design and corporate innovation output – an analyse based on risk-taking and performance-based incentives
Purpose: With accelerating technological advancement, innovation has become a critical factor affecting a company's core competitiveness. However, studies of the relationship between internal stock option mechanisms and innovation productivity remain limited. Therefore, this paper aims to examine the impact of stock options and the design of their elements on innovation output from an internal mechanism perspective.
Design/methodology/approach: Using a sample of 302 stock option incentive plans announced and implemented between 2006 and 2016, this study uses propensity score matching and a difference-in-differences model to test whether the implementation of stock options improves the innovation outputs of enterprises.
Findings: Based on the statistical analysis, it is concluded that: (1) stock options can stimulate corporate innovation; (2) a stock option may drive innovation outputs in two ways, performance-based incentives and risk-taking incentives, with the latter playing the more dominant role; and (3) the risk-taking incentives of stock options could be optimised when the proportion granted to non-executives is larger, the granting range is limited, the incentive period is longer, the exercisable proportion is increasing, the price-to-strike ratio is lower and relatively loose performance assessment criteria are applied.
Originality/value: The conclusions reached in the study may provide valuable information to listed firms in designing and implementing stock option plans.
Nucleocapsid mutations R203K/G204R increase the infectivity, fitness, and virulence of SARS-CoV-2
Previous work found that the co-occurring mutations R203K/G204R on the SARS-CoV-2 nucleocapsid (N) protein are increasing in frequency among emerging variants of concern or interest. Through a combination of in silico analyses, this study demonstrates that R203K/G204R are adaptive, while large-scale phylogenetic analyses indicate that R203K/G204R associate with the emergence of the high-transmissibility SARS-CoV-2 lineage B.1.1.7. Competition experiments suggest that the 203K/204R variants possess a replication advantage over the preceding R203/G204 variants, possibly related to ribonucleocapsid (RNP) assembly. Moreover, the 203K/204R virus shows increased infectivity in human lung cells and hamsters. Accordingly, we observe a positive association between increased COVID-19 severity and sample frequency of 203K/204R. Our work suggests that the 203K/204R mutations contribute to the increased transmission and virulence of select SARS-CoV-2 variants. In addition to mutations in the spike protein, mutations in the nucleocapsid protein are important for viral spread during the pandemic.
Crowdsourcing Detection of Sampling Biases in Image Datasets
Despite many exciting innovations in computer vision, recent studies reveal a number of risks in existing computer vision systems, suggesting that the results of such systems may be unfair and untrustworthy. Many of these risks can be partly attributed to the use of a training image dataset that exhibits sampling biases and thus does not accurately reflect the real visual world. Being able to detect potential sampling biases in the visual dataset prior to model development is thus essential for mitigating fairness and trustworthiness concerns in computer vision. In this paper, we propose a three-step crowdsourcing workflow to get humans into the loop for facilitating bias discovery in image datasets. Through two sets of evaluation studies, we find that the proposed workflow can effectively organize the crowd to detect sampling biases both in datasets that are artificially created with designed biases and in real-world image datasets that are widely used in computer vision research and system development.