
    PACE: Improving Prompt with Actor-Critic Editing for Large Language Model

    Large language models (LLMs) have showcased remarkable potential across various tasks by conditioning on prompts. However, the quality of different human-written prompts leads to substantial discrepancies in LLMs' performance, and improving prompts usually necessitates considerable human effort and expertise. To this end, this paper proposes Prompt with Actor-Critic Editing (PACE) for LLMs to enable automatic prompt editing. Drawing inspiration from the actor-critic algorithm in reinforcement learning, PACE leverages LLMs in the dual roles of actor and critic, conceptualizing the prompt as a type of policy. PACE refines the prompt by taking into account feedback from both the actors executing the prompt and the critics critiquing the responses. This process helps LLMs better align the prompt to a specific task, thanks to real responses and reasoning from LLMs. We conduct extensive experiments on 24 instruction induction tasks and 21 BIG-bench tasks. Experimental results indicate that PACE improves the relative performance of medium/low-quality human-written prompts by up to 98%, making them comparable to high-quality human-written prompts. Moreover, PACE also exhibits notable efficacy for prompt generation.
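    The actor-critic loop the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `llm` helper is a hypothetical stand-in for any LLM API call, and the prompt templates are assumptions.

    ```python
    # Minimal sketch of an actor-critic prompt-editing loop in the spirit of
    # PACE. The `llm` function is a hypothetical placeholder for a real
    # LLM call; the templates below are illustrative assumptions.
    def llm(prompt: str) -> str:
        # Placeholder: a real implementation would query an LLM API here.
        return f"[model output for: {prompt[:40]}]"

    def pace_edit(prompt: str, task_input: str, rounds: int = 2) -> str:
        """Iteratively refine a prompt using LLMs as actor, critic, and editor."""
        for _ in range(rounds):
            # Actor: execute the current prompt on the task input.
            response = llm(f"{prompt}\n\nInput: {task_input}")
            # Critic: critique the response with respect to the task.
            critique = llm(f"Critique this response for the task:\n{response}")
            # Editor: rewrite the prompt, treating it as a policy to improve.
            prompt = llm(f"Improve this prompt given the critique:\n"
                         f"Prompt: {prompt}\nCritique: {critique}")
        return prompt
    ```

    Treating the prompt as the policy means the critic's feedback updates the prompt itself rather than any model weights, which is why no gradient access to the LLM is needed.
    
    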

    CodeScore: Evaluating Code Generation by Learning Code Execution

    A proper code evaluation metric (CEM) profoundly impacts the evolution of code generation, which is an important research field in NLP and software engineering. Prevailing CEMs can be categorized into match-based CEMs (e.g., BLEU, Accuracy, and CodeBLEU) and execution-based CEMs (e.g., AvgPassRatio and Pass@k), but both of them suffer from some issues. The former only measures differences in surface form regardless of the functional equivalence of code, while the latter has huge execution overheads, including collecting expensive test cases, resolving tedious execution dependencies, and enormous execution time. To address these issues, in this paper, we propose CodeScore, an efficient and effective CEM for code generation, which estimates the test case PassRatio of generated code without executing it. We also present a framework named UniCE for training unified code evaluation models by learning code execution, i.e., learning the PassRatio and Executability of generated code. In order to learn code execution comprehensively, we construct more than 100 test cases for each task in several popular benchmark datasets, covering MBPP, APPS, and HumanEval. Experimental results show that CodeScore achieves state-of-the-art correlation with execution-based CEMs. CodeScore is strongly correlated with AvgPassRatio, and binary CodeScore is moderately correlated with Pass@1. In particular, CodeScore eliminates the need for test cases and execution dependencies at inference time, and reduces execution time by three orders of magnitude compared to AvgPassRatio and Pass@1.
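    The PassRatio that CodeScore learns to predict is simply the fraction of test cases a generated program passes. A minimal sketch of how that training signal is computed by actual execution (the expensive step CodeScore avoids at inference time); the function names here are illustrative, not the paper's API:

    ```python
    # Sketch of the execution-based PassRatio label that CodeScore is
    # trained to estimate without running the code. Names are illustrative.
    def pass_ratio(func, test_cases):
        """Fraction of test cases the generated function passes."""
        passed = 0
        for args, expected in test_cases:
            try:
                if func(*args) == expected:
                    passed += 1
            except Exception:
                pass  # runtime errors count as failures
        return passed / len(test_cases)

    # A toy "generated" solution and test suite.
    def add(a, b):
        return a + b

    cases = [((1, 2), 3), ((0, 0), 0), ((2, 2), 5)]  # last case fails on purpose
    # pass_ratio(add, cases) -> 2/3
    ```

    A learned metric regresses onto this value from source text alone, so evaluation at inference time needs neither test cases nor an execution sandbox.
    
    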

    Self-planning Code Generation with Large Language Models

    Although large language models have demonstrated impressive ability in code generation, they still struggle to address the complicated intents provided by humans. It is widely acknowledged that humans typically employ planning to decompose complex problems and schedule solution steps prior to implementation. Thus we introduce planning into code generation to help the model understand complex intents and reduce the difficulty of problem solving. This paper proposes a self-planning code generation method with large language models, which consists of two phases, namely a planning phase and an implementation phase. Specifically, in the planning phase, the language model plans out the solution steps from the intent combined with in-context learning. It then enters the implementation phase, where the model generates code step by step, guided by the solution steps. The effectiveness of self-planning code generation has been rigorously evaluated on multiple code generation datasets, and the results demonstrate a marked superiority over naive direct generation approaches with language models. The improvement in performance is substantial, highlighting the significance of self-planning in code generation tasks.
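    The two-phase procedure can be sketched as below. This is an assumption-laden illustration: `llm` is a hypothetical stand-in for a model call, and the prompt layout is not the paper's exact template.

    ```python
    # Sketch of two-phase self-planning code generation. The `llm` helper
    # is a hypothetical placeholder; the prompt formats are assumptions.
    def llm(prompt: str) -> str:
        # Placeholder: a real system would query a large language model here.
        return f"[model output for: {prompt[:40]}]"

    def self_planning_codegen(intent: str, demos: str) -> str:
        # Planning phase: derive solution steps from the intent, using
        # few-shot intent -> plan demonstrations (in-context learning).
        plan = llm(f"{demos}\nIntent: {intent}\nPlan:")
        # Implementation phase: generate code step by step, guided by the plan.
        code = llm(f"Intent: {intent}\nPlan: {plan}\n"
                   f"Implement the solution step by step:")
        return code
    ```

    The key design point is that the plan is produced by the same model via prompting alone, so no auxiliary planner needs to be trained.
    
    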

    Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning

    Code comment generation aims at generating natural language descriptions for a code snippet to facilitate developers' program comprehension activities. Despite being studied for a long time, a bottleneck for existing approaches is that, given a code snippet, they can only generate one comment, while developers usually need information from diverse perspectives, such as what the functionality of the code snippet is and how to use it. To tackle this limitation, this study empirically investigates the feasibility of utilizing large language models (LLMs) to generate comments that can fulfill developers' diverse intents. Our intuition is based on two facts: (1) the code and its paired comment are used during the pre-training of LLMs to build the semantic connection between natural language and programming language, and (2) comments in real-world projects, which are collected for pre-training, usually reflect different developers' intents. We thus postulate that LLMs can already understand code from different perspectives after pre-training. Indeed, experiments on two large-scale datasets demonstrate the rationale of our insights: by adopting the in-context learning paradigm and giving adequate prompts to the LLM (e.g., providing it with ten or more examples), the LLM can significantly outperform a state-of-the-art supervised learning approach on generating comments with multiple intents. Results also show that customized strategies for constructing the prompts and post-processing strategies for reranking the results can both boost the LLM's performance, which sheds light on future research directions for using LLMs to achieve comment generation. Comment: Accepted by the 46th International Conference on Software Engineering (ICSE 2024).
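    The in-context prompt the abstract relies on, code/intent/comment demonstrations followed by the query, can be sketched as below. The field labels and layout are illustrative assumptions, not the paper's exact template.

    ```python
    # Sketch of few-shot prompt construction for multi-intent comment
    # generation. The "Code/Intent/Comment" labels are assumed, not the
    # paper's exact format; the abstract reports ten or more examples help.
    def build_prompt(examples, code, intent):
        """examples: list of (code, intent, comment) triples used as shots."""
        shots = "\n\n".join(
            f"Code:\n{c}\nIntent: {i}\nComment: {m}" for c, i, m in examples
        )
        # The query repeats the pattern but leaves the comment blank for
        # the model to complete, conditioned on the requested intent.
        return f"{shots}\n\nCode:\n{code}\nIntent: {intent}\nComment:"
    ```

    Varying only the `intent` field lets the same snippet yield different comments (e.g., "what it does" versus "how to use it") without retraining.
    
    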

    Description and phylogenetic analysis of the complete mitochondrial genome in Eulaelaps silvestris provides new insights into the molecular classification of the family Haemogamasidae

    In this study, the mitochondrial genome of Eulaelaps silvestris, which parasitizes Apodemus chevrieri, was sequenced and assembled to fill the gap in understanding the molecular evolution of the genus Eulaelaps. The E. silvestris mitochondrial genome is a double-stranded DNA molecule with a length of 14 882 bp, with a distinct AT preference in base composition and a notably higher AT content than GC content. The arrangement between genes is relatively compact, with a total of 10 gene intergenic regions and 12 gene overlap regions. All protein-coding genes had a typical ATN initiation codon, and only 2 protein-coding genes had an incomplete termination codon T. Across the 13 protein-coding genes, the 5 most frequently used codons ended in A/U, and only 1 codon ending in G/C had a relative synonymous codon usage (RSCU) value >1. Except for trnS1 and trnS2, which lacked the D arm, all other tRNAs were able to form a typical cloverleaf structure, and there were a total of 38 mismatches in the folding of the tRNA genes. Unlike the gene arrangement of the hypothetical arthropod ancestor, the E. silvestris mitochondrial genome underwent few rearrangements, mainly near tRNA genes and control regions. Both the maximum likelihood tree and the Bayesian tree showed that the family Haemogamasidae is most closely related to the family Dermanyssidae. The results not only provide a theoretical basis for studying the phylogenetic relationships of the genus Eulaelaps, but also provide molecular evidence that the family Haemogamasidae does not belong to the subfamily Laelapidae.
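    The relative synonymous codon usage (RSCU) statistic mentioned above is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally; values above 1 mark preferred codons. A minimal sketch, using an assumed two-amino-acid subset of the genetic code for illustration:

    ```python
    # Sketch of the standard RSCU computation: observed codon count divided
    # by the mean count of its synonymous codons. The codon table below is
    # a small illustrative subset, not the full genetic code.
    SYNONYMS = {
        "F": ["TTT", "TTC"],                             # phenylalanine
        "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"], # leucine
    }

    def rscu(codon_counts, synonyms=SYNONYMS):
        """Return RSCU per codon; RSCU > 1 means the codon is over-used."""
        values = {}
        for aa, codons in synonyms.items():
            total = sum(codon_counts.get(c, 0) for c in codons)
            if total == 0:
                continue  # amino acid absent from the sequence
            expected = total / len(codons)  # equal-usage expectation
            for c in codons:
                values[c] = codon_counts.get(c, 0) / expected
        return values

    # Toy counts: TTT used 3x more than TTC -> RSCU 1.5 vs 0.5.
    # rscu({"TTT": 30, "TTC": 10}) -> {"TTT": 1.5, "TTC": 0.5}
    ```
    
    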

    Applications of computer vision in measuring total cumulative pitch deviation of a gear

    As a basic component of mechanical transmission, gears should have high transmission accuracy, so precision gear measurement is a key technology. Building on existing research, computer vision technology is introduced into gear testing. According to the definitions of technical indexes such as pitch deviation and tooth thickness deviation, a new measurement method is proposed through analysis and research, in order to determine the module, tooth number, and total cumulative pitch deviation of a gear. In the proposed method, indexes such as pitch and tooth thickness are measured in arc length rather than chordal length, so that they are consistent with their definitions. The measured points can be positioned on the reference circle of the gear while the single and total cumulative pitch deviations are examined. Finally, a gear is measured with the proposed method. The main test results are as follows: the single pitch deviation is 10.3 μm, the total cumulative pitch deviation is 44.8 μm, and the tooth thickness deviation is 5.2 μm. The experiment demonstrates that the proposed method is feasible and effective and can meet the precision needs of engineering practice. This study provides a new solution for on-line gear measurement.
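    Measuring pitch in arc length on the reference circle, as described above, can be sketched as follows: from the angular positions of corresponding flank points, each actual pitch is the arc between consecutive points, and deviations are taken against the nominal pitch 2πr/z. This is an illustrative sketch of the standard deviation definitions, not the paper's vision pipeline.

    ```python
    # Sketch of single and total cumulative pitch deviation computed as
    # arc lengths on the reference circle, from measured flank angles.
    # This illustrates the standard definitions, not the paper's pipeline.
    import math

    def pitch_deviations(angles_deg, ref_radius_mm):
        """Return (max single pitch deviation, total cumulative pitch
        deviation), both as arc lengths in mm, for z measured teeth."""
        z = len(angles_deg)
        nominal = 2 * math.pi * ref_radius_mm / z  # nominal pitch (arc length)
        # Actual pitches: arc length between consecutive measured points.
        pitches = []
        for i in range(z):
            d = (angles_deg[(i + 1) % z] - angles_deg[i]) % 360
            pitches.append(math.radians(d) * ref_radius_mm)
        single = [p - nominal for p in pitches]
        # Cumulative deviations; total is the spread of the running sum.
        cumulative, s = [], 0.0
        for dev in single:
            s += dev
            cumulative.append(s)
        return max(abs(d) for d in single), max(cumulative) - min(cumulative)
    ```

    For a perfectly spaced gear every actual pitch equals the nominal one and both deviations vanish; a single mispositioned tooth shows up in two adjacent single-pitch deviations of opposite sign.
    
    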