32 research outputs found
Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions
Large language models (LLMs), such as OpenAI's Codex, have demonstrated their
potential to generate code from natural language descriptions across a wide
range of programming tasks. Several benchmarks have recently emerged to
evaluate the ability of LLMs to generate functionally correct code from natural
language intent with respect to a set of hidden test cases. This has enabled
the research community to identify significant and reproducible advancements in
LLM capabilities. However, there is currently a lack of benchmark datasets for
assessing the ability of LLMs to generate functionally correct code edits based
on natural language descriptions of intended changes. This paper aims to
address this gap by motivating the problem NL2Fix of translating natural
language descriptions of code changes (namely bug fixes described in Issue
reports in repositories) into correct code fixes. To this end, we introduce
Defects4J-NL2Fix, a dataset of 283 Java programs from the popular Defects4J
dataset augmented with high-level descriptions of bug fixes, and empirically
evaluate the performance of several state-of-the-art LLMs on this task.
Results show that these LLMs collectively are capable of generating plausible fixes
for 64.6% of the bugs, and the best LLM-based technique achieves up to
21.20% top-1 and 35.68% top-5 accuracy on this benchmark.
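As a rough illustration of the evaluation protocol described above, the sketch below computes top-k plausible-fix accuracy for an NL2Fix-style benchmark: sample k candidate patches per bug from an LLM and count a bug as solved if any of them passes the hidden test suite. This is a minimal sketch, not the paper's harness; the llm_propose_fix and passes_tests callables, and all other names, are hypothetical placeholders.

# Minimal sketch (not the paper's harness) of top-k plausible-fix accuracy:
# sample k candidate patches per bug and count the bug as solved if any
# candidate passes the hidden tests. All function names are hypothetical.

from typing import Callable, List

def top_k_accuracy(
    bugs: List[dict],                            # each: {"buggy_code": str, "issue_text": str, ...}
    llm_propose_fix: Callable[[str, str], str],  # (buggy_code, issue_text) -> candidate patch
    passes_tests: Callable[[dict, str], bool],   # (bug, patched_code) -> True if hidden tests pass
    k: int = 5,
) -> float:
    """Fraction of bugs for which at least one of the first k candidates is plausible."""
    solved = 0
    for bug in bugs:
        candidates = [llm_propose_fix(bug["buggy_code"], bug["issue_text"]) for _ in range(k)]
        if any(passes_tests(bug, patch) for patch in candidates):
            solved += 1
    return solved / len(bugs) if bugs else 0.0

# Usage: top1 = top_k_accuracy(bugs, propose, passes_tests, k=1)
#        top5 = top_k_accuracy(bugs, propose, passes_tests, k=5)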
Improving Polk County Service Integration Team's Resource Sharing
Background: Polk County Service Integration (SI) collaborates with community partners to provide resources and information for individuals and families within the community. This collaboration includes a monthly newsletter to promote community resources, services, and events. Aim: The aim was to create a standardized submission tool for newsletter contributors to improve communication and promote resource utilization by community members. Methodology: This process improvement was structured using the Plan Do Study Act (PDSA) model. The PDSA model allowed for reassessment of project needs, and multiple cycles were completed to develop a comprehensive evaluation and recommendation for the SI newsletter process. One assessment completed was a survey of SI partners. Results: The survey data focused on the partners' participation in submitting information to the SI newsletter. It revealed an overarching theme that partners do not feel they have relevant information to contribute; 68.3% of respondents expressed this view. Discussion: Based on the results, we recommend implementation of the standardized submission tool. Evaluation of the results found that users had difficulty with the submission process as a whole. With the addition of the submission tool, these problems will be mitigated via guided questioning that will spark contribution ideas from the partners. To evaluate the continued effectiveness of the submission tool, the participation of partners will be monitored. Implications: Implementation of the submission tool will begin in January 2021. The implications are to ease the submission process for the SI coordinator and improve the utilization of resources.
Program Merge Conflict Resolution via Neural Transformers
Collaborative software development is an integral part of the modern software
development life cycle, essential to the success of large-scale software
projects. When multiple developers make concurrent changes around the same
lines of code, a merge conflict may occur. Such conflicts stall pull requests
and continuous integration pipelines for hours to several days, seriously
hurting developer productivity. To address this problem, we introduce
MergeBERT, a novel neural program merge framework based on token-level
three-way differencing and a transformer encoder model. By exploiting the
restricted nature of merge conflict resolutions, we reformulate the task of
generating the resolution sequence as a classification task over a set of
primitive merge patterns extracted from real-world merge commit data. Our model
achieves 63-68% accuracy for merge resolution synthesis, yielding nearly a 3x
performance improvement over existing semi-structured merge tools and a 2x improvement over
neural program merge tools. Finally, we demonstrate that MergeBERT is
sufficiently flexible to work with source code files in Java, JavaScript,
TypeScript, and C# programming languages. To measure the practical use of
MergeBERT, we conduct a user study to evaluate MergeBERT suggestions with 25
developers from large OSS projects on 122 real-world conflicts they
encountered. Results suggest that in practice, MergeBERT resolutions would be
accepted at a higher rate than estimated by automatic metrics for precision and
accuracy. Additionally, we use participant feedback to identify future avenues
for improvement of MergeBERT. Comment: ESEC/FSE '22 camera-ready version. 12 pages, 4 figures, online appendix.
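The classification framing described in this abstract can be sketched as follows: rather than generating the merged text token by token, predict one of a small set of primitive resolution patterns for each conflict and materialize the resolution from that pattern. This is an illustrative sketch under assumed names, not MergeBERT itself; the pattern set, the Conflict type, and the classify callable are all placeholders.

# Hedged sketch of the classification framing (not MergeBERT itself): predict a
# primitive merge-resolution pattern for a conflict and apply it, instead of
# generating the merged text directly. Pattern set and classifier are illustrative.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Conflict:
    base: str    # common ancestor region
    ours: str    # our side of the conflict
    theirs: str  # their side of the conflict

# A few primitive patterns of the kind extracted from real-world merge commits.
PATTERNS = {
    "TAKE_OURS":        lambda c: c.ours,
    "TAKE_THEIRS":      lambda c: c.theirs,
    "OURS_THEN_THEIRS": lambda c: c.ours + "\n" + c.theirs,
    "THEIRS_THEN_OURS": lambda c: c.theirs + "\n" + c.ours,
    "TAKE_BASE":        lambda c: c.base,
}

def resolve(conflict: Conflict, classify: Callable[[Conflict], str]) -> str:
    """Resolve a conflict by predicting a primitive pattern and applying it."""
    label = classify(conflict)  # e.g. an encoder model over a token-level three-way diff
    return PATTERNS[label](conflict)

# Usage with a trivial stand-in classifier that always keeps our side:
merged = resolve(Conflict(base="x = 1", ours="x = 2", theirs="x = 3"),
                 classify=lambda c: "TAKE_OURS")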
Ranking LLM-Generated Loop Invariants for Program Verification
Synthesizing inductive loop invariants is fundamental to automating program
verification. In this work, we observe that Large Language Models (such as
gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of
programs in a 0-shot setting, yet require several samples to generate the
correct invariants. This can lead to a large number of calls to a program
verifier to establish an invariant. To address this issue, we propose a
re-ranking approach for the generated results of LLMs. We have designed a
ranker that can distinguish between correct inductive invariants and incorrect
attempts based on the problem definition. The ranker is optimized as a
contrastive ranker. Experimental results demonstrate that this re-ranking
mechanism significantly improves the ranking of correct invariants among the
generated candidates, leading to a notable reduction in the number of calls to
a verifier. Comment: Findings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings 2023).
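A minimal sketch of the re-ranking idea, assuming a scoring function stands in for the contrastive ranker: score each generated invariant, call the verifier in descending score order, and stop at the first candidate that verifies, so fewer verifier calls are needed whenever the ranker places correct invariants near the top. The score and verifier callables below are hypothetical placeholders, not the paper's implementation.

# Minimal sketch of re-ranking LLM-generated invariants before verification.
# The scorer and verifier are hypothetical stand-ins for the contrastive ranker
# and the program verifier described in the abstract.

from typing import Callable, List, Optional, Tuple

def verify_ranked(
    problem: str,
    candidates: List[str],
    score: Callable[[str, str], float],    # (problem, invariant) -> ranker score
    verifier: Callable[[str, str], bool],  # (problem, invariant) -> True if inductive
) -> Tuple[Optional[str], int]:
    """Return the first verified invariant and the number of verifier calls used."""
    ranked = sorted(candidates, key=lambda inv: score(problem, inv), reverse=True)
    for calls, inv in enumerate(ranked, start=1):
        if verifier(problem, inv):
            return inv, calls
    return None, len(ranked)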
Combined effect of age and body mass index on postoperative mortality and morbidity in laparoscopic cholecystectomy patients
Background: Previous studies have assessed the impact of age and body mass index (BMI) on surgery outcomes separately. This retrospective cohort study aimed to investigate the combined effect of age and BMI on postoperative mortality and morbidity in patients undergoing laparoscopic cholecystectomy. Methods: Data from the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) database for laparoscopic cholecystectomy patients between 2008 and 2020 were analyzed. Patient demographics, functional status, admission sources, preoperative risk factors, laboratory data, perioperative variables, and 30-day postoperative outcomes were included in the dataset. Logistic regression was used to determine the association of age, BMI, and age/BMI with mortality and morbidity. Patients were stratified into subcategories based on their age and BMI, and the age/BMI score was calculated. The chi-square test, independent-sample t-test, and ANOVA were used as appropriate for each category. Results: The study included 435,052 laparoscopic cholecystectomy patients. Logistic regression analysis revealed that a higher age/BMI score was associated with an increased risk of mortality (adj OR 13.13, 95% CI 9.19–18.77, p < 0.0001) and composite morbidity (adj OR 2.57, 95% CI 2.23–2.95, p < 0.0001). Conclusion: Older age, especially accompanied by a low BMI, appears to increase the postoperative mortality and morbidity risks in laparoscopic cholecystectomy patients, while paradoxically, a higher BMI seems to be protective. Our hypothesis is that a lower BMI, perhaps secondary to malnutrition, can carry a greater risk of surgical complications for the elderly. The age/BMI score is strongly and positively associated with mortality and morbidity and could be used as a new scoring system for predicting outcomes in patients undergoing surgery. Nevertheless, laparoscopic cholecystectomy remains a very safe procedure with relatively low complication rates.
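For readers unfamiliar with how adjusted odds ratios of this kind are typically derived, the sketch below fits a logistic regression with statsmodels and exponentiates the coefficients. It is only an illustration, not the study's actual model or covariate set; the column names (mortality, age_bmi_score, sex, asa_class) are hypothetical.

# Illustrative sketch: adjusted odds ratios via logistic regression.
# Column names are hypothetical; this is not the study's actual model.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def adjusted_odds_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Fit a logistic model and return exponentiated coefficients with 95% CIs."""
    model = smf.logit("mortality ~ age_bmi_score + C(sex) + C(asa_class)", data=df).fit(disp=False)
    or_table = np.exp(model.conf_int())        # CI bounds on the odds-ratio scale
    or_table["adj_OR"] = np.exp(model.params)  # point estimates
    or_table.columns = ["ci_lower", "ci_upper", "adj_OR"]
    return or_table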
Models, Metrics, and Minds: Empirical Perspectives on Developer Productivity
Programming tasks carry inherent cognitive load, but the design of the tools and languages a programmer uses to complete a task can either increase that mental burden or help manage it. To build software that better supports all those who interact with it, we must develop the processes and frameworks needed to understand the impact that software has on its users and to account for that impact when designing the next generation of languages and tools. Understanding the complexities of comprehension processes during software development requires diverse research strategies that bring together fundamentals of human cognition, from domains such as Psychology and Cognitive Neuroscience, with empirical methods used in Software Engineering research. In this dissertation we contribute novel methods and perspectives to the domain of program comprehension and software developer productivity, including: 1) a novel perspective and tools for studying cognitive processes during computing activities, 2) a better understanding of how software quality factors impact mental effort and productivity during bug localization, and 3) opportunities to improve metrics that serve as proxies for user evaluation of models for code summarization, readability, and merge tasks. We discuss how user-centered development and evaluation processes can help to develop theories that better inform and align tools designed to improve developer productivity in practice.
Association between angiotensin-converting enzyme insertion/deletion gene polymorphism and end-stage renal disease in Lebanese patients with diabetic nephropathy
Diabetic nephropathy (DN) is one of the leading causes of end-stage renal disease (ESRD). The development and progression of nephropathy are strongly determined by genetic factors, and a few genes have been shown to contribute to DN. An insertion/deletion (I/D) polymorphism of the gene encoding angiotensin-converting enzyme (ACE) has been reported as a candidate gene predisposing to DN and ESRD. Accordingly, we investigated the frequency of the ACE I/D polymorphism in 50 patients with DN, of whom 33 had ESRD, and compared them with 64 patients with type 2 diabetes mellitus (T2DM) but with normal renal function. Polymerase chain reaction amplification, using specific primers, was performed to genotype ACE I/D. The chi-square test was used to assess the differences between the groups. The frequencies of the ACE genotypes were as follows: 48% D/D, 40% I/D, and 12% I/I in patients with DN, in contrast to 32.8% D/D, 45.3% I/D, and 21.9% I/I in T2DM. The distribution of the D/D, D/I, and I/I genotypes did not differ significantly between T2DM and DN. However, carrying the D allele conferred a risk for the development of DN [odds ratio (OR), 1.71, P = 0.054]. On the other hand, the distribution of the D/D, D/I, and I/I genotypes was significantly different between T2DM and ESRD patients, χ2 = 7.23, P = 0.027. This was reflected by the D allele, which carried a risk for the development of ESRD (OR, 2.51, P = 0.0057). These findings suggest that the D allele may be considered a risk factor for both the development of DN and the progression of DN to ESRD in the Lebanese population with T2DM.
The Effect of Poor Source Code Lexicon and Readability on Developers' Cognitive Load
It has been well documented that a large portion of the cost of any software lies in the time developers spend understanding a program's source code before any changes can be undertaken. One of the main contributors to software comprehension, by subsequent developers or by the authors themselves, is the quality of the lexicon (i.e., the identifiers and comments) that developers use to embed domain concepts and to communicate with their teammates. In fact, previous research shows that there is a positive correlation between the quality of identifiers and the quality of a software project. Results suggest that a poor-quality lexicon impairs program comprehension and consequently increases the effort that developers must spend to maintain the software. However, we do not yet have empirical evidence of the relationship between the quality of the lexicon and the cognitive load that developers experience when trying to understand a piece of software. Given the associated costs, there is a critical need to empirically characterize the impact of the quality of the lexicon on developers' ability to comprehend a program. In this study, we explore the effect of poor source code lexicon and readability on developers' cognitive load as measured by a cutting-edge and minimally invasive functional brain imaging technique called functional near-infrared spectroscopy (fNIRS). Additionally, while developers perform software comprehension tasks, we map cognitive load data to source code identifiers using an eye-tracking device. Our results show that the presence of linguistic antipatterns in source code significantly increases developers' cognitive load.