From Query to Usable Code: An Analysis of Stack Overflow Code Snippets
Enriched by natural language texts, Stack Overflow code snippets are an
invaluable code-centric knowledge base of small units of source code. Besides
being useful for software developers, these annotated snippets can potentially
serve as the basis for automated tools that provide working code solutions to
specific natural language queries.
With the goal of developing automated tools based on Stack Overflow snippets
and their surrounding text, this paper investigates the following questions: (1) How
usable are the Stack Overflow code snippets? and (2) When using text search
engines for matching on the natural language questions and answers around the
snippets, what percentage of the top results contain usable code snippets?
A total of 3M code snippets are analyzed across four languages: C#, Java,
JavaScript, and Python. Python and JavaScript proved to be the languages with
the most usable code snippets, whereas Java and C# had the lowest usability
rates. Further qualitative analysis of
usable Python snippets shows the characteristics of the answers that solve the
original question. Finally, we use Google search to investigate the alignment
of usability and the natural language annotations around code snippets, and
explore how to make snippets in Stack Overflow an adequate base for future
automatic program generation.
Comment: 13th IEEE/ACM International Conference on Mining Software
Repositories, 11 pages
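The snippet-usability question above can be illustrated with a minimal, hypothetical first-pass check: treat a Python snippet as usable only if it parses as standalone code. Syntactic validity is just a proxy, not the paper's actual usability criterion:

```python
import ast

def is_usable_python(snippet: str) -> bool:
    """Return True if the snippet parses as standalone Python code.

    A hypothetical first-pass filter: parseability is only a proxy for
    the richer notion of usability studied in the paper.
    """
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

snippets = [
    "for x in range(3):\n    print(x)",  # complete block, parses
    "if response.ok",                     # fragment: missing colon and body
]
print([is_usable_python(s) for s in snippets])  # → [True, False]
```

A real pipeline would add further checks (undefined names, missing imports, runnability), which is where the per-language usability gaps described above would emerge.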
Stack Overflow in Github: Any Snippets There?
When programmers look for how to achieve certain programming tasks, Stack
Overflow is a popular destination in search engine results. Over the years,
Stack Overflow has accumulated an impressive knowledge base of snippets of code
that are amply documented. We are interested in studying how programmers use
these snippets of code in their projects. Can we find Stack Overflow snippets
in real projects? When snippets are used, is this copy literal or does it
suffer adaptations? And are these adaptations specializations required by the
idiosyncrasies of the target artifact, or are they motivated by specific
requirements of the programmer? The large-scale study presented in this paper
analyzes 909k non-fork Python projects hosted on GitHub, which contain 290M
function definitions, and 1.9M Python snippets captured in Stack Overflow.
Results are presented as a quantitative analysis of block-level code cloning
within and across Stack Overflow and GitHub, and as an analysis of programming
behaviors through qualitative analysis of our findings.
Comment: 14th International Conference on Mining Software Repositories, 11 pages
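Block-level clone detection of the kind described above can be sketched with a normalize-then-hash scheme, so that literal copies and copies with superficial edits (comments, whitespace) collide on the same fingerprint. This is an illustrative simplification, not the study's actual method:

```python
import hashlib
import re

def block_fingerprint(code: str) -> str:
    """Hash a code block after light normalization (comments and
    whitespace stripped), so literal and near-literal copies collide."""
    lines = []
    for line in code.splitlines():
        line = re.sub(r"#.*", "", line).strip()  # drop comments
        if line:
            lines.append(re.sub(r"\s+", " ", line))  # collapse spacing
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()

# A Stack Overflow snippet and a lightly adapted copy in a GitHub project
so_snippet = "total = 0\nfor v in values:\n    total += v"
gh_block   = "total = 0\nfor v in values:\n    total += v  # sum"
print(block_fingerprint(so_snippet) == block_fingerprint(gh_block))  # → True
```

Deeper adaptations (renamed variables, reordered statements) would defeat this fingerprint, which is exactly the distinction between literal copies and programmer-driven adaptations the study investigates.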
Are We Ready to Embrace Generative AI for Software Q&A?
Stack Overflow, the world's largest software Q&A (SQA) website, is facing a
significant traffic drop due to the emergence of generative AI techniques.
ChatGPT was banned by Stack Overflow only 6 days after its release. The
main reason given officially by Stack Overflow is that the answers
generated by ChatGPT are of low quality. To verify this, we conduct a
comparative evaluation of human-written and ChatGPT-generated answers. Our
methodology employs both automatic comparison and a manual study. Our results
suggest that human-written and ChatGPT-generated answers are semantically
similar; however, human-written answers consistently outperform
ChatGPT-generated ones across multiple aspects, specifically by 10% on the overall score.
We release the data, analysis scripts, and detailed results at
https://anonymous.4open.science/r/GAI4SQA-FD5C.
Comment: Accepted by the New Ideas and Emerging Results (NIER) track at the
IEEE/ACM Automated Software Engineering (ASE) Conference
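The automatic comparison step can be pictured, in a much-reduced form, as computing a similarity score between a human answer and a model answer. The bag-of-words cosine below is a lightweight stand-in for the embedding-based comparison such a study would more plausibly use:

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two answer texts.

    Illustrative only: real semantic comparison would use learned
    embeddings rather than raw token counts.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

human = "Use a context manager: with open(path) as f: data = f.read()"
model = "You can use a context manager, e.g. with open(path) as f"
print(round(cosine_similarity(human, model), 2))
```

High lexical or semantic similarity, as the abstract notes, does not settle the quality question, which is why the manual study is also needed.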
FixMiner: Mining Relevant Fix Patterns for Automated Program Repair
Patching is a common activity in software development. It is generally
performed on a source code base to address bugs or add new functionalities. In
this context, given the recurrence of bugs across projects, the associated
similar patches can be leveraged to extract generic fix actions. While the
literature includes various approaches leveraging similarity among patches to
guide program repair, these approaches often do not yield fix patterns that are
tractable and reusable as actionable input to APR systems. In this paper, we
propose a systematic and automated approach to mining relevant and actionable
fix patterns based on an iterative clustering strategy applied to atomic
changes within patches. The goal of FixMiner is thus to infer separate and
reusable fix patterns that can be leveraged in other patch generation systems.
Our technique, FixMiner, leverages the Rich Edit Script, a specialized tree
structure over edit scripts that captures the AST-level context of code
changes. FixMiner uses different tree representations of Rich Edit Scripts for
each round of clustering to identify similar changes. These are abstract syntax
trees, edit actions trees, and code context trees. We have evaluated FixMiner
on thousands of software patches collected from open source projects.
Preliminary results show that we are able to mine accurate patterns,
efficiently exploiting change information in Rich Edit Scripts. We further
integrated the mined patterns into an automated program repair prototype,
PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J
benchmark. Beyond this quantitative performance, we show that the mined fix
patterns are sufficiently relevant to produce patches with a high probability
of correctness: 81% of PARFixMiner's generated plausible patches are correct.
Comment: 31 pages, 11 figures
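The clustering idea above can be caricatured as grouping patches whose atomic changes match. The edit-action names below are hypothetical, and a single exact-match round is a drastic simplification of FixMiner's iterative clustering over Rich Edit Script representations:

```python
from collections import defaultdict

# Each patch reduced to a sequence of atomic edit actions
# (hypothetical labels, loosely in the spirit of AST-level edit scripts).
patches = [
    ("fix-1", ("UPDATE IfStatement", "INSERT NullCheck")),
    ("fix-2", ("UPDATE IfStatement", "INSERT NullCheck")),
    ("fix-3", ("DELETE MethodInvocation",)),
]

# One clustering round: group patches with identical action sequences.
clusters = defaultdict(list)
for name, actions in patches:
    clusters[actions].append(name)

for actions, members in sorted(clusters.items()):
    print(actions, "->", members)
# fix-1 and fix-2 land in one cluster: a candidate reusable fix pattern
```

Each recurring cluster is what would be abstracted into a fix pattern and handed to a patch generation system.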
Automatic Software Repair: a Bibliography
This article presents a survey on automatic software repair. Automatic
software repair consists of automatically finding a solution to software bugs
without human intervention. This article considers all kinds of repairs. First,
it discusses behavioral repair where test suites, contracts, models, and
crashing inputs are taken as oracle. Second, it discusses state repair, also
known as runtime repair or runtime recovery, with techniques such as checkpoint
and restart, reconfiguration, and invariant restoration. The uniqueness of this
article is that it spans the research communities that contribute to this body
of knowledge: software engineering, dependability, operating systems,
programming languages, and security. It provides a novel and structured
overview of the diversity of bug oracles and repair operators used in the
literature.
Revisiting Sentiment Analysis for Software Engineering in the Era of Large Language Models
Software development is an inherently collaborative process, where various
stakeholders frequently express their opinions and emotions across diverse
platforms. Recognizing the sentiments conveyed in these interactions is crucial
for the effective development and ongoing maintenance of software systems. Over
the years, many tools have been proposed to aid in sentiment analysis, but
accurately identifying the sentiments expressed in software engineering
datasets remains challenging.
Although fine-tuned smaller large language models (sLLMs) have shown
potential in handling software engineering tasks, they struggle with the
shortage of labeled data. With the emergence of bigger large language models
(bLLMs), it is pertinent to investigate whether they can handle this challenge
in the context of sentiment analysis for software engineering. In this work, we
undertake a comprehensive empirical study using five established datasets. We
assess the performance of three open-source bLLMs in both zero-shot and
few-shot scenarios. Additionally, we compare them with fine-tuned sLLMs.
Our experimental findings demonstrate that bLLMs exhibit state-of-the-art
performance on datasets marked by limited training data and imbalanced
distributions. bLLMs can also achieve excellent performance under a zero-shot
setting. However, when ample training data is available or the dataset exhibits
a more balanced distribution, fine-tuned sLLMs can still achieve superior
results.
Comment: Submitted to TOSE
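One way to picture the zero-shot versus few-shot setups compared above is prompt assembly: the few-shot variant simply prepends labeled examples. The template below is illustrative, not the paper's actual prompt:

```python
def build_prompt(text: str, examples=None) -> str:
    """Assemble a sentiment-classification prompt for a bLLM.

    With no examples this is the zero-shot setup; passing labeled
    (text, label) pairs yields the few-shot setup. Wording is a
    hypothetical template, not the study's actual prompt.
    """
    prompt = ("Classify the sentiment of the following software "
              "engineering text as positive, negative, or neutral.\n")
    for ex_text, ex_label in (examples or []):
        prompt += f"Text: {ex_text}\nSentiment: {ex_label}\n"
    prompt += f"Text: {text}\nSentiment:"
    return prompt

zero_shot = build_prompt("This API is a nightmare to debug.")
few_shot = build_prompt(
    "This API is a nightmare to debug.",
    examples=[("Great fix, thanks!", "positive"),
              ("The build is broken again.", "negative")],
)
print(few_shot)
```

Fine-tuned sLLMs, by contrast, need no in-prompt examples but do need the labeled training data whose scarcity motivates the comparison.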