Detecting Similar Applications with Collaborative Tagging
Abstract—Detecting similar applications is useful for various purposes, ranging from program comprehension and rapid prototyping to plagiarism detection. McMillan et al. have proposed a solution to detect similar applications based on common Java API usage patterns. Recently, collaborative tagging has impacted software development practices: various sites allow users to assign tags to software systems. In this study, we complement the study by McMillan et al. by leveraging another source of information aside from API usage patterns, namely software tags. We have performed a user study involving several participants, and the results show that collaborative tagging is a promising source of information for detecting similar software applications.
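As a rough illustration of the underlying idea (not the paper's actual method), the following sketch ranks applications by the Jaccard overlap of their tag sets; the application names and tags are hypothetical.

    # Sketch: ranking applications by tag-set overlap (Jaccard similarity).
    # All tag data below is hypothetical and only serves to illustrate the idea.

    def jaccard(tags_a: set, tags_b: set) -> float:
        """Jaccard similarity: |intersection| / |union| of two tag sets."""
        if not tags_a and not tags_b:
            return 0.0
        return len(tags_a & tags_b) / len(tags_a | tags_b)

    # Hypothetical tags collected from a collaborative-tagging site.
    apps = {
        "jedit":      {"editor", "java", "swing", "text"},
        "vim-like":   {"editor", "text", "modal"},
        "jfreechart": {"chart", "java", "visualization"},
    }

    query = "jedit"
    ranked = sorted(
        ((name, jaccard(apps[query], tags))
         for name, tags in apps.items() if name != query),
        key=lambda pair: pair[1],
        reverse=True,
    )
    for name, score in ranked:
        print(f"{name}: {score:.2f}")  # vim-like: 0.40, jfreechart: 0.17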
Cupid: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection
Duplicate bug report detection (DBRD) is a long-standing challenge in both
academia and industry. Over the past decades, researchers have proposed various
approaches to detect duplicate bug reports more accurately. With the recent
advancement of deep learning, researchers have also proposed several approaches
that leverage deep learning models to detect duplicate bug reports. A recent
benchmarking study on DBRD also reveals that the performance of deep learning-based approaches is not always better than that of traditional approaches.
However, traditional approaches have limitations, e.g., they are usually based
on the bag-of-words model, which cannot capture the semantics of bug reports.
To address the aforementioned challenges, we seek to leverage a state-of-the-art large language model to improve the performance of the traditional DBRD approach.
In this paper, we propose an approach called Cupid, which combines the best-performing traditional DBRD approach, REP, with the state-of-the-art large language model ChatGPT. Specifically, we first leverage ChatGPT in a zero-shot setting to extract the essential information from bug reports. We then use this essential information as the input to REP to detect duplicate bug reports.
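A minimal sketch of this two-stage pipeline might look as follows; the prompt wording, the model choice, and the rep_rank stand-in are assumptions made for illustration, not the paper's exact implementation (REP itself extends BM25F over bug report fields).

    # Sketch of the Cupid pipeline as described above: ChatGPT first distills a
    # bug report into its essential information (zero-shot), and that distilled
    # text is fed to the traditional REP matcher. Prompt, model name, and the
    # rep_rank placeholder are all assumptions for illustration.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def extract_essential_info(bug_report: str) -> str:
        """Zero-shot prompt asking the model for the essential content only."""
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # hypothetical choice; the paper's may differ
            messages=[{
                "role": "user",
                "content": "Extract the essential information from this bug "
                           "report, keeping only the key symptoms and context:\n"
                           + bug_report,
            }],
        )
        return response.choices[0].message.content

    def rep_rank(query: str, candidates: list[str]) -> list[int]:
        """Placeholder for the REP matcher, ranking by simple token overlap.
        Only here to keep the sketch self-contained; it is not REP."""
        q = set(query.lower().split())
        scores = [len(q & set(c.lower().split())) for c in candidates]
        return sorted(range(len(candidates)),
                      key=lambda i: scores[i], reverse=True)

    def detect_duplicates(new_report: str, candidates: list[str]) -> list[int]:
        """Rank candidate reports against the distilled new report."""
        query = extract_essential_info(new_report)
        keys = [extract_essential_info(c) for c in candidates]
        return rep_rank(query, keys)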
We conducted an evaluation comparing Cupid with three existing approaches on three datasets. The experimental results show that Cupid achieves new state-of-the-art results, reaching Recall Rate@10 scores ranging from 0.59 to 0.67 across all the datasets analyzed. Our work highlights the potential of leveraging large language models to improve the performance of software engineering tasks.
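For reference, Recall Rate@k counts a duplicate report as correctly handled when its master report appears among the top k suggestions; a minimal computation of this metric, with placeholder data, is sketched below.

    # Sketch: Recall Rate@k for duplicate bug report detection. For each
    # duplicate report, check whether its master appears in the top-k
    # candidates returned by the detector. The inputs are placeholders.

    def recall_rate_at_k(rankings: dict[str, list[str]],
                         masters: dict[str, str],
                         k: int = 10) -> float:
        """Fraction of duplicates whose master is ranked within the top k."""
        hits = sum(1 for dup, ranked in rankings.items()
                   if masters[dup] in ranked[:k])
        return hits / len(rankings)

    # Placeholder data: two duplicates, one resolved within the top 10.
    rankings = {"BUG-2": ["BUG-1", "BUG-7"], "BUG-5": ["BUG-9"]}
    masters  = {"BUG-2": "BUG-1", "BUG-5": "BUG-3"}
    print(recall_rate_at_k(rankings, masters, k=10))  # 0.5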
Towards generating transformation rules without examples for Android API replacement
CrossASR: Efficient differential testing of automatic speech recognition via text-to-speech
Revisiting Sentiment Analysis for Software Engineering in the Era of Large Language Models
Software development is an inherently collaborative process, where various
stakeholders frequently express their opinions and emotions across diverse
platforms. Recognizing the sentiments conveyed in these interactions is crucial
for the effective development and ongoing maintenance of software systems. Over
the years, many tools have been proposed to aid in sentiment analysis, but
accurately identifying the sentiments expressed in software engineering
datasets remains challenging.
Although fine-tuned smaller large language models (sLLMs) have shown
potential in handling software engineering tasks, they struggle with the
shortage of labeled data. With the emergence of bigger large language models
(bLLMs), it is pertinent to investigate whether they can handle this challenge
in the context of sentiment analysis for software engineering. In this work, we
undertake a comprehensive empirical study using five established datasets. We
assess the performance of three open-source bLLMs in both zero-shot and
few-shot scenarios. Additionally, we compare them with fine-tuned sLLMs.
Our experimental findings demonstrate that bLLMs exhibit state-of-the-art
performance on datasets marked by limited training data and imbalanced
distributions. bLLMs can also achieve excellent performance under a zero-shot
setting. However, when ample training data is available or the dataset exhibits
a more balanced distribution, fine-tuned sLLMs can still achieve superior
results.
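As a rough sketch of what the zero-shot setting looks like in practice (the model choice and prompt wording are assumptions, not the paper's exact setup):

    # Sketch: zero-shot sentiment classification of a developer comment with an
    # open-source instruction-tuned LLM via Hugging Face transformers. The
    # model and prompt below are illustrative assumptions.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="mistralai/Mistral-7B-Instruct-v0.2",  # hypothetical choice
    )

    comment = "This workaround is ugly, but at least the build passes now."
    prompt = (
        "Classify the sentiment of the following software engineering comment "
        "as positive, negative, or neutral. Answer with one word.\n\n"
        f"Comment: {comment}\nSentiment:"
    )
    result = generator(prompt, max_new_tokens=5, do_sample=False)
    # generated_text echoes the prompt followed by the model's one-word answer
    print(result[0]["generated_text"])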