4,711 research outputs found
Predicting Good Configurations for GitHub and Stack Overflow Topic Models
Software repositories contain large amounts of textual data, ranging from
source code comments and issue descriptions to questions, answers, and comments
on Stack Overflow. To make sense of this textual data, topic modelling is
frequently used as a text-mining tool for the discovery of hidden semantic
structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used
topic model that aims to explain the structure of a corpus by grouping texts.
LDA requires multiple parameters to work well, and there are only rough and
sometimes conflicting guidelines available on how these parameters should be
set. In this paper, we contribute (i) a broad study of parameters to arrive at
good local optima for GitHub and Stack Overflow text corpora, (ii) an
a-posteriori characterisation of text corpora related to eight programming
languages, and (iii) an analysis of corpus feature importance via per-corpus
LDA configuration. We find that (1) popular rules of thumb for topic modelling
parameter configuration are not applicable to the corpora used in our
experiments, (2) corpora sampled from GitHub and Stack Overflow have different
characteristics and require different configurations to achieve good model fit,
and (3) we can predict good configurations for unseen corpora reliably. These
findings support researchers and practitioners in efficiently determining
suitable configurations for topic modelling when analysing textual data
contained in software repositories.Comment: to appear as full paper at MSR 2019, the 16th International
Conference on Mining Software Repositorie
Translating Video Recordings of Mobile App Usages into Replayable Scenarios
Screen recordings of mobile applications are easy to obtain and capture a
wealth of information pertinent to software developers (e.g., bugs or feature
requests), making them a popular mechanism for crowdsourced app feedback. Thus,
these videos are becoming a common artifact that developers must manage. In
light of unique mobile development constraints, including swift release cycles
and rapidly evolving platforms, automated techniques for analyzing all types of
rich software artifacts provide benefit to mobile developers. Unfortunately,
automatically analyzing screen recordings presents serious challenges, due to
their graphical nature, compared to other types of (textual) artifacts. To
address these challenges, this paper introduces V2S, a lightweight, automated
approach for translating video recordings of Android app usages into replayable
scenarios. V2S is based primarily on computer vision techniques and adapts
recent solutions for object detection and image classification to detect and
classify user actions captured in a video, and convert these into a replayable
test scenario. We performed an extensive evaluation of V2S involving 175 videos
depicting 3,534 GUI-based actions collected from users exercising features and
reproducing bugs from over 80 popular Android apps. Our results illustrate that
V2S can accurately replay scenarios from screen recordings, and is capable of
reproducing 89% of our collected videos with minimal overhead. A case
study with three industrial partners illustrates the potential usefulness of
V2S from the viewpoint of developers.Comment: In proceedings of the 42nd International Conference on Software
Engineering (ICSE'20), 13 page
Y2K Interruption: Can the Doomsday Scenario Be Averted?
The management philosophy until recent years has been to replace the workers with computers, which are available 24 hours a day, need no benefits, no insurance and never complain. But as the year 2000 approached, along with it came the fear of the millennium bug, generally known as Y2K, and the computers threatened to strike!!!! Y2K, though an abbreviation of year 2000, generally refers to the computer glitches which are associated with the year 2000. Computer companies, in order to save memory and money, adopted a voluntary standard in the beginning of the computer era that all computers automatically convert any year designated by two numbers such as 99 into 1999 by adding the digits 19. This saved enormous amount of memory, and thus money, because large databases containing birth dates or other dates only needed to contain the last two digits such as 65 or 86. But it also created a built in flaw that could make the computers inoperable from January 2000. The problem is that most of these old computers are programmed to convert 00 (for the year 2000) into 1900 and not 2000. The trouble could therefore, arise when the systems had to deal with dates outside the 1900s. In 2000, for example a programme that calculates the age of a person born in 1965 will subtract 65 from 00 and get -65. The problem is most acute in mainframe systems, but that does not mean PCs, UNIX and other computing environments are trouble free. Any computer system that relies on date calculations must be tested because the Y2K or the millennium bug arises because of a potential for “date discontinuity” which occurs when the time expressed by a system, or any of its components, does not move in consonance with real time. Though attention has been focused on the potential problems linked with change from 1999 to 2000, date discontinuity may occur at other times in and around this period.
Prompting Is All Your Need: Automated Android Bug Replay with Large Language Models
Bug reports are vital for software maintenance that allow users to inform
developers of the problems encountered while using the software. As such,
researchers have committed considerable resources toward automating bug replay
to expedite the process of software maintenance. Nonetheless, the success of
current automated approaches is largely dictated by the characteristics and
quality of bug reports, as they are constrained by the limitations of
manually-crafted patterns and pre-defined vocabulary lists. Inspired by the
success of Large Language Models (LLMs) in natural language understanding, we
propose AdbGPT, a new lightweight approach to automatically reproduce the bugs
from bug reports through prompt engineering, without any training and
hard-coding effort. AdbGPT leverages few-shot learning and chain-of-thought
reasoning to elicit human knowledge and logical reasoning from LLMs to
accomplish the bug replay in a manner similar to a developer. Our evaluations
demonstrate the effectiveness and efficiency of our AdbGPT to reproduce 81.3%
of bug reports in 253.6 seconds, outperforming the state-of-the-art baselines
and ablation studies. We also conduct a small-scale user study to confirm the
usefulness of AdbGPT in enhancing developers' bug replay capabilities.Comment: Accepted to 46th International Conference on Software Engineering
(ICSE 2024
Large Language Models for Software Engineering: A Systematic Literature Review
Large Language Models (LLMs) have significantly impacted numerous domains,
notably including Software Engineering (SE). Nevertheless, a well-rounded
understanding of the application, effects, and possible limitations of LLMs
within SE is still in its early stages. To bridge this gap, our systematic
literature review takes a deep dive into the intersection of LLMs and SE, with
a particular focus on understanding how LLMs can be exploited in SE to optimize
processes and outcomes. Through a comprehensive review approach, we collect and
analyze a total of 229 research papers from 2017 to 2023 to answer four key
research questions (RQs). In RQ1, we categorize and provide a comparative
analysis of different LLMs that have been employed in SE tasks, laying out
their distinctive features and uses. For RQ2, we detail the methods involved in
data collection, preprocessing, and application in this realm, shedding light
on the critical role of robust, well-curated datasets for successful LLM
implementation. RQ3 allows us to examine the specific SE tasks where LLMs have
shown remarkable success, illuminating their practical contributions to the
field. Finally, RQ4 investigates the strategies employed to optimize and
evaluate the performance of LLMs in SE, as well as the common techniques
related to prompt optimization. Armed with insights drawn from addressing the
aforementioned RQs, we sketch a picture of the current state-of-the-art,
pinpointing trends, identifying gaps in existing research, and flagging
promising areas for future study
Recommended from our members
A survey on online monitoring approaches of computer-based systems
This report surveys forms of online data collection that are in current use (as well as being the subject of research to adapt them to changing technology and demands), and can be used as inputs to assessment of dependability and resilience, although they are not primarily meant for this use
Recommended from our members
Analysing the Resolution of Security Bugs in Software Maintenance
Security bugs in software systems are often reported after incidents of malicious attacks. Developers often need to resolve these bugs quickly in order to maintain the security of such systems. Bug resolution includes two kinds of activities: triaging confirms that the bugs are indeed security problems, after which fixing involves making changes to the code.
It is reported in the literature that, statistically, security bugs are reopened more often compared to others, which poses two new research questions: (a) Are developers “rushing” to triage security bugs too soon under the pressure of deadlines? (b) Do developers need to spend more time fixing security bugs to avoid frequent reopening?
This thesis explores these questions in order to determine whether security bug fixing should take a higher priority than other bugs to avoid malicious attackers exploiting vulnerabilities before the problems are fixed, and whether security bug fixing should take a higher priority than other bugs.
In this thesis a quantitative approach has been adopted by conducting statistical empirical studies to observe the behaviour of software developers engaged in dealing with security bugs.
Firstly, the concept of "rush'' has been borrowed from the time management literature to refer to the behaviour of people delivering work under the pressure of deadlines. By observing how developers deliver bug resolution before the deadline of releases, the degree of rush has been measured as the ratio between the actual time spent by developers during triaging and the theoretical time the developers have by delaying the fixes until the next regular release.
In this thesis, a suggest that delaying bug assignment helps find the right developer and gives the developer more time to prepare for the same workload with more relaxed planning constraints. Secondly, to analyse the complexity of security bug fixes, the fan-in complexity of functions relevant to security bugs has been measured, rather than simply measuring the time spent by the software developers on the fixing of such bugs.
The first null hypothesis is tested using a Man-Whitney method on five software case studies, Samba, MozillaFirefox, RedHat, FreeBSD and Mozilla. The second null hypothesis is tested by comparing the results of fixing security and non-security bugs from the Samba and MozillaFirefox case studies.
Statistically significant results suggest that security bugs are triaged in a rush compared to non-security bugs for RedHat, FreeBSD and Mozilla.
In terms of fan-in, the results of the Samba and MozillaFirefox case studies suggest that security bugs are more complex to fix compared to non-security bugs
EFFECTIVE METHODS AND TOOLS FOR MINING APP STORE REVIEWS
Research on mining user reviews in mobile application (app) stores has noticeably advanced in the past few years. The main objective is to extract useful information that app developers can use to build more sustainable apps. In general, existing research on app store mining can be classified into three genres: classification of user feedback into different types of software maintenance requests (e.g., bug reports and feature requests), building practical tools that are readily available for developers to use, and proposing visions for enhanced mobile app stores that integrate multiple sources of user feedback to ensure app survivability. Despite these major advances, existing tools and techniques still suffer from several drawbacks. Specifically, the majority of techniques rely on the textual content of user reviews for classification. However, due to the inherently diverse and unstructured nature of user-generated online textual reviews, text-based review mining techniques often produce excessively complicated models that are prone to over-fitting. Furthermore, the majority of proposed techniques focus on extracting and classifying the functional requirements in mobile app reviews, providing a little or no support for extracting and synthesizing the non-functional requirements (NFRs) raised in user feedback (e.g., security, reliability, and usability). In terms of tool support, existing tools are still far from being adequate for practical applications. In general, there is a lack of off-the-shelf tools that can be used by researchers and practitioners to accurately mine user reviews. Motivated by these observations, in this dissertation, we explore several research directions aimed at addressing the current issues and shortcomings in app store review mining research. In particular, we introduce a novel semantically aware approach for mining and classifying functional requirements from app store reviews. This approach reduces the dimensionality of the data and enhances the predictive capabilities of the classifier. We then present a two-phase study aimed at automatically capturing the NFRs in user reviews. We also introduce MARC, a tool that enables developers to extract, classify, and summarize user reviews
- …