Copilot Security: A User Study
Code generation tools driven by artificial intelligence have recently become
more popular due to advancements in deep learning and natural language
processing that have increased their capabilities. The proliferation of these
tools may be a double-edged sword because while they can increase developer
productivity by making it easier to write code, research has shown that they
can also generate insecure code. In this paper, we perform a user-centered
evaluation of GitHub's Copilot to better understand its strengths and weaknesses
with respect to code security. We conduct a user study where participants solve
programming problems, which have potentially vulnerable solutions, with and
without Copilot assistance. The main goal of the user study is to determine how
the use of Copilot affects participants' security performance. In our set of
participants (n=25), we find that access to Copilot is associated with more
secure solutions when tackling harder problems. For the easier problem, we
observe no effect of Copilot access on the security of solutions. We also
observe no disproportionate impact of Copilot use on particular kinds of
vulnerabilities.
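The abstract does not detail the study's statistical analysis. As one hedged illustration, a between-group comparison of secure versus insecure solutions on a single problem could be run as an exact test on a 2x2 contingency table; all counts below are hypothetical, not the study's data:

```python
from scipy.stats import fisher_exact

# Hypothetical counts (not the study's data): rows are groups,
# columns are [secure, insecure] solutions for one problem.
copilot_group = [9, 4]   # 13 participants with Copilot access
control_group = [5, 7]   # 12 participants without Copilot
table = [copilot_group, control_group]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
# With n = 25 participants, an exact test is preferable to a
# chi-squared approximation for such small cell counts.
```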
Examining User-Developer Feedback Loops in the iOS App Store
Application Stores, such as the iTunes App Store, give developers access to their users’ complaints and requests in the form of app reviews. However, little is known about how developers respond to app reviews. Without such knowledge, developers, users, App Stores, and researchers could build upon wrong foundations. To address this knowledge gap, in this study we focus on feedback loops, which occur when developers address a user concern. To conduct this study, we use both supervised and unsupervised methods to automatically analyze a corpus of 1,752 different apps from the iTunes App Store consisting of 30,875 release notes and 806,209 app reviews. We found that 18.7% of the apps in our corpus contain instances of feedback loops. In these feedback loops we observed interesting behaviors. For example, (i) feedback loops with feature requests and login issues were twice as likely as general bugs to be fixed by developers, (ii) users who reviewed with an even tone were most likely to have their concerns addressed, and (iii) the star rating of the app reviews did not influence the developers’ likelihood of completing a feedback loop.
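The abstract does not specify the unsupervised method used to link release notes to reviews. One plausible sketch of such a matching step, pairing a release note with earlier reviews of the same app via TF-IDF cosine similarity, is shown below; the texts and the 0.3 threshold are hypothetical, not the paper's pipeline:

```python
# Illustrative sketch (not the paper's method): flag a candidate
# feedback loop when a release note is textually similar to an
# earlier user review of the same app.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reviews = [
    "App crashes every time I try to log in with Facebook",
    "Please add a dark mode, the white background hurts my eyes",
]
release_note = "Fixed a crash when logging in with Facebook"

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(reviews + [release_note])

# Compare the release note (last row) against every review.
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
for review, score in zip(reviews, scores):
    if score > 0.3:  # hypothetical similarity threshold
        print(f"candidate feedback loop ({score:.2f}): {review!r}")
```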
Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in Code?
Several advances in deep learning have been successfully applied to the
software development process. Of recent interest is the use of neural language
models to build tools, such as Copilot, that assist in writing code. In this
paper we perform a comparative empirical analysis of Copilot-generated code
from a security perspective. The aim of this study is to determine if Copilot
is as bad as human developers: we investigate whether Copilot is just as
likely to introduce the same software vulnerabilities that human developers
did. Using a dataset of C/C++ vulnerabilities, we prompt Copilot to generate
suggestions in scenarios that previously led to the introduction of
vulnerabilities by human developers. The suggestions are inspected and
categorized in a 2-stage process based on whether the original vulnerability or
the fix is reintroduced. We find that Copilot replicates the original
vulnerable code ~33% of the time while replicating the fixed code at a ~25%
rate. However, this behavior is not consistent: Copilot is more susceptible to
introducing some types of vulnerability than others and is more likely to
generate vulnerable code in response to prompts that correspond to older
vulnerabilities than newer ones. Overall, given that in a substantial
proportion of instances Copilot did not generate code with the same
vulnerabilities that human developers had introduced previously, we conclude
that Copilot is not as bad as human developers at introducing vulnerabilities
in code.
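The abstract describes a 2-stage categorization of suggestions without giving its mechanics. As a hypothetical illustration, a first-stage pre-filter could score each suggestion against the known vulnerable and fixed versions before manual inspection; the difflib similarity measure and the 0.7 threshold below are assumptions, not the paper's method:

```python
# Hypothetical pre-filter for the 2-stage categorization: score a
# Copilot suggestion against the known vulnerable and fixed versions
# of the code, then route ambiguous cases to manual review.
import difflib

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def categorize(suggestion: str, vulnerable: str, fixed: str) -> str:
    vuln_score = similarity(suggestion, vulnerable)
    fix_score = similarity(suggestion, fixed)
    if max(vuln_score, fix_score) < 0.7:  # hypothetical threshold
        return "neither / needs manual review"
    if vuln_score > fix_score:
        return "reintroduces the original vulnerability"
    return "reintroduces the fix"

vulnerable = "strcpy(buf, user_input);"
fixed = "strncpy(buf, user_input, sizeof(buf) - 1);"
print(categorize("strcpy(buf, user_input);", vulnerable, fixed))
```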
RLocator: Reinforcement Learning for Bug Localization
Software developers spend a significant portion of time fixing bugs in their
projects. To streamline this process, bug localization approaches have been
proposed to identify the source code files that are likely responsible for a
particular bug. Prior work proposed several similarity-based machine-learning
techniques for bug localization. Despite significant advances in these
techniques, they do not directly optimize the evaluation measures. We argue
that directly optimizing evaluation measures can positively contribute to the
performance of bug localization approaches. Therefore, in this paper, we
utilize Reinforcement Learning (RL) techniques to directly optimize the ranking
metrics. We propose RLocator, a Reinforcement Learning-based bug localization
approach. We formulate RLocator using a Markov Decision Process (MDP) to
optimize the evaluation measures directly. We present the technique and
experimentally evaluate it based on a benchmark dataset of 8,316 bug reports
from six highly popular Apache projects. The results of our evaluation reveal
that RLocator achieves a Mean Reciprocal Rank (MRR) of 0.62, a Mean Average
Precision (MAP) of 0.59, and a Top 1 score of 0.46. We compare RLocator with
two state-of-the-art bug localization tools, FLIM and BugLocator. Our
evaluation reveals that RLocator outperforms both approaches by a substantial
margin, with improvements of 38.3% in MAP, 36.73% in MRR, and 23.68% in the Top
K metric. These findings highlight that directly optimizing evaluation measures
considerably improves performance on the bug localization problem.
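The two ranking metrics RLocator optimizes are standard. The following sketch shows how Reciprocal Rank and Average Precision are computed for a single bug report; MRR and MAP are their means over all reports:

```python
# Standard definitions of the ranking metrics RLocator optimizes.
# `ranking` is the ordered list of files returned for a bug report;
# `relevant` is the set of truly buggy files.

def reciprocal_rank(ranking, relevant):
    for i, f in enumerate(ranking, start=1):
        if f in relevant:
            return 1.0 / i  # inverse rank of the first buggy file
    return 0.0

def average_precision(ranking, relevant):
    hits, total = 0, 0.0
    for i, f in enumerate(ranking, start=1):
        if f in relevant:
            hits += 1
            total += hits / i  # precision at each relevant hit
    return total / len(relevant) if relevant else 0.0

# Toy example: the buggy file is ranked second.
ranking = ["A.java", "B.java", "C.java"]
relevant = {"B.java"}
print(reciprocal_rank(ranking, relevant))    # 0.5
print(average_precision(ranking, relevant))  # 0.5
```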
Guest Editorial: Special issue on software engineering for mobile applications
Acquired as part of the Swiss National Licences (http://www.nationallizenzen.ch).
Diversity in Software Engineering Conferences and Journals
Diversity with respect to ethnicity and gender has been studied in
open-source and industrial settings for software development. Publication
avenues such as academic conferences and journals contribute to the growing
technology industry. However, there have been very few diversity-related
studies conducted in the context of academia. In this paper, we study the
ethnic, gender, and geographical diversity of the authors published in Software
Engineering conferences and journals. We provide a systematic quantitative
analysis of the diversity of publications and organizing and program committees
of three top conferences and two top journals in Software Engineering, which
indicates the existence of bias and entry barriers towards authors and
committee members belonging to certain ethnicities, genders, and/or geographical
locations in Software Engineering conferences and journal publications. For our
study, we analyse publication (accepted authors) and committee data (Program
and Organizing committee/ Journal Editorial Board) from the conferences ICSE,
FSE, and ASE and the journals IEEE TSE and ACM TOSEM from 2010 to 2022. The
analysis of the data shows that across participants and committee members,
some communities are consistently and significantly underrepresented, for
example, publications from countries in Africa, South
America, and Oceania. However, a correlation study between the diversity of the
committees and the participants did not yield any conclusive evidence.
Furthermore, there is no conclusive evidence that papers with White authors or
male authors were more likely to be cited. Finally, we see an improvement in
the ethnic diversity of the authors over the years 2010-2022 but not in gender
or geographical diversity.
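The abstract does not describe how the correlation study between committee and participant diversity was run. A minimal sketch under assumed data, correlating a yearly committee-diversity index with an author-diversity index, might look like this; all values are hypothetical, not the paper's numbers:

```python
# Illustrative sketch with hypothetical data: one plausible form of
# the correlation study between committee diversity and author
# diversity, using a yearly diversity index per group (2010-2022).
from scipy.stats import spearmanr

committee_diversity = [0.41, 0.40, 0.43, 0.45, 0.44, 0.46, 0.48,
                       0.47, 0.49, 0.50, 0.52, 0.51, 0.53]
author_diversity    = [0.38, 0.39, 0.37, 0.42, 0.41, 0.40, 0.44,
                       0.43, 0.45, 0.44, 0.47, 0.46, 0.48]

rho, p = spearmanr(committee_diversity, author_diversity)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
# A weak or non-significant correlation would mirror the paper's
# inconclusive finding.
```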