Understanding the Heterogeneity of Contributors in Bug Bounty Programs
Background: While bug bounty programs are not new in software development, an
increasing number of companies, as well as open source projects, rely on
external parties to perform security assessments of their software in exchange
for rewards. However, there is relatively little empirical knowledge about the
characteristics of bug bounty program contributors. Aim: This paper aims to
understand those contributors by highlighting the heterogeneity among them.
Method: We analyzed the histories of 82 bug bounty programs and 2,504 distinct
bug bounty contributors, and conducted a quantitative and qualitative survey.
Results: We found that there are project-specific and non-specific contributors
who have different motivations for contributing to products and organizations.
Conclusions: Our findings provide insights for improving bug bounty programs
and for further studies of new software development roles.
Comment: 6 pages, ESEM 201
Bug or Not? Bug Report Classification Using N-Gram IDF
Previous studies have found that a significant number of bug reports are
misclassified between bugs and non-bugs, and that manually classifying bug
reports is a time-consuming task. To address this problem, we propose a bug
report classification model using N-gram IDF, a theoretical extension of
Inverse Document Frequency (IDF) for handling words and phrases of any length.
N-gram IDF enables us to extract key terms of any length from texts, and these
key terms can be used as features to classify bug reports. We build
classification models with logistic regression and random forest using features
from N-gram IDF and topic modeling, which is widely used in various software
engineering tasks. On a publicly available dataset, our N-gram IDF-based
models outperform the topic-based models in all of the evaluated cases. Our
models show promising results and have the potential to be extended to other
software engineering tasks.
Comment: 5 pages, ICSME 201
Using High-Rising Cities to Visualize Performance in Real-Time
For developers concerned with a performance drop or improvement in their
software, a profiler allows them to quickly search for and identify
bottlenecks and leaks that consume excessive execution time. Non-real-time
profilers analyze the history of already executed stack traces, while a
real-time profiler outputs results concurrently with the execution of the
software, so users can see the results instantaneously. However, a real-time
profiler risks producing overly large and complex outputs, which are difficult
for developers to analyze quickly. In this paper, we visualize the performance
data from a real-time profiler. We visualize program execution as a
three-dimensional (3D) city, representing the structure of the program as
artifacts in a city (i.e., classes and packages expressed as buildings and
districts) and program executions expressed as the fluctuating heights of the
artifacts. Through two case studies using a prototype of our proposed
visualization, we demonstrate how our visualization can easily identify
performance issues such as a memory leak and compare performance changes
between versions of a program. A demonstration of the interactive features of
our prototype is available at https://youtu.be/eleVo19Hp4k.
Comment: 10 pages, VISSOFT 2017, Artifact:
https://github.com/sefield/high-rising-city-artifac
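The mapping from profiler output to building heights can be illustrated with a minimal sketch. The sampling model, aggregation rule, and class names below are hypothetical assumptions for illustration, not the paper's actual implementation:

```python
from collections import Counter

def sample_heights(stack_samples):
    """Aggregate sampled stack traces into per-class 'building heights'.

    Each sample is the list of classes on the call stack at one sampling
    tick; a class's height grows with how often it appears on the stack,
    mimicking the fluctuating towers of the 3D city view.
    """
    heights = Counter()
    for stack in stack_samples:
        # count each class at most once per sample, even if recursive
        heights.update(set(stack))
    return heights

# Hypothetical stack samples from three sampling ticks
samples = [
    ["Main", "Parser", "Lexer"],
    ["Main", "Parser"],
    ["Main", "Renderer"],
]
heights = sample_heights(samples)
print(heights["Main"], heights["Parser"])  # 3 2
```

A class whose height keeps climbing across ticks (e.g., an ever-growing collection being traversed) would stand out as a tower that never shrinks, which is how a memory leak becomes visible in the city metaphor.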
DevGPT: Studying Developer-ChatGPT Conversations
The emergence of large language models (LLMs) such as ChatGPT has disrupted
the landscape of software development. Many studies are investigating the
quality of responses generated by ChatGPT, the efficacy of various prompting
techniques, and its comparative performance in programming contests, to name a
few examples. Yet, we know very little about how ChatGPT is actually used by
software developers. What questions do developers present to ChatGPT? What are
the dynamics of these interactions? What is the backdrop against which these
conversations are held, and how do the conversations feed back into the
artifacts of their work? To close this gap, we introduce DevGPT, a curated
dataset that encompasses 17,913 prompts and ChatGPT's responses, including
11,751 code snippets, coupled with the corresponding software development
artifacts -- ranging from source code, commits, issues, and pull requests to
discussions and Hacker News threads -- to enable analysis of the context and
implications of these developer interactions with ChatGPT.
Comment: MSR 2024 Mining Challenge Proposal
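Analyses over such a dataset typically start by walking the shared-conversation records. The JSON field names below (`Sources`, `ChatgptSharing`, `Conversations`, `Prompt`, `ListOfCode`) are assumptions about the dataset's layout made for this sketch, and the embedded record is a fabricated one-entry example, not real DevGPT data:

```python
import json

# A minimal record mimicking the structure one might expect in DevGPT;
# the schema and content here are illustrative, not the actual dataset.
record = json.loads("""
{
  "Sources": [
    {"Type": "commit",
     "ChatgptSharing": [
       {"Conversations": [
          {"Prompt": "Why does this loop leak memory?",
           "ListOfCode": [{"Content": "while (true) { buf.add(x); }"}]}
       ]}
     ]}
  ]
}
""")

# Count prompts and attached code snippets across all shared conversations
n_prompts = n_snippets = 0
for source in record["Sources"]:
    for sharing in source["ChatgptSharing"]:
        for turn in sharing["Conversations"]:
            n_prompts += 1
            n_snippets += len(turn.get("ListOfCode", []))
print(n_prompts, n_snippets)  # 1 1
```

The same traversal, pointed at the real dataset files, would support questions like how many prompts carry code or which artifact types (commits, issues, pull requests) most often link to a conversation.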