2,219 research outputs found
Predicting the Impact of Crashes Across Release Channels
Software maintenance faces a persistent challenge with crash bugs, especially
across diverse release channels catering to distinct user bases. Nightly
builds, favoured by enthusiasts, often reveal crashes that are cheaper to fix
but may differ significantly from those in stable releases. In this paper, we
emphasize the need for a data-driven solution to predict the impact of crashes
happening on nightly channels once they are released to stable channels. We
also list the challenges that need to be considered when approaching this
problem.
Benchmark-driven Software Performance Optimization
Software systems are an integral part of modern society. As we continue to harness software automation in all aspects of our daily lives, the runtime performance of these systems becomes increasingly important. When everything seems just a click away, performance issues that compromise the responsiveness of a system can lead to severe financial and reputational losses. Designing efficient code is critical for ensuring good and consistent performance of software systems. It requires performance expertise and encompasses a set of difficult design decisions that need to be continuously revisited throughout the evolution of the software. Developers must test the performance of their core implementations, select efficient data structures and algorithms, and explore parallel processing when it provides performance benefits, among many other aspects. Furthermore, the constant pressure for high productivity placed on developers, combined with the increasing complexity of modern software, makes designing efficient code an even more challenging endeavor.
This thesis presents a series of novel approaches based on empirical insights that attempt to support developers in the task of designing efficient code. We present contributions in three aspects. First, we investigate the prevalence and impact of bad practices on performance benchmarks of Java-based open-source software. We show that these bad practices not only occur frequently but also often distort the benchmark results substantially. Moreover, we devise a tool that developers can use to identify bad practices automatically during benchmark creation.
Second, we design an application-level framework that identifies suboptimal implementations and selects optimized variants at runtime, effectively optimizing the execution time and memory usage of the target application. Furthermore, we investigate the performance of data structures from several popular collection libraries. Our findings show that alternative variants can be selected for substantial performance improvement under specific usage scenarios.
Third, we investigate the parallelization of object processing via Java streams. We propose a decision-support framework that leverages machine-learning models, trained through a series of benchmarks, to identify and report stream pipelines that should be processed in parallel for better performance.
Predicting the First Response Latency of Maintainers and Contributors in Pull Requests
The success of a Pull Request (PR) depends on the responsiveness of the
maintainers and the contributor during the review process. Being aware of the
expected waiting times can lead to better interactions and managed expectations
for both the maintainers and the contributor. In this paper, we propose a
machine-learning approach to predict the first response latency of the
maintainers following the submission of a PR, and the first response latency of
the contributor after receiving the first response from the maintainers. We
curate a dataset of 20 large and popular open-source projects on GitHub and
extract 21 features to characterize projects, contributors, PRs, and review
processes. Using these features, we then evaluate seven types of classifiers to
identify the best-performing models. We also perform permutation feature
importance and SHAP analyses to understand the importance and impact of
different features on the predicted response latencies. Our best-performing
models achieve an average improvement of 33% in AUC-ROC and 58% in AUC-PR for
maintainers, as well as 42% in AUC-ROC and 95% in AUC-PR for contributors
compared to a no-skill classifier across the projects. Our findings indicate
that PRs submitted earlier in the week, containing an average or slightly
above-average number of commits, and with concise descriptions are more likely
to receive faster first responses from the maintainers. Similarly, PRs with a
lower first response latency from maintainers, that received the first response
of maintainers earlier in the week, and containing an average or slightly
above-average number of commits tend to receive faster first responses from the
contributors. Additionally, contributors with a higher acceptance rate and a
history of timely responses in the project are likely to both obtain and
provide faster first responses.
Comment: Manuscript submitted to IEEE Transactions on Software Engineering
(TSE)
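As a rough illustration of the comparison against a no-skill classifier mentioned in the abstract above, the following sketch computes AUC-ROC on toy data and expresses it as a relative improvement over the no-skill baseline. It assumes that the reported percentages are relative gains over the baseline score, which the abstract does not spell out. The labels, scores, and function names are hypothetical and are not taken from the paper. A no-skill classifier has an AUC-ROC of 0.5 and an AUC-PR equal to the share of positive instances.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive instance is
    scored above a random negative one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def no_skill_auc_pr(labels: Sequence[int]) -> float:
    """AUC-PR of a no-skill classifier: the prevalence of the positive class."""
    return sum(labels) / len(labels)


def relative_improvement(score: float, baseline: float) -> float:
    """Improvement of `score` over `baseline`, in percent of the baseline."""
    return (score - baseline) / baseline * 100.0


# Hypothetical toy data: 1 = "fast first response", 0 = "slow first response".
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.7, 0.8, 0.2, 0.6, 0.3, 0.1]

model_auc = auc_roc(labels, scores)
print(f"AUC-ROC: {model_auc:.3f}")
print(f"Improvement over no-skill (0.5): {relative_improvement(model_auc, 0.5):.1f}%")
print(f"No-skill AUC-PR baseline: {no_skill_auc_pr(labels):.3f}")
```

Under this reading, an average improvement of 33% in AUC-ROC would correspond to a score of roughly 0.665 against the 0.5 baseline.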
Understanding the Helpfulness of Stale Bot for Pull-based Development: An Empirical Study of 20 Large Open-Source Projects
Pull Requests (PRs) that are neither progressed nor resolved clutter the list
of PRs, making it difficult for the maintainers to manage and prioritize
unresolved PRs. To automatically track, follow up, and close such inactive PRs,
Stale bot was introduced by GitHub. Despite its increasing adoption, there are
ongoing debates on whether using Stale bot alleviates or exacerbates the
problem of inactive PRs. To better understand if and how Stale bot helps
projects in their pull-based development workflow, we perform an empirical
study of 20 large and popular open-source projects. We find that Stale bot can
help deal with a backlog of unresolved PRs as the projects closed more PRs
within the first few months of adoption. Moreover, Stale bot can help improve
the efficiency of the PR review process as the projects reviewed PRs that ended
up merged and resolved PRs that ended up closed faster after the adoption.
However, Stale bot can also negatively affect the contributors as the projects
experienced a considerable decrease in their number of active contributors
after the adoption. Therefore, relying solely on Stale bot to deal with
inactive PRs may lead to decreased community engagement and an increased
probability of contributor abandonment.
Comment: Manuscript submitted to ACM Transactions on Software Engineering and
Methodology
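The idea of flagging inactive PRs, as described in the abstract above, can be sketched as a simple inactivity check. This is a minimal illustration and not Stale bot's actual implementation. The function name, the 60-day threshold, and the PR data are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def find_stale_prs(
    last_activity: Dict[int, datetime],
    now: datetime,
    days_until_stale: int = 60,  # hypothetical threshold, chosen for illustration
) -> List[int]:
    """Return the numbers of PRs with no activity since the inactivity cutoff."""
    cutoff = now - timedelta(days=days_until_stale)
    return sorted(pr for pr, ts in last_activity.items() if ts < cutoff)


# Hypothetical open PRs mapped to the time of their last activity.
open_prs = {
    101: datetime(2023, 12, 20),
    102: datetime(2023, 9, 1),
    103: datetime(2023, 10, 15),
}

stale = find_stale_prs(open_prs, now=datetime(2024, 1, 1))
print(stale)
```

A bot built along these lines would typically comment on or label the flagged PRs first and close them only after a further period without activity.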
Where to Go Now? Finding Alternatives for Declining Packages in the npm Ecosystem
Software ecosystems (e.g., npm, PyPI) are the backbone of modern software
development. Developers add new packages to ecosystems every day to solve new
problems or provide alternative solutions, causing obsolete packages to decline
in their importance to the community. Packages in decline are reused less
over time and may become less frequently maintained. Thus, developers usually
migrate their dependencies to better alternatives. Replacing packages in
decline with better alternatives requires time and effort by developers to
identify packages that need to be replaced, find the alternatives, assess
migration benefits, and finally, perform the migration.
This paper proposes an approach that automatically identifies packages that
need to be replaced and finds their alternatives supported with real-world
examples of open source projects performing the suggested migrations. At its
core, our approach relies on the dependency migration patterns performed in the
ecosystem to suggest migrations to other developers. We evaluated our approach
on the npm ecosystem and found that 96% of the suggested alternatives are
accurate. Furthermore, in a survey of expert JavaScript developers, 67% of them
indicate that they will use our suggested alternative packages in their future
projects.
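The notion of suggesting alternatives from dependency migration patterns can be sketched as a frequency count over observed replacements. This is a minimal illustration under simplifying assumptions and not the authors' approach. The package names, the `min_support` threshold, and the input format (pairs of removed and added dependencies) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def suggest_alternatives(
    migrations: Iterable[Tuple[str, str]],
    min_support: int = 2,  # hypothetical cutoff: minimum number of observed migrations
) -> Dict[str, List[Tuple[str, int]]]:
    """Rank replacement packages by how often projects migrated to them."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for removed, added in migrations:
        counts[removed][added] += 1

    suggestions = {}
    for old, targets in counts.items():
        ranked = [(new, n) for new, n in targets.most_common() if n >= min_support]
        if ranked:  # keep only packages with at least one well-supported alternative
            suggestions[old] = ranked
    return suggestions


# Hypothetical (removed dependency, added dependency) pairs mined from projects.
observed = [
    ("pkg-a", "pkg-b"),
    ("pkg-a", "pkg-b"),
    ("pkg-a", "pkg-c"),
    ("pkg-d", "pkg-e"),
]

print(suggest_alternatives(observed))
```

Requiring a minimum number of observed migrations is one simple way to avoid suggesting one-off replacements. Each retained pattern could also be linked back to the projects that performed it, in the spirit of the real-world examples mentioned in the abstract.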
On Wasted Contributions: Understanding the Dynamics of Contributor-Abandoned Pull Requests
Pull-based development has enabled numerous volunteers to contribute to
open-source projects with fewer barriers. Nevertheless, a considerable number
of pull requests (PRs) with valid contributions are abandoned by their
contributors, wasting the effort and time put in by both the contributors and
maintainers. To better understand the underlying dynamics of
contributor-abandoned PRs, we conduct a mixed-methods study using both
quantitative and qualitative methods. We curate a dataset consisting of 265,325
PRs including 4,450 abandoned ones from ten popular and mature GitHub projects
and measure 16 features characterizing PRs, contributors, review processes, and
projects. Using statistical and machine learning techniques, we find that
complex PRs, novice contributors, and lengthy reviews have a higher probability
of abandonment and the rate of PR abandonment fluctuates alongside the
projects' maturity or workload. To identify why contributors abandon their PRs,
we also manually examine a random sample of 354 abandoned PRs. We observe that
the most frequent abandonment reasons are related to the obstacles faced by
contributors, followed by the hurdles imposed by maintainers during the review
process. Finally, we survey the top core maintainers of the studied projects to
understand their perspectives on dealing with PR abandonment and on our
findings.
Comment: Manuscript accepted for publication in ACM Transactions on Software
Engineering and Methodology (TOSEM)
Far transfer to language and math of a short software-based gaming intervention
Executive functions (EF) in children can be trained, but it remains unknown whether training-related benefits elicit far transfer to real-life situations. Here, we investigate whether a set of computerized games might yield near and far transfer on an experimental and an active control group of low-SES otherwise typically developing 6-y-olds in a 3-mo pretest–training–posttest design that was ecologically deployed (at school). The intervention elicits transfer to some (but not all) facets of executive function. These changes cascade to real-world measures of school performance. The intervention equalizes academic outcomes across children who regularly attend school and those who do not because of social and family circumstances.
Affiliation: Goldin, Andrea Paula. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Neurociencia Integrativa; Argentina. Universidad Torcuato Di Tella; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Hermida, Maria Julia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas "Norberto Quirno". CEMIC-CONICET.; Argentina
Affiliation: Shalóm, Diego Edgar. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Neurociencia Integrativa; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Costa, Martín Elias. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Neurociencia Integrativa; Argentina
Affiliation: Lopez Rosenfeld, Matías. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Neurociencia Integrativa; Argentina
Affiliation: Segretin, María Soledad. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas "Norberto Quirno". CEMIC-CONICET.; Argentina
Affiliation: Fernandez Slezak, Diego. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Lipina, Sebastián Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas "Norberto Quirno". CEMIC-CONICET.; Argentina. Universidad Nacional de San Martín; Argentina
Affiliation: Sigman, Mariano. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Neurociencia Integrativa; Argentina. Universidad Torcuato Di Tella; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina