Predicting the Impact of Crashes Across Release Channels

Author: Castelluccio Marco
Costa Diego Elias
Mujahid Suhaib
Publication date: 24/01/2024

Software maintenance faces a persistent challenge with crash bugs, especially across diverse release channels catering to distinct user bases. Nightly builds, favoured by enthusiasts, often reveal crashes that are cheaper to fix but may differ significantly from those in stable releases. In this paper, we emphasize the need for a data-driven solution to predict the impact that crashes occurring on nightly channels will have once the affected builds are released to stable channels. We also list the challenges that need to be considered when approaching this problem.

arXiv.org e-Print Archive

Benchmark-driven Software Performance Optimization

Author: Damasceno Costa Diego Elias
Publication date: 01/01/2019

Software systems are an integral part of modern society. As we continue to harness software automation in all aspects of our daily lives, the runtime performance of these systems becomes increasingly important. When everything seems just a click away, performance issues that compromise the responsiveness of a system can lead to severe financial and reputational losses. Designing efficient code is critical for ensuring good and consistent performance of software systems. It requires performance expertise and encompasses a set of difficult design decisions that need to be continuously revisited throughout the evolution of the software. Developers must test the performance of their core implementations, select efficient data structures and algorithms, and explore parallel processing when it provides performance benefits, among many other aspects. Furthermore, the constant pressure for high productivity placed on developers, combined with the increasing complexity of modern software, makes designing efficient code an even more challenging endeavor. This thesis presents a series of novel approaches based on empirical insights that attempt to support developers in the task of designing efficient code. We present contributions in three aspects. First, we investigate the prevalence and impact of bad practices on performance benchmarks of Java-based open-source software. We show that these bad practices not only occur frequently but also often distort the benchmark results substantially. Moreover, we devise a tool that developers can use to automatically identify bad practices during benchmark creation. Second, we design an application-level framework that identifies suboptimal implementations and selects optimized variants at runtime, effectively optimizing the execution time and memory usage of the target application. Furthermore, we investigate the performance of data structures from several popular collection libraries. Our findings show that alternative variants can be selected for substantial performance improvement under specific usage scenarios. Third, we investigate the parallelization of object processing via Java streams. We propose a decision-support framework that leverages machine-learning models, trained through a series of benchmarks, to identify and report stream pipelines that should be processed in parallel for better performance.

Heidelberger Dokumentenserver

Predicting the First Response Latency of Maintainers and Contributors in Pull Requests

Author: Abdellatif Ahmad
Costa Diego Elias
Khatoonabadi SayedHassan
Shihab Emad
Publication date: 13/11/2023

The success of a Pull Request (PR) depends on the responsiveness of the maintainers and the contributor during the review process. Being aware of the expected waiting times can lead to better interactions and managed expectations for both the maintainers and the contributor. In this paper, we propose a machine-learning approach to predict the first response latency of the maintainers following the submission of a PR, and the first response latency of the contributor after receiving the first response from the maintainers. We curate a dataset of 20 large and popular open-source projects on GitHub and extract 21 features to characterize projects, contributors, PRs, and review processes. Using these features, we then evaluate seven types of classifiers to identify the best-performing models. We also perform permutation feature importance and SHAP analyses to understand the importance and impact of different features on the predicted response latencies. Compared to a no-skill classifier, our best-performing models achieve an average improvement across the projects of 33% in AUC-ROC and 58% in AUC-PR for maintainers, and of 42% in AUC-ROC and 95% in AUC-PR for contributors. Our findings indicate that PRs submitted earlier in the week, containing an average or slightly above-average number of commits, and with concise descriptions are more likely to receive faster first responses from the maintainers. Similarly, PRs with a lower first response latency from maintainers, that received the maintainers' first response earlier in the week, and that contain an average or slightly above-average number of commits tend to receive faster first responses from the contributors. Additionally, contributors with a higher acceptance rate and a history of timely responses in the project are likely to both obtain and provide faster first responses. Comment: Manuscript submitted to IEEE Transactions on Software Engineering (TSE).

arXiv.org e-Print Archive

Understanding the Helpfulness of Stale Bot for Pull-based Development: An Empirical Study of 20 Large Open-Source Projects

Author: Costa Diego Elias
Khatoonabadi SayedHassan
Mujahid Suhaib
Shihab Emad
Publication date: 29/05/2023

Pull Requests (PRs) that are neither progressed nor resolved clutter the list of PRs, making it difficult for the maintainers to manage and prioritize unresolved PRs. To automatically track, follow up, and close such inactive PRs, Stale bot was introduced by GitHub. Despite its increasing adoption, there are ongoing debates on whether using Stale bot alleviates or exacerbates the problem of inactive PRs. To better understand if and how Stale bot helps projects in their pull-based development workflow, we perform an empirical study of 20 large and popular open-source projects. We find that Stale bot can help deal with a backlog of unresolved PRs, as the projects closed more PRs within the first few months of adoption. Moreover, Stale bot can help improve the efficiency of the PR review process: after the adoption, the projects reviewed PRs that ended up merged, and resolved PRs that ended up closed, faster. However, Stale bot can also negatively affect the contributors, as the projects experienced a considerable decrease in their number of active contributors after the adoption. Therefore, relying solely on Stale bot to deal with inactive PRs may lead to decreased community engagement and an increased probability of contributor abandonment. Comment: Manuscript submitted to ACM Transactions on Software Engineering and Methodology.

arXiv.org e-Print Archive

Where to Go Now? Finding Alternatives for Declining Packages in the npm Ecosystem

Author: Abdalkareem Rabe
Costa Diego Elias
Mujahid Suhaib
Shihab Emad
Publication date: 16/08/2023

Software ecosystems (e.g., npm, PyPI) are the backbone of modern software development. Developers add new packages to ecosystems every day to solve new problems or provide alternative solutions, causing obsolete packages to decline in their importance to the community. Packages in decline are reused less over time and may become less frequently maintained. Thus, developers usually migrate their dependencies to better alternatives. Replacing packages in decline with better alternatives requires time and effort from developers to identify packages that need to be replaced, find the alternatives, assess migration benefits, and finally, perform the migration. This paper proposes an approach that automatically identifies packages that need to be replaced and finds their alternatives, supported with real-world examples of open-source projects performing the suggested migrations. At its core, our approach relies on the dependency migration patterns performed in the ecosystem to suggest migrations to other developers. We evaluated our approach on the npm ecosystem and found that 96% of the suggested alternatives are accurate. Furthermore, in a survey of expert JavaScript developers, 67% of them indicate that they will use our suggested alternative packages in their future projects.

arXiv.org e-Print Archive

On Wasted Contributions: Understanding the Dynamics of Contributor-Abandoned Pull Requests

Author: Abdalkareem Rabe
Costa Diego Elias
Khatoonabadi SayedHassan
Shihab Emad
Publication venue: Association for Computing Machinery (ACM)
Publication date: 20/05/2022

Pull-based development has enabled numerous volunteers to contribute to open-source projects with fewer barriers. Nevertheless, a considerable number of pull requests (PRs) with valid contributions are abandoned by their contributors, wasting the effort and time put in by both the contributors and maintainers. To better understand the underlying dynamics of contributor-abandoned PRs, we conduct a mixed-methods study using both quantitative and qualitative methods. We curate a dataset consisting of 265,325 PRs, including 4,450 abandoned ones, from ten popular and mature GitHub projects and measure 16 features characterizing PRs, contributors, review processes, and projects. Using statistical and machine learning techniques, we find that complex PRs, novice contributors, and lengthy reviews have a higher probability of abandonment and that the rate of PR abandonment fluctuates alongside the projects' maturity or workload. To identify why contributors abandon their PRs, we also manually examine a random sample of 354 abandoned PRs. We observe that the most frequent abandonment reasons are related to the obstacles faced by contributors, followed by the hurdles imposed by maintainers during the review process. Finally, we survey the top core maintainers of the studied projects to understand their perspectives on dealing with PR abandonment and on our findings. Comment: Manuscript accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM).

arXiv.org e-Print Archive

Far transfer to language and math of a short software-based gaming intervention

Author: Costa Martín Elias
Fernandez Slezak Diego
Goldin Andrea Paula
Hermida Maria Julia
Lipina Sebastián Javier
Lopez Rosenfeld Matías
Segretin María Soledad
Shalóm Diego Edgar
Sigman Mariano
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 01/04/2014

Executive functions (EF) in children can be trained, but it remains unknown whether training-related benefits elicit far transfer to real-life situations. Here, we investigate whether a set of computerized games might yield near and far transfer on an experimental and an active control group of low-SES otherwise typically developing 6-y-olds in a 3-mo pretest–training–posttest design that was ecologically deployed (at school). The intervention elicits transfer to some (but not all) facets of executive function. These changes cascade to real-world measures of school performance. The intervention equalizes academic outcomes across children who regularly attend school and those who do not because of social and family circumstances.

Affiliation: Goldin, Andrea Paula. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Neurociencia Integrativa; Argentina. Universidad Torcuato Di Tella; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Hermida, Maria Julia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas "Norberto Quirno". CEMIC-CONICET.; Argentina
Affiliation: Shalóm, Diego Edgar. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Neurociencia Integrativa; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Costa, Martín Elias. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Neurociencia Integrativa; Argentina
Affiliation: Lopez Rosenfeld, Matías. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Neurociencia Integrativa; Argentina
Affiliation: Segretin, María Soledad. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas "Norberto Quirno". CEMIC-CONICET.; Argentina
Affiliation: Fernandez Slezak, Diego. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Lipina, Sebastián Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas "Norberto Quirno". CEMIC-CONICET.; Argentina. Universidad Nacional de San Martín; Argentina
Affiliation: Sigman, Mariano. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Neurociencia Integrativa; Argentina. Universidad Torcuato Di Tella; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

PubMed Central