9 research outputs found

    Action-based recommendation in pull-request development

    Pull request (PR) selection is a challenging task faced by integrators in pull-based development (PbD), with hundreds of PRs submitted on a daily basis to large open-source projects. Managing these PRs manually consumes integrators' time and resources and may delay the acceptance, response, or rejection of PRs that propose bug fixes or feature enhancements. On the one hand, well-known platforms for performing PbD, like GitHub, do not provide built-in recommendation mechanisms to facilitate the management of PRs. On the other hand, prior research on PR recommendation has focused on the likelihood of a PR either being accepted or receiving a response from the integrator. In this paper, we consider both likelihoods to help integrators in the PR selection process by suggesting the appropriate action to undertake on each specific PR. To this aim, we propose CARTESIAN (aCceptance And Response classificaTion-based requESt IdentificAtioN), an approach that models PR recommendation in terms of PR actions. In particular, CARTESIAN is able to recommend three types of PR actions: accept, respond, and reject. We evaluated CARTESIAN on the PRs of 19 popular GitHub projects. The results of our study demonstrate that our approach can identify PR actions with an average precision and recall of about 86%. Moreover, our findings highlight that CARTESIAN outperforms two baseline approaches in the task of PR selection.
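    The abstract does not detail the learning machinery, but a multi-class supervised classifier is a natural reading of "acceptance and response classification". Below is a minimal, hypothetical sketch of that idea in Python; the feature set, training data, and the choice of a random forest are illustrative assumptions, not CARTESIAN's actual design:

        # Hypothetical sketch: classify PRs into accept / respond / reject.
        # Features and training data are invented for illustration.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import classification_report

        # Assumed PR features: [files_changed, lines_added, author_prior_prs, ci_passed]
        X = np.array([[3, 120, 25, 1], [40, 2100, 1, 0], [1, 10, 60, 1],
                      [12, 300, 4, 1], [7, 90, 0, 0], [2, 35, 33, 1]] * 10)
        y = ["accept", "reject", "accept", "respond", "reject", "accept"] * 10

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
        print(classification_report(y_te, clf.predict(X_te)))  # per-action precision/recall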

    How Early Participation Determines Long-Term Sustained Activity in GitHub Projects?

    Although the open source model bears many advantages in software development, open source projects are always hard to sustain. Previous research on open source sustainability mainly focuses on projects that have already reached a certain level of maturity (e.g., with communities, releases, and downstream projects). However, limited attention is paid to the development of (sustainable) open source projects in their infancy, and we believe an understanding of early sustainability determinants is crucial for project initiators, incubators, newcomers, and users. In this paper, we aim to explore the relationship between early participation factors and long-term project sustainability. We leverage a novel methodology combining the Blumberg model of performance and machine learning to predict the sustainability of 290,255 GitHub projects. Specifically, we train an XGBoost model based on early participation (the first three months of activity) in these projects and interpret the model using LIME. We quantitatively show that early participants have a positive effect on a project's future sustained activity if they have prior experience in OSS project incubation and demonstrate concentrated focus and steady commitment. Participation from non-code contributors and detailed contribution documentation also promote a project's sustained activity. Compared with individual projects, building a community that consists of more experienced core developers and more active peripheral developers is important for organizational projects. This study provides unique insights into the incubation and recognition of sustainable open source projects, and our interpretable prediction approach can also offer guidance to open source project initiators and newcomers. Comment: The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023).
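    The abstract names the concrete toolchain: XGBoost for prediction and LIME for interpretation. A minimal sketch of that pipeline follows, with invented feature names and synthetic data standing in for the paper's 290,255-project dataset:

        # Sketch of the XGBoost + LIME pipeline; features and data are synthetic.
        import numpy as np
        import xgboost as xgb
        from lime.lime_tabular import LimeTabularExplainer

        feature_names = ["prior_oss_experience", "focus", "commitment_steadiness",
                         "non_code_contributors", "doc_detail"]  # illustrative, not the paper's
        rng = np.random.default_rng(0)
        X = rng.random((500, 5))
        y = (X[:, 0] + X[:, 2] > 1.0).astype(int)  # toy "sustained activity" label

        model = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)

        explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                         class_names=["dormant", "sustained"],
                                         mode="classification")
        exp = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
        print(exp.as_list())  # per-feature contributions for one project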

    Energy Consumption of Functional Programs in the Context of Lazy Evaluation

    We have limited natural resources available to support our daily living, be they raw materials for manufacturing or energy to generate work. The pace at which we consume those resources is approaching the limits at which nature can replenish them, and at which we can extract them. It is with those resources that we develop the most varied technology, on which our modern way of life is increasingly dependent, to provide every kind of service conceivable. In particular, the Information and Communication Technologies are an essential part of today's living. With ever more devices, supporting different services, in use, their energy demand grows daily. Aware of these facts, hardware and software developers seek ways to optimize the energy consumption of computing artifacts. Our work, focused on software, was driven by the need to know whether, and to what extent, we can save energy by refactoring existing programs. To that end, we implemented a benchmark that was used to analyze the energy consumption of various implementations of common data structure abstractions in the Edison library for the Haskell programming language. Our findings lead us to conclude that we can save energy, to a great extent, depending on how programs use the native operations available in Edison.
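    The thesis measures energy rather than just time. As a rough illustration of how one benchmark run can be metered on Linux, the sketch below reads Intel RAPL counters via the standard powercap sysfs interface around a benchmark invocation; the benchmark binary and its flags are placeholders, and the thesis's actual measurement setup may differ:

        # Rough sketch: energy of one benchmark run via Intel RAPL on Linux.
        # The benchmark command is a hypothetical placeholder.
        import subprocess

        RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package domain, microjoules

        def read_uj():
            with open(RAPL) as f:  # may require elevated permissions
                return int(f.read())

        before = read_uj()
        subprocess.run(["./edison-bench", "--structure", "Seq"], check=True)  # placeholder
        after = read_uj()
        # The counter wraps around periodically; ignored here for short runs.
        print(f"~{(after - before) / 1e6:.3f} J consumed")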

    Mining questions about the use of lambda expressions in Java 8

    Undergraduate thesis (Trabalho de Conclusão de Curso), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2018. This work presents a study of the features found in questions about the usage of lambda expressions in Java 8. The study was carried out by mining questions and answers from the Stack Overflow site, where 1975 questions and 3974 answers related to this theme were found. The acquired data was used to check how the interest in asking these questions changed over time, and whether these questions are usually answered successfully. A sentiment analysis of these questions and their answers was also performed to identify which sentiments their authors express. Furthermore, to find the main topics addressed in these questions, the 100 most popular questions were selected and read one by one. Reading these questions not only helped identify the most popular topics, but also served to complement the results of the sentiment analysis.
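    For a sense of how such questions can be collected programmatically, here is a small sketch against the public Stack Exchange API; the study's actual data collection procedure is not described in the abstract, so the tags and query parameters here are assumptions:

        # Sketch: fetch highly voted Java 8 lambda questions from Stack Overflow.
        import requests

        resp = requests.get(
            "https://api.stackexchange.com/2.3/search/advanced",
            params={"site": "stackoverflow", "tagged": "java-8;lambda",  # assumed tags
                    "pagesize": 100, "order": "desc", "sort": "votes"},
        )
        for q in resp.json()["items"]:
            print(q["score"], q["is_answered"], q["title"])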

    An empirical investigation of relevant changes and automation needs in modern code review

    REPLICATION PACKAGE: https://zenodo.org/record/3679402#.XxgSgy17Hxg
    PREPRINT: https://spanichella.github.io/img/EMSE-MCR-2020.pdf
    Recent research has shown that available tools for Modern Code Review (MCR) are still far from meeting the current expectations of developers. The objective of this paper is to investigate the approaches and tools that, from a developer's point of view, are still needed to facilitate MCR activities. To that end, we first empirically elicited a taxonomy of recurrent review change types that characterize MCR. The taxonomy was designed in three steps: (i) we generated an initial version by qualitatively and quantitatively analyzing 211 review changes/commits and 648 review comments of ten open-source projects; (ii) we integrated into this initial taxonomy the topics and MCR change types of an existing taxonomy available in the literature; and (iii) we surveyed 52 developers to integrate any missing change types. The results of our study highlight that the availability of new emerging development technologies (e.g., cloud-based technologies) and practices (e.g., continuous delivery) has pushed developers to perform additional activities during MCR and that additional types of feedback are expected by reviewers. Our participants also provided recommendations, specified techniques to employ, and highlighted the data to analyze for building recommender systems able to automate the code review activities composing our taxonomy. We surveyed 14 additional participants (12 developers and 2 researchers), not involved in the previous survey, to qualitatively assess the relevance and completeness of the identified MCR change types, as well as how critical and feasible to implement some of the identified support techniques are. Finally, with a study involving 21 additional developers, we qualitatively assessed the feasibility and usefulness of leveraging natural language feedback (an automation considered critical and feasible to implement) to support developers during MCR activities. In summary, this study sheds more light on the approaches and tools that are still needed to facilitate MCR activities, confirming the feasibility and usefulness of using summarization techniques during MCR. We believe that the results of our work represent an essential step toward meeting the expectations of developers and supporting the vision of full or partial automation in MCR.
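    The paper singles out summarization techniques as feasible and useful during MCR. Purely as a generic illustration (not the authors' technique), the sketch below ranks review comments by TF-IDF salience to produce a crude extractive summary of a review thread:

        # Generic illustration: extractive summarization of review comments.
        # The comments are invented examples.
        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer

        comments = [
            "Please extract this logic into a helper method.",
            "Nit: rename tmp to userCache for readability.",
            "This change breaks the CI pipeline on the cloud runner.",
            "Consider adding a unit test for the null case.",
        ]
        tfidf = TfidfVectorizer().fit_transform(comments)
        scores = np.asarray(tfidf.sum(axis=1)).ravel()  # salience per comment
        top = np.argsort(scores)[::-1][:2]              # keep the 2 most salient
        for i in sorted(top):
            print("-", comments[i])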

    Utilizing traceable software artifacts to improve bug localization

    The development of software systems is a complex task. Quality assurance tries to prevent defects – software bugs – in deployed systems, but it is impossible to avoid bugs altogether, especially during development. Once a bug is observed, a bug report is typically written. It guides the responsible developer to locate the bug in the project's source code and, once found, to fix it. The bug reports, along with other development artifacts such as requirements and the source code itself, are stored in software repositories. The repositories also allow relationships – trace links – to be created among the contained artifacts. Establishing this traceability is demanded in many domains, such as safety-related ones like the automotive and aviation industries, or the development of medical devices. However, in large software systems with thousands of artifacts, especially source code files, manually locating a bug is time consuming, error-prone, and requires extensive knowledge of the project. Thus, automating the bug localization process has been actively researched for many years. Further, manually creating and maintaining trace links is often considered a burden, and there is a need to automate this task as well. Multiple studies have shown that traceability is beneficial for many software development tasks. This thesis presents a novel bug localization algorithm utilizing traceability. The project's artifacts and trace links are used to create a traceability graph. This graph is then analyzed to locate defective source code files for a given bug report. Since the existing trace link set of a project is possibly incomplete, another algorithm is proposed to augment missing links. The algorithm is fully automated, project independent, and derived from a project's development workflow. An evaluation on more than 32,000 bug reports from 27 open-source projects shows that incorporating traceability information into bug localization significantly improves bug localization performance compared to two state-of-the-art algorithms. Further, the trace link augmentation approach reliably constructs missing links and therefore simplifies the required trace maintenance.
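    To make the idea concrete, here is a toy flavor of traceability-aware localization: rank source files by textual similarity to the bug report, then boost files that are connected to related artifacts in a traceability graph. The artifact names, link structure, and boost heuristic are invented for illustration; the thesis's algorithm is considerably more sophisticated:

        # Toy traceability-aware bug localization; all artifacts are invented.
        import networkx as nx
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        files = {"auth.py": "login password token session",
                 "db.py": "query connection pool sql",
                 "ui.py": "button render layout click"}
        bug_report = "login fails with invalid session token"

        G = nx.Graph()
        G.add_edge("REQ-12", "auth.py")  # requirement traced to code
        G.add_edge("REQ-12", "BUG-7")    # earlier bug filed on the same requirement

        vec = TfidfVectorizer().fit(list(files.values()) + [bug_report])
        sims = cosine_similarity(vec.transform([bug_report]),
                                 vec.transform(list(files.values())))[0]
        for (name, _), s in zip(files.items(), sims):
            # Small bonus if the file shares a graph component with a past bug.
            bonus = 0.1 if name in G and "BUG-7" in nx.node_connected_component(G, name) else 0.0
            print(name, round(s + bonus, 3))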

    Supporting the grow-and-prune model for evolving software product lines

    Software Product Lines (SPLs) aim at supporting the development of a whole family of software products through the systematic reuse of shared assets. To this end, SPL development is separated into two interrelated processes: (1) domain engineering (DE), where the scope and variability of the system are defined and reusable core assets are developed; and (2) application engineering (AE), where products are derived by selecting core assets and resolving variability. Evolution in SPLs is considered more challenging than in traditional systems, as both core assets and products need to co-evolve. The so-called grow-and-prune model has proven highly flexible for incrementally evolving an SPL: products are first let grow, and are later pruned by refactoring the product functionalities deemed useful and merging them back into the reusable SPL core-asset base. This thesis aims at supporting the grow-and-prune model in both initiating and enacting the pruning. Initiating the pruning requires SPL engineers to conduct customization analysis, i.e., analyzing how products have changed the core assets. Customization analysis aims at identifying interesting product customizations to be ported to the core-asset base. However, existing tools do not fulfill engineers' needs for conducting this practice. To address this issue, this thesis elaborates on SPL engineers' needs when conducting customization analysis and proposes a data-warehouse approach to help them in the analysis. Once the interesting customizations have been identified, the pruning needs to be enacted. This means that product code needs to be ported to the core-asset realm, while products are upgraded with newer functionalities and bug fixes available in newer core-asset releases. Herein, synchronizing both parties through sync paths is required. However, state-of-the-art tools are not tailored to SPL sync paths, and this hinders synchronizing core assets and products. To address this issue, this thesis proposes to leverage existing version control systems (i.e., git/GitHub) to provide sync operations as first-class constructs.
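    As an illustration of what one git-based sync path could look like, the sketch below drives a prune-and-upgrade cycle with plain git commands from Python; the branch names and commit SHA are placeholders, and the thesis's first-class sync constructs are not reproduced here:

        # Hypothetical sync path: port a product customization back to the
        # core-asset branch, then upgrade the product from core assets.
        import subprocess

        def git(*args):
            subprocess.run(["git", *args], check=True)

        git("checkout", "core-assets")  # the reusable SPL platform branch
        git("cherry-pick", "abc1234")   # product commit judged worth pruning back
        git("checkout", "product-a")    # then upgrade the product...
        git("merge", "core-assets")     # ...with newer core-asset functionality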