38 research outputs found

    Mining Software Repositories for Release Engineers - Empirical Studies on Integration and Infrastructures-as-Code

    Release engineering (Releng) is the process of delivering integrated work from developers as a complete product to end users. This process comprises the phases of Integration, Building and Testing, Deployment and Release to finally reach the market.
While traditional software engineering takes several months to deliver a complete software product to customers, modern Release engineering aims to bring value to customers more quickly, within days or weeks, to receive useful feedback faster and to reduce the time wasted on unsuccessful features. A wealth of tools and techniques has emerged to support Release engineering. They essentially aim to automate the phases of the Release engineering pipeline, reducing manual labor and making the process repeatable and fast. For example, Puppet is one of the most popular Infrastructure-as-Code (IaC) tools; it automates the process of setting up a new infrastructure (e.g., a virtual machine or a container in which an application can be compiled, tested and deployed) according to textual specifications. Infrastructure-as-Code has evolved rapidly due to the growth of cloud computing. However, many problems remain. For example, while many Release engineering tools are gaining popularity, choosing the most suitable technique requires practitioners to empirically evaluate its performance in a realistic setting, with data mimicking their own setup. Even worse, at a higher level, release engineers need to understand the progress of each Release engineering phase in order to know whether the next release deadline can be met or where bottlenecks occur. Again, there is no clear, established methodology for doing this. To help practitioners better analyze their Release engineering process, we explore how mining software repositories (MSR) can be applied to two critical phases of Releng in large open-source projects. Software repositories such as version control systems, bug repositories or code review environments are used on a daily basis by developers, testers and reviewers to record information about the development process, such as code changes, bug reports or code reviews. By analyzing this data, one can recreate the process by which software is built and analyze how each phase of Releng is carried out in a project. Many repositories of open-source software projects are publicly available, which offers opportunities for empirical research on Release engineering. Therefore, we conduct a series of empirical studies mining the software repositories of popular open-source projects, to understand the progress of Release engineering and to evaluate the performance of state-of-the-art tools. We mainly focus on two phases, Integration and Provisioning (Infrastructure-as-Code), because these two phases are the most critical in Release engineering and ample data is available for them. In our empirical study of the Integration process, we evaluate how well MSR techniques based on version control and review data explain the major factors impacting the probability of, and the time taken for, a patch being successfully integrated into an upcoming release. We selected the Linux kernel, one of the most popular OSS projects, with a long history and a strict integration hierarchy, as our case study. We collected data from the reviewing and integration tools of the Linux kernel (mailing lists and Git, respectively) and extracted characteristics covering six dimensions. We then built models with acceptance/time as the output and analyzed which characteristics impact the reviewing and integration processes. We found that reviewing and integration are impacted by different factors.
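To make the model-building step above concrete, here is a minimal sketch of fitting an acceptance model on patch characteristics. The feature names, the patches.csv file, and the logistic-regression choice are illustrative assumptions, not the thesis's actual dataset or modeling technique.

```python
# Minimal sketch: relate patch characteristics to acceptance.
# File name and feature columns are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

patches = pd.read_csv("patches.csv")  # one row per submitted patch

features = ["num_subsystems", "patch_size_loc", "prior_patches_by_author",
            "author_subsystem_experience", "num_review_rounds"]
X = patches[features]
y = patches["accepted"]  # 1 if the patch reached an upcoming release, else 0

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# How well the characteristics explain acceptance, and in which direction.
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
for name, coef in zip(features, model.coef_[0]):
    print(f"{name:>30}: {coef:+.3f}")
```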
Our findings suggest that developers manage to get their patches through the review phase faster by changing fewer subsystems at a time and by splitting a large patch into multiple smaller patches. Developers can also get their patches accepted more easily and sooner by participating more in the community and by gaining more experience with similar patches. In this study on the Linux kernel, we found that one major challenge of MSR is linking the different repositories involved. For example, the Linux kernel and the Apache HTTPD project both use mailing lists for their reviewing process. Contributors submit and maintainers review patches entirely by email, where usually an email thread is used to collect all versions of a patch. However, a new email thread, with a different subject, is often used to submit a new patch revision, which means that no strong physical links exist between all revisions of a patch. On top of that, the accepted patch has no physical link to the resulting commit in the version control system either, unless a commit identifier is posted in an email. Especially when a patch has been revised multiple times and has evolved substantially from its original version, tracking it back to its very first version is difficult. We proposed three approaches of different granularity and strictness, aiming to recover the physical links between the revisions of a patch, both within and across email threads, and to the final commit. In our study, we found that a line-based technique works best for linking emails across threads, while a combination of the line-based and checksum-based techniques achieves the best performance for linking the emails in a thread to the final, accepted commit. Being able to reconstruct the full history of a patch allowed us to analyze the performance of the reviewing phase: we found that 25% of accepted commits have a reviewing history longer than four weeks. To evaluate the ability of MSR to analyze the performance of Releng tools, we evaluated, in a commercial project, a hybrid integration approach that combines branching and feature toggling. Branching allows developers to work on different features in parallel, in isolation, while toggling enables developers on the same branch to work on different tasks by hiding unfinished functionality behind "if" conditions in the source code. Instead of revising their whole integration process to drop branching and move to toggling, hybrid toggling is a compromise that tries to minimize the risks of branching while enjoying the benefits of toggling. We compared the performance before and after adopting hybrid toggling and found that this hybrid structure can reduce integration effort and improve productivity. Hence, hybrid toggling seems a worthwhile practice. In the Provisioning phase, we focus on evaluating the usage of, and effort required for, popular Infrastructure-as-Code (IaC) tools, which allow environment requirements to be specified in a textual format. We empirically studied the IaC tools of OpenStack and MediaWiki, two very large projects that have adopted the two currently most popular IaC languages: Puppet and Chef. First, we studied the maintenance effort related to the regular development and testing process of OpenStack, then compared it to the IaC-related effort in both case studies. We found that IaC code makes up a large proportion of both projects and changes frequently, with large churn sizes. Meanwhile, IaC code changes are tightly coupled with source code changes, which implies that changes to source or test code require accompanying changes to IaC code, potentially leading to increased complexity and maintenance effort. Furthermore, the most common reason for such coupling is “Integration of new modules or service”.
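The line-based and checksum-based linking ideas can be sketched roughly as follows. The diff normalization rules and the 0.8 similarity threshold are illustrative assumptions rather than the exact heuristics evaluated in the thesis.

```python
# Minimal sketch of linking an emailed patch to a commit:
# (1) strict checksum match on the normalized diff, (2) lenient line-based overlap.
import hashlib

def changed_lines(diff_text):
    """Set of added/removed lines, ignoring file headers (+++/---)."""
    return {
        line[1:].strip()
        for line in diff_text.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    }

def checksum(diff_text):
    """Order-independent checksum over the normalized changed lines."""
    payload = "\n".join(sorted(changed_lines(diff_text)))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

def line_similarity(diff_a, diff_b):
    """Jaccard similarity of the changed-line sets of two diffs."""
    a, b = changed_lines(diff_a), changed_lines(diff_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def link(email_patch, commit_diff, threshold=0.8):
    """Try the strict checksum match first, then fall back to line-based matching."""
    if checksum(email_patch) == checksum(commit_diff):
        return "checksum match"
    if line_similarity(email_patch, commit_diff) >= threshold:
        return "line-based match"
    return "no link"
```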
However, we also observed that IaC code has only light coupling with IaC test cases and test data, which are new kinds of artifacts in the IaC domain. Hence, IaC may take more effort than engineers expect, and further empirical studies should be considered. Modern Release engineering has developed rapidly, and many new techniques and tools have emerged to support it from different perspectives. However, the lack of techniques for understanding the current progress of Release engineering and for evaluating the performance of its tools makes it difficult for practitioners to sustain a high-quality Releng approach in practice. In this thesis, we conducted a series of empirical studies mining the software repositories of large open-source projects, which show that, despite some challenges, MSR technology can help release engineers better understand their progress and evaluate the cost of Release engineering tools and activities. We are glad to see that our work has inspired other researchers to further analyze the integration process as well as the quality of IaC code.
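As an illustration of the co-change coupling analysis summarized above, the sketch below measures how often commits that touch IaC files also touch source or test files. The path conventions used to classify files, and the toy history, are assumptions made for the example only.

```python
# Minimal sketch of measuring co-change coupling between IaC and other artifacts.

def classify(path):
    """Classify a changed file as IaC, test, or source code (assumed conventions)."""
    if path.endswith(".pp") or "cookbooks/" in path:
        return "iac"
    if "test" in path.lower():
        return "test"
    return "source"

def coupling_ratio(commits):
    """commits: one list of changed file paths per commit."""
    iac_commits = cochanged = 0
    for files in commits:
        kinds = {classify(f) for f in files}
        if "iac" in kinds:
            iac_commits += 1
            if kinds & {"source", "test"}:
                cochanged += 1
    return cochanged / iac_commits if iac_commits else 0.0

# Toy history: two of the three IaC-touching commits also change other code.
history = [
    ["manifests/nova.pp", "nova/compute/api.py"],
    ["manifests/keystone.pp"],
    ["cookbooks/mediawiki/recipes/default.rb", "tests/test_install.py"],
]
print(f"IaC co-change ratio: {coupling_ratio(history):.2f}")  # -> 0.67
```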

    Investigating Modern Release Engineering Practices

    Modern release engineering has moved from longer release cycles and separate development and release teams to a continuous and integrated process. However, release engineering practices include not only integration, build and test execution but also better management of features. The goal of this research is to investigate modern release engineering practices across four milestones in the field of release engineering: i. understanding rapid releases by measuring the time and effort involved in release cycles, ii. feature management based on feature toggles, iii. the impact of toggles on the system architecture, and iv. the quality of builds that contain ignored failing and flaky tests. This thesis is organized as a “manuscript” thesis whereby each milestone constitutes an accepted or submitted paper. First, we investigate the rapid release model for two major open source software projects. We quantify the time and effort involved in both the development and stabilization phases of a release cycle, and we found that despite using the rapid release process, both the Chrome Browser and the Linux Kernel have a period in which developers rush changes to catch the current release. Second, we examine feature management based on feature toggles, a widely used technique in software companies to manage features by turning them on or off during development as well as release periods. Developers typically isolate unrelated or unreleased changes on branches. However, large companies such as Google and Facebook do their development on a single branch; they isolate unfinished features using feature toggles that allow them to disable unstable code. Third, feature toggles not only provide better management of features but also keep modules isolated and feature oriented, which makes the architecture underneath the source code readable and easily extractable. As the project grows, modules keep accepting features and features cross-cut into the modules. We found that the architecture can be easily extracted based on feature toggles and provides a different view compared to traditional modular representations of software architecture. Fourth, we investigate the impact of failing tests on the quality of builds, where we consider browser crashes as a quality factor. In this study we found that ignoring failing and flaky tests leads to dramatically more crashes than builds with all tests passing.
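Since feature toggles recur throughout this work, a generic sketch may help: in its simplest form a toggle is a named flag consulted by an "if" guard, so unfinished code can live on the main branch but stay dark. The toggle names and in-memory configuration below are made up for illustration and are not taken from the studied projects.

```python
# A feature toggle in its simplest form: unfinished features stay on the main
# branch but remain disabled until the named flag is switched on.
TOGGLES = {"new_search_ui": False, "fast_sync": True}  # illustrative names only

def is_enabled(name: str) -> bool:
    return TOGGLES.get(name, False)

def render_search_page() -> str:
    if is_enabled("new_search_ui"):
        return "experimental search UI"   # unreleased code path
    return "stable search UI"             # shipped behavior

print(render_search_page())        # stable path while the toggle is off
TOGGLES["new_search_ui"] = True    # releasing the feature is a config change,
print(render_search_page())        # not a branch merge
```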

    Evolution of Integration, Build, Test, and Release Engineering Into DevOps and to DevSecOps

    Software engineering operations in large organizations primarily consist of integrating code from multiple branches, building, testing the build, and releasing it. Agile and related methodologies accelerated software development activities. Realizing the importance of the development and operations teams working closely with each other, organizations evolved the set of practices that automated the engineering processes of software development into DevOps, signifying the close collaboration of both development and operations teams. With the advent of cloud computing and the opening up of firewalls, the security aspects of software started moving into the applications, leading to DevSecOps. This chapter traces the journey of software engineering operations over the last two to three decades, highlighting the tools and techniques used in the process.

    Aligned and collaborative language-driven engineering

    Today's software development is increasingly performed with the help of low- and no-code platforms that follow model-driven principles and use domain-specific languages (DSLs). DSLs support the different aspects of development and the user's mindset through a tailored and intuitive language. By combining specific languages with real-time collaboration, development environments can be provided whose users no longer need to be programmers. This way, domain experts can develop their solution independently, without the need for a programmer's translation and the associated semantic gap. However, the development and distribution of collaborative mindset-supporting IDEs (mIDEs) is enormously costly. Besides the basic challenge of language development, a specialized IDE has to be provided, which should work equally well on all common platforms and individual heterogeneous system setups. This dissertation describes the conception and realization of the web-based, unified environment CINCO Cloud, in which DSLs can be collaboratively developed, used, transformed and executed. By providing full support for all of these steps, the environment enables and realizes the philosophy of language-driven engineering for the first time. As a foundation for the unified environment, the infrastructure of cloud development IDEs is analyzed and extended so that new languages can be distributed on the fly. Subsequently, concepts for language specialization, refinement and concretization are developed and described to realize the language-driven engineering approach in dynamic cluster-based environments. In addition, synchronization mechanisms and authorization structures are designed to enable collaboration between the users of the environment. Finally, the central aligned processes within CINCO Cloud for developing, using, transforming and executing a DSL are illustrated to clarify how the dynamic system behaves.

    Simplifying Release Engineering for Multi-Stacked Container-Based Services

    Today, large and complex solutions are needed to provide the services that users require, and cloud-based solutions have been the panacea for this problem. However, these solutions are complex in their implementation, and a standardized way of handling the services is needed. This thesis explores the possibilities of simplifying release engineering processes through the use of multi-stacked container-based services. A model is designed with the goal of reducing the complexity of release engineering processes: it restricts the possible outcomes by letting constraints specify the wanted state of an environment, and it enforces a single-method approach towards achieving a more uniform environment. The model has been implemented in a prototype that enables the documentation, configuration and orchestration of services deployed as Docker containers. Through this implementation, the validity of the designed model is verified and the complexity of the release engineering processes is reduced.
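One way to picture the constraint-driven, desired-state idea is a small reconcile loop that compares a declared environment with the containers actually running and starts whatever is missing. The service names, images and ports are placeholders, and the sketch drives the Docker CLI directly; it is not the prototype described in the thesis.

```python
# Minimal sketch of a desired-state model for container-based services:
# declare the wanted environment, compare it with what is running, start the rest.
import subprocess

DESIRED = {  # declared constraints; names, images and ports are placeholders
    "web":   {"image": "nginx:1.25", "port": "8080:80"},
    "cache": {"image": "redis:7",    "port": "6379:6379"},
}

def running_containers():
    """Names of currently running containers, queried via the Docker CLI."""
    out = subprocess.run(["docker", "ps", "--format", "{{.Names}}"],
                         capture_output=True, text=True, check=True)
    return set(out.stdout.split())

def reconcile():
    """Start any declared service that is not yet running."""
    running = running_containers()
    for name, spec in DESIRED.items():
        if name in running:
            print(f"{name}: already matches the declared state")
            continue
        subprocess.run(["docker", "run", "-d", "--name", name,
                        "-p", spec["port"], spec["image"]], check=True)
        print(f"{name}: started {spec['image']}")

if __name__ == "__main__":
    reconcile()
```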

    Ohjelmistokehityssyklien kiihdytys osana julkaisutiheyden kasvattamista ohjelmistotuotannossa

    In recent years, companies engaged in software development have adopted practices that allow them to release software changes to their users almost daily. Previously, release frequency for software was counted in months or even years, so the leap to daily releases is a big one. The underlying change to software development practices is equally large, spanning from individual development teams to organizations as a whole. The phenomenon has been framed as continuous software engineering by the software engineering research community. Researchers are beginning to realize the impact of continuous software engineering on existing disciplines in the field. Continuous software engineering touches almost every aspect of software development, from the inception of an idea to its eventual manifestation as a release to the public. Release management, or release engineering, has become an art in itself that must be mastered in order to be effective in releasing changes rapidly. Empirical studies in the area should help to further explore this industry-driven phenomenon and to better understand the effects of continuous software engineering. The purpose of this thesis is to provide insight into the habit of releasing software changes often that continuous software engineering promotes. The thesis has three main themes. The first seeks an answer to the rationale for frequent releases. The second charts the software processes and practices that need to be in place when releasing changes frequently. The third highlights the organizational circumstances surrounding the adoption of frequent releases and related practices. Methodologically, the thesis builds on a set of case studies. Focusing on the software development practices of Finnish industrial companies, the thesis data was collected from 33 different cases using a multiple-case design. Semi-structured interviews were used for data collection, along with a single survey. Respondents for the interviews included developers, architects and other people involved in software development. Thematic analysis was the primary qualitative approach used to analyze the interview responses; the survey data was analyzed quantitatively. The results of the thesis indicate that a higher release frequency makes sense in many cases, but there are constraints in selected domains. Daily releases were reported to be rare in the case projects. In most cases, there was a significant difference between the capability to deploy changes and the actual release cycle. A strong positive correlation was found between delivery capability and a high degree of task automation. Respondents perceived that with frequent releases, users get changes faster, feedback cycles speed up, and product quality can improve. Breaking down the software development process into the four quadrants of requirements, development, testing, and operations and infrastructure, the results suggest that continuity is required in all four to support frequent releases. In the case companies, the supporting development practices were usually in place, but specific types of testing and the facilities for deploying changes effortlessly were not. Realigning processes and practices accordingly needs strong organizational support. The responses imply that organizational culture, division of labor, employee training, and customer relationships all need attention.
With the right processes and the right organizational framework, frequent releases are indeed possible in specific domains and environments. In the end, release practices need to be considered individually in each case by weighing the associated risks and benefits. At best, users get to enjoy enhancements quicker and to experience an increase in the perceived value of software sooner than would otherwise be possible.